Train Your OpenClaw Agent Just by Talking to It

... PLUS: Kimi Applies Attention to Layer Depth (Not Just Tokens)

In today's newsletter:

  • How OpenClaw RL turns live conversations into training data

  • Kimi's attention residuals solve the layer dilution problem

Reading time: 3 minutes.

Most RL systems require batch-mode training with pre-collected datasets. You label data manually, train offline, deploy, and hope it works. Debug a failure? Collect new data, relabel, retrain. Three iterations = weeks of work.

OpenClaw RL wraps your self-hosted model as an OpenAI-compatible API, intercepts live conversations, and trains the policy in the background while you use it.

How it works:

  • Agent serving handles your requests

  • Rollout collection records interactions

  • Reward judging scores performance

  • Policy training updates the model

None of these block each other. The agent keeps responding while training happens.
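The four stages above can be sketched as queue hand-offs, where serving never waits on judging or training. All names here are illustrative, not the actual OpenClaw RL API:

```python
import queue

# Hypothetical sketch of the four decoupled stages.
rollouts = queue.Queue()  # serving -> reward judging
scored = queue.Queue()    # reward judging -> policy training

def serve(prompt, policy):
    """Agent serving: answer immediately, log the interaction."""
    reply = policy(prompt)
    rollouts.put((prompt, reply))  # rollout collection; doesn't block serving
    return reply

def judge(reward_model):
    """Reward judging: score queued rollouts in the background."""
    while not rollouts.empty():
        prompt, reply = rollouts.get()
        scored.put((prompt, reply, reward_model(prompt, reply)))

def train(update_policy):
    """Policy training: consume scored rollouts whenever they're ready."""
    while not scored.empty():
        update_policy(*scored.get())

# Toy run: the agent answers first; judging and training drain the queues later.
updates = []
serve("fix the failing test", lambda p: "done")
serve("summarize the log", lambda p: "done")
judge(lambda p, r: 1.0)
train(lambda p, r, score: updates.append(score))
print(updates)  # → [1.0, 1.0]
```

The point of the queues is the decoupling: each stage only touches its inbox, so a slow reward model or training step never delays a user-facing reply.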

It learns in two ways:

  • Binary RL (GRPO) scores each turn as good/bad/neutral using a reward model. Works with thumbs up/down or environment success/failure.

  • On-Policy Distillation (OPD) extracts textual hints from feedback. When you tell the agent "you should have checked the file first," it uses that as a training signal to learn the correct sequence of actions.
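For the binary RL side, a toy sketch of the GRPO-style signal: map thumbs up/down to scalar rewards, then normalize within a group of rollouts so each turn's advantage is relative to its peers. The reward mapping and normalization here follow the general GRPO recipe, not OpenClaw RL's exact code:

```python
import statistics

# Illustrative reward mapping for binary/ternary feedback (assumption).
FEEDBACK_REWARD = {"good": 1.0, "neutral": 0.0, "bad": -1.0}

def grpo_advantages(feedback):
    """Group-normalized advantages: (reward - group mean) / group std."""
    rewards = [FEEDBACK_REWARD[f] for f in feedback]
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

advs = grpo_advantages(["good", "bad", "neutral", "good"])
print(advs)  # positive for "good" turns, negative for "bad"
```

Group normalization is what lets GRPO skip a learned value function: a turn is only reinforced to the extent it beat the other rollouts in its group.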

The framework supports personal agents (conversational, single-user) and general agents (terminal, GUI, SWE, tool-call).

Everything runs on your infrastructure. No external API keys required. Conversation data stays local.

Every Transformer layer adds its output to the residual stream with the same weight. Layer 1 contributes with weight 1. Layer 10 contributes with weight 1. Layer 40 contributes with weight 1.

This uniform weighting causes a problem: hidden states grow larger with each layer while each individual layer's contribution gets progressively weaker. By layer 40, the output from layer 10 has been diluted by 30 subsequent additions. If layer 40 needs specific information that layer 10 calculated, it can't retrieve it selectively—it gets everything mixed together with equal weight.
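The dilution is easy to quantify in an idealized model (my simplification, not Kimi's) where each layer writes an orthogonal unit direction into the residual stream with weight 1:

```python
import math

depth = 40

# Assumed simplification: layer k contributes the orthogonal basis vector e_k.
contribs = [[1.0 if i == k else 0.0 for i in range(depth)] for k in range(depth)]

# Standard residual stream after 40 layers: the plain sum of all contributions.
stream = [sum(c[i] for c in contribs) for i in range(depth)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Layer 10's output overlaps with the full stream by only 1/sqrt(40):
print(round(cosine(contribs[9], stream), 3))  # → 0.158
```

Each layer's share of the stream shrinks like 1/sqrt(depth), so a later layer reading the uniform sum gets layer 10's signal buried under everything added since.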

What Kimi changed:

Instead of adding all previous layers with weight 1, each layer calculates softmax attention scores across earlier layers. Layer 40 can assign high attention to layer 10's output if that's what it needs, low attention to irrelevant intermediate layers, and retrieve exactly the right information.

The mechanism uses one learnable query vector per layer. When layer 40 needs input, it computes attention weights by comparing its query against all previous layer outputs, then builds its input as a weighted sum based on those scores.
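A minimal sketch of that mechanism, assuming a plain dot-product score between the layer's query and each earlier output (the exact scoring function is my assumption):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attn_residual_input(query, prev_outputs):
    """One learnable query attends over all earlier layer outputs;
    the layer's input is the attention-weighted sum, not a uniform sum."""
    weights = softmax([dot(query, h) for h in prev_outputs])
    dim = len(prev_outputs[0])
    mixed = [sum(w * h[i] for w, h in zip(weights, prev_outputs)) for i in range(dim)]
    return mixed, weights

# Toy: a deep layer's query has learned to align with layer 10's direction.
layer10 = [1.0, 0.0]
others = [[0.0, 1.0]] * 3
x, w = attn_residual_input(query=[5.0, 0.0], prev_outputs=[layer10] + others)
print(round(w[0], 3))  # weight on layer 10 dominates the mix
```

With uniform residuals every entry of `w` would be fixed at 1/4; here the learned query concentrates nearly all the weight on the one earlier layer it needs.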

Example:

Take a multi-step reasoning problem where layer 10 identified the core equation and layers 11-39 worked through intermediate algebra. Layer 40 needs that original equation to formulate the final answer. Traditional residuals force it to process all 39 previous layers equally weighted. Attention Residuals let it focus directly on layer 10.

The impact on reasoning tasks:

  • +7.5 points on GPQA-Diamond (multi-step reasoning)

  • +3.1 points on HumanEval (code generation)

Training stability improved:

In standard models, early layers receive disproportionately large gradients while deep layers get weaker signals. Attention Residuals distribute gradients more uniformly across all layers. The model scales to greater depths without degradation.

The architecture is a drop-in replacement for standard residual connections, with inference latency overhead under 2%. It's used in Kimi K2.5 (1T total parameters, 32B activated).

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 160K+ AI developers? Get in touch today by replying to this email.