https://github.com/aiming-lab/metaclaw
Just talk to your agent: it learns and EVOLVES.
- Host: GitHub
- URL: https://github.com/aiming-lab/metaclaw
- Owner: aiming-lab
- License: mit
- Created: 2026-03-09T13:47:13.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-12T05:02:32.000Z (3 months ago)
- Last Synced: 2026-03-12T11:31:25.378Z (3 months ago)
- Topics: agent, ai-agent, fine-tuning, llm, lora, metaclaw, online-learning, openai-compatible, openclaw, reinforcement-learning, skill-learning
- Language: Python
- Homepage:
- Size: 46.1 MB
- Stars: 391
- Watchers: 2
- Forks: 42
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
---
## News
- **[03/09/2026]** We release **MetaClaw**: just talk to your agent and let it evolve automatically. **NO** GPU deployment required; just plug into the **API**.
---
## Demo
https://github.com/user-attachments/assets/1c2919fc-5612-40f7-bb97-c74ab50619d5
---
## Overview
**MetaClaw turns live conversations into continuous training data, automatically.**
Just talk to your agent as usual, and MetaClaw handles the learning loop behind the scenes.
It wraps your model behind an OpenAI-compatible API, intercepts interactions from OpenClaw, scores each turn, and continuously improves the policy through online fine-tuning. Updated weights are hot-swapped into production with no service interruption.
There is no need to maintain a dedicated GPU cluster: MetaClaw is built around **Kimi-2.5** (~200B MoE) and uses [Tinker](https://www.thinkingmachines.ai/tinker/) for cloud-based LoRA training, with **Qwen3-4B** available as a lightweight alternative.
## Key Features
### **Train from real usage**
MetaClaw learns directly from live user-agent conversations. Instead of collecting static datasets and retraining offline, it continuously improves from actual deployment.
### **Skill injection**
At every turn, MetaClaw retrieves the most relevant skill instructions and injects them into the agent's system prompt. This improves behavior immediately, without waiting for retraining.
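To make the retrieve-and-inject step concrete, here is a minimal sketch. It assumes a skill bank shaped as `{category: [{"name": ..., "content": ...}]}` and ranks skills by word overlap with the user's message. The schema, the helper names (`retrieve_skills`, `inject_skills`), and the overlap scoring are illustrative assumptions; this README does not describe how MetaClaw actually ranks skills.

```python
import re

# Hypothetical skill bank; the real schema of conversation_skills.json may differ.
SKILLS = {
    "coding": [
        {"name": "run-tests", "content": "Run the test suite before claiming a fix works."},
        {"name": "small-diffs", "content": "Prefer small, focused code changes."},
    ],
    "security": [
        {"name": "no-secrets", "content": "Never print API keys or tokens in responses."},
    ],
}


def tokenize(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_skills(query: str, bank: dict, top_k: int = 2) -> list[dict]:
    """Hypothetical retriever: top_k skills sharing the most words with the query."""
    q = tokenize(query)
    scored = []
    for category, skills in bank.items():
        for skill in skills:
            overlap = len(q & tokenize(skill["name"] + " " + skill["content"]))
            if overlap > 0:
                scored.append((overlap, category, skill))
    # Highest overlap first; sort is stable, so ties keep bank order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [skill for _, _, skill in scored[:top_k]]


def inject_skills(system_prompt: str, skills: list[dict]) -> str:
    """Append retrieved skills to the system prompt as a Markdown list."""
    if not skills:
        return system_prompt
    lines = "\n".join(f"- {s['content']}" for s in skills)
    return f"{system_prompt}\n\n## Relevant skills\n{lines}"
```

A different ranker (for example, embedding similarity) could replace `retrieve_skills` without changing the injection step.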
### **Skill evolution**
When the agent fails, MetaClaw analyzes the full interaction trajectory and uses an LLM to generate new skills automatically. Over time, the system becomes more capable by learning from its own mistakes. If you're interested in the broader idea of skill-augmented RL, check out our [SkillRL](https://github.com/aiming-lab/SkillRL) project.
### **No GPU cluster required**
Training is offloaded to Tinker cloud, so any machine with network access can run the full system. This makes continual learning much easier to deploy and maintain.
### **Asynchronous by design**
Serving, reward modeling, and training are fully decoupled. The agent continues responding in real time while scoring and optimization run in parallel.
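One way to picture this decoupling is a minimal asyncio sketch: the serving path answers immediately and only enqueues the finished turn, while a scorer and a trainer consume queues in the background. The queue layout, the echo reply, the constant reward, and every function name here are assumptions for illustration, not MetaClaw's internals.

```python
import asyncio


async def serve(prompt: str, score_q: asyncio.Queue) -> str:
    """Answer right away; scoring happens later, off the request path."""
    reply = f"echo: {prompt}"  # stand-in for the real model call
    await score_q.put((prompt, reply))
    return reply


async def scorer(score_q: asyncio.Queue, train_q: asyncio.Queue) -> None:
    """Attach a reward to each turn (a constant stands in for the PRM judge)."""
    while True:
        prompt, reply = await score_q.get()
        await train_q.put((prompt, reply, 1.0))
        score_q.task_done()


async def trainer(train_q: asyncio.Queue, batch_size: int, steps: list) -> None:
    """Collect scored turns and run a (fake) training step per full batch."""
    batch = []
    while True:
        batch.append(await train_q.get())
        if len(batch) == batch_size:
            steps.append(len(batch))  # stand-in for an optimizer step
            batch = []
        train_q.task_done()


async def main(n_turns: int = 4, batch_size: int = 2) -> tuple[list[str], list[int]]:
    score_q: asyncio.Queue = asyncio.Queue()
    train_q: asyncio.Queue = asyncio.Queue()
    steps: list[int] = []
    workers = [
        asyncio.create_task(scorer(score_q, train_q)),
        asyncio.create_task(trainer(train_q, batch_size, steps)),
    ]
    replies = [await serve(f"turn {i}", score_q) for i in range(n_turns)]
    # Wait until every turn has been scored and consumed by the trainer.
    await score_q.join()
    await train_q.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return replies, steps


replies, steps = asyncio.run(main())
```

Because `serve` only enqueues work, slow scoring or training delays the next weight update, not the next reply.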
### **Two learning modes**
MetaClaw supports both:
- **RL (GRPO)** for learning from implicit feedback signals
- **On-Policy Distillation (OPD)** for leveraging richer natural-language supervision
This gives you a practical path to improve agents from both lightweight signals and high-quality textual feedback.
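For the RL mode, the core idea of GRPO is to sample several responses to the same prompt and score each one relative to its group, so no learned value function is needed. The sketch below computes the standard group-normalized advantage, (r_i - mean) / (std + eps). It illustrates the textbook GRPO formula rather than MetaClaw's code; how MetaClaw forms groups from live conversation turns is not described here, and implementations differ on population vs. sample standard deviation.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward normalized within its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1 (good) or 0 (bad) by a judge.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
```

When every response in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.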
---
## Quick Start
### 1. Install dependencies
```bash
pip install fastapi uvicorn httpx openai transformers
pip install tinker tinker-cookbook # Tinker SDK
```
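To confirm the install, a small check with the standard library's `importlib.util.find_spec` can list any modules that are still missing. The import names below (notably `tinker_cookbook` for the `tinker-cookbook` package) are assumptions about how these packages are imported.

```python
import importlib.util

# Import names assumed to correspond to the pip packages above.
REQUIRED = ["fastapi", "uvicorn", "httpx", "openai", "transformers", "tinker", "tinker_cookbook"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```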
### 2. Configure OpenClaw
Run the setup script once to point the OpenClaw gateway at the MetaClaw proxy:
```bash
bash openclaw_model_kimi.sh # Kimi-2.5 (recommended)
```
### 3. Start training
```bash
export TINKER_API_KEY="..."
cd /path/to/metaclaw
python examples/run_conversation_rl.py
```
That's it. Start chatting with your agent: MetaClaw automatically collects conversation turns, scores them, and trains the model. After every `batch_size` samples, new weights are hot-swapped in with no restart.
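The "train every `batch_size` samples, then hot-swap" behavior can be sketched as a buffer that counts scored samples and, once a batch is full, runs a training step outside the lock and then publishes the new weight version in one short critical section, so serving threads read either the old or the new version. `HotSwapPolicy`, `add_sample`, and the integer version counter are hypothetical names; the real proxy and Tinker calls are not shown.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class HotSwapPolicy:
    """Hypothetical holder for the weights the proxy currently serves."""
    batch_size: int = 32
    version: int = 0
    _buffer: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def current_version(self) -> int:
        # Serving threads read the active version under the lock.
        with self._lock:
            return self.version

    def add_sample(self, sample: dict) -> bool:
        """Buffer one scored turn; return True if it triggered a weight swap."""
        with self._lock:
            self._buffer.append(sample)
            if len(self._buffer) < self.batch_size:
                return False
            batch, self._buffer = self._buffer, []
        self._train(batch)  # slow step runs outside the lock, so serving is not blocked
        with self._lock:
            self.version += 1  # publish the new weights
        return True

    def _train(self, batch: list) -> None:
        # Stand-in for a remote LoRA training step on the collected batch.
        pass
```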
---
## Configuration
All settings are in `MetaClawConfig` (`metaclaw/config.py`). The most commonly adjusted fields:
| Field | Default | Description |
|-------|---------|-------------|
| `model_name` | `"moonshotai/Kimi-2.5"` | Base model |
| `lora_rank` | `32` | LoRA rank |
| `batch_size` | `32` | Samples before each training step |
| `max_steps` | `1000` | Total training steps |
| `loss_fn` | `"importance_sampling"` | `"importance_sampling"` / `"ppo"` / `"cispo"` |
| `use_prm` | `True` | Enable PRM reward scoring |
| `prm_url` | `"https://api.openai.com/v1"` | Any OpenAI-compatible judge endpoint |
| `prm_model` | `"gpt-5.2"` | Judge model |
| `use_skills` | `False` | Enable skill injection |
| `enable_skill_evolution` | `False` | Auto-generate skills from failures |
| `proxy_port` | `30000` | Proxy listen port |
| `tinker_sampling_url` | `"http://localhost:8080"` | Tinker sampling endpoint |
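The `loss_fn` options name per-token policy-gradient objectives. As a rough guide, the sketch below evaluates the textbook forms of two of them for a single token, given the current and sampling-time log-probabilities and an advantage A: plain importance sampling, -(p_new / p_old) * A, and PPO's clipped surrogate with an assumed clip range of 0.2. This is a scalar illustration of the standard formulas, not Tinker's or MetaClaw's implementation; `cispo` is left out because its defining feature, a stop-gradient on the clipped weight, has no meaning without autodiff.

```python
import math


def importance_sampling_loss(logp_new: float, logp_old: float, advantage: float) -> float:
    """-(ratio * A), where ratio = p_new / p_old corrects for off-policy samples."""
    ratio = math.exp(logp_new - logp_old)
    return -ratio * advantage


def ppo_clip_loss(logp_new: float, logp_old: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate: take the more pessimistic of clipped and unclipped terms."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return -min(ratio * advantage, clipped * advantage)
```

With a positive advantage of 1 and a ratio of 1.5, importance sampling gives -1.5 while PPO caps the ratio at 1.2 and gives -1.2; with a negative advantage, PPO keeps the unclipped (more pessimistic) term.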
For programmatic rollout (no IDE needed), set `openclaw_env_data_dir` to a directory of JSONL task files:
```json
{"task_id": "task_1", "instruction": "Register the webhook at https://example.com/hook"}
```
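A directory of task files like this can be generated with the standard library. The sketch writes one JSON object per line using the `task_id` and `instruction` fields from the example above; the file name `tasks.jsonl`, the directory, and the second task are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path


def write_tasks(data_dir: Path, tasks: list[dict]) -> Path:
    """Write tasks as JSONL (one object per line) and return the file path."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / "tasks.jsonl"  # assumed file name
    with path.open("w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
    return path


def read_tasks(data_dir: Path) -> list[dict]:
    """Load every non-empty line from every *.jsonl file in the directory."""
    tasks = []
    for path in sorted(data_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            tasks.extend(json.loads(line) for line in f if line.strip())
    return tasks


data_dir = Path(tempfile.mkdtemp()) / "openclaw_tasks"
write_tasks(data_dir, [
    {"task_id": "task_1", "instruction": "Register the webhook at https://example.com/hook"},
    {"task_id": "task_2", "instruction": "List all registered webhooks"},
])
loaded = read_tasks(data_dir)
```

The resulting `data_dir` path is what `openclaw_env_data_dir` would point to.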
---
## Skills
Skills are short Markdown instructions injected into the agent's system prompt at each turn. They're organized in `memory_data/conversation/conversation_skills.json` by category (`coding`, `security`, `agentic`, etc.).
Enable with:
```python
from metaclaw.config import MetaClawConfig

config = MetaClawConfig(use_skills=True)
```
To automatically generate new skills when the agent struggles:
```python
from metaclaw.config import MetaClawConfig

config = MetaClawConfig(
    use_skills=True,
    enable_skill_evolution=True,
    azure_openai_deployment="gpt-5.2",
)
```
Then set the Azure OpenAI credentials for the skill-generation model:
```bash
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
```
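The evolution loop can be sketched as follows: when a trajectory is judged a failure, format it into a prompt, ask an LLM for a new skill as JSON, and add it to the bank under its category unless a skill with that name already exists. Everything here is hypothetical: the prompt wording, the JSON shape, the `evolve_skills` helper, and the `generate` callable, which stands in for the Azure OpenAI call and is stubbed so the sketch runs offline.

```python
import json
from typing import Callable

# Hypothetical prompt template; doubled braces are literal braces for str.format.
PROMPT = (
    "The agent failed the task below. Write one reusable skill that would have "
    'helped, as JSON: {{"category": ..., "name": ..., "content": ...}}\n\n{trajectory}'
)


def evolve_skills(trajectory: list[dict], bank: dict, generate: Callable[[str], str]) -> bool:
    """Ask an LLM for a new skill from a failed trajectory; return True if one was added."""
    transcript = "\n".join(f"{turn['role']}: {turn['content']}" for turn in trajectory)
    raw = generate(PROMPT.format(trajectory=transcript))
    try:
        skill = json.loads(raw)
        category, name, content = skill["category"], skill["name"], skill["content"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False  # ignore malformed generations rather than corrupting the bank
    existing = bank.setdefault(category, [])
    if any(s["name"] == name for s in existing):
        return False  # skip duplicates by name
    existing.append({"name": name, "content": content})
    return True


def fake_generate(prompt: str) -> str:
    # Offline stub standing in for the real LLM call.
    return json.dumps({"category": "agentic", "name": "confirm-url",
                       "content": "Confirm the target URL with the user before registering webhooks."})
```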
---
## Citation
```bibtex
@misc{xia2026metaclaw,
author = {Xia, Peng and Chen, Jianwen and Yang, Xinyu and Han, Siwei and Qiu, Shi and Zheng, Zeyu and Xie, Cihang and Yao, Huaxiu},
title = {MetaClaw},
year = {2026},
organization = {GitHub},
url = {https://github.com/aiming-lab/MetaClaw},
}
```
---
## Acknowledgements
MetaClaw builds on top of the following open-source projects:
- [OpenClaw](https://openclaw.ai): the core agent framework.
- [SkillRL](https://github.com/aiming-lab/SkillRL): our skill-augmented RL framework.
- [Tinker](https://www.thinkingmachines.ai/tinker/): used for online RL training.
- [OpenClaw-RL](https://github.com/Gen-Verse/OpenClaw-RL): inspiration for our RL design.
- [awesome-openclaw-skills](https://github.com/VoltAgent/awesome-openclaw-skills): provides the foundation for our skill bank.
---
## License
This project is licensed under the [MIT License](LICENSE).
