https://github.com/microsoft/SkillOpt

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

agent-skills self-evolving-agents

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/microsoft/SkillOpt
Owner: microsoft
License: mit
Created: 2026-05-08T06:41:01.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-06-15T17:12:43.000Z (about 1 month ago)
Last Synced: 2026-06-15T19:15:12.198Z (about 1 month ago)
Topics: agent-skills, self-evolving-agents
Language: Python
Homepage: https://aka.ms/skillopt
Size: 21.6 MB
Stars: 7,160
Watchers: 36
Forks: 683
Open Issues: 16
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

Awesome Lists containing this project

awesome-azure-openai-copilot - SkillOpt - Trains reusable natural-language skills for frozen LLM agents. (Microsoft Research)
Awesome-Agent-Memory
awesome - microsoft/SkillOpt - SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts. (Python)
awesome-harness-engineering - SkillOpt - Trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits and validation-gated updates, producing deployable `best_skill.md` artifacts. The key harness insight is that skills should be treated as optimizable parameters that improve with execution feedback, not static prompt fragments written once and forgotten. (Design Primitives / Skills & MCP)
awesome-agent-harness - microsoft/SkillOpt - Microsoft optimizer that trains reusable natural-language agent skills through trajectory edits, validation gates, and deployable skill artifacts. Tags: skills, optimization, validation-gates. (Catalog / Context & Working-State Engineering)
claude-code-skills-zh - microsoft/SkillOpt
awesome-ai-coding - SkillOpt

README

# SkillOpt: Executive Strategy for Self-Evolving Agent Skills

*Train agent skills the way you train neural networks, with epochs, (mini-)batch sizes, learning rates, and validation gates, but without touching model weights.*

[![Project Page](https://img.shields.io/badge/Project%20Page-SkillOpt-8dbb3c)](https://microsoft.github.io/SkillOpt/) [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b)](https://arxiv.org/abs/2605.23904) [![Project Video](https://img.shields.io/badge/Project%20Video-Watch%20Demo-ff0000)](https://youtu.be/JUBMDTCiM0M) [![PyPI](https://img.shields.io/badge/PyPI-skillopt-green.svg)](https://pypi.org/project/skillopt/) [![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

> 📖 **For installation, data preparation, training/eval commands, the full configuration reference, and framework internals, see the [Documentation & Reproduction Guide](https://microsoft.github.io/SkillOpt/docs/guideline.html)** (rendered on GitHub Pages).

---

## News 🔥🔥🔥

- **[2026-06-15]** 😴 **SkillOpt-Sleep (preview)** — a nightly offline self-evolution companion for local coding agents (Claude Code / Codex / Copilot): review past sessions, replay recurring tasks, and consolidate validated skills behind a held-out gate. See **[`docs/sleep/README.md`](docs/sleep/README.md)** for what it is, how to use it, and results.

- **[2026-06-03]** 🎉 **[gbrain](https://github.com/garrytan/gbrain), [gbrain-evals](https://github.com/garrytan/gbrain-evals/blob/main/docs/benchmarks/2026-06-03-skillopt.md), and [darwin-skill](https://github.com/alchaincyf/darwin-skill) have all integrated SkillOpt.**

- **[2026-06-02]** 🎉 **SkillOpt [v0.1.0](https://github.com/microsoft/SkillOpt/releases/tag/v0.1.0) is now available on [PyPI](https://pypi.org/project/skillopt/)!** Install with `pip install skillopt`. This initial release includes the full training loop (rollout → reflect → aggregate → select → update → evaluate), multi-backend support (OpenAI / Azure / Claude / Qwen / MiniMax), six built-in benchmarks, and a WebUI dashboard.

---

## Overview

Modern agent skills are usually hand-crafted, generated one-shot by a strong LLM, or evolved through loosely controlled self-revision. None of these approaches behaves like a deep-learning optimizer for the skill itself, and none reliably improves over its starting point under feedback.

**SkillOpt treats the skill document as the trainable state of a frozen agent**, and trains it with the discipline that makes weight-space optimization reproducible. A separate optimizer model turns scored rollouts into bounded add / delete / replace edits on a single skill document; a candidate edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, a rejected-edit buffer, and an epoch-wise slow / meta update make skill training stable while adding **zero inference-time model calls** at deployment.
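
The validation gate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SkillOpt's implementation: the `Edit` schema, the `gated_step` function, the line-level edit granularity, and the reading of the learning-rate budget as a cap on edits tried per step are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edit:
    """One bounded edit on the skill document (hypothetical schema)."""
    op: str                      # "add", "delete", or "replace"
    index: int                   # line position the edit applies to
    text: Optional[str] = None   # new line content for add / replace


def apply_edit(lines: List[str], edit: Edit) -> List[str]:
    """Return a copy of the skill lines with a single edit applied."""
    out = list(lines)
    if edit.op == "add":
        out.insert(edit.index, edit.text or "")
    elif edit.op == "delete":
        del out[edit.index]
    elif edit.op == "replace":
        out[edit.index] = edit.text or ""
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return out


def gated_step(
    skill: List[str],
    candidates: List[Edit],
    validate: Callable[[List[str]], float],
    rejected: Set[Edit],
    edit_budget: int,
) -> Tuple[List[str], float]:
    """Try at most `edit_budget` edits; keep only strict validation improvements."""
    best_score = validate(skill)
    tried = 0
    for edit in candidates:
        if tried >= edit_budget:
            break                  # assumed "learning rate": cap on edits per step
        if edit in rejected:
            continue               # skip edits that already failed the gate
        tried += 1
        trial = apply_edit(skill, edit)
        score = validate(trial)
        if score > best_score:     # strict improvement on the held-out score
            skill, best_score = trial, score
        else:
            rejected.add(edit)     # remember the failure for later steps
    return skill, best_score
```

In this sketch `validate` stands in for a full held-out evaluation of the frozen agent running with the candidate skill, which in practice would be the expensive part of each step. Edits are applied one after another to the current skill, so an index refers to the document as it stands after any earlier accepted edit.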

The deployed artifact is a compact `best_skill.md` (typically 300–2,000 tokens) that runs against the unchanged target model. Across **six benchmarks, seven target models, and three execution harnesses** (direct chat, Codex CLI, Claude Code CLI), SkillOpt is best or tied-best on **all 52 evaluated (model, benchmark, harness) cells** and on GPT-5.5 lifts the average no-skill accuracy by **+23.5 points in direct chat, +24.8 inside the Codex agentic loop, and +19.1 inside Claude Code**. Optimized skill artifacts transfer across model scales, between Codex and Claude Code harnesses, and to nearby benchmarks without further optimization.

For the full method, ablations, and per-cell results see the [paper](https://arxiv.org/abs/2605.23904); for a visual walkthrough of the loop see the [project page](https://microsoft.github.io/SkillOpt/); for deeper API / backend / benchmark docs see [`docs/`](docs/).

## 🎬 Demo Video

https://github.com/user-attachments/assets/eb12d3bc-371c-467f-904d-91b61f339ed7



[▶ Watch the full demo on YouTube](https://youtu.be/JUBMDTCiM0M)



---

## Extensibility & WebUI

### Adding a new backend

A backend = a chat / exec target (e.g. `openai_chat`, `claude_chat`, `qwen_chat`, `minimax_chat`, `codex_exec`, `claude_code_exec`). See [`docs/guide/new-backend.md`](docs/guide/new-backend.md) for the full contract; in short you add a `skillopt/model/_backend.py` module, register it in `skillopt/model/common.py` + `backend_config.py`, and wire it through the router in `skillopt/model/__init__.py`. `qwen_backend.py` and `minimax_backend.py` are good templates.
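
The register-then-route pattern described above can be illustrated with a generic sketch. Nothing here is SkillOpt's actual API: `ChatBackend`, `BACKENDS`, `register_backend`, `get_backend`, and the `echo_chat` backend are hypothetical names invented for the example, and the real contract is the one in `docs/guide/new-backend.md`.

```python
from typing import Callable, Dict, List, Protocol


class ChatBackend(Protocol):
    """Hypothetical interface: map a list of chat messages to a reply string."""

    def chat(self, messages: List[Dict[str, str]]) -> str: ...


# Hypothetical registry keyed by backend name (stand-in for the real registration step).
BACKENDS: Dict[str, Callable[[], ChatBackend]] = {}


def register_backend(name: str):
    """Class decorator that records a backend factory under `name`."""
    def decorator(factory):
        if name in BACKENDS:
            raise ValueError(f"backend {name!r} is already registered")
        BACKENDS[name] = factory
        return factory
    return decorator


@register_backend("echo_chat")
class EchoBackend:
    """Toy backend that returns the last message, useful for wiring tests."""

    def chat(self, messages: List[Dict[str, str]]) -> str:
        return messages[-1]["content"]


def get_backend(name: str) -> ChatBackend:
    """Router: look up a registered backend by name and instantiate it."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; known: {sorted(BACKENDS)}") from None
```

A toy backend like this can be handy for checking the wiring end to end before pointing the router at a real model endpoint.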

### Adding a new benchmark

A benchmark = a `skillopt/envs//` package with a `dataloader.py`, a `rollout.py`, and an `initial.md` seed skill. See [`docs/guide/new-benchmark.md`](docs/guide/new-benchmark.md) for the full contract; the simplest reference is `skillopt/envs/searchqa/`.
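
As a starting point, the three-file layout named above could be scaffolded with a small helper. This is only a sketch: the file names come from the paragraph above, while `scaffold_benchmark`, the `mybench` name in the usage note, and the stub contents are placeholders invented for the example. The real interface each file must expose is defined in `docs/guide/new-benchmark.md`.

```python
from pathlib import Path

# File names come from the README; the stub contents are placeholders only.
STUBS = {
    "dataloader.py": '"""Data loading for this benchmark (placeholder)."""\n',
    "rollout.py": '"""Rollout logic for this benchmark (placeholder)."""\n',
    "initial.md": "# Initial skill\n\nDescribe the starting strategy here.\n",
}


def scaffold_benchmark(envs_root: Path, name: str) -> Path:
    """Create envs_root/<name>/ with the three files, never overwriting existing ones."""
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not a valid Python package name")
    package = envs_root / name
    package.mkdir(parents=True, exist_ok=True)
    for filename, content in STUBS.items():
        target = package / filename
        if not target.exists():
            target.write_text(content, encoding="utf-8")
    return package
```

For example, `scaffold_benchmark(Path("skillopt/envs"), "mybench")` would create the directory and the three stub files, which can then be filled in by following `skillopt/envs/searchqa/`.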

### WebUI

Launch the monitoring dashboard (optional):

```bash
pip install -e ".[webui]"
python -m skillopt_webui.app
```

| Flag | Default | Description |
|---|---|---|
| `--port` | 7860 | Server port |
| `--host` | `0.0.0.0` | Bind address |
| `--share` | off | Create a public Gradio share link |
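
The flags in the table can also be assembled programmatically, for instance to launch the dashboard from a script. The sketch below assumes `--share` is a bare on/off switch (the table only says it defaults to off); the `webui_command` helper is a hypothetical name, not part of SkillOpt.

```python
import sys
from typing import List


def webui_command(port: int = 7860, host: str = "0.0.0.0", share: bool = False) -> List[str]:
    """Build the argv for the dashboard using the three documented flags."""
    cmd = [sys.executable, "-m", "skillopt_webui.app", "--port", str(port), "--host", host]
    if share:
        cmd.append("--share")  # assumption: a bare switch with no value
    return cmd


# Example launch, bound to localhost instead of the default 0.0.0.0:
#   import subprocess
#   subprocess.run(webui_command(port=8080, host="127.0.0.1"), check=True)
```

Binding to `127.0.0.1` keeps the dashboard reachable only from the local machine, which may be preferable to the `0.0.0.0` default on a shared network.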

---

## Citation

```bibtex
@misc{yang2026skilloptexecutivestrategyselfevolving,
      title={SkillOpt: Executive Strategy for Self-Evolving Agent Skills},
      author={Yifan Yang and Ziyang Gong and Weiquan Huang and Qihao Yang and Ziwei Zhou and Zisu Huang and Yan Li and Xuemei Gao and Qi Dai and Bei Liu and Kai Qiu and Yuqing Yang and Dongdong Chen and Xue Yang and Chong Luo},
      year={2026},
      eprint={2605.23904},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.23904}
}
```
