{"id":51316708,"url":"https://github.com/hjnnjh/agent-learning","last_synced_at":"2026-07-01T08:02:40.026Z","repository":{"id":365482090,"uuid":"1272265358","full_name":"hjnnjh/agent-learning","owner":"hjnnjh","description":"Learning project: a learning roadmap for Agent Harness and RL post-training (resources / notes / hands-on anchors), built around the learn-claude-code main course, with .claude configuration modeled on RQ-TPP","archived":false,"fork":false,"pushed_at":"2026-06-17T13:59:32.000Z","size":78,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-17T16:29:52.800Z","etag":null,"topics":["agent-harness","agentic-rl","claude-code","grpo","learning-resources","llm-agents","post-training","reinforcement-learning","rlhf","rlvr"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hjnnjh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-17T12:51:40.000Z","updated_at":"2026-06-17T14:00:58.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/hjnnjh/agent-learning","commit_stats":null,"previous_names":["hjnnjh/agent-learning"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/hjnnjh/agent-learning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjnnjh%2Fagent-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjnnjh%2Fagent-learning/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/hjnnjh%2Fagent-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjnnjh%2Fagent-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hjnnjh","download_url":"https://codeload.github.com/hjnnjh/agent-learning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hjnnjh%2Fagent-learning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34997947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-01T02:00:05.325Z","response_time":130,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-harness","agentic-rl","claude-code","grpo","learning-resources","llm-agents","post-training","reinforcement-learning","rlhf","rlvr"],"created_at":"2026-07-01T08:02:39.079Z","updated_at":"2026-07-01T08:02:40.017Z","avatar_url":"https://github.com/hjnnjh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧭 Agent Learning Roadmap: Harness and RL 
Post-Training\n\n![stage](https://img.shields.io/badge/stages-Harness%20%E2%86%92%20RL%20%E2%86%92%20Agentic%20RL-1f6feb)\n![type](https://img.shields.io/badge/type-learning%20project-success)\n![python](https://img.shields.io/badge/python-%E2%89%A53.11-3776ab?logo=python\u0026logoColor=white)\n![uv](https://img.shields.io/badge/deps-uv-de5fe9)\n![Claude Code](https://img.shields.io/badge/config-Claude%20Code-d97757)\n![license](https://img.shields.io/badge/license-personal%20study-lightgrey)\n\n\u003e A **learning project** that turns the roadmap *Agent Learning Roadmap: Harness and RL Post-Training* into trackable resources, notes, and hands-on anchors:\n\u003e from hand-writing a minimal agent loop, to getting GRPO running on a small model, to Agentic RL over multi-turn tool calls.\n\n**🔗 Quick links**　·　[Stage 1 · Harness](stage1-harness/README.md)　·　[Stage 2 · RL Post-Training](stage2-rl-posttraining/README.md)　·　[Stage 3 · Agentic RL](stage3-agentic-rl/README.md)　·　[Anthropic blog close reading](anthropic-blog/reading-list.md)　·　[Project guide CLAUDE.md](CLAUDE.md)　·　[Configuration notes .claude](.claude/CLAUDE.md)\n\n**References**　·　Main course [learn-claude-code](https://github.com/shareAI-lab/learn-claude-code)　·　Configuration pattern [RQ-TPP](https://github.com/hjnnjh/RQ-TPP) (auto-loaded rules / on-demand skills / subagents / hooks as guardrails)\n\n---\n\nA learning project. It holds the resources, notes, and hands-on anchors for the Notion roadmap *Agent Learning Roadmap: Harness and RL Post-Training*.\nThe configuration pattern follows [RQ-TPP](https://github.com/hjnnjh/RQ-TPP) (auto-loaded rules / on-demand skills / subagents / hooks as guardrails).\n\n\u003e **How to use this repository**: work through it stage by stage, and finish each stage's hands-on anchor before moving on to the next. Reading without writing code teaches you almost nothing in this field.\n\u003e Resource links live in each `stage*/README.md` (all verified in 2026-06), notes go in `stage*/notes/`, and hands-on code goes in `stage*/hands-on/`.\n\n## Core idea\n\nHarness and RL post-training are two sides of the same thing:\n\n- **Harness (inference time)**: the scaffolding that lets the model do the job well: the tool-calling loop, context management, sub-agent orchestration, and the evaluation setup.\n- **RL post-training (training time)**: methods that make the model itself stronger: SFT → preference optimization (DPO) → RLVR (RL with verifiable rewards, such as GRPO).\n- The two tracks eventually converge in **Agentic RL**: treat the harness as the RL environment and run reinforcement learning on multi-turn tool-call trajectories.\n\n## Roadmap\n\n| Stage | Topic | Duration | Hands-on anchor |\n|---|---|---|---|\n| [Stage 1](stage1-harness/README.md) | Agent Harness | About 3-5 weeks | Hand-write a minimal agent loop that can fix a simple bug |\n| [Stage 2](stage2-rl-posttraining/README.md) | RL post-training | About 6-10 weeks | Use TRL to run GSM8K GRPO on a small Qwen model and see the reward curve rise |\n| [Stage 3](stage3-agentic-rl/README.md) | Agentic RL | Long-term | Use verifiers/verl to run multi-turn tool-use GRPO end to end |\n| 
[Parallel](anthropic-blog/reading-list.md) | Anthropic blog close reading | In sync with Stage 1 | Three-tier reading check-ins (engineering practice → system design → research and alignment) |\n\n## Main course (runs through Stage 1)\n\n- **[learn-claude-code (shareAI-lab)](https://github.com/shareAI-lab/learn-claude-code)**: a 20-lesson systematic course on harness engineering (s01–s20),\n  running from agent loop / tool dispatch / permissions through context compression, sub-agents, task graphs, multi-agent collaboration, and MCP integration.\n  Its core claim is “Agency comes from the model. The harness gives agency a place to land.”\n  The point is not to copy source code but to grasp the key designs and rebuild them yourself, which makes it the best backbone course for the Stage 1 topics “reading a mature harness” and “context engineering”.\n  The Stage 1 README includes a per-lesson progress checklist.\n\n## Study advice\n\n- **Work in parallel where possible**: Stage 1's “hand-write a loop” and Stage 2's “RL basics” do not depend on each other; the Anthropic blog close reading runs in sync with Stage 1.\n- **Resource mix**: blogs and open-source code matter more than textbooks; among papers, prioritize technical reports and treat theory papers as a supplement.\n- **Long-term subscriptions**: Interconnects (Nathan Lambert), Ahead of AI (Sebastian Raschka), Lil'Log (Lilian Weng).\n- **Most important of all**: every stage has a hands-on anchor that must be completed. A lot of knowledge (pitfalls in context management, rollout engineering details,\n  the concrete forms reward hacking takes) exists only in practice.\n\n## Environment\n\n- Notes: plain Markdown, no environment needed.\n- Hands-on code (Stage 2/3): Python, managed with `uv` (see the root `pyproject.toml`). Start with `uv venv \u0026\u0026 uv sync`.\n\nFor configuration notes, see [CLAUDE.md](CLAUDE.md) and [.claude/CLAUDE.md](.claude/CLAUDE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhjnnjh%2Fagent-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhjnnjh%2Fagent-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhjnnjh%2Fagent-learning/lists"}