{"id":50853231,"url":"https://github.com/2002yy/study-agent","last_synced_at":"2026-06-14T16:04:00.635Z","repository":{"id":356651342,"uuid":"1233496159","full_name":"2002yy/study-agent","owner":"2002yy","description":"AI 学习搭子系统 — 联网搜索 + 角色群聊 + 课后总结","archived":false,"fork":false,"pushed_at":"2026-06-07T09:54:56.000Z","size":66350,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-07T11:06:53.426Z","etag":null,"topics":["ai","learning-assistant","llm","python","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/2002yy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-09T02:56:05.000Z","updated_at":"2026-06-07T09:54:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"9db311bf-4d2f-4c12-aadb-eed64e3abe56","html_url":"https://github.com/2002yy/study-agent","commit_stats":null,"previous_names":["2002yy/-study-agent","2002yy/study-agent"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/2002yy/study-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2002yy%2Fstudy-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2002yy%2Fstudy-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2002yy%2Fstudy-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2002yy%2Fstudy-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/2002yy","download_url":"https://codeload.github.com/2002yy/study-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2002yy%2Fstudy-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34326241,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","learning-assistant","llm","python","streamlit"],"created_at":"2026-06-14T16:03:59.845Z","updated_at":"2026-06-14T16:04:00.626Z","avatar_url":"https://github.com/2002yy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Study Agent\n\n\u003cp\u003e\n \u003ca href=\"https://github.com/2002yy/study-agent/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/2002yy/study-agent/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n \u003cimg src=\"https://img.shields.io/badge/python-3.12-blue\" alt=\"Python 3.12\"\u003e\n \u003cimg src=\"https://img.shields.io/badge/tests-314%20passed-green\" alt=\"314 tests passed\"\u003e\n\u003c/p\u003e\n\nA local AI learning assistant with long-term memory, role-based group chat,\nweb search, model routing and context-tier management.\n\n## One-minute Overview\n\nStudy Agent 是一个本地优先的 AI 学习助手，重点不是简单调用大模型，而是把 LLM 接入完整应用流程：\n\n- **多 Provider LLM 接入**：OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models\n- **长期记忆**：Markdown memory + safe writer\n- **上下文分层**：fast / light / deep / archive\n- **联网搜索**：RSS / News fetch → article extraction → LLM digest → source tracing\n- **RAG MVP**：本地 Markdown / TXT / DOCX / PDF 索引、关键词 / 本地向量原型 / hybrid / backend-vector 检索、可配置 embedding provider、可选 Chroma 持久化、受控本地知识检索工具、引用上下文、来源块、Streamlit 检索/调试面板、聊天注入和 FastAPI RAG / chat / memory 基础接口\n- **工程安全**：SSRF protection、detect-secrets、配置模板\n- **工程质量**：pytest 测试套件、Ruff、GitHub Actions CI、打包检查\n\n## Highlights\n\n- **Multi-provider LLM client**: OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models\n- **Model routing** with fast / light / deep / archive context tiers\n- **Long-term memory** based on Markdown files and safe-writer persistence\n- **Web search pipeline**: feed registry → URL safety checks → article extraction → LLM digest → auditable source trace\n- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing, lexical / local vector prototype / hybrid / backend-vector retrieval, configurable embedding providers, optional Chroma persistence, a controlled local-knowledge retrieval tool, citation-first context formatting, source blocks, a Streamlit retrieval/debug panel, optional chat injection, FastAPI RAG / chat / memory / tools / workflows foundation endpoints, and a first React / Vite / TypeScript console\n- **SSRF protection** for article fetching, **detect-secrets** in CI\n- **Batched session logging** and multi-layer caching for performance\n- **Performance budget**: mode-based `max_tokens` bounds on the main chat, WeChat, and news LLM paths\n- **314 pytest tests**, Ruff clean, mypy clean, GitHub Actions CI workflow\n\nFor a detailed breakdown of the stack and engineering highlights, see [Technical Stack \u0026 Engineering Highlights](docs/TECH_STACK.md).\n\n---\n\n**一个面向个人学习复盘的本地 AI 学习搭子系统** — 支持角色群聊、联网搜索、长期记忆和课后总结。\n\n\u003e 不是又一个 AI 问答工具，而是一个会记住你学什么的 AI 学习伙伴。\n\n---\n\n## 为什么做这个\n\n通用 AI 对话工具擅长回答问题，但不擅长「陪伴学习」：\n\n- 它们不记得你**昨天**学了什么、**上周**卡在了哪里\n- 它们不会主动帮你**总结**学习进展\n- 它们没有「角色感」—— 严肃还是轻松？鼓励还是挑战？全看随机\n\nStudy Agent 的定位很明确：**一个运行在你本地的、有长期记忆的、有角色区分的 AI 学习搭子**。它会记住你的学习轨迹，在群聊中用不同角色和你讨论，课后自动总结进展，并把新的知识写进长期记忆。\n\n---\n\n## Why It Is Not Just a Prompt Demo\n\n普通 AI demo 通常只是把用户输入转发给模型。Study Agent 重点解决的是：\n\n| 问题 | 工程方案 |\n|---|---|\n| 模型供应商更换困难 | Provider profile + OpenAI-compatible client |\n| 上下文越来越长 | context-tier routing |\n| 学习记录无法沉淀 | Markdown long-term memory |\n| 写入记忆不安全 | safe writer + preview/confirm |\n| 联网内容不可追溯 | source-traced news pipeline |\n| 运行不稳定 | caching, batched logging, tests, CI |\n\n---\n\n## Demo\n\n| 界面 | 截图 |\n|------|------|\n| 首页 — 状态看板、当前重点、版本信息 | ![home](assets/screenshots/home.png) |\n| 微信群聊 — 三位角色群内讨论 | ![group-chat](assets/screenshots/group-chat.png) |\n| 联网搜索 — 多源新闻聚合与来源追溯 | ![news-search](assets/screenshots/news-search.png) |\n| 记忆候选 — 课后更新预览与确认写入 | ![memory-capture](assets/screenshots/memory-capture.png) |\n\n---\n\n```\n启动 App → 选择学习模式 (氛围/专注度)\n │\n ├── 单人对话 ──→ 提问/讨论 ──→ 课后总结 ──→ 记忆更新\n │\n └── 微信群聊 ──→ 生成开场 / 聊新闻 / 查资料\n │\n ┌────┴────┐\n │ │\n 联网搜索角色互动讨论\n │ │\n 来源追溯写入观点碰撞\n │ │\n └────┬────┘\n │\n 课后总结 → 确认 → 写入长期记忆\n```\n\n---\n\n## 核心功能\n\n| 功能 | 说明 |\n|------|------|\n| **单人对话** | 与 AI 一对一讨论学习内容，支持 flash/pro 模型切换 |\n| **角色群聊** | 四位角色（三月七、刻晴、纳西妲、流萤）群聊讨论，各有独立人设 |\n| **联网搜索** | Google News + Bing News + RSSHub 多源聚合，页面正文三层提取 |\n| **来源追溯** | 搜索结果写入群聊记录，可回溯依据 |\n| **RAG MVP** | 本地 Markdown / TXT / DOCX / PDF 文档索引，前端面板返回带文件路径、行号、分数、命中词和 score breakdown 的引用片段，并可注入单人聊天和微信群互动回复；FastAPI 提供 `/health`、`/rag`、`/rag/index`、`/rag/query`、`/rag/status`、`/rag/upload`、`/rag/local-knowledge` |\n| **课后总结** | 学习完成后自动总结进展，用户确认后写入记忆 |\n| **长期记忆** | 学习者画像、进度追踪、项目上下文、当前焦点，多级记忆档案 |\n| **多 Provider** | 支持 OpenAI / DeepSeek / OpenRouter / SiliconFlow / 本地模型 |\n| **氛围选择** | warm / close / standard 多种互动氛围切换 |\n\n---\n\n## 架构\n\n![architecture](assets/screenshots/arch.png)\n\n```\nstreamlit run app.py\n │\n┌──────┴──────┐\n│ app.py │ Streamlit 入口，路由到各 UI 面板\n└──────┬──────┘\n │\n┌──────┴──────────────────────────────────────────┐\n│ src/ui/ │\n│ ├── main_panel.py 主页 │\n│ ├── chat_panel.py 对话面板 │\n│ ├── wechat_panel.py 微信群面板 │\n│ ├── after_session_panel.py 课后总结面板 │\n│ └── sidebar.py 侧边栏 │\n└──────┬──────────────────────────────────────────┘\n │\n┌──────┴──────┬──────────────┬──────────────┬──────────────┐\n│ LLM Layer │ News Layer │ Memory │ WeChat │\n│ │ │ Layer │ Layer │\n│ llm_client │ news/ │ memory.py │ wechat_*.py │\n│ llm_router │ ├─rss_fetc │ memory_tools │ (format, │\n│ context_bui │ ├─article_e │ memory_writer│ state, │\n│ -ilder │ ├─link_reso │ │ generator, │\n│ │ ├─digest │ session_log │ prompt) │\n│ config.py │ └─article_f │ -ger │ │\n│ router.py │ etcher │ │ wechat_serv│\n│ │ │ │ -ice.py │\n└──────┬──────┴──────┬──────┴──────┬───────┴──────┬───────┘\n │ │ │ │\n .env.example chat/ memory/ roles/\n (5 providers) (群聊记录) (记忆文件) (角色人设)\n```\n\n---\n\n## 快速开始\n\n```bash\ngit clone \u003crepo-url\u003e study-agent\ncd study-agent\ncp .env.example .env\n# 编辑 .env，填入 API Key\n\n# 初始化记忆文件（新用户首次运行，应用会自动创建；也可手动复制模板）\ncp -r memory.example/* memory/ 2\u003e/dev/null || :\n\n# 稳定安装（推荐，锁定版本）\npip install -r requirements.txt\npip install -r requirements-dev.txt\n\nstreamlit run app.py\n```\n\n浏览器打开 `http://localhost:8501`\n\n### 依赖管理\n\n本项目使用 [pip-tools](https://github.com/jazzband/pip-tools) 管理依赖：\n\n- [`requirements.in`](requirements.in) / [`requirements-dev.in`](requirements-dev.in) — **人类维护**，写范围版本\n- [`requirements.txt`](requirements.txt) / [`requirements-dev.txt`](requirements-dev.txt) — **自动生成**，写精确版本（lock 文件）\n\n修改依赖后重新生成 lock 文件：\n\n```bash\npip install pip-tools\npip-compile requirements.in # 重新锁定主依赖\npip-compile requirements-dev.in # 重新锁定开发依赖\n```\n\n---\n\n## 环境配置\n\n通过 `LLM_PROVIDER_PROFILE` 切换 LLM 提供商（`openai` / `deepseek` / `openrouter` / `siliconflow` / `local`），每个 provider 读写独立的环境变量：\n\n| Provider | 环境变量前缀 | 默认 Base URL |\n|----------|-------------|---------------|\n| `deepseek` | `DEEPSEEK_*` | `https://api.deepseek.com/v1` |\n| `openrouter` | `OPENROUTER_*` | `https://openrouter.ai/api/v1` |\n| `siliconflow` | `SILICONFLOW_*` | `https://api.siliconflow.cn/v1` |\n| `local` | `LOCAL_*` | `http://127.0.0.1:8000/v1` |\n| `openai` | `OPENAI_*` | — |\n\n参数优先级：代码显式参数 → 任务级环境变量 → 任务默认值 → 全局环境变量 → provider 级环境变量。完整配置见 [`.env.example`](.env.example) 和 [用户指南](USER_GUIDE.md)。\n\nRAG 向量后端默认使用 `local`，不需要额外服务；可选 `chroma` adapter 需要用户自行安装 `chromadb`。Embedding provider 默认 `local_hash`，生产检索可显式切到 OpenAI-compatible embeddings：\n\n```bash\nRAG_VECTOR_BACKEND=local\n# RAG_VECTOR_BACKEND=chroma\n# RAG_CHROMA_PATH=logs/chroma\n# RAG_CHROMA_COLLECTION=study_agent\n\nRAG_EMBEDDING_PROVIDER=local_hash\n# RAG_EMBEDDING_PROVIDER=openai\n# RAG_EMBEDDING_MODEL=text-embedding-3-small\n# RAG_EMBEDDING_DIMENSIONS=1536\n# RAG_EMBEDDING_API_KEY=...\n```\n\n---\n\n## 项目结构\n\n```\n├── app.py # Streamlit 入口\n├── src/\n│ ├── llm_client.py # LLM 调用（chat / stream）\n│ ├── llm_router.py # 模型路由分发\n│ ├── context_builder.py # 上下文构建\n│ ├── mode_manager.py # 模式管理（版本/性能/氛围）\n│ ├── api.py # FastAPI health / chat / memory / sessions / RAG / tools / workflows endpoints\n│ ├── role_manager.py # 角色加载与管理\n│ ├── performance_budget.py # 性能预算（max_tokens 分级）\n│ ├── memory.py # 记忆系统\n│ ├── memory_tools.py # 记忆工具\n│ ├── memory_writer.py # 记忆写入\n│ ├── wechat_format.py # 群聊文本格式化\n│ ├── wechat_state.py # 群聊 I/O、状态管理\n│ ├── wechat_generator.py # LLM 生成逻辑\n│ ├── wechat_prompt.py # Prompt 模板加载\n│ ├── wechat_memory.py # 群聊记忆提取\n│ ├── after_session.py # 课后总结\n│ ├── session_logger.py # 会话日志\n│ ├── config.py # 全局配置\n│ ├── router.py # 路由配置\n│ ├── news/ # 新闻聚合链路\n│ ├── rag/ # 本地 RAG MVP：加载、分块、索引、关键词/向量原型/embedding/可选后端检索\n│ ├── tools/ # 受控工具边界：本地知识检索等\n│ └── ui/ # Streamlit UI 组件\n├── tests/ # pytest 测试套件\n├── frontend/ # React + Vite + TypeScript console\n├── docs/ # 设计文档与工程说明\n│ ├── TECH_STACK.md # 技术栈与项目亮点\n│ ├── RAG.md # RAG MVP 状态与边界\n│ └── STATE_MODEL.md # 状态模型\n├── chat/ # 群聊记录\n├── memory/ # AI 长期记忆\n├── roles/ # 角色人设\n├── templates/ # Prompt 模板\n├── config/ # YAML 配置\n├── requirements.in # 依赖声明（范围版本）\n└── assets/ # 视觉资源\n```\n\n---\n\n## 测试\n\n```bash\npytest tests/ -v # current local baseline: 314 passed\npytest tests/ --cov=src # 覆盖率\nruff check src/ tests/ # linting\nmypy --explicit-package-bases src/ # type check\n```\n\nCI 通过 GitHub Actions 在 push / pull request 上运行，集成 `pytest`、`ruff`、打包检查、`detect-secrets` 扫描，以及 `mypy` soft check。当前验证状态见 [docs/TESTING.md](docs/TESTING.md)。\n\n---\n\n## 版本历史\n\n### v0.8.0 — 文档同步 + UI 中文标签 + 工程收口\n\n文档版本同步（5 份文档统一升级）；UI 中文标签（模型/性能/状态栏全中文）；合并性能预算系统、依赖锁定、状态模型文档化、CI 门禁升级、入口页新闻流程修复。当前验证状态见 [docs/TESTING.md](docs/TESTING.md)。\n\n### v0.7.8 — 性能预算 + 状态模型 + 工程收口\n\n### v0.7.7 — 模块拆分与服务层解耦\n\n新闻链路拆分为 4 个专注模块 + 兼容门面；服务层直连子模块；UI 逐阶段新闻流；SSRF 安全加固；Session logger 自动 flush 保护。**112 tests，Ruff clean**。\n\n### v0.7.6 — 工程安全与新闻链路收口\n\n完整历史见 [CHANGELOG.md](CHANGELOG.md)。\n\n---\n\n## Roadmap\n\n| 版本 | 方向 |\n|------|------|\n| v0.8.1 | 稳定性和 UI 打磨 |\n| v0.9 | 知识库 / RAG 能力 |\n| v0.10 | 多语言支持、导出增强 |\n| v1.0 | 插件化架构 + 自定义角色 |\n\n---\n## Engineering Roadmap\n\n求职导向的技术演进路线：\n\n- [x] FastAPI service layer foundation: `/health`, `/chat`, `/memory/preview`, `/memory/commit`, `/sessions`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload`, `/rag/local-knowledge`, `/tools` and `/workflows/runs` implemented; optional local API token and CORS allowlist implemented; streaming and broader deployment hardening remain planned\n- [x] RAG MVP: Markdown / TXT / DOCX / PDF loading, chunking, local keyword retrieval, local vector prototype, hybrid retrieval, backend-vector retrieval, configurable embedding provider, optional Chroma adapter, controlled local-knowledge retrieval, citation context, source blocks, Streamlit retrieval panel, optional single-chat and WeChat interactive injection\n- [ ] RAG document QA (partial): PDF parsing has file-size, page-count, extracted-text and encrypted-file guards; production embedding requires explicit API/env configuration and Chroma remains optional\n- [ ] Vector store: Chroma optional adapter implemented; FAISS local prototype and pgvector engineering version remain planned\n- [x] P8.4 evaluation sets foundation: retrieval, answer grounding, tool routing, workflow events and safety regression cases before expanding agentic behavior\n- [x] P8.5 execution foundation: workflow run / step / event JSONL timeline plus controlled local-knowledge tool use behind typed schemas, permissions and audit logs\n- [x] P9 web UI: React + Vite + TypeScript console implemented with non-streaming chat, document upload/indexing, source table, workflow timeline detail, controlled tool preview/call and memory status panels; streaming chat, auth, CORS and production static hosting remain planned\n- [ ] P10 hardening and integration: optional local auth/CORS implemented; Docker, OpenAPI examples, optional read-only MCP server, trace_id, token usage, latency, provider fallback logs and streaming remain planned\n- [ ] P11 optional RPA: browser automation as a future read-first adapter for no-API learning systems, gated by domain allowlists and human confirmation\n\n\n## 许可\n\n仅供个人学习使用。\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2002yy%2Fstudy-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F2002yy%2Fstudy-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2002yy%2Fstudy-agent/lists"}