https://github.com/2002yy/study-agent

AI study-buddy system: web search + role-based group chat + after-session summaries

ai learning-assistant llm python streamlit

Last synced: 11 days ago

Host: GitHub
URL: https://github.com/2002yy/study-agent
Owner: 2002yy
License: mit
Created: 2026-05-09T02:56:05.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-06-07T09:54:56.000Z (19 days ago)
Last Synced: 2026-06-07T11:06:53.426Z (19 days ago)
Topics: ai, learning-assistant, llm, python, streamlit
Language: Python
Homepage:
Size: 63.3 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: docs/SECURITY.md

README

# Study Agent

A local AI learning assistant with long-term memory, role-based group chat,
web search, model routing and context-tier management.

## One-minute Overview

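Study Agent is a local-first AI learning assistant. The point is not simply to call a large model but to wire the LLM into a complete application flow:

- **Multi-provider LLM access**: OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models
- **Long-term memory**: Markdown memory + safe writer
- **Context tiers**: fast / light / deep / archive
- **Web search**: RSS / News fetch → article extraction → LLM digest → source tracing
- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing; lexical / local vector prototype / hybrid / backend-vector retrieval; configurable embedding provider; optional Chroma persistence; a controlled local-knowledge retrieval tool; citation context; source blocks; a Streamlit retrieval/debug panel; chat injection; and FastAPI RAG / chat / memory foundation endpoints
- **Engineering security**: SSRF protection, detect-secrets, configuration templates
- **Engineering quality**: pytest test suite, Ruff, GitHub Actions CI, packaging checks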

## Highlights

- **Multi-provider LLM client**: OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models
- **Model routing** with fast / light / deep / archive context tiers
- **Long-term memory** based on Markdown files and safe-writer persistence
- **Web search pipeline**: feed registry → URL safety checks → article extraction → LLM digest → auditable source trace
- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing, lexical / local vector prototype / hybrid / backend-vector retrieval, configurable embedding providers, optional Chroma persistence, a controlled local-knowledge retrieval tool, citation-first context formatting, source blocks, a Streamlit retrieval/debug panel, optional chat injection, FastAPI RAG / chat / memory / tools / workflows foundation endpoints, and a first React / Vite / TypeScript console
- **SSRF protection** for article fetching, **detect-secrets** in CI
- **Batched session logging** and multi-layer caching for performance
- **Performance budget**: mode-based `max_tokens` bounds on the main chat, WeChat, and news LLM paths
- **314 pytest tests**, Ruff clean, mypy clean, GitHub Actions CI workflow

For a detailed breakdown of the stack and engineering highlights, see [Technical Stack & Engineering Highlights](docs/TECH_STACK.md).
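
To make the "URL safety checks" step of the web search pipeline concrete, here is a minimal sketch of an SSRF guard. It is not the project's implementation: `is_safe_url`, `_is_public` and the injectable `resolve` parameter are hypothetical names, and the policy (HTTP(S) only, every resolved address must be public) is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_public(address: str) -> bool:
    """Reject loopback, private, link-local and other non-public addresses."""
    ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 zone id
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Hypothetical helper: allow only http(s) URLs whose host resolves to public IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"} or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443, proto=socket.IPPROTO_TCP)
    except (OSError, ValueError):
        # Unresolvable host or malformed port: treat as unsafe.
        return False
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(_is_public(addr) for addr in addresses)
```

A check like this only covers the first request. A real fetcher would also need to re-validate every redirect target and connect to the address it vetted, since DNS answers can change between the check and the fetch.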

---

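**A local AI study-buddy system for personal learning review**: role-based group chat, web search, long-term memory, and after-session summaries.

> Not yet another AI Q&A tool, but an AI learning partner that remembers what you study.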

---

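## Why Build This

General-purpose AI chat tools are good at answering questions but not at keeping you company while you learn:

- They do not remember what you studied **yesterday** or where you got stuck **last week**
- They do not proactively **summarize** your learning progress
- They have no sense of role: serious or relaxed, encouraging or challenging, it is left to chance

Study Agent's positioning is clear: **an AI study buddy that runs locally, has long-term memory, and has distinct roles**. It remembers your learning trajectory, discusses topics with you in a group chat through different roles, summarizes progress automatically after each session, and writes new knowledge into long-term memory.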

---

## Why It Is Not Just a Prompt Demo

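A typical AI demo just forwards user input to the model. Study Agent focuses on these problems:

| Problem | Engineering approach |
|---|---|
| Switching model providers is hard | Provider profile + OpenAI-compatible client |
| Context keeps growing | context-tier routing |
| Learning records never accumulate | Markdown long-term memory |
| Writing to memory is unsafe | safe writer + preview/confirm |
| Web content is untraceable | source-traced news pipeline |
| Unstable at runtime | caching, batched logging, tests, CI |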
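
As a rough illustration of context-tier routing, the sketch below picks how much history and memory to send based on a tier name. Only the tier names (fast / light / deep / archive) come from this README; the `Tier` fields, the turn limits and what each tier includes are invented for illustration and may not match the project's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_history_turns: int  # how many recent turns to keep (assumed meaning)
    include_memory: bool    # whether long-term memory notes are attached (assumed)


# Hypothetical tier table; every number here is a placeholder.
TIERS = {
    "fast": Tier("fast", 2, False),
    "light": Tier("light", 6, False),
    "deep": Tier("deep", 20, True),
    "archive": Tier("archive", 0, True),
}


def build_context(tier_name: str, history: list[str], memory_notes: list[str]) -> list[str]:
    """Assemble the context for one request according to the chosen tier."""
    tier = TIERS[tier_name]
    recent = history[-tier.max_history_turns:] if tier.max_history_turns else []
    notes = list(memory_notes) if tier.include_memory else []
    return notes + recent
```

Keeping the tier table as data rather than branching logic means a new tier, or a different budget for an existing one, is a one-line change.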

---

## Demo

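| Screen | Screenshot |
|------|------|
| Home: status dashboard, current focus, version info | ![home](assets/screenshots/home.png) |
| WeChat group chat: three roles discussing in the group | ![group-chat](assets/screenshots/group-chat.png) |
| Web search: multi-source news aggregation with source tracing | ![news-search](assets/screenshots/news-search.png) |
| Memory candidates: after-session update preview and confirmed write | ![memory-capture](assets/screenshots/memory-capture.png) |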

---

```
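Launch app → choose a learning mode (atmosphere / focus level)
│
├── Single chat ──→ ask / discuss ──→ after-session summary ──→ memory update
│
└── WeChat group chat ──→ generate opener / discuss news / look up material
                     │
          ┌──────────┴──────────┐
          │                     │
     Web search         Role interaction
          │                     │
 Source-trace write      Viewpoint clash
          │                     │
          └──────────┬──────────┘
                     │
After-session summary → confirm → write to long-term memory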
```
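
The last step of the flow above (summary → confirm → write) can be sketched as a two-phase write: first show a diff, then write atomically only if the user confirmed. This is an illustration, not the project's safe writer; `preview_update`, `commit_update` and the `memory/progress.md` path are hypothetical.

```python
import difflib
import os
import tempfile
from pathlib import Path


def preview_update(path: Path, new_text: str) -> str:
    """Return a unified diff between the current file and the proposed text."""
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{path.name} (current)",
        tofile=f"{path.name} (proposed)",
    )
    return "".join(diff)


def commit_update(path: Path, new_text: str, confirmed: bool) -> bool:
    """Write only after confirmation, via a temp file and an atomic rename."""
    if not confirmed:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(new_text)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True


# Demo in a throwaway directory; the file name is illustrative.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "memory" / "progress.md"
    proposed = "# Progress\n- Reviewed gradient descent\n"
    diff_text = preview_update(target, proposed)
    skipped = commit_update(target, proposed, confirmed=False)
    written = commit_update(target, proposed, confirmed=True)
    saved = target.read_text(encoding="utf-8")
```

Writing to a temporary file in the same directory and renaming it means a crash mid-write leaves the previous memory file intact rather than half-written.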

---

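## Core Features

| Feature | Description |
|------|------|
| **Single chat** | One-on-one discussion of study material with the AI; supports switching between flash and pro models |
| **Role group chat** | Four roles (March 7th 三月七, Keqing 刻晴, Nahida 纳西妲, Firefly 流萤) discuss in a group chat, each with its own persona |
| **Web search** | Multi-source aggregation from Google News + Bing News + RSSHub, with three-layer extraction of page body text |
| **Source tracing** | Search results are written into the group chat record so the evidence can be traced back |
| **RAG MVP** | Indexes local Markdown / TXT / DOCX / PDF documents; the front-end panel returns cited snippets with file path, line number, score, matched terms and score breakdown, which can be injected into single chat and WeChat group interactive replies; FastAPI exposes `/health`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload`, `/rag/local-knowledge` |
| **After-session summary** | Summarizes progress automatically once a session ends and writes to memory after the user confirms |
| **Long-term memory** | Learner profile, progress tracking, project context and current focus, kept as multi-level memory files |
| **Multi-provider** | Supports OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models |
| **Atmosphere selection** | Switch between warm / close / standard interaction atmospheres |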
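
The RAG row mentions hybrid retrieval and a score breakdown per snippet. The sketch below shows one simple way such a result could be produced: a weighted sum of a lexical overlap score and a vector similarity. It is an assumption-laden toy, not the project's retriever: `Chunk`, `hybrid_search` and `alpha` are hypothetical, and bag-of-words cosine stands in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    line: int
    text: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, chunks: list[Chunk], alpha: float = 0.5) -> list[dict]:
    """Score each chunk as alpha * lexical + (1 - alpha) * vector, best first."""
    q_tokens = tokenize(query)
    q_terms, q_vec = set(q_tokens), Counter(q_tokens)
    results = []
    for chunk in chunks:
        c_tokens = tokenize(chunk.text)
        matched = sorted(q_terms & set(c_tokens))
        lexical = len(matched) / len(q_terms) if q_terms else 0.0
        vector = cosine(q_vec, Counter(c_tokens))  # stand-in for embedding similarity
        results.append({
            "path": chunk.path,
            "line": chunk.line,
            "score": alpha * lexical + (1 - alpha) * vector,
            "matched_terms": matched,
            "breakdown": {"lexical": lexical, "vector": vector},
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

Returning the breakdown next to the final score makes it easy to see in a debug panel whether a hit was driven by exact term matches or by vector similarity.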

---

## Architecture

![architecture](assets/screenshots/arch.png)

```
streamlit run app.py
       │
┌──────┴──────┐
│   app.py    │  Streamlit entry point, routes to each UI panel
└──────┬──────┘
       │
┌──────┴─────────────────────────────────────────────┐
│ src/ui/                                            │
│ ├── main_panel.py            Home page             │
│ ├── chat_panel.py            Chat panel            │
│ ├── wechat_panel.py          WeChat group panel    │
│ ├── after_session_panel.py   After-session summary │
│ └── sidebar.py               Sidebar               │
└──────┬─────────────────────────────────────────────┘
       │
┌──────┴────────────┬───────────────────┬───────────────────┬───────────────────┐
│ LLM Layer         │ News Layer        │ Memory Layer      │ WeChat Layer      │
│                   │                   │                   │                   │
│ llm_client        │ news/             │ memory.py         │ wechat_*.py       │
│ llm_router        │ ├─rss_fetc        │ memory_tools      │ (format, state,   │
│ context_builder   │ ├─article_e       │ memory_writer     │ generator, prompt)│
│                   │ ├─link_reso       │                   │                   │
│ config.py         │ ├─digest          │ session_logger    │ wechat_service.py │
│ router.py         │ └─article_fetcher │                   │                   │
└──────┬────────────┴──────┬────────────┴──────┬────────────┴──────┬────────────┘
       │                   │                   │                   │
 .env.example        chat/               memory/             roles/
 (5 providers)       (group chat logs)   (memory files)      (role personas)
```

---

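## Quick Start

```bash
git clone https://github.com/2002yy/study-agent
cd study-agent
cp .env.example .env
# Edit .env and fill in your API key

# Initialize memory files (the app creates them automatically on first run; you can also copy the templates by hand)
cp -r memory.example/* memory/ 2>/dev/null || :

# Stable install (recommended, pinned versions)
pip install -r requirements.txt
pip install -r requirements-dev.txt

streamlit run app.py
```

Open `http://localhost:8501` in your browser.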

### Dependency Management

This project manages dependencies with [pip-tools](https://github.com/jazzband/pip-tools):

- [`requirements.in`](requirements.in) / [`requirements-dev.in`](requirements-dev.in): **maintained by hand**, with version ranges
- [`requirements.txt`](requirements.txt) / [`requirements-dev.txt`](requirements-dev.txt): **auto-generated**, with exact versions (lock files)

After changing dependencies, regenerate the lock files:

```bash
pip install pip-tools
pip-compile requirements.in # re-lock main dependencies
pip-compile requirements-dev.in # re-lock dev dependencies
```

---

## Environment Configuration

Switch LLM providers with `LLM_PROVIDER_PROFILE` (`openai` / `deepseek` / `openrouter` / `siliconflow` / `local`); each provider uses its own set of environment variables:

| Provider | Env var prefix | Default Base URL |
|----------|-------------|---------------|
| `deepseek` | `DEEPSEEK_*` | `https://api.deepseek.com/v1` |
| `openrouter` | `OPENROUTER_*` | `https://openrouter.ai/api/v1` |
| `siliconflow` | `SILICONFLOW_*` | `https://api.siliconflow.cn/v1` |
| `local` | `LOCAL_*` | `http://127.0.0.1:8000/v1` |
| `openai` | `OPENAI_*` | — |

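Parameter precedence, highest first: explicit code arguments → task-level environment variables → task defaults → global environment variables → provider-level environment variables. For the full configuration, see [`.env.example`](.env.example) and the [User Guide](USER_GUIDE.md).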
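
The precedence chain can be sketched as a first-non-empty lookup. Only `LLM_PROVIDER_PROFILE` and the provider prefixes (such as `DEEPSEEK_*`) come from this README; `resolve_setting`, the `<TASK>_<NAME>` and `LLM_<NAME>` variable patterns, and the `openai` fallback profile are assumptions made for illustration. The real names are in `.env.example`.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    *,
    explicit: Optional[str] = None,
    task: Optional[str] = None,
    task_defaults: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty value, walking the chain from highest precedence down."""
    env = os.environ if env is None else env
    profile = env.get("LLM_PROVIDER_PROFILE", "openai").upper()  # fallback is an assumption
    key = name.upper()
    candidates = [
        explicit,                                            # 1. explicit code argument
        env.get(f"{task.upper()}_{key}") if task else None,  # 2. task-level env var (hypothetical name)
        (task_defaults or {}).get(name),                     # 3. task default
        env.get(f"LLM_{key}"),                               # 4. global env var (hypothetical name)
        env.get(f"{profile}_{key}"),                         # 5. provider-level env var, e.g. DEEPSEEK_MODEL
    ]
    return next((value for value in candidates if value), None)
```

For example, with `LLM_PROVIDER_PROFILE=deepseek` and only `DEEPSEEK_MODEL` set, `resolve_setting("model")` falls through to the provider-level value; setting any higher-precedence source overrides it.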

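The RAG vector backend defaults to `local` and needs no extra services; the optional `chroma` adapter requires you to install `chromadb` yourself. The embedding provider defaults to `local_hash`; for production retrieval you can explicitly switch to OpenAI-compatible embeddings: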

```bash
RAG_VECTOR_BACKEND=local
# RAG_VECTOR_BACKEND=chroma
# RAG_CHROMA_PATH=logs/chroma
# RAG_CHROMA_COLLECTION=study_agent

RAG_EMBEDDING_PROVIDER=local_hash
# RAG_EMBEDDING_PROVIDER=openai
# RAG_EMBEDDING_MODEL=text-embedding-3-small
# RAG_EMBEDDING_DIMENSIONS=1536
# RAG_EMBEDDING_API_KEY=...
```
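
This README does not describe what `local_hash` does internally. One common dependency-free technique that fits the name is feature hashing, sketched below as an assumption: each token is hashed into one of a fixed number of buckets with a sign, and the vector is L2-normalized. `hash_embed`, `dot` and the 64-dimension default are hypothetical.

```python
import hashlib
import math
import re


def hash_embed(text: str, dimensions: int = 64) -> list[float]:
    """Feature-hashing embedding: deterministic, offline, no model download."""
    vector = [0.0] * dimensions
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimensions  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0               # sign reduces collision bias
        vector[index] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def dot(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))
```

Hashed vectors capture token overlap rather than meaning, which is consistent with the README's advice to switch to a real embedding provider for production retrieval.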

---

## Project Structure

```
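├── app.py  # Streamlit entry point
├── src/
│   ├── llm_client.py  # LLM calls (chat / stream)
│   ├── llm_router.py  # model routing and dispatch
│   ├── context_builder.py  # context construction
│   ├── mode_manager.py  # mode management (version / performance / atmosphere)
│   ├── api.py  # FastAPI health / chat / memory / sessions / RAG / tools / workflows endpoints
│   ├── role_manager.py  # role loading and management
│   ├── performance_budget.py  # performance budget (tiered max_tokens)
│   ├── memory.py  # memory system
│   ├── memory_tools.py  # memory tools
│   ├── memory_writer.py  # memory writes
│   ├── wechat_format.py  # group chat text formatting
│   ├── wechat_state.py  # group chat I/O and state management
│   ├── wechat_generator.py  # LLM generation logic
│   ├── wechat_prompt.py  # prompt template loading
│   ├── wechat_memory.py  # group chat memory extraction
│   ├── after_session.py  # after-session summary
│   ├── session_logger.py  # session logging
│   ├── config.py  # global configuration
│   ├── router.py  # routing configuration
│   ├── news/  # news aggregation pipeline
│   ├── rag/  # local RAG MVP: loading, chunking, indexing, lexical / vector prototype / embedding / optional backend retrieval
│   ├── tools/  # controlled tool boundary: local knowledge retrieval, etc.
│   └── ui/  # Streamlit UI components
├── tests/  # pytest test suite
├── frontend/  # React + Vite + TypeScript console
├── docs/  # design documents and engineering notes
│   ├── TECH_STACK.md  # tech stack and project highlights
│   ├── RAG.md  # RAG MVP status and boundaries
│   └── STATE_MODEL.md  # state model
├── chat/  # group chat records
├── memory/  # AI long-term memory
├── roles/  # role personas
├── templates/  # prompt templates
├── config/  # YAML configuration
├── requirements.in  # dependency declarations (version ranges)
└── assets/  # visual assets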
```
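
The tree lists `performance_budget.py` as tiered `max_tokens` handling. A minimal sketch of what mode-based bounds could look like follows; it is not the module's actual code. The `(path, mode)` keys, the mode names and every number are placeholders, and `bounded_max_tokens` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical (LLM path, mode) -> (floor, ceiling) table; all values are invented.
BUDGETS: dict[tuple[str, str], tuple[int, int]] = {
    ("chat", "fast"): (64, 512),
    ("chat", "deep"): (256, 4096),
    ("wechat", "fast"): (64, 384),
    ("wechat", "deep"): (128, 1024),
    ("news", "fast"): (128, 768),
    ("news", "deep"): (256, 2048),
}


def bounded_max_tokens(path: str, mode: str, requested: Optional[int] = None) -> int:
    """Clamp a requested max_tokens into the budget for this path and mode."""
    floor, ceiling = BUDGETS[(path, mode)]
    if requested is None:
        return ceiling  # no preference: use the full budget for the mode
    return max(floor, min(requested, ceiling))
```

Clamping at one choke point keeps every LLM call site from having to know the budget rules, and an unknown path or mode fails loudly with a `KeyError` instead of silently running unbounded.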

---

## Testing

```bash
pytest tests/ -v # current local baseline: 314 passed
pytest tests/ --cov=src # coverage
ruff check src/ tests/ # linting
mypy --explicit-package-bases src/ # type check
```

CI runs via GitHub Actions on push / pull request and integrates `pytest`, `ruff`, packaging checks, a `detect-secrets` scan, and a `mypy` soft check. For current verification status, see [docs/TESTING.md](docs/TESTING.md).

---

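## Version History

### v0.8.0: Documentation sync + Chinese UI labels + engineering wrap-up

Documentation versions synchronized (five documents upgraded together); Chinese UI labels (model, performance and status bar fully in Chinese); merged the performance budget system, dependency locking, state model documentation, CI gate upgrade, and the entry-page news flow fix. For current verification status, see [docs/TESTING.md](docs/TESTING.md).

### v0.7.8: Performance budget + state model + engineering wrap-up

### v0.7.7: Module split and service-layer decoupling

News pipeline split into 4 focused modules plus a compatibility facade; service layer calls the submodules directly; stage-by-stage news flow in the UI; SSRF hardening; session logger auto-flush protection. **112 tests, Ruff clean**.

### v0.7.6: Engineering security and news pipeline wrap-up

See [CHANGELOG.md](CHANGELOG.md) for the full history.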

---

## Roadmap

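| Version | Direction |
|------|------|
| v0.8.1 | Stability and UI polish |
| v0.9 | Knowledge base / RAG capabilities |
| v0.10 | Multi-language support, export enhancements |
| v1.0 | Plugin architecture + custom roles |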

---

## Engineering Roadmap

A technical evolution roadmap oriented toward job seeking:

- [x] FastAPI service layer foundation: `/health`, `/chat`, `/memory/preview`, `/memory/commit`, `/sessions`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload`, `/rag/local-knowledge`, `/tools` and `/workflows/runs` implemented; optional local API token and CORS allowlist implemented; streaming and broader deployment hardening remain planned
- [x] RAG MVP: Markdown / TXT / DOCX / PDF loading, chunking, local keyword retrieval, local vector prototype, hybrid retrieval, backend-vector retrieval, configurable embedding provider, optional Chroma adapter, controlled local-knowledge retrieval, citation context, source blocks, Streamlit retrieval panel, optional single-chat and WeChat interactive injection
- [ ] RAG document QA (partial): PDF parsing has file-size, page-count, extracted-text and encrypted-file guards; production embedding requires explicit API/env configuration and Chroma remains optional
- [ ] Vector store: Chroma optional adapter implemented; FAISS local prototype and pgvector engineering version remain planned
- [x] P8.4 evaluation sets foundation: retrieval, answer grounding, tool routing, workflow events and safety regression cases before expanding agentic behavior
- [x] P8.5 execution foundation: workflow run / step / event JSONL timeline plus controlled local-knowledge tool use behind typed schemas, permissions and audit logs
- [x] P9 web UI: React + Vite + TypeScript console implemented with non-streaming chat, document upload/indexing, source table, workflow timeline detail, controlled tool preview/call and memory status panels; streaming chat, auth, CORS and production static hosting remain planned
- [ ] P10 hardening and integration: optional local auth/CORS implemented; Docker, OpenAPI examples, optional read-only MCP server, trace_id, token usage, latency, provider fallback logs and streaming remain planned
- [ ] P11 optional RPA: browser automation as a future read-first adapter for no-API learning systems, gated by domain allowlists and human confirmation
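
The P8.5 item describes a workflow run / step / event timeline stored as JSONL. The sketch below shows the general append-only pattern under stated assumptions: the record fields (`ts`, `run_id`, `step`, `event`, `payload`), the helper names and the `workflow_runs.jsonl` file name are all hypothetical, not the project's schema.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def append_event(log_path: Path, run_id: str, step: str, event: str, **payload) -> dict:
    """Append one event as a single JSON line; earlier lines are never rewritten."""
    record = {"ts": time.time(), "run_id": run_id, "step": step, "event": event, "payload": payload}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_timeline(log_path: Path, run_id: str) -> list[dict]:
    """Return the events of one run in the order they were written."""
    with log_path.open(encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    return [record for record in records if record["run_id"] == run_id]


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "workflow_runs.jsonl"
    run_id = uuid.uuid4().hex
    append_event(log, run_id, "retrieve", "started", query="gradient descent")
    append_event(log, run_id, "retrieve", "finished", hits=3)
    append_event(log, "other-run", "retrieve", "started")
    timeline = read_timeline(log, run_id)
```

An append-only log doubles as an audit trail: a run's history can be replayed in a UI timeline without any mutable state.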

## License

For personal learning use only.
