https://github.com/willoscar/arxiv_paper
- Host: GitHub
- URL: https://github.com/willoscar/arxiv_paper
- Owner: WILLOSCAR
- Created: 2025-07-27T15:15:41.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2026-02-07T11:34:49.000Z (5 months ago)
- Last Synced: 2026-02-07T20:16:56.507Z (5 months ago)
- Topics: arxiv, arxiv-org, arxiv-papers, automatic, curation, interest, langgraph, llm, paper, pipeline, research, webhook
- Language: Python
- Homepage:
- Size: 410 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# arXiv Paper Bot - Your Personal AI Paper Assistant
A personalized and self-learning AI assistant for arXiv papers. It not only automates your research paper feed but also continuously learns and adapts to your academic preferences based on daily feedback, enabling truly personalized content recommendations.
Say goodbye to manually browsing arXiv or filtering by keywords: let the papers that best match your interests come to you.
## ✨ Core Features
- **🎯 Accurate Retrieval**: Customize one or more arXiv categories (e.g., `cs.CV`, `stat.ML`) to fetch the latest papers daily on a schedule.
- **🧠 Dual-Mode Intelligent Filtering**:
  - **Static Mode**: Quickly filters papers based on your predefined keyword list.
  - **Dynamic Mode**: Activates the **Self-Learning Preference Model**, ranking papers by semantic relevance; the more you use it, the better it understands you.
- **🤖 AI-Powered Summarization**: Uses large language models (LLMs, e.g., OpenAI GPT) to distill long abstracts into short structured summaries such as "one-sentence highlight" and "core method," helping you grasp the essence in seconds.
- **💡 Self-Learning Preference Model**:
  - Learns your true interests from **👍 Like** / **👎 Dislike** feedback on pushed messages.
  - Automatically adjusts recommendation weights for better precision, and surfaces "dark horse" papers you might otherwise miss.
- **Multi-Platform Push**: Seamlessly delivers content to your favorite platforms. Currently supported:
  - **Feishu Group Bot** (Webhook with interactive cards)
  - **Telegram Bot** (with interactive buttons)
  - **WeChat Work Bot** (Webhook)
  - **Local Markdown/JSON files**, for archiving and further processing
- **⚙️ Highly Configurable**: All features are managed via a single `config.yaml` file, including fetch rules, filter modes, LLM API keys, and push channel switches, so there is no need to touch the codebase.
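
The Static Mode described in the feature list can be illustrated with a minimal sketch. This is not the project's actual code: the `Paper` record, the function names, and the `min_score` threshold are all hypothetical, and matching is a plain case-insensitive substring test, which is only one of several reasonable choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    # Hypothetical minimal record; the real project may store more fields.
    title: str
    abstract: str


def keyword_score(paper: Paper, keywords: List[str]) -> int:
    """Count how many keywords occur in the title or abstract (case-insensitive).

    Substring matching is used for simplicity, so a short keyword such as
    "gan" would also match "organ".
    """
    text = f"{paper.title} {paper.abstract}".lower()
    return sum(1 for kw in keywords if kw.lower() in text)


def static_filter(papers: List[Paper], keywords: List[str],
                  min_score: int = 1) -> List[Paper]:
    """Keep papers that match at least `min_score` keywords, best matches first."""
    scored = [(keyword_score(p, keywords), p) for p in papers]
    kept = [(s, p) for s, p in scored if s >= min_score]
    # sort() is stable, so papers with equal scores keep their fetch order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in kept]
```

In a setup like this, the keyword list would presumably come from `config.yaml`; the exact key names in that file are not shown here and are left unspecified.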
## Workflow (with Feedback Loop)
The core workflow is an intelligent pipeline with a **closed feedback loop**:
**[Scheduled Trigger] → [① Fetch] → [② Filter & Rank] → [③ AI Summarize] → [④ Push] → [⑤ User Feedback 👍/👎] → [⑥ Update Preference Model] → (Affects Next-Day Step ②)**
1. **Fetch**: Scheduled task retrieves new papers published that day.
2. **Filter & Rank**:
   - **Static Mode**: Scores papers based only on keywords.
   - **Dynamic Mode**: Computes a personalized recommendation score using the **preference model**, combined with keyword scores.
3. **Summarize**: Generates concise AI-powered summaries for top-ranked papers.
4. **Push**: Sends summaries with `👍/👎` buttons to your configured platforms.
5. **Feedback**: User reactions help the system understand preferences.
6. **Learn**: System updates your **preference vector** in real time for future recommendations.
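
The feedback loop in steps 2, 5, and 6 can be sketched as follows. The README does not specify the update rule or how the two scores are blended, so everything here is an assumption: the preference vector is nudged toward liked papers and away from disliked ones with a hypothetical learning rate `lr`, and the final score is a weighted sum controlled by a hypothetical `alpha`. Plain Python lists stand in for the embeddings so the example has no dependencies.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_preference(pref: Sequence[float], paper_vec: Sequence[float],
                      liked: bool, lr: float = 0.1) -> List[float]:
    """Move the preference vector toward a liked paper, away from a disliked one.

    Assumed rule: pref + sign * lr * (paper_vec - pref), with sign = +1 for a
    like and -1 for a dislike.
    """
    sign = 1.0 if liked else -1.0
    return [p + sign * lr * (v - p) for p, v in zip(pref, paper_vec)]


def dynamic_score(pref: Sequence[float], paper_vec: Sequence[float],
                  keyword_score: float, alpha: float = 0.7) -> float:
    """Blend semantic similarity with a keyword score assumed to lie in [0, 1]."""
    return alpha * cosine(pref, paper_vec) + (1.0 - alpha) * keyword_score


# Usage: one like pulls the preference vector toward the paper's embedding.
pref = [1.0, 0.0]
paper = [0.0, 1.0]
pref_after_like = update_preference(pref, paper, liked=True)
```

Because the update is applied per reaction, the next day's ranking in step 2 would already reflect the feedback collected in step 5.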
## 🛠️ Tech Stack
- **Language**: Python 3.8+
- **Core Libraries**:
  - Paper Retrieval: `arxiv`
  - AI Summarization: `openai`
  - Task Scheduling: `APScheduler`
- **Preference Model**:
  - Embedding: `sentence-transformers`
  - Vector Similarity: `scikit-learn`, `numpy`
  - Storage: Built-in `SQLite`
- **Utilities**:
  - Push Service: `requests`
  - Config Management: `PyYAML`
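
As an illustration of the SQLite storage layer listed above, here is a minimal sketch using Python's standard `sqlite3` module. The table name, columns, and helper functions are hypothetical and are not taken from the project; the sketch only shows one plausible way to persist like/dislike feedback per paper.

```python
import sqlite3
from typing import List


def open_feedback_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical `feedback` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            paper_id   TEXT PRIMARY KEY,
            liked      INTEGER NOT NULL CHECK (liked IN (0, 1)),
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def record_feedback(conn: sqlite3.Connection, paper_id: str, liked: bool) -> None:
    """Store a reaction; a later reaction to the same paper replaces the earlier one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO feedback (paper_id, liked) VALUES (?, ?)",
            (paper_id, int(liked)),
        )


def liked_ids(conn: sqlite3.Connection) -> List[str]:
    """Return the IDs of all liked papers, sorted for stable output."""
    rows = conn.execute(
        "SELECT paper_id FROM feedback WHERE liked = 1 ORDER BY paper_id"
    )
    return [row[0] for row in rows]
```

Keying the table on `paper_id` means each paper carries at most one current reaction, which keeps the preference update simple if a user changes their mind.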
## Quick Start