https://github.com/willoscar/arxiv_paper


arxiv arxiv-org arxiv-papers automatic curation interest langgraph llm paper pipeline research webhook


# arXiv Paper Bot - Your Personal AI Paper Assistant

A personalized and self-learning AI assistant for arXiv papers. It not only automates your research paper feed but also continuously learns and adapts to your academic preferences based on daily feedback, enabling truly personalized content recommendations.

Say goodbye to manually browsing arXiv or filtering by keywords: let the papers that best match your interests come to you.

## ✨ Core Features

- **🎯 Accurate Retrieval**: Customize one or more arXiv categories (e.g., `cs.CV`, `stat.ML`) to fetch the latest papers daily on a scheduled basis.

- **🧠 Dual-Mode Intelligent Filtering**:
  - **Static Mode**: Quickly filters papers based on your predefined keyword list.
  - **Dynamic Mode**: Activates the **Self-Learning Preference Model**, ranking papers by semantic relevance. The more you use it, the better it understands you.

- **🤖 AI-Powered Summarization**: Uses large language models (LLMs, e.g., OpenAI GPT) to distill long abstracts into structured short summaries like “one-sentence highlight” and “core method,” helping you grasp the essence in seconds.

- **💡 Self-Learning Preference Model**:
  - Learns your true interests based on '**👍 Like**' / '**👎 Dislike**' feedback on pushed messages.
  - Automatically adjusts recommendation weights for better precision, and surfaces “dark horse” papers you might otherwise miss.

- **🚀 Multi-Platform Push**: Seamlessly delivers content to your favorite platforms. Currently supported:
  - **Feishu Group Bot** (Webhook with interactive cards)
  - **Telegram Bot** (with interactive buttons)
  - **WeChat Work Bot** (Webhook)
  - **Local Markdown/JSON files**, for archiving and further processing

- **⚙️ Highly Configurable**: All features are managed via a single `config.yaml` file, including fetch rules, filter modes, LLM API keys, and push channel switches, so there is no need to touch the codebase.
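
To make the two filtering modes concrete, here is a minimal sketch of how a static keyword score and a dynamic preference score could be blended. The function names, the keyword weights, and the `alpha` blending factor are assumptions for illustration, not the project's actual code. The sketch uses plain Python lists so it runs without dependencies; per the Tech Stack, the project itself computes embeddings with `sentence-transformers` and similarity with `scikit-learn` / `numpy`.

```python
import math
from typing import Dict, Optional, Sequence


def keyword_score(text: str, keywords: Dict[str, float]) -> float:
    """Static mode: sum the weights of keywords that occur in the text."""
    lowered = text.lower()
    return sum(w for kw, w in keywords.items() if kw.lower() in lowered)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def paper_score(
    text: str,
    embedding: Sequence[float],
    keywords: Dict[str, float],
    preference: Optional[Sequence[float]] = None,
    alpha: float = 0.5,  # hypothetical blend factor, not from the project
) -> float:
    """Static mode when `preference` is None, otherwise a blended score."""
    kw = keyword_score(text, keywords)
    if preference is None:
        return kw
    return alpha * kw + (1.0 - alpha) * cosine_similarity(embedding, preference)
```

Papers would then be sorted by `paper_score` in descending order, and only the top-ranked ones passed on to summarization.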

## 🔁 Workflow (with Feedback Loop)

The core workflow is an intelligent pipeline with a **closed feedback loop**:

**[Scheduled Trigger] → [① Fetch] → [② Filter & Rank] → [③ AI Summarize] → [④ Push] → [⑤ User Feedback 👍/👎] → [⑥ Update Preference Model] → (Affects Next-Day Step ②)**

1. **Fetch**: Scheduled task retrieves new papers published that day.
2. **Filter & Rank**:
   - **Static Mode**: Scores papers based only on keywords.
   - **Dynamic Mode**: Computes a personalized recommendation score using the **preference model**, combined with keyword scores.
3. **Summarize**: Generates concise AI-powered summaries for top-ranked papers.
4. **Push**: Sends summaries with `👍/👎` buttons to your configured platforms.
5. **Feedback**: User reactions help the system understand preferences.
6. **Learn**: System updates your **preference vector** in real time for future recommendations.
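
Steps 5 and 6 can be sketched as a small update rule on the preference vector. The README does not specify the rule, so the version below is an assumption: it nudges the vector toward the embedding of a liked paper and away from a disliked one, then renormalizes. The name `update_preference` and the `learning_rate` value are hypothetical.

```python
import math
from typing import List, Sequence


def update_preference(
    preference: Sequence[float],
    paper_embedding: Sequence[float],
    liked: bool,
    learning_rate: float = 0.1,  # hypothetical step size
) -> List[float]:
    """Move the preference vector toward (like) or away from (dislike) a paper."""
    if len(preference) != len(paper_embedding):
        raise ValueError("preference and embedding must have the same length")
    sign = 1.0 if liked else -1.0
    updated = [p + sign * learning_rate * e
               for p, e in zip(preference, paper_embedding)]
    # Renormalize to unit length so repeated feedback cannot inflate the vector.
    norm = math.sqrt(sum(x * x for x in updated))
    if norm == 0.0:
        return updated
    return [x / norm for x in updated]


# Usage: a 👍 on a paper pulls the preference toward that paper's direction.
preference = [1.0, 0.0]
preference = update_preference(preference, [0.0, 1.0], liked=True)
```

Keeping the vector at unit length means the cosine similarity used for ranking depends only on direction, so a long history of feedback does not change the scale of the scores.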

## 🛠️ Tech Stack

- **Language**: Python 3.8+
- **Core Libraries**:
  - Paper Retrieval: `arxiv`
  - AI Summarization: `openai`
  - Task Scheduling: `APScheduler`
- **Preference Model**:
  - Embedding: `sentence-transformers`
  - Vector Similarity: `scikit-learn`, `numpy`
  - Storage: Built-in `SQLite`
- **Utilities**:
  - Push Service: `requests`
  - Config Management: `PyYAML`
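
As an illustration of the SQLite storage layer, the sketch below records 👍/👎 feedback with Python's built-in `sqlite3` module. The table name, column names, and helper functions are assumptions made for this example; the project's real schema may differ.

```python
import sqlite3
from typing import List


def open_feedback_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical `feedback` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            paper_id   TEXT PRIMARY KEY,
            liked      INTEGER NOT NULL CHECK (liked IN (0, 1)),
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def record_feedback(conn: sqlite3.Connection, paper_id: str, liked: bool) -> None:
    """Store the latest reaction for a paper, replacing any earlier one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO feedback (paper_id, liked) VALUES (?, ?)",
            (paper_id, int(liked)),
        )


def liked_paper_ids(conn: sqlite3.Connection) -> List[str]:
    """Return the IDs of all liked papers, sorted for stable output."""
    rows = conn.execute(
        "SELECT paper_id FROM feedback WHERE liked = 1 ORDER BY paper_id"
    )
    return [row[0] for row in rows]
```

Using the paper ID as the primary key keeps one reaction per paper, so changing a 👎 to a 👍 overwrites the earlier row instead of adding a second one.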

## 🚀 Quick Start