https://github.com/zjzhao1002/arxivflow
A Python-based automation tool that streamlines research paper tracking by fetching data from arXiv, downloading PDFs, performing local AI-driven keyword extraction, and synchronizing everything to Google Sheets.
artificial-intelligence arxiv-api automation google-sheets-api
- Host: GitHub
- URL: https://github.com/zjzhao1002/arxivflow
- Owner: zjzhao1002
- License: mit
- Created: 2026-04-23T11:28:38.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-01T03:57:50.000Z (about 2 months ago)
- Last Synced: 2026-05-01T04:20:11.516Z (about 2 months ago)
- Topics: artificial-intelligence, arxiv-api, automation, google-sheets-api
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# arXivFlow 🚀
**arXivFlow** is a Python-based automation tool that streamlines research paper discovery and tracking. It fetches metadata from arXiv, runs local AI-driven analysis with **Ollama (Llama 3.2)**, and synchronizes the results with **Google Sheets** and local databases.
---
## ✨ Features
- **Automated Retrieval**: Fetch the latest papers from specific arXiv categories (e.g., `cs.AI`, `cs.LG`, `hep-ph`) within any date range.
- **Local AI Analysis**: Uses **Ollama (Llama 3.2)** to extract keywords and contact information (emails/affiliations) directly from PDF text. Because inference runs locally, there are no cloud API costs and paper text is not sent to a third-party service.
- **Intelligent PDF Handling**: Automatically downloads PDFs and extracts text for deep analysis. Supports custom storage paths.
- **Multi-Format Export**: Save your research data to **CSV**, **JSON**, **Excel**, or **SQLite** for flexible offline analysis.
- **Google Sheets Sync**: Seamlessly push compiled research data to a shared Google Sheet for team collaboration.
- **Type-Safe & Modular**: Clean, documented Python code with full type hinting and a class-based architecture.
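
To illustrate the retrieval step, here is a minimal sketch of how a category and date-range query can be expressed against the public arXiv API. It is not taken from the arXivFlow source: the helper name `build_query_url` and its parameters are hypothetical, and the library may build its queries differently.

```python
from datetime import datetime
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(categories, start, end, max_results=20):
    """Build an arXiv API URL for papers in `categories` submitted in [start, end].

    Hypothetical helper for illustration; not part of arXivFlow.
    """
    # Match any of the requested categories, e.g. cat:cs.AI OR cat:hep-ph.
    cats = " OR ".join(f"cat:{c}" for c in categories)
    # arXiv expects submittedDate bounds formatted as YYYYMMDDHHMM (GMT).
    window = f"submittedDate:[{start:%Y%m%d%H%M} TO {end:%Y%m%d%H%M}]"
    params = {
        "search_query": f"({cats}) AND {window}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


# Build (but do not send) a query for one week of cs.AI and cs.CV papers.
url = build_query_url(["cs.AI", "cs.CV"], datetime(2026, 4, 24), datetime(2026, 5, 1))
```

The response to such a request is an Atom feed, so a real client would still need to parse the XML entries and respect arXiv's rate limits.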
---
## 🛠️ Prerequisites
1. **Python 3.13+**: Ensure you have a modern Python environment.
2. **Ollama**: Install [Ollama](https://ollama.ai/) and download the required model:
```bash
ollama pull llama3.2
```
3. **Google Cloud Credentials**:
- Enable the **Google Sheets** and **Google Drive** APIs.
- Create a **Service Account** and download the JSON key as `credentials.json`.
- Share the target sheet with the service account's email address and give it 'Editor' permissions.
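
Before the first sync it can help to confirm that `credentials.json` really is a service account key and to look up the address the sheet must be shared with. The sketch below uses only the standard library; `service_account_email` is a hypothetical helper, not an arXivFlow function, and it checks only a few fields that Google service account keys normally contain.

```python
import json
from pathlib import Path

# Fields a Google service account key file is expected to contain.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def service_account_email(path="credentials.json"):
    """Return the service account email after basic sanity checks on the key file.

    Hypothetical helper for illustration; not part of arXivFlow.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key (type={data['type']!r})")
    return data["client_email"]
```

Calling `service_account_email()` in the directory that holds the key returns the address to add as an editor in the sheet's sharing dialog.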
---
## 🚀 Installation
### From PyPI (Recommended)
```bash
pip install arxivflow
```
### From Source (For Development)
1. **Clone the repository**:
```bash
git clone https://github.com/zjzhao1002/arxivflow.git
cd arxivflow
```
2. **Set up virtual environment**:
```bash
# Create the environment in .venv so it does not clutter the repository root
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
3. **Install dependencies**:
```bash
pip install -e .
```
---
## 📖 Usage
### Quick Start
```python
import datetime

from arxivflow import arXivFlow

# 1. Initialize the flow
flow = arXivFlow(
    categories=["cs.AI", "cs.CV"],
    ollama_model="llama3.2",
    max_results=20,
    start_date=datetime.datetime.now() - datetime.timedelta(days=7),
)

# 2. Fetch data & extract info (keywords/contacts)
df = flow.get_arxiv_data(download_pdfs=True)

# 3. Save to your preferred formats
flow.save_to_csv("my_research.csv")
flow.save_to_sqlite("research.db")

# 4. Sync with Google Sheets
flow.save_to_google_sheet(
    sheet_id="YOUR_SHEET_ID",
    credentials_file="credentials.json",
)
```
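
After a run, the SQLite file from step 3 can be inspected with the standard library. The table names that `save_to_sqlite` writes are not documented here, so this sketch discovers them instead of assuming one; `summarize_sqlite` is a hypothetical helper, not part of arXivFlow.

```python
import sqlite3
from contextlib import closing


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for every user table in the database.

    Hypothetical helper for illustration; not part of arXivFlow.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master lists all schema objects; skip SQLite's internal tables.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
```

For example, `summarize_sqlite("research.db")` gives a quick check that the expected number of papers was stored before the data is pushed to Google Sheets.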
---
## 🏗️ Architecture
The project follows a modular structure for easy extension:
- `src/arxivflow/arxivflow.py`: The main orchestrator class (`arXivFlow`).
- `src/arxivflow/ollama_functions.py`: Local LLM interface using the Ollama API.
---
## 📜 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🤝 Contributing
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request