https://github.com/torchstack-ai/rag-summarization-pdf
- Host: GitHub
- URL: https://github.com/torchstack-ai/rag-summarization-pdf
- Owner: torchstack-ai
- License: gpl-3.0
- Created: 2025-03-03T22:51:40.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-03T22:53:44.000Z (over 1 year ago)
- Last Synced: 2025-05-05T09:28:37.536Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 4.29 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# RAG Paper Summarizer
A Python application that uses Retrieval-Augmented Generation (RAG) to download and summarize academic papers. It is currently configured to process the ReAct paper from arXiv.
## Features
- Automatic paper download from arXiv
- PDF processing and text chunking
- Vector store creation using Chroma
- RAG-based summarization using OpenAI's GPT-4 and LangChain
## Prerequisites
- Python 3.x
- OpenAI API key
## Installation
1. Clone the repository:
```bash
git clone https://github.com/torchstack-ai/rag-summarization-pdf
cd rag-summarization-pdf
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Create a `.env` file in the root directory and add your OpenAI API key:
```bash
OPENAI_API_KEY=your_api_key_here
```
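To fail early when the key is missing, a check along these lines can run before any API call. This is a minimal stdlib-only sketch, not code from `rag.py`: the listed `python-dotenv` dependency would normally load the file via `load_dotenv()`, and `parse_env` and `get_api_key` are hypothetical helper names used here for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines (a stand-in for python-dotenv)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(env_file: str = ".env",
                environ: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("OPENAI_API_KEY")
    if not key:
        path = Path(env_file)
        if path.is_file():
            key = parse_env(path.read_text(encoding="utf-8")).get("OPENAI_API_KEY")
    # Treat the README placeholder as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling `get_api_key()` at the top of a script surfaces a missing or placeholder key as a clear error rather than a failed request later on.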
## Usage
Run the script using:
```bash
python rag.py
```
The script will:
1. Download the ReAct paper if not already present
2. Process the PDF and split it into chunks
3. Create a vector store using Chroma
4. Generate a comprehensive summary using RAG
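Steps 2 to 4 can be illustrated without any third-party packages. The sketch below is a simplified stand-in, not the code in `rag.py`: it assumes the PDF text has already been extracted, uses bag-of-words cosine similarity in place of the embedding search that Chroma performs, and stops at building the prompt instead of calling GPT-4. The function names, the chunk size of 1000 characters, and the overlap of 200 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def _bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = _bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: _cosine(_bag_of_words(c), query_vec),
                    reverse=True)
    return ranked[:k]


def build_prompt(context_chunks: list[str], question: str) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
```

The overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, which is the usual reason for overlapping windows in retrieval pipelines.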
## Dependencies
- langchain
- openai
- chromadb
- arxiv
- python-dotenv
- requests
## Note
The current implementation is configured to summarize the ReAct paper (arXiv:2210.03629). You can modify the `process_pdf` function to work with other papers or PDF documents.
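When pointing the script at a different paper, the download step could be parameterized by arXiv ID. The following is a hedged sketch rather than the repository's implementation: `arxiv_pdf_url` and `download_if_missing` are hypothetical names, the `https://arxiv.org/pdf/<id>` URL pattern is the only arXiv detail relied on, and the `fetch` parameter exists so the download can be swapped out when no network is available.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build the PDF URL for an arXiv identifier such as '2210.03629'."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


def _http_get(url: str) -> bytes:
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_if_missing(arxiv_id: str, dest_dir: str = ".",
                        fetch: Callable[[str], bytes] = _http_get) -> Path:
    """Download the paper once and reuse the local copy on later runs."""
    # Old-style IDs contain a slash, which is not safe in a file name.
    target = Path(dest_dir) / f"{arxiv_id.replace('/', '_')}.pdf"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(arxiv_pdf_url(arxiv_id)))
    return target
```

For example, `download_if_missing("2210.03629", "papers")` would fetch the ReAct paper into a `papers` directory on the first call and return the existing path on later calls.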
## License
This project is licensed under GPL-3.0. See the `LICENSE` file for details.