https://github.com/torchstack-ai/rag-summarization-pdf


Host: GitHub
URL: https://github.com/torchstack-ai/rag-summarization-pdf
Owner: torchstack-ai
License: gpl-3.0
Created: 2025-03-03T22:51:40.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-03T22:53:44.000Z (over 1 year ago)
Last Synced: 2025-05-05T09:28:37.536Z (about 1 year ago)
Language: Jupyter Notebook
Size: 4.29 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# RAG Paper Summarizer

A Python application that implements Retrieval Augmented Generation (RAG) to download and summarize academic papers. Currently configured to process the ReAct paper from arXiv.

## Features

- Automatic paper download from arXiv
- PDF processing and text chunking
- Vector store creation using Chroma
- RAG-based summarization using OpenAI's GPT-4 and LangChain

## Prerequisites

- Python 3.x
- OpenAI API key

## Installation

1. Clone the repository:
```bash
git clone https://github.com/torchstack-ai/rag-summarization-pdf
cd rag-summarization-pdf
```

2. Install the required dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the root directory and add your OpenAI API key:
```bash
OPENAI_API_KEY=your_api_key_here
```
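
A minimal sketch of how a script could read this key at startup, assuming it relies on `python-dotenv` (listed under Dependencies). `load_api_key` is a hypothetical helper for illustration, not a function taken from this repository:

```python
import os


def load_api_key(env=None):
    """Return the OpenAI API key, or raise a clear error if it is missing.

    Hypothetical helper: pass a dict as `env` to bypass the real environment.
    """
    if env is None:
        try:
            # python-dotenv copies KEY=value pairs from .env into os.environ.
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            # Without python-dotenv, rely on variables that are already exported.
            pass
        env = os.environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with an explicit message is usually easier to debug than an authentication error raised later, in the middle of the summarization step.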

## Usage

Run the script using:

```bash
python rag.py
```

The script will:
1. Download the ReAct paper if not already present
2. Process the PDF and split it into chunks
3. Create a vector store using Chroma
4. Generate a comprehensive summary using RAG
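
Steps 2 to 4 can be illustrated with a dependency-free sketch. It is not the code in `rag.py`: a bag-of-words cosine similarity stands in for the embeddings and Chroma vector store, the chunk size and overlap are assumed values, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def bag_of_words(text):
    """Toy 'embedding': lowercase token counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
```

In a real pipeline the prompt returned by `build_prompt` would be sent to the language model. Overlapping chunks reduce the chance that a sentence relevant to the query is cut in half at a chunk boundary.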

## Dependencies

- langchain
- openai
- chromadb
- arxiv
- python-dotenv
- requests

## Note

The current implementation is configured to summarize the ReAct paper (arXiv:2210.03629). You can modify the `process_pdf` function to work with other papers or PDF documents.
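
As a rough sketch of pointing the download step at a different paper, the helpers below build an arXiv PDF URL from an identifier and skip the download when the file already exists. They are hypothetical and use only the standard library rather than the `arxiv` package listed under Dependencies; the actual download logic in this repository may differ.

```python
import re
import urllib.request
from pathlib import Path

# New-style arXiv identifiers such as 2210.03629, optionally with a version suffix.
# Old-style identifiers (for example hep-th/9901001) are not handled in this sketch.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(arxiv_id):
    """Return the PDF URL for a new-style arXiv identifier."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
    return f"https://arxiv.org/pdf/{arxiv_id}"


def download_if_missing(arxiv_id, directory="."):
    """Download the paper into `directory` unless the PDF is already there."""
    dest = Path(directory) / f"{arxiv_id}.pdf"
    if not dest.exists():
        urllib.request.urlretrieve(arxiv_pdf_url(arxiv_id), str(dest))
    return dest
```

For example, `download_if_missing("2210.03629")` would fetch the ReAct paper on the first run and reuse the local copy afterwards.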

## License

This project is licensed under GPL-3.0. See the `LICENSE` file for details.
