https://github.com/tristan-mcinnis/arXiv-Paper-Processor
This project allows you to search for academic papers on arXiv, download and process them, and generate responses to specific questions using embeddings and language models. The application leverages several tools including Gradio for the interface, ChromaDB for embedding storage, and LangChain for text processing.
- Host: GitHub
- URL: https://github.com/tristan-mcinnis/arXiv-Paper-Processor
- Owner: tristan-mcinnis
- License: mit
- Created: 2024-08-10T09:04:11.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-10T09:10:50.000Z (about 1 year ago)
- Last Synced: 2024-12-22T06:40:11.164Z (10 months ago)
- Topics: arxiv, langhchain, ollama, research
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# arXiv Paper Processor
This project allows you to search for academic papers on arXiv, download and process them, and generate responses to specific questions using embeddings and language models. The application leverages several tools including Gradio for the interface, ChromaDB for embedding storage, and LangChain for text processing.
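As an illustration of the search step, here is a minimal sketch that queries the public arXiv API using only the Python standard library. It is not taken from `app.py`, which may use a client library instead; the function names `build_query_url`, `parse_feed`, and `search_arxiv` are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(query, max_results=5):
    """Build an arXiv API URL that searches all fields for `query`."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_data):
    """Extract id, title, summary, and PDF link from an Atom feed."""
    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        pdf_url = None
        for link in entry.findall("atom:link", ATOM_NS):
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            # Titles can wrap across lines in the feed; collapse whitespace.
            "title": " ".join(title.split()),
            "summary": summary.strip(),
            "pdf_url": pdf_url,
        })
    return papers


def search_arxiv(query, max_results=5):
    """Fetch and parse search results (requires network access)."""
    with urllib.request.urlopen(build_query_url(query, max_results)) as response:
        return parse_feed(response.read())
```

Calling `search_arxiv("retrieval augmented generation")` would return a list of dictionaries whose `pdf_url` values can then be downloaded for processing.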
## Features
- **arXiv Paper Search**: Search for papers based on a query, download them in PDF format, and extract relevant metadata.
- **Text Processing**: Extract and split the content of downloaded papers into manageable chunks.
- **Embedding Generation**: Generate text embeddings using the `nomic-embed-text` model and store them in ChromaDB.
- **Document Retrieval**: Retrieve the most relevant document based on a query using the generated embeddings.
- **Response Generation**: Generate responses to specific questions using the LLaMA model, incorporating references to the source documents.

## Requirements
- Python 3.8+
- Install dependencies with:
```bash
pip install -r requirements.txt
```

## Usage
Clone the Repository:
```bash
git clone https://github.com/tristan-mcinnis/arXiv-Paper-Processor.git
cd arXiv-Paper-Processor
```
Install Dependencies:
```bash
pip install -r requirements.txt
```
Run the Application:
```bash
python app.py
```
Access the Gradio Interface:
The interface will launch in your web browser. Enter your search query and question to start processing papers.

## Project Structure
- `app.py`: Main application file containing the core logic.
- `requirements.txt`: List of required Python packages.
- `README.md`: Project documentation.
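
To give a feel for the kind of logic the chunking and retrieval features involve, the following is a minimal, dependency-free sketch. It is not the code in `app.py`: the word-count `embed` function is a toy stand-in for the `nomic-embed-text` model, a plain Python list stands in for ChromaDB, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]
```

In a real pipeline, the retrieved chunk and the user's question would then be passed to the language model as context for the answer.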

## License
This project is licensed under the MIT License. See the LICENSE file for more details.