https://github.com/orengrinker/pdfllm

The PDF Question Answering App uses Streamlit for a user-friendly interface where users can upload PDFs and ask questions. It employs LlamaIndex to index PDF content and PyMuPDF4LLM to parse files, enabling efficient, accurate answers based on the document’s text.

llamaindex openai pymupdf pymupdf4llm python3 streamlit


Host: GitHub
URL: https://github.com/orengrinker/pdfllm
Owner: OrenGrinker
Created: 2024-11-07T13:50:59.000Z (7 months ago)
Default Branch: main
Last Pushed: 2024-11-16T16:54:12.000Z (7 months ago)
Last Synced: 2025-01-21T05:28:33.530Z (5 months ago)
Topics: llamaindex, openai, pymupdf, pymupdf4llm, python3, streamlit
Language: Python
Homepage:
Size: 6.84 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# PDF Question Answering App

This is a **Streamlit-based application** that allows users to upload multiple PDF files and ask questions about their content. The application uses **OpenAI language models** to interpret queries and provide answers based on the uploaded PDFs. The app employs a vector-based index to enhance the question-answering capabilities on multi-page or multi-document PDFs.

## Features

- **Multi-Document Upload**: Upload up to 5 PDF files simultaneously.
- **Model Selection**: Choose between OpenAI language models (e.g., gpt-4o, gpt-4o-mini, or gpt-4) to adjust response specificity and speed.
- **Dynamic Index Creation**: Automatically creates an index from uploaded PDFs to enable efficient querying.
- **Interactive Q&A**: Enter questions and receive detailed answers based on the document content.
- **Conversation History**: Track past questions and answers within the session.

## Requirements

- **Python 3.8+**
- **Streamlit**
- **OpenAI API Key**
- **PyMuPDF4LLM**
- **LlamaIndex**

## Setup and Installation

1. **Clone the Repository**

```bash
git clone https://github.com/OrenGrinker/pdfLLM.git
cd pdfLLM
```

2. **Install Dependencies**

Make sure you have pip installed. Run the following command to install the required Python packages:
```bash
pip install -r requirements.txt
```

3. **Set Up OpenAI API Key**

The app requires an OpenAI API key to function. You will be prompted to enter your API key when running the application.
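
As a minimal sketch of how the key prompt could be handled, the helper below prefers a key typed by the user and otherwise falls back to an environment variable. The function name `resolve_api_key` and the `OPENAI_API_KEY` fallback are assumptions for illustration; the README only states that the app prompts for the key.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(typed_key: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the typed key if present, else OPENAI_API_KEY from env, else None.

    Hypothetical helper: the environment fallback is an assumption, not
    something the README describes.
    """
    env = os.environ if env is None else env
    # Whitespace-only input counts as "nothing typed".
    candidate = typed_key.strip() or env.get("OPENAI_API_KEY", "").strip()
    return candidate or None
```

In a Streamlit app, the typed value could come from a masked field such as `st.sidebar.text_input("OpenAI API Key", type="password")`, and the app could stop early when the helper returns `None`.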

## Usage

1. **Run the Application**

Start the app by running the following command:
```bash
streamlit run app.py
```

2. **Upload PDF Files**

In the sidebar, you can upload up to 5 PDF files. The app will create an index for efficient querying.

3. **Ask Questions**

Enter questions in the main content area, and the app will retrieve answers based on the content of the uploaded PDFs.

4. **Choose Model**

Select a model from the sidebar (e.g., gpt-4o, gpt-4o-mini, or gpt-4) to adjust response specificity and speed.
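
The upload limit and model choice described in the steps above could be enforced with a small check like the following. This is a sketch under assumptions: `check_uploads`, `MAX_FILES`, and `MODEL_CHOICES` are hypothetical names, and the actual app may validate uploads differently.

```python
from typing import List, Sequence

MAX_FILES = 5  # limit stated in this README
MODEL_CHOICES = ("gpt-4o", "gpt-4o-mini", "gpt-4")  # examples named in this README


def check_uploads(file_names: Sequence[str], max_files: int = MAX_FILES) -> List[str]:
    """Return a list of human-readable problems; an empty list means the upload is acceptable."""
    problems = []
    if not file_names:
        problems.append("Upload at least one PDF.")
    if len(file_names) > max_files:
        problems.append(f"Upload at most {max_files} files (got {len(file_names)}).")
    for name in file_names:
        # Extension check only; it does not inspect the file contents.
        if not name.lower().endswith(".pdf"):
            problems.append(f"Not a PDF: {name}")
    return problems
```

With Streamlit, the names could be taken from the objects returned by `st.sidebar.file_uploader("Upload PDFs", type="pdf", accept_multiple_files=True)`, and the model could be picked with `st.sidebar.selectbox("Model", MODEL_CHOICES)`.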

## Code Structure

- `app.py`: Main file that initializes the Streamlit app, sets up user interface components, and handles interactions.
- `utils/index_utils.py`: Contains functions to create a vector-based index from uploaded PDFs.
- `utils/query_utils.py`: Provides functions to query the index and retrieve answers based on user input.
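
For orientation, the indexing and querying roles described above might look roughly like the sketch below. It is not the repository's code: `build_index` and `answer_question` are hypothetical names, and the sketch assumes `pymupdf4llm` and `llama-index` (with its OpenAI integrations) are installed and that `OPENAI_API_KEY` is set in the environment, since LlamaIndex uses OpenAI embeddings by default.

```python
from typing import Sequence


def build_index(pdf_paths: Sequence[str]):
    """Parse each PDF into Markdown-backed documents and build an in-memory vector index."""
    if not pdf_paths:
        raise ValueError("No PDF files were provided.")

    # Imported lazily so this module can be loaded before the dependencies are installed.
    import pymupdf4llm
    from llama_index.core import VectorStoreIndex

    reader = pymupdf4llm.LlamaMarkdownReader()
    documents = []
    for path in pdf_paths:
        # load_data returns a list of LlamaIndex documents for the given file.
        documents.extend(reader.load_data(path))
    return VectorStoreIndex.from_documents(documents)


def answer_question(index, question: str, model: str = "gpt-4o-mini") -> str:
    """Query the index with the chosen OpenAI model and return the answer text."""
    if not question.strip():
        raise ValueError("Question must not be empty.")

    from llama_index.llms.openai import OpenAI

    query_engine = index.as_query_engine(llm=OpenAI(model=model))
    return str(query_engine.query(question))
```

Both functions validate their input before touching the network, so an empty upload list or a blank question fails fast without spending API calls.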

## Example Workflow

- Upload your PDF files in the sidebar.
- Choose the desired model for response generation.
- Ask questions about the content of the PDFs in the main input field.
- View answers and scroll through conversation history to see previous queries and responses.
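
The conversation history in the last step can be kept as a list of question and answer pairs stored in session state. The sketch below uses a plain dict as a stand-in for Streamlit's `st.session_state`, which behaves like a mutable mapping; `record_turn` and the `"history"` key are hypothetical names chosen for illustration.

```python
from typing import Dict, List, MutableMapping


def record_turn(state: MutableMapping, question: str, answer: str) -> List[Dict[str, str]]:
    """Append one question/answer pair to the session history and return the full history."""
    # Create the list on first use so earlier turns survive later calls.
    history = state.setdefault("history", [])
    history.append({"question": question, "answer": answer})
    return history


# Usage with a plain dict standing in for st.session_state:
session = {}
record_turn(session, "What is the report about?", "(answer text)")
record_turn(session, "Who wrote it?", "(answer text)")
```

Because the history lives in session state, it lasts only for the current browser session, which matches the "within the session" behaviour listed under Features.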

## Contributing

Feel free to open issues or submit pull requests for any improvements or bug fixes.

## License

This project is licensed under the MIT License.
