https://github.com/S4mpl3r/chat-with-pdf
Chat with your PDF files for free, using Langchain, Groq, ChromaDB, and Jina AI embeddings.
- Host: GitHub
- URL: https://github.com/S4mpl3r/chat-with-pdf
- Owner: S4mpl3r
- License: MIT
- Created: 2024-03-10T09:59:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-16T17:38:58.000Z (over 1 year ago)
- Last Synced: 2024-10-18T21:16:48.699Z (12 months ago)
- Topics: chat-with-pdf, embeddings, groq, groq-ai, jina, langchain, llama, llama3, llm, machine-learning, mixtral-8x7b, python, python3, rag, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 15
- Watchers: 2
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Chat With PDFs
Chat with your PDF files for free, using [Langchain](https://python.langchain.com/docs/get_started/quickstart), [Groq](https://console.groq.com/), the [Chroma](https://docs.trychroma.com/getting-started) vector store, and [Jina AI](https://jina.ai/embeddings/) embeddings. This repository contains a simple Python implementation of a RAG (Retrieval-Augmented Generation) system: relevant chunks of the user's PDF file are retrieved for each query and used to generate an informative response.

## Installation
Follow these steps:
1. Clone the repository
```
git clone https://github.com/S4mpl3r/chat-with-pdf.git
```
2. Create a virtual environment and activate it (optional, but highly recommended):
```
python -m venv .venv

# Windows:
.venv\Scripts\activate

# Linux/macOS:
source .venv/bin/activate
```
3. Install required packages:
```
python -m pip install -r requirements.txt
```
4. Create a `.env` file in the root of the project and populate it with the following keys. You'll need to obtain your own API keys (Jina AI, Groq, and Hugging Face):
```
JINA_API_KEY=
GROQ_API_KEY=
HF_TOKEN=
HF_HOME=
```
5. Run the program:
```
python main.py
```
## Configuration
You can customize the behavior of the system by modifying the constants and parameters in the main.py file:

* EMBED_MODEL_NAME: Specify the name of the Jina embedding model to be used.
* LLM_NAME: Specify the name of the language model (Refer to [Groq](https://groq.com/) for the list of available models).
* LLM_TEMPERATURE: Set the temperature parameter for the language model.
* CHUNK_SIZE: Specify the maximum chunk size allowed by the embedding model.
* DOCUMENT_DIR: Specify the directory where PDF documents are stored.
* VECTOR_STORE_DIR: Specify the directory where vector embeddings are stored.
* COLLECTION_NAME: Specify the name of the collection for the Chroma vector store.

## Resources
Kudos to the amazing libraries and services listed below:
* [Langchain](https://www.langchain.com/)
* [Groq](https://groq.com/)
* [Jina AI](https://jina.ai/)
* [ChromaDB](https://www.trychroma.com/)

## License
MIT