https://github.com/behkamfallah/chat-duck
This repository is a 'Chat-with-your-PDF' project using RAG approach.
- Host: GitHub
- URL: https://github.com/behkamfallah/chat-duck
- Owner: behkamfallah
- License: mit
- Created: 2024-06-28T10:25:16.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-16T10:59:31.000Z (5 months ago)
- Last Synced: 2024-10-11T22:03:35.798Z (3 months ago)
- Topics: elasticsearch, huggingface, hybrid-retrieval, knn, langchain, openai, pinecone, rag, rrf, streamlit
- Language: Python
- Homepage:
- Size: 6.85 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
### About the Project
Chat with your PDF

This repository is a 'Chat-with-your-PDF' project with two different implementations, Light and Enterprise. @Pardis-Rahbarsooreh and I worked on this project.
### Prerequisites
Make sure you have installed the libraries listed in `requirements.txt`, located at `./source/requirements.txt`.
You can install them from a terminal:
```sh
pip install -r requirements.txt
```
If you get a "recursive_guard" error while running the code, try using Python 3.11.
If you fork the repository, be sure to create a `.env` file in `./source` and put your API keys in it.
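Before running anything, it can help to confirm that the `.env` file is complete. The sketch below is only an illustration built on the standard library: `parse_env` and `missing_keys` are hypothetical helper names that do not come from the repository (which may load the file differently, for example with a library such as python-dotenv), and the variable names are the ones this README lists.

```python
# Variable names taken from the key list in this README.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ELASTIC_API_KEY",
    "ELASTIC_CLOUD_ID",
    "ELASTIC_END_POINT",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_SERVER_URL",
    "PINECONE_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY='value' lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty, in list order."""
    return [key for key in required if not values.get(key)]
```

Assuming the check is run from `./source`, something like `missing_keys(parse_env(open(".env").read()))` returns an empty list when every key has a value. It only checks that a value is present, not that the key is valid.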
The following API keys are needed to run the full application:
```py
OPENAI_API_KEY='...'
ELASTIC_API_KEY='...'
ELASTIC_CLOUD_ID='...'
ELASTIC_END_POINT='...'
UNSTRUCTURED_API_KEY='...'
UNSTRUCTURED_SERVER_URL='...'
PINECONE_API_KEY='...'
```
### Files and Folders
This repository has three main folders:
1. ```./data``` is the folder where you should put your PDF file.
2. ```./source``` is the folder that contains the ```.py``` files.

This folder contains the following Python files:
1. To insert data into the databases, use these files:
   1. ```data_to_ElasticCloud.py```
   2. ```data_to_Pinecone.py```

   Specify your file on line 12 and run the script.
2. To run the whole application on Streamlit, you will need ```streamlit_app.py```. Open a terminal, change directory to ```./source```, and then type:
   ```sh
   streamlit run streamlit_app.py
   ```
3. ```document_loader.py``` loads PDFs. You can create an instance of the ```LoadDocument``` class implemented in this file.
4. ```chunker.py``` chunks the data. It is used only for data that will be indexed in the Pinecone database.
5. ```pinecone_handler.py``` handles the client and connection to Pinecone servers. It also retrieves data.
6. ```elasticsearchhandler.py``` handles the client and connection to Elastic Cloud.
7. ```unstructured_io_handler.py``` handles the connection and getting results from the 'Unstructured.io' servers.
8. ```light_model.py``` contains the chain for the Light model.
9. ```enterprise_model.py``` contains the chain for the Enterprise model.
10. ```test_synthetic_data.py``` tests the app against benchmarks. If you want to run this file, remember to change the context window of the Light model and use ```enterprise_model_for_test.py``` instead of ```enterprise_model.py```.
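
Item 4 above says that ```chunker.py``` chunks the data before it is indexed in Pinecone, but this README does not describe how the text is split. As a rough picture of what chunking with overlap usually looks like, here is a generic character-based sliding window. It is not the repository's implementation: the function name `chunk_text` and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval. Real splitters often cut on sentence or token boundaries rather than raw characters.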