https://github.com/muktadiur/clark
Chat with private documents (CSV, pdf, docx, doc, txt) using LangChain, OpenAI, HuggingFace, and FastAPI.
fastapi huggingface javascript langchain openai python
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/muktadiur/clark
- Owner: muktadiur
- License: mit
- Created: 2023-06-29T20:08:00.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2024-01-05T23:10:58.000Z (over 2 years ago)
- Last Synced: 2024-01-06T00:49:13.030Z (over 2 years ago)
- Topics: fastapi, huggingface, javascript, langchain, openai, python
- Language: Python
- Homepage:
- Size: 335 KB
- Stars: 7
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Clark
Chat with private documents (CSV, pdf, docx, doc, txt) using LangChain, OpenAI, HuggingFace, GPT4ALL, and FastAPI.

## Installation
Create a virtual environment and install the required packages.
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Rename `.env.example` to `.env`, then set `OPENAI_API_KEY` ([OpenAI API key](https://platform.openai.com/account/api-keys)) and `HUGGINGFACEHUB_API_TOKEN` ([HuggingFace Access Tokens](https://huggingface.co/settings/tokens)).
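
A quick way to catch a missing key before starting Clark is a small check like the one below. It is only a sketch: the mapping from engine to required variable is inferred from this README rather than taken from Clark's code, and `missing_keys` is a hypothetical helper.

```python
# Assumption: which variable each engine needs is inferred from the README,
# not read from Clark's source.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "hf": ["HUGGINGFACEHUB_API_TOKEN"],
    "gpt4all": [],  # local model, so no API key is assumed
}


def missing_keys(env, default_engine="openai"):
    """Return the required variable names that are unset or empty in env."""
    engine = env.get("CONVERSATION_ENGINE", default_engine)
    if engine not in REQUIRED_KEYS:
        raise ValueError(f"unknown CONVERSATION_ENGINE: {engine!r}")
    return [key for key in REQUIRED_KEYS[engine] if not env.get(key)]
```

Call it as `missing_keys(os.environ)` once the `.env` values are loaded into the process environment; an empty list means nothing is missing for the selected engine.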
Place your own data (CSV, pdf, docx, doc, txt) into the `data/` folder.
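
To see which files in `data/` match the supported formats, something like the following may help. The extension list comes from this README; the recursive scan and the `find_documents` name are assumptions, and Clark's own loader may behave differently.

```python
from pathlib import Path

# Extensions taken from the README's list of supported formats.
SUPPORTED_EXTENSIONS = {".csv", ".pdf", ".docx", ".doc", ".txt"}


def find_documents(data_dir="data"):
    """Return supported files under data_dir, sorted by path."""
    root = Path(data_dir)
    return sorted(
        path
        for path in root.rglob("*")  # assumption: subfolders are included
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Files with other extensions are skipped, which makes it easy to spot documents that would need converting first.
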
## Run
### Console
```
python console.py # default: openai
```
```
Welcome to the Clark!
(type 'exit' to quit)
You: what is the capital of Uzbekistan?
Clark: The capital of Uzbekistan is Tashkent.
You: exit
```
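
The session above follows a simple prompt loop. A minimal sketch of such a loop is shown here; it is not the code in `console.py`, and `answer` is a hypothetical callable standing in for whichever conversation engine is configured.

```python
def chat_loop(answer, read=input, write=print):
    """Prompt repeatedly until the user types 'exit'.

    answer: hypothetical callable mapping a question string to a reply string.
    read/write: injectable so the loop can be exercised without a terminal.
    """
    write("Welcome to the Clark!")
    write("(type 'exit' to quit)")
    while True:
        question = read("You: ").strip()
        if question.lower() == "exit":
            break
        if question:  # skip empty lines instead of sending them to the engine
            write(f"Clark: {answer(question)}")
```

Passing `read` and `write` as parameters keeps the loop independent of the terminal, so the same function can be driven by a script.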
### Web
```
python app.py # default: openai
```
URL:
- http://127.0.0.1:8000/
- http://127.0.0.1:8000/docs
- http://127.0.0.1:8000/redoc
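
The README does not list the API routes. FastAPI apps serve their OpenAPI schema at `/openapi.json` by default, so a running server can usually be asked for them. The sketch below assumes that default has not been changed; `list_routes` and `fetch_routes` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "options", "head"}


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:  # ignore non-operation keys such as "parameters"
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_routes(base_url="http://127.0.0.1:8000"):
    """Download the schema from a running server and list its routes."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return list_routes(json.load(response))
```

With the server started, `fetch_routes()` returns the same routes that the `/docs` page displays.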
### Run in docker
```
docker build -t clark .
docker run -p 8000:8000 -it clark
```
### To use HuggingFace
Change the value of `CONVERSATION_ENGINE` from `openai` to `hf` in the `.env` file.
### To use GPT4ALL (Slow)
```
mkdir models
cd models
wget https://huggingface.co/nomic-ai/gpt4all-falcon-ggml/resolve/main/ggml-model-gpt4all-falcon-q4_0.bin
```
Change the value of `CONVERSATION_ENGINE` from `openai` to `gpt4all` in the `.env` file. You can now query your documents with a local LLM.

Note: gpt4all is slow for now. The average response time is 50 seconds.
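
Because the model file is large, an interrupted download is easy to miss. The check below is a rough sketch: the file name comes from the `wget` URL above, while the `models/` location relative to the working directory and the helper names are assumptions.

```python
from pathlib import Path

# File name taken from the download URL in this README.
MODEL_NAME = "ggml-model-gpt4all-falcon-q4_0.bin"


def model_path(models_dir="models"):
    """Return the expected location of the downloaded model file."""
    return Path(models_dir) / MODEL_NAME


def model_is_ready(models_dir="models", min_bytes=1):
    """Return True if the model file exists and has at least min_bytes bytes."""
    path = model_path(models_dir)
    return path.is_file() and path.stat().st_size >= min_bytes
```

Raising `min_bytes` to the size reported by the download page gives a stricter test for truncated files.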
## Project structure
```
.
├── Dockerfile
├── LICENSE
├── README.md
├── app.py
├── clark
│ ├── __init__.py
│ ├── base.py
│ ├── document.py
│ ├── gpt4all.py
│ ├── helpers.py
│ ├── hf.py
│ └── openai.py
├── console.py
├── data
│ └── sample_capitals.csv
├── requirements.txt
├── static
│ ├── auth
│ │ ├── login.html
│ │ └── signup.html
│ ├── base.html
│ ├── css
│ │ ├── font-awesome.min.css
│ │ └── main.css
│ ├── home.html
│ ├── images
│ │ ├── clark.png
│ │ └── favicon.ico
│ ├── index.html
│ ├── index.js
│ ├── js
│ └── spinner.gif
└── test_api.py
8 directories, 26 files
```