https://github.com/seonglae/llama2gptq

Chat to LLaMa 2 that also provides responses with reference documents over vector database. Locally available model using GPTQ 4bit quantization.

chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/seonglae/llama2gptq
Owner: seonglae
License: mit
Created: 2023-06-02T12:31:29.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-11-25T09:23:11.000Z (over 1 year ago)
Last Synced: 2025-04-19T11:53:58.211Z (3 months ago)
Topics: chatai, chatbot, chatgpt, cuda, gpt, langchain, llama-2, llama2, model-quantization, quantization, question-answering, rye, streamlit-chat, transformers
Language: Python
Homepage: https://llama2gptq.nuxt.space/
Size: 9.48 MB
Stars: 29
Watchers: 1
Forks: 0
Open Issues: 5
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

README

# LLaMa2 GPTQ

A chat AI that answers with reference documents, using prompt engineering over a vector database. It also suggests related web pages through integration with my previous product, Texonom.
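The retrieve-then-prompt flow described above can be sketched with the standard library alone. This is an illustration only, not the project's code: the in-memory `DOCS` list, the bag-of-words `embed` function, the `example.com` URLs, and the prompt wording are all assumptions standing in for the vector database, a real embedding model, and the actual prompt template.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "vector database": (source URL, text) pairs.
DOCS = [
    ("https://example.com/gptq", "GPTQ quantizes model weights to 4 bits"),
    ("https://example.com/chroma", "Chroma stores embeddings for similarity search"),
    ("https://example.com/rye", "Rye manages Python packages and environments"),
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, k: int = 2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d[1])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, k: int = 2):
    """Build a prompt that carries numbered context plus the source list."""
    hits = retrieve(question, k)
    context = "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(hits, 1))
    prompt = (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    references = [url for url, _ in hits]
    return prompt, references
```

Returning the reference URLs alongside the prompt lets the caller show the sources next to the model's answer, whatever the model itself chooses to cite.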







This project pursues local, private, and personal AI that does not call external APIs, made practical by optimizing inference performance with GPTQ model quantization. It was inspired by [langchain](https://github.com/hwchase17/langchain) projects such as [notion-qa](https://github.com/hwchase17/notion-qa) and [localGPT](https://github.com/PromtEngineer/localGPT).

## Demos

### CLI Demo

https://github.com/seonglae/llama2gptq/assets/27716524/dba5cd39-ea5c-44d9-bf29-2e8f04039413

### Chat Demo

https://github.com/seonglae/llama2gptq/assets/27716524/258de629-0b61-4670-b76b-9f2357adf4c7




## Install

This project uses [rye](https://mitsuhiko.github.io/rye/) as its package manager.

It currently runs only with [CUDA](https://texonom.com/a9e934a523d346c5a984d95e3d0676e3).

```zsh
rye sync
```

Or install with pip:

```zsh
CUDA_VERSION=cu118
TORCH_VERSION=2.0.1
# Reinstall torch from the PyTorch index that matches the CUDA version
pip install torch==$TORCH_VERSION --index-url https://download.pytorch.org/whl/$CUDA_VERSION --force-reinstall
pip install .
```
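Because the project currently depends on CUDA, it can help to confirm that the installed torch build actually sees a GPU before starting a chat. The helper below is a small sketch, not part of the repository; the function name `cuda_status` is hypothetical, and it only uses `torch.cuda.is_available()`, `torch.version.cuda`, and `torch.__version__`.

```python
import importlib.util

def cuda_status() -> str:
    """Report whether torch is installed and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check works without torch too
    if not torch.cuda.is_available():
        return "torch is installed but CUDA is not available"
    return f"CUDA {torch.version.cuda} is available with torch {torch.__version__}"

print(cuda_status())
```

If the message reports that CUDA is not available, the torch wheel was probably installed from an index that does not match the local CUDA version.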

## QA

### 1. Chat with Web UI

```zsh
streamlit run chat.py
```

### 2. Chat with CLI

```zsh
python main.py chat
```

## Ingest Documents

The code is currently structured mainly around Notion's CSV export format.
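As a rough sketch of what reading such an export can look like, the function below turns CSV rows into document dictionaries. It is not the project's ingestion code: the function name `load_notion_csv` and the column names `Name` and `Content` are assumptions, so adjust them to match the columns in your own export.

```python
import csv
import io

def load_notion_csv(stream, text_column="Content", title_column="Name"):
    """Turn rows of a CSV export into document dicts.

    The column names are assumptions, not a fixed Notion schema.
    """
    docs = []
    for row in csv.DictReader(stream):
        text = (row.get(text_column) or "").strip()
        if not text:
            continue  # skip rows that have no body text
        # Keep the remaining non-empty columns as metadata.
        metadata = {k: v for k, v in row.items() if k != text_column and v}
        docs.append({
            "title": row.get(title_column) or "",
            "text": text,
            "metadata": metadata,
        })
    return docs

# Small in-memory sample standing in for an exported file.
sample = io.StringIO(
    "Name,Tags,Content\n"
    "GPTQ,quantization,Post-training 4-bit quantization.\n"
    "Empty,,\n"
)
docs = load_notion_csv(sample)
```

When reading a real file, pass an open file object instead of the `StringIO` sample; if the export starts with a byte-order mark, opening it with `encoding="utf-8-sig"` strips the mark from the first column name.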

### Custom source documents

```zsh
# Put document files into the ./knowledge folder
python main.py process
# Or use the provided Texonom DB
git clone https://huggingface.co/datasets/texonom/md-chroma-instructor-xl db
```

## Quantize Model

The default model is orca 3b for now.

```zsh
python main.py quantize --source_model facebook/opt-125m --output opt-125m-4bit-gptq --push
```
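To give a feel for what 4-bit storage means, the sketch below quantizes one group of weights with plain round-to-nearest and then restores them. Note that this is deliberately simpler than GPTQ, which additionally adjusts the remaining weights to compensate for rounding error; the function names and sample values are illustrative assumptions, not AutoGPTQ's API.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group."""
    levels = (1 << bits) - 1                 # 15 distinct steps for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0:
        scale = 1.0                          # constant group: any scale works
    # Map each weight to an integer code in [0, levels].
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Restore approximate weights from integer codes."""
    return [c * scale + lo for c in codes]

def weight_bytes(n_params, bits):
    """Storage for the weights alone, ignoring scale and zero-point overhead."""
    return n_params * bits // 8

weights = [-0.75, -0.25, 0.0, 0.3, 0.75]
codes, scale, zero = quantize_group(weights)
restored = dequantize_group(codes, scale, zero)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each restored weight is within half a quantization step of the original. For a model with 3 billion parameters, `weight_bytes` gives 1.5 GB at 4 bits versus 6 GB at 16 bits, before the per-group scale and offset values are counted.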

## Future Plan

- [ ] [MPS](https://texonom.com/8d71e4de36e4416c83f65ee7bdaa412b) support via dynamic model selection

- [ ] Stateful Web App support like [chat-langchain](https://chat.langchain.dev/)
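One possible shape for the device detection behind the planned MPS support is sketched below. It is an assumption about how such a feature could work, not something the repository implements: the names `pick_device` and `detect_device` and the CUDA, then MPS, then CPU preference order are hypothetical.

```python
import importlib.util

def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

def detect_device() -> str:
    """Probe torch for available backends; report "cpu" when torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    # Older torch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

Keeping the preference order in a pure function makes it easy to test without a GPU, and the returned device string could then decide which model variant to load.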

## App Stack

### LLM Stack

- [Langchain](https://texonom.com/945567c597364cbb98336ca08c059856) for Prompt Engineering

- [ChromaDB](https://texonom.com/8af886db7d684e03911a86b652620816) for storing embeddings

- [Transformers](https://texonom.com/f5101287cc9249ab812e281e374e5629) for LLM engine

- [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) for Quantization & Inference

### Python Stack

- [Rye](https://texonom.com/rye-429b5d5f3d7f4026ab5d1abd61facc73) for package management

- [Mypy](https://texonom.com/8a894731430f4138ac0fdd522cd74772) for type checking

- [Fire](https://github.com/google/python-fire) for CLI implementation

- [Streamlit](https://texonom.com/9e295c64d27e4999878a022b1c538964) for Web UI implementation
