Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jenkins-infra/enhancing-llm-with-jenkins-knowledge
This project aims to develop an app using an existing open-source LLM with data collected for domain-specific Jenkins knowledge that can be fine-tuned locally and set up with a proper UI for the user to interact with.
chatbot llama2 llm python pytorch rag
Last synced: 26 days ago
- Host: GitHub
- URL: https://github.com/jenkins-infra/enhancing-llm-with-jenkins-knowledge
- Owner: jenkins-infra
- License: mit
- Created: 2024-10-30T12:08:56.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-30T12:10:34.000Z (3 months ago)
- Last Synced: 2024-12-16T09:58:28.995Z (about 1 month ago)
- Topics: chatbot, llama2, llm, python, pytorch, rag
- Language: Jupyter Notebook
- Homepage: https://www.jenkins.io/blog/2024/05/01/google-summer-of-code-congrats-and-welcome/
- Size: 16.4 MB
- Stars: 2
- Watchers: 9
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Enhancing-LLM-with-Jenkins-Knowledge
## Overview
- Built using Python.
- This project is part of Google Summer of Code 2024.
## Setting up env
You may need to update the environment variables set in `BE/.env` and `FE/.env`.
### Frontend env setup
- Open the `.env` file in the `FE/` directory; the server URL is set to localhost by default:
```sh
VITE_SERVER_URL = http://127.0.0.1:5000/
```
### Backend env setup
- Open the `.env` file in the `BE/` directory; you will also find the host and port, configured for local development by default:
```sh
FLASK_RUN_HOST = 0.0.0.0
FLASK_RUN_PORT = 5000
```
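For reference, here is a minimal sketch of how `app.py` might pick up these values, assuming the backend loads them with `python-dotenv`; the real `app.py` may differ:
```python
# Hypothetical sketch of reading BE/.env; assumes python-dotenv is installed.
import os

from dotenv import load_dotenv
from flask import Flask

load_dotenv()  # loads FLASK_RUN_HOST / FLASK_RUN_PORT from .env into the environment

app = Flask(__name__)

if __name__ == "__main__":
    app.run(
        host=os.getenv("FLASK_RUN_HOST", "0.0.0.0"),
        port=int(os.getenv("FLASK_RUN_PORT", "5000")),
    )
```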
## How To Run
Open a new terminal in the project directory
### Frontend server setup
- Install [Node](https://nodejs.org/en/download/package-manager) first.
- Install all required packages
```sh
cd ./FE
npm install
```
- Start the server
```sh
npm run dev
```
- You will get a message that the server is running at http://localhost:5173/
### Backend server setup
- Install the needed packages.
```sh
cd ./BE
python3 -m venv .
source ./bin/activate
pip install -r ./requirements.txt
```
- Start the server
```sh
python app.py
```
- Note: the first time you run the BE server, it downloads the model (about 6 GB) to your machine; this download happens only on the first run.
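For context, the caching behaviour looks roughly like the snippet below; this is an illustration using `huggingface_hub`, not the project's actual code, and the repo/file names are placeholders:
```python
# Illustration of the first-run download: Hugging Face files are cached locally
# (default ~/.cache/huggingface), so subsequent runs reuse the existing copy.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="username/repo_id", filename="model.bin")
print(model_path)  # the same cached path is returned on every later run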
## Fine-Tune your version
You can fine-tune your own version and upload it to Hugging Face using the following steps:
- We fine-tune Llama 2 using the free Colab tier, which provides a T4 GPU with 16 GB of VRAM.
- We provide `./src/Fine-Tuning.ipynb` for this purpose.
- Clone our repository to access the dataset provided for training:
```sh
git clone https://github.com/nouralmulhem/Enhancing-LLM-with-Jenkins-Knowledge.git
```
- Google Drive is used to store the checkpoints to ensure their persistence in case the Colab environment crashes; you can change where the model is saved by editing the `new_model_path` variable.
- You can also set the number of epochs used for fine-tuning by updating the `num_train_epochs` variable.
- After fine-tuning, open `./src/Upload_Model.ipynb` to merge the LoRA weights into the model, upload your own model to Hugging Face, and start using it (a hedged sketch of this flow appears at the end of this section).
- At this stage, update the `new_model_path` variable to the correct path on your Drive.
- As a final step, update the `repo_id` variable to match your repository on Hugging Face.
VOILA! You have your own model.
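For orientation, the sketch below outlines the fine-tune / merge / upload flow in code, assuming a PEFT/LoRA setup with Llama 2. The variable names mirror the README, but the base model id, paths, and repo id are placeholders, and the notebooks remain the authoritative reference:
```python
# Hedged sketch of the fine-tune / merge / upload flow described above.
# Fine-Tuning.ipynb and Upload_Model.ipynb are the real reference;
# the base model id, paths, and repo id below are placeholders.
import torch
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-2-7b-hf"              # assumed base model
new_model_path = "/content/drive/MyDrive/jenkins-llm"   # checkpoint dir on Drive
num_train_epochs = 1                                    # adjust as needed

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

# Attach LoRA adapters so only a small set of weights is trained on the T4.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
# ... train `model` for `num_train_epochs`, saving the adapter under `new_model_path` ...

# After training: reload the base model, merge the adapter, and push to the Hub.
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, new_model_path).merge_and_unload()
merged.push_to_hub("username/repo_id")      # set this to your repo_id
tokenizer.push_to_hub("username/repo_id")
```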
## Convert fine-tuned to GGML
### CPU model
You can load this full model onto a GPU and run it like any other Hugging Face model, but here we take it a step further and run the model on the CPU.
We use llama.cpp, so first clone its repository:
```sh
git clone https://github.com/ggerganov/llama.cpp.git
```
Llama.cpp provides a script, `convert_hf_to_gguf.py`, that converts models to the binary GGUF format (the successor of GGML), which can be loaded and run on the CPU.
```sh
python convert_hf_to_gguf.py path/to/fine-tuned/model/ --outtype f16 --outfile path/to/binary/model.bin
```
This should output a 13GB binary file at the specified `path/to/binary/model.bin` that is ready to run on CPU with the same code that we started with!
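As a quick sanity check, the converted file can be loaded on the CPU with, for example, the `llama-cpp-python` bindings (this is an assumption; any GGUF-capable runtime will do, and the path and prompt are placeholders):
```python
# Minimal CPU inference check (pip install llama-cpp-python); the model path
# is the --outfile passed to convert_hf_to_gguf.py above.
from llama_cpp import Llama

llm = Llama(model_path="path/to/binary/model.bin", n_ctx=2048)
out = llm("How do I install a Jenkins plugin?", max_tokens=128)
print(out["choices"][0]["text"])
```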
### Quantization
Part of the appeal of the GGML/GGUF format is being able to quantize this 13 GB model into smaller models that run even faster. The Llama.cpp repo includes a `llama-quantize` tool that converts the model to different quantization levels.
First you need to build the tools in the Llama.cpp repository.
```sh
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
This will create the tools in the `build/bin` directory. You can now use `llama-quantize` to shrink the model to q8_0 by running:
```sh
cd build/bin/release
./llama-quantize.exe path/to/binary/model.bin path/to/binary/merged-q8_0.bin q8_0
```
Now we have a 6.7 GB model at `path/to/binary/merged-q8_0.bin`.
To upload the local quantized model to Hugging Face:
```sh
huggingface-cli upload username/repo_id path/to/binary/quantized/model.bin model.bin
```
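Once uploaded, the quantized file can be pulled down and run from any machine; below is a hedged example using `huggingface_hub` and `llama-cpp-python`, with the same placeholder repo and file names as the command above:
```python
# Fetch the quantized model back from the Hub (cached locally) and run it on CPU.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(repo_id="username/repo_id", filename="model.bin")
llm = Llama(model_path=gguf_path, n_ctx=2048)
print(llm("What is a Jenkins pipeline?", max_tokens=64)["choices"][0]["text"])
```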
## Contributors
## License
> **Note**: This software is licensed under the MIT License. See [License](https://github.com/nouralmulhem/Enhancing-LLM-with-Jenkins-Knowledge/blob/main/LICENSE) for more information. © nouralmulhem.