https://github.com/prasadg193/gener8-llama2
- Host: GitHub
- URL: https://github.com/prasadg193/gener8-llama2
- Owner: PrasadG193
- License: mit
- Created: 2024-01-12T05:38:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-12T10:25:54.000Z (over 1 year ago)
- Last Synced: 2025-03-26T13:28:25.694Z (7 months ago)
- Language: CSS
- Size: 164 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Gener8-Llama2
Generate Kubernetes resource YAML manifests from a text prompt.

Gener8-Llama2 is a simple Kubernetes resource YAML generator based on Meta's Llama-2 model.
## Architecture

## Prerequisites
Please make sure you have Python 3.8.x or a higher version installed.
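For example, to verify the interpreter version:

```sh
# Any 3.8+ interpreter works; the output shown is illustrative
$ python3 --version
Python 3.8.10
```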
### Requesting access to Llama Models
Request access to the Llama models [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
You will receive an email with a URL to download the model, which we will use later.
### Setup Llama2 Model
Make sure you have both repos available locally: `llama` and `llama.cpp`.
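If you do not have them yet, a minimal sketch, assuming the upstream GitHub locations (adjust the layout so the relative paths used by `convert.py` later still resolve):

```sh
# Assumed upstream repos -- clone wherever matches your directory layout
$ git clone https://github.com/facebookresearch/llama.git
$ git clone https://github.com/ggerganov/llama.cpp.git
```

With both repos in place, first download the `llama-2-7b-chat` model from the `llama` repo: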
```sh
$ cd llama/
$ /bin/bash ./download.sh
Enter the URL from email: https://download.llamameta.net/*?XXXXXXXXXXXXX
Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B-chat
```
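After the download completes, the weights should land in a model directory under `llama/`. A quick check (file names follow Meta's download script and may vary):

```sh
# Expect a checkpoint, the params file, and a checksum list
$ ls llama/llama-2-7b-chat/
checklist.chk  consolidated.00.pth  params.json
```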
### Converting and Quantizing Downloaded Model
Now we have to convert the downloaded model to f16 format and quantize it to reduce its size.

1. Build the llama.cpp project
```sh
$ cd llama.cpp
$ make
```
2. Activate a virtual environment and install all the requirements
```sh
$ python3 -m venv llama2
$ source llama2/bin/activate
$ python3 -m pip install -r requirements.txt
```
3. Then convert the model into f16 format and quantize it
```sh
$ python3 convert.py --outfile models/7B-chat/ggml-model-f16.bin --outtype f16 ../../llama2/llama/llama-2-7b-chat --vocab-dir ../../llama2/llama
$ ./quantize ./models/7B-chat/ggml-model-f16.bin ./models/7B-chat/ggml-model-q4_0.bin q4_0
```
4. Make sure you change the `vocab_size` in `llama/llama-2-7b-chat/params.json` to 32000 (a command-line sketch follows after this list)
```sh
$ cat llama/llama-2-7b-chat/params.json
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": 32000}
```
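If you prefer to make that edit from the command line, a minimal sketch, assuming `jq` is installed:

```sh
# Set vocab_size to 32000; jq cannot edit in place, so write to a
# temp file and move it back
$ jq '.vocab_size = 32000' llama/llama-2-7b-chat/params.json > /tmp/params.json \
    && mv /tmp/params.json llama/llama-2-7b-chat/params.json
```

Optionally, you can smoke-test the quantized model with the `main` binary that `make` built earlier, using standard llama.cpp flags:

```sh
# Load the quantized model and generate a short completion
$ ./main -m ./models/7B-chat/ggml-model-q4_0.bin -p "Hello" -n 32
```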
## Build
Before proceeding further, please make sure you have set up the Llama2 model using the steps given in the Prerequisites section.
1. Run the Python server
```sh
$ python app.py
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
```
2. Use curl or the web app to send a query to the server
To query using the web app, open `/PATH/TO/REPO/Gener8-Llama2/frontend/index.html` in your browser
and enter a description of the K8s resource you want to generate specs for. To query with `curl`, see the sketch below.
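The exact route and payload are defined in `app.py`, so treat this as a hypothetical sketch; it assumes the Flask app exposes a `/generate` endpoint accepting a JSON body with a `prompt` field:

```sh
# Hypothetical endpoint and field names -- check app.py for the real ones
$ curl -s -X POST http://127.0.0.1:5000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "nginx deployment with 3 replicas exposing port 80"}'
```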
## Contributing
We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:
- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features