https://github.com/ibebbs/llm-clasification
https://github.com/ibebbs/llm-clasification
Last synced: 5 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/ibebbs/llm-clasification
- Owner: ibebbs
- Created: 2023-05-05T10:37:14.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-05-05T10:37:24.000Z (about 3 years ago)
- Last Synced: 2025-07-29T14:58:21.977Z (11 months ago)
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
README
## Docker
This solution is best deployed via Docker using the attached `dockerfile`. Instructions below and for Windows and may need some finessing for other platforms.
### Requirements:
1. Docker Desktop
2. A CUDA compatible GPU with >20Gb of VRAM (i.e. an RTX 3080 or above; I use an RTX A4500)
3. Latest GFX card drives installer
### Build
First build the docker image by running the following command from the repo directory: `docker build -t /llama-lora-tuner:hackdays .` (replace username with your docker hub user name if you have one, otherwise any other short value)
This will download and install all the requirements to run `https://github.com/zetavg/LLaMA-LoRA`
### Running
Once built, the docker container can be run using: `docker run -it --rm --gpus=all -p 7860:7860 -v ${PWD}/data:/data -v ${PWD}/cache:/LLaMA-LoRA-Tuner/cache -e PYTORCH_TRANSFORMERS_CACHE=/LLaMA-LoRA-Tuner/cache ibebbs/llama-lora-tuner:hackdays` (replace username with your docker hub user name if you have one, otherwise any other short value)
The first time you run this container, it will download a base model (decapoda/llama-7b) for evaluation/fine tuning to the local `cache` directory (so subsequent runs don't need to download it again). Once downloaded and initialized, a webserver will be started at "http://0.0.0.0:7860" which can be loaded in your browser by visiting "http://localhost:7860".
### Validating
One running, the app should have everything needed to perform inference. Simply enter a prompt (i.e. "Tell me about Alpacas") in the "Instrunction" box and click the "Generate" button. In 40-50 seconds (the first time is slow as it prepares the model for inference, subsequent calls should be much faster) you should see some text generated in the "Output" box.
### Restarting
Now everything is downloaded, you should be able to quickly restart the docker container without needing to re-download the model by using the same command above.