https://github.com/uberi/robot-agent
Fine-tuned LLaMa2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting.
- Host: GitHub
- URL: https://github.com/uberi/robot-agent
- Owner: Uberi
- License: mit
- Created: 2023-07-15T23:20:55.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-23T08:13:19.000Z (almost 3 years ago)
- Last Synced: 2025-04-02T16:50:25.524Z (about 1 year ago)
- Language: Python
- Size: 44.9 KB
- Stars: 18
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Robot Agent
===========
Fine-tuned Llama2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting. The codebase has the following desirable features:
* Entire training procedure runs out of the box on a single computer with 32GB of RAM and 24GB of VRAM (i.e. consumer-grade graphics cards such as the RTX 3090 and RTX 4090) with less than 30 hours of compute time.
    * Carefully tuned to use no more than 27GiB of RAM and 23.6GiB of VRAM.
    * This is accomplished through quantization, FP16, TF32, and the usual gradient accumulation/checkpointing settings.
* Training is fully interruptible/resumable.
* Heavily commented, short, clean, and reproducible training code.
* All library dependency versions are fully pinned; base models and datasets are pinned and downloaded as part of the setup process.
* After initial setup, the training process does not require network access: the entire project folder is portable and can be moved into airgapped and offline environments.
* Uses SafeTensors everywhere for speed and security.
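
As a rough illustration of why quantization matters for the 24GB VRAM target, the sketch below estimates the memory needed just to hold the model weights at different precisions. It assumes a nominal 13e9 parameters and 4 bits per weight (as in QLoRA), and it ignores quantization constants, LoRA adapters, activations, and optimizer state, so the figures are a lower bound rather than a measurement of this codebase:

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to store the weights, in GiB."""
    return num_params * bits_per_param / 8 / GIB


# Assumption: a nominal "13B" parameter count; the exact count differs slightly.
NUM_PARAMS = 13e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
four_bit_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"FP16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {four_bit_gib:.1f} GiB")
```

Under these assumptions the FP16 weights alone come to roughly 24 GiB, which is above the 23.6GiB budget, while the 4-bit weights take roughly 6 GiB and leave headroom for everything else.
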
Technical details:
* Based on [Llama2 13B](https://huggingface.co/NousResearch/Llama-2-13b-hf).
* QLoRA training with a rank-128 LoRA, similar to [Guanaco](https://github.com/artidoro/qlora/blob/cc488110b5ea23594a418daca7085000a9420625/qlora.py#L324).
* 2048-token context window used in supervised finetuning, 1536-token context window used in direct preference finetuning.
* Supervised finetuning using [Airoboros' self-instruct dataset](https://huggingface.co/datasets/jondurbin/airoboros-gpt4-1.4.1), generated by [Airoboros' self-instruct implementation](https://github.com/jondurbin/airoboros).
    * The dataset has been filtered for refusals, and so could be considered "uncensored".
    * The dataset generation code also uses a GPT4 jailbreak to reduce the number of refusals in the first place.
* Direct preference finetuning using [Anthropic's hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
    * This replaces the reward modelling and reinforcement learning steps in a standard RLHF pipeline.
* Codebase takes ideas and inspiration from [StackLLaMa](https://github.com/lvwerra/trl/tree/5c7bfbc8d9aeabee893290cc02121d7260636978/examples/research_projects/stack_llama/scripts), [QLoRA](https://github.com/artidoro/qlora), [LLaMA-TRL](https://github.com/jasonvanf/llama-trl), and [Airoboros](https://github.com/jondurbin/airoboros).
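
To make the preference finetuning step more concrete, here is a minimal sketch of a per-example preference loss. It assumes that "direct preference finetuning" refers to the Direct Preference Optimization (DPO) objective, which trains directly on chosen/rejected pairs instead of fitting a reward model and running reinforcement learning. The function name, the log-probability inputs, and `beta=0.1` are illustrative assumptions, not values taken from this codebase:

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    beta is an assumed value that scales the implicit reward.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is zero, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```
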
Roadmap
-------
* [x] Full reproducible environment with all datasets, base models, and dependencies included.
* [x] Supervised finetuning script using high-quality publicly available instruct datasets.
* [x] Human-preference finetuning script based on Anthropic's hh-rlhf "helpfulness" dataset.
* [x] Accidentally delete the training results on my GPU server and start the training over again from scratch.
* [ ] Fiddle with agentic dataset generation using the Charades dataset.
* [ ] If that doesn't work, fiddle with video captioning using multimodal models like Otter to generate agentic captions from how-to videos on YouTube.
Prompt Format
-------------
```
### Human:
INSTRUCTIONS_GO_HERE
### Assistant:
```
Note that there is a single newline at the end of the prompt. Example:
```
### Human:
What color is the sky?
### Assistant:
The sky is blue.
```
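The format above can be produced programmatically. The helper below is a minimal sketch for a single-turn prompt; `build_prompt` is a hypothetical name and is not part of this repository:

```python
def build_prompt(instructions: str) -> str:
    """Wrap instructions in the prompt format described above.

    The result ends with a single newline after "### Assistant:", so the
    model's reply starts on the following line.
    """
    return f"### Human:\n{instructions}\n### Assistant:\n"


prompt = build_prompt("What color is the sky?")
print(repr(prompt))
```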
Training
--------
First, download everything that requires an internet connection into the current project folder, which will grow to around 30GiB in size:
```sh
make download-datasets-and-models
```
Next, transfer the current project folder to the training machine, where the rest of the training can be performed fully offline:
```sh
make train
```
Inference
---------
A simple chat-like interface is included for demo purposes. It's not very fancy, but it's good enough for testing:
```sh
make chat
```
### Using Llama.cpp
First, run the following command to create `./exported-models/ggml-robot-agent-q5_K_M.bin`, an 8.6GiB GGML file compatible with Llama.cpp:
```sh
make generate-ggml
```
Now load the model using Llama.cpp:
```sh
make chat-llama-cpp
```
To use Llama.cpp manually, navigate to your llama.cpp folder and start using the model with the following command (replace `PATH_TO_PROJECT_FOLDER` with the path to the current project folder):
```sh
./main --model PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin --color --interactive --interactive-first --mirostat 2 --ctx-size 2048 --reverse-prompt $'\n\n### Human:\n' --prompt $'\n\n### Human:\n' --in-suffix $'\n### Assistant:\n'
```
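The `$'...'` quoting in the command above is a shell feature that turns `\n` into real newlines, and not every shell supports it. As an alternative, the same invocation can be assembled in Python, where the newline characters are written directly. This is a sketch only: `llama_cpp_args` is a hypothetical helper, `PATH_TO_LLAMA_CPP_FOLDER` is a placeholder, and the flags are copied from the command above rather than checked against any particular Llama.cpp version:

```python
from pathlib import Path


def llama_cpp_args(project_folder: str, main_binary: str = "./main") -> list:
    """Build the argument list for the manual Llama.cpp command above."""
    model = Path(project_folder) / "exported-models" / "ggml-robot-agent-q5_K_M.bin"
    return [
        main_binary,
        "--model", str(model),
        "--color",
        "--interactive",
        "--interactive-first",
        "--mirostat", "2",
        "--ctx-size", "2048",
        # Real newline characters stand in for the shell's $'...' escapes.
        "--reverse-prompt", "\n\n### Human:\n",
        "--prompt", "\n\n### Human:\n",
        "--in-suffix", "\n### Assistant:\n",
    ]


args = llama_cpp_args("PATH_TO_PROJECT_FOLDER")
print(args)

# To launch it from the llama.cpp folder (not executed here):
#   import subprocess
#   subprocess.run(args, cwd="PATH_TO_LLAMA_CPP_FOLDER", check=True)
```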