https://github.com/uberi/robot-agent

Fine-tuned LLaMa2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting.

Host: GitHub
URL: https://github.com/uberi/robot-agent
Owner: Uberi
License: mit
Created: 2023-07-15T23:20:55.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-07-23T08:13:19.000Z (about 3 years ago)
Last Synced: 2025-04-02T16:50:25.524Z (over 1 year ago)
Language: Python
Homepage:
Size: 44.9 KB
Stars: 18
Watchers: 2
Forks: 3
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Robot Agent
===========

Fine-tuned Llama2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting. The codebase has the following desirable features:

* Entire training procedure runs out of the box on a single computer with 32GB of RAM and 24GB of VRAM (i.e. consumer-grade graphics cards such as the RTX 3090 and RTX 4090) with less than 30 hours of compute time.
    * Carefully tuned to use no more than 27GiB of RAM and 23.6GiB of VRAM.
    * This is accomplished through quantization, FP16, TF32, and the usual gradient accumulation/checkpointing settings.
* Training is fully interruptible/resumable.
* Heavily commented, short, clean, and reproducible training code.
* All library dependency versions are fully pinned; base models and datasets are also pinned and downloaded as part of the setup process.
* After initial setup, the training process does not require network access. The entire project folder is portable and can be moved into airgapped and offline environments.
* Uses SafeTensors everywhere for speed and security.
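
To give a feel for why quantization matters under the VRAM budget listed above, here is a rough back-of-the-envelope sketch. It assumes a nominal 13 billion parameters and counts only the base model weights; quantization constants, LoRA adapters, activations, and optimizer state are ignored, so these are illustrative estimates rather than measurements from this codebase.

```python
# Rough weight-memory estimate for a 13B-parameter model.
# Assumption: nominal parameter count; real usage adds quantization
# constants, LoRA adapters, activations, and optimizer state on top.
NOMINAL_PARAMS = 13e9
GIB = 2**30


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Return the size of the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


fp16_gib = weight_gib(NOMINAL_PARAMS, 16)
int4_gib = weight_gib(NOMINAL_PARAMS, 4)

print(f"FP16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions the FP16 weights alone come to roughly 24.2 GiB, already over the 23.6GiB VRAM ceiling, while 4-bit weights take roughly 6.1 GiB and leave headroom for everything else.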

Technical details:

* Based on [Llama2 13B](https://huggingface.co/NousResearch/Llama-2-13b-hf).
* QLoRA training with a rank-128 LoRA, similar to [Guanaco](https://github.com/artidoro/qlora/blob/cc488110b5ea23594a418daca7085000a9420625/qlora.py#L324).
* 2048-token context window used in supervised finetuning, 1536-token context window used in direct preference finetuning.
* Supervised finetuning using [Airoboros' self-instruct dataset](https://huggingface.co/datasets/jondurbin/airoboros-gpt4-1.4.1), generated by [Airoboros' self-instruct implementation](https://github.com/jondurbin/airoboros).
    * The dataset has been filtered for refusals, and so could be considered "uncensored".
    * The dataset generation code also uses a GPT4 jailbreak to reduce the number of refusals in the first place.
* Direct preference finetuning using [Anthropic's hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
    * This replaces the reward modelling and reinforcement learning steps in a standard RLHF pipeline.
* Codebase takes ideas and inspiration from [StackLLaMa](https://github.com/lvwerra/trl/tree/5c7bfbc8d9aeabee893290cc02121d7260636978/examples/research_projects/stack_llama/scripts), [QLoRA](https://github.com/artidoro/qlora), [LLaMA-TRL](https://github.com/jasonvanf/llama-trl), and [Airoboros](https://github.com/jondurbin/airoboros).
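
As a rough illustration of what a rank-128 LoRA means in terms of trainable parameters, the sketch below counts adapter weights for Llama 2 13B. The model dimensions (hidden size 5120, MLP size 13824, 40 layers) come from the published Llama 2 13B configuration. Applying adapters to all seven linear projections in every layer is an assumption made for this example; the modules actually targeted by this codebase may differ.

```python
# Assumed Llama 2 13B dimensions (from the published model configuration).
HIDDEN = 5120
INTERMEDIATE = 13824
NUM_LAYERS = 40
RANK = 128  # LoRA rank stated in the technical details


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumption: every linear projection in each transformer block is adapted.
per_layer = (
    4 * lora_params(HIDDEN, HIDDEN, RANK)          # q, k, v, o projections
    + 2 * lora_params(HIDDEN, INTERMEDIATE, RANK)  # gate and up projections
    + lora_params(INTERMEDIATE, HIDDEN, RANK)      # down projection
)
total = per_layer * NUM_LAYERS

print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the adapters hold about 0.5 billion trainable parameters, a small fraction of the 13 billion frozen base weights.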

Roadmap
-------

* [x] Full reproducible environment with all datasets, base models, and dependencies included.
* [x] Supervised finetuning script using high-quality publicly available instruct datasets.
* [x] Human-preference finetuning script based on Anthropic's hh-rlhf "helpfulness" dataset.
* [x] Accidentally delete the training results on my GPU server and start the training over again from scratch.
* [ ] Fiddle with agentic dataset generation using Charades dataset.
* [ ] If that doesn't work, fiddle with video captioning using multimodal models like Otter to generate agentic captions from how-to videos on YouTube.

Prompt Format
-------------

```
### Human:
INSTRUCTIONS_GO_HERE

### Assistant:
```

Note that there is a single newline at the end of the prompt. Example:

```
### Human:
What color is the sky?

### Assistant:
The sky is blue.
```
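
For programmatic use, the format above can be produced with a small helper. This is a minimal sketch, and `build_prompt` is a hypothetical name rather than a function from this codebase. It covers a single turn only.

```python
def build_prompt(instructions: str) -> str:
    """Format a single-turn prompt that ends with exactly one newline."""
    return f"### Human:\n{instructions}\n\n### Assistant:\n"


prompt = build_prompt("What color is the sky?")
print(repr(prompt))
```

The model's reply is expected to continue directly after the final newline.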

Training
--------

First, download everything that requires an internet connection into the current project folder. The folder will grow to around 30GiB in size:

```sh
make download-datasets-and-models
```

Next, transfer the current project folder to the training machine, where the rest of the training can be performed fully offline:

```sh
make train
```

Inference
---------

A simple chat-like interface is included for demo purposes. It's not very fancy, but it's good enough for testing:

```sh
make chat
```

### Using Llama.cpp

First, run the following command to create `./exported-models/ggml-robot-agent-q5_K_M.bin`, an 8.6GiB GGML file compatible with Llama.cpp:

```sh
make generate-ggml
```

Now load the model using Llama.cpp:

```sh
make chat-llama-cpp
```

To use Llama.cpp manually, navigate to your llama.cpp folder and start using the model with the following command (replace `PATH_TO_PROJECT_FOLDER` with the path to the current project folder):

```sh
./main --model PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin --color --interactive --interactive-first --mirostat 2 --ctx-size 2048 --reverse-prompt $'\n\n### Human:\n' --prompt $'\n\n### Human:\n' --in-suffix $'\n### Assistant:\n'
```
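
The `$'...'` quoting in the command above is a bash feature that turns `\n` into real newline characters. When launching Llama.cpp from a script instead, the same arguments can be passed as a list with actual newlines. The sketch below builds that list; `llama_cpp_command` is a hypothetical helper, not part of this codebase, and the flags are copied unchanged from the command above.

```python
from pathlib import Path


def llama_cpp_command(project_folder: str) -> list[str]:
    """Build the argument list for llama.cpp's ./main (hypothetical helper)."""
    model = Path(project_folder) / "exported-models" / "ggml-robot-agent-q5_K_M.bin"
    human_turn = "\n\n### Human:\n"  # same characters as bash $'\n\n### Human:\n'
    return [
        "./main",
        "--model", str(model),
        "--color",
        "--interactive",
        "--interactive-first",
        "--mirostat", "2",
        "--ctx-size", "2048",
        "--reverse-prompt", human_turn,
        "--prompt", human_turn,
        "--in-suffix", "\n### Assistant:\n",
    ]


# Replace the placeholder with the path to the current project folder.
cmd = llama_cpp_command("PATH_TO_PROJECT_FOLDER")
print(cmd)
```

From inside the llama.cpp folder, the resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing a list avoids shell quoting entirely, so the newlines reach Llama.cpp exactly as written.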
