https://github.com/anchen1011/fireact

FireAct: Toward Language Agent Fine-tuning
https://github.com/anchen1011/fireact

agent fine-tuning large-language-models llm reasoning

Last synced: 3 months ago
JSON representation

FireAct: Toward Language Agent Fine-tuning

Host: GitHub
URL: https://github.com/anchen1011/fireact
Owner: anchen1011
License: mit
Created: 2023-10-07T19:53:16.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-10-22T17:05:17.000Z (over 1 year ago)
Last Synced: 2025-04-02T11:08:26.754Z (4 months ago)
Topics: agent, fine-tuning, large-language-models, llm, reasoning
Language: Python
Homepage: https://fireact-agent.github.io
Size: 13.4 MB
Stars: 275
Watchers: 2
Forks: 20
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # FireAct: Toward Language Agent Fine-tuning



    

        

    

    

        

    

    

        

    



![teaser](teaser.png)

This repository is based on our publication *FireAct: Toward Language Agent Fine-tuning* ([PDF](https://browse.arxiv.org/pdf/2310.05915.pdf)). It contains prompts, demo code and fine-tuning data we generated. It also includes the description and directory for the model family we fine-tuned. If you use this code or data in your work, please cite:

```

@misc{chen2023fireact,

      title={FireAct: Toward Language Agent Fine-tuning}, 

      author={Baian Chen and Chang Shu and Ehsan Shareghi and Nigel Collier and Karthik Narasimhan and Shunyu Yao},

      year={2023},

      eprint={2310.05915},

      archivePrefix={arXiv},

      primaryClass={cs.CL}

}

```

## Overview

- Define tools in `tools/`

- Define tasks in `tasks/`

- Collect data & run experiments via `generation.py` 

- Results will be saved in `trajs/`

## Data & Prompts

- Data to generate training data and run experiments in `data/`. We also include samples of training data for both Alpaca format and GPT format. See details [here](https://github.com/anchen1011/FireAct/tree/main/data).

- Prompts to generate training data and run experiments in `prompts/`

## Setup

Set up OpenAI API key and store in environment variable  (see [here](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety))

```

export OPENAI_API_KEY=

```

 

Set up SERP API key and store in environment variable (see [here](https://serpapi.com))

```

export SERPAPI_API_KEY=

```

Create virtual env, for example with conda

```

conda create -n fireact python=3.9

conda activate fireact

```

Clone this repo and install dependencies

```

git clone https://github.com/anchen1011/FireAct.git

pip install -r requirements.txt

```

## Run Demo

#### Data Generation

Example:

```

python generation.py \

    --task hotpotqa \

    --backend gpt-4 \

    --promptpath default \

    --evaluate \

    --random \

    --task_split val \

    --temperature 0 \

    --task_end_index 5

```

See details with command `python generation.py -h`

You need to set a high number (thousands) of `--task_end_index` to get sufficient good data samples. **[WARNING] This is costly with gpt-4 and serpapi.**

You need to convert trajectories into [alpaca format](https://github.com/tatsu-lab/stanford_alpaca#data-release) or [gpt format](https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset) for training. See our examples [here](https://github.com/anchen1011/FireAct/tree/main/data/finetune).

#### Supervised Fine-tuning

Example:

```

cd finetune/llama_lora

python finetune.py \

    --base_model meta-llama/Llama-2-13b-chat-hf \

    --data_path ../../data/finetune/alpaca_format/hotpotqa.json \

    --micro_batch_size 8 \

    --num_epochs 30 \

    --output_dir ../../models/lora/fireact-llama-2-13b \

    --val_set_size 0.01 \

    --cutoff_len 512 \

```

See details [here](https://github.com/anchen1011/FireAct/tree/main/finetune).

#### Inference

Example (FireAct Llama):

```

python generation.py \

    --task hotpotqa \

    --backend llama \

    --evaluate \

    --random \

    --task_split dev \

    --task_end_index 5 \

    --modelpath meta-llama/Llama-2-7b-chat \

    --add_lora \

    --alpaca_format \

    --peftpath forestai/fireact_llama_2_7b_lora 

```

Example (FireAct GPT):

```

python generation.py \

    --task hotpotqa \

    --backend ft:gpt-3.5-turbo-0613: \

    --evaluate \

    --random \

    --task_split dev \

    --temperature 0 \

    --chatgpt_format \

    --task_end_index 5

```

See details with command `python generation.py -h`

Set `--task_end_index 500` for quantitative evaluations. See our examples [here](https://github.com/anchen1011/FireAct/tree/main/trajs).

## Model Zoo

We release a selected set of multitask models based on Llama family. Details can be found in their model cards. 

| Base Model    | Training Method | Hugging Face                                               |

|---------------|-----------------|------------------------------------------------------------|

| Llama2-7B     | LoRA            | [forestai/fireact\_llama\_2\_7b\_lora](https://huggingface.co/forestai/fireact_llama_2_7b_lora)    |

| Llama2-13B    | LoRA            | [forestai/fireact\_llama\_2\_13b\_lora](https://huggingface.co/forestai/fireact_llama_2_13b_lora)   |

| CodeLlama-7B  | LoRA            | [forestai/fireact\_codellama\_7b\_lora](https://huggingface.co/forestai/fireact\_codellama\_7b\_lora)  |

| CodeLlama-13B | LoRA            | [forestai/fireact\_codellama\_13b\_lora](https://huggingface.co/forestai/fireact\_codellama\_13b\_lora) |

| CodeLlama-34B | LoRA            | [forestai/fireact\_codellama\_34b\_lora](https://huggingface.co/forestai/fireact\_codellama\_34b\_lora) |

| Llama2-7B     | Full Model      | [forestai/fireact\_llama\_2\_7b](https://huggingface.co/forestai/fireact_llama_2_7b)         |

## References

1. Our generation code is based on [ysymyth/ReAct](https://github.com/ysymyth/ReAct)

2. Our Llama full model training code is based on [tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca)

3. Our Llama LoRA training code is based on [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora)

4. Our GPT fine-tuning code is based on [anchen1011/chatgpt-finetune-ui](https://github.com/anchen1011/chatgpt-finetune-ui/)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/anchen1011/fireact

Awesome Lists containing this project

README