Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rahulunair/simpsons_llm_xpu
Finetune an LLM on Intel discrete GPUs to generate dialogues based on the Simpsons dataset
- Host: GitHub
- URL: https://github.com/rahulunair/simpsons_llm_xpu
- Owner: rahulunair
- License: apache-2.0
- Created: 2023-06-27T03:11:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-03T02:27:21.000Z (over 1 year ago)
- Last Synced: 2025-01-11T08:59:01.627Z (27 days ago)
- Topics: huggingface-transformers, intel-arc, intel-gpu, intel-gpu-max, ipex, llm-inference, llm-training, lora, pytorch
- Language: Python
- Homepage:
- Size: 168 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome_ai_agents - Simpsons_Llm_Xpu - Finetune an LLM on Intel discrete GPUs to generate dialogues based on the Simpsons dataset (Building / Datasets)
README
## Simpson's LLM on XPUs
Welcome to the 'Simpson's LLM XPU' repository, where we finetune a large language model (LLM) on Intel discrete GPUs to generate dialogues based on the 'Simpsons' dataset.
The implementation leverages the original idea and exceptional work done by Replicate for the dataset prep. In case the [Replicate link](https://replicate.com/blog/fine-tune-llama-to-speak-like-homer-simpson) is unavailable, please refer to my forked [version](https://github.com/rahulunair/homerbot_errata) for guidelines on preparing the dataset. The preparation steps are laid out simply in a Jupyter notebook.
After the data is generated, copy the `data.json` file to the `utils` directory in this repo and run `python rename_data_keys.py` to produce `isdata.json`, the dataset we finetune on.
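For reference, the key renaming is a simple JSON transformation. Below is a minimal sketch of what `rename_data_keys.py` might do, assuming a JSON list of records with Replicate-style `prompt`/`completion` keys; both the key names and the file layout are illustrative, not necessarily the script's exact behavior:
```python
import json

# Hypothetical key mapping: adjust to whatever keys your data.json actually uses.
KEY_MAP = {"prompt": "instruction", "completion": "response"}

with open("data.json") as f:
    records = json.load(f)

# Rename the keys in every record, leaving unknown keys untouched.
renamed = [{KEY_MAP.get(k, k): v for k, v in rec.items()} for rec in records]

with open("isdata.json", "w") as f:
    json.dump(renamed, f, indent=2)
```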
### Getting Started
To utilize this code, start by preparing the dataset as suggested in the Replicate blog.
### Finetuning using standard PyTorch Training loop
#### Full model finetuning
```bash
python finetune_no_trainer.py
```
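For orientation, the no-Trainer script follows an ordinary PyTorch training loop, with the model moved to the `xpu` device and optimized via IPEX. Here is a minimal sketch of that style of loop, not the repo's exact code; the base model name, data, and hyperparameters are illustrative:
```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; the repo may use a different base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name).to("xpu")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# ipex.optimize prepares the model/optimizer for bf16 execution on the XPU.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

texts = ["Homer: D'oh!", "Bart: Eat my shorts!"]  # stand-in for isdata.json
batch = tokenizer(texts, return_tensors="pt", padding=True).to("xpu")

model.train()
for step in range(3):  # a real run iterates over a DataLoader for several epochs
    with torch.xpu.amp.autocast(dtype=torch.bfloat16):
        outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {outputs.loss.item():.4f}")
```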
#### Finetuning with LoRA
```bash
python finetune_no_trainer_lora.py
```
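Conceptually, the LoRA variant freezes the base model and trains only small low-rank adapter matrices. A minimal sketch of that setup using the Hugging Face `peft` library follows; the repo's script may configure this differently, and the base model, target modules, and ranks below are illustrative:
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

# Low-rank adapters are injected into the attention projections; only these
# small matrices receive gradients, while the original weights stay frozen.
lora_cfg = LoraConfig(
    r=8,                        # rank of the update matrices
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's attention projection; model dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # shows how small the trainable fraction is
```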
### Finetuning with Trainer API of Transformers
**Note** - For this, you will have to clone a patched version of Transformers from this [repo](https://github.com/rahulunair/transformers_xpu) and install it manually:
```bash
git clone https://github.com/rahulunair/transformers_xpu
cd transformers_xpu
git checkout xpu_trainer
python setup.py install
cd .. && rm -rf transformers_xpu
```
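After installing, a quick sanity check (not part of the repo) can confirm that Transformers imports and that the XPU devices are visible:
```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
import transformers

print("transformers version:", transformers.__version__)
print("xpu available:", torch.xpu.is_available())
print("xpu device count:", torch.xpu.device_count())
```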
#### For a Single XPU Device:
```bash
python finetune.py
```
#### For a Multi-XPU Configuration (Multiple dGPUs) using oneCCL:
Intel oneAPI Collective Communications Library (oneCCL) provides the communication routines needed between devices in distributed systems. These routines are built with a focus on performance and offer efficient inter-node and intra-node communication, making them suitable for multi-node setups with multi-core CPUs and accelerators. We use the PyTorch bindings for oneCCL (`torch_ccl`) for distributed training; prebuilt wheels can be installed from [here](https://github.com/intel/torch-ccl#install-prebuilt-wheel).
As we are using the Hugging Face Trainer* object, we don't have to change the code in any way; we just launch it with `mpirun`.
First, set up the oneCCL environment variables by executing:
```bash
oneccl_path=$(python -c "from oneccl_bindings_for_pytorch import cwd; print(cwd)")
source $oneccl_path/env/setvars.sh
```
Then set these environment variables for MPI:
```bash
export MASTER_ADDR=127.0.0.1
export CCL_ZE_IPC_EXCHANGE=sockets
export FI_PROVIDER=sockets
```
Then, execute the following command to initiate the finetuning process across multiple XPUs:
```bash
mpirun -n 4 python finetune.py # uses 4 Intel Data Center GPU Max 1550
```
![image](https://github.com/rahulunair/simpsons_llm_xpu/assets/786476/93574ca5-3077-4807-99ce-724afd481885)
To debug the oneCCL backend, set this environment variable before executing `mpirun`:
```bash
export CCL_LOG_LEVEL=debug
```
I have also provided a small standalone program in the `utils` directory to check if your setup for distributed communication works correctly. To run it, use:
```bash
mpirun -n 2 python utils/oneccl_test.py  # using 2 processes
```
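The core of such a check is initializing the `ccl` backend and running a collective op. The sketch below shows roughly what that involves under an MPI launch; it is not necessarily the exact contents of `utils/oneccl_test.py`:
```python
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
import oneccl_bindings_for_pytorch          # registers the "ccl" backend

# Rank and world size come from the MPI launcher (Intel MPI sets PMI_* variables).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("PMI_RANK", 0))
world_size = int(os.environ.get("PMI_SIZE", 1))

dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

# Each rank contributes its rank id; after all_reduce every rank sees the sum.
t = torch.ones(1, device=f"xpu:{rank}") * rank
dist.all_reduce(t)
print(f"rank {rank}/{world_size}: all_reduce result = {t.item()}")
dist.destroy_process_group()
```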
### Post Finetuning:
Once the finetuning is complete, you can test the model with the following command:
```bash
python inference.py --infer
```
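Under the hood, inference is just loading the finetuned weights and generating from a prompt. A minimal sketch is below; the checkpoint path, prompt, and generation settings are illustrative, and `inference.py` handles the repo's actual flags:
```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./output"  # illustrative path to the finetuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt).to("xpu").eval()

prompt = "Homer: Marge, I have an idea."
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```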
### Literate Version of Finetuning
To get a better understanding of Low-Rank Adaptation (LoRA) and the finetuning approach, I have added a literate version of the finetune.py file as a Jupyter notebook - literate_finetune.ipynb. This version provides detailed explanations of each step and includes code snippets to give a comprehensive understanding of the finetuning process.
By going through this literate version, I hope you can gain insights into how LoRA works, how it interacts with the training process, and how you can use Intel GPUs for efficient finetuning. This is especially beneficial for practitioners new to language model finetuning, or those looking to gain a deeper understanding of the process.
Happy Finetuning!
\* We use a forked version of Hugging Face Transformers; it can be found [here](https://github.com/rahulunair/transformers_xpu).