https://github.com/stochasticai/xTuring

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

adapter alpaca deep-learning fine-tuning finetuning gen-ai generative-ai gpt-2 gpt-j language-model llama llm lora mistral mixed-precision peft quantization

Last synced: 10 months ago

Host: GitHub
URL: https://github.com/stochasticai/xTuring
Owner: stochasticai
License: apache-2.0
Created: 2023-03-19T15:59:38.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-09-23T09:40:48.000Z (over 1 year ago)
Last Synced: 2024-10-29T15:18:01.868Z (about 1 year ago)
Topics: adapter, alpaca, deep-learning, fine-tuning, finetuning, gen-ai, generative-ai, gpt-2, gpt-j, language-model, llama, llm, lora, mistral, mixed-precision, peft, quantization
Language: Python
Homepage: https://xturing.stochastic.ai
Size: 18.4 MB
Stars: 2,603
Watchers: 33
Forks: 206
Open Issues: 16
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

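StarryDivineSky - stochasticai/xTuring - J, Galactica, etc. By providing an easy-to-use interface for fine-tuning LLMs to your own data and application, xTuring makes it simple to build, customize, and control LLMs. The entire process can be done inside your computer or in your private cloud, ensuring data privacy and security. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)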
awesome - stochasticai/xTuring - Build, personalize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6 (Python)

README

Build, modify, and control your own personalized LLMs

___

`xTuring` provides fast, efficient and simple fine-tuning of open-source LLMs, such as Mistral, LLaMA, GPT-J, and more.

By providing an easy-to-use interface for fine-tuning LLMs to your own data and application, xTuring makes it simple to build, modify, and control LLMs. The entire process can be done inside your computer or in your private cloud, ensuring data privacy and security.

With `xTuring` you can:

- Ingest data from different sources and preprocess them to a format LLMs can understand

- Scale from single to multiple GPUs for faster fine-tuning

- Leverage memory-efficient methods (e.g. INT4, LoRA fine-tuning) to reduce hardware costs by up to 90%

- Explore different fine-tuning methods and benchmark them to find the best performing model

- Evaluate fine-tuned models on well-defined metrics for in-depth analysis
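To give a feel for why LoRA is listed among the memory-efficient methods, here is a minimal sketch of the parameter arithmetic. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two low-rank factors of shapes `d_out x rank` and `rank x d_in` instead. The layer size and rank below are hypothetical values picked for illustration, and the sketch does not reproduce the 90% cost figure above, which depends on the whole training setup.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of parameters in a full weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
fraction = lora / full

print(f"full: {full:,}  LoRA: {lora:,}  trainable fraction: {fraction:.4%}")
```

Fewer trainable parameters means fewer gradients and optimizer states to keep in memory, which is where most of the saving comes from.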




## ⚙️ Installation

```bash
pip install xturing
```




## 🚀 Quickstart

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the dataset
instruction_dataset = InstructionDataset("./examples/models/llama/alpaca_data")

# Initialize the model
model = BaseModel.create("llama_lora")

# Finetune the model
model.finetune(dataset=instruction_dataset)

# Perform inference
output = model.generate(texts=["Why LLM models are becoming so important?"])

print("Generated output by the model: {}".format(output))
```

You can find the data folder [here](examples/models/llama/alpaca_data).




## 🌟 What's new?

We are excited to announce the latest enhancements to our `xTuring` library:

1. __`LLaMA 2` integration__ - You can use and fine-tune the _`LLaMA 2`_ model in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_. Use either the `GenericModel` wrapper or the `Llama2` class from `xturing.models` to test and fine-tune the model.

```python
from xturing.models import Llama2

model = Llama2()

# or
from xturing.models import BaseModel

model = BaseModel.create('llama2')
```

2. __`Evaluation`__ - Now you can evaluate any `Causal Language Model` on any dataset. The only metric currently supported is [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model
model = BaseModel.create('gpt2')

# Run the Evaluation of the model on the dataset
result = model.evaluate(dataset)

# Print the result
print(f"Perplexity of the evaluation: {result}")
```
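For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows that definition in plain Python. It is a generic illustration and not how `model.evaluate` is implemented internally; the helper name `perplexity` and the sample log-probabilities are made up for the example.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over the given tokens.

    `token_log_probs` holds the natural-log probability the model assigned
    to each observed token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Illustrative input: a model that gives every observed token probability 0.25.
log_probs = [math.log(0.25)] * 4
ppl = perplexity(log_probs)
print(f"perplexity: {ppl:.2f}")
```

A lower value means the model assigns higher probability to the evaluation text; a model that always assigns probability 1 to the observed token reaches the minimum of 1.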

3. __`INT4` Precision__ - You can now use and fine-tune any LLM with `INT4 Precision` using `GenericLoraKbitModel`.

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model for INT4 fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Run the fine-tuning
model.finetune(dataset)
```
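To illustrate what storing weights in INT4 means, here is a minimal sketch of symmetric 4-bit quantization: each weight is mapped to one of 16 integer codes in `[-8, 7]` with a shared scale and mapped back when needed. This is a generic textbook scheme under simplifying assumptions (one scale for the whole list, round-to-nearest) and is not claimed to be the scheme `GenericLoraKbitModel` uses. The function names and sample weights are hypothetical.

```python
def quantize_int4(weights):
    """Map floats to integer codes in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Use 7 so the largest magnitude maps to the largest positive code.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes, restored, max_error)
```

The rounding error per weight is at most half the scale, which is the precision traded for the smaller memory footprint.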

4. __CPU inference__ - LLM inference can now run on CPUs, including laptop CPUs. We integrated [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and to accelerate inference by leveraging its highly optimized kernels on Intel platforms.

```python
# Make the necessary imports
from xturing.models import BaseModel

# Initialize the model: quantize it with weight-only algorithms
# and replace the linear layers with Itrex's qbits_linear kernel
model = BaseModel.create("llama2_int8")

# Once the model has been quantized, run inference directly
output = model.generate(texts=["Why LLM models are becoming so important?"])

print(output)
```
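As a back-of-the-envelope illustration of why weight-only quantization conserves memory, the sketch below estimates the storage needed for the weights alone at different precisions. The nominal 7-billion parameter count is an assumption for the example, and the estimate ignores quantization scales, activations and the KV cache, so real usage will be higher.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumed nominal parameter count of a "7B" model.
n_params = 7_000_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(n_params, dtype):.1f} GB")
```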

5. __Batch integration__ - By adjusting `batch_size` in the `.generate()` and `.evaluate()` functions, you can speed up generation and evaluation. A `batch_size` greater than 1 typically improves throughput.

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')

# Load the desired model in INT4 precision
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Generate outputs on desired prompts
outputs = model.generate(dataset=dataset, batch_size=10)
```
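Conceptually, a `batch_size` of 10 means the inputs are processed in groups of up to ten instead of one at a time. The sketch below shows that grouping in plain Python. It is only an illustration of batching; the helper `batched` and the sample prompts are hypothetical and this is not xTuring's internal data loader.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical prompts, chosen only for illustration.
prompts = [f"prompt {i}" for i in range(25)]
batches = list(batched(prompts, 10))

for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} prompts")
```

Larger batches reduce the number of forward passes but need more memory per pass, so the best value depends on the hardware.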

To see INT4 fine-tuning in practice, see the [Llama LoRA INT4 working example](examples/features/int4_finetuning/LLaMA_lora_int4.ipynb).

For a more detailed walkthrough, see the [GenericModel working example](examples/features/generic/generic_model.py) available in the repository.




## CLI playground



```bash
$ xturing chat -m ""
```

## UI playground



```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
from xturing.ui import Playground

dataset = InstructionDataset("./alpaca_data")

model = BaseModel.create("")

model.finetune(dataset=dataset)

model.save("llama_lora_finetuned")

Playground().launch()  # launches localhost UI
```




## 📚 Tutorials

- [Preparing your dataset](examples/datasets/preparing_your_dataset.py)

- [Cerebras-GPT fine-tuning with LoRA and INT8](examples/models/cerebras/cerebras_lora_int8.ipynb)   [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eKq3oF7dnK8KuIfsTE70Gvvniwr1O9D0?usp=sharing)

- [Cerebras-GPT fine-tuning with LoRA](examples/models/cerebras/cerebras_lora.ipynb)   [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1VjqQhstm5pT4EjPjx4Je7b3W2X1V3vDo?usp=sharing)

- [LLaMA fine-tuning with LoRA and INT8](examples/models/llama/llama_lora_int8.py)   [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SQUXq1AMZPSLD4mk3A3swUIc6Y2dclme?usp=sharing)

- [LLaMA fine-tuning with LoRA](examples/models/llama/llama_lora.py)

- [LLaMA fine-tuning](examples/models/llama/llama.py)

- [GPT-J fine-tuning with LoRA and INT8](examples/models/gptj/gptj_lora_int8.py)   [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hB_8s1V9K4IzifmlmN2AovGEJzTB1c7e?usp=sharing)

- [GPT-J fine-tuning with LoRA](examples/models/gptj/gptj_lora.py)

- [GPT-2 fine-tuning with LoRA](examples/models/gpt2/gpt2_lora.py)   [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1Sh-ocNpKn9pS7jv6oBb_Q8DitFyj1avL/view?usp=sharing)




## 📊 Performance

Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model. We use the [Alpaca dataset](examples/models/llama/alpaca_data/) for fine-tuning. The dataset contains 52K instructions.

Hardware:

4xA100 40GB GPU, 335GB CPU RAM

Fine-tuning parameters:

```javascript
{
  'maximum sequence length': 512,
  'batch size': 1,
}
```

| LLaMA-7B | DeepSpeed + CPU Offloading | LoRA + DeepSpeed | LoRA + DeepSpeed + CPU Offloading |
| :---------: | :----: | :----: | :----: |
| GPU memory | 33.5 GB | 23.7 GB | 21.9 GB |
| CPU memory | 190 GB | 10.2 GB | 14.9 GB |
| Time/epoch | 21 hours | 20 mins | 20 mins |

To contribute performance results for other GPUs, create an issue with your hardware specifications, memory consumption and time per epoch.
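To make the comparison in the table easier to read, the sketch below derives the speedup and GPU memory saving of each setup relative to the DeepSpeed + CPU Offloading column. The numbers are copied from the table; the dictionary layout and variable names are only for illustration.

```python
# Figures copied from the performance table above.
results = {
    "DeepSpeed + CPU Offloading": {"gpu_gb": 33.5, "cpu_gb": 190.0, "minutes_per_epoch": 21 * 60},
    "LoRA + DeepSpeed": {"gpu_gb": 23.7, "cpu_gb": 10.2, "minutes_per_epoch": 20},
    "LoRA + DeepSpeed + CPU Offloading": {"gpu_gb": 21.9, "cpu_gb": 14.9, "minutes_per_epoch": 20},
}

baseline = results["DeepSpeed + CPU Offloading"]

speedups = {}
gpu_savings = {}
for name, row in results.items():
    # How many times faster one epoch is compared with the baseline.
    speedups[name] = baseline["minutes_per_epoch"] / row["minutes_per_epoch"]
    # Fraction of GPU memory saved compared with the baseline.
    gpu_savings[name] = 1 - row["gpu_gb"] / baseline["gpu_gb"]
    print(f"{name}: {speedups[name]:.0f}x faster, {gpu_savings[name]:.1%} less GPU memory")
```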




## 📎 Fine-tuned model checkpoints

We have already fine-tuned some models that you can use as your base or start playing with.

Here is how you would load them:

```python
from xturing.models import BaseModel

model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
```

| Model | Dataset | Path |
|---------------------|--------|---------------|
| DistilGPT-2 LoRA | alpaca | `x/distilgpt2_lora_finetuned_alpaca` |
| LLaMA LoRA | alpaca | `x/llama_lora_finetuned_alpaca` |




## Supported Models

Below is a list of all the models supported via the `BaseModel` class of `xTuring` and the corresponding keys used to load them.

| Model | Key |
| -- | -- |
| Bloom | bloom |
| Cerebras | cerebras |
| DistilGPT-2 | distilgpt2 |
| Falcon-7B | falcon |
| Galactica | galactica |
| GPT-J | gptj |
| GPT-2 | gpt2 |
| LLaMA | llama |
| LLaMA 2 | llama2 |
| OPT-1.3B | opt |

The keys above load the base variants of the LLMs. Below are the templates to get their `LoRA`, `INT8`, `INT8 + LoRA` and `INT4 + LoRA` versions; each template is a suffix appended to the model key, as in `llama_lora` or `llama2_int8`.

| Version | Template |
| -- | -- |
| LoRA | _lora |
| INT8 | _int8 |
| INT8 + LoRA | _lora_int8 |
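The way a base key and a version template combine can be sketched as a small helper. The function `model_key` and the two lookup tables are hypothetical and only mirror the tables above; the sketch builds the string and does not check that xTuring actually registers every combination.

```python
# Base keys and version suffixes copied from the tables above.
BASE_KEYS = {
    "bloom", "cerebras", "distilgpt2", "falcon", "galactica",
    "gptj", "gpt2", "llama", "llama2", "opt",
}
VERSION_SUFFIXES = {
    "base": "",
    "lora": "_lora",
    "int8": "_int8",
    "lora_int8": "_lora_int8",
}


def model_key(base: str, version: str = "base") -> str:
    """Build a key such as 'llama_lora' from a base key and a version name."""
    if base not in BASE_KEYS:
        raise ValueError(f"unknown base key: {base!r}")
    if version not in VERSION_SUFFIXES:
        raise ValueError(f"unknown version: {version!r}")
    return base + VERSION_SUFFIXES[version]


print(model_key("llama", "lora"))
print(model_key("llama2", "int8"))
```

The resulting string is what would be passed to `BaseModel.create`, as in the Quickstart call `BaseModel.create("llama_lora")`.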

To load any model's __`INT4+LoRA`__ version, use the `GenericLoraKbitModel` class from `xturing.models`. Below is how to use it:

```python
from xturing.models import GenericLoraKbitModel

model = GenericLoraKbitModel('<model_path>')
```

The `model_path` can be replaced with your local directory or any Hugging Face Hub model such as `facebook/opt-1.3b`.

## 📈 Roadmap

- [x] Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models

- [x] Dataset generation using self-instruction

- [x] Low-precision LoRA fine-tuning and unsupervised fine-tuning

- [x] INT8 low-precision fine-tuning support

- [x] OpenAI, Cohere and AI21 Studio model APIs for dataset generation

- [x] Added fine-tuned checkpoints for some models to the hub

- [x] INT4 LLaMA LoRA fine-tuning demo

- [x] INT4 LLaMA LoRA fine-tuning with INT4 generation

- [x] Support for a `Generic model` wrapper

- [x] Support for `Falcon-7B` model

- [x] INT4 low-precision fine-tuning support

- [x] Evaluation of LLM models

- [ ] INT3, INT2, INT1 low-precision fine-tuning support

- [ ] Support for Stable Diffusion




## 🤝 Help and Support

If you have any questions, you can create an issue on this repository.

You can also join our [Discord server](https://discord.gg/TgHXuSJEk6) and start a discussion in the `#xturing` channel.




## 📝 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.




## 🌎 Contributing

As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our [contributing guide](CONTRIBUTING.md) to learn how you can get involved.
