# Distily

#### In one command, distill an existing LLM into a smaller or different architecture.

## Install

```
pip install -U "git+https://github.com/lapp0/distily.git"
```

## Features
Distily allows you to distill a model with:
- Quantized weights: e.g., TriLM, bitnet
- A distinct architecture: state-space models such as Mamba, or Mixture-of-Experts (MoE)
- A modified architecture (example config below): decrease (or increase) the
  - number of layers
  - width and depth of the attention heads and dense layers
  - number of attention and KV heads
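
For illustration, a modified-architecture student config might look like the following Python dict, serialized to JSON for the `--student_model_config` flag shown in the Usage section below. Only `n_layers` is taken from the minimal example; the remaining keys are standard `transformers` config names and are assumptions about what Distily forwards to the student config.

```python
import json

# Hypothetical student config: "n_layers" comes from the minimal example below;
# the other keys are standard transformers config names used here as assumptions.
student_config = {
    "n_layers": 6,                # half of gpt2's 12 layers
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "intermediate_size": 2048,
    "hidden_act": "gelu_new",
}

# Serialize for use as: --student_model_config '<json>'
print(json.dumps(student_config))
```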

## Usage

**Minimal Example: `distily_gpt2`**

Command to create a distilled `gpt2` with only 6 layers:
```
python3 -m distily.run \
    --teacher_model_name_or_path gpt2 \
    --output_dir distily_gpt2 \
    --hub_model_id "distily/distily_gpt2" \
    --push_to_hub True \
    --student_model_config '{"n_layers": 6}' \
    --student_model_as_bitnet True
```

The [Resulting `distily_gpt2` Model](https://huggingface.co/distily/distily_gpt2) has (TODO: explain metrics).
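
Once pushed, the distilled student should load like any other `transformers` checkpoint. A minimal sketch, assuming `distily/distily_gpt2` is a standard causal-LM checkpoint on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes distily/distily_gpt2 is a standard causal-LM checkpoint.
tokenizer = AutoTokenizer.from_pretrained("distily/distily_gpt2")
model = AutoModelForCausalLM.from_pretrained("distily/distily_gpt2")

inputs = tokenizer("Distillation lets a small model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```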

For more examples, review the [Examples](./docs/examples.md) documentation.

#### Note on Hub Credentials
To push to the Hub, you must first save your Hugging Face write token:
```
HF_WRITE=<your_hf_write_token>
python3 -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token('${HF_WRITE}')"
```
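
Alternatively, the standard `huggingface_hub` login utility works as well; a minimal sketch:

```python
from huggingface_hub import login

# Prompts for (or accepts) a write-enabled token and stores it locally.
login()  # or: login(token="hf_...")
```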

## Further Reading

TODO: commit the linked docs once complete

**Using Distily**
- How Distillation Works: [The Distily Recipe](./docs/recipe.md)
- [Quickstart / Examples](./docs/using.md)
- [Parameter Selection](./docs/params.md)

**Available Models**
- [Official Distily Models](./docs/official_models.md)
- [All HF Models Created With Distily](https://huggingface.co/models?library=Distily)

**Contributing**
- [Contributing Guidelines](./docs/contributing.md)

## Roadmap

#### Improved performance / sampling efficiency:
- [x] Standard knowledge distillation using logits (see the sketch after this list).
- [x] Distill using intermediate features, including hidden states and attentions.
- [ ] Implement [Value-Transfer](https://arxiv.org/pdf/2002.10957) (distillation loss on the v of q, k, v).
- [ ] Improve sampling efficiency through synthetic data generation.
- [ ] Implement cross-entropy classification loss (the traditional LLM loss function).
- [ ] Apply a [projector to logits](https://arxiv.org/pdf/2310.17183).
- [ ] Apply "teacher recording": run teacher inference once, then reuse the recorded feature dataset any number of times.

#### Distill to a different model shape / size:
- [x] Distill to model with fewer `num_hidden_layers` by implementing layer mappers.
- [x] Distill to a model with modified module dimensions and behaviors (e.g., `intermediate_size`, `hidden_act`) by employing projectors (a minimal projector sketch follows this list).
- [x] Distill to a model with modified `num_attention_heads` and `num_key_value_heads`.
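
When the student's hidden size differs from the teacher's, a common approach is a learned linear projector that maps student features into the teacher's feature space before computing a feature loss. A minimal sketch, assuming a mean-squared-error feature loss; names and loss choice are assumptions, not Distily's actual projector implementation.

```python
import torch
import torch.nn as nn

class HiddenStateProjector(nn.Module):
    """Maps student hidden states into the teacher's hidden dimension.

    Illustrative sketch only.
    """

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden, teacher_hidden):
        # Project student features, then penalize distance to the teacher's features.
        return nn.functional.mse_loss(self.proj(student_hidden), teacher_hidden)
```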

#### Distill to a different architecture:
- [x] Distill to Bitnet (b1.58)
- [ ] Distill to State-Space / Mamba
- [ ] Distill to MoE
- [ ] Distill to Parameter Sharing (ALBERT-style) Model

#### Additional Techniques:
- [ ] [Distill from multiple models at once](https://arxiv.org/pdf/2106.01023)
- [ ] [Pruning](https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/)