LLM Training Framework
https://github.com/raumberg/myllm
- Host: GitHub
- URL: https://github.com/raumberg/myllm
- Owner: Raumberg
- License: apache-2.0
- Created: 2025-02-03T08:01:21.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-04-21T09:58:06.000Z (6 months ago)
- Last Synced: 2025-04-21T10:51:22.515Z (6 months ago)
- Topics: deep-neural-networks, deepspeed, framework, huggingface, huggingface-transformers, llm, llm-training, python, reinforcement-learning, torch
- Language: Python
- Homepage:
- Size: 1.43 MB
- Stars: 12
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MyLLM
# LLM Framework | Toolkit for various training stages
Initially derived from [Effective LLM Alignment](https://github.com/VikhrModels/effective_llm_alignment/) by VikhrModels.
Many credits go to the Vikhr Team.

## 🚀 [Methods and Stages supported]:
- Supervised Finetuning (Full / LoRA / QLoRA)
- Distillation (KL divergence, MSE, cosine similarity, and others)
- Reinforcement Learning (GRPO, DPO, PPO)
- Adapter merging
- Tokenizer extensions

## 🛠️ [Technical details]:
- Built on top of PyTorch, Transformers, TRL, and PEFT. No 'magic' libraries like Unsloth.
- Distributed training via Accelerate, FSDP, and DeepSpeed (ZeRO Stages 2 and 3).
- Acceleration with vLLM, FlashAttention, Liger kernels, and kernel fusion.
- Logging options: wandb, ClearML.
- Convenient config management using TOML

## 🧠 [Training an LLM]
- Everything is available from the root (MyLLM) folder.
- Launch any desired training script with accelerate:
```bash
# ~/../myllm >
accelerate launch --config_file
# example SFT:
accelerate launch --config_file configs/accelerate/stage3_config.yaml src/train/sft.py configs/train/sft/full-sft-watari.toml
# example GRPO:
accelerate launch src/train/grpo.py configs/train/grpo/rl-grpo-zariman-no-vllm.toml
```
- Example: launching GRPO with vLLM support:
```bash
> CUDA_VISIBLE_DEVICES=1 trl vllm-serve --model --tensor_parallel_size 1 --max_model_len 4096
> CUDA_VISIBLE_DEVICES=0 accelerate launch src/train/grpo.py configs/train/grpo/.toml
```
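The core trick in GRPO is to score each completion against the other completions sampled for the same prompt, normalizing rewards within the group. A simplified, self-contained sketch of that step (an illustration only, not the framework's actual implementation):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of rewards for completions sampled from the same prompt:
advs = group_relative_advantages([1.0, 2.0, 3.0])
```

Completions better than their group's average get a positive advantage, worse ones a negative advantage, so no separate value model is needed.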
> **⚠️ Disclaimer:**
> GRPO scripts can be unstable; work on them is still in progress. If you encounter any errors, please open an issue.

## 📟 [Useful scripts]:
The folder `myllm/src/helpers` contains useful scripts that you can utilize for your models:
- Merge your LoRA adapters with the original model using `adapters.py`:
```bash
cd myllm/src/helpers
python adapters.py merge --source ../../models/attn-signs-watari-32/checkpoint-5500/ --output ../../models/attn-signs-watari-32/watari-32-merged --dtype bf16
```
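Conceptually, merging folds the low-rank update back into the base weights: W' = W + (alpha / r) · B · A. A minimal pure-Python sketch of that arithmetic (illustrative only; the real script operates on saved PEFT checkpoints rather than raw matrices):

```python
def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A for nested-list matrices."""
    scale = alpha / r
    rows, cols = len(W), len(W[0])
    rank = len(A)
    merged = [row[:] for row in W]  # copy the base weights
    for i in range(rows):
        for j in range(cols):
            update = sum(B[i][k] * A[k][j] for k in range(rank))
            merged[i][j] += scale * update
    return merged

# 2x2 base weight with a rank-1 adapter:
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]           # shape r x cols
B = [[0.5], [0.5]]         # shape rows x r
merged = merge_lora(W, A, B, alpha=2.0, r=1)
```

After merging, the adapter weights are no longer needed at inference time, which is why the merged checkpoint can be served as a plain model.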
- Extend the model tokenizer using `tokenizer.py`.

# Latest changes:
- Added `lora-sft-watari-32-stage-n.toml` training configs from [Attention Signs HuggingFace Page](https://huggingface.co/attn-signs/Watari-32b-v0)
- Added a new `[fusion]` TOML group for fused kernels. Example:
```toml
[fusion]
use_liger = true
patch_dyntanh = true # Nightly function, may be unstable
```
- Added new modules: `stdout`, `data_processors`, and `liger`.
- **stdout:** prints your model config, script arguments, and training config as tables. Example:
```
Model Inspection:
+----------------------+----------------------------+
| Config key | Config value |
+======================+============================+
| Model Architecture | Qwen2ForCausalLM |
+----------------------+----------------------------+
| Total Parameters | 0 |
+----------------------+----------------------------+
| Trainable Parameters | 0 |
+----------------------+----------------------------+
| Dtype | torch.bfloat16 |
+----------------------+----------------------------+
| Device | cuda:0 |
+----------------------+----------------------------+
| Tokenizer Vocab Size | 147200 |
+----------------------+----------------------------+
| Model Embedding Size | 0 |
+----------------------+----------------------------+
| Padding Token | <|endoftext|> (ID: 147075) |
+----------------------+----------------------------+
| EOS Token | <|im_end|> (ID: 147077) |
+----------------------+----------------------------+
| Max Sequence Length | 32768 |
+----------------------+----------------------------+
| Architecture | Qwen2ForCausalLM |
+----------------------+----------------------------+
| Hidden Size | 5120 |
+----------------------+----------------------------+
| Attention Heads | 40 |
+----------------------+----------------------------+
```
- **data_processors**: moved all tokenizer processing functions to a separate module. Added support for default processing and history processing.
- **liger**: moved all Liger kernels to a separate module.
- `resume_from` is now a **boolean flag** instead of a string holding a model path. When `resume_from=true`, `model_name_or_path` should point to your local checkpoint.
- Added a `construct_history` **boolean flag** that builds chat history out of the dataset. If `construct_history=false`, the script uses the `default_row_processor` function.

Overall, the training scripts are becoming easier to read and more user-friendly, with the difficult work moved under the hood.
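The difference between the two processing modes can be illustrated with toy row processors (hypothetical function shapes; MyLLM's `data_processors` module defines the real ones):

```python
def default_row_processor(row):
    """Treat the row as a single prompt/answer pair."""
    return [{"role": "user", "content": row["prompt"]},
            {"role": "assistant", "content": row["answer"]}]

def history_row_processor(row):
    """Fold prior (user, assistant) turns in before the final pair."""
    messages = []
    for user_msg, assistant_msg in row.get("history", []):
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages += default_row_processor(row)
    return messages

row = {"prompt": "And now?", "answer": "Done.",
       "history": [("Hi", "Hello!")]}
msgs = history_row_processor(row)  # prior turn plus the final pair
```

With `construct_history=false`, only the final prompt/answer pair is tokenized; with it on, earlier turns become part of the training context.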
# Nightly | Development functions:
- Added a `fusion` module in which native custom CUDA/Triton kernels will be developed.
- Added a fused Dynamic Tanh kernel, a Torch interface, and a patching function.
What is **Dynamic Tanh**?
Dynamic Tanh (DyT) is a recent proposal from Meta that replaces LayerNorm with a learnable element-wise tanh to speed up training and reduce the total parameter count.
DyT is a novel approach and can therefore be unstable here (until we release the final version); it also remains a debated method for now.
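In the paper's formulation, DyT replaces a normalization layer with the element-wise function DyT(x) = γ · tanh(αx) + β, where α, γ, and β are learnable. A scalar sketch of the forward pass in plain Python (the actual kernel operates on tensors):

```python
import math

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic Tanh forward pass for a single activation value."""
    return gamma * math.tanh(alpha * x) + beta

# tanh squashes extreme activations instead of normalizing over a batch,
# so no mean/variance statistics are needed:
small = dyt(0.1)    # roughly linear near zero
large = dyt(100.0)  # saturates near gamma + beta
```

The saturation behavior is what lets DyT mimic normalization's effect on activation scale without computing any batch statistics.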
Based on the arXiv paper: https://www.alphaxiv.org/abs/2503.10622
Based on the authors' code: https://github.com/jiachenzhu/DyT/tree/main

> [!IMPORTANT]
> Thank you for your interest in MyLLM! We look forward to your contributions and feedback! 🚀