{"id":20683905,"url":"https://github.com/tudb-labs/moe-peft","last_synced_at":"2025-04-05T03:02:42.278Z","repository":{"id":253384336,"uuid":"843338604","full_name":"TUDB-Labs/MoE-PEFT","owner":"TUDB-Labs","description":"An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT","archived":false,"fork":false,"pushed_at":"2025-03-11T06:29:17.000Z","size":7530,"stargazers_count":82,"open_issues_count":4,"forks_count":10,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-29T02:08:29.079Z","etag":null,"topics":["mixlora","mlora","peft","peft-fine-tuning-llm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TUDB-Labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-16T09:47:15.000Z","updated_at":"2025-03-28T03:17:41.000Z","dependencies_parsed_at":"2024-09-12T12:30:46.343Z","dependency_job_id":"ef293d75-9f0b-40e6-9d13-3324e3308164","html_url":"https://github.com/TUDB-Labs/MoE-PEFT","commit_stats":{"total_commits":409,"total_committers":10,"mean_commits":40.9,"dds":0.3251833740831296,"last_synced_commit":"2b6955e090b2bb4b93bdb87c69261c0023aa51a9"},"previous_names":["tudb-labs/mole-factory","tudb-labs/moe-peft"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUDB-Labs%2FMoE-PEFT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUDB-Labs%2FMoE-PEFT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUDB-Labs
%2FMoE-PEFT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUDB-Labs%2FMoE-PEFT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TUDB-Labs","download_url":"https://codeload.github.com/TUDB-Labs/MoE-PEFT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247280190,"owners_count":20912967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mixlora","mlora","peft","peft-fine-tuning-llm"],"created_at":"2024-11-16T22:18:21.499Z","updated_at":"2025-04-05T03:02:42.244Z","avatar_url":"https://github.com/TUDB-Labs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MoE-PEFT: An Efficient LLM Fine-Tuning Factory for Mixture of Expert (MoE) Parameter-Efficient 
Fine-Tuning.\n[![](https://github.com/TUDB-Labs/MoE-PEFT/actions/workflows/python-test.yml/badge.svg)](https://github.com/TUDB-Labs/MoE-PEFT/actions/workflows/python-test.yml)\n[![](https://img.shields.io/github/stars/TUDB-Labs/MoE-PEFT?logo=GitHub\u0026style=flat)](https://github.com/TUDB-Labs/MoE-PEFT/stargazers)\n[![](https://img.shields.io/github/v/release/TUDB-Labs/MoE-PEFT?logo=Github)](https://github.com/TUDB-Labs/MoE-PEFT/releases/latest)\n[![](https://img.shields.io/pypi/v/moe_peft?logo=pypi)](https://pypi.org/project/moe_peft/)\n[![](https://img.shields.io/docker/v/mikecovlee/moe_peft?logo=Docker\u0026label=docker)](https://hub.docker.com/r/mikecovlee/moe_peft/tags)\n[![](https://img.shields.io/github/license/TUDB-Labs/MoE-PEFT)](http://www.apache.org/licenses/LICENSE-2.0)\n\nMoE-PEFT is an open-source *LLMOps* framework built on [m-LoRA](https://github.com/TUDB-Labs/mLoRA). It is designed for high-throughput fine-tuning, evaluation, and inference of Large Language Models (LLMs) using techniques that combine MoE with other PEFT methods (such as LoRA and DoRA). 
Key features of MoE-PEFT include:\n\n- Concurrent fine-tuning, evaluation, and inference of multiple adapters with a shared pre-trained model.\n\n- **MoE PEFT** optimization, mainly for [MixLoRA](https://github.com/TUDB-Labs/MixLoRA) and other MoLE implementations.\n\n- Support for multiple PEFT algorithms and various pre-trained models.\n\n- Seamless integration with the [HuggingFace](https://huggingface.co) ecosystem.\n\nYou can try MoE-PEFT with [Google Colab](https://colab.research.google.com/github/TUDB-Labs/MoE-PEFT/blob/main/misc/finetune-demo.ipynb) before local installation.\n\n## Supported Platform\n\n| OS      | Executor | Model Precision        | Quantization  | Flash Attention |\n|---------|---------|------------------------|---------------|-----------------|\n| Linux   | CUDA    | FP32, FP16, TF32, BF16 | 8bit and 4bit | \u0026check;         |\n| Windows | CUDA    | FP32, FP16, TF32, BF16 | 8bit and 4bit | -               |\n| macOS   | MPS     | FP32, FP16, BF16       | \u0026cross;       | \u0026cross;         |\n| All     | CPU     | FP32, FP16, BF16       | \u0026cross;       | \u0026cross;         |\n\nYou can use the `MOE_PEFT_EXECUTOR_TYPE` environment variable to force MoE-PEFT to use a specific executor. 
For example, if you want MoE-PEFT to run only on CPU, you can set `MOE_PEFT_EXECUTOR_TYPE=CPU` before importing `moe_peft`.\n\n## Supported Pre-trained Models\n\n|         | Model                                            | Model Size  |\n|---------|--------------------------------------------------|-------------|\n| \u0026check; | [LLaMA 1/2](https://huggingface.co/meta-llama)   | 7B/13B/70B  |\n| \u0026check; | [LLaMA 3.x](https://huggingface.co/meta-llama)   | 3B/8B/70B   |\n| \u0026check; | [Yi 1/1.5](https://huggingface.co/01-ai)         | 6B/9B/34B   |\n| \u0026check; | [TinyLLaMA](https://huggingface.co/TinyLlama)    | 1.1B        |\n| \u0026check; | [Qwen 1.5/2.x](https://huggingface.co/Qwen)      | 0.5B ~ 72B  |\n| \u0026check; | [Gemma](https://huggingface.co/google)           | 2B/7B       |\n| \u0026check; | [Gemma 2](https://huggingface.co/google)         | 9B/27B      |\n| \u0026check; | [Mistral](https://huggingface.co/mistralai)      | 7B          |\n| \u0026check; | [Phi 1.5/2](https://huggingface.co/microsoft)    | 2.7B        |\n| \u0026check; | [Phi 3.x/4](https://huggingface.co/microsoft)    | 3.8B/7B/14B |\n| \u0026check; | [ChatGLM 1/2/3](https://huggingface.co/THUDM)    | 6B          |\n| \u0026check; | [GLM 4](https://huggingface.co/THUDM)            | 6B          |\n\n\n## Supported PEFT Methods\n\n|         | PEFT Methods                                             | Arguments*                                                |\n|---------|----------------------------------------------------------|-----------------------------------------------------------|\n| \u0026check; | [MoLA](https://arxiv.org/abs/2402.08562)                 | `\"routing_strategy\": \"mola\", \"num_experts\": 8`            |\n| \u0026check; | [LoRAMoE](https://arxiv.org/abs/2312.09979)              | `\"routing_strategy\": \"loramoe\", \"num_experts\": 8`         |\n| \u0026check; | [MixLoRA](https://arxiv.org/abs/2404.15159)              | `\"routing_strategy\": 
\"mixlora\", \"num_experts\": 8`         |\n| \u0026check; | [LoRA](https://arxiv.org/abs/2106.09685)                 | `\"r\": 8, \"lora_alpha\": 16, \"lora_dropout\": 0.05`          |\n| \u0026check; | [QLoRA](https://arxiv.org/abs/2305.14314)                | See *Quantize Methods*                                    |\n| \u0026check; | [LoRA+](https://arxiv.org/abs/2402.12354)                | `\"loraplus_lr_ratio\": 20.0`                               |\n| \u0026check; | [DoRA](https://arxiv.org/abs/2402.09353)                 | `\"use_dora\": true`                                        |\n| \u0026check; | [rsLoRA](https://arxiv.org/abs/2312.03732)               | `\"use_rslora\": true`                                      |\n\n*: Arguments of configuration file\n\n### Notes on PEFT Support\n1. MoE-PEFT provides optimized operators for these PEFT methods that improve computing performance during training, evaluation, and inference. However, these operators may cause some accuracy loss (less than 5%). You can disable the optimized operators by defining the `MOE_PEFT_EVALUATE_MODE` environment variable in advance.\n2. Auxiliary Loss is not currently supported for MoE PEFT methods other than MixLoRA.\n3. 
You can find the detailed arguments of MixLoRA in [TUDB-Labs/MixLoRA](https://github.com/TUDB-Labs/MixLoRA).\n\n## Supported Attention Methods\n\n|         | Attention Methods                                            | Name           | Arguments*               |\n|---------|--------------------------------------------------------------|----------------|--------------------------|\n| \u0026check; | [Scaled Dot Product](https://arxiv.org/abs/1706.03762)       | `\"eager\"`      | `--attn_impl eager`      |\n| \u0026check; | [Flash Attention 2](https://arxiv.org/abs/2307.08691)        | `\"flash_attn\"` | `--attn_impl flash_attn` |\n| \u0026check; | [Sliding Window Attention](https://arxiv.org/abs/2004.05150) | -              | `--sliding_window`       |\n\n*: Arguments of `moe_peft.py`\n\nWithout additional dependencies, MoE-PEFT supports only scaled dot-product attention (eager).\n\nFor flash attention, the following dependencies must be installed manually:\n\n```bash\npip3 install ninja\npip3 install flash-attn==2.5.8 --no-build-isolation\n```\n\nIf no attention method is specified, flash attention is used when available.\n\n## Supported Quantize Methods\n\n|         | Quantize Methods      | Arguments*    |\n|---------|-----------------------|---------------|\n| \u0026check; | Full Precision (FP32) | by default    |\n| \u0026check; | Tensor Float 32       | `--tf32`      |\n| \u0026check; | Half Precision (FP16) | `--fp16`      |\n| \u0026check; | Brain Float 16        | `--bf16`      |\n| \u0026check; | 8bit Quantize         | `--load_8bit` |\n| \u0026check; | 4bit Quantize         | `--load_4bit` |\n\n*: Arguments of `moe_peft.py`\n\nMoE-PEFT supports various model precisions and quantization methods. By default, MoE-PEFT uses full precision (Float32), but users can opt for half precision (Float16) using `--fp16` or BrainFloat16 using `--bf16`. 
Enabling half precision halves the model size, and quantization can reduce it further.\n\nQuantization can be activated using `--load_4bit` for 4-bit quantization or `--load_8bit` for 8-bit quantization. However, when only quantization is enabled, MoE-PEFT uses Float32 for calculations. To save memory during training, users can combine quantization and half-precision modes.\n\nTo enable quantization support, please manually install `bitsandbytes`:\n\n```bash\npip3 install bitsandbytes==0.43.1\n```\n\nNote that regardless of the settings, **LoRA weights are always calculated and stored at full precision**. To maintain calculation accuracy, the MoE-PEFT framework uses full precision for calculations where accuracy is critical.\n\nFor users with NVIDIA Ampere or newer GPU architectures, the `--tf32` option enables accelerated full-precision calculations.\n\n## Offline Configuration\n\nMoE-PEFT relies on **HuggingFace Hub** to download necessary models, datasets, etc. If you cannot access the Internet or need to deploy MoE-PEFT in an offline environment, please refer to the following guide.\n\n1. Use `git-lfs` to manually download models and datasets from [HuggingFace Hub](https://huggingface.co).\n2. Set `--data_path` to the local path of the datasets when executing `launch.py gen`.\n3. Clone the [evaluate](https://github.com/huggingface/evaluate) code repository locally.\n4. Set the `MOE_PEFT_METRIC_PATH` environment variable to the local path of the `metrics` folder in the evaluate code repository.\n5. 
Set `--base_model` to the local path of the models when executing `launch.py run`.\n\nExample of (4): `export MOE_PEFT_METRIC_PATH=/path-to-your-git-repo/evaluate/metrics`\n\n## Known issues\n\n + Quantization with Qwen2 has no effect (same as with transformers).\n + Applying quantization with DoRA results in higher memory and computation costs (same as with PEFT).\n + Sliding window attention with generate cache may produce abnormal output.\n\n## Installation\n\nPlease refer to [MoE-PEFT Install Guide](./Install.md).\n\n## Quickstart\n\nYou can use MoE-PEFT via `launch.py`. The following example demonstrates a streamlined approach to training a dummy model with MoE-PEFT.\n\n```bash\n# Generating configuration\npython launch.py gen --template lora --tasks ./tests/dummy_data.json\n\n# Running the training task\npython launch.py run --base_model TinyLlama/TinyLlama_v1.1\n\n# Try with gradio web ui\npython inference.py \\\n  --base_model TinyLlama/TinyLlama_v1.1 \\\n  --template alpaca \\\n  --lora_weights ./casual_0\n```\n\nFor further detailed usage information, please refer to the `help` command:\n\n```bash\npython launch.py help\n```\n\n## MoE-PEFT\n\nThe `moe_peft.py` code is a starting point for finetuning on various datasets.\n\nBasic command for finetuning a baseline model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:\n```bash\n# Generating configuration\npython launch.py gen \\\n  --template lora \\\n  --tasks yahma/alpaca-cleaned\n\npython moe_peft.py \\\n  --base_model meta-llama/Llama-2-7b-hf \\\n  --config moe_peft.json \\\n  --bf16\n```\n\nYou can check the template finetune configuration in the [templates](./templates/) folder.\n\nFor further detailed usage information, please use the `--help` option:\n```bash\npython moe_peft.py --help\n```\n\n## Use Docker\n\nFirst, ensure that you have installed Docker Engine and NVIDIA Container Toolkit correctly.\n\nAfter that, you can launch the container using the following 
typical command:\n\n```\ndocker run --gpus all -it --rm mikecovlee/moe_peft\n```\n\nYou can find all available tags at: [mikecovlee/moe_peft/tags](https://hub.docker.com/r/mikecovlee/moe_peft/tags)\n\nPlease note that this container only provides a proper environment to run MoE-PEFT. The MoE-PEFT source code is not included.\n\n## Copyright\n\nThis project is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftudb-labs%2Fmoe-peft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftudb-labs%2Fmoe-peft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftudb-labs%2Fmoe-peft/lists"}