{"id":45181793,"url":"https://github.com/foundation-model-stack/fms-acceleration","last_synced_at":"2026-03-09T00:32:10.095Z","repository":{"id":239529795,"uuid":"794804846","full_name":"foundation-model-stack/fms-acceleration","owner":"foundation-model-stack","description":"🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.","archived":false,"fork":false,"pushed_at":"2026-01-30T12:31:34.000Z","size":3389,"stargazers_count":13,"open_issues_count":28,"forks_count":19,"subscribers_count":11,"default_branch":"main","last_synced_at":"2026-01-31T05:35:08.538Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/foundation-model-stack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"code-of-conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-05-02T01:48:53.000Z","updated_at":"2026-01-30T12:31:38.000Z","dependencies_parsed_at":"2024-05-13T06:23:53.462Z","dependency_job_id":"09f83598-524a-47ce-b563-11a46677eff1","html_url":"https://github.com/foundation-model-stack/fms-acceleration","commit_stats":null,"previous_names":["foundation-model-stack/fms-acceleration"],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/foundation-model-stack/fms-acceleration","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffms-acceleration","tags_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/foundation-model-stack%2Ffms-acceleration/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffms-acceleration/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffms-acceleration/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/foundation-model-stack","download_url":"https://codeload.github.com/foundation-model-stack/fms-acceleration/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffms-acceleration/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30278518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-08T20:45:49.896Z","status":"ssl_error","status_checked_at":"2026-03-08T20:45:49.525Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-20T10:00:30.864Z","updated_at":"2026-03-09T00:32:10.069Z","avatar_url":"https://github.com/foundation-model-stack.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n# FMS Acceleration 🚀\n\nFMS Acceleration is designed to accelerate the fine-tuning and training of large models. 
This framework comprises a collection of libraries\nintended to be used with the [fms-hf-tuning](https://github.com/foundation-model-stack/fms-hf-tuning) suite.\n\nThe fms-acceleration framework includes accelerators for Full and Parameter Efficient Fine Tuning (PEFT), including\n\n - Low Rank Adaptation (LoRA) acceleration (coming soon)\n - Bits-and-Bytes (BNB) quantised LoRA : QLoRA acceleration\n - AutoGPTQ quantised LoRA : GPTQ-LoRA acceleration\n - Full Fine Tuning acceleration (coming soon)\n - Padding-Free Attention\n\nOur tests show a significant increase in training token throughput using this fms-acceleration framework.\n\n\nFor example:\n\n- QLoRA: 22-43 % token throughput increase on 1 GPU as compared to using Hugging Face BNB QLoRA\n- QLoRA: Straightforward integration with multiple GPU as compared to using Hugging Face BNB QLoRA\n- GPTQ-LoRA: 22-44 % token throughput increase on 1 GPU as compared to using Hugging Face BNB QLoRA \n- GPTQ-LoRA: Straightforward integration with multiple GPU as compared to using Hugging Face BNB QLoRA\n\n*The above includes numbers using fusedOps-and-kernels and actual impl coming soon, see below*.\n\n**This package is in BETA and is under development. Expect breaking changes!**\n\n## Plugins\n\nPlugin | Description | Depends | License | Status\n--|--|--|--|--\n[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Alpha\n[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. 
| Huggingface\u003cbr\u003eAutoGPTQ | Apache 2.0\u003cbr\u003eMIT | Alpha\n[fused-ops-and-kernels](./plugins/fused-ops-and-kernels/README.md)  | Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | -- | Apache 2.0 [(contains extracted code)](./plugins/fused-ops-and-kernels/README.md#code-extracted-from-unsloth)| Beta\n[attention-and-distributed-packing](./plugins/attention-and-distributed-packing/README.md)  | Padding-Free Flash Attention Computation | flash-attn | Apache 2.0 | Beta\n[accelerated-moe](./plugins/accelerated-moe/README.md)   | Triton Kernels for Mixture-of-Expert parallel, inspired by [ScatterMoe](https://github.com/shawntan/scattermoe) and [MegaBlocks](https://github.com/databricks/megablocks) |  | Apache 2.0 | Beta\n\n## Usage with FMS HF Tuning\n\nBelow we demonstrate how to accelerate your tuning experience with [tuning/sft_trainer.py](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/sft_trainer.py) from `fms-hf-tuning`. \n\n**Note: New exciting [plugins](#plugins) will be added over time, so please check here for the latest accelerations!**\n\n### Integration with FMS HF Tuning\n\n\n`fms-acceleration` is part of `fms-hf-tuning`, and instructions to utilize `fms-acceleration` for tuning are found [here](https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#fms-acceleration). In particular, `fms-acceleration` plugins can be accessed via command line arguments to `fms-hf-tuning` (e.g., `--auto_gptq triton_v2`); this is made available via integrated [configuration dataclasses](https://github.com/foundation-model-stack/fms-hf-tuning/tree/main/tuning/config/acceleration_configs) that configure the `AccelerationFramework` for the user.\n\n#### Need for an alternative way to access features pre-integration\n\nAs new plugins [become available](#plugins), more command line arguments will be made available to `fms-hf-tuning` to enable them. 
However, this kind of integration takes time; plugins that are in development / research stages may not be immediately integrated.\n\nTherefore, an intermediary step is required to access plugins in `fms-acceleration` before they become integrated into `fms-hf-tuning`. In fact, such a method is critical for the benchmarking / testing that needs to happen before integration of any plugin into `fms-hf-tuning` can even be considered. Hence, we provide a method to configure the acceleration framework via a configuration YAML that is passed into `AccelerationFramework` via an environment variable; the instructions for this are provided below. Furthermore, *experienced users* can also leverage this to test plugins early, but be warned that the learning curve to use these plugins is high (since it requires knowledge of how to write such a configuration). To help with this, the following instructions describe both a basic and an advanced flow.\n\n\n### FMS Acceleration Via Configuration YAML\n\n**Note**: As mentioned above, the recommended approach for `fms-hf-tuning` is to use the [acceleration config dataclasses](https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#fms-acceleration). \nThe configuration YAML method documented here is only for *testing/research purposes* and is not recommended for production. 
For general use, please refer instead [to the instructions here](#integration-with-fms-hf-tuning).\n\nBelow we illustrate a configuration YAML flow for accelerated quantised PEFT, using the GPTQ-LoRA tuning with the AutoGPTQ `triton_v2` kernel use case; this kernel is state-of-the-art, [provided by `jeromeku` on Mar 2024](https://github.com/AutoGPTQ/AutoGPTQ/pull/596).\n\nThe configuration YAML flow has both a *basic* and an *advanced* usage.\n\n![Usage Flows](img/fms-accel-flows.png)\n\n#### Basic Configuration YAML Flow 🤡\n\nMost users of `fms-hf-tuning` only require the basic flow:\n- Assumption 1: user has an already prepared configuration, say from [sample-configurations](./sample-configurations/accelerated-peft-autogptq-sample-configuration.yaml).\n- Assumption 2: user knows exactly what acceleration `plugins` are required (based on the configuration).\n- Assumption 3: the arguments for running `sft_trainer.py` are the same, save for one extra argument `--acceleration_framework_config_file` used to pass in the acceleration config.\n\nIn this case, the basic flow comprises 3 steps:\n1. First go to [fms-hf-tuning](https://github.com/foundation-model-stack/fms-hf-tuning) and install the [framework library](./plugins/framework):\n    ```\n    $ pip install -e .[fms-accel]\n    ```\n    or alternatively install the framework directly:\n    ```\n    $ pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/framework\n    ```\n\n    The above installs the command line utility `fms_acceleration.cli`, which is used to install plugins (and for other things, like viewing sample configurations). \n\n3. 
`install` the required framework plugins; we install the `fms-acceleration-peft` plugin for GPTQ-LoRA tuning with triton v2 as:\n    ```\n    python -m fms_acceleration.cli install fms_acceleration_peft\n    ```\n    The above is the equivalent of:\n    ```\n    pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/accelerated-peft\n    ```\n\n5. Run `sft_trainer.py` providing the acceleration configuration (via the environment variable `ACCELERATION_FRAMEWORK_CONFIG_FILE`) and arguments; given the basic flow assumption, we simply re-use the same `sft_trainer.py` arguments as we had without using the `fms_acceleration` package:\n    ```\n    # when using sample-configurations, arguments can be looked up in\n    # defaults.yaml and scenarios.yaml\n    ACCELERATION_FRAMEWORK_CONFIG_FILE=framework.yaml \\\n    python sft_trainer.py \\\n        ...  # arguments\n    ```\n\n    The framework activates relevant plugins given the framework configuration; for more details [see framework/README.md](./plugins/framework/README.md#configuration-of-plugins).\n\n    Set `TRANSFORMERS_VERBOSITY=info` to see the huggingface trainer printouts and verify that `AccelerationFramework` is activated!\n\n    ```\n    # this printout will be seen in huggingface trainer logs if acceleration is activated\n    ***** FMS AccelerationFramework *****\n    Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.0.1.\n    ***** Running training *****\n    Num examples = 1,549\n    Num Epochs = 1\n    Instantaneous batch size per device = 4\n    Total train batch size (w. 
parallel, distributed \u0026 accumulation) = 4\n    Gradient Accumulation steps = 1\n    Total optimization steps = 200\n    Number of trainable parameters = 13,631,488\n    ```\n\n#### Advanced Configuration YAML Flow 🥷 🦹\n\nThe advanced flow makes further use of `fms_acceleration.cli` to: \n* list all available configs and the acceleration plugins the configs depend on. \n* list all available plugins and check which ones are installed.\n* identify critical `sft_trainer` arguments required for correct operation of a particular framework config.\n\nThe advanced flow comprises the following steps:\n1. Same as Step 1 of basic flow.\n2. Use `fms_acceleration.cli configs` to search for sample configs:\n    ```\n    $ python -m fms_acceleration.cli configs\n\n    1. accelerated-peft-autogptq (accelerated-peft-autogptq-sample-configuration.yaml) - plugins: ['accelerated-peft']\n    2. accelerated-peft-bnb (accelerated-peft-bnb-nf4-sample-configuration.yaml) - plugins: ['accelerated-peft']\n    ```\n\n    This is equivalent to searching over the:\n    * [Full sample configuration list](./sample-configurations/CONTENTS.yaml) that shows `plugins` required for all available configs.\n    * E.g., [Accelerated GPTQ-LoRA configuration here](sample-configurations/accelerated-peft-autogptq-sample-configuration.yaml). \n3. `install` plugins same as Step 2 of basic flow, noting that in addition we can use `plugins` to display all available plugins; this list updates [as more plugins get developed](#plugins). Recall that `configs` lists the required `plugins` for the sample configurations; make sure all of them are installed.\n    ```\n    $ python -m fms_acceleration.cli plugins\n\n    Choose from the list of plugin shortnames, and do:\n    * 'python -m fms_acceleration.cli install \u003cpip-install-flags\u003e PLUGIN_NAME'.\n\n    List of PLUGIN_NAME [PLUGIN_SHORTNAME]:\n\n    1. 
fms_acceleration_peft [peft]\n    ```\n    After `install` the list will update to indicate the installed plugins.\n4. Get the correct arguments for `sft_trainer.py`: \n    \n    * arguments required for correct operation (e.g., if using accelerated peft, then `peft_method` is required).\n\n        * Use `arguments` along with the [sample configuration `shortname`](./sample-configurations/CONTENTS.yaml) to display the relevant *critical arguments*; these arguments can also be looked up manually in [scenarios.yaml](./scripts/benchmarks/scenarios.yaml):\n        ```\n        $ python -m fms_acceleration.cli arguments accelerated-peft-autogptq\n\n        Searching for configuration shortnames: ['accelerated-peft-autogptq']\n        1. scenario: accelerated-peft-gptq\n        configs: accelerated-peft-autogptq\n        arguments:\n            --learning_rate 2e-4 \\\n            --fp16 True \\\n            --torch_dtype float16 \\\n            --peft_method lora \\\n            --r 16 \\\n            --lora_alpha 16 \\\n            --lora_dropout 0.0 \\\n            --target_modules ['q_proj', 'k_proj', 'v_proj', 'o_proj']\n        ```\n\n    * More info on `defaults.yaml` and `scenarios.yaml` [found here](./scripts/benchmarks/README.md#benchmark-scenarios).\n        * Arguments *not critical to the plugins* are found in [defaults.yaml](./scripts/benchmarks/defaults.yaml). These can be chosen freely.\n        * Arguments *critical to plugins* are found in [scenarios.yaml](./scripts/benchmarks/scenarios.yaml). The relevant section of [scenarios.yaml](./scripts/benchmarks/scenarios.yaml) is the one whose `framework_config` entries match the `shortname` of the sample configuration of [interest](./sample-configurations/CONTENTS.yaml).\n\n### CUDA Dependencies\n\nThis repo requires CUDA to compute the kernels, and it is convenient to use [NVIDIA PyTorch Containers](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html) that already come with CUDA installed. 
We have tested with the following versions:\n- `pytorch:24.01-py3`\n\n### Benchmarks\n\nThe benchmarks can be reproduced [with the provided scripts](./scripts/benchmarks). These include:\n- baseline benches (e.g., standard fine-tuning, standard peft).\n- benches for various [acceleration sample configs](./sample-configurations/CONTENTS.yaml).\n\nSee the CSV files below for various results:\n- [A100-80GB](./scripts/benchmarks/refs/a100_80gb.csv).\n\n### Code Architecture\n\nFor a deeper dive into the details, see [framework/README.md](./plugins/framework/README.md).\n\n\n## Maintainers\n\nIBM Research\n- Fabian Lim flim@sg.ibm.com\n- Anh Uong anh.uong@ibm.com\n- Will Johnson Will.Johnson@ibm.com\n- Abhishek Maurya maurya.abhishek@ibm.com\n\nPast Contributors\n- Aaron Chew \n- Laura Wynter \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundation-model-stack%2Ffms-acceleration","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoundation-model-stack%2Ffms-acceleration","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundation-model-stack%2Ffms-acceleration/lists"}