{"id":15634544,"url":"https://github.com/ericlbuehler/xlora","last_synced_at":"2025-04-09T18:17:54.375Z","repository":{"id":221825568,"uuid":"735740004","full_name":"EricLBuehler/xlora","owner":"EricLBuehler","description":"X-LoRA: Mixture of LoRA Experts","archived":false,"fork":false,"pushed_at":"2024-08-04T00:28:01.000Z","size":27607,"stargazers_count":216,"open_issues_count":10,"forks_count":12,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-09T18:17:50.101Z","etag":null,"topics":["llm","lora","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EricLBuehler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-26T01:41:30.000Z","updated_at":"2025-04-06T12:31:32.000Z","dependencies_parsed_at":"2024-04-11T03:27:22.318Z","dependency_job_id":"68e48310-ae29-415f-b866-2104b8d7cec9","html_url":"https://github.com/EricLBuehler/xlora","commit_stats":null,"previous_names":["ericlbuehler/xlora"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fxlora","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fxlora/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fxlora/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fxlora/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EricLBuehl
er","download_url":"https://codeload.github.com/EricLBuehler/xlora/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248085328,"owners_count":21045139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","lora","python","pytorch"],"created_at":"2024-10-03T10:53:56.842Z","updated_at":"2025-04-09T18:17:54.351Z","avatar_url":"https://github.com/EricLBuehler.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# X-LoRA\nMixture of LoRA Experts: Leverage the power of fine-tuned LoRA experts by employing a mixture-of-experts (MoE) technique.\n\nX-LoRA works by learning scaling values for LoRA adapters. These learned scaling values are used to\ngate the LoRA experts in a dense fashion. Additionally, all LoRA adapters and the base model are frozen, allowing efficient fine-tuning due to a low trainable parameter count.\n\nX-LoRA is easily applied to any Hugging Face Transformers model. 
Please see our weights [here](https://huggingface.co/lamm-mit/x-lora) and our [paper](https://arxiv.org/abs/2402.07148).\n\n### Token-by-token scalings\n![Token-by-token scalings](./res/token_by_token_scalings.gif)\n\n## Advantages and features\n- Effective: Dense gating of experts allows effective mixing\n- Efficient fine-tuning: Low trainable parameter count\n- Hierarchical encapsulated strategy: Re-use existing trained models or model sections to address complex tasks that cut across experts, following a bio-inspired strategy\n- Easy-to-use API: `add_xlora_to_model`, broad compatibility\n- Dynamically mix LoRA adapters: Deep layer-wise combinations of adapters\n\n### Architecture\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"./res/general_arch_v5.png\" alt=\"General Architecture\" width=75%/\u003e\n\u003c/p\u003e\n\nSee the [examples](examples) folder for examples of how to get started with X-LoRA.\n\n## Efficient Inference Support\n[Mistral.rs](https://github.com/EricLBuehler/mistral.rs) is an inference framework that supports X-LoRA. To use it, follow the installation instructions and run the following command to start an X-LoRA inference server.\n\n`./mistralrs-server --port 1234 x-lora-mistral -o ordering.json`\n\nTo use your own models, specify the base and X-LoRA Hugging Face model IDs through command-line switches. 
Please see the Github page for further details.\n\n## Installation\nPending a pip release, run the following command to install X-LoRA.\n\n`pip install git+https://github.com/EricLBuehler/xlora.git`\n\n## Examples\nExcerpt from [this](./examples/simple.ipynb) example.\n\n- [Converting a model](README.md#converting-a-model)\n- [Loading a trained X-LoRA model from scratch](README.md#loading-a-trained-x-lora-model-from-scratch)\n- [Loading a trained X-LoRA model with a convenience function](README.md#loading-a-trained-x-lora-model-with-a-convenience-function)\n- [Scalings logging](README.md#scalings-logging)\n- [Trainable parameters](README.md#trainable-parameters)\n- [Setting trainability of adapters dynamically](README.md#setting-trainability-of-adapters-dynamically)\n- [Setting and resetting the scaling pass value](README.md#setting-and-resetting-the-scaling-pass-value)\n- [Setting and getting the global LoRA weight](README.md#setting-and-getting-the-global-lora-weight)\n- [Setting and getting the top-k lora value](README.md#setting-and-getting-the-top-k-lora-value)\n\n### Converting a model\n```python\nimport torch\nimport xlora\nfrom transformers import AutoConfig, AutoModelForCausalLM # type: ignore\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"mistralai/Mistral-7B-Instruct-v0.1\",\n    trust_remote_code=True,\n    use_flash_attention_2=False,\n    device_map=\"cuda:0\",\n    torch_dtype=torch.bfloat16,\n)\n\nconfig = AutoConfig.from_pretrained(\n    \"mistralai/Mistral-7B-Instruct-v0.1\",\n    trust_remote_code=True,\n    use_flash_attention_2=False,\n    device_map=\"auto\",\n)\n\n### Convert the model to X-LoRA\nmodel_created = xlora.add_xlora_to_model(\n    model=model,\n    xlora_config=xlora.xLoRAConfig(\n        config.hidden_size,\n        base_model_id=\"mistralai/Mistral-7B-Instruct-v0.1\",\n        xlora_depth=8,\n        device=torch.device(\"cuda\"),\n        adapters={\n            \"adapter_1\": \"./path/to/the/checkpoint/\",\n           
 \"adapter_2\": \"./path/to/the/checkpoint/\",\n            \"adapter_n\": \"./path/to/the/checkpoint/\",\n        },\n    ),\n    verbose=True,\n)\n```\n### Loading a trained X-LoRA model from scratch\n```python\nimport torch\nimport xlora\nfrom transformers import AutoConfig, AutoModelForCausalLM # type: ignore\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"mistralai/Mistral-7B-Instruct-v0.1\",\n    trust_remote_code=True,\n    use_flash_attention_2=False,\n    device_map=\"cuda:0\",\n    torch_dtype=torch.bfloat16,\n)\n\nconfig = AutoConfig.from_pretrained(\n    \"mistralai/Mistral-7B-Instruct-v0.1\",\n    trust_remote_code=True,\n    use_flash_attention_2=False,\n    device_map=\"auto\",\n)\n\nmodel_created = xlora.from_pretrained(\n    \"./path/to/saved/model\",\n    model,\n    \"cuda\",\n)\n```\n\n### Loading a trained X-LoRA model with a convenience function\n```python\nimport torch\nfrom xlora.xlora_utils import load_model  # type: ignore\n\nXLoRA_model_name = \"myuser/repo\"\n\nmodel_loaded, tokenizer = load_model(\n    model_name=XLoRA_model_name,\n    device=\"cuda:0\",\n    dtype=torch.bfloat16,\n)\n```\n\n### Scalings logging\n```python\n# Enable scalings logging and begin a log\nmodel_created.enable_scalings_logging()\n\n# Run forward passes to accumulate a log\n\n# Write the log to a file, or multiple.\nmodel_created.flush_log_scalings(\"./path/to/output/file\")\n\n# Get a shallow copy of the scalings\nlog_copy = model_created.get_scalings_log()\n\n# Disable scalings logging\nmodel_created.disable_scalings_logging()\n\n# Clear the scalings log\nmodel_created.clear_scalings_log()\n\n# Get the latest scalings prediction\nscalings_pred = model_created.get_latest_scalings()\n\n# Load the scalings log from a file, or multiple automatically.\nloaded_log = xlora.xlora_utils.load_scalings_log(\"./path/to/output/file\", verbose=True)\n```\n\n### Trainable parameters\n```python\nmodel: xLoRAModel = ... 
# Load the model\n\nnum_trainable, num_all_params = model.get_nb_trainable_parameters()\n\nmodel.print_trainable_parameters()\n```\n\n### Setting trainability of adapters dynamically\n```python\nmodel: xLoRAModel = ... # Load the model\n\n# Use trainable adapters: mark all adapters as trainable\nmodel.set_use_trainable_adapters(True)\n\n# Get the current status of the trainable adapters, in this case returning True\nmodel.get_use_trainable_adapters()\n```\n\n### Setting and resetting the scaling pass value\n```python\nmodel: xLoRAModel = ... # Load the model\n\n# Set the scaling pass value to 0, meaning that no adapters will contribute to the scaling pass output\nmodel.set_scaling_pass_value(0)\n\n# Allow the model to use the default scaling pass value\nmodel.set_scaling_pass_value(None)\n```\n\n### Setting and getting the global LoRA weight\n```python\nmodel: xLoRAModel = ... # Load the model\n\n# Multiply the output of each LoRA adapter by 2, in addition to the scalings.\nmodel.set_global_scaling_weight(2)\n\n# Returns 2\nres = model.get_global_scaling_weight()\n```\n\n### Setting and getting the top-k lora value\n```python\n# Use the top 2 LoRA experts\nmodel_created.set_topk_lora(2)\n\n# Returns 2\nres = model_created.get_topk_lora()\n```\n\n## API\nThe X-LoRA API is composed of three parts: the \"Global API\", the \"Model API\" and the \"Utility API\". 
Generally, the Global API is used to create X-LoRA models, the Model API is used to interface with them, and the Utility API provides helper functions.\n\n- [Global API](README.md#global-api): `xlora.*`\n  - `xlora.add_xlora_to_model`\n  - `xlora.from_pretrained`\n- [Utility API](README.md#utility-api): `xlora.xlora_utils.*`\n  - `xlora.xlora_utils.load_scalings_log`\n  - `xlora.xlora_utils.load_model`\n- [Model API](README.md#model-api): `xLoraModel.*`\n  - [Scalings](README.md#scalings)\n    - `xLoraModel.disable_scalings_logging`\n    - `xLoraModel.clear_scalings_log`\n    - `xLoraModel.enable_scalings_logging`\n    - `xLoraModel.flush_log_scalings`\n    - `xLoraModel.get_scalings_log`\n    - `xLoraModel.set_scaling_pass_value`\n    - `xLoraModel.get_latest_scalings`\n    - `xLoraModel.set_global_scaling_weight`\n    - `xLoraModel.get_global_scaling_weight`\n  - [Trainable parameters](README.md#trainable-parameters-1)\n    - `xLoraModel.get_nb_trainable_parameters`\n    - `xLoraModel.print_trainable_parameters`\n  - [Trainable adapters](README.md#setting-the-trainable-adapters)\n    - `xLoraModel.set_use_trainable_adapters`\n    - `xLoraModel.get_use_trainable_adapters`\n  - [Top-k](README.md#top-k)\n    - `xLoraModel.set_topk_lora`\n    - `xLoraModel.get_topk_lora`\n\n### X-LoRA Config\nThe X-LoRA Config saves the full configuration of an X-LoRA model.\n```python\nArgs:\n    hidden_size (`int`):\n        Hidden size of the base model.\n    device (`torch.device`):\n        Device for the X-LoRA classifier.\n    enable_softmax (`bool`, *optional*, defaults to `True`):\n        Enable softmax application for the X-LoRA classifier.\n    enable_softmax_topk (`bool`, *optional*, defaults to `False`):\n        Enable softmax application for the top-k LoRA adapters. 
Mutually exclusive with `enable_softmax`; may only be set if `top_k_lora` is set.\n    softmax_temperature (`float`, *optional*, defaults to 1.0):\n        Softmax temperature; lower values yield sharper predictions.\n    layerwise_scalings (`bool`, *optional*, defaults to `False`):\n        Generate scalings for each layer.\n    top_k_lora (`int`, *optional*, defaults to None):\n        Sparsely select the top_k LoRA experts instead of the default dense method.\n    xlora_depth (`int`, *optional*, defaults to 1):\n        Depth of the X-LoRA classifier.\n    xlora_size (`int`, *optional*, defaults to 2048):\n        Hidden size of the X-LoRA classifier, irrelevant if `xlora_depth=1`.\n    enable_relu_and_dropout (`bool`, *optional*, defaults to `True`):\n        Enable ReLU activation and Dropout application of the X-LoRA classifier.\n    use_bias (`bool`, *optional*, defaults to `True`):\n        Enable bias in X-LoRA classifier.\n    xlora_dropout_p (`float`, *optional*, defaults to 0.2):\n        Dropout probability of the X-LoRA classifier, irrelevant if `xlora_depth=1` or `enable_relu_and_dropout=False`.\n    stop_token_id (`int`, *optional*):\n        The id of the stop token for the input. 
If this is None, the sequence length is calculated using the attention mask.\n    use_trainable_adapters (`bool`, *optional*, defaults to False):\n        Make the adapters trainable.\n    scaling_pass_value (`float`, *optional*, defaults to 0):\n        Scaling pass value.\n    global_scaling_weight (`float`, *optional*, defaults to 1):\n        Weight to multiply output of each LoRA adapter by.\n```\n\n### Global API\n- `xlora.add_xlora_to_model(model: PreTrainedModel, xlora_config: xLoRAConfig, adapters: Dict[str, str], verbose: bool) -\u003e xLoraModel`\n  - Convert a model to an xLoraModel, instantiating the classifier and adapters.\n- `xlora.from_pretrained(load_directory: str, model: PreTrainedModel, adapters: Optional[Dict[str, str]] = None, verbose: bool, device: str, from_safetensors: bool = True) -\u003e xLoraModel`\n  - Load the X-LoRA classifier and adapters from the specified local path or Hugging Face model ID. This should be called after an X-LoRA classifier has been trained.\n\n### Utility API\n- `xlora.xlora_utils.load_scalings_log(path: str, verbose: bool = False) -\u003e List[torch.Tensor]`\n  - Load the scalings log, handling both formats written by `flush_log_scalings` (a single file or multiple files).\n- `xlora.xlora_utils.load_model(model_name: str, device: str, dtype: torch.dtype, adapters: Dict[str, str], use_flash_attention_2: bool = False, load_xlora: bool = True, verbose: bool = False) -\u003e Tuple[Union[AutoModelForCausalLM, xLoRAModel], Union[PreTrainedTokenizer, PreTrainedTokenizerFast]]`\n  - Convenience function to load a model with the specified adapters, as in the X-LoRA config, converting it to X-LoRA if specified. 
`model_name` can be a Hugging Face model ID, in which case all necessary weights are downloaded automatically.\n\n### Model API\n#### Scalings\n- `xLoraModel.disable_scalings_logging()`\n  - Disable scalings logging, without clearing the log.\n- `xLoraModel.clear_scalings_log()`\n  - Clear the scalings log.\n- `xLoraModel.enable_scalings_logging()`\n  - Enable scalings logging. Each time a forward pass occurs, the predicted scalings will be logged.\n- `xLoraModel.flush_log_scalings(path: str)`\n  - Write the scalings log (a tensor of shape (num_logged, batch_size, seq_len, n_layers, n_classes)) to the specified path.\n    If the tensor cannot be constructed, multiple files are written containing tensors of shape\n    (num_logged, batch_size, seq_len, n_layers, n_classes) such that each file contains one sequence length. Additionally, a JSON\n    file is written containing the mapping from each sequence log file to the index of the contained tensor so that one may reconstruct\n    the log order.\n    The file specified should not contain an extension.\n- `xLoraModel.get_scalings_log(self) -\u003e List[Tensor]`\n  - Returns a shallow copy (copying only the list itself, not the tensors) of the list containing the scalings log. Editing the list does not change the underlying log.\n    The tensors are of shape (batch_size, seq_len, n_layers, n_classes). The seq_len dim may vary with input dimension.\n- `xLoraModel.set_scaling_pass_value(self, value: Union[Number, None])`\n  - Manually set the scalings to a specific value during the scaling pass. The value persists until it is changed. Call this function with None to enable the default scalings. This is reflected in the config.\n- `xLoraModel.get_latest_scalings(self) -\u003e Optional[Tensor]`\n  - Returns the latest scalings prediction, or None if no scalings have been predicted. 
The tensor is of shape (batch_size, seq_len, n_layers, n_classes).\n- `xLoraModel.set_global_scaling_weight(self, weight: float)`\n  - Set the global LoRA weight, a scalar to multiply the output of each LoRA adapter by. The default is 1. This is reflected in the config.\n- `xLoraModel.get_global_scaling_weight(self) -\u003e float`\n  - Get the global LoRA weight.\n#### Trainable parameters\n- `xLoraModel.get_nb_trainable_parameters() -\u003e Tuple[int, int]`\n  - Return a tuple `(num_trainable, num_all_params)`\n- `xLoraModel.print_trainable_parameters()`\n  - Print the trainable and non-trainable parameters for the given model, including the X-LoRA components.\n#### Setting the trainable adapters\n- `xLoraModel.set_use_trainable_adapters(use_trainable_adapters: bool)`\n  - Set the trainability of the adapters. This is reflected in the config.\n- `xLoraModel.get_use_trainable_adapters(self) -\u003e bool`\n  - Get whether the adapters are currently trainable.\n#### Top-k\n- `xLoraModel.set_topk_lora(self, value: Optional[int])`\n  - Sparsely select the specified top_k LoRA experts instead of the default dense method. Set to None to use dense. This is reflected in the config.\n- `xLoraModel.get_topk_lora(self) -\u003e Optional[int]`\n  - Get the current top_k LoRA experts value.\n\n## Original paper and citation\n\nCite this work as:\n```bibtex\n@article{Buehler_XLoRA_2024,\n    title   = {X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design},\n    author  = {E.L. Buehler and M.J. 
Buehler},\n    journal = {},\n    year    = {2024},\n    volume  = {},\n    pages   = {},\n    url     = {https://arxiv.org/abs/2402.07148}\n}\n```\n\n## Contributing\nPlease run `make style` before submitting a PR.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericlbuehler%2Fxlora","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericlbuehler%2Fxlora","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericlbuehler%2Fxlora/lists"}