# RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference

Source code for our EMNLP 2024 paper.

**Authors:** [Yige Xu](https://xuyige.github.io), [Xu Guo](https://guoxuxu.github.io/), [Zhiwei Zeng](https://scholar.google.com/citations?user=6eiLXmcAAAAJ), [Chunyan Miao](https://scholar.google.com/citations?user=fmXGRJgAAAAJ)

[Paper](https://aclanthology.org/2024.emnlp-main.1232/)

---

## Overview

The expansion of Large Language Models (LLMs) has driven breakthroughs in Natural Language Processing (NLP), but it has also raised concerns about <span style="color:blue">inference efficiency</span>, particularly latency, memory usage, and throughput.

<div style="text-align: center;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/minibatch.png" alt="Mini-batch processing with Single-Input Single-Output (SISO)" width=""/>
    <br>
    <figcaption>Figure 1: Mini-Batch Processing with Single-Input Single-Output (SISO)</figcaption>
  </figure>
</div>

<div style="text-align: center;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/datamux-pipeline.png" alt="Multi-Input Multi-Output (MIMO) with data multiplexing and demultiplexing" width=""/>
    <br>
    <figcaption>Figure 2: Multi-Input Multi-Output (MIMO) with data multiplexing and demultiplexing</figcaption>
  </figure>
</div>

Our work addresses the need for <b><span style="color:red">high throughput</span></b> through <b><span style="color:red">data multiplexing</span></b>. It handles batches of concurrent queries while maintaining satisfactory downstream performance. Mini-batch processing feeds each instance through the model separately (Figure 1). Data multiplexing instead mixes several instances into a single input and then demultiplexes the output back into per-instance predictions (Figure 2).

We keep the backbone language model frozen and tune only the adapters. The multiplexer is a reversible adapter that mixes the instances. The demultiplexer performs the reverse operation to reconstruct the individual outputs.

<div style="text-align: center;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/bi-pipeline.png" alt="Overview of RevMUX" width=""/>
    <br>
    <figcaption>Figure 3: Overview of our RevMUX</figcaption>
  </figure>
</div>

<div style="text-align: center;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/bi-invertible.png" alt="Reversible multiplexer and reverse demultiplexer for N = 2" width=""/>
    <br>
    <figcaption>Figure 4: Illustration of the reversible multiplexer and the reverse demultiplexer for N = 2.</figcaption>
  </figure>
</div>

## Quick Start

### Setup and Dependencies

Requirements:

- fastNLP==0.7.0
- torch==2.3.1+cu118
- transformers==4.42.3

### Data Preparation

Download the GLUE datasets (MRPC, QNLI, RTE, SST-2) into a single data directory with the following layout:

```
/path/to/your/data/dir
    |--/MRPC/
        |--/dev.tsv
        |--/test.tsv
        |--/train.tsv
    |--/QNLI/
        |--/dev.tsv
        |--/test.tsv
        |--/train.tsv
    |--/RTE/
        |--/dev.tsv
        |--/test.tsv
        |--/train.tsv
    |--/SST-2/
        |--/dev.tsv
        |--/test.tsv
        |--/train.tsv
```

### Usage

#### T5

```bash
bash run_batch_inference_t5.sh \
    --task_name sst-2 \
    --model_name t5-small \
    --model_type revmux \
    --batch_size 32 \
    --n_epochs 50 \
    --combine_first 3 \
    --compose_size 2 \
    --data_dir /path/to/your/data/dir \
    --adapter_lr 2e-5 \
    --save_dir /path/to/your/save/dir
```

#### BERT

```bash
bash run_batch_inference_bert.sh \
    --task_name sst-2 \
    --model_name bert-base-uncased \
    --model_type revmux \
    --batch_size 32 \
    --n_epochs 50 \
    --combine_first 6 \
    --compose_size 2 \
    --data_dir /path/to/your/data/dir \
    --adapter_lr 2e-5 \
    --save_dir /path/to/your/save/dir
```

#### LLaMA

```bash
bash run_batch_inference_llama.sh \
    --task_name sst-2 \
    --model_name /path/to/your/llama3 \
    --model_type revmux \
    --batch_size 2 \
    --n_epochs 10 \
    --combine_first 16 \
    --compose_size 2 \
    --data_dir /path/to/your/data/dir \
    --adapter_lr 2e-5 \
    --save_dir /path/to/your/save/dir
```

**Arguments**:

- `task_name`: one of `[sst-2, rte, qnli, mrpc]`.
- `model_name`: the backbone language model, one of `[t5-small, t5-base, t5-large, bert-base-uncased]`. For LLaMA, pass the path to your local LLaMA 3 checkpoint, as in the example above.
- `model_type`: `revmux` is our **RevMUX**, `ora` is the **Only Multiplexer Reversible** baseline, and `adapter` is the **Vanilla Adapters** baseline.
- `combine_first`: the number of prefilling layers.
- `compose_size`: the number of instances mixed together.
- `data_dir`: the data directory prepared as described in Data Preparation.

## Citation

```
@inproceedings{xu-etal-2024-revmux,
    title = "{R}ev{MUX}: Data Multiplexing with Reversible Adapters for Efficient {LLM} Batch Inference",
    author = "Xu, Yige  and
      Guo, Xu  and
      Zeng, Zhiwei  and
      Miao, Chunyan",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1232",
    doi = "10.18653/v1/2024.emnlp-main.1232",
    pages = "22072--22087",
}
```
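The sketch below illustrates why a reversible multiplexer lets the individual instances be reconstructed (Figure 4, N = 2). It is a minimal additive coupling block, the RevNet-style construction, written in plain Python. It is an illustration under stated assumptions, not the repository's implementation:

- `make_sublayer`, `reversible_mix` and `reversible_unmix` are hypothetical names.
- The sublayers `F` and `G` stand in for small adapters and use fixed toy weights.
- The step that merges the coupled outputs into a single input for the frozen backbone is not shown.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def make_sublayer(weight: List[List[float]]) -> Sublayer:
    """Hypothetical stand-in for a small adapter: x -> tanh(W x)."""
    def sublayer(x: Vector) -> Vector:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weight]
    return sublayer


def add(a: Vector, b: Vector) -> Vector:
    return [ai + bi for ai, bi in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [ai - bi for ai, bi in zip(a, b)]


def reversible_mix(x1: Vector, x2: Vector, f: Sublayer, g: Sublayer) -> Tuple[Vector, Vector]:
    """Additive coupling of two instances: y1 = x1 + F(x2), then y2 = x2 + G(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def reversible_unmix(y1: Vector, y2: Vector, f: Sublayer, g: Sublayer) -> Tuple[Vector, Vector]:
    """Inverse of reversible_mix, undoing the two steps in reverse order."""
    x2 = sub(y2, g(y1))  # y1 is known, so G(y1) can be recomputed and subtracted
    x1 = sub(y1, f(x2))  # with x2 recovered, F(x2) can be recomputed and subtracted
    return x1, x2


if __name__ == "__main__":
    # Toy weights, chosen arbitrarily for illustration.
    F = make_sublayer([[0.5, -0.2, 0.1], [0.3, 0.8, -0.4], [-0.6, 0.2, 0.9]])
    G = make_sublayer([[0.1, 0.4, -0.7], [-0.3, 0.2, 0.5], [0.9, -0.1, 0.2]])
    x1, x2 = [1.0, 2.0, 3.0], [-1.0, 0.5, 0.0]
    y1, y2 = reversible_mix(x1, x2, F, G)
    r1, r2 = reversible_unmix(y1, y2, F, G)
    # The inputs are recovered up to floating-point rounding.
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(r1 + r2, x1 + x2)))
```

The inverse only ever evaluates `F` and `G` in the forward direction, so the sublayers themselves do not need to be invertible. This property is what makes additive coupling a common choice for reversible networks.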
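Before launching the scripts, it may help to confirm that the data directory matches the layout described under Data Preparation. The helper below is hypothetical: `missing_files` is not part of the repository. It only checks, using `pathlib`, that the expected task folders and `.tsv` files exist. It does not validate their contents.

```python
from pathlib import Path
from typing import List

# Task folders and split files expected by the layout in "Data Preparation".
TASKS = ["MRPC", "QNLI", "RTE", "SST-2"]
SPLITS = ["train.tsv", "dev.tsv", "test.tsv"]


def missing_files(data_dir: str) -> List[str]:
    """Return the expected task/split files that are absent under data_dir."""
    root = Path(data_dir)
    return [
        str(root / task / split)
        for task in TASKS
        for split in SPLITS
        if not (root / task / split).is_file()
    ]
```

For example, `missing_files("/path/to/your/data/dir")` returns an empty list when all twelve files are in place. Otherwise it lists the paths that still need to be downloaded.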