https://github.com/xuyige/revmux
Source code for EMNLP 2024 paper: RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference
- Host: GitHub
- URL: https://github.com/xuyige/revmux
- Owner: xuyige
- Created: 2024-10-06T15:23:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-02T08:26:15.000Z (over 1 year ago)
- Last Synced: 2025-07-13T19:38:16.558Z (12 months ago)
- Topics: efficient-inference, large-language-models, natural-language-processing
- Language: Python
- Size: 297 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference
**Authors:** [Yige Xu](https://xuyige.github.io), [Xu Guo](https://guoxuxu.github.io/), [Zhiwei Zeng](https://scholar.google.com/citations?user=6eiLXmcAAAAJ), [Chunyan Miao](https://scholar.google.com/citations?user=fmXGRJgAAAAJ)
[Paper](https://aclanthology.org/2024.emnlp-main.1232/)
---
## Overview
The expansion of Large Language Models (LLMs) has driven breakthroughs in Natural Language Processing (NLP) but has raised concerns about inference efficiency, particularly latency, memory usage, and throughput.
Figure 1: Mini-Batch Processing with Single-Input Single-Output (SISO)
Figure 2: Multi-Input Multi-Output (MIMO) with data multiplexing and demultiplexing
Our work addresses the need for high throughput through data multiplexing, which handles batches of concurrent queries while maintaining satisfactory downstream performance.
We keep the backbone language model frozen and tune only the adapters. We then design a reversible adapter that mixes the instances and performs a reverse operation to reconstruct the individual outputs.
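To illustrate why a "reverse operation" can recover the individual instances exactly, here is a minimal sketch of additive coupling in the style of reversible networks (RevNet) for N=2 instances. The functions `f` and `g` are hypothetical stand-ins for learned adapter sub-networks, and the code works on plain Python lists. It is not the RevMUX implementation, which operates on hidden states inside the frozen backbone; how the mixed representation is formed and where the adapters sit are described in the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def f(v: Vector) -> Vector:
    # Hypothetical stand-in for a learned sub-network. It never has to be
    # inverted itself: the coupling below is invertible for any f and g.
    return [math.tanh(0.5 * x) for x in v]


def g(v: Vector) -> Vector:
    # Second hypothetical stand-in sub-network.
    return [0.3 * x * x for x in v]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def couple(x1: Vector, x2: Vector) -> Tuple[Vector, Vector]:
    """Forward additive coupling: each output carries information from both inputs."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def uncouple(y1: Vector, y2: Vector) -> Tuple[Vector, Vector]:
    """Inverse of `couple`: recover x2 first, then x1, by re-running f and g forward."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


x1 = [0.2, -1.0, 0.7]
x2 = [1.5, 0.1, -0.4]
y1, y2 = couple(x1, x2)
r1, r2 = uncouple(y1, y2)
# True: the originals are recovered up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(r1 + r2, x1 + x2)))
```

The design choice worth noting is that `uncouple` only evaluates `f` and `g` in the forward direction, so the sub-networks can be arbitrary (non-invertible) layers while the overall mapping stays exactly reversible.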
Figure 3: Overview of Our RevMUX
Figure 4: Illustration of the reversible multiplexer and reverse demultiplexer when N=2.
## Quick Start
### Setup and Dependencies
Requirements:
- fastNLP==0.7.0
- torch==2.3.1+cu118
- transformers==4.42.3
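Because the pinned versions are exact, a quick check before training can catch version drift. The helper below is a hypothetical convenience script, not part of the repository; it compares installed versions against the pins above using the standard-library `importlib.metadata`.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Pinned versions copied from the requirements list above.
PINNED: Dict[str, str] = {
    "fastNLP": "0.7.0",
    "torch": "2.3.1+cu118",
    "transformers": "4.42.3",
}


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return one human-readable line per package that is missing or differs."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in find_mismatches(PINNED):
        print(line)
```

Note that the `+cu118` suffix is part of the torch version string, so a CPU-only or different-CUDA build will be reported as a mismatch.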
### Data Preparation
All datasets should be downloaded into the same directory, organized as follows:
```
/path/to/your/data/dir
|-- MRPC/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
|-- QNLI/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
|-- RTE/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
|-- SST-2/
    |-- dev.tsv
    |-- test.tsv
    |-- train.tsv
```
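To confirm the layout before launching a run, a small check like the following can help. `missing_files` is a hypothetical helper (not part of the repository) that assumes exactly the four task folders and three TSV splits shown above.

```python
import tempfile
from pathlib import Path
from typing import List

TASKS = ["MRPC", "QNLI", "RTE", "SST-2"]
SPLITS = ["train.tsv", "dev.tsv", "test.tsv"]


def missing_files(data_dir: str) -> List[str]:
    """List expected task/split files that are absent under data_dir."""
    root = Path(data_dir)
    return [
        str(Path(task) / split)
        for task in TASKS
        for split in SPLITS
        if not (root / task / split).is_file()
    ]


# Real usage: print(missing_files("/path/to/your/data/dir"))
# Demo: build a throwaway layout in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    empty_report = missing_files(tmp)
    for task in TASKS:
        (Path(tmp) / task).mkdir()
        for split in SPLITS:
            (Path(tmp) / task / split).touch()
    full_report = missing_files(tmp)

print(len(empty_report), full_report)  # 12 []
```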
### Usage
#### T5
```bash
bash run_batch_inference_t5.sh \
--task_name sst-2 \
--model_name t5-small \
--model_type revmux \
--batch_size 32 \
--n_epochs 50 \
--combine_first 3 \
--compose_size 2 \
--data_dir /path/to/your/data/dir \
--adapter_lr 2e-5 \
--save_dir /path/to/your/save/dir
```
#### BERT
```bash
bash run_batch_inference_bert.sh \
--task_name sst-2 \
--model_name bert-base-uncased \
--model_type revmux \
--batch_size 32 \
--n_epochs 50 \
--combine_first 6 \
--compose_size 2 \
--data_dir /path/to/your/data/dir \
--adapter_lr 2e-5 \
--save_dir /path/to/your/save/dir
```
#### LLaMA
```bash
bash run_batch_inference_llama.sh \
--task_name sst-2 \
--model_name /path/to/your/llama3 \
--model_type revmux \
--batch_size 2 \
--n_epochs 10 \
--combine_first 16 \
--compose_size 2 \
--data_dir /path/to/your/data/dir \
--adapter_lr 2e-5 \
--save_dir /path/to/your/save/dir
```
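To run all four tasks in one go, one option is a small Python driver that assembles the same flags as the commands above. This is a hypothetical convenience script, not part of the repository; it assumes the shell scripts are invoked from the repository root and accept the flags exactly as written in the examples.

```python
import shlex
import subprocess
from typing import Dict, List

# Script names taken from the example commands above.
SCRIPTS = {
    "t5": "run_batch_inference_t5.sh",
    "bert": "run_batch_inference_bert.sh",
    "llama": "run_batch_inference_llama.sh",
}


def build_command(family: str, args: Dict[str, object]) -> List[str]:
    """Turn a dict of flags into an argv list like the example commands."""
    cmd = ["bash", SCRIPTS[family]]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_all(commands: List[List[str]]) -> None:
    """Launch each run sequentially; stops at the first run that fails."""
    for cmd in commands:
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Flags shared by every run; the learning rate is kept as a string so it
# is passed through exactly as "2e-5".
common = {
    "model_type": "revmux",
    "compose_size": 2,
    "data_dir": "/path/to/your/data/dir",
    "adapter_lr": "2e-5",
    "save_dir": "/path/to/your/save/dir",
}

# T5-small settings from the example above, repeated for each task.
commands = [
    build_command("t5", {"task_name": task, "model_name": "t5-small",
                         "batch_size": 32, "n_epochs": 50,
                         "combine_first": 3, **common})
    for task in ["sst-2", "rte", "qnli", "mrpc"]
]
# run_all(commands)  # uncomment to launch the four runs
```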
**Arguments**:
- `task_name`: the downstream task, selected from `[sst-2, rte, qnli, mrpc]`.
- `model_name`: the backbone language model, selected from `[t5-small, t5-base, t5-large, bert-base-uncased]`; for LLaMA, pass the path to your local LLaMA 3 checkpoint, as in the example above.
- `model_type`: `revmux` is our **RevMUX**, `ora` is the **Only Multiplexer Reversible** baseline, and `adapter` is the **Vanilla Adapters** baseline.
- `combine_first`: the number of prefilling layers.
- `compose_size`: the number of instances mixed together.
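As a rough illustration of `compose_size`, the sketch below splits a batch into consecutive groups of N instances, each of which would be mixed into one multiplexed input. Under the assumption (not stated in this README) that `batch_size` counts individual instances, a batch of 32 with `compose_size 2` would leave the multiplexed part of the backbone processing 16 sequences instead of 32. `group_instances` is a hypothetical helper, and the handling of a short trailing group is an assumption.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def group_instances(batch: Sequence[T], compose_size: int) -> List[Tuple[T, ...]]:
    """Split a batch into consecutive groups of `compose_size` instances.

    A trailing group shorter than compose_size is returned as-is; how the
    real code handles that case (padding, dropping) is not assumed here.
    """
    if compose_size < 1:
        raise ValueError("compose_size must be >= 1")
    return [tuple(batch[i:i + compose_size])
            for i in range(0, len(batch), compose_size)]


groups = group_instances(list(range(32)), compose_size=2)
print(len(groups), groups[0])  # 16 (0, 1)
```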
## Citation
```
@inproceedings{xu-etal-2024-revmux,
title = "{R}ev{MUX}: Data Multiplexing with Reversible Adapters for Efficient {LLM} Batch Inference",
author = "Xu, Yige and
Guo, Xu and
Zeng, Zhiwei and
Miao, Chunyan",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1232",
doi = "10.18653/v1/2024.emnlp-main.1232",
pages = "22072--22087",
}
```