Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/for-ai/parameter-efficient-moe
- Host: GitHub
- URL: https://github.com/for-ai/parameter-efficient-moe
- Owner: for-ai
- Created: 2023-09-06T08:58:04.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-31T19:21:15.000Z (about 1 year ago)
- Last Synced: 2024-06-12T02:31:40.577Z (5 months ago)
- Language: Python
- Size: 396 KB
- Stars: 225
- Watchers: 17
- Forks: 13
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-adaptive-computation - official Jax code
README
## MoV and MoLoRA
This repository contains the official code for the paper: "[Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning](https://arxiv.org/abs/2309.05444)."

The codebase is built on [T5X](https://github.com/google-research/t5x), which defines the model and training loop; [Flaxformer](https://github.com/google/flaxformer), which defines the model computation; [Flax](https://github.com/google/flax), which defines the low-level model layers; and [Jax](https://github.com/google/jax), which provides the execution.

![My LaTeX Image](demo.png)
#### Installation
```sh
# CLONE repo
git clone https://github.com/for-ai/parameter-efficient-moe

# COPY to TPUs (<tpu-name> and <zone> are placeholders for your TPU VM name and zone)
gcloud alpha compute tpus tpu-vm scp --recurse parameter-efficient-moe <tpu-name>:parameter-efficient-moe --zone=<zone> --worker=all

# RUN on TPUs
bash scripts/setup.sh
```

### Dataset
The dataset used for training and evaluation should be cached using [SeqIO](https://github.com/google/seqio). We used the [bigscience/P3](https://huggingface.co/datasets/bigscience/P3) dataset, which is already prepared. For dataset preparation, we refer to the [bigscience/t-zero](https://github.com/bigscience-workshop/t-zero/tree/master/training) repository.
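For orientation, here is a minimal sketch of how a task can be registered with SeqIO so that it can be cached offline and then consumed with `--gin.USE_CACHED_TASKS` and `--seqio_additional_cache_dirs`. The task name, TFDS source, and vocabulary below are illustrative placeholders, not the repository's actual task definitions, which follow the t-zero recipe.

```python
# Hypothetical sketch: registering a P3-style task with SeqIO for offline caching.
# The task name, TFDS source, and vocabulary are illustrative placeholders.
import seqio

VOCAB = seqio.SentencePieceVocabulary(
    "gs://t5-data/vocabs/cc_all.32000.100extra/sentencepiece.model")

seqio.TaskRegistry.add(
    "p3_example_task",  # placeholder task name
    source=seqio.TfdsDataSource(tfds_name="p3_subset:1.0.0"),  # placeholder source with "inputs"/"targets" text features
    preprocessors=[
        seqio.preprocessors.tokenize,
        seqio.CacheDatasetPlaceholder(),  # steps above this point are cached offline
        seqio.preprocessors.append_eos_after_trim,
    ],
    output_features={
        "inputs": seqio.Feature(vocabulary=VOCAB, add_eos=True),
        "targets": seqio.Feature(vocabulary=VOCAB, add_eos=True),
    },
)
```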
### Code components
Here is the code layout:
* `configs/` :: contains gin-style configs for the architecture of each model, including T0, IA3, LoRA, MoV, and MoLoRA.
* `scripts/` :: contains all the training and evaluation scripts for full fine-tuning, vanilla parameter-efficient fine-tuning, and their mixture counterparts.
* `src/` :: contains the IA3, LoRA, MoV, and MoLoRA computations, including the router they use; a rough sketch of the MoV computation follows this list.
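As a rough illustration of the kind of computation implemented in `src/`, the following is a hedged, self-contained Flax sketch of an MoV-style layer: a soft router mixes a set of (IA)^3-style scaling vectors per token. Module and parameter names are illustrative and do not mirror the repository's actual classes.

```python
import jax
import jax.numpy as jnp
from flax import linen as nn


class MoVLayer(nn.Module):
    """Sketch: mixture of (IA)^3-style scaling vectors with a soft, token-level router."""
    num_experts: int
    hidden_dim: int

    @nn.compact
    def __call__(self, hidden):  # hidden: [batch, seq, hidden_dim]
        # Each "expert" is a learned scaling vector of size hidden_dim,
        # initialized to ones so the layer starts as the identity.
        experts = self.param(
            "expert_vectors", nn.initializers.ones,
            (self.num_experts, self.hidden_dim))
        # Soft router: a small dense layer producing per-token expert probabilities.
        router_logits = nn.Dense(self.num_experts, name="router")(hidden)
        router_probs = jax.nn.softmax(router_logits, axis=-1)  # [batch, seq, experts]
        # Mix the expert vectors per token and rescale the activations.
        combined_scale = jnp.einsum("bse,ed->bsd", router_probs, experts)
        return hidden * combined_scale


# Usage sketch:
# layer = MoVLayer(num_experts=10, hidden_dim=1024)
# x = jnp.ones((2, 8, 1024))
# params = layer.init(jax.random.PRNGKey(0), x)
# y = layer.apply(params, x)
```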
#### Example script
To launch a training script on all TPU workers:
```sh
# <tpu-name> and <zone> are placeholders for your TPU VM name and zone.
gcloud alpha compute tpus tpu-vm ssh <tpu-name> --zone=<zone> --worker=all --command "cd parameter-efficient-moe; bash scripts/mov_train.sh"
```
#### Fine-tuning:
```sh
# moe/scripts/mov_train.sh
MODEL_DIR=${1:-${MODEL_DIR}} # Model dir to save logs, ckpts, etc., in "gs://model_dir" format.
T5X_DIR="`python3 -m scripts.find_module t5x`/.." # directory where the T5X repo is cloned.
FLAXFORMER_DIR="`python3 -m scripts.find_module flaxformer`/.." # directory where the Flaxformer repo is cloned.
echo "Searching for gin configs in:"
echo "- ${T5X_DIR}"
echo "- ${FLAXFORMER_DIR}"
echo "============================="PRETRAINED_MODEL="gs://t5-data/pretrained_models/t5x/t5_1_1_lm100k_large/checkpoint_1100000"
CACHE_DIR="raw_tfrecords/you_cache_dir" # Directory where P3 cached data is stored, etc. in "gs://model_dir" format.python3 -m t5x.train \
--gin_search_paths="${T5X_DIR}" \
--gin_file="configs/t5/models/t5_1_1_large.gin" \ #e.g. 770M(t5-large) model
--gin_file="configs/mov.gin" \ # Use MoV as the architecture for PEFT
--gin.MODEL_DIR="'${MODEL_DIR}'" \
--gin.LOSS_NORMALIZING_FACTOR="'AVERAGE_PER_SEQUENCE'" \
--gin.MIXTURE_OR_TASK_NAME="'t0_train'" \ # Training subset
--gin.TASK_FEATURE_LENGTHS="{'inputs': 1024, 'targets': 256}" \
--gin.INITIAL_CHECKPOINT_PATH="'${PRETRAINED_MODEL}'" \
--gin.TRAIN_STEPS="1_600_000" \ # Pre-trained + number of steps
--gin.USE_CACHED_TASKS="True" \
--gin.PACKING="True" \
--seqio_additional_cache_dirs=${CACHE_DIR} \
--gin.BATCH_SIZE="32"
```

#### Evaluation:
```sh
# moe/scripts/mov_eval.sh
CKPT_DIR=${1:-${CKPT_DIR}} # directory where the fine-tuned model is stored
EVAL_DIR=${2:-${EVAL_DIR}} # directory to write eval output

T5X_DIR="`python3 -m scripts.find_module t5x`/.." # directory where the T5X repo is cloned
FLAXFORMER_DIR="`python3 -m scripts.find_module flaxformer`/.." # directory where the Flaxformer repo is cloned
echo "Searching for gin configs in:"
echo "- ${T5X_DIR}"
echo "- ${FLAXFORMER_DIR}"
echo "============================="CACHE_DIR="raw_tfrecords/you_cache_dir" # directory where P3 cached data is stored, etc. in "gs://model_dir" format.
python3 -m t5x.eval \
--gin_search_paths="${T5X_DIR}" \
--gin_file="configs/t5/models/t5_1_1_large.gin" \
--gin_file="configs/mov_eval.gin" \ # Use MoV as the architecture for PEFT
--gin.EVAL_OUTPUT_DIR="'${EVAL_DIR}'" \
--gin.MIXTURE_OR_TASK_NAME="'t0_eval_score_eval'" \ # Evaluation subset
--gin.TASK_FEATURE_LENGTHS="{'inputs': 1024, 'targets': 256}" \
--gin.CHECKPOINT_PATH="'${CKPT_DIR}'" \
--seqio_additional_cache_dirs=${CACHE_DIR} \
--gin.utils.DatasetConfig.use_cached="True" \
--gin.utils.DatasetConfig.split="'validation'" \
--gin.BATCH_SIZE="32"
```
#### References
Our IA3 module implementation is based on [prompt-tuning](https://github.com/google-research/prompt-tuning), and we used [bigscience/t-zero](https://github.com/bigscience-workshop/t-zero/tree/master/training) for the dataset implementation.

#### Citation
Please use the following bibtex entry to cite our work.
```
@article{zadouri2023pushing,
  url={https://arxiv.org/abs/2309.05444},
  title={Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning},
  author={Ted Zadouri and Ahmet Üstün and Arash Ahmadian and Beyza Ermiş and Acyr Locatelli and Sara Hooker},
  year={2023},
}
```