{"id":31211526,"url":"https://github.com/cyberagentailab/posthoc-control-moe","last_synced_at":"2026-07-21T03:31:46.124Z","repository":{"id":244407182,"uuid":"814418382","full_name":"CyberAgentAILab/posthoc-control-moe","owner":"CyberAgentAILab","description":"[TACL 2024] Code for \"Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding\"","archived":false,"fork":false,"pushed_at":"2025-06-17T00:43:07.000Z","size":28,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-21T05:37:08.797Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CyberAgentAILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-13T01:29:24.000Z","updated_at":"2025-06-17T00:43:10.000Z","dependencies_parsed_at":"2024-11-28T09:39:36.743Z","dependency_job_id":null,"html_url":"https://github.com/CyberAgentAILab/posthoc-control-moe","commit_stats":null,"previous_names":["cyberagentailab/posthoc-control-moe"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CyberAgentAILab/posthoc-control-moe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fposthoc-control-moe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fposthoc-control-moe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fposthoc-control-moe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fposthoc-control-moe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CyberAgentAILab","download_url":"https://codeload.github.com/CyberAgentAILab/posthoc-control-moe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2Fposthoc-control-moe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279017179,"owners_count":26086021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-21T05:30:39.803Z","updated_at":"2026-07-21T03:31:46.082Z","avatar_url":"https://github.com/CyberAgentAILab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Post-Hoc Control over Mixture-of-Experts\n\nThis repository implements the main experiments of our TACL 2024 paper, [Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00701/124836/Not-Eliminate-but-Aggregate-Post-Hoc-Control-over).\n\nThe code is intended solely for reproducing the experiments. We thank the authors of [RISK](https://github.com/CuteyThyme/RISK), on which our code was based.\n\n\n## Environment\n\nWe tested our code in the following environment.\n* OS: Debian GNU/Linux 10 (buster)\n* Python: 3.8.3\n* CUDA: 11.2\n* GPUs: **NVIDIA V100 x 2**\n\nThe experiment with `DeBERTa-v3-large` requires a different environment.\n* OS: Debian GNU/Linux 10 (buster)\n* Python: 3.8.3\n* CUDA: 11.2\n* GPUs: **NVIDIA A100 (40GB) x 2**\n\n\n## Getting Started\n\n```bash\ngit clone https://github.com/CyberAgentAILab/posthoc-control-moe\ncd posthoc-control-moe\n```\n\n\n### Installation\n\n\u003e [!NOTE]\n\u003e The exact versions of the libraries we used are specified in the requirements for reproducibility. For improved security, consider updating the libraries, particularly PyTorch and Transformers. However, note that we have not tested reproducibility with the updated versions.  \n\nInstall dependencies to reproduce the main results.  \n```bash\n# For conda users\nconda env create -f environment.yaml\nconda activate posthoc-control-moe\n\n# For the others\npip install --force-reinstall --no-cache-dir -r requirements.txt\n```\n\nFor the experiment with `DeBERTa-v3-large`, use `environment_deberta.yaml` or `requirements_deberta.txt`.\n```bash\n# For conda users\nconda env create -f environment_deberta.yaml\nconda activate posthoc-control-moe-deberta\n\n# For the others\npip install --force-reinstall --no-cache-dir -r requirements_deberta.txt\n```\n\n\n### Data Preparation\n\n[Download the datasets from here](https://drive.google.com/drive/folders/1aleJytl3SAKdGBsxZbxznwusINOnTAzh?usp=share_link) and place them as follows.\nOr you can just run `gdown 'https://drive.google.com/drive/folders/1aleJytl3SAKdGBsxZbxznwusINOnTAzh?usp=share_link' --folder` to download the datasets at once.\nThe link is kindly provided by [RISK](https://github.com/CuteyThyme/RISK).\n```\n./dataset/\n  ├── multinli/\n  │     ├── train.tsv\n  │     └── dev_matched.tsv\n  ├── hans/heuristics_evaluation_set.txt\n  ├── qqp_paws/\n  │     ├── qqp_train.tsv\n  │     ├── qqp_dev.tsv\n  │     └── paws_devtest.tsv\n  └── fever/\n        ├── fever.train.jsonl\n        ├── fever.dev.jsonl\n        ├── symmetric_v0.1/fever_symmetric_generated.jsonl\n        └── symmetric_v0.2/fever_symmetric_test.jsonl\n```\n\nOriginal links for the datasets:\n* MNLI:  [https://cims.nyu.edu/~sbowman/multinli/](https://cims.nyu.edu/~sbowman/multinli/)     \n* HANS:  [https://github.com/tommccoy1/hans](https://github.com/tommccoy1/hans)    \n* QQP and PAWS: [https://github.com/google-research-datasets/paws](https://github.com/google-research-datasets/paws)\n* FEVER and FEVER-Symmetric: [https://github.com/TalSchuster/FeverSymmetric](https://github.com/TalSchuster/FeverSymmetric)     \n\n\n## Usage\n\n### Training\n\nTrain the mixture-of-experts and save the one that performs the best on ID dev.\nHere, we specify the seed that yields near the average performance shown in the paper.\nThe default seed is `777`, and the analyses were conducted on that seed.\n```bash\nmkdir -p saved_models/mnli\nmkdir -p saved_models/qqp\nmkdir -p saved_models/fever\n\n# MNLI\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path bert-base-uncased \\\n    --dataset mnli --batch_size 32 --epochs 10 \\\n    --num_experts 10 --router_loss 0.5 --router_tau 1 \\\n    --num_topk_mask 8 --lr 2e-5 --seed 888 --save_dir saved_models/mnli \\\n    --best_model_name bert_mos_e10_rs05k8_ep10_lr2e-5_8 --save\n\n# QQP\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path bert-base-uncased \\\n    --dataset qqp --batch_size 32 --epochs 10 \\\n    --num_experts 15 --router_loss 1 --router_tau 1 \\\n    --num_topk_mask 8 --lr 2e-5 --seed 888 --save_dir saved_models/qqp \\\n    --best_model_name bert_mos_e15_rs1k8_ep10_lr2e-5_8 --save\n\n# FEVER\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path bert-base-uncased \\\n    --dataset fever --batch_size 32 --epochs 10 \\\n    --num_experts 10 --router_loss 1 --router_tau 1 \\\n    --num_topk_mask 8 --lr 2e-5 --seed 888 --save_dir saved_models/fever \\\n    --best_model_name bert_mos_e10_rs1k8_ep10_lr2e-5_8 --save\n```\n\nFor the `DeBERTa-v3-large` ablation study:\n```bash\n# Make sure to use the environment and dependencies prepared for DeBERTa-v3-large\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config_deberta.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path microsoft/deberta-v3-large \\\n    --dataset mnli --batch_size 32 --epochs 10 \\\n    --num_experts 10 --router_loss 0.5 --router_tau 1 \\\n    --num_topk_mask 8 --lr 5e-6 --max_grad_norm 1 --seed 888 \\\n    --save_dir saved_models/mnli \\\n    --best_model_name deberta_mos_e10_rs05k8_ep10_lr5e-6g1_bf16_8 --save\n```\n\n\n### Evaluation\n\nEvaluate the post-hoc control over the mixture-of-experts on OOD tests.\n[Some saved models are available here](https://console.cloud.google.com/storage/browser/ailab-public/posthoc-control-moe) for those who want to check the results quickly.\nDownload and place them under `saved_models/[task_name]/`.\n\n```bash\n# HANS\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path bert-base-uncased \\\n    --dataset mnli --batch_size 32 --epochs 10 \\\n    --num_experts 10 --router_loss 0.5 --router_tau 1 \\\n    --num_topk_mask 8 --lr 2e-5 --seed 888 --save_dir saved_models/mnli \\\n    --resume bert_mos_e10_rs05k8_ep10_lr2e-5_8 --evaluate\n\n# PAWS\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path bert-base-uncased \\\n    --dataset qqp --batch_size 32 --epochs 10 \\\n    --num_experts 15 --router_loss 1 --router_tau 1 \\\n    --num_topk_mask 8 --lr 2e-5 --seed 888 --save_dir saved_models/qqp \\\n    --resume bert_mos_e15_rs1k8_ep10_lr2e-5_8 --evaluate\n\n# Symm. v1 and v2\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path bert-base-uncased \\\n    --dataset fever --batch_size 32 --epochs 10 \\\n    --num_experts 10 --router_loss 1 --router_tau 1 \\\n    --num_topk_mask 8 --lr 2e-5 --seed 888 --save_dir saved_models/fever \\\n    --resume bert_mos_e10_rs1k8_ep10_lr2e-5_8 --evaluate\n```\n\nFor the `DeBERTa-v3-large` ablation study:\n```bash\n# Make sure to use the environment and dependencies prepared for DeBERTa-v3-large\nCUDA_VISIBLE_DEVICES=0,1 accelerate launch \\\n    --config_file accelerate_config_deberta.yaml --main_process_port 20880 \\\n    src/main_mix.py --model bert_mos --pretrained_path microsoft/deberta-v3-large \\\n    --dataset mnli --batch_size 32 --epochs 10 \\\n    --num_experts 10 --router_loss 0.5 --router_tau 1 \\\n    --num_topk_mask 8 --lr 5e-6 --max_grad_norm 1 --seed 888 \\\n    --save_dir saved_models/mnli \\\n    --resume deberta_mos_e10_rs05k8_ep10_lr5e-6g1_bf16_8 --evaluate\n```\n\n\n## Citation\n\nIf you find our work useful for your research, please consider citing our paper:\n```bibtex\n@article{10.1162/tacl_a_00701,\n    author = {Honda, Ukyo and Oka, Tatsushi and Zhang, Peinan and Mita, Masato},\n    title = {Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding},\n    journal = {Transactions of the Association for Computational Linguistics},\n    volume = {12},\n    pages = {1268-1289},\n    year = {2024},\n    month = {10},\n    issn = {2307-387X},\n    doi = {10.1162/tacl_a_00701},\n    url = {https://doi.org/10.1162/tacl\\_a\\_00701},\n    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\\_a\\_00701/2480600/tacl\\_a\\_00701.pdf},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Fposthoc-control-moe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberagentailab%2Fposthoc-control-moe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Fposthoc-control-moe/lists"}