Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/luodian/generalizable-mixture-of-experts
GMoE could be the next backbone model for many kinds of generalization tasks.
- Host: GitHub
- URL: https://github.com/luodian/generalizable-mixture-of-experts
- Owner: Luodian
- License: mit
- Created: 2022-05-28T04:17:42.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-21T18:23:06.000Z (over 1 year ago)
- Last Synced: 2024-10-22T23:36:40.069Z (21 days ago)
- Topics: deep-learning, domain-generalization, mixture-of-experts, pytorch, pytorch-implementation
- Language: Python
- Homepage:
- Size: 2.04 MB
- Stars: 290
- Watchers: 15
- Forks: 35
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Welcome to Generalizable Mixture-of-Experts for Domain Generalization
🔥 Our paper [Sparse Mixture-of-Experts are Domain Generalizable Learners](https://openreview.net/forum?id=RecZ9nB9Q4) has officially been accepted at ICLR 2023 for an oral presentation.
🔥 The GMoE-S/16 model currently [ranks first](https://paperswithcode.com/sota/domain-generalization-on-domainnet) on multiple DG datasets without extra pre-training data. (Our GMoE-S/16 is initialized from [DeiT-S/16](https://github.com/facebookresearch/deit/blob/main/README_deit.md), which was pretrained only on ImageNet-1K 2012.)
Wondering why GMoE performs so well? 🤯 Let's investigate the generalization ability of the model architecture itself and see the great potential of the sparse Mixture-of-Experts (MoE) architecture.
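To make the idea concrete, here is a minimal, hedged sketch of a top-k gated sparse MoE feed-forward block in plain PyTorch. It is illustrative only: the layer structure, expert count, and dimensions below are our own choices, and GMoE itself builds on Tutel's optimized MoE implementation rather than this Python loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSparseMoE(nn.Module):
    """Minimal top-k gated mixture-of-experts FFN (conceptual sketch only)."""

    def __init__(self, dim, hidden_dim, num_experts=6, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # each token only visits its k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 384)                       # 384 = DeiT-S/16 embedding width
print(TopKSparseMoE(dim=384, hidden_dim=1536)(tokens).shape)   # torch.Size([8, 384])
```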
### Preparation
```sh
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
python3 -m pip uninstall tutel -y
python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main
pip3 install -r requirements.txt
```
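A quick, hedged way to confirm the install worked before moving on; the `tutel.moe` import path is taken from the upstream Tutel project and may differ across versions.

```python
# Sanity check for the Preparation step (import path assumed from upstream Tutel).
import torch
from tutel import moe as tutel_moe

print('CUDA available:', torch.cuda.is_available())
print('Tutel MoE layer:', tutel_moe.moe_layer)
```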
### Datasets
```sh
python3 -m domainbed.scripts.download \
--data_dir=./domainbed/data
```
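To double-check what the download step fetched, here is a hedged sketch using the DomainBed-style dataset registry (the `DATASETS` list and `num_environments` helper are assumed from upstream DomainBed; verify against `domainbed/datasets.py` in this repo).

```python
# Hedged sketch: list the registered datasets and how many domains each defines.
from domainbed import datasets

print(datasets.DATASETS)                        # registered dataset names
print(datasets.num_environments('OfficeHome'))  # number of OfficeHome domains
```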
### Environments
Environment details used in the paper for the main experiments on an NVIDIA V100 GPU.
```shell
Environment:
Python: 3.9.12
PyTorch: 1.12.0+cu116
Torchvision: 0.13.0+cu116
CUDA: 11.6
CUDNN: 8302
NumPy: 1.19.5
PIL: 9.2.0
```
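To compare your local setup against the versions above, a small check using standard version attributes (not a script shipped with the repo):

```python
# Print the versions that matter for reproducing the paper's environment.
import torch, torchvision, numpy, PIL

print('PyTorch:', torch.__version__)
print('Torchvision:', torchvision.__version__)
print('CUDA:', torch.version.cuda)
print('CUDNN:', torch.backends.cudnn.version())
print('NumPy:', numpy.__version__)
print('PIL:', PIL.__version__)
```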
## Start Training
Train a model:
```sh
python3 -m domainbed.scripts.train \
    --data_dir=./domainbed/data/OfficeHome/ \
    --algorithm GMOE \
    --dataset OfficeHome \
    --test_env 2
```
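If you want one run per held-out domain rather than a single `--test_env`, a hedged convenience loop (plain `subprocess`, not a script shipped with the repo; OfficeHome has 4 domains):

```python
# Launch one training run per held-out OfficeHome environment.
import subprocess

for test_env in range(4):
    subprocess.run([
        'python3', '-m', 'domainbed.scripts.train',
        '--data_dir=./domainbed/data/OfficeHome/',
        '--algorithm', 'GMOE',
        '--dataset', 'OfficeHome',
        '--test_env', str(test_env),
    ], check=True)
```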
## Hyper-params
We put the hparams for each dataset in
```sh
./domainbed/hparams_registry.py
```
Basically, you just need to choose `--algorithm` and `--dataset`. The optimal hparams will be loaded accordingly.
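For reference, a hedged sketch of how the resolved hparams can be inspected (the `default_hparams` helper name is assumed from upstream DomainBed's `hparams_registry`; check this repo's copy):

```python
# Print the default hyper-parameters resolved for GMOE on OfficeHome.
from domainbed import hparams_registry

hparams = hparams_registry.default_hparams('GMOE', 'OfficeHome')
for name, value in sorted(hparams.items()):
    print(f'{name}: {value}')
```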
## License
This source code is released under the MIT license, included [here](LICENSE).
## Acknowledgement
The MoE module is built on [Tutel MoE](https://github.com/microsoft/tutel).