# awesome-mixture-of-experts-papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
https://github.com/arpita8/awesome-mixture-of-experts-papers
## Links
- Survey paper (PDF): https://github.com/arpita8/awesome-mixture-of-experts-papers/blob/main/Mixture_of_Experts_Survey_Paper.pdf
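Nearly every paper in this collection builds on the same core building block: a sparsely gated MoE layer that routes each token to a small subset of expert feed-forward networks. The following is a minimal PyTorch sketch of that pattern; the dimensions, expert count, and top-k value are illustrative and not taken from any specific paper below.

```python
# Minimal sparsely gated MoE layer with top-k routing (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=256, d_hidden=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)   # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        logits = self.gate(x)                          # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # renormalise over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                    # loop form for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 256)
print(SparseMoE()(tokens).shape)   # torch.Size([16, 256])
```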
## Evolution in Sparse Mixture of Experts
## Collection of Recent MoE Papers
### MoE in Visual Domain
- MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context
- MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
- AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts
- ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
- Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
- Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
- VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
- Scaling Vision with Sparse Mixture of Experts (V-MoE)
- DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning
- Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
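The multi-task entries above (DSelect-k, Multi-gate Mixture-of-Experts) share the multi-gate pattern: experts are shared across tasks while each task learns its own softmax gate and output tower. A minimal sketch of that pattern, with an illustrative two-task setup and sizes that are assumptions rather than values from any listed paper:

```python
# Multi-gate MoE (MMoE-style) sketch: shared experts, one gate per task.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, d_in=64, d_expert=32, num_experts=4, num_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_in, d_expert), nn.ReLU())
            for _ in range(num_experts)
        ])
        self.gates = nn.ModuleList([nn.Linear(d_in, num_experts) for _ in range(num_tasks)])
        self.towers = nn.ModuleList([nn.Linear(d_expert, 1) for _ in range(num_tasks)])

    def forward(self, x):                                              # x: (batch, d_in)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, d_expert)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # task-specific gate
            mixed = (w * expert_out).sum(dim=1)                        # weighted expert mix
            outputs.append(tower(mixed))                               # per-task head
        return outputs

x = torch.randn(8, 64)
print([o.shape for o in MMoE()(x)])   # [torch.Size([8, 1]), torch.Size([8, 1])]
```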
### MoE in LLMs
- LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
- RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
- Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
- MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models
- Mistral 7B
- HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
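Several entries above (LoRAMoE, MoELoRA, Self-MoE) combine MoE routing with parameter-efficient adapters. The sketch below shows the generic idea of routing among low-rank adapters on top of a frozen base projection; all names, ranks, and sizes are illustrative and it is not the exact architecture of any listed paper.

```python
# Generic "mixture of LoRA experts" sketch: frozen base weight plus several
# low-rank adapters selected per token by a router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpertLayer(nn.Module):
    def __init__(self, d_model=256, rank=8, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.base = nn.Linear(d_model, d_model)
        self.base.requires_grad_(False)                  # stands in for a frozen pretrained weight
        self.gate = nn.Linear(d_model, num_experts)
        self.down = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, d_model))  # zero init: adapters start inactive

    def forward(self, x):                                # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = self.base(x)
        for k in range(self.top_k):
            a = self.down[idx[:, k]]                     # (tokens, d_model, rank)
            b = self.up[idx[:, k]]                       # (tokens, rank, d_model)
            delta = torch.bmm(torch.bmm(x.unsqueeze(1), a), b).squeeze(1)
            out = out + weights[:, k:k + 1] * delta      # weighted low-rank update
        return out

print(LoRAExpertLayer()(torch.randn(10, 256)).shape)   # torch.Size([10, 256])
```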
### MoE for Scaling LLMs
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
- Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
- Multi-Head Mixture-of-Experts
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
- Tutel: Adaptive Mixture-of-Experts at Scale
- Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields
- SaMoE: Parameter Efficient MoE Language Models via Self-Adaptive Expert Combination
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
- ST-MoE: Designing Stable and Transferable Sparse Expert Models
- Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
- SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts
- Sparse is Enough in Scaling Transformers
- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
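The common thread in this section is that total parameter count grows with the number of experts while per-token compute grows only with the number of experts each token is routed to. A back-of-the-envelope calculation with assumed sizes (not taken from any listed model) makes the gap concrete:

```python
# Total vs. active parameters in a sparse MoE transformer (illustrative sizes).
d_model, d_ff = 4096, 14336          # hidden and FFN widths (assumed)
num_layers = 32
num_experts, top_k = 8, 2            # experts per MoE layer, experts used per token

ffn_params = 2 * d_model * d_ff      # one expert FFN (up + down projection, simplified)
attn_params = 4 * d_model * d_model  # rough per-layer attention cost

total = num_layers * (attn_params + num_experts * ffn_params)
active = num_layers * (attn_params + top_k * ffn_params)

print(f"total params  ~{total / 1e9:.1f}B")   # grows with num_experts
print(f"active/token  ~{active / 1e9:.1f}B")  # grows only with top_k
```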
### MoE: Enhancing System Performance and Efficiency
- HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
- PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning
- MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
- SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System
- BlackMamba: Mixture of Experts for State-Space Models
- BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
- Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
- Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
- StableMoE: Stable Routing Strategy for Mixture of Experts
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
- No Language Left Behind: Scaling Human-Centered Machine Translation
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
- FastMoE: A Fast Mixture-of-Expert Training System
- ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot
- M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing
- ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling
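Many of the systems above (GShard, ST-MoE, StableMoE, EvoMoE) depend on keeping expert load balanced so that hardware is not idled by a few overloaded experts. The sketch below shows a generic auxiliary load-balancing loss in the GShard/ST-MoE line of work; the exact formulation differs between the listed papers, so treat this as an illustration rather than any single paper's loss.

```python
# Generic auxiliary load-balancing loss sketch: penalise routers that send
# most tokens to a few experts.
import torch
import torch.nn.functional as F

def load_balance_loss(router_logits, expert_index, num_experts):
    """router_logits: (tokens, num_experts); expert_index: (tokens,) top-1 choice."""
    probs = F.softmax(router_logits, dim=-1)                   # router probabilities
    # fraction of tokens actually dispatched to each expert
    dispatch = F.one_hot(expert_index, num_experts).float().mean(dim=0)
    # mean router probability assigned to each expert
    importance = probs.mean(dim=0)
    # minimised when both distributions are uniform (1 / num_experts)
    return num_experts * torch.sum(dispatch * importance)

logits = torch.randn(128, 8)
loss = load_balance_loss(logits, logits.argmax(dim=-1), num_experts=8)
print(loss)   # ~1.0 when routing is roughly balanced
```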
### Python Libraries for MoE
- MoE-Infinity: Offloading-Efficient MoE Model Serving
- SMT 2.0: A Surrogate Modeling Toolbox with a focus on Hierarchical and Mixed Variables Gaussian Processes
### Integrating Mixture of Experts into Recommendation Algorithms
- MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
- SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
- MDFEND: Multi-domain Fake News Detection
- Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations
- CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval