Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kyegomez/switchtransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
ai gpt4 llama mixture-model mixture-of-experts mixture-of-models ml moe multi-modal
- Host: GitHub
- URL: https://github.com/kyegomez/switchtransformers
- Owner: kyegomez
- License: mit
- Created: 2024-01-22T11:38:12.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-09-23T04:53:14.000Z (10 days ago)
- Last Synced: 2024-09-27T07:04:01.304Z (6 days ago)
- Topics: ai, gpt4, llama, mixture-model, mixture-of-experts, mixture-of-models, ml, moe, multi-modal
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 2.43 MB
- Stars: 44
- Watchers: 3
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Switch Transformers
![Switch Transformer](st.png)
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity" in PyTorch, Einops, and Zeta. [PAPER LINK](https://arxiv.org/abs/2101.03961)
## Installation
```bash
pip install switch-transformers
```

## Usage
```python
import torch
from switch_transformers import SwitchTransformer

# Generate a random tensor of shape (1, 10) with token ids between 0 and 99
x = torch.randint(0, 100, (1, 10))

# Create an instance of the SwitchTransformer model
# num_tokens: the size of the token vocabulary
# dim: the dimensionality of the model
# heads: the number of attention heads
# dim_head: the dimensionality of each attention head
model = SwitchTransformer(
    num_tokens=100, dim=512, heads=8, dim_head=64
)

# Pass the input tensor through the model
out = model(x)

# Print the shape of the output tensor
print(out.shape)
```
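The core idea the paper introduces (sketched here for intuition, not as this library's internal API) is top-1 "switch" routing: a learned gate sends each token to exactly one expert, so only a small fraction of the model's parameters is active per token. Below is a minimal, framework-agnostic NumPy sketch; all names (`switch_route`, `gate_w`, `expert_ws`) are hypothetical:

```python
import numpy as np

def switch_route(tokens, gate_w, expert_ws):
    """Top-1 (switch) routing: each token is processed by exactly one expert.

    tokens:    (n_tokens, dim) token representations
    gate_w:    (dim, n_experts) gating weights
    expert_ws: list of (dim, dim) weight matrices, one per expert
    """
    logits = tokens @ gate_w                       # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)     # softmax over experts
    expert_idx = probs.argmax(axis=-1)             # top-1 expert per token

    out = np.empty_like(tokens)
    for e, w in enumerate(expert_ws):
        mask = expert_idx == e
        # Scale each expert's output by its gate probability, as in the paper
        out[mask] = (tokens[mask] @ w) * probs[mask, e][:, None]
    return out, expert_idx

rng = np.random.default_rng(0)
dim, n_experts, n_tokens = 8, 4, 10
out, idx = switch_route(
    rng.normal(size=(n_tokens, dim)),
    rng.normal(size=(dim, n_experts)),
    [rng.normal(size=(dim, dim)) for _ in range(n_experts)],
)
print(out.shape, idx.shape)  # (10, 8) (10,)
```

The full paper also adds a load-balancing auxiliary loss and per-expert capacity limits, which this sketch omits for brevity.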
## Citation
```bibtex
@misc{fedus2022switch,
title={Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
author={William Fedus and Barret Zoph and Noam Shazeer},
year={2022},
eprint={2101.03961},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
## License
MIT