https://github.com/nanowell/differential-transformer-pytorch
PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture incorporates a novel Differential Attention mechanism, Multi-Head structure, RMSNorm, and SwiGLU.
- Host: GitHub
- URL: https://github.com/nanowell/differential-transformer-pytorch
- Owner: nanowell
- License: MIT
- Created: 2024-10-08T13:48:40.000Z
- Default Branch: main
- Last Pushed: 2024-10-27T19:43:03.000Z
- Last Synced: 2024-10-28T00:07:08.825Z
- Topics: differential-transformer, large-language-models, machine-learning, pytorch
- Language: Python
- Size: 20.5 KB
- Stars: 19
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Differential-Transformer-PyTorch
Unofficial PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a Differential Attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
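
As a rough illustration of the Differential Attention idea, the sketch below computes two causal softmax attention maps and subtracts the second, scaled by a scalar `lam`, from the first, following the formulation in the cited paper (arXiv:2410.05258). The function names (`lambda_init`, `reparameterized_lambda`, `differential_attention`) and tensor shapes are assumptions made for this example and are not taken from this repository's code. The sketch covers a single head and leaves out the per-head normalization and output projection that the paper applies afterwards.

```python
import math

import torch
import torch.nn.functional as F


def lambda_init(layer_index: int) -> float:
    """Depth-dependent initial value for lambda; layer_index is 1-based."""
    return 0.8 - 0.6 * math.exp(-0.3 * (layer_index - 1))


def reparameterized_lambda(lq1, lk1, lq2, lk2, lam_init):
    """lambda = exp(lq1 . lk1) - exp(lq2 . lk2) + lam_init, with learnable 1-D vectors."""
    return torch.exp(torch.dot(lq1, lk1)) - torch.exp(torch.dot(lq2, lk2)) + lam_init


def differential_attention(q1, k1, q2, k2, v, lam):
    """Single-head causal differential attention (illustrative sketch).

    q1, k1, q2, k2: (batch, seq, d); v: (batch, seq, d_v); lam: float or 0-d tensor.
    Returns a tensor of shape (batch, seq, d_v).
    """
    seq_len, d = q1.shape[-2], q1.shape[-1]
    scale = 1.0 / math.sqrt(d)
    # True above the diagonal: those (future) positions are masked out.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q1.device),
        diagonal=1,
    )

    def attention_map(q, k):
        scores = (q @ k.transpose(-2, -1)) * scale
        scores = scores.masked_fill(future, float("-inf"))
        return F.softmax(scores, dim=-1)

    # Difference of the two attention maps, applied to the values.
    diff_map = attention_map(q1, k1) - lam * attention_map(q2, k2)
    return diff_map @ v


if __name__ == "__main__":
    torch.manual_seed(0)
    q1, k1, q2, k2 = (torch.randn(2, 5, 8) for _ in range(4))
    v = torch.randn(2, 5, 16)
    out = differential_attention(q1, k1, q2, k2, v, lam=lambda_init(1))
    print(out.shape)
```

Each row of the differential map sums to `1 - lam` rather than 1, which is why the paper rescales the normalized head outputs by `1 - lambda_init` before the output projection.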

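For readers unfamiliar with the other two building blocks named above, here is a minimal sketch of RMSNorm and a SwiGLU feed-forward layer in their commonly used forms. The class names, the `eps` default, and the projection names (`w_gate`, `w_up`, `w_down`) are assumptions for illustration and may differ from this repository's modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale inputs by their root mean square over the last dimension."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)


class SwiGLU(nn.Module):
    """Gated feed-forward layer: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


if __name__ == "__main__":
    x = torch.randn(2, 5, 8)
    y = SwiGLU(dim=8, hidden_dim=32)(RMSNorm(8)(x))
    print(y.shape)
```

Unlike LayerNorm, RMSNorm does not subtract the mean or add a bias, so it only rescales each feature vector.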

```bibtex
@misc{ye2024differentialtransformer,
  title={Differential Transformer},
  author={Tianzhu Ye and Li Dong and Yuqing Xia and Yutao Sun and Yi Zhu and Gao Huang and Furu Wei},
  year={2024},
  eprint={2410.05258},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.05258},
}
```