https://github.com/nanowell/differential-transformer-pytorch

# Differential-Transformer-PyTorch
Unofficial PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines the Differential Attention mechanism introduced by Ye et al. (2024) with a multi-head structure, RMSNorm, and SwiGLU.
![image](https://github.com/user-attachments/assets/b3ebaf46-9db7-464b-8ef0-3095c4bdfc19)

![arch](https://github.com/user-attachments/assets/7ca6f267-2df0-4298-9974-1edc9bc19c64)
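The core idea of Differential Attention can be sketched in a few lines of PyTorch. The block below is a minimal illustration written from the formulas in the cited paper, not the code in this repository: the class name `DiffAttentionSketch`, the helper `lambda_init_fn`, the parameter names, and the 0.1 initialization scale are assumptions made for this example. Each head computes two causal softmax attention maps from two query/key halves, subtracts the second map scaled by a learnable scalar λ, applies a per-head RMSNorm, and rescales by `1 - lambda_init`.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def lambda_init_fn(layer_idx: int) -> float:
    # Depth-dependent initial lambda from the paper; layer_idx is 1-based.
    return 0.8 - 0.6 * math.exp(-0.3 * (layer_idx - 1))


class DiffAttentionSketch(nn.Module):
    """Illustrative multi-head differential attention (hypothetical names)."""

    def __init__(self, d_model: int, n_heads: int, layer_idx: int = 1, eps: float = 1e-5):
        super().__init__()
        assert d_model % (2 * n_heads) == 0, "d_model must be divisible by 2 * n_heads"
        self.n_heads = n_heads
        self.d_head = d_model // (2 * n_heads)  # size of each query/key half
        self.eps = eps
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.lambda_init = lambda_init_fn(layer_idx)
        # Learnable vectors that re-parameterize the scalar lambda
        # (the 0.1 scale is an assumption for this sketch).
        self.lambda_q1 = nn.Parameter(torch.randn(self.d_head) * 0.1)
        self.lambda_k1 = nn.Parameter(torch.randn(self.d_head) * 0.1)
        self.lambda_q2 = nn.Parameter(torch.randn(self.d_head) * 0.1)
        self.lambda_k2 = nn.Parameter(torch.randn(self.d_head) * 0.1)
        # Gain of the per-head RMSNorm applied to each head's output.
        self.head_gain = nn.Parameter(torch.ones(2 * self.d_head))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        h, d = self.n_heads, self.d_head
        # Split queries and keys into two halves per head: (2, B, h, T, d).
        q = self.q_proj(x).view(B, T, h, 2, d).permute(3, 0, 2, 1, 4)
        k = self.k_proj(x).view(B, T, h, 2, d).permute(3, 0, 2, 1, 4)
        # Values keep the full 2d width per head: (B, h, T, 2d).
        v = self.v_proj(x).view(B, T, h, 2 * d).transpose(1, 2)

        scores = q @ k.transpose(-1, -2) / math.sqrt(d)  # (2, B, h, T, T)
        # Causal mask: position t may only attend to positions <= t.
        future = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        attn = F.softmax(scores.masked_fill(future, float("-inf")), dim=-1)

        lam = (
            torch.exp(torch.dot(self.lambda_q1, self.lambda_k1))
            - torch.exp(torch.dot(self.lambda_q2, self.lambda_k2))
            + self.lambda_init
        )
        # Differential attention: subtract the second map from the first.
        out = (attn[0] - lam * attn[1]) @ v  # (B, h, T, 2d)

        # Per-head RMSNorm, then the fixed (1 - lambda_init) rescaling.
        out = out * torch.rsqrt(out.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        out = out * self.head_gain * (1.0 - self.lambda_init)

        out = out.transpose(1, 2).reshape(B, T, h * 2 * d)
        return self.o_proj(out)


torch.manual_seed(0)
layer = DiffAttentionSketch(d_model=64, n_heads=4, layer_idx=1)
x = torch.randn(2, 10, 64)  # (batch, sequence length, d_model)
y = layer(x)
print(y.shape)  # torch.Size([2, 10, 64])
```

Subtracting the two maps is meant to cancel attention weight that both maps assign to irrelevant context. In this sketch the queries and keys are split in half so that the layer's projection sizes match those of a standard attention layer with the same `d_model`.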

```bibtex
@misc{ye2024differentialtransformer,
title={Differential Transformer},
author={Tianzhu Ye and Li Dong and Yuqing Xia and Yutao Sun and Yi Zhu and Gao Huang and Furu Wei},
year={2024},
eprint={2410.05258},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.05258},
}
```