https://github.com/kyegomez/differentialtransformer
An open source community implementation of the model from the "DIFFERENTIAL TRANSFORMER" paper by Microsoft.
- Host: GitHub
- URL: https://github.com/kyegomez/differentialtransformer
- Owner: kyegomez
- License: MIT
- Created: 2024-10-12T20:16:59.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-04-19T12:55:12.000Z (6 months ago)
- Last Synced: 2025-04-19T20:16:52.669Z (6 months ago)
- Topics: ai, attention, ml, rnns, ssm, transformers, transformers-library, transformers-models
- Language: Python
- Homepage: https://discord.com/servers/agora-999382051935506503
- Size: 2.16 MB
- Stars: 24
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# Differential Transformer
An open source community implementation of the model from the "DIFFERENTIAL TRANSFORMER" paper by Microsoft. [Paper Link](https://arxiv.org/abs/2410.05258). "Differential attention takes the difference between two softmax attention functions to eliminate attention noise. The idea is analogous to differential amplifiers [19] proposed in electrical engineering, where the difference between two signals is used as output, so that we can null out the common-mode noise of the input. In addition, the design of noise-canceling headphones is based on a similar idea. We can directly reuse FlashAttention [8] as described in Appendix A, which significantly improves model efficiency."
[Discord](https://discord.gg/agora-999382051935506503) [YouTube](https://www.youtube.com/@kyegomez3242) [LinkedIn](https://www.linkedin.com/in/kye-g-38759a207/) [X](https://x.com/kyegomezb)
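
As a quick illustration of the mechanism described in the quote above, here is a minimal sketch of differential attention applied to already-projected queries, keys, and values. The function name, tensor shapes, and the scalar `lam` are illustrative assumptions for this sketch only; they are not this package's API (the actual model lives in `differential_transformer.main`).

```python
import torch
import torch.nn.functional as F


def differential_attention(q1, k1, q2, k2, v, lam):
    """Difference of two softmax attention maps (illustrative sketch).

    q1, k1, q2, k2: (batch, seq_len, d) query/key projections
    v:              (batch, seq_len, d_v) values
    lam:            scalar weight on the second attention map
    """
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    # Subtracting the second map is what cancels the "common-mode" attention noise.
    return (a1 - lam * a2) @ v


# Tiny smoke test with random tensors
q1, k1, q2, k2 = (torch.randn(2, 16, 32) for _ in range(4))
v = torch.randn(2, 16, 32)
out = differential_attention(q1, k1, q2, k2, v, lam=0.8)
print(out.shape)  # torch.Size([2, 16, 32])
```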
## Install
```bash
$ pip3 install differential-transformers
```

## Usage: Transformer
```python
import torch
from differential_transformer.main import DifferentialTransformer
from loguru import logger

# Example dimensions (batch_size and seq_len shown for reference)
batch_size = 32
seq_len = 128
embedding_dim = 64
h = 8
λ = 0.1
λinit = 0.05

# Create a random input tensor of token ids
x = torch.randint(0, 256, (1, 1024))

# Instantiate and run the multi-head attention
multi_head = DifferentialTransformer(heads=h, dim=embedding_dim, λinit=λinit)
output = multi_head(x, λ=λ)

logger.info(f"Output shape: {output.shape}")
```
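
The `λ` and `λinit` arguments weight the second softmax map that gets subtracted. In the paper, λ is re-parameterized from learnable vectors and its initialization follows a fixed per-layer schedule; the snippet below is a sketch of that schedule only, and the `lambda_init` helper is hypothetical rather than part of this package.

```python
import math


def lambda_init(layer_index: int) -> float:
    """Per-layer λ_init schedule from the paper: 0.8 - 0.6 * exp(-0.3 * (l - 1))."""
    return 0.8 - 0.6 * math.exp(-0.3 * (layer_index - 1))


# First four layers: ~[0.2, 0.356, 0.471, 0.556]
print([round(lambda_init(l), 3) for l in range(1, 5)])
```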
# License
MIT

## Citation
```bibtex
@misc{ye2024differentialtransformer,
    title={Differential Transformer},
    author={Tianzhu Ye and Li Dong and Yuqing Xia and Yutao Sun and Yi Zhu and Gao Huang and Furu Wei},
    year={2024},
    eprint={2410.05258},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2410.05258},
}
```