https://github.com/lucidrains/agent-attention-pytorch
Implementation of Agent Attention in Pytorch
- Host: GitHub
- URL: https://github.com/lucidrains/agent-attention-pytorch
- Owner: lucidrains
- License: mit
- Created: 2023-12-18T20:38:16.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-10T16:34:37.000Z (about 1 year ago)
- Last Synced: 2025-05-09T21:35:26.170Z (2 months ago)
- Topics: artificial-intelligence, attention-mechanisms, deep-learning, linear-attention
- Language: Python
- Homepage:
- Size: 516 KB
- Stars: 89
- Watchers: 3
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Agent Attention - Pytorch
Implementation of Agent Attention in Pytorch.
This work seems to be an elegant simplification of the `ISAB` architecture from the Set Transformers paper, requiring only one attention block rather than two. While ISAB works, I have found it to be a bit unstable, so I am curious whether the simplification in this work resolves that issue.
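For intuition, the core idea of agent attention is that a small set of agent tokens mediates between the sequence's queries and keys: the agents first summarize the sequence with one softmax attention, then the sequence reads back from the agents, so the cost stays linear in sequence length. Below is a minimal single-head sketch of that computation (hypothetical shapes, no learned projections, agents initialized randomly purely for illustration), not the internals of this library.

```python
import torch
import torch.nn.functional as F

n, m, d = 1024, 64, 64        # sequence length, number of agent tokens, head dimension
q = torch.randn(n, d)         # queries from the sequence
k = torch.randn(n, d)         # keys from the sequence
v = torch.randn(n, d)         # values from the sequence
agents = torch.randn(m, d)    # agent tokens (learned in practice, random here)

scale = d ** -0.5

# step 1: agents attend to the full sequence - an (m x n) softmax attention
agent_values = F.softmax(agents @ k.t() * scale, dim = -1) @ v     # (m, d)

# step 2: the sequence attends to the agents - an (n x m) softmax attention
out = F.softmax(q @ agents.t() * scale, dim = -1) @ agent_values   # (n, d)

assert out.shape == (n, d)
```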
This repository will add support for variable sequence lengths (masking) and post-softmax talking heads.
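Post-softmax talking heads, following the Shazeer et al. paper cited below, mixes the attention probabilities across heads with a small learned matrix after the softmax. A hedged sketch of that step alone (hypothetical shapes, random mixing matrix standing in for a learned parameter):

```python
import torch
from torch import einsum

# hypothetical shapes: batch 2, 8 heads, 128 query and 128 key positions
attn = torch.randn(2, 8, 128, 128).softmax(dim = -1)   # post-softmax attention maps
mix = torch.randn(8, 8) * 0.02                          # head-mixing matrix (learned in practice)

# blend the attention maps across heads before they are applied to the values
attn = einsum('b h i j, h g -> b g i j', attn, mix)
```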
## Appreciation
- A16Z Open Source AI Grant Program and 🤗 Huggingface for the generous sponsorships, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research
## Install
```bash
$ pip install agent-attention-pytorch
```

## Usage
```python
import torch
from agent_attention_pytorch import AgentSelfAttention

attn = AgentSelfAttention(
dim = 512,
num_agent_tokens = 256, # number of "agent" tokens
dim_head = 64, # attention head dimension
heads = 8 # number of heads
)

x = torch.randn(3, 65536, 512)
mask = torch.ones(3, 65536).bool()

out = attn(x, mask = mask)
assert out.shape == x.shape
```

For a full-fledged linear transformer based on agent tokens, just import `AgentTransformer`.
```python
import torch
from agent_attention_pytorch import AgentTransformer

transformer = AgentTransformer(
dim = 512,
depth = 6,
num_agent_tokens = 128,
dim_head = 64,
heads = 8
)

x = torch.randn(3, 65536, 512)
mask = torch.ones(3, 65536).bool()

out, agent_tokens = transformer(x, mask = mask, return_agent_tokens = True)
# (3, 65536, 512), (3, 128, 512)
assert out.shape == x.shape
```

## Citations
```bibtex
@inproceedings{Han2023AgentAO,
title = {Agent Attention: On the Integration of Softmax and Linear Attention},
author = {Dongchen Han and Tianzhu Ye and Yizeng Han and Zhuofan Xia and Shiji Song and Gao Huang},
year = {2023},
url = {https://api.semanticscholar.org/CorpusID:266210414}
}
```

```bibtex
@misc{shazeer2020talkingheads,
title = {Talking-Heads Attention},
author = {Noam Shazeer and Zhenzhong Lan and Youlong Cheng and Nan Ding and Le Hou},
year = {2020},
eprint = {2003.02436},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
```

```bibtex
@article{Bondarenko2023QuantizableTR,
title = {Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},
author = {Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},
journal = {ArXiv},
year = {2023},
volume = {abs/2306.12929},
url = {https://api.semanticscholar.org/CorpusID:259224568}
}
```

```bibtex
@article{Wang2022FoundationT,
title = {Foundation Transformers},
author = {Hongyu Wang and Shuming Ma and Shaohan Huang and Li Dong and Wenhui Wang and Zhiliang Peng and Yu Wu and Payal Bajaj and Saksham Singhal and Alon Benhaim and Barun Patra and Zhun Liu and Vishrav Chaudhary and Xia Song and Furu Wei},
journal = {ArXiv},
year = {2022},
volume = {abs/2210.06423},
url = {https://api.semanticscholar.org/CorpusID:252846241}
}
```