https://github.com/lucidrains/bidirectional-cross-attention
A simple cross attention that updates both the source and target in one step
artificial-intelligence attention-mechanism deep-learning
- Host: GitHub
- URL: https://github.com/lucidrains/bidirectional-cross-attention
- Owner: lucidrains
- License: mit
- Created: 2022-03-27T16:20:53.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-05-07T14:40:35.000Z (about 2 years ago)
- Last Synced: 2025-03-31T15:20:04.664Z (about 1 year ago)
- Topics: artificial-intelligence, attention-mechanism, deep-learning
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 166
- Watchers: 4
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Bidirectional Cross Attention
A simple cross attention that updates both the source and target in one step. The key insight is that one can do shared query / key attention and use the same attention matrix twice, once in each direction, to update both sequences. It was used in a contracting project for predicting DNA / protein binding.
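The idea can be sketched in a few lines of PyTorch. This is a minimal single-head illustration, not the library's implementation: the function name `bidirectional_attend` and the weight matrices are hypothetical, and masking, multiple heads, and output projections are left out. It assumes the similarity matrix is softmaxed along each axis in turn to get the two update directions.

```python
import torch

def bidirectional_attend(x, context, w_qk, w_context_qk, w_v, w_context_v):
    # Hypothetical single-head sketch: no masks, no heads, no output projection.
    # x: (batch, n, dim), context: (batch, m, context_dim)
    qk = x @ w_qk                        # shared query / key for x
    context_qk = context @ w_context_qk  # shared query / key for context
    v = x @ w_v
    context_v = context @ w_context_v

    # One similarity matrix between every x position and every context position.
    scale = qk.shape[-1] ** -0.5
    sim = torch.einsum('bid,bjd->bij', qk, context_qk) * scale  # (batch, n, m)

    # Reuse the same matrix twice, normalizing over a different axis each time.
    attn = sim.softmax(dim=-1)          # each x position attends over context
    context_attn = sim.softmax(dim=-2)  # each context position attends over x

    out = torch.einsum('bij,bjd->bid', attn, context_v)          # (batch, n, d)
    context_out = torch.einsum('bij,bid->bjd', context_attn, v)  # (batch, m, d)
    return out, context_out

torch.manual_seed(0)
x = torch.randn(2, 5, 16)
context = torch.randn(2, 7, 12)

out, context_out = bidirectional_attend(
    x,
    context,
    w_qk = torch.randn(16, 8),
    w_context_qk = torch.randn(12, 8),
    w_v = torch.randn(16, 8),
    w_context_v = torch.randn(12, 8),
)
```

In this sketch the outputs live in the shared inner dimension (8 here); a full module would project them back to `dim` and `context_dim` so the outputs match the input shapes.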
## Install
```bash
$ pip install bidirectional-cross-attention
```
## Usage
```python
import torch
from bidirectional_cross_attention import BidirectionalCrossAttention

video = torch.randn(1, 4096, 512)
audio = torch.randn(1, 8192, 386)

video_mask = torch.ones((1, 4096)).bool()
audio_mask = torch.ones((1, 8192)).bool()

joint_cross_attn = BidirectionalCrossAttention(
    dim = 512,
    heads = 8,
    dim_head = 64,
    context_dim = 386
)

video_out, audio_out = joint_cross_attn(
    video,
    audio,
    mask = video_mask,
    context_mask = audio_mask
)

# attended output should have the same shape as input
assert video_out.shape == video.shape
assert audio_out.shape == audio.shape
```
## Todo
- [ ] allow for cosine sim attention
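For context, cosine sim attention usually means L2-normalizing the queries and keys before the dot product and multiplying by a fixed or learned temperature. The snippet below is only a rough sketch of that similarity step under those assumptions; the function name `cosine_sim_scores` and the default `scale` of 10 are hypothetical and not part of this library.

```python
import torch
import torch.nn.functional as F

def cosine_sim_scores(qk, context_qk, scale = 10.0):
    # Hypothetical helper: L2-normalize both sides so each dot product is a cosine.
    qk = F.normalize(qk, dim = -1)
    context_qk = F.normalize(context_qk, dim = -1)
    # Cosines lie in [-1, 1], so a temperature replaces the usual dim ** -0.5 scaling.
    return torch.einsum('bid,bjd->bij', qk, context_qk) * scale

torch.manual_seed(0)
scores = cosine_sim_scores(torch.randn(2, 5, 8), torch.randn(2, 7, 8))
```

The resulting matrix could then be softmaxed along each axis in the same way as a dot-product similarity matrix.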
## Citations
```bibtex
@article{Hiller2024PerceivingLS,
    title   = {Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers},
    author  = {Markus Hiller and Krista A. Ehinger and Tom Drummond},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.12138},
    url     = {https://api.semanticscholar.org/CorpusID:267751060}
}
```