# Attention Mask Patterns
Using FlexAttention to compute attention with different masking patterns.
The speedup over `F.scaled_dot_product_attention`, xFormers, and FlashAttention-2 (FA2) tends to grow with sequence length. Timing plots are shown for several sequence lengths; the sequence length used is noted in each plot's title.
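For context, here is a minimal sketch of the FlexAttention workflow these benchmarks follow. The shapes, the helper name `run`, and the use of `torch.compile` are illustrative assumptions, not the repo's exact code:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def run(mask_mod, S=4096, B=2, H=8, D=64, device="cuda"):
    """Build a block mask from `mask_mod` and run FlexAttention once."""
    q = torch.randn(B, H, S, D, device=device, dtype=torch.float16)
    k = torch.randn(B, H, S, D, device=device, dtype=torch.float16)
    v = torch.randn(B, H, S, D, device=device, dtype=torch.float16)
    # B=None / H=None broadcast the same mask over batch and heads.
    block_mask = create_block_mask(mask_mod, B=None, H=None, Q_LEN=S, KV_LEN=S, device=device)
    # Compiling the kernel is what realizes the measured speedups.
    return torch.compile(flex_attention)(q, k, v, block_mask=block_mask)
```

Each section below gives the masking pattern and the corresponding timing results.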
### Causal mask
*(Mask visualization and execution-time plot.)*
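A `mask_mod` for this pattern is the standard causal predicate:

```python
def causal_mask(b, h, q_idx, kv_idx):
    # Each query attends to itself and all earlier positions.
    return q_idx >= kv_idx
```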
### Causal sliding window mask
*(Mask visualization and execution-time plot.)*
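A minimal sketch of this pattern's `mask_mod`; the window size `WINDOW` is an assumed illustrative value, not necessarily what the repo uses:

```python
WINDOW = 256  # assumed window size

def causal_sliding_window_mask(b, h, q_idx, kv_idx):
    # Causal, but restricted to the most recent WINDOW tokens.
    return (q_idx >= kv_idx) & (q_idx - kv_idx <= WINDOW)
```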
### Bidirectional sliding window mask
*(Mask visualization and execution-time plot.)*
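A sketch of the bidirectional variant, again with an assumed window size:

```python
WINDOW = 256  # assumed window size

def bidirectional_sliding_window_mask(b, h, q_idx, kv_idx):
    # Attend to any token within WINDOW positions, past or future.
    return (q_idx - kv_idx).abs() <= WINDOW
```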
### Bidirectional dilated sliding window mask
*(Mask visualization and execution-time plot.)*
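A sketch of the dilated variant; `WINDOW` and `DILATION` are assumed values:

```python
WINDOW, DILATION = 256, 2  # assumed window size and dilation factor

def dilated_sliding_window_mask(b, h, q_idx, kv_idx):
    # Within the window, keep only every DILATION-th position.
    in_window = (q_idx - kv_idx).abs() <= WINDOW
    on_stride = (q_idx - kv_idx) % DILATION == 0
    return in_window & on_stride
```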
### Bidirectional global + local sliding window attention mask
*(Mask visualization and execution-time plot.)*
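A sketch of a Longformer-style pattern, assuming the first `NUM_GLOBAL` tokens are the global ones:

```python
WINDOW, NUM_GLOBAL = 256, 16  # assumed local window size and global-token count

def global_local_mask(b, h, q_idx, kv_idx):
    local = (q_idx - kv_idx).abs() <= WINDOW
    # Global tokens attend everywhere and are attended to by everyone.
    global_ = (q_idx < NUM_GLOBAL) | (kv_idx < NUM_GLOBAL)
    return local | global_
```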
### PrefixLM mask
*(Mask visualization and execution-time plot.)*
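A sketch of the PrefixLM predicate, with an assumed prefix length:

```python
PREFIX_LEN = 128  # assumed prefix length

def prefix_lm_mask(b, h, q_idx, kv_idx):
    # Bidirectional over the prefix, causal everywhere else.
    return (q_idx >= kv_idx) | (kv_idx < PREFIX_LEN)
```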
### Multi-document bidirectional mask
*(Mask visualization and execution-time plot.)*
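A sketch for packed documents; the `document_ids` tensor (one document id per token position) is an assumed illustrative input:

```python
import torch

# Assumed: three packed documents of lengths 300, 500, and 224.
document_ids = torch.tensor([0] * 300 + [1] * 500 + [2] * 224, device="cuda")

def multi_doc_bidirectional_mask(b, h, q_idx, kv_idx):
    # Tokens attend freely, but only within their own document.
    return document_ids[q_idx] == document_ids[kv_idx]
```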
### Multi-document causal mask
*(Mask visualization and execution-time plot.)*
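The causal variant simply conjoins the same-document check (reusing the assumed `document_ids` tensor from the previous sketch) with the causal predicate:

```python
def multi_doc_causal_mask(b, h, q_idx, kv_idx):
    # Within-document attention, restricted to earlier positions.
    same_doc = document_ids[q_idx] == document_ids[kv_idx]
    return same_doc & (q_idx >= kv_idx)
```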
### Multi-document prefixLM mask
*(Mask visualization and execution-time plot.)*
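A sketch of per-document PrefixLM; the `prefix_lens` and `doc_starts` tensors are assumed illustrative metadata, not the repo's actual layout:

```python
import torch

# Assumed: three packed documents with per-document prefix lengths.
document_ids = torch.tensor([0] * 300 + [1] * 500 + [2] * 224, device="cuda")
prefix_lens = torch.tensor([32, 64, 16], device="cuda")   # prefix length per document
doc_starts = torch.tensor([0, 300, 800], device="cuda")   # start offset per document

def multi_doc_prefix_lm_mask(b, h, q_idx, kv_idx):
    same_doc = document_ids[q_idx] == document_ids[kv_idx]
    # Position of the key token relative to the start of its document.
    kv_local = kv_idx - doc_starts[document_ids[kv_idx]]
    in_prefix = kv_local < prefix_lens[document_ids[kv_idx]]
    return same_doc & ((q_idx >= kv_idx) | in_prefix)
```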
### Stand-alone Self-Attention mask
(Reference - [attention-gym repo](https://github.com/pytorch-labs/attention-gym/blob/75867424a1d4391bff49527029d3612a09dd67e2/examples/flex_attn.ipynb))
*(Mask visualization and execution-time plot.)*
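A sketch of the 2D-local pattern from the referenced notebook, treating tokens as pixels in row-major order; `IMG_W` and `KERNEL` are assumed values:

```python
IMG_W = 32   # assumed image width
KERNEL = 7   # assumed (odd) attention kernel size

def sasa_mask(b, h, q_idx, kv_idx):
    # Recover 2D coordinates from flattened row-major indices.
    q_y, q_x = q_idx // IMG_W, q_idx % IMG_W
    kv_y, kv_x = kv_idx // IMG_W, kv_idx % IMG_W
    # Attend within a KERNEL x KERNEL neighbourhood around the query pixel.
    half = KERNEL // 2
    return ((q_y - kv_y).abs() <= half) & ((q_x - kv_x).abs() <= half)
```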
## Requirements
* PyTorch nightly (for FlexAttention, slated for release with PyTorch 2.5)
* See `requirements.txt` for the remaining dependencies