https://github.com/tech-srl/layer_norm_expressivity_role
Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)
- Host: GitHub
- URL: https://github.com/tech-srl/layer_norm_expressivity_role
- Owner: tech-srl
- Created: 2023-05-03T11:37:03.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-27T12:14:53.000Z (over 1 year ago)
- Last Synced: 2025-04-11T23:47:51.987Z (about 1 year ago)
- Topics: attention, layer-normalization, layernorm, transformers
- Language: Python
- Size: 748 KB
- Stars: 46
- Watchers: 6
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Citation: CITATION.cff
# On the Expressivity Role of LayerNorm in Transformers' Attention
This repository contains the code to reproduce the results of "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023) [[PDF]](https://arxiv.org/pdf/2305.02582.pdf).

## Setup
Make sure you have a [wandb.ai](https://wandb.ai) account and that you are [logged in](https://docs.wandb.ai/ref/cli/wandb-login) on your machine.
Install the required Python packages:
```
pip install -r requirements.txt
```
Gurobi is needed to find unselectable keys, and it requires a license. Academic licenses are described [here](https://www.gurobi.com/academia/academic-program-and-licenses/).
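
To give some intuition for what an "unselectable" key is, here is a toy sketch. It is not the repository's Gurobi-based procedure: it swaps in a brute-force scan over sampled 2-D query directions, and the names `selectable_keys`, `rescale_to_norm`, and `toy_keys` are made up for illustration. A key counts as selectable here if it obtains the highest dot-product score for at least one sampled query.

```python
import math


def selectable_keys(keys, num_directions=3600):
    """Return indices of 2-D keys that win the dot-product argmax for at
    least one sampled unit-norm query direction (a heuristic, not an exact
    solver: it only tests the sampled directions)."""
    winners = set()
    for step in range(num_directions):
        angle = 2.0 * math.pi * step / num_directions
        qx, qy = math.cos(angle), math.sin(angle)
        scores = [qx * kx + qy * ky for kx, ky in keys]
        winners.add(max(range(len(keys)), key=scores.__getitem__))
    return winners


def rescale_to_norm(keys, target_norm):
    """Scale every key to the same Euclidean norm (keys must be non-zero)."""
    rescaled = []
    for kx, ky in keys:
        norm = math.hypot(kx, ky)
        rescaled.append((kx * target_norm / norm, ky * target_norm / norm))
    return rescaled


# Hypothetical data: three triangle vertices plus one key strictly inside
# the triangle (index 3).
toy_keys = [(2.0, 0.0), (-1.0, 2.0), (-1.0, -2.0), (0.2, 0.1)]

before = selectable_keys(toy_keys)
after = selectable_keys(rescale_to_norm(toy_keys, 1.0))
print("selectable before rescaling:", sorted(before))
print("selectable after rescaling: ", sorted(after))
```

In this toy setup the interior key never wins before rescaling, because some vertex of the surrounding triangle always scores higher. Once all keys share the same norm, each key wins for queries pointing in its own direction.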
## Hardware
In general, all experiments can run on either GPU or CPU.
## Code Structure
1. The `majority` subdirectory contains the files needed to reproduce the results of the Majority task (Figures 1a, 1b, 2, and 3).
2. The `unselectable` subdirectory contains the files needed to reproduce the results of the unselectable-keys experiments (Figures 1c, 1d, and 4; Tables 1 and 2).
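
As background for both subdirectories, the paper views LayerNorm as two geometric steps: projecting the input onto the hyperplane orthogonal to the all-ones vector, then scaling it to norm √d. The sketch below checks both properties on one vector. It is a pure-Python illustration, not code from this repository; it assumes no learned gain or bias and an epsilon of zero, and `layer_norm` and the sample vector are hypothetical.

```python
import math


def layer_norm(vector):
    """LayerNorm without learned gain/bias and with epsilon = 0 (assumption).

    Raises ZeroDivisionError for a constant vector, whose variance is zero.
    """
    d = len(vector)
    mean = sum(vector) / d
    centered = [x - mean for x in vector]          # projection step
    variance = sum(c * c for c in centered) / d    # biased variance
    std = math.sqrt(variance)
    return [c / std for c in centered]             # scaling step


x = [3.0, -1.0, 4.0, 2.0]   # hypothetical input with d = 4
y = layer_norm(x)

dot_with_ones = sum(y)                      # inner product with [1, 1, 1, 1]
norm = math.sqrt(sum(v * v for v in y))     # Euclidean norm of the output

print("dot product with all-ones vector:", dot_with_ones)
print("norm:", norm, "expected sqrt(d):", math.sqrt(len(x)))
```

Up to floating-point error, the output is orthogonal to the all-ones vector and has norm √d, which is 2 for this four-dimensional input.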
## Citation
[On the Expressivity Role of LayerNorm in Transformers' Attention](https://arxiv.org/pdf/2305.02582.pdf)
```
@article{brody2023expressivity,
title={On the Expressivity Role of LayerNorm in Transformers' Attention},
author={Brody, Shaked and Alon, Uri and Yahav, Eran},
journal={arXiv preprint arXiv:2305.02582},
year={2023}
}
```