https://github.com/berlino/seq_icl

Last synced: 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/berlino/seq_icl
Owner: berlino
License: apache-2.0
Created: 2023-08-10T14:50:00.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-04-16T16:03:45.000Z (about 1 year ago)
Last Synced: 2024-04-16T21:14:12.370Z (about 1 year ago)
Language: Jupyter Notebook
Size: 2.4 MB
Stars: 37
Watchers: 4
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: readme.md
- License: LICENSE

Awesome Lists containing this project

README

# In-context Language Learning: Architectures and Algorithms [WIP]

This repo serves for the experiments for the paper:

Title: [In-context Language Learning: Architectures and Algorithms](https://arxiv.org/abs/2401.12973)

Authors : Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas

## Setup

```bash
conda create -n seq_icl python=3.11
pip install -r requirements.txt
```

## Experiments

### Experiments on DFA

To run the training,
```bash
python -m train experiment=dfa/lstm
python -m train experiment=dfa/retnet
python -m train experiment=dfa/gla
python -m train experiment=dfa/transformer+
```

### Troubleshooting

* add `export PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin` so that ldconfig can work properly
* The MHA in simple\_lm.py use `num_heads`, but in other modules we use `n_heads`. The name needs to be changed for consistency, but they're kept as is for now.
* you might need to set up conv1d following the command in [this issue](https://github.com/state-spaces/mamba/issues/55)
```
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal_conv1d
git checkout v1.0.2 # this is the highest compatible version allowed by Mamba
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
```

### Acknowledgements

This repo is adapted from [safari](https://github.com/HazyResearch/safari/tree/main). Triton implementations are taken from [linear rnn](https://github.com/sustcsonglin/pytorch_linear_rnn).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/berlino/seq_icl

Awesome Lists containing this project

README