# configaformers
A Python library for highly configurable transformers, easing model architecture search and experimentation.

Special thanks to lucidrains (https://github.com/lucidrains) and Kharr.

## Notable Features
The main purpose of this library is to allow users to quickly construct transformers by editing config files. We will also provide prebuilt configurations for common or promising model architectures.

Another feature is our model compiler. When a model is initialized, it prints every module to the console, along with its input and output names and shapes. It also performs shape checking, which helps catch errors before data is run through the model.

## Setup
Requirements: PyTorch and einops
```bash
pip install torch einops  # install the requirements
git clone https://github.com/antofuller/configaformers.git
cd configaformers
```

## Usage
A quick demo that configures a 768-wide, 12-layer transformer with a language modeling head.

Import the library and create the token embedding block:

```python
from model_builder import ConfigaFormer
from prebuilt_blocks import get_transformer_block

model_dim = 768
num_heads = 12
vocab_size = 50257

# Token embedding block
emb = [{'type': 'embedding',
        'output_dim': model_dim,
        'num_classes': vocab_size}]
```

Use our prebuilt transformer block:

```python
t_block = get_transformer_block(num_heads=num_heads, dim=model_dim)
```

Create language modeling head:

```python
to_logits = [{'type': 'linear',
              'output_dim': vocab_size,
              'output_name': 'logits'}]
```

Assemble the blocks, define the input shapes, and initialize the model:

```python
my_blocks = [{"config": emb,
"repeat": 1},
{"config": t_block,
"repeat": 12},
{"config": to_logits,
"repeat": 1},
]

input_streams = {'emb_ids': ['B', 'L_in'], 'attn_offset': ['B', num_heads, 'L_in', 'L_in'],}

model = ConfigaFormer(blocks=my_blocks, input_shapes=input_streams).cuda()
```
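The built model is used like a regular PyTorch module above (note the `.cuda()` call), so standard tooling applies. As a small illustrative check, not part of the library's API, you can count its parameters, assuming `ConfigaFormer` is an ordinary `torch.nn.Module`:

```python
# Assumes `model` is the ConfigaFormer instance built above and behaves as a standard torch.nn.Module
num_params = sum(p.numel() for p in model.parameters())
print(f'Total parameters: {num_params / 1e6:.1f}M')
```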

This will print out the transformer config:

```bash
Block #1, 1x
embedding -> Input(s): emb_ids (BSZ, L_in) - Output(s): x (BSZ, L_in, 768)

Block #2, 12x
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): queries (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): keys (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): values (BSZ, L_in, 768)
make_heads -> Input(s): queries (BSZ, L_in, 768) - Output(s): queries (BSZ, 12, L_in, 64)
make_heads -> Input(s): keys (BSZ, L_in, 768) - Output(s): keys (BSZ, 12, L_in, 64)
make_heads -> Input(s): values (BSZ, L_in, 768) - Output(s): values (BSZ, 12, L_in, 64)
mha_dots -> Input(s): queries (BSZ, 12, L_in, 64), keys (BSZ, 12, L_in, 64) - Output(s): attn_dots (BSZ, 12, L_in, L_in)
merge_streams -> Input(s): attn_dots (BSZ, 12, L_in, L_in), attn_offset (B, 12, L_in, L_in) - Output(s): attn_dots (BSZ, 12, L_in, L_in)
mha_sum -> Input(s): values (BSZ, 12, L_in, 64), attn_dots (BSZ, 12, L_in, L_in) - Output(s): x (BSZ, 12, L_in, 64)
merge_heads -> Input(s): x (BSZ, 12, L_in, 64) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): x (BSZ, L_in, 768), residual (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 3072)
activation -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 3072)
linear -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): x (BSZ, L_in, 768), residual (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)

Block #3, 1x
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): logits (BSZ, L_in, 50257)
```

Before running the model, we need to get the attention offset (in this case, ALiBi with a causal mask):

```python
from utils import get_alibi

attn_offset = get_alibi(num_heads=12, max_length=1024)
```
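To show conceptually what `get_alibi` produces, the sketch below builds an ALiBi-style offset with a causal mask from scratch. It is an illustration only: `alibi_with_causal_mask` is a hypothetical stand-in, and the exact slope convention and return shape of the library's `get_alibi` may differ. The resulting offset is added to the raw attention scores, which appears to be how the `merge_streams` step in the printout above uses it.

```python
import torch

def alibi_with_causal_mask(num_heads: int, max_length: int) -> torch.Tensor:
    """Illustrative ALiBi bias plus causal mask, shaped (1, num_heads, L, L)."""
    # Head-specific slopes: the geometric sequence 2^(-8/H), 2^(-16/H), ... from the ALiBi paper
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

    # rel[i, j] = j - i: zero on the diagonal, negative for past positions, positive for future ones
    pos = torch.arange(max_length)
    rel = pos.view(1, -1) - pos.view(-1, 1)                                     # (L, L)

    # Linear penalty that grows with distance into the past
    bias = slopes.view(num_heads, 1, 1) * rel.view(1, max_length, max_length)   # (H, L, L)

    # Causal mask: future positions are pushed to -inf so softmax ignores them
    bias = bias.masked_fill(rel > 0, float('-inf'))
    return bias.unsqueeze(0)                                                    # (1, H, L, L)
```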

Now we can use the model:

```python
# `bsz` is the batch size and `batch_ids` holds the input token ids
# as a LongTensor of shape (bsz, 1024).

# Prepare the attention offset by repeating it over the batch dimension
attn_offset = attn_offset.repeat(bsz, 1, 1, 1)

input_data = {'emb_ids': batch_ids.view(bsz, 1024).cuda(),
              'attn_offset': attn_offset.cuda()}

logits = model(input_data)['logits'].view(bsz, 1024, 50257)
```
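To close the loop, here is a minimal next-token training step built on the demo above. The random `batch_ids` tensor is a placeholder for real token ids, and the loss is standard cross-entropy with shifted targets rather than anything specific to configaformers:

```python
import torch
import torch.nn.functional as F

bsz = 2
batch_ids = torch.randint(0, vocab_size, (bsz, 1024))   # placeholder token ids

attn_offset = get_alibi(num_heads=12, max_length=1024).repeat(bsz, 1, 1, 1)
input_data = {'emb_ids': batch_ids.cuda(),
              'attn_offset': attn_offset.cuda()}

logits = model(input_data)['logits']                     # (bsz, 1024, vocab_size)

# Next-token prediction: predict position t+1 from positions <= t
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                       batch_ids[:, 1:].reshape(-1).cuda())
loss.backward()
```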

## Features on the way...
1. Revamp rearrange module
2. Product-Key memories
3. Create more prebuilt blocks
4. Improve attention offsets and masking
5. Experiment with Triton for speed-up