https://github.com/moskomule/simple_transformers

Simple transformer implementations that I can understand

Host: GitHub
URL: https://github.com/moskomule/simple_transformers
Owner: moskomule
Created: 2021-02-17T13:35:25.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2021-12-28T05:15:48.000Z (about 3 years ago)
Last Synced: 2024-08-06T21:21:50.705Z (5 months ago)
Topics: transformers
Language: Python
Homepage:
Size: 119 KB
Stars: 20
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md


# Simple Transformers

Simple transformer implementations that I can understand.

## Requirements

```commandline
conda create -n transformer python=3.9
conda activate transformer
conda install -c pytorch -c conda-forge pytorch torchvision cudatoolkit=11.1
pip install -U homura-core chika rich
# For NLP, also install
pip install -U datasets tokenizers
# To use checkpointing
pip install -U fairscale
# To accelerate (probably only slightly)
pip install -U opt_einsum
```

## Examples

### Language Modeling

#### GPT

Train GPT-like models on wikitext or GigaText. Three Transformer block variants are currently available for
comparison: improved pre-LN (`ipre_ln`), pre-LN (`pre_ln`), and post-LN (`post_ln`).

```commandline
python gpt.py [--model.block {ipre_ln, pre_ln, post_ln}] [--amp]
```
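
The pre-LN and post-LN variants differ only in where layer normalization sits relative to the residual connection. The following is a minimal, dependency-free sketch of that ordering, not the code in `gpt.py`: `toy_sublayer` is a hypothetical stand-in for attention or the MLP, and `layer_norm` omits the learned scale and bias. The improved pre-LN variant is not sketched because this README does not define it.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used here for the residual connection."""
    return [p + q for p, q in zip(a, b)]


def post_ln_step(x: Vector, sublayer: Sublayer) -> Vector:
    """Post-LN: apply the sublayer, add the residual, then normalize the sum."""
    return layer_norm(add(x, sublayer(x)))


def pre_ln_step(x: Vector, sublayer: Sublayer) -> Vector:
    """Pre-LN: normalize the input, apply the sublayer, then add the unnormalized residual."""
    return add(x, sublayer(layer_norm(x)))


def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical stand-in for attention or the MLP: halve every element."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
post_out = post_ln_step(x, toy_sublayer)  # output is normalized
pre_out = pre_ln_step(x, toy_sublayer)    # residual path passes through unnormalized
```

A full Transformer block applies such a step twice, once around attention and once around the MLP. In the pre-LN form the residual stream is never normalized inside the block, which is why pre-LN models usually add one final normalization before the output layer.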

#### BERT

Work in progress

### Image Recognition

Train ImageNet classification models.

Currently, ViT and CaiT are implemented.

```commandline
# Single-process training
python vit.py [--amp] [--model.ema]
# Multi-process training
python -m torch.distributed.launch --nproc_per_node=2 vit.py ...
```
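
This README does not describe what `--model.ema` does. Assuming it enables an exponential moving average (EMA) of the model weights, a common technique in image-classification training, the update rule can be sketched as below. The function `ema_update`, the dictionary-of-floats weights, and the decay values are all illustrative assumptions, not the interface of `vit.py`.

```python
from typing import Dict


def ema_update(ema: Dict[str, float], current: Dict[str, float], decay: float = 0.999) -> None:
    """Update the averaged weights in place: ema = decay * ema + (1 - decay) * current."""
    for name, value in current.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


# Toy "model" with a single scalar weight (assumption for illustration only).
weights = {"w": 0.0}
ema_weights = dict(weights)  # the averaged copy starts from the initial weights

for step in range(3):
    weights["w"] += 1.0  # stand-in for an optimizer step
    ema_update(ema_weights, weights, decay=0.5)
```

The averaged copy lags behind the raw weights and smooths out step-to-step noise. In practice the decay is usually close to 1, and the averaged weights are the ones used for evaluation.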

## Acknowledgement

For this project, I learned a lot from Andrej Karpathy's [minGPT](https://github.com/karpathy/mingpt),
Ross Wightman's [timm](https://github.com/rwightman/pytorch-image-models), and
FAIR's [ClassyVision](https://github.com/facebookresearch/ClassyVision).