# Awesome Transformers

![Transformers](logo.png)

A curated list of awesome transformer models.

If you want to contribute to this list, send a pull request or reach out to me on Twitter: [@abacaj](https://twitter.com/abacaj). Let's make this list useful.

A number of the models listed are not entirely open source (non-commercial licenses, etc.); this repository should also serve to make you aware of that. Tracking the original source/company of each model helps with this.

I would also eventually like to add model use cases, so it is easier for others to find the right one to fine-tune.

_Format_:

- Model name: short description, usually taken from the paper
- Model link (usually Hugging Face or GitHub)
- Paper link
- Source (company or group)
- Model license

## Table of Contents

- [Encoder (autoencoder) models](#encoder)
  - [ALBERT](#albert)
  - [BERT](#bert)
  - [DistilBERT](#distilbert)
  - [DeBERTaV3](#debertav3)
  - [Electra](#electra)
  - [RoBERTa](#roberta)
- [Decoder (autoregressive) models](#decoder)
  - [BioGPT](#bio-gpt)
  - [CodeGen](#codegen)
  - [LLaMa](#llama)
  - [GPT](#gpt)
  - [GPT-2](#gpt-2)
  - [GPT-J](#gpt-j)
  - [GPT-NEO](#gpt-neo)
  - [GPT-NEOX](#gpt-neox)
  - [NeMo Megatron-GPT](#nemo)
  - [OPT](#opt)
  - [BLOOM](#bloom)
  - [GLM](#glm)
  - [YaLM](#yalm)
- [Encoder+decoder (seq2seq) models](#encoder-decoder)
  - [T5](#t5)
  - [FLAN-T5](#flan-t5)
  - [Code-T5](#code-t5)
  - [Bart](#bart)
  - [Pegasus](#pegasus)
  - [MT5](#mt5)
  - [UL2](#ul2)
  - [FLAN-UL2](#flanul2)
  - [EdgeFormer](#edgeformer)
- [Multimodal models](#multimodal)
  - [Donut](#donut)
  - [LayoutLMv3](#layoutlmv3)
  - [TrOCR](#trocr)
  - [CLIP](#clip)
  - [Unified-IO](#unifiedio)
- [Vision models](#vision)
  - [DiT](#dit)
  - [DETR](#detr)
  - [EfficientFormer](#efficientformer)
- [Audio models](#audio)
  - [Whisper](#whisper)
  - [VALL-E](#valle)
- [Recommendation models](#recommendation)
  - [P5](#p5)
- [Grounded Situation Recognition models](#gsr)
  - [GSRTR](#gsrtr)
  - [CoFormer](#coformer)

## Encoder models

- ALBERT: "A Lite" version of BERT
  - [Model](https://huggingface.co/models?other=albert)
  - [Paper](https://arxiv.org/pdf/1909.11942.pdf)
  - Google
  - Apache v2
- BERT: Bidirectional Encoder Representations from Transformers
  - [Model](https://huggingface.co/models?other=bert)
  - [Paper](https://arxiv.org/pdf/1810.04805.pdf)
  - Google
  - Apache v2
- DistilBERT: A distilled version of BERT that is smaller, faster, cheaper and lighter
  - [Model](https://huggingface.co/models?other=distilbert)
  - [Paper](https://arxiv.org/pdf/1910.01108.pdf)
  - HuggingFace
  - Apache v2
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
  - [Model](https://huggingface.co/models?sort=downloads&search=microsoft%2Fdeberta-v3)
  - [Paper](https://arxiv.org/pdf/2111.09543.pdf)
  - Microsoft
  - MIT
- Electra: Pre-training Text Encoders as Discriminators Rather Than Generators
  - [Model](https://huggingface.co/models?other=electra)
  - [Paper](https://arxiv.org/pdf/2003.10555.pdf)
  - Google
  - Apache v2
- RoBERTa: Robustly Optimized BERT Pretraining Approach
  - [Model](https://huggingface.co/models?other=roberta)
  - [Paper](https://arxiv.org/pdf/1907.11692.pdf)
  - Facebook
  - MIT
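
Most of the encoder checkpoints above can be loaded through the Hugging Face `transformers` library. Here is a minimal sketch (assuming `transformers` and `torch` are installed; `bert-base-uncased` is just one example checkpoint) that pulls sentence embeddings out of an encoder:

```python
# Minimal sketch: sentence embeddings from an encoder model.
# Assumes `pip install transformers torch`; "bert-base-uncased" is one example checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers are everywhere.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token representations into a single sentence vector.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # e.g. torch.Size([1, 768]) for bert-base-uncased
```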

## Decoder models

- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
  - [Model](https://huggingface.co/microsoft/biogpt)
  - [Paper](https://arxiv.org/pdf/2210.10341.pdf)
  - Microsoft
  - MIT
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
  - [Model](https://huggingface.co/models?sort=downloads&search=salesforce%2Fcodegen)
  - [Paper](https://arxiv.org/pdf/2203.13474.pdf)
  - Salesforce
  - BSD 3-Clause
- LLaMa: Open and Efficient Foundation Language Models
  - [Model](https://github.com/facebookresearch/llama)
  - [Paper](https://research.facebook.com/file/1574548786327032/LLaMA--Open-and-Efficient-Foundation-Language-Models.pdf)
  - Facebook
  - [Requires approval, non-commercial](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform)
- GPT: Improving Language Understanding by Generative Pre-Training
  - [Model](https://huggingface.co/openai-gpt)
  - [Paper](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)
  - OpenAI
  - MIT
- GPT-2: Language Models are Unsupervised Multitask Learners
  - [Model](https://huggingface.co/models?search=gpt-2)
  - [Paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
  - OpenAI
  - MIT
- GPT-J: A 6 Billion Parameter Autoregressive Language Model
  - [Model](https://huggingface.co/EleutherAI/gpt-j-6B)
  - [Paper](https://github.com/kingoflolz/mesh-transformer-jax)
  - EleutherAI
  - Apache v2
- GPT-NEO: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow
  - [Model](https://huggingface.co/models?search=gpt-neo)
  - [Paper](https://doi.org/10.5281/zenodo.5297715)
  - EleutherAI
  - MIT
- GPT-NEOX-20B: An Open-Source Autoregressive Language Model
  - [Model](https://huggingface.co/EleutherAI/gpt-neox-20b)
  - [Paper](https://arxiv.org/pdf/2204.06745.pdf)
  - EleutherAI
  - Apache v2
- NeMo Megatron-GPT: Megatron-GPT 20B is a transformer-based language model
  - [Model](https://huggingface.co/nvidia/nemo-megatron-gpt-20B)
  - [Paper](https://arxiv.org/pdf/1909.08053.pdf)
  - NVidia
  - CC BY 4.0
- OPT: Open Pre-trained Transformer Language Models
  - [Model](https://huggingface.co/models?search=facebook%2Fopt)
  - [Paper](https://arxiv.org/pdf/2205.01068.pdf?fbclid=IwAR1Fhxr_i3UK3ttigVDGBwbtO-3zLzjTwnyn0dkYt8rf6hxUAUS7Sk7VrYk)
  - Facebook
  - [Requires approval, non-commercial](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/MODEL_LICENSE.md?fbclid=IwAR2jiCf2R9fTouGGF7v8Tt7Yq8sSVOMot0YIE8ibaP9b2avxw2bEbEaTJZY)
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  - [Model](https://huggingface.co/bigscience/bloom)
  - [Paper](https://arxiv.org/pdf/2211.05100.pdf)
  - BigScience
  - [OpenRAIL, use-based restrictions](https://huggingface.co/spaces/bigscience/license)
- GLM: An Open Bilingual Pre-Trained Model
  - [Model](https://github.com/THUDM/GLM-130B)
  - [Paper](https://arxiv.org/pdf/2210.02414.pdf)
  - Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University
  - [Custom license, see restrictions](https://github.com/THUDM/GLM-130B/blob/main/MODEL_LICENSE)
- YaLM: Pretrained language model with 100B parameters
  - [Model](https://github.com/yandex/YaLM-100B)
  - [Paper](https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6)
  - Yandex
  - Apache v2
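
Most decoder models hosted on Hugging Face share the same causal-LM interface. A minimal generation sketch, assuming `transformers` and `torch` are installed and using `gpt2` as the example checkpoint:

```python
# Minimal sketch: free-form text generation with a decoder-only (causal) model.
# "gpt2" is just one example checkpoint; gated models (LLaMa, OPT-175B, ...) require approval first.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```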

## Encoder+decoder (seq2seq) models

- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  - [Model](https://huggingface.co/models?sort=downloads&search=t5)
  - [Paper](https://arxiv.org/pdf/1910.10683.pdf)
  - Google
  - Apache v2
- FLAN-T5: Scaling Instruction-Finetuned Language Models
  - [Model](https://huggingface.co/models?sort=downloads&search=flan-t5)
  - [Paper](https://arxiv.org/pdf/2210.11416.pdf)
  - Google
  - Apache v2
- Code-T5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
  - [Model](https://huggingface.co/models?search=code-t5)
  - [Paper](https://arxiv.org/pdf/2109.00859.pdf)
  - Salesforce
  - BSD 3-Clause
- Bart: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  - [Model](https://huggingface.co/facebook/bart-large)
  - [Paper](https://arxiv.org/pdf/1910.13461.pdf)
  - Facebook
  - Apache v2
- Pegasus: Pre-training with Extracted Gap-sentences for Abstractive Summarization
  - [Model](https://huggingface.co/models?sort=downloads&search=pegasus)
  - [Paper](https://arxiv.org/pdf/1912.08777.pdf)
  - Google
  - Apache v2
- MT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
  - [Model](https://huggingface.co/models?search=mt5)
  - [Paper](https://arxiv.org/pdf/2010.11934.pdf)
  - Google
  - Apache v2
- UL2: Unifying Language Learning Paradigms
  - [Model](https://huggingface.co/google/ul2)
  - [Paper](https://arxiv.org/pdf/2205.05131v1.pdf)
  - Google
  - Apache v2
- FLAN-UL2: A New Open Source Flan 20B with UL2
  - [Model](https://github.com/google-research/google-research/tree/master/ul2)
  - [Paper](https://arxiv.org/abs/2205.05131)
  - Google
  - Apache v2
- EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
  - [Model](https://github.com/microsoft/unilm/tree/master/edgelm)
  - [Paper](https://arxiv.org/pdf/2202.07959.pdf)
  - Microsoft
  - MIT
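
Seq2seq models take an input text and generate an output text, which makes them a natural fit for translation, summarization and instruction following. A minimal sketch, assuming `transformers` and `torch` are installed and using `google/flan-t5-small` as the example checkpoint:

```python
# Minimal sketch: instruction-style inference with an encoder+decoder (seq2seq) model.
# "google/flan-t5-small" is just one example checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

prompt = "Translate English to German: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```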

## Multimodal models

- Donut: OCR-free Document Understanding Transformer
  - [Model](https://huggingface.co/models?sort=downloads&search=clova%2Fdonut)
  - [Paper](https://arxiv.org/pdf/2111.15664.pdf)
  - ClovaAI
  - MIT
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
  - [Model](https://huggingface.co/models?sort=downloads&search=microsoft%2Flayoutlmv3)
  - [Paper](https://arxiv.org/pdf/2204.08387.pdf)
  - Microsoft
  - CC BY-NC-SA 4.0 (non-commercial)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
  - [Model](https://huggingface.co/models?search=trocr)
  - [Paper](https://arxiv.org/abs/2109.10282)
  - Microsoft
  - [Inherits MIT license](https://github.com/microsoft/unilm/tree/master/trocr#license)
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
  - [Model](https://huggingface.co/models?sort=downloads&search=openai%2Fclip)
  - [Paper](https://arxiv.org/pdf/2103.00020.pdf)
  - OpenAI
  - MIT
- Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
  - [Model](https://github.com/allenai/unified-io-inference)
  - [Paper](https://arxiv.org/pdf/2206.08916.pdf)
  - allenai
  - Apache v2
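
As an example of a multimodal workflow, CLIP scores how well a set of text labels matches an image. A minimal sketch, assuming `transformers`, `torch` and `Pillow` are installed; `openai/clip-vit-base-patch32` is one example checkpoint and `photo.jpg` is a placeholder path:

```python
# Minimal sketch: zero-shot image/text matching with CLIP.
# "openai/clip-vit-base-patch32" is one example checkpoint; "photo.jpg" is a placeholder path.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-to-text similarity scores
print(dict(zip(labels, probs[0].tolist())))
```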

## Vision models

- DiT: Self-supervised Pre-training for Document Image Transformer
  - [Model](https://huggingface.co/models?search=microsoft/dit)
  - [Paper](https://arxiv.org/pdf/2203.02378.pdf)
  - Microsoft
  - [Inherits MIT license](https://github.com/microsoft/unilm/tree/master/dit#license)
- DETR: End-to-End Object Detection with Transformers
  - [Model](https://huggingface.co/models?search=facebook/detr)
  - [Paper](https://arxiv.org/pdf/2005.12872.pdf)
  - Facebook
  - Apache v2
- EfficientFormer: Vision Transformers at MobileNet Speed
  - [Model](https://huggingface.co/models?sort=downloads&search=snap-research%2Fefficientformer)
  - [Paper](https://arxiv.org/pdf/2206.01191.pdf)
  - Snap
  - Apache v2
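
For the vision models with Hugging Face checkpoints, the high-level `pipeline` API is often the quickest way to try one out. A minimal object-detection sketch with DETR, assuming `transformers`, `torch` and `Pillow` (and possibly `timm`) are installed; `street.jpg` is a placeholder path:

```python
# Minimal sketch: object detection with DETR through the high-level pipeline API.
# "facebook/detr-resnet-50" is one example checkpoint; "street.jpg" is a placeholder path.
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for detection in detector("street.jpg"):
    print(detection["label"], round(detection["score"], 3), detection["box"])
```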

## Audio models

- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  - [Model](https://huggingface.co/models?sort=downloads&search=openai%2Fwhisper)
  - [Paper](https://arxiv.org/pdf/2212.04356.pdf)
  - OpenAI
  - MIT
- VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
  - [Model (unofficial)](https://github.com/enhuiz/vall-e)
    - MIT but has a dependency on a CC-BY-NC library
  - [Model (unofficial)](https://github.com/lifeiteng/vall-e)
    - Apache v2
  - [Paper](https://arxiv.org/pdf/2301.02111.pdf)
  - Microsoft
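
Whisper is available as Hugging Face checkpoints and slots into the speech-recognition pipeline. A minimal sketch, assuming `transformers`, `torch` and `ffmpeg` are available; `openai/whisper-base` is one example checkpoint and `recording.wav` is a placeholder path:

```python
# Minimal sketch: speech-to-text with Whisper via the pipeline API.
# "openai/whisper-base" is one example checkpoint; "recording.wav" is a placeholder path.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
result = asr("recording.wav")
print(result["text"])
```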

## Recommendation models

- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
  - [Model](https://github.com/jeykigung/P5)
  - [Paper](https://arxiv.org/abs/2203.13366)
  - Rutgers
  - Apache v2

## Grounded Situation Recognition models

- GSRTR: Grounded Situation Recognition with Transformers
  - [Model](https://github.com/jhcho99/gsrtr)
  - [Paper](https://arxiv.org/abs/2111.10135)
  - POSTECH
  - Apache v2
- CoFormer: Collaborative Transformers for Grounded Situation Recognition
  - [Model](https://github.com/jhcho99/CoFormer)
  - [Paper](https://arxiv.org/abs/2203.16518)
  - POSTECH
  - Apache v2