https://github.com/lucidrains/conformer
Implementation of the convolutional module from the Conformer paper, for use in Transformers
- Host: GitHub
- URL: https://github.com/lucidrains/conformer
- Owner: lucidrains
- License: mit
- Created: 2020-07-26T18:56:01.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-05-17T21:08:21.000Z (about 3 years ago)
- Last Synced: 2025-05-16T19:02:04.962Z (about 1 year ago)
- Topics: artificial-intelligence, audio, deep-learning, transformer
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 397
- Watchers: 9
- Forks: 52
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

## Conformer
Implementation of the convolutional module from the Conformer paper, for improving the local inductive bias in Transformers.
## Install
```bash
$ pip install conformer
```
## Usage
The Conformer convolutional module, the main novelty of the paper
```python
import torch
from conformer import ConformerConvModule

layer = ConformerConvModule(
    dim = 512,
    causal = False,             # auto-regressive or not - 1d conv will be made causal with padding if so
    expansion_factor = 2,       # what multiple of the dimension to expand for the depthwise convolution
    kernel_size = 31,           # kernel size, 17 - 31 was said to be optimal
    dropout = 0.                # dropout at the very end
)

x = torch.randn(1, 1024, 512)
x = layer(x) + x
```
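The `causal` comment above says the 1D convolution is made causal through padding. Here is a minimal pure-Python sketch of that idea, assuming stride 1 and zero padding. The helper names `same_padding`, `causal_padding`, and `conv1d` are hypothetical illustrations and are not part of the `conformer` package.

```python
from typing import List, Sequence, Tuple


def same_padding(kernel_size: int) -> Tuple[int, int]:
    """(left, right) zero padding that keeps the length unchanged (hypothetical helper)."""
    left = kernel_size // 2
    return left, (kernel_size - 1) - left


def causal_padding(kernel_size: int) -> Tuple[int, int]:
    """All padding on the left, so position t never sees inputs after t (hypothetical helper)."""
    return kernel_size - 1, 0


def conv1d(signal: Sequence[float], kernel: Sequence[float], padding: Tuple[int, int]) -> List[float]:
    """Plain stride-1 sliding-window sum of products over a zero-padded signal."""
    left, right = padding
    padded = [0.0] * left + list(signal) + [0.0] * right
    k = len(kernel)
    return [
        sum(w * padded[t + j] for j, w in enumerate(kernel))
        for t in range(len(padded) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]

symmetric = conv1d(signal, kernel, same_padding(len(kernel)))
causal = conv1d(signal, kernel, causal_padding(len(kernel)))

# Changing only the last sample must leave earlier causal outputs untouched.
changed = conv1d([1, 2, 3, 4, 100], kernel, causal_padding(len(kernel)))
assert causal[:4] == changed[:4]
assert len(symmetric) == len(causal) == len(signal)
```

Both padding schemes preserve the sequence length. With causal padding, each output position depends only on the current and earlier inputs, which is what auto-regressive use requires.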
A single Conformer block
```python
import torch
from conformer import ConformerBlock

block = ConformerBlock(
    dim = 512,
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    conv_expansion_factor = 2,
    conv_kernel_size = 31,
    attn_dropout = 0.,
    ff_dropout = 0.,
    conv_dropout = 0.
)

x = torch.randn(1, 1024, 512)

block(x) # (1, 1024, 512)
```
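For reference, the Conformer paper describes one block as two half-step feed-forward modules sandwiching multi-head self-attention and the convolution module, each wrapped in a residual connection, followed by a final layer norm. In the paper's notation, for block input $x_i$ and output $y_i$ (this is the paper's formulation; it is not a statement about every detail of `ConformerBlock`):

```latex
\begin{aligned}
\tilde{x}_i &= x_i + \tfrac{1}{2}\,\mathrm{FFN}(x_i) \\
x'_i        &= \tilde{x}_i + \mathrm{MHSA}(\tilde{x}_i) \\
x''_i       &= x'_i + \mathrm{Conv}(x'_i) \\
y_i         &= \mathrm{LayerNorm}\!\left(x''_i + \tfrac{1}{2}\,\mathrm{FFN}(x''_i)\right)
\end{aligned}
```

Every step is residual, so the output keeps the input shape, consistent with the `(1, 1024, 512)` shape noted in the example above.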
Conformer - just multiple `ConformerBlock` from above
```python
import torch
from conformer import Conformer

conformer = Conformer(
    dim = 512,
    depth = 12,          # 12 blocks
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    conv_expansion_factor = 2,
    conv_kernel_size = 31,
    attn_dropout = 0.,
    ff_dropout = 0.,
    conv_dropout = 0.
)

x = torch.randn(1, 1024, 512)

conformer(x) # (1, 1024, 512)
```
## Todo
- [ ] switch to a better relative positional encoding; Shaw's is dated
- [ ] flash attention with a better RPE
## Citations
```bibtex
@misc{gulati2020conformer,
    title         = {Conformer: Convolution-augmented Transformer for Speech Recognition},
    author        = {Anmol Gulati and James Qin and Chung-Cheng Chiu and Niki Parmar and Yu Zhang and Jiahui Yu and Wei Han and Shibo Wang and Zhengdong Zhang and Yonghui Wu and Ruoming Pang},
    year          = {2020},
    eprint        = {2005.08100},
    archivePrefix = {arXiv},
    primaryClass  = {eess.AS}
}
```