## BS-RoFormer

Implementation of Band Split Roformer, SOTA attention network for music source separation out of ByteDance AI Labs. They beat the previous first place by a large margin. The technique uses axial attention across frequency (hence multi-band) and time. Their experiments also show that rotary positional encoding led to a huge improvement over learned absolute positions.
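
For intuition, here is a minimal sketch of the axial attention pattern described above: attention is applied alternately along the time axis and the frequency-band axis of a `(batch, time, bands, dim)` tensor, so each position attends over `time + bands` others instead of `time * bands`. This is a generic illustration built on `nn.MultiheadAttention`, not the repository's actual modules (which also apply rotary embeddings within each axis):

```python
import torch
from torch import nn

class AxialAttention(nn.Module):
    """Illustrative only - alternate attention along the time and frequency axes."""

    def __init__(self, dim, heads = 8):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.freq_attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, x):
        # x: (batch, time, bands, dim)
        b, t, f, d = x.shape

        # attend across time, treating each frequency band as its own sequence
        x = x.permute(0, 2, 1, 3).reshape(b * f, t, d)
        x = x + self.time_attn(x, x, x, need_weights = False)[0]

        # attend across frequency bands, treating each time step as its own sequence
        x = x.reshape(b, f, t, d).permute(0, 2, 1, 3).reshape(b * t, f, d)
        x = x + self.freq_attn(x, x, x, need_weights = False)[0]

        return x.reshape(b, t, f, d)

tokens = torch.randn(1, 128, 62, 512)    # (batch, time frames, bands, dim)
out = AxialAttention(dim = 512)(tokens)  # same shape as the input
```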

It also includes support for stereo training and outputting multiple stems.

Please join us on Discord if you are interested in replicating a SOTA music source separator out in the open.

## Appreciation

- StabilityAI and 🤗 Huggingface for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source artificial intelligence.

- Roee and Fabian-Robert for sharing their audio expertise and fixing audio hyperparameters

- @chenht2010 and Roman for working out the default band splitting hyperparameter!

- Max Prod for reporting a big bug with Mel-Band Roformer with stereo training!

- Roman for successfully training the model and open sourcing his training code and weights at this repository!

- Christopher for fixing an issue with multiple stems in Mel-Band Roformer

- Iver Jordal for identifying that the default stft window function is not correct

## Install

```bash
$ pip install BS-RoFormer
```

## Usage

```python
import torch
from bs_roformer import BSRoformer

model = BSRoformer(
    dim = 512,
    depth = 12,
    time_transformer_depth = 1,
    freq_transformer_depth = 1
)

x = torch.randn(2, 352800)       # (batch, time) - two clips of 8 seconds at 44.1kHz
target = torch.randn(2, 352800)  # ground truth stem, same shape as the input

loss = model(x, target = target)
loss.backward()

# after much training

out = model(x)
```
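
The constructor also exposes `stereo` and `num_stems` flags for the stereo and multi-stem support mentioned above. A hedged example, assuming stereo input is passed as `(batch, channels, time)` and targets as `(batch, stems, channels, time)`:

```python
import torch
from bs_roformer import BSRoformer

model = BSRoformer(
    dim = 512,
    depth = 12,
    stereo = True,     # accept 2-channel audio
    num_stems = 4,     # e.g. vocals / drums / bass / other
    time_transformer_depth = 1,
    freq_transformer_depth = 1
)

x = torch.randn(2, 2, 352800)          # (batch, channels, time)
target = torch.randn(2, 4, 2, 352800)  # (batch, stems, channels, time)

loss = model(x, target = target)
loss.backward()

out = model(x)  # one separated stem per `num_stems`
```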

To use the Mel-Band Roformer proposed in a recent follow-up paper, simply import `MelBandRoformer` instead.

```python
import torch
from bs_roformer import MelBandRoformer

model = MelBandRoformer(
    dim = 32,
    depth = 1,
    time_transformer_depth = 1,
    freq_transformer_depth = 1
)

x = torch.randn(2, 352800)
target = torch.randn(2, 352800)

loss = model(x, target = target)
loss.backward()

# after much training

out = model(x)
```

## Todo

- [x] get the multiscale stft loss in there (a generic sketch follows this list)
- [x] figure out what `n_fft` should be
- [x] review band split + mask estimation modules
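
For reference, a minimal multi-resolution STFT loss of the kind the first item refers to: L1 distance between magnitude spectrograms computed at several FFT sizes, so errors are penalized at multiple time-frequency resolutions. A generic sketch, not the exact loss used in the repository:

```python
import torch

def multiscale_stft_loss(pred, target, n_ffts = (2048, 1024, 512)):
    # pred, target: (batch, time) raw waveforms
    loss = 0.
    for n_fft in n_ffts:
        window = torch.hann_window(n_fft, device = pred.device)

        def magnitude(audio):
            # complex STFT -> magnitude spectrogram at this resolution
            return torch.stft(
                audio, n_fft,
                hop_length = n_fft // 4,
                window = window,
                return_complex = True
            ).abs()

        loss = loss + (magnitude(pred) - magnitude(target)).abs().mean()
    return loss
```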

## Citations

```bibtex
@inproceedings{Lu2023MusicSS,
    title  = {Music Source Separation with Band-Split RoPE Transformer},
    author = {Wei-Tsung Lu and Ju-Chiang Wang and Qiuqiang Kong and Yun-Ning Hung},
    year   = {2023},
    url    = {https://api.semanticscholar.org/CorpusID:261556702}
}
```

```bibtex
@inproceedings{Wang2023MelBandRF,
    title  = {Mel-Band RoFormer for Music Source Separation},
    author = {Ju-Chiang Wang and Wei-Tsung Lu and Minz Won},
    year   = {2023},
    url    = {https://api.semanticscholar.org/CorpusID:263608675}
}
```

```bibtex
@misc{ho2019axial,
    title         = {Axial Attention in Multidimensional Transformers},
    author        = {Jonathan Ho and Nal Kalchbrenner and Dirk Weissenborn and Tim Salimans},
    year          = {2019},
    eprint        = {1912.12180},
    archivePrefix = {arXiv}
}
```

```bibtex
@misc{su2021roformer,
    title         = {RoFormer: Enhanced Transformer with Rotary Position Embedding},
    author        = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
    year          = {2021},
    eprint        = {2104.09864},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}
```

```bibtex
@inproceedings{dao2022flashattention,
    title     = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author    = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}
```

```bibtex
@article{Bondarenko2023QuantizableTR,
    title   = {Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},
    author  = {Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2306.12929},
    url     = {https://api.semanticscholar.org/CorpusID:259224568}
}
```

```bibtex
@inproceedings{ElNouby2021XCiTCI,
    title     = {XCiT: Cross-Covariance Image Transformers},
    author    = {Alaaeldin El-Nouby and Hugo Touvron and Mathilde Caron and Piotr Bojanowski and Matthijs Douze and Armand Joulin and Ivan Laptev and Natalia Neverova and Gabriel Synnaeve and Jakob Verbeek and Herv{\'e} J{\'e}gou},
    booktitle = {Neural Information Processing Systems},
    year      = {2021},
    url       = {https://api.semanticscholar.org/CorpusID:235458262}
}
```