Supplementary material of "Deep Unsupervised Drum Transcription", ISMIR 2019
- Host: GitHub
- URL: https://github.com/keunwoochoi/drummernet
- Owner: keunwoochoi
- Created: 2019-06-07T21:23:51.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-10-08T18:27:10.000Z (over 5 years ago)
- Last Synced: 2023-10-20T20:56:32.382Z (over 1 year ago)
- Topics: deeplearning, drums, music
- Language: TeX
- Homepage: https://arxiv.org/abs/1906.03697
- Size: 24 MB
- Stars: 109
- Watchers: 3
- Forks: 10
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
# DrummerNet
This is the supplementary material for "Deep Unsupervised Drum Transcription" by Keunwoo Choi and Kyunghyun Cho, ISMIR 2019 (Delft, the Netherlands).
[Paper on arXiv](https://arxiv.org/abs/1906.03697) | [Blog post](https://keunwoochoi.wordpress.com/2019/06/11/drummernet-deep-unsupervised-drum-transcription/) | [Poster](https://github.com/keunwoochoi/DrummerNet/blob/master/DrummerNet-poster.pdf)
* What we provide: a PyTorch implementation of the paper
* What we do **not** provide:
- pre-trained model
- drum stems that we used for training

## Installation
If you're using conda and want to run DrummerNet on CPU, make sure conda installs MKL, because we'll need its FFT module.
```bash
conda install -c anaconda mkl
```
Then,
```bash
pip install -r requirements.txt
```
Using conda, it would be something like this, but customize it yourself!
```bash
conda install -c pytorch pytorch torchvision
```
`Python3` required.
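
To quickly confirm that your PyTorch build can actually see MKL (and so the CPU FFT will work), here is a minimal sanity check, assuming a PyTorch version that exposes `torch.backends.mkl`:

```python
# Minimal sanity check: does this PyTorch build see MKL?
# The CPU FFT used by the spectral loss needs it.
import torch

print("PyTorch:", torch.__version__)
print("MKL available:", torch.backends.mkl.is_available())
```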
## Preparation
#### Wav files for Drum Synthesizer
* `data_drum_sources`: folder for isolated drum sources. 12 kits x 11 drum components are included.
If you want to add more drum sources,
- Add files and update `globals.py` accordingly (see the sketch below).
```python
# These names are matched with file names in data_drum_sources
DRUM_NAMES = ["KD_KD", "SD_SD", "HH_CHH", "HH_OHH", "HH_PHH", "TT_HIT", "TT_MHT",
"TT_HFT", "CY_RDC", "CY_CRC", "OT_TMB"]
N_DRUM_VSTS = 12
```
- Note that as shown in `inst_src_sec.get_instset_drum()`, the last drum kit will be used at test time only.
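
As a rough sanity check after adding or editing sources, a sketch like the one below can count how many wav files match each component name. It assumes (this is an assumption, not something guaranteed by the repo) that the filenames under `data_drum_sources` contain the `DRUM_NAMES` strings.

```python
# Sketch: count wav files per drum component under data_drum_sources.
# ASSUMPTION: filenames contain the component names from DRUM_NAMES
# (globals.py); adjust the matching rule to the actual naming scheme.
import os

DRUM_NAMES = ["KD_KD", "SD_SD", "HH_CHH", "HH_OHH", "HH_PHH", "TT_HIT", "TT_MHT",
              "TT_HFT", "CY_RDC", "CY_CRC", "OT_TMB"]

wavs = []
for root, _, files in os.walk("data_drum_sources"):
    wavs += [f for f in files if f.lower().endswith(".wav")]

for name in DRUM_NAMES:
    n = sum(name in f for f in wavs)
    print(f"{name}: {n} file(s)")  # expect one per kit, 12 kits by default
```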

#### Training files
We unfortunately **cannot** provide the drum stems that we used to train the network in the paper.
* `/data_drumstems`: nearly blank folder, placeholder for training data. I put one wav file and `files.txt` as a minimum working example (see the sketch after this list).
* [Mark Cartwright](http://dafx2018.web.ua.pt/papers/DAFx2018_paper_60.pdf)'s and [Richard Vogl](https://arxiv.org/abs/1806.06676)'s papers/codes provide a way to synthesize large-scale drum stems.
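
If you point the trainer at your own stems, the folder also needs a file list. The sketch below regenerates it under the assumption (unverified; check the provided example first) that `files.txt` is simply a plain-text list of wav filenames, one per line.

```python
# Sketch: regenerate files.txt for the training stems.
# ASSUMPTION: files.txt is a plain list of wav filenames, one per line.
# Check the example shipped in data_drumstems/ before relying on this.
import os

folder = "data_drumstems"
wavs = sorted(f for f in os.listdir(folder) if f.lower().endswith(".wav"))
with open(os.path.join(folder, "files.txt"), "w") as fp:
    fp.write("\n".join(wavs) + "\n")
print(f"wrote {len(wavs)} entries to {folder}/files.txt")
```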

#### Evaluation files, e.g., SMT
* It is not part of the code; you have to download and process it yourself.
* First, [download SMT dataset](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/drums.html) (320.7MB)
* Unzip it. Let's call the unzipped folder `PATH_UNZIP`.
* Then run `$ python3 drummernet/eval_import_smt.py PATH_UNZIP`. E.g.,
```bash
$ cd drummernet
$ python3 eval_import_smt.py ~/Downloads/SMT_DRUMS/
Processing annotations...
Processing audio file - copying it...
all done! check out if everything's fine at data_evals/SMT_DRUMS
```
* `data_evals`: blank, placeholder for evaluation datasets.

## Training
* If you prepared evaluation files,
```
python3 main.py --eval false -ld spectrum --exp_name temp_exp --metrics mae
```
* Otherwise,
```
python3 main.py --eval true -ld spectrum --exp_name temp_exp --metrics mae
```
If everything's fine, you'll see..
```bash
$ cd drummernet
$ python3 main.py --eval True -ld spectrum --exp_name temp_exp --metrics mae
Add arguments..
Namespace(activation='elu', batch_size=32, compare_after_hpss=False, conv_bias=False, eval=False, exp_name='temp_exp', kernel_size=3, l1_reg_lambda=0.003, learning_rate=0.0004, loss_domains=['spectrum'], metrics=['mae'], n_cqt_bins=12, n_layer_dec=6, n_layer_enc=10, n_mels=None, num_channel=50, recurrenter='three', resume=False, resume_num='', scale_r=2, source_norm='sqrsum', sparsemax_lst=64, sparsemax_type='multiply')
| With a sampling rate of 16000 Hz,
| the deepest encoded signal: 1 sample == 64 ms.
| At predicting impulses, which is done at u_conv3, 1 sample == 1 ms.
| and sparsemax_lst=64 samples at the same, at=`r` level
n_notes: 11, n_vsts:{'KD_KD': 11, 'SD_SD': 11, 'HH_CHH': 11, 'HH_OHH': 11, 'HH_PHH': 11, 'TT_HIT': 11, 'TT_MHT': 11, 'TT_HFT': 11, 'CY_RDC': 11, 'CY_CRC': 11, 'OT_TMB': 11}
```
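
The line `the deepest encoded signal: 1 sample == 64 ms` follows directly from the architecture: the encoder stacks ten stride-2 max-pooling layers, so the deepest representation is downsampled by 2^10 = 1024, and at 16 kHz that is 1024 / 16000 s = 64 ms per step. A quick check:

```python
# Why "1 sample == 64 ms" at the deepest encoder level:
# ten stride-2 MaxPool1d layers downsample the 16 kHz input by 2**10.
sr = 16000          # sampling rate in Hz
n_pools = 10        # stride-2 pooling layers in the encoder (n_layer_enc=10)
factor = 2 ** n_pools
print(factor / sr * 1000, "ms")  # -> 64.0
```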
Then you'll see the model details.
```bash
DrummerHalfUNet(
(unet): ValidAutoUnet(
(d_conv0): Conv1d(1, 50, kernel_size=(3,), stride=(1,), bias=False)
(d_convs): ModuleList(
(0): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(1): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(2): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(3): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(4): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(5): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(6): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(7): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(8): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(9): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
)
(pools): ModuleList(
(0): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(6): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(7): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(8): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(9): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(encode_conv): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(u_convs): ModuleList(
(0): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)
(1): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)
(2): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)
(3): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)
(4): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)
(5): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)
)
(last_conv): Conv1d(100, 100, kernel_size=(3,), stride=(1,))
)
(recurrenter): Recurrenter(
(midi_x2h): GRU(100, 11, batch_first=True, bidirectional=True)
(midi_h2hh): GRU(22, 11, batch_first=True)
(midi_hh2y): GRU(1, 1, bias=False, batch_first=True)
)
(double_sparsemax): MultiplySparsemax(
(sparsemax_inst): Sparsemax()
(sparsemax_time): Sparsemax()
)
(zero_inserter): ZeroInserter()
(synthesizer): FastDrumSynthesizer()
(mixer): Mixer()
)
NUM_PARAM overall: 203869
unet: 195250
recurrenter: 8619
sparsemaxs: 0
synthesizer: 0
```
..as well as training details..
```bash
PseudoCQT init with fmin:32, 12, bins, 12 bins/oct, win_len: 16384, n_fft:16384, hop_length:64
PseudoCQT init with fmin:65, 12, bins, 12 bins/oct, win_len: 8192, n_fft:8192, hop_length:64
PseudoCQT init with fmin:130, 12, bins, 12 bins/oct, win_len: 4096, n_fft:4096, hop_length:64
PseudoCQT init with fmin:261, 12, bins, 12 bins/oct, win_len: 2048, n_fft:2048, hop_length:64
PseudoCQT init with fmin:523, 12, bins, 12 bins/oct, win_len: 1024, n_fft:1024, hop_length:64
PseudoCQT init with fmin:1046, 12, bins, 12 bins/oct, win_len: 512, n_fft:512, hop_length:64
PseudoCQT init with fmin:2093, 12, bins, 12 bins/oct, win_len: 256, n_fft:256, hop_length:64
PseudoCQT init with fmin:4000, 12, bins, 12 bins/oct, win_len: 128, n_fft:128, hop_length:64
item check-points after this..: [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]
total 8388480 n_items to train!
```
..then the training will start..
```bash
c1mae:5.53 c2mae:4.39 c3mae:2.95 c4mae:3.19 c5mae:2.22 c6mae:1.90 c7mae:2.14 c8mae:2.26: 100%|███████████████████████████████████| 1/1 [00:25<00:00, 25.03s/it]
```
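
For reference, the `NUM_PARAM` figures printed with the model details are plain sums of trainable parameter counts; here is a generic PyTorch sketch (not code from this repo) that reproduces such a count for any module:

```python
# Generic sketch (not DrummerNet code): count trainable parameters,
# as in the NUM_PARAM lines printed with the model details above.
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Stand-in example; with DrummerNet you would pass the model and its
# sub-modules (unet, recurrenter, ...) instead.
print(count_params(nn.Conv1d(1, 50, kernel_size=3, bias=False)))  # 1 * 50 * 3 = 150
```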

## Troubleshooting
### Install MKL for PyTorch FFT
In case you face this error,
```bash
RuntimeError: fft: ATen not compiled with MKL support
```
[As stated here](https://discuss.pytorch.org/t/error-using-fft-runtimeerror-fft-aten-not-compiled-with-mkl-support/21671/2), this is an issue of MKL library installation.
A quick solution is to use conda. Otherwise, you should install [Intel MKL](https://software.intel.com/en-us/get-started-with-mkl-for-macos) manually.

In some cases, if PyTorch was once built without MKL, it might not be able to find a later-installed MKL.
In that case, try removing the pip/conda cache, or just make a new environment.
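
A minimal way to check whether your build is affected, assuming the torch==1.0-era FFT API (`torch.rfft`; newer PyTorch versions moved it to `torch.fft.rfft`):

```python
# Smoke test for the CPU FFT. torch.rfft is the old (<=1.7) API that was
# current around this repo's release; torch.fft.rfft is the newer one.
import torch

x = torch.randn(1024)
try:
    if hasattr(torch, "fft") and hasattr(torch.fft, "rfft"):
        spec = torch.fft.rfft(x)   # newer API (PyTorch >= 1.8)
    else:
        spec = torch.rfft(x, 1)    # torch==1.0-era API
    print("CPU FFT works, output shape:", tuple(spec.shape))
except RuntimeError as err:
    print("CPU FFT failed, likely the MKL issue above:", err)
```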

## Requirement detail
These are the exact versions of the dependencies I used.
```
Python==3.7.3
Cython==0.29.6
numpy==1.16.2
librosa==0.6.2
torch==1.0.0
torchvision==0.2.1
madmom==0.16.1
matplotlib==2.2.0
tqdm==4.31.1
mir_eval==0.5
```

## Citation
```
@inproceedings{choi2019deep,
title={Deep Unsupervised Drum Transcription},
author={Choi, Keunwoo and Cho, Kyunghyun},
  booktitle={Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, the Netherlands},
year={2019}
}
```