{"id":26504849,"url":"https://github.com/keunwoochoi/drummernet","last_synced_at":"2025-10-10T19:40:47.888Z","repository":{"id":51327791,"uuid":"190810428","full_name":"keunwoochoi/DrummerNet","owner":"keunwoochoi","description":"Supplementary material of \"Deep Unsupervised Drum Transcription\", ISMIR 2019","archived":false,"fork":false,"pushed_at":"2024-07-25T10:15:03.000Z","size":25178,"stargazers_count":128,"open_issues_count":3,"forks_count":13,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-06T13:11:14.890Z","etag":null,"topics":["deeplearning","drums","music"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1906.03697","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keunwoochoi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-07T21:23:51.000Z","updated_at":"2025-04-05T03:44:49.000Z","dependencies_parsed_at":"2022-09-11T02:01:08.700Z","dependency_job_id":null,"html_url":"https://github.com/keunwoochoi/DrummerNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/keunwoochoi/DrummerNet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keunwoochoi%2FDrummerNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keunwoochoi%2FDrummerNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keunwoochoi%2FDrummerNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keunwoochoi%2FDrummerNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keunwoochoi","download_url":
"https://codeload.github.com/keunwoochoi/DrummerNet/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keunwoochoi%2FDrummerNet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279005033,"owners_count":26083827,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deeplearning","drums","music"],"created_at":"2025-03-20T20:45:58.844Z","updated_at":"2025-10-10T19:40:47.872Z","avatar_url":"https://github.com/keunwoochoi.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DrummerNet\n\nThis is supplementary material of \"Deep Unsupervised Drum Transcription\" by Keunwoo Choi and Kyunghyun Cho, ISMIR 2019 (Delft, Netherland). 
\n\n[Paper on arXiv](https://arxiv.org/abs/1906.03697) | [Blog post](https://keunwoochoi.wordpress.com/2019/06/11/drummernet-deep-unsupervised-drum-transcription/) | [Poster](https://github.com/keunwoochoi/DrummerNet/blob/master/DrummerNet-poster.pdf)  \n\n* What we provide: PyTorch implementation of the paper \n* What we do **not** provide:\n  - pre-trained model\n  - drum stems that we used for the training\n\n## Installation\n\nIf you're using conda and want to run DrummerNet on CPU, make sure MKL is installed, because we need its FFT module.\n```bash\nconda install -c anaconda mkl\n```\nThen,\n```bash\npip install -r requirements.txt\n```\n\nWith conda, installing PyTorch would look something like this; adjust it for your setup.\n```bash\nconda install -c pytorch pytorch torchvision \n```\n\nPython 3 is required.\n\n## Preparation\n#### Wav files for Drum Synthesizer\n * `data_drum_sources`: folder for isolated drum sources. 12 kits x 11 drum components are included.\n If you want to add more drum sources,\n \n   - Add files and update `globals.py` accordingly. \n    ```python\n    # These names are matched with file names in data_drum_sources\n    DRUM_NAMES = [\"KD_KD\", \"SD_SD\", \"HH_CHH\", \"HH_OHH\", \"HH_PHH\", \"TT_HIT\", \"TT_MHT\",\n                  \"TT_HFT\", \"CY_RDC\", \"CY_CRC\", \"OT_TMB\"]\n    N_DRUM_VSTS = 12\n    ```\n   - Note that, as shown in `inst_src_sec.get_instset_drum()`, the last drum kit is used at test time only. \n\n#### Training files\nUnfortunately, we **cannot** provide the drum stems that we used to train the network in the paper.\n * `/data_drumstems`: nearly blank folder, placeholder for training data. 
I put one wav file and `files.txt` there as a minimal working example.\n * [Mark Cartwright](http://dafx2018.web.ua.pt/papers/DAFx2018_paper_60.pdf)'s and [Richard Vogl](https://arxiv.org/abs/1806.06676)'s papers and code provide a way to synthesize large-scale drum stems.\n\n#### Evaluation files, e.g., SMT\n * The dataset is not part of this repository; you have to download and process it yourself.\n * First, [download the SMT dataset](https://www.idmt.fraunhofer.de/en/business_units/m2d/smt/drums.html) (320.7MB)\n * Unzip it. Let's call the unzipped folder `PATH_UNZIP`.\n * Then run `$ python3 drummernet/eval_import_smt.py PATH_UNZIP`. E.g.,\n    ```bash\n   $ cd drummernet\n   $ python3 eval_import_smt.py ~/Downloads/SMT_DRUMS/\n   Processing annotations...\n   Processing audio file - copying it...\n    all done! check out if everything's fine at data_evals/SMT_DRUMS\n   ```  \n * `data_evals`: empty folder, placeholder for evaluation datasets\n\n## Training\n\n * If you prepared evaluation files\n```\npython3 main.py --eval false -ld spectrum --exp_name temp_exp --metrics mae\n```\n * Otherwise,\n```\npython3 main.py --eval true -ld spectrum --exp_name temp_exp --metrics mae\n```\n\nIf everything's fine, you'll see..\n```bash\n$ cd drummernet\n$ python3 main.py --eval True -ld spectrum --exp_name temp_exp --metrics mae\nAdd arguments..\nNamespace(activation='elu', batch_size=32, compare_after_hpss=False, conv_bias=False, eval=False, exp_name='temp_exp', kernel_size=3, l1_reg_lambda=0.003, learning_rate=0.0004, loss_domains=['spectrum'], metrics=['mae'], n_cqt_bins=12, n_layer_dec=6, n_layer_enc=10, n_mels=None, num_channel=50, recurrenter='three', resume=False, resume_num='', scale_r=2, source_norm='sqrsum', sparsemax_lst=64, sparsemax_type='multiply')\n| With a sampling rate of 16000 Hz,\n| the deepest encoded signal: 1 sample == 64 ms.\n| At predicting impulses, which is done at u_conv3, 1 sample == 1 ms.\n| and sparsemax_lst=64 samples at the same, at=`r` level\nn_notes: 11, n_vsts:{'KD_KD': 
11, 'SD_SD': 11, 'HH_CHH': 11, 'HH_OHH': 11, 'HH_PHH': 11, 'TT_HIT': 11, 'TT_MHT': 11, 'TT_HFT': 11, 'CY_RDC': 11, 'CY_CRC': 11, 'OT_TMB': 11}\n```\nthen you'll see the model details.\n```bash\nDrummerHalfUNet(\n  (unet): ValidAutoUnet(\n    (d_conv0): Conv1d(1, 50, kernel_size=(3,), stride=(1,), bias=False)\n    (d_convs): ModuleList(\n      (0): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (1): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (2): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (3): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (4): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (5): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (6): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (7): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (8): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (9): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n    )\n    (pools): ModuleList(\n      (0): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (3): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (4): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (5): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (6): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (7): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (8): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (9): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n    )\n    (encode_conv): Conv1d(50, 50, 
kernel_size=(3,), stride=(1,), bias=False)\n    (u_convs): ModuleList(\n      (0): Conv1d(50, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (1): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (2): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (3): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (4): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)\n      (5): Conv1d(100, 50, kernel_size=(3,), stride=(1,), bias=False)\n    )\n    (last_conv): Conv1d(100, 100, kernel_size=(3,), stride=(1,))\n  )\n  (recurrenter): Recurrenter(\n    (midi_x2h): GRU(100, 11, batch_first=True, bidirectional=True)\n    (midi_h2hh): GRU(22, 11, batch_first=True)\n    (midi_hh2y): GRU(1, 1, bias=False, batch_first=True)\n  )\n  (double_sparsemax): MultiplySparsemax(\n    (sparsemax_inst): Sparsemax()\n    (sparsemax_time): Sparsemax()\n  )\n  (zero_inserter): ZeroInserter()\n  (synthesizer): FastDrumSynthesizer()\n  (mixer): Mixer()\n)\nNUM_PARAM overall: 203869\n             unet: 195250\n      recurrenter: 8619\n       sparsemaxs: 0\n      synthesizer: 0\n```\n..as well as training details..\n```bash\nPseudoCQT init with fmin:32, 12, bins, 12 bins/oct, win_len: 16384, n_fft:16384, hop_length:64\nPseudoCQT init with fmin:65, 12, bins, 12 bins/oct, win_len: 8192, n_fft:8192, hop_length:64\nPseudoCQT init with fmin:130, 12, bins, 12 bins/oct, win_len: 4096, n_fft:4096, hop_length:64\nPseudoCQT init with fmin:261, 12, bins, 12 bins/oct, win_len: 2048, n_fft:2048, hop_length:64\nPseudoCQT init with fmin:523, 12, bins, 12 bins/oct, win_len: 1024, n_fft:1024, hop_length:64\nPseudoCQT init with fmin:1046, 12, bins, 12 bins/oct, win_len: 512, n_fft:512, hop_length:64\nPseudoCQT init with fmin:2093, 12, bins, 12 bins/oct, win_len: 256, n_fft:256, hop_length:64\nPseudoCQT init with fmin:4000, 12, bins, 12 
bins/oct, win_len: 128, n_fft:128, hop_length:64\nitem check-points after this..: [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]\ntotal 8388480 n_items to train!\n\n```\n..then the training will start..\n```bash\nc1mae:5.53 c2mae:4.39 c3mae:2.95 c4mae:3.19 c5mae:2.22 c6mae:1.90 c7mae:2.14 c8mae:2.26: 100%|███████████████████████████████████| 1/1 [00:25\u003c00:00, 25.03s/it]\n```\n\n## Troubleshooting\n### Install MKL for PyTorch FFT\nIf you see this error, \n```bash\nRuntimeError: fft: ATen not compiled with MKL support\n```\n[As stated here](https://discuss.pytorch.org/t/error-using-fft-runtimeerror-fft-aten-not-compiled-with-mkl-support/21671/2), this is an issue with the MKL library installation. \nA quick solution is to use Conda. Otherwise, you should install [Intel MKL](https://software.intel.com/en-us/get-started-with-mkl-for-macos) manually.\n\nIn some cases, if PyTorch was built without MKL, it might not be able to find an MKL installed later. \nTry clearing the pip/conda cache, or just create a new environment.\n\n## Requirement detail\n\nThese are the exact versions I used for the dependencies.\n```\nPython==3.7.3\nCython==0.29.6\nnumpy==1.16.2\nlibrosa==0.6.2\ntorch==1.0.0\ntorchvision==0.2.1\nmadmom==0.16.1\nmatplotlib==2.2.0\ntqdm==4.31.1\nmir_eval==0.5\n```\n\n## Citation\n\n```\n@inproceedings{choi2019deep,\n  title={Deep Unsupervised Drum Transcription},\n  author={Choi, Keunwoo and Cho, Kyunghyun},\n  booktitle={Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, Netherlands},\n  year={2019}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeunwoochoi%2Fdrummernet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeunwoochoi%2Fdrummernet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeunwoochoi%2Fdrummernet/lists"}