{"id":29026122,"url":"https://github.com/mtg/deepconvsep","last_synced_at":"2025-06-26T05:08:37.992Z","repository":{"id":79252679,"uuid":"74496817","full_name":"MTG/DeepConvSep","owner":"MTG","description":"Deep Convolutional Neural Networks for Musical Source Separation ","archived":false,"fork":false,"pushed_at":"2020-01-31T13:45:23.000Z","size":38043,"stargazers_count":465,"open_issues_count":3,"forks_count":109,"subscribers_count":34,"default_branch":"master","last_synced_at":"2024-04-15T00:14:59.624Z","etag":null,"topics":["audio-synthesis","convolutional-neural-networks","data-augmentation","data-generation","deep-learning","sample-querying","score-synthesis","signal-processing","source-separation","theano"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MTG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-11-22T17:24:10.000Z","updated_at":"2024-04-15T00:14:59.624Z","dependencies_parsed_at":null,"dependency_job_id":"bd4ca8d7-28b8-4994-ac80-f98a7bd9bb63","html_url":"https://github.com/MTG/DeepConvSep","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MTG/DeepConvSep","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2FDeepConvSep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2FDeepConvSep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2FDeepConvSep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2FDeepConvSep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MTG","download_url":"https://codeload.github.com/MTG/DeepConvSep/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MTG%2FDeepConvSep/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262003992,"owners_count":23243358,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-synthesis","convolutional-neural-networks","data-augmentation","data-generation","deep-learning","sample-querying","score-synthesis","signal-processing","source-separation","theano"],"created_at":"2025-06-26T05:08:30.997Z","updated_at":"2025-06-26T05:08:37.976Z","avatar_url":"https://github.com/MTG.png","language":"Python","readme":"# DeepConvSep\nDeep Convolutional Neural Networks for Musical Source Separation\n\nThis repository contains classes for data generation and preprocessing and feature computation, useful in training neural networks with large datasets that do not fit into memory. 
Additionally, you can find classes to query samples of instrument sounds from the <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC instrument sound dataset</a>.

In the 'examples' folder you can find use cases of the classes above for music source separation. We provide code for feature computation (STFT) and for training convolutional neural networks for music source separation: singing voice separation with the iKala dataset; voice, bass, and drums separation with the DSD100 dataset; and bassoon, clarinet, saxophone, and violin separation with the <a href="http://music.cs.northwestern.edu/data/Bach10.html">Bach10 dataset</a>. The latter is a good example of training a neural network with instrument samples from the <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC instrument sound dataset</a> when the original score is available.

In the 'evaluation' folder you can find Matlab code to evaluate the quality of separation, based on <a href="http://bass-db.gforge.inria.fr/bss_eval/">BSS Eval</a>.

For training neural networks we use <a href="http://lasagne.readthedocs.io/">Lasagne</a> and <a href="http://deeplearning.net/software/theano/">Theano</a>.

We provide code for separation using already trained models for different tasks.

Separate music into vocals, bass, drums, and accompaniment with examples/dsd100/separate_dsd.py:

    python separate_dsd.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:
- \<inputfile\> is the wav file to separate
- \<outputdir\> is the output directory where to write the separation
- \<path_to_model.pkl\> is the local path to the .pkl file you can download from <a href="https://drive.google.com/open?id=0B-Th_dYuM4nOb281azdKc2tWbFk">this address</a>

Singing voice source separation with examples/ikala/separate_ikala.py:

    python separate_ikala.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:
- \<inputfile\> is the wav file to separate
- \<outputdir\> is the output directory where to write the separation
- \<path_to_model.pkl\> is the local path to the .pkl file you can download from <a href="https://drive.google.com/open?id=0B-Th_dYuM4nOYlRxQTl3eDBxQTg">this address</a>

Separate Bach chorales from the Bach10 dataset into bassoon, clarinet, saxophone, and violin with examples/bach10/separate_bach10.py:

    python separate_bach10.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:
- \<inputfile\> is the wav file to separate
- \<outputdir\> is the output directory where to write the separation
- \<path_to_model.pkl\> is the local path to the .pkl file you can download from <a href="https://drive.google.com/open?id=0B-Th_dYuM4nOa3ZMSmhwRkwzaGM">this address</a>

Score-informed separation of Bach chorales from the Bach10 dataset into bassoon, clarinet, saxophone, and violin with examples/bach10_scoreinformed/separate_bach10.py:

    python separate_bach10.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:
- \<inputfile\> is the wav file to separate
- \<outputdir\> is the output directory where to write the separation
- \<path_to_model.pkl\> is the local path to the .pkl file you can download from <a href="https://zenodo.org/record/1009144">Zenodo</a>

The folder with the \<inputfile\> must contain the scores: 'bassoon_b.txt', 'clarinet_b.txt', 'saxophone_b.txt', 'violin_b.txt'. Each score file has one note per line, with the format: note_onset_time,note_offset_time,note_name.
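For illustration, here is a minimal sketch of reading such a score file. Only the one-note-per-line 'onset,offset,name' format is taken from the description above; the `Note` container and function name are ours, and the repository's own score-reading code may differ:

    from collections import namedtuple

    # Hypothetical container for one score note; not the repository's class.
    Note = namedtuple('Note', ['onset', 'offset', 'name'])

    def read_score(path):
        """Parse a score file with one 'onset,offset,name' note per line."""
        notes = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                onset, offset, name = line.split(',')
                notes.append(Note(float(onset), float(offset), name.strip()))
        return notes

    # Example: notes = read_score('bassoon_b.txt')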
# Feature computation
Compute the features for a given set of audio signals by extending the "Transform" class in transform.py.

For instance, the TransformFFT class helps compute the STFT of an audio signal and saves the magnitude spectrogram as a binary file.

Examples

    ### 1. Compute the STFT of a matrix of signals "audio" and write the STFT data in "path" (except the phase)
    tt1 = transformFFT(frameSize=2048, hopSize=512, sampleRate=44100)
    tt1.compute_transform(audio, out_path=path, phase=False)

    ### 2. Compute the STFT of a single signal "audio" and return the magnitude and phase
    tt1 = transformFFT(frameSize=2048, hopSize=512, sampleRate=44100)
    mag, ph = tt1.compute_file(audio, phase=True)

    ### 3. Compute the inverse STFT using the magnitude and phase and return the audio data
    # we reuse tt1 and the outputs from 2.
    audio = tt1.compute_inverse(mag, ph)
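As an independent illustration of what these parameters mean (this uses SciPy directly, not the repository's transformFFT implementation), the same kind of magnitude spectrogram can be computed as follows; frameSize maps to the window length and hopSize to the step between consecutive frames:

    import numpy as np
    from scipy import signal

    sample_rate, frame_size, hop_size = 44100, 2048, 512
    audio = np.random.randn(sample_rate)  # stand-in for one second of audio

    # STFT with a 2048-sample window advancing 512 samples per frame
    freqs, times, stft = signal.stft(audio, fs=sample_rate,
                                     nperseg=frame_size,
                                     noverlap=frame_size - hop_size)
    mag, phase = np.abs(stft), np.angle(stft)
    print(mag.shape)  # (frame_size // 2 + 1, number_of_frames)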
# Data preprocessing
Load features which have been computed with transform.py and yield the batches necessary for training neural networks. These classes are useful when the data does not fit into memory, because the batches can be loaded in chunks.

Example

    ### Load binary training data from the out_path folder
    train = LargeDataset(path_transform_in=out_path, batch_size=32, batch_memory=200, time_context=30, overlap=20, nprocs=7)
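The general idea behind such a loader can be sketched with NumPy memory mapping. This is only an illustration of chunked batch loading, not LargeDataset's actual implementation; the file name, the number of frequency bins, and the helper function are made up:

    import numpy as np

    # Hypothetical binary file of precomputed spectrogram frames,
    # stored as float32 with 1025 frequency bins per frame.
    n_bins, batch_size = 1025, 32
    features = np.memmap('train_features.bin', dtype=np.float32,
                         mode='r').reshape(-1, n_bins)

    def iter_batches(data, batch_size):
        # Only the slice touched here is actually read from disk,
        # so the full dataset never has to fit into memory.
        for start in range(0, len(data) - batch_size + 1, batch_size):
            yield np.array(data[start:start + batch_size])

    # Example: for batch in iter_batches(features, batch_size): ...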
# Audio sample querying using RWC database
The <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC instrument sound dataset</a> contains samples of different instruments played by various musicians in various styles and dynamics.
You can obtain a sample for a given MIDI note, instrument, style, dynamics, and musician (1, 2, 3) by using the classes in 'rwc.py'.

Example

    ### construct lists for the desired dynamics, styles, musician and instrument codes
    allowed_styles = ['NO']
    allowed_dynamics = ['F','M','P']
    allowed_case = [1,2,3]
    instrument_nums = [30,31,27,15]  #bassoon,clarinet,saxophone,violin
    instruments = []
    for ins in range(len(instrument_nums)):
        #for each instrument construct an Instrument object
        instruments.append(rwc.Instrument(rwc_path,instrument_nums[ins],allowed_styles,allowed_case,allowed_dynamics))

    #then, for a given instrument 'i' and midi note 'm', dynamics 'd', style 's', musician 'n'
    note = instruments[i].getNote(melNotes[m],d,s,n)
    #get the audio vector for the note
    audio = note.getAudio()

# Data generation
The Bach10 experiments offer examples of data generation (or augmentation). Starting from the score or from existing pieces, we can augment the existing data or generate new data with some desired factors.
For instance, if you have four factors, time_shifts, intensity_shifts, style_shifts, and timbre_shifts, you can generate the possible combinations between them for a set of pieces and instruments (sources).

    import itertools as it
    import numpy as np

    #create the product of these factors
    cc = [(time_shifts[i], intensity_shifts[j], style_shifts[l], timbre_shifts[k]) for i in xrange(len(time_shifts)) for j in xrange(len(intensity_shifts)) for l in xrange(len(style_shifts)) for k in xrange(len(timbre_shifts))]

    #create combinations for each of the instruments (sources)
    if len(cc)<len(sources):
        combo1 = list(it.product(cc,repeat=len(sources)))
        combo = []
        for i in range(len(combo1)):
            c = np.array(combo1[i])
            #keep only combinations that vary across sources when a single
            #intensity shift or time shift is allowed
            if (len(intensity_shifts)==1 and not(all(x == c[0,0] for x in c[:,0]))) \
                or (len(time_shifts)==1 and not(all(x == c[0,1] for x in c[:,1]))):
                combo.append(c)
        combo = np.array(combo)
    else:
        combo = np.array(list(it.permutations(cc,len(sources))))
    if len(combo)==0:
        combo = np.array([[[time_shifts[0],intensity_shifts[0],style_shifts[0],timbre_shifts[0]] for s in sources]])

    #if there are too many combinations, you can randomly sample
    if sample_size<len(combo):
        sampled_combo = combo[np.random.choice(len(combo),size=sample_size, replace=False)]
    else:
        sampled_combo = combo

# References
More details on the separation method can be found in the following articles:

P. Chandna, M. Miron, J. Janer, and E. Gomez,
"Monoaural audio source separation using deep convolutional neural networks,"
International Conference on Latent Variable Analysis and Signal Separation, 2017.
<a href="http://mtg.upf.edu/node/3680">PDF</a>

M. Miron, J. Janer, and E. Gomez,
"Generating data to train convolutional neural networks for low latency classical music source separation,"
Sound and Music Computing Conference, 2017.

M. Miron, J. Janer, and E. Gomez,
"Monaural score-informed source separation for classical music using convolutional neural networks,"
ISMIR Conference, 2017.


# Dependencies
python 2.7

climate, numpy, scipy, theano, lasagne (pickle and cPickle ship with Python 2.7)

The dependencies can be installed with pip:

    pip install numpy scipy climate theano
    pip install https://github.com/Lasagne/Lasagne/archive/master.zip

# Separating classical music mixtures with Bach10 dataset
We separate bassoon, clarinet, saxophone, and violin using the <a href="http://music.cs.northwestern.edu/data/Bach10.html">Bach10 dataset</a>, which comprises 10 Bach chorales. Our approach consists of synthesizing the original scores with different timbres, dynamics, playing styles, and local timing deviations, in order to train a more robust model for classical music separation.

We have three experiments:

- Oracle: train with the original pieces (obviously overfitting, hence "Oracle");

- Sibelius: train with the pieces synthesized with the Sibelius software;

- RWC: train with the pieces synthesized using the samples in the <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC instrument sound dataset</a>.

The code for feature computation and training the network can be found in the "examples/bach10" folder.

# Score-informed separation of classical music mixtures with Bach10 dataset
We separate bassoon, clarinet, saxophone, and violin using the <a href="http://music.cs.northwestern.edu/data/Bach10.html">Bach10 dataset</a>, which comprises 10 Bach chorales and the associated scores.

We generate training data with the approach mentioned above using the RWC database. Consequently, we train with the pieces synthesized using the samples in the <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC instrument sound dataset</a>.

The score is given in .txt files named after the instrument plus a suffix, e.g. 'bassoon_g.txt'. The format for a note in the text file is: onset, offset, midinotename, as in the following example: 6.1600,6.7000,F4# (see the sketch after this section for the note-name convention).

The code for feature computation and training the network can be found in the "examples/bach10_scoreinformed" folder.
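As an illustration of the 'midinotename' format above, here is one way to convert such a name to a MIDI number. The helper is ours, not the repository's; only the accidental-after-octave convention (as in F4#) and the example value are taken from the text:

    # Semitone offset of each pitch letter within an octave (C-based).
    PITCH_CLASSES = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

    def note_name_to_midi(name):
        """Convert a name like 'F4#' (accidental after the octave) to a MIDI number."""
        letter = name[0].upper()
        octave = int(name[1])       # assumes single-digit octaves, enough for this range
        accidental = name[2:]
        semitone = PITCH_CLASSES[letter] + accidental.count('#') - accidental.count('b')
        # MIDI convention: C4 = 60, so octave n starts at 12 * (n + 1)
        return 12 * (octave + 1) + semitone

    print(note_name_to_midi('F4#'))  # 66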
# Separating Professionally Produced Music
We separate voice, bass, drums, and accompaniment using the DSD100 dataset of professionally produced music. For more details about the challenge, please refer to the <a href="http://www.sisec17.audiolabs-erlangen.de">SiSEC MUS</a> challenge and the <a href="https://sisec.inria.fr/home/2016-professionally-produced-music-recordings/">DSD100</a> dataset.

The code for feature computation and training the network can be found in the "examples/dsd100" folder.

# iKala - Singing voice separation
We separate voice and accompaniment using the iKala dataset. For more details about the challenge, please refer to <a href="http://www.music-ir.org/mirex/wiki/2016:Singing_Voice_Separation_Results">MIREX Singing voice separation 2016</a> and the <a href="http://mac.citi.sinica.edu.tw/ikala/">iKala</a> dataset.

The code for feature computation and training the network can be found in the "examples/ikala" folder.

# Training models

For the Bach10 dataset:

    #compute features for the original dataset
    python -m examples.bach10.compute_features_bach10 --db '/path/to/Bach10/'
    #compute features for the synthetic dataset generated with Sibelius
    python -m examples.bach10.compute_features_bach10sibelius --db '/path/to/Bach10Sibelius/'
    #compute features for the dataset synthesized with RWC samples
    python -m examples.bach10.compute_features_bach10rwc --db '/path/to/Bach10Sibelius/' --rwc '/path/to/rwc/'
    ### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
    THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.bach10.trainCNNrwc --db '/path/to/Bach10/' --dbs '/path/to/Bach10Sibelius/' --output '/output/path/'
    THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.bach10.trainCNNSibelius --db '/path/to/Bach10/' --dbs '/path/to/Bach10Sibelius/' --output '/output/path/'

For iKala:

    python -m examples.ikala.compute_features --db '/path/to/iKala/'
    ### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
    THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.ikala.trainCNN --db '/path/to/iKala/'

For SiSEC MUS using the DSD100 dataset:

    python -m examples.dsd100.compute_features --db '/path/to/DSD100/'
    ### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
    THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.dsd100.trainCNN --db '/path/to/DSD100/'


# Evaluation

The metrics are computed with BSS Eval images v3.0, as described <a href="http://bass-db.gforge.inria.fr/bss_eval/">here</a>.

The evaluation scripts can be found in the "evaluation" subfolder.
The "script_cluster" subfolder contains scripts to run the evaluation in parallel on an HPC cluster system.

For Bach10, you need to run the script Bach10_eval_only.m for each method in the 'base_estimates_directory' folder and for the 10 pieces. To evaluate the separation of the <a href="https://zenodo.org/record/321361#.WNFhKt-i7J8">Bach10 Sibelius dataset</a>, use the 'Bach10_eval_only_original.m' script. Be careful not to mix the estimation directories for the two datasets.

For iKala, you need to run the script evaluate_SS_iKala.m for each of the 252 files in the dataset.
The script takes as parameters the id of the file, the path to the dataset, and the separation method, which needs to be a directory containing the separation results, stored in the 'output' folder.

    for id=1:252
        evaluate_SS_iKala(id,'/homedtic/mmiron/data/iKala/','fft_1024');
    end

For SiSEC-MUS/DSD100, use the scripts at this <a href="https://github.com/faroit/dsd100mat">web page</a>.

If you have access to an HPC cluster, you can use the .sh scripts in the script_cluster folder, which call the corresponding .m files.
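For intuition about what these metrics measure, a simplified signal-to-distortion ratio can be sketched as below. This is only an SNR-style approximation of ours: the actual BSS Eval v3.0 images metrics further decompose the estimation error into target distortion, interference, and artifact components via projections:

    import numpy as np

    def simple_sdr(reference, estimate):
        # SNR-style distortion ratio in dB: everything that deviates from
        # the reference source image counts as distortion. BSS Eval v3.0
        # instead separates interference, noise, and artifact error terms.
        reference = np.asarray(reference, dtype=float)
        error = np.asarray(estimate, dtype=float) - reference
        return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))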
# Research reproducibility
For DSD100 and iKala, the framework was tested as part of a public evaluation campaign and the results were published online (see the sections above).

For Bach10, we provide the synthetic <a href="https://zenodo.org/record/321361#.WNFhKt-i7J8">Bach10 Sibelius dataset</a> and the <a href="https://zenodo.org/record/344499#.WNFjMN-i7J8">Bach10 Separation SMC2017 dataset</a>, containing the separation for each method as .wav files and the evaluation results as .mat files.

If you want to compute the features and re-train the models, check the 'examples/bach10' folder and the instructions above. Alternatively, you can <a href="https://drive.google.com/open?id=0B-Th_dYuM4nOa3ZMSmhwRkwzaGM">download</a> an already trained model and perform separation with 'separate_bach10.py'.

If you want to evaluate the methods in the <a href="https://zenodo.org/record/344499#.WNFjMN-i7J8">Bach10 Separation SMC2017 dataset</a>, you can use the scripts in the evaluation directory, explained above in the 'Evaluation' section.

If you want to replicate the plots in the SMC2017 paper, you need to have 'pandas' and 'seaborn' installed (pip install pandas seaborn) and then run the script in the plots subfolder:

    python bach10_smc_stats.py --db 'path-to-results-dir'

where 'path-to-results-dir' is the path to the folder where you have stored the results for each method (e.g. if you downloaded the Bach10 Separation SMC2017 dataset, it would be the 'results' subfolder).

# Acknowledgments
The TITAN X used for this research was donated by the NVIDIA Corporation.

# License

    Copyright (c) 2014-2017
    Marius Miron <miron.marius at gmail dot com>,
    Pritish Chandna <pc2752 at gmail dot com>,
    Gerard Erruz, and Hector Martel
    Music Technology Group, Universitat Pompeu Fabra, Barcelona <mtg.upf.edu>

    This program is free software: you can redistribute it and/or modify
    it under the terms of the Affero GPL license published by
    the Free Software Foundation, either version 3 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    Affero GPL license for more details.

    You should have received a copy of the Affero GPL license
    along with this program. If not, see <http://www.gnu.org/licenses/>.