{"id":17036782,"url":"https://github.com/faroit/dsdtools","last_synced_at":"2025-04-12T12:32:13.800Z","repository":{"id":57424307,"uuid":"46276004","full_name":"faroit/dsdtools","owner":"faroit","description":"Parse and process the demixing secrets dataset (DSD100)","archived":false,"fork":false,"pushed_at":"2018-03-17T14:03:24.000Z","size":395,"stargazers_count":49,"open_issues_count":0,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-21T00:38:21.450Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://dsdtools.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/faroit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-11-16T13:24:49.000Z","updated_at":"2025-02-17T12:30:14.000Z","dependencies_parsed_at":"2022-08-23T16:40:17.390Z","dependency_job_id":null,"html_url":"https://github.com/faroit/dsdtools","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/faroit%2Fdsdtools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/faroit%2Fdsdtools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/faroit%2Fdsdtools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/faroit%2Fdsdtools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/faroit","download_url":"https://codeload.github.com/faroit/dsdtools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_cou
nt":248566587,"owners_count":21125692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T08:51:57.939Z","updated_at":"2025-04-12T12:32:13.476Z","avatar_url":"https://github.com/faroit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dsdtools\n\n### :warning: We released [MUSDB18](https://sigsep.github.io/musdb) our newer, larger dataset. Please use the [musdb python package](https://github.com/sigsep/sigsep-mus-db) to parse it.\n\n[![Build Status](https://travis-ci.org/faroit/dsdtools.svg?branch=master)](https://travis-ci.org/faroit/dsdtools)\n[![Coverage Status](https://coveralls.io/repos/github/faroit/dsdtools/badge.svg?branch=master)](https://coveralls.io/github/faroit/dsdtools?branch=master)\n[![Docs Status](https://readthedocs.org/projects/dsdtools/badge/?version=latest)](https://dsdtools.readthedocs.org/en/latest/)\n\n\nA python package to parse and process the __demixing secrets dataset (DSD)__ as part of the [MUS task](https://sisec.inria.fr/home/2016-professionally-produced-music-recordings/) of the [Signal Separation Evaluation Campaign (SISEC)](https://sisec.inria.fr/).\n\n## Installation\n\n```bash\npip install dsdtools\n```\n\n## DSD100 Dataset / Subset\n\nThe complete dataset (~14 GB) can be downloaded [here](http://liutkus.net/DSD100.zip). For testing and development we provide a subset of the DSD100 [for direct download here](https://www.loria.fr/~aliutkus/DSD100subset.zip). 
It has the same file and folder structure and the same audio file formats, but consists of only 4 tracks of 30s each.\n\n## Usage\n\nThis package should integrate nicely with your existing Python code, making it easy to participate in the [SISEC MUS tasks](https://sisec.inria.fr/home/2016-professionally-produced-music-recordings). At its core, the package calls a user-provided function that separates the mixtures from the DSD into several estimated target sources.\n\n### Providing a compatible function\n\n- The function takes a DSD ```Track``` object, which can be used from inside your algorithm.\n- Participants can access:\n\n - ```Track.audio```, representing the stereo mixture as an ```np.ndarray``` of ```shape=(num_sampl, 2)```\n - ```Track.rate```, the sample rate\n - ```Track.path```, the absolute path of the mixture, which is handy for processing with external applications so that participants don't need to write out temporary wav files.\n\n- The provided function needs to return a Python ```dict``` that maps each target name (```key```) to the estimated target as an audio array with the same shape as the mixture (```value```).\n- It is the user's choice which target sources to provide for a given mixture. 
Supported targets are ```['vocals', 'accompaniment', 'drums', 'bass', 'other']```.\n- Please make sure that the returned estimates have the same sample rate as the mixture track.\n\nHere is an example of such a function, separating the mixture into a __vocals__ and an __accompaniment__ track:\n\n```python\ndef my_function(track):\n    # get the audio mixture as\n    # numpy array shape=(num_sampl, 2)\n    track.audio\n\n    # compute voc_array, acc_array\n    # ...\n\n    return {\n        'vocals': voc_array,\n        'accompaniment': acc_array\n    }\n```\n\n### Creating estimates for SiSEC evaluation\n\n#### Setting up dsdtools\n\nSimply import the dsdtools package in your main Python script and create a database object:\n\n```python\nimport dsdtools\n\ndsd = dsdtools.DB(root_dir='path/to/dsdtools')\n```\n\nThe ```root_dir``` is the path to the dsdtools dataset folder. Instead of passing ```root_dir```, the path can also be set system-wide: just ```export DSD_PATH=/path/to/dsdtools``` in your terminal environment.\n\n#### Test if your separation function generates valid output\n\nBefore processing the full DSD100, which might take a very long time, participants can test their separation function by running:\n```python\ndsd.test(my_function)\n```\nThis test makes sure the user-provided output is compatible with the dsdtools framework. The function returns `True` if the test succeeds.\n\n#### Processing the full DSD100\n\nTo process all 100 DSD tracks and save the results to the folder ```estimates_dir```:\n\n```python\ndsd.run(my_function, estimates_dir=\"path/to/estimates\")\n```\n\n#### Processing training and testing subsets separately\n\nAlgorithms that make use of machine learning techniques can learn from the training subset and then apply the algorithm to the test data. 
That way it is possible to apply a different user function to each subset.\n\n```python\ndsd.run(my_training_function, subsets=\"Dev\")\ndsd.run(my_test_function, subsets=\"Test\")\n```\n\n##### Access the reference signals / targets\n\nFor supervised learning you can use the provided reference sources by loading the `track.targets` dictionary.\nE.g. to access the vocal reference from a track:\n\n```python\ntrack.targets['vocals'].audio\n```\n\nIf you want to exclude tracks from training, you can pass their track ids when creating the database object, e.g. `dsdtools.DB(..., valid_ids=[1, 2])`. Those tracks are then not included in `Dev` but are returned for `subsets=\"Valid\"`.\n\n\n#### Processing single or multiple DSD100 tracks\n\n```python\ndsd.run(my_function, ids=30)\ndsd.run(my_function, ids=[1, 2, 3])\ndsd.run(my_function, ids=range(90, 99))\n```\n\nNote that the provided list of ids can be overridden if the user sets a terminal environment variable ```DSD_ID=1```.\n\n#### Use multiple cores\n\n##### Python Multiprocessing\n\nTo speed up the processing, `run` can make use of multiple CPUs:\n\n```python\ndsd.run(my_function, parallel=True, cpus=4)\n```\n\nNote: We use the Python built-in multiprocessing package, which is sometimes unable to parallelize the user-provided function because of a [PicklingError](http://stackoverflow.com/a/8805244).\n\n##### GNU Parallel\n\n\u003e [GNU parallel](http://www.gnu.org/software/parallel) is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. 
GNU parallel can then split the input and pipe it into commands in parallel.\n\nBy running only one ```id``` in each Python process, the DSD100 set can easily be processed with GNU parallel using multiple CPUs without any further modifications to your code:\n\n```bash\nparallel --bar 'DSD_ID={0} python main.py' ::: {0..99}\n```\n\n## Compute the bss_eval measures\n\nThe official SISEC evaluation relies on _MATLAB_ because there is currently no [bss_eval](http://bass-db.gforge.inria.fr/bss_eval/) implementation for Python that produces identical results.\nTherefore, please run ```dsd100_eval_only.m``` from the [DSD100 Matlab scripts](https://github.com/faroit/dsd100mat) after you have processed and saved your estimates with _dsdtools_.\n\n## Full code example\n\n```python\nimport dsdtools\n\ndef my_function(track):\n    '''My fancy BSS algorithm'''\n\n    # get the audio mixture as numpy array shape=(num_sampl, 2)\n    track.audio\n\n    # get the mixture path for external processing\n    track.path\n\n    # get the sample rate\n    track.rate\n\n    # compute vocals_array, acc_array\n    # ...\n\n    # return any number of targets\n    estimates = {\n        'vocals': vocals_array,\n        'accompaniment': acc_array,\n    }\n    return estimates\n\n# initialize dsdtools\ndsd = dsdtools.DB(root_dir=\"./Volumes/Data/dsdtools\")\n\n# verify that my_function works correctly\nif dsd.test(my_function):\n    print(\"my_function is valid\")\n\n# this might take 3 days to finish\ndsd.run(my_function, estimates_dir=\"path/to/estimates\")\n\n```\n\n## References\n\nIf you use this package, please reference the following paper:\n\n```tex\n@inproceedings{\n  SiSEC17,\n  Title = {The 2016 Signal Separation Evaluation Campaign},\n  Address = {Cham},\n  Author = {Liutkus, Antoine and St{\\\"o}ter, Fabian-Robert and Rafii, Zafar and Kitamura, Daichi and Rivet, Bertrand and Ito, Nobutaka and Ono, Nobutaka and Fontecave, Julie},\n  Editor = {Tichavsk{\\'y}, Petr and Babaie-Zadeh, Massoud and Michel, Olivier J.J. 
and Thirion-Moreau, Nad{\\`e}ge},\n  Pages = {323--332},\n  Publisher = {Springer International Publishing},\n  Year = {2017},\n  booktitle = {Latent Variable Analysis and Signal Separation - 12th International Conference, {LVA/ICA} 2015, Liberec, Czech Republic, August 25-28, 2015, Proceedings},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffaroit%2Fdsdtools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffaroit%2Fdsdtools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffaroit%2Fdsdtools/lists"}