Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
pytorch implementation of JDCNet, singing voice detection and classification network
- Host: GitHub
- URL: https://github.com/dansuh17/jdcnet-pytorch
- Owner: dansuh17
- Created: 2019-10-06T12:02:52.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-15T22:55:35.000Z (almost 2 years ago)
- Last Synced: 2024-10-03T12:31:19.195Z (about 2 months ago)
- Topics: bilstm, deep-learning, lstm, melody, mir, music-information-retrieval, pytorch, singing-voice
- Language: Python
- Homepage:
- Size: 19 MB
- Stars: 47
- Watchers: 1
- Forks: 5
- Open Issues: 5
Metadata Files:
- Readme: README.md
# JDCNet-pytorch
This is a [PyTorch](https://pytorch.org/) re-implementation of
_Kum et al. - "Joint Detection and Classification of
Singing Voice Melody Using Convolutional Recurrent Neural Networks" (2019)_.
The proposed neural network model will be called **JDCNet** for convenience.

- **paper**: [PDF](https://www.mdpi.com/2076-3417/9/7/1324)
- **original Keras implementation**: [melodyExtraction_JDC](https://github.com/keums/melodyExtraction_JDC)

This is an attempt to implement JDCNet as closely as possible to the original paper.
Any ambiguities in implementation details have been filled in by my own decisions,
which account for any differences from the original author's implementation.

## Prerequisites
Major dependencies for this project are:
- python >= 3.6
- pytorch >= 1.2
- librosa >= 0.7.0
- pytorch-land == 0.1.6 (train only)

Any other required libraries are listed in `requirements.txt`.
This project also uses a small library called [pytorch-land](https://github.com/dansuh17/pytorch-land),
created by myself, that implements a general `Trainer` for PyTorch-based models.
It provides easy logging and native TensorBoard support,
and performs the basic "train-validate-test" training sequence.

[librosa](https://librosa.github.io/librosa/) is used for reading audio files.
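For reference, a 513-bin log-magnitude spectrogram of the kind the model consumes can be sketched as follows. This uses `torch.stft` on a synthetic sine so the snippet is self-contained; the sample rate and hop length here are assumptions, not the repository's exact preprocessing parameters.

```python
import math
import torch

# Synthesize one second of a 440 Hz sine at an assumed 8 kHz sample rate.
sr = 8000
t = torch.arange(sr, dtype=torch.float32) / sr
audio = torch.sin(2 * math.pi * 440.0 * t)

# n_fft=1024 yields 513 frequency bins (n_fft // 2 + 1), matching the model
# input described below; the 80-sample hop (10 ms) is an assumption.
spec = torch.stft(audio, n_fft=1024, hop_length=80,
                  window=torch.hann_window(1024), return_complex=True)
log_spec = torch.log1p(spec.abs())  # log-magnitude, shape (513, n_frames)

# Slice into non-overlapping 31-frame chunks for the network.
chunks = log_spec.unfold(1, 31, 31)  # (513, n_chunks, 31)
```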
# JDCNet
JDCNet is a singing voice melody detection and classification network.
It detects whether a noticeable singing voice exists in a given frame
and, if it does, classifies the pitch of the sung note.

The pitch classification is done using a convolutional network with a bidirectional LSTM (BiLSTM) module attached at the end.
The intermediate features of the pitch classifier are reused by an auxiliary detector network,
also a BiLSTM module, to aid the determination of voice existence.

The input is a log-magnitude spectrogram chunk consisting of 31 frames and 513 frequency bins.
The model predicts whether or not the voice exists for each frame, giving a `(31 x 2)` tensor output,
and classifies the pitch into one of 722 classes: 721 different frequencies
evenly distributed (in log scale) from D2 (MIDI=38) to B5 (MIDI=83) inclusive,
plus an extra 'non-voice' class.

![jdcnet_architecture](assets/jdcnet_diagram.png)
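The input/output shapes above can be sketched as a minimal skeleton. This is an illustrative stand-in only, not the repository's actual architecture (the real JDCNet uses deeper residual convolutional blocks); the layer sizes here are assumptions chosen to reproduce the stated shapes.

```python
import torch
import torch.nn as nn

class JDCNetSketch(nn.Module):
    """Shape-level sketch of the I/O described above; the real JDCNet
    uses deeper residual convolutional blocks, not this single conv layer."""

    def __init__(self, num_pitch_classes=722, lstm_hidden=256):
        super().__init__()
        # Convolutional front end: keep the 31 frames, shrink the 513 bins.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.01),
            nn.MaxPool2d(kernel_size=(1, 4)),  # 513 bins -> 128
        )
        feat_dim = 64 * 128
        # Pitch classifier head: BiLSTM over the 31 time frames.
        self.pitch_lstm = nn.LSTM(feat_dim, lstm_hidden,
                                  batch_first=True, bidirectional=True)
        self.pitch_fc = nn.Linear(2 * lstm_hidden, num_pitch_classes)
        # Auxiliary voice detector: a second BiLSTM on the same features.
        self.detect_lstm = nn.LSTM(feat_dim, lstm_hidden,
                                   batch_first=True, bidirectional=True)
        self.detect_fc = nn.Linear(2 * lstm_hidden, 2)

    def forward(self, spec):
        # spec: (batch, 1, 31 frames, 513 bins)
        feat = self.conv(spec)  # (batch, 64, 31, 128)
        b, c, t, f = feat.shape
        feat = feat.permute(0, 2, 1, 3).reshape(b, t, c * f)  # (batch, 31, 8192)
        pitch_out, _ = self.pitch_lstm(feat)
        detect_out, _ = self.detect_lstm(feat)
        # (batch, 31, 722) pitch logits and (batch, 31, 2) voice logits
        return self.pitch_fc(pitch_out), self.detect_fc(detect_out)

model = JDCNetSketch()
pitch, detect = model(torch.randn(2, 1, 31, 513))
```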
# Data Preprocess
[MedleyDB's Melody Subset](https://zenodo.org/record/2628782#.XcvOPpIzZ24)
dataset is used to train this model.
Acquire the dataset, extract the contents, and run the preprocessing script
to be ready for training.

```shell
./medleydb_preprocess.py --in_root <in_root> --out_root <out_root> --metadata_path <path/to/metadata>.json
```

# Train
You must provide a configuration file to train the network.
A default configuration file with default parameters is provided as `default_config.json`.
To start training, run the script `train.py`.

```shell
./train.py --config default_config.json
```

# Singing voice melody extraction
You can generate a MIDI file containing extracted singing voice melody using the provided [pretrained model](/example_model).
```shell
./extract_melody.py --model example_model/jdcnet_model.pth --input_audio <input_audio>.wav
```

# Generated Melody Audio Examples
Some audible examples have been posted in [this post](https://dansuh17.github.io/2019/11/19/jdcnet.html),
and example MIDI files are in the ['melody_results'](/melody_results) directory.
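To turn the network's 722-class pitch output back into a melody (for example, when writing MIDI files like the ones above), each class index must map to a frequency. Below is a sketch of that mapping inferred from the class layout described in the JDCNet section; it is not copied from the repository's code.

```python
import math

NUM_CLASSES = 722              # 721 pitch bins + 1 non-voice class
MIDI_LOW, MIDI_HIGH = 38, 83   # D2 to B5 (45 semitones)

def class_to_frequency(idx):
    """Map a predicted pitch-class index to a frequency in Hz.

    Index 721 is the non-voice class; indices 0..720 are spaced evenly
    in log scale (1/16-semitone steps) over MIDI 38..83. This mapping is
    inferred from the description above, not taken from the repository.
    """
    if idx == NUM_CLASSES - 1:
        return None  # non-voice frame
    midi = MIDI_LOW + idx * (MIDI_HIGH - MIDI_LOW) / 720
    return 440.0 * 2.0 ** ((midi - 69) / 12)

print(round(class_to_frequency(0), 2))    # 73.42 Hz, i.e. D2
print(round(class_to_frequency(720), 2))  # 987.77 Hz, i.e. B5
```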