manoskary/vocsep_ijcai2023

Code for the paper Musical Voice Separation as Link Prediction: Modeling a Musical Perception Task as a Multi-Trajectory Tracking Problem
- Host: GitHub
- URL: https://github.com/manoskary/vocsep_ijcai2023
- Owner: manoskary
- License: MIT
- Created: 2023-04-20T14:22:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-18T07:56:02.000Z (about 1 year ago)
- Last Synced: 2024-10-03T19:10:51.326Z (3 months ago)
- Topics: machine-learning, music-information-retrieval, symbolic-music, voice-separation
- Language: Python
- Homepage:
- Size: 130 KB
- Stars: 7
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Voice Separation as Link Prediction - IJCAI 2023
This repository contains the code for the paper:

[Musical Voice Separation as Link Prediction: Modeling a Musical Perception Task as a Multi-Trajectory Tracking Problem]()

##### Abstract
This paper targets the perceptual task of separating the different interacting voices, i.e., monophonic melodic streams, in a polyphonic musical piece. We target symbolic music, where notes are explicitly encoded, and model this task as a Multi-Trajectory Tracking (MTT) problem from discrete observations, i.e., notes in a pitch-time space. Our approach builds a graph from a musical piece, by creating one node for every note, and separates the melodic trajectories by predicting a link between two notes if they are consecutive in the same voice/stream. This kind of local, greedy prediction is made possible by node embeddings created by a heterogeneous graph neural network that can capture inter- and intra-trajectory information. Furthermore, we propose a new regularization loss that encourages the output to respect the MTT premise of at most one incoming and one outgoing link for every node, favouring monophonic (voice) trajectories; this loss function might also be useful in other general MTT scenarios.
Our approach does not use domain-specific heuristics, is scalable to longer sequences and a higher number of voices, and can handle complex cases such as voice inversions and overlaps. We reach new state-of-the-art results for the voice separation task in classical music of different styles.
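
As a rough illustration of the monophonicity regularization described above, the sketch below penalizes notes whose expected number of incoming or outgoing voice links exceeds one. It is only an illustrative sketch under assumed names and shapes (a dense `link_probs` matrix of predicted link probabilities), not the loss implementation used in the paper:

```python
import torch


def monophonic_regularization(link_probs: torch.Tensor) -> torch.Tensor:
    """Soft penalty for violating the MTT premise of at most one
    incoming and one outgoing link per note.

    link_probs: hypothetical (N, N) matrix where entry (i, j) is the
    predicted probability that note j directly follows note i in the
    same voice.
    """
    out_degree = link_probs.sum(dim=1)  # expected outgoing links per note
    in_degree = link_probs.sum(dim=0)   # expected incoming links per note
    # Penalize only the excess above one link in each direction.
    excess = torch.clamp(out_degree - 1.0, min=0.0) + torch.clamp(in_degree - 1.0, min=0.0)
    return excess.mean()
```

## Install and Run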
We suggest creating an environment with conda or miniconda (more information [here](https://docs.conda.io/projects/miniconda/en/latest/index.html)).
If you have conda then do:
```shell
conda create -n vocsep python=3.8
conda activate vocsep
```
The suggested Python version is 3.8 or later. To install the requirements, run:
```shell
pip install -r requirements.txt
```
The requirements file installs the CPU versions of PyTorch and PyTorch Scatter.
If you want to run the code with CUDA, please install the versions of [PyTorch](https://pytorch.org/) and [PyTorch Scatter](https://github.com/rusty1s/pytorch_scatter) that match your system (follow the links for more information); a quick way to check which build you ended up with is shown after the run command below.

To run the code, just do:
```shell
python main.py
```

For more information about the configuration options, run with the `-h` flag.
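
If you are unsure whether the CPU or the CUDA build of PyTorch ended up installed, a quick check (using standard PyTorch calls) is:

```python
import torch

print(torch.__version__)          # version string; a build suffix such as "+cpu" or "+cu118" may appear
print(torch.cuda.is_available())  # True only if a CUDA-enabled build can see a GPU
```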
This project depends on [WANDB](https://wandb.ai/); therefore, you will need an account to run the code.
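
If you have not used WANDB on your machine before, you will need to authenticate once; for example (assuming the standard `wandb` Python client installed via the requirements):

```python
import wandb

# Prompts for (or reuses) your WANDB API key; running `wandb login`
# from the shell does the same thing.
wandb.login()
```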
#### Results
| | | **McLeod** | | | **GMTT** | | | **GMTT+LA** | |
|------------|-------|---------------|-------|-------|---------------|-------|-------|------------------|-----------------|
| Datasets | P | R | F1 | P | R | F1 | P | R | F1 |
| Inventions | 0.992 | 0.991 | 0.992 | 0.989 | 0.997 | 0.995 | 0.996 | 0.995 | **0.997** |
| Sinfonias | 0.982 | 0.982 | 0.982 | 0.987 | 0.989 | 0.978 | 0.987 | 0.982 | **0.985** |
| WTC I | 0.964 | 0.964 | 0.964 | 0.949 | 0.983 | 0.967 | 0.980 | 0.973 | **0.976** |
| WTC II | 0.964 | 0.964 | 0.964 | 0.945 | 0.979 | 0.962 | 0.976 | 0.968 | **0.972** |
| Haydn | 0.781 | 0.781 | 0.781 | 0.787 | 0.929 | 0.850 | 0.883 | 0.860 | **0.872** |

For all results and models, visit our logs at [WANDB](https://wandb.ai/vocsep/Voice%20Separation).
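
For reference, the precision (P), recall (R), and F1 columns above can be read as link-prediction scores over note-to-note voice connections. A generic sketch of how such scores are computed, with an assumed edge-set representation and not necessarily the paper's exact evaluation code:

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (from_note_index, to_note_index), an assumed representation


def link_prf1(predicted: Set[Link], ground_truth: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over predicted voice-connection links."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```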