https://github.com/d-f/nylon-amt

Automatic music transcription for classical guitar with hierarchical frequency-time transformers and the MAESTRO dataset
https://github.com/d-f/nylon-amt

huggingface-transformers music-information-retrieval music-transcription torchaudio transformer

Last synced: about 2 months ago
JSON representation

Automatic music transcription for classical guitar with hierarchical frequency-time transformers and the MAESTRO dataset

Host: GitHub
URL: https://github.com/d-f/nylon-amt
Owner: d-f
Created: 2023-07-14T00:00:36.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-02-08T21:25:33.000Z (4 months ago)
Last Synced: 2025-02-08T21:27:44.391Z (4 months ago)
Topics: huggingface-transformers, music-information-retrieval, music-transcription, torchaudio, transformer
Language: Python
Homepage:
Size: 133 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        This project makes use of hierarchical frequency time transformers in order to transcribe classical guitar audio into sheet music and guitar tabulature. The original implementation of this model can be found at [https://github.com/sony/hFT-Transformer/](https://github.com/sony/hFT-Transformer/) and [https://arxiv.org/abs/2307.04305](https://arxiv.org/abs/2307.04305)

First, a model is trained to transcribe solo piano recital pieces using the MAESTRO dataset

which can be found here: [https://magenta.tensorflow.org/datasets/maestro](https://magenta.tensorflow.org/datasets/maestro) This dataset includes paired audio and midi files that are aligned within 3ms. There are data for 10 years of international competitive piano recitals created by recording MIDI alongside audio at the time of the performance.

# Training details

The following model architecture hyperparameters were reduced, otherwise the architecture was identical to the one presented in the paper:

| hidden layer | position-wise feed forward layer | num encoder layers | num decoder layers | num encoder heads | num decoder heads | 

| ------------ | -------------------------------- | ------------------ | ------------------ | ----------------- | ----------------- | 

| 64           | 128                              | 2                  | 2                  | 2                 | 2                 |

| batch size | learning rate | drop out | dataset size     | num epochs | 

| ---------- | ------------- | -------- | ---------------- | ---------- |

| 8          | 1e-4          | 0.1      | 0.2% of original | 20         |

# MAESTRO dataset performance

| Precision | Recall | F1    | 

| --------- | ------ | ----- |

| 0.01      | 0.95   |  0.02 |

It is clear that more than 0.2% of the dataset needs to be used for training. 

dataset_creation.py was used in order to format the MAESTRO dataset in the expected way. The HFTT github was used with very minor adjustments

m_training.py was used in order to train the model, with example usage in the file /evaluation/EXE-EVALUATION-MAESTRO.sh

Changes to the HFTT repo:

- m_training.py

  - changing device to torch.device("cuda") compared to just "cuda" on line 111

  - moving testing to end of training

- train.py

  - changing the total to be the length of the iterator rather than a fixed value in the tqdm progress bar on line 16.

  -  measure precision, recall and f1 on test set with mir_eval

[Alphatab](https://github.com/CoderLine/alphaTab) will be used for tab visualization.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/d-f/nylon-amt

Awesome Lists containing this project

README