https://github.com/OMR-Research/tf-end-to-end
TensorFlow code to perform end-to-end Optical Music Recognition on monophonic scores through Convolutional Recurrent Neural Networks and CTC-based training.
- Host: GitHub
- URL: https://github.com/OMR-Research/tf-end-to-end
- Owner: OMR-Research
- License: mit
- Created: 2018-03-10T16:46:33.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-03-07T13:47:42.000Z (8 months ago)
- Last Synced: 2024-08-03T09:10:33.326Z (4 months ago)
- Language: Python
- Homepage:
- Size: 301 KB
- Stars: 138
- Watchers: 8
- Forks: 58
- Open Issues: 3
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
# tf-deep-omr
TensorFlow code to perform end-to-end Optical Music Recognition on monophonic scores through Convolutional Recurrent Neural Networks and CTC-based training.
## Citation
This repository was used for the experiments reported in the paper:
[End-to-End Neural Optical Music Recognition of Monophonic Scores](http://www.mdpi.com/2076-3417/8/4/606)
```
@Article{Calvo-Zaragoza2018,
AUTHOR = {Calvo-Zaragoza, Jorge and Rizo, David},
TITLE = {End-to-End Neural Optical Music Recognition of Monophonic Scores},
JOURNAL = {Applied Sciences},
VOLUME = {8},
YEAR = {2018},
NUMBER = {4},
ARTICLE NUMBER = {606},
URL = {http://www.mdpi.com/2076-3417/8/4/606},
ISSN = {2076-3417},
DOI = {10.3390/app8040606}
}
```
## Corpora
This repository is intended for the Printed Images of Music Staves (PrIMuS) dataset.
PrIMuS can be downloaded from https://grfia.dlsi.ua.es/primus/
## Training
Assuming the PrIMuS dataset has been downloaded and all its samples have been placed in the same folder, the models can be trained with `ctc_training.py`. Training requires a list of training samples and the set of symbols (vocabulary). Examples of these files are given in the `Data` folder.
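As a rough illustration of how such files could be produced, the sketch below walks a corpus folder, writes one sample name per line to a training list, and collects the distinct symbols into a vocabulary file. The layout it assumes (one sub-folder per sample, holding a `<name>.semantic` or `<name>.agnostic` file of whitespace-separated symbols), the one-entry-per-line file format, and the helper name `build_lists` are all assumptions; check them against the example files in `Data` before relying on it.

```python
from pathlib import Path


def build_lists(corpus_dir, set_path, vocabulary_path, encoding="semantic"):
    """Write a sample list and a vocabulary file for a PrIMuS-style corpus.

    Assumed (hypothetical) layout: corpus_dir/<name>/<name>.<encoding>
    """
    corpus = Path(corpus_dir)
    samples = []
    symbols = set()

    for sample_dir in sorted(p for p in corpus.iterdir() if p.is_dir()):
        label_file = sample_dir / f"{sample_dir.name}.{encoding}"
        if not label_file.is_file():
            continue  # skip folders without a transcription in this encoding
        samples.append(sample_dir.name)
        # split() handles both tab- and space-separated symbols
        symbols.update(label_file.read_text().split())

    Path(set_path).write_text("\n".join(samples) + "\n")
    Path(vocabulary_path).write_text("\n".join(sorted(symbols)) + "\n")
    return samples, sorted(symbols)
```

A call such as `build_lists("path/to/corpus", "my_train.txt", "my_vocabulary.txt")` would then produce files that can be compared against the examples shipped in `Data`. The output names here are placeholders chosen so the provided example files are not overwritten.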
### Semantic
```
python ctc_training.py -semantic -corpus <path_to_PrIMuS> -set Data/train.txt -vocabulary Data/vocabulary_semantic.txt -save_model ./trained_semantic_model
```
### Agnostic
```
python ctc_training.py -corpus <path_to_PrIMuS> -set Data/train.txt -vocabulary Data/vocabulary_agnostic.txt -save_model ./trained_agnostic_model
```
## Recognition
To run inference, use `ctc_predict.py` with an input image, a trained model, and the corresponding vocabulary file.
The repository does not include trained models, but they can be downloaded from:
* [Agnostic model](https://grfia.dlsi.ua.es/primus/models/PrIMuS/Agnostic-Model.zip)
* [Semantic model](https://grfia.dlsi.ua.es/primus/models/PrIMuS/Semantic-Model.zip)

These models resulted from the training process for one of the folds of the 10-fold cross-validation considered in the paper.
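For readers unfamiliar with how a CTC-trained network turns per-frame outputs into a symbol sequence, the following sketch shows greedy (best-path) CTC decoding in plain Python: take the most likely class in each frame, merge consecutive repeats, and drop the blank class. It illustrates the general technique and is not taken from `ctc_predict.py`. The function name `greedy_ctc_decode`, the toy vocabulary and scores, and the convention that the blank is the last class index are assumptions.

```python
def greedy_ctc_decode(frame_scores, vocabulary, blank_index=None):
    """Best-path CTC decoding of a (frames x classes) score matrix.

    frame_scores: list of per-frame score lists, one score per class.
    vocabulary:   list mapping class index -> symbol name.
    blank_index:  index of the CTC blank; assumed to be the last class.
    """
    if blank_index is None:
        blank_index = len(vocabulary)

    decoded = []
    previous = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a symbol only when the class changes and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(vocabulary[best])
        previous = best
    return decoded


# Toy example: 3 symbols plus the blank (index 3), 6 frames.
vocabulary = ["clef-C1", "barline", "rest-quarter"]
frame_scores = [
    [0.9, 0.0, 0.0, 0.1],  # clef-C1
    [0.8, 0.1, 0.0, 0.1],  # clef-C1 again (merged with the previous frame)
    [0.1, 0.0, 0.0, 0.9],  # blank
    [0.0, 0.7, 0.1, 0.2],  # barline
    [0.0, 0.1, 0.1, 0.8],  # blank
    [0.0, 0.6, 0.2, 0.2],  # barline (kept: separated by a blank)
]
print(greedy_ctc_decode(frame_scores, vocabulary))
# ['clef-C1', 'barline', 'barline']
```

The blank is what lets the decoder distinguish one long symbol from two consecutive identical ones, which matters in music notation where repeated notes and barlines are common.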
Let's see an example for the sample from PrIMuS provided in `Data/Example`:
![Alt text](Data/Example/000051652-1_2_1.png?raw=true "000051652-1_2_1")
This sample belongs to the test set of the aforementioned fold, so it was not seen by the networks during their training stage.
### Semantic
Running
```
python ctc_predict.py -image Data/Example/000051652-1_2_1.png -model Models/semantic_model.meta -vocabulary Data/vocabulary_semantic.txt
```
should yield the following prediction:
```
clef-C1 keySignature-EbM timeSignature-2/4 multirest-23 barline rest-quarter rest-eighth note-Bb4_eighth barline note-Bb4_quarter. note-G4_eighth barline note-Eb5_quarter. note-D5_eighth barline note-C5_eighth note-C5_eighth rest-quarter barline
```
The ground truth for this example is given in `Data/Example/000051652-1_2_1.semantic`:
```
clef-C1 keySignature-EbM timeSignature-2/4 multirest-23 barline rest-quarter rest-eighth note-Bb4_eighth barline note-Bb4_quarter. note-G4_eighth barline note-Eb5_quarter. note-D5_eighth barline note-C5_eighth note-C5_eighth rest-quarter barline
```
The model recognizes this staff section perfectly: the prediction matches the ground truth symbol for symbol.
### Agnostic
Running
```
python ctc_predict.py -image Data/Example/000051652-1_2_1.png -model Models/agnostic_model.meta -vocabulary Data/vocabulary_agnostic.txt
```
should yield the following prediction:
```
clef.C-L1 accidental.flat-L4 accidental.flat-L2 accidental.flat-S3 digit.2-L4 digit.4-L2 digit.2-S5 digit.3-S5 multirest-L3 barline-L1 rest.quarter-L3 rest.eighth-L3 note.eighth-L4 barline-L1 note.quarter-L4 dot-S4 note.eighth-L3 barline-L1 note.quarter-S5 dot-S5 note.eighth-L5 barline-L1 note.eighth-S4 note.eighth-S4 rest.quarter-L3
```
The ground truth for this example is given in `Data/Example/000051652-1_2_1.agnostic`:
```
clef.C-L1 accidental.flat-L4 accidental.flat-L2 accidental.flat-S3 digit.2-L4 digit.4-L2 digit.2-S5 digit.3-S5 multirest-L3 barline-L1 rest.quarter-L3 rest.eighth-L3 note.eighth-L4 barline-L1 note.quarter-L4 dot-S4 note.eighth-L3 barline-L1 note.quarter-S5 dot-S5 note.eighth-L5 barline-L1 note.eighth-S4 note.eighth-S4 rest.quarter-L3 barline-L1
```
Here the prediction lacks only the final `barline-L1`. As discussed in the paper, the agnostic representation often misses the last barline.
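A mismatch like this one can be quantified with a symbol-level edit distance between prediction and ground truth. The sketch below is a plain Levenshtein distance over whitespace-separated symbols, normalized by the ground-truth length. It is meant as an illustration and is not necessarily the exact evaluation code used for the paper; the function names are hypothetical, and the example uses only the tail of the agnostic sequences for brevity.

```python
def edit_distance(predicted, reference):
    """Levenshtein distance between two symbol sequences (lists of str)."""
    # previous[j] holds the distance between the processed prefix of
    # `predicted` and the first j symbols of `reference`.
    previous = list(range(len(reference) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, r in enumerate(reference, start=1):
            current.append(min(
                previous[j] + 1,             # delete p
                current[j - 1] + 1,          # insert r
                previous[j - 1] + (p != r),  # substitute (free on match)
            ))
        previous = current
    return previous[-1]


def symbol_error_rate(predicted_line, reference_line):
    """Edit distance divided by the number of reference symbols.

    Assumes the reference line contains at least one symbol.
    """
    predicted = predicted_line.split()
    reference = reference_line.split()
    return edit_distance(predicted, reference) / len(reference)


prediction = "note.eighth-S4 rest.quarter-L3"
ground_truth = "note.eighth-S4 rest.quarter-L3 barline-L1"
print(edit_distance(prediction.split(), ground_truth.split()))  # 1
```

Applied to the full agnostic sequences, the distance is also 1, since the missing final barline is the only difference.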
## Contact:
* Jorge Calvo Zaragoza ([email protected])