https://github.com/bcc-code/bccmedia-song-or-not
[BCC Media] NN to determine what parts of an audio file are songs and which are speech
bcc-media
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/bcc-code/bccmedia-song-or-not
- Owner: bcc-code
- License: other
- Created: 2023-06-01T13:02:44.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2024-08-27T13:45:47.000Z (almost 2 years ago)
- Last Synced: 2024-08-28T12:57:33.056Z (almost 2 years ago)
- Topics: bcc-media
- Language: Python
- Homepage:
- Size: 1.08 MB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
# Song-Or-Not
## Summary
This is the code used to train a neural network that detects which parts of a recording
are speech and which are song.
## Training data
The training data consists of MP3 files (not included), which should be placed in the `songs`
or `speech` folders. Then run the `./split.sh` script, which splits each file into chunks
of the specified length. Currently only files with a 44100 Hz sample rate are supported.
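To illustrate what fixed-length chunking means in terms of samples, here is a minimal sketch. It is not the logic of `split.sh`; the function name `chunk_bounds`, the constant `EXPECTED_SAMPLE_RATE`, and the choice to drop a trailing partial chunk are assumptions made for this example.

```python
# Hypothetical helper, not part of the repository.
EXPECTED_SAMPLE_RATE = 44100  # the only sample rate the README lists as supported


def chunk_bounds(total_samples, sample_rate, chunk_seconds):
    """Return (start, end) sample indices for each full-length chunk."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(f"unsupported sample rate: {sample_rate} Hz")
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    # Assumption: a trailing partial chunk is dropped; split.sh may behave differently.
    count = total_samples // chunk_len
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(count)]


if __name__ == "__main__":
    # A 12-second file split into 5-second chunks yields two full chunks.
    print(chunk_bounds(total_samples=12 * 44100, sample_rate=44100, chunk_seconds=5))
```

Rejecting other sample rates up front mirrors the stated limitation, so unsupported files fail early rather than producing chunks of the wrong duration.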
## Dependencies
Incomplete and untested list:
```
conda install -c apple tensorflow-deps
conda install tensorflow_io
conda install torchaudio
```
## Training
Run `python3 train.py`. This will produce a `./songornot_trained.pt` file. If you run the script again, the file will be replaced.
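Because a new training run overwrites the output file, you may want to keep a copy of the previous model first. The following is a minimal sketch using only the standard library; the helper `backup_model` and the timestamped naming scheme are assumptions for illustration and are not part of this repository.

```python
import shutil
import time
from pathlib import Path


def backup_model(path="songornot_trained.pt"):
    """Copy an existing model file to a timestamped name.

    Returns the path of the copy, or None if there is nothing to back up.
    """
    src = Path(path)
    if not src.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: songornot_trained_20240101-120000.pt
    dst = src.with_name(f"{src.stem}_{stamp}{src.suffix}")
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    copied = backup_model()
    print(f"backup written to {copied}" if copied else "no existing model found")
```

Run it before `python3 train.py` if you want to compare models across training runs.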
## Inference
As a sample, you can run `./tests/inference_test.py`.
Results will be similar to this:
```
/Users/matjaz/meeting.wav
Chunks: 1317
Total items: 1317
song (15:10): 00:00 - 15:10
speech (04:00): 15:10 - 19:10
song (03:05): 19:10 - 22:15
speech (18:30): 22:15 - 40:45
song (04:00): 40:45 - 44:45
speech (00:20): 44:45 - 45:05
song (00:15): 45:05 - 45:20
speech (08:05): 45:20 - 53:25
song (03:30): 53:25 - 56:55
speech (25:05): 56:55 - 82:00
song (01:55): 82:00 - 83:55
speech (20:40): 83:55 - 104:35
song (03:30): 104:35 - 108:05
```
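The output above groups consecutive chunks with the same label into segments and prints times as minutes and seconds, with minutes allowed to exceed 59. A minimal sketch of that kind of post-processing follows. It assumes a list of per-chunk labels and a fixed chunk length; the names `merge_chunks` and `fmt` are hypothetical, and the actual logic in `tests/inference_test.py` may differ.

```python
from itertools import groupby


def fmt(seconds):
    """Format seconds as MM:SS, letting minutes exceed 59 as in the sample output."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_chunks(labels, chunk_seconds):
    """Collapse runs of identical labels into (label, start_s, end_s) segments."""
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run) * chunk_seconds
        segments.append((label, position, position + length))
        position += length
    return segments


if __name__ == "__main__":
    # Made-up labels for illustration, assuming 5-second chunks.
    labels = ["song"] * 3 + ["speech"] * 14 + ["song"]
    for label, start, end in merge_chunks(labels, chunk_seconds=5):
        print(f"{label} ({fmt(end - start)}): {fmt(start)} - {fmt(end)}")
```

With 5-second chunks every boundary falls on a multiple of five seconds, which matches the timestamps in the sample output.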
### Models
The included models are provided for demonstration purposes and are (C) 2023 BCC Media STI.
## License
The model files listed here are licensed under the [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) license:
* songornot_2s.pt
* songornot_5s.pt
Everything except the files listed above is released under the MIT License.