https://github.com/bcc-code/bccmedia-song-or-not

[BCC Media] NN to determine what parts of an audio file are songs and which are speech

Host: GitHub
URL: https://github.com/bcc-code/bccmedia-song-or-not
Owner: bcc-code
License: other
Created: 2023-06-01T13:02:44.000Z (about 3 years ago)
Default Branch: master
Last Pushed: 2024-08-27T13:45:47.000Z (almost 2 years ago)
Last Synced: 2024-08-28T12:57:33.056Z (almost 2 years ago)
Topics: bcc-media
Language: Python
Homepage:
Size: 1.08 MB
Stars: 1
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE

# Song-Or-Not
## Summary

This is the code used to train a neural network that detects which parts of a recording
are speech and which are song.

## Training data

The training data consists of MP3 files (not included), which you place into the `songs`
or `speech` folder. Then run the `./split.sh` script, which splits the files into chunks
of the specified length. Currently only files with a 44100 Hz sample rate are supported.
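
To illustrate what fixed-length chunking means at 44100 Hz, here is a minimal Python sketch that computes chunk boundaries in samples. It is not the logic of `split.sh`, which is not shown here. The function name `chunk_bounds` and the choice to drop a trailing partial chunk are assumptions made for illustration.

```python
SUPPORTED_SAMPLE_RATE = 44100  # the only sample rate the README says is supported


def chunk_bounds(num_samples, chunk_seconds, sample_rate=SUPPORTED_SAMPLE_RATE):
    """Return (start, end) sample indices for each full chunk.

    Hypothetical helper: a trailing remainder shorter than one chunk
    is dropped, which is an assumption and may differ from split.sh.
    """
    if sample_rate != SUPPORTED_SAMPLE_RATE:
        raise ValueError(f"unsupported sample rate: {sample_rate} Hz")
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    full_chunks = num_samples // chunk_len
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(full_chunks)]
```

For example, an 11 second recording split into 5 second chunks yields two chunks of 220500 samples each, and the final second is discarded under the assumption above.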

## Dependencies

Incomplete and untested list:

```
conda install -c apple tensorflow-deps
conda install tensorflow_io
conda install torchaudio
```
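
Since the list above is described as incomplete and untested, a quick importability check can show what is still missing from an environment. This is a minimal sketch. The module names are assumed to match the packages in the commands above (`tensorflow` is assumed to be what `tensorflow-deps` prepares for), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above.
ASSUMED_MODULES = ("tensorflow", "tensorflow_io", "torchaudio")


def missing_modules(names=ASSUMED_MODULES):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```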

## Training

Run `python3 train.py`.

This will produce a `./songornot_trained.pt` file. If you run the script again, the file will be replaced.
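
Because a new training run replaces the existing file, you may want to keep a copy of the previous model first. The following is a minimal sketch using only the standard library. `backup_checkpoint` and the timestamped naming scheme are assumptions for illustration and are not part of `train.py`.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoint(path="songornot_trained.pt"):
    """Copy an existing model file to a timestamped sibling.

    Hypothetical helper: returns the backup path, or None if there
    is nothing to back up yet.
    """
    src = Path(path)
    if not src.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}-{stamp}{src.suffix}")
    shutil.copy2(src, dst)  # copies the contents and file metadata
    return dst
```

Calling `backup_checkpoint()` before `python3 train.py` leaves the earlier model next to the new one under a name such as `songornot_trained-<timestamp>.pt`.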

## Inference

As a sample, you can run `./tests/inference_test.py`.

Results will be similar to this:

```
/Users/matjaz/meeting.wav
Chunks: 1317
Total items: 1317
song (15:10): 00:00 - 15:10
speech (04:00): 15:10 - 19:10
song (03:05): 19:10 - 22:15
speech (18:30): 22:15 - 40:45
song (04:00): 40:45 - 44:45
speech (00:20): 44:45 - 45:05
song (00:15): 45:05 - 45:20
speech (08:05): 45:20 - 53:25
song (03:30): 53:25 - 56:55
speech (25:05): 56:55 - 82:00
song (01:55): 82:00 - 83:55
speech (20:40): 83:55 - 104:35
song (03:30): 104:35 - 108:05

```
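
The listing above can be read as runs of consecutive chunks with the same label, merged into segments. Here is a minimal sketch of that merging step, not the code in `inference_test.py`. The 5 second chunk length is an assumption (every timestamp in the sample is a multiple of 5 seconds), and the function names are hypothetical.

```python
from itertools import groupby


def fmt(seconds):
    """Format seconds as MM:SS, letting minutes exceed 59 as in the sample."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_chunks(labels, chunk_seconds=5):
    """Collapse per-chunk labels into (label, start, end) segments in seconds."""
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run) * chunk_seconds
        segments.append((label, position, position + length))
        position += length
    return segments


def format_segment(label, start, end):
    """Render one segment in the same layout as the sample output."""
    return f"{label} ({fmt(end - start)}): {fmt(start)} - {fmt(end)}"


if __name__ == "__main__":
    # 182 song chunks followed by 48 speech chunks, 5 seconds each.
    labels = ["song"] * 182 + ["speech"] * 48
    for segment in merge_chunks(labels):
        print(format_segment(*segment))
```

With these inputs the sketch prints `song (15:10): 00:00 - 15:10` and `speech (04:00): 15:10 - 19:10`, matching the first two lines of the sample.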

## Models

### License

The model files listed here are licensed under the [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) license:

* songornot_2s.pt
* songornot_5s.pt

Everything except the files listed above is released under the MIT License.
