https://github.com/zvikinoza/masr
Mini Automatic Speech Recognition
- Host: GitHub
- URL: https://github.com/zvikinoza/masr
- Owner: zvikinoza
- Created: 2020-01-04T12:45:50.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-09-26T19:29:50.000Z (almost 4 years ago)
- Last Synced: 2025-01-26T15:16:56.561Z (over 1 year ago)
- Topics: fourier-transform, keras-tensorflow, librosa, sound-processing, speech-recognition, speech-to-text
- Language: Jupyter Notebook
- Size: 84.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Mini Automatic Speech Recognition
## Using Fourier transforms and a VGG-like CNN architecture

EDA and modeling on ≈150 audio samples of the spoken numbers 1 to 5, recorded by 10 people.
Accuracy on a large (>10k) dataset: 94%.
Accuracy on the given small dataset: 93%.
The task was fairly challenging because of the small dataset.
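
To make the Fourier-transform step concrete, here is a minimal NumPy-only sketch of a magnitude spectrogram, the kind of 2D input a CNN can consume. The function name, frame length, hop size and the 8 kHz sample rate are illustrative assumptions, not values taken from this repository's notebooks (which use librosa).

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier transform magnitudes, shape (frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad very short clips so at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy input: one second of a 440 Hz tone at an assumed 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
# Frequency (in Hz) of the strongest bin, averaged over all frames.
peak_hz = spec.mean(axis=0).argmax() * sr / 256
```

Each row of `spec` is one time frame and each column one frequency bin, so the array can be treated like a single-channel image.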
Augmentation techniques:
* increase/decrease pitch
* increase/decrease speed
* stretching
* frequency and time masking
* white noise injection
* time shifting
* overlay 2 samples (quiet and louder)
* pre/post noise padding
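
Several of the techniques above can be sketched with NumPy alone. The function names, default parameters and noise levels below are hypothetical and are not taken from the notebooks. Pitch and speed changes are left out to keep the sketch dependency-free; librosa provides `librosa.effects.pitch_shift` and `librosa.effects.time_stretch` for those.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def add_white_noise(y, noise_level=0.005):
    """White noise injection: add Gaussian noise to every sample."""
    return y + noise_level * rng.standard_normal(len(y))

def time_shift(y, max_shift):
    """Time shifting: move the clip left or right, filling the gap with silence."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(y)
    if shift > 0:
        out[shift:] = y[:-shift]
    elif shift < 0:
        out[:shift] = y[-shift:]
    else:
        out[:] = y
    return out

def overlay(loud, quiet, quiet_gain=0.2):
    """Overlay two samples, mixing the second one in at a lower volume."""
    n = min(len(loud), len(quiet))
    return loud[:n] + quiet_gain * quiet[:n]

def pad_with_noise(y, pre, post, noise_level=0.005):
    """Pre/post noise padding: surround the clip with low-level noise."""
    head = noise_level * rng.standard_normal(pre)
    tail = noise_level * rng.standard_normal(post)
    return np.concatenate([head, y, tail])

def mask_spectrogram(spec, max_time=10, max_freq=8):
    """Time and frequency masking on a (frames, freq_bins) spectrogram."""
    out = spec.copy()
    n_frames, n_bins = out.shape
    t_width = min(int(rng.integers(0, max_time + 1)), n_frames)
    t0 = int(rng.integers(0, n_frames - t_width + 1))
    out[t0 : t0 + t_width, :] = 0.0  # zero out a block of time frames
    f_width = min(int(rng.integers(0, max_freq + 1)), n_bins)
    f0 = int(rng.integers(0, n_bins - f_width + 1))
    out[:, f0 : f0 + f_width] = 0.0  # zero out a band of frequency bins
    return out
```

The waveform functions expect a 1D float array; `mask_spectrogram` works on the 2D spectrogram instead, and returns a copy so the original is left untouched.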
The data is split by speaker, 5-5: five speakers in one set and five in the other.
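
A speaker-level split like this can be sketched as follows. The `(speaker, file)` tuple layout, the file names and the helper name are assumptions made for illustration; the actual notebooks may organize the data differently.

```python
import random

def split_by_speaker(samples, n_train_speakers=5, seed=0):
    """Split (speaker, item) pairs so that no speaker appears in both sets."""
    speakers = sorted({speaker for speaker, _ in samples})
    random.Random(seed).shuffle(speakers)
    train_speakers = set(speakers[:n_train_speakers])
    train = [s for s in samples if s[0] in train_speakers]
    held_out = [s for s in samples if s[0] not in train_speakers]
    return train, held_out

# Hypothetical toy data: 10 speakers x 5 digits x 3 takes = 150 samples.
samples = [
    (f"speaker_{s}", f"digit{d}_take{k}.wav")
    for s in range(10)
    for d in range(1, 6)
    for k in range(3)
]
train, held_out = split_by_speaker(samples)
```

Splitting on speakers rather than on individual clips keeps every voice in the held-out set unseen during training, which gives a more honest accuracy estimate on a dataset this small.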
Training graph:
