Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/demiseom/specaugment
An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
data-augmentation python pytorch specaugment speech speech-recognition tensorflow
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/demiseom/specaugment
- Owner: DemisEom
- License: apache-2.0
- Created: 2019-04-24T03:29:51.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-04-05T00:46:37.000Z (over 2 years ago)
- Last Synced: 2024-10-11T04:41:43.923Z (about 1 month ago)
- Topics: data-augmentation, python, pytorch, specaugment, speech, speech-recognition, tensorflow
- Language: Python
- Homepage:
- Size: 428 KB
- Stars: 638
- Watchers: 11
- Forks: 135
- Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SpecAugment [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
This is an implementation of SpecAugment, a speech data augmentation method that operates directly on the spectrogram, with TensorFlow & PyTorch, introduced by Google Brain [1]. It is released under the Apache 2.0 license, so please feel free to use it in your projects. Enjoy!

## How to use
First, you need to have Python 3 installed along with [TensorFlow](https://www.tensorflow.org/install/).
Next, you need to install some audio libraries for it to work properly. To install the required packages, run the following command:
```bash
pip3 install SpecAugment
```

And then, run the specAugment.py program. It modifies the spectrogram by warping it in the time direction, masking blocks of consecutive frequency channels, and masking blocks of utterances in time.
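For intuition, here is a minimal NumPy-only sketch of the frequency- and time-masking steps described above (time warping omitted). The function name and the parameters `F`, `T`, and `num_masks` are illustrative, loosely following the paper's notation; this is not the package's actual API.

```python
import numpy as np

def mask_spectrogram(mel_spec, F=27, T=40, num_masks=1, rng=None):
    """Illustrative frequency and time masking (time warping omitted).

    mel_spec: 2-D array of shape (n_mels, n_frames).
    F, T: maximum widths of the frequency and time masks.
    """
    rng = rng or np.random.default_rng()
    spec = mel_spec.copy()
    n_mels, n_frames = spec.shape

    for _ in range(num_masks):
        # Frequency masking: zero out f consecutive mel channels.
        f = int(rng.integers(0, F + 1))
        f0 = int(rng.integers(0, max(n_mels - f, 1)))
        spec[f0:f0 + f, :] = 0.0

        # Time masking: zero out t consecutive frames.
        t = int(rng.integers(0, T + 1))
        t0 = int(rng.integers(0, max(n_frames - t, 1)))
        spec[:, t0:t0 + t] = 0.0

    return spec
```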
#### *Try SpecAugment on your own audio file*
```shell
$ python3
```

```python
>>> import librosa
>>> from specAugment import spec_augment_tensorflow
# If you are using PyTorch, import spec_augment_pytorch instead of spec_augment_tensorflow
>>> audio, sampling_rate = librosa.load(audio_path)
>>> mel_spectrogram = librosa.feature.melspectrogram(y=audio,
sr=sampling_rate,
n_mels=256,
hop_length=128,
fmax=8000)
>>> warped_masked_spectrogram = spec_augment_tensorflow.spec_augment(mel_spectrogram=mel_spectrogram)
>>> print(warped_masked_spectrogram)
[[1.54055389e-01 7.51822486e-01 7.29588015e-01 ... 1.03616300e-01
1.04682689e-01 1.05411769e-01]
[2.21608739e-01 1.38559084e-01 1.01564167e-01 ... 4.19907116e-02
4.86430404e-02 5.27331798e-02]
[3.62784019e-01 2.09934399e-01 1.79158230e-01 ... 2.42307431e-01
3.18662338e-01 3.67405599e-01]
...
[6.36117335e-07 8.06897948e-07 8.55346431e-07 ... 2.84445018e-07
4.02975952e-07 5.57131738e-07]
[6.27753429e-07 7.53681318e-07 8.13035033e-07 ... 1.35111146e-07
2.74058225e-07 4.56901031e-07]
[0.00000000e+00 7.48416680e-07 5.51771037e-07 ... 1.13901361e-07
2.56365068e-07 4.43868592e-07]]
```
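To check the result visually, you can plot the original and augmented mel spectrograms. The snippet below is ordinary librosa/matplotlib plotting code, not part of the SpecAugment package; it assumes the variables from the session above and librosa >= 0.8 (for the `ax=` argument), and squeezes any extra batch dimension the returned spectrogram may carry.

```python
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
for ax, spec, title in [(axes[0], mel_spectrogram, "Original"),
                        (axes[1], warped_masked_spectrogram, "Warped & masked")]:
    # Convert the power spectrogram to dB for a readable display.
    db = librosa.power_to_db(np.squeeze(np.asarray(spec)), ref=np.max)
    librosa.display.specshow(db, sr=sampling_rate, hop_length=128,
                             x_axis="time", y_axis="mel", fmax=8000, ax=ax)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```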
Learn more about how to do specific tasks with SpecAugment in the test code.

```bash
python spec_augment_test.py
```
The test code uses a sample from the [LibriSpeech dataset](http://www.openslr.org/12/).
# Reference
1. Daniel S. Park et al., "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition". https://arxiv.org/pdf/1904.08779.pdf