https://github.com/keunwoochoi/kapre

kapre: Keras Audio Preprocessors

Host: GitHub
URL: https://github.com/keunwoochoi/kapre
Owner: keunwoochoi
License: mit
Created: 2016-12-14T18:36:36.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2023-10-23T02:52:41.000Z (about 2 years ago)
Last Synced: 2025-05-13T09:08:33.973Z (8 months ago)
Topics: audio, kapre-layers, keras, keras-audio-preprocessors, melspectrogram, preprocess, shot, spectrogram, tensorflow
Language: Python
Homepage:
Size: 4.86 MB
Stars: 930
Watchers: 22
Forks: 146
Open Issues: 17
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.txt

Awesome Lists containing this project

fucking-awesome-python - kapre (892 stars, 146 forks) - Keras Audio Preprocessors (Audio)
fucking-awesome-python-cn - kapre
awesome-python-resources - GitHub (12% open, 04.07.2022) (Audio)
awesome-python-machine-learning-resources - GitHub (12% open, 04.07.2022) (Audio Processing)
awesome-python - kapre - Keras Audio Preprocessors (Audio)
python-awesome - kapre - Keras Audio Preprocessors. (Audio)
awesome-python - kapre - Keras Audio Preprocessors. (Audio)
awesome-python - kapre - kapre: Keras Audio Preprocessors (Audio)
awesome-python-zh - kapre - Keras Audio Preprocessors. (Audio)
awesome-drone - kapre - Keras Audio Preprocessors. (Audio / Drone Frames)
fucking-awesome-python - kapre - Keras Audio Preprocessors. (Audio)
awesome-python-cn - kapre
awesome-python-scientific-audio - Kapre - Keras Audio Preprocessors (Audio Related Packages)
awesome-open-source - kapre - Keras Audio Preprocessors. (Python)

README

# Kapre

Keras Audio Preprocessors: compute STFT, ISTFT, mel-spectrograms, and more on the GPU in real time.

Tested on Python 3.6 and 3.7.

## Why Kapre?

### vs. Pre-computation

* You can tune DSP parameters (e.g., `n_fft`) along with the rest of your model.

* Your model deployment becomes much simpler and more consistent.

* Your code and model have fewer dependencies.
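Tuning DSP parameters is largely a trade-off between frequency resolution and time resolution. The arithmetic below is a small sketch of that trade-off for a few candidate `n_fft` values at 44.1 kHz. The helper `stft_resolution` is hypothetical and not part of Kapre, and using `hop = n_fft // 2` is only an assumption for illustration.

```python
def stft_resolution(sr, n_fft, hop_length):
    """Return (frequency bin width in Hz, hop duration in ms) for an STFT.

    stft_resolution is a hypothetical helper for illustration, not a Kapre API.
    """
    bin_width_hz = sr / n_fft          # spacing between adjacent FFT bins
    hop_ms = 1000.0 * hop_length / sr  # time between consecutive frames
    return bin_width_hz, hop_ms


# Compare a few candidate settings, e.g. as a starting grid for a hyperparameter search.
for n_fft in (512, 1024, 2048):
    hop = n_fft // 2  # assumption: 50% overlap between frames
    bw, ms = stft_resolution(44100, n_fft, hop)
    print(f"n_fft={n_fft:5d} hop={hop:5d}: {bw:6.2f} Hz/bin, {ms:5.2f} ms/hop")
```

Larger `n_fft` gives finer frequency bins but coarser time steps. This is why treating these values as hyperparameters, instead of fixing them during pre-computation, can pay off.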

### vs. Your own implementation

* Quick and easy!

* Consistent with 1D/2D TensorFlow batch shapes

* Data format agnostic (`channels_first` and `channels_last`)

* Less error-prone: Kapre layers are tested against Librosa (STFT, decibel conversion, etc.), which is (trust me) *trickier* than you think.

* Kapre layers extend the default `tf.signal` implementation with APIs such as:

  - A perfectly invertible `STFT` and `InverseSTFT` pair

  - Mel-spectrogram with more options

* Reproducibility: Kapre is available on pip with versioning.
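The decibel step is a good example of a detail that is easy to get subtly wrong. The sketch below reproduces, in NumPy, the conversion performed by librosa's `power_to_db`. That conversion applies a floor at `amin`, takes `10 * log10`, subtracts the reference level, and clips values more than `top_db` below the peak. It only illustrates the kind of reference behaviour Kapre is tested against. It is not Kapre's own implementation, and Kapre's `MagnitudeToDecibel` defaults may differ.

```python
import numpy as np


def power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0):
    """NumPy sketch of a librosa-style power-to-decibel conversion."""
    S = np.asarray(S, dtype=np.float64)
    # Floor tiny values so log10 never sees zero (which would give -inf).
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    # Express the result relative to the reference level.
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Limit the dynamic range to top_db below the loudest value.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


# ~[0, -10, -80] dB: the 0.0 entry hits the amin floor (-100 dB) and is clipped to -80.
print(power_to_db([1.0, 0.1, 0.0]))
```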

## Workflow with Kapre

1. Preprocess your audio dataset. Resample the audio to the sampling rate your model will use and store the audio signals (waveforms).

2. In your ML model, add a Kapre layer, e.g. `kapre.time_frequency.STFT()`, as the first layer of the model.

3. The data loader simply loads audio signals and feeds them into the model.

4. In your hyperparameter search, include DSP parameters like `n_fft` to boost performance.

5. When deploying the final model, all you need to remember is the sampling rate of the signal. No extra dependencies or preprocessing!
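Step 3 can be as simple as cropping or zero-padding each waveform to a fixed length and stacking the results into a `(batch, time, channels)` array, which matches `input_data_format='channels_last'`. The helpers below are a hypothetical NumPy sketch. `fix_length` and `make_batch` are not Kapre functions, and the 1-second, 44.1 kHz, 6-channel clips are assumptions for illustration.

```python
import numpy as np


def fix_length(wave, n_samples):
    """Crop or zero-pad a (time, channels) waveform to exactly n_samples (hypothetical helper)."""
    if wave.shape[0] >= n_samples:
        return wave[:n_samples]
    pad = np.zeros((n_samples - wave.shape[0], wave.shape[1]), dtype=wave.dtype)
    return np.concatenate([wave, pad], axis=0)


def make_batch(waves, n_samples, channels_first=False):
    """Stack waveforms into a float32 batch (hypothetical helper).

    channels_last  -> (batch, time, channels)
    channels_first -> (batch, channels, time)
    """
    batch = np.stack([fix_length(w, n_samples) for w in waves]).astype(np.float32)
    if channels_first:
        batch = np.moveaxis(batch, -1, 1)  # move the channel axis next to the batch axis
    return batch


# Two 6-channel clips of different lengths, assumed already resampled to 44.1 kHz.
rng = np.random.default_rng(0)
clips = [rng.uniform(-1, 1, (30000, 6)), rng.uniform(-1, 1, (50000, 6))]
x = make_batch(clips, n_samples=44100)
print(x.shape)  # (2, 44100, 6)
```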

## Installation

```sh
pip install kapre
```

## API Documentation

Please refer to the Kapre API documentation at https://kapre.readthedocs.io.

## One-shot example

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)  # (time, channels), i.e. channels_last
sr = 44100

model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer(..., return_decibel=True)

# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer()

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy')  # if single-label classification

# Train it with raw audio sample inputs.
# In practice you would load your own data, e.g. with functions like load_x() and load_y().
# Random placeholders with the right shapes are used here so the snippet runs as-is.
x = np.random.uniform(-1.0, 1.0, size=(32,) + input_shape).astype('float32')  # e.g., x.shape = (10000, 44100, 6)
y = np.eye(10)[np.random.randint(0, 10, size=32)]  # one-hot, e.g., y.shape = (10000, 10) for 10 classes

# then..
model.fit(x, y)
# Done!
```

* See the Jupyter notebook in the [example folder](https://github.com/keunwoochoi/kapre/tree/master/examples)
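When sizing the layers that follow `STFT`, it helps to know its output shape. Suppose the layer frames the signal the way `tf.signal.stft` does with `pad_end=False`: one frame per full window of `win_length` samples, advanced by `hop_length`. Suppose also that it keeps `n_fft // 2 + 1` frequency bins. Under those assumptions the shape can be worked out by hand. The helper below is hypothetical, so confirm the result against `model.summary()`.

```python
def stft_output_shape(n_samples, n_channels, n_fft, win_length, hop_length,
                      channels_last=True):
    """Predict the per-example STFT output shape for pad_end=False (hypothetical helper)."""
    if n_samples < win_length:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (n_samples - win_length) // hop_length  # only full windows are kept
    n_bins = n_fft // 2 + 1  # non-negative frequencies of a real-input FFT
    if channels_last:
        return (n_frames, n_bins, n_channels)
    return (n_channels, n_frames, n_bins)


# Settings from the one-shot example: 1 s of 6-channel audio at 44.1 kHz.
print(stft_output_shape(44100, 6, n_fft=2048, win_length=2018, hop_length=1024))
# (42, 1025, 6)
```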

## TFLite compatibility

The `STFT` layer is not TFLite-compatible (because of `tf.signal.stft`). To create a TFLite-compatible model, first train using the normal `kapre` layers, then create a new model that replaces `STFT` and `Magnitude` with `STFTTflite` and `MagnitudeTflite`.

The TFLite-compatible layers are restricted to a batch size of 1, which prevents using them during training.

```python
# Assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite

model_tflite = Sequential()
model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
                            window_name=None, pad_end=False,
                            input_data_format='channels_last', output_data_format='channels_last',
                            input_shape=input_shape))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())

# Load the trained weights into the TFLite-compatible model.
model_tflite.set_weights(model.get_weights())
```
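Because the TFLite-compatible layers only accept a batch size of 1, a batch of inputs has to be fed one example at a time during inference. The loop below is a generic NumPy sketch of that pattern. `predict_one` is a hypothetical stand-in for whatever single-example call you use, for instance a wrapper around a TFLite interpreter, and `dummy_predict_one` only exists to make the sketch runnable.

```python
import numpy as np


def predict_in_batches_of_one(predict_one, x):
    """Run a batch-size-1 predictor over every example in x and stack the results.

    predict_one: hypothetical callable that takes an array of shape (1, *x.shape[1:])
    and returns an array whose leading (batch) dimension is 1.
    """
    outputs = [predict_one(x[i:i + 1]) for i in range(len(x))]  # slicing keeps the batch axis
    return np.concatenate(outputs, axis=0)


def dummy_predict_one(batch):
    """Stand-in for a batch-1 model: mean over time and channels per example."""
    assert batch.shape[0] == 1, "batch size must be 1"
    return batch.mean(axis=(1, 2))[:, None]  # shape (1, 1)


x = np.random.default_rng(0).uniform(-1, 1, (4, 44100, 6)).astype(np.float32)
y = predict_in_batches_of_one(dummy_predict_one, x)
print(y.shape)  # (4, 1)
```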

## Citation

Please cite this paper if you use Kapre for your work.

```bibtex
@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}
```
