https://github.com/sp-nitech/sptk
A suite of speech signal processing tools
https://github.com/sp-nitech/sptk
audio-processing cepstrum cpp dsp lpc lsp mfcc signal-processing speech speech-processing sptk unix-command
Last synced: 2 months ago
JSON representation
A suite of speech signal processing tools
- Host: GitHub
- URL: https://github.com/sp-nitech/sptk
- Owner: sp-nitech
- License: apache-2.0
- Created: 2017-09-13T01:15:34.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2025-03-11T05:16:45.000Z (3 months ago)
- Last Synced: 2025-04-12T19:48:16.191Z (2 months ago)
- Topics: audio-processing, cepstrum, cpp, dsp, lpc, lsp, mfcc, signal-processing, speech, speech-processing, sptk, unix-command
- Language: C++
- Homepage: http://sp-tk.sourceforge.net
- Size: 5.57 MB
- Stars: 232
- Watchers: 17
- Forks: 27
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# SPTK
The Speech Signal Processing Toolkit (SPTK) is a software for speech signal processing tools.
- Older version: [SPTK3](https://sourceforge.net/projects/sp-tk/)
- PyTorch version: [diffsptk](https://github.com/sp-nitech/diffsptk)[](https://sp-nitech.github.io/sptk/latest/)
[](https://sp-nitech.github.io/sptk/4.3/)
[](https://github.com/sp-nitech/SPTK/blob/master/LICENSE)
[](https://github.com/sp-nitech/SPTK/actions)## What is SPTK?
- SPTK consists of over 100 commands for speech signal processing.
- The data format used in SPTK is raw header-less, i.e., there is no specific structure.
Thanks to the data format, we can check file contents immediately on CUI.
```sh
dmp +s data.raw
```
- The data used in the commands is passed through standard input/output.
We can chain multiple processes using pipes.
```sh
x2x +sd < data.raw | clip | x2x +da | less
```
- The data type is basically little-endian double 8 bytes.
- The commands do not require interactive user inputs.
Parameters are set via command line options beforehand.
```sh
impulse -l 4 | sopr -m 10 | x2x +da
```## Documentation
- See [this page](https://sp-nitech.github.io/sptk/latest/) for a reference manual.
- Our [paper](https://www.isca-speech.org/archive/ssw_2023/yoshimura23_ssw.html) is available on the ISCA Archive.## Requirements
- GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+
- CMake 3.1+## Installation
### Linux / macOS
expand
The latest release can be downloaded through Git.
The install procedure is as follows.```sh
git clone https://github.com/sp-nitech/SPTK.git
cd SPTK
make
```Then the SPTK commands can be used by adding `bin/` directory to the `PATH` environment variable.
If you would like to use a part of the SPTK functions, please link the static library `lib/libsptk.a`.### Windows
expand
You may need to add `cmake` and `MSBuild` to the `PATH` environment variable in advance.
Please run `make.bat` or open Command Prompt and follow the below procedure:```sh
cd /path/to/SPTK # Please change here to your appropriate path.
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=.. # Please change install directory.
MSBuild /p:Configuration=Release INSTALL.vcxproj
```You can compile SPTK via GUI instead of running MSBuild by opening the generated project file.
Then the SPTK functions can be used by linking the static library `lib/sptk.lib`.## Demonstration
- [Twitter](https://twitter.com/SPTK_DSP)
- [Analysis-synthesis](https://colab.research.google.com/drive/1spX1v9mk6Itxa63R4wYwekzduvLeaUmE?usp=sharing) via mel-cepstrum
- [Parametric coding](https://colab.research.google.com/drive/1NeZxrWiNeixAWaux_HIBLbtaSeokUNiG?usp=sharing) via line spectral pairs## Examples
SPTK provides some examples.
Go to an example directory and execute `run.sh`, e.g.,```sh
cd egs/analysis_synthesis/mgc
./run.sh
```The below is a simple example that decreases the volume of input audio in `input.wav`.
You may need to install `sox` command on your system.```sh
sox -t wav input.wav -c 1 -t s16 -r 16000 - |
x2x +sd | sopr -m 0.5 | x2x +ds -r |
sox -c 1 -t s16 -r 16000 - -t wav output.wav
```If you would like to draw figures, please prepare a python environment.
```sh
cd tools; make venv PYTHON_VERSION=3.8; cd ..
. ./tools/venv/bin/activate
impulse -l 32 | gseries impulse.png
deactivate
```## Changes from SPTK3
- **Input and output types are changed to double from float**
- Signal processing classes are written in C++ instead of C
- Drawing commands are implemented in Python
- Some option names
- No memory leaks
- Thread-safe
- New main features:
- Aperiodicity extraction (`ap`)
- Dynamic range compression (`drc`)
- Magic number interpolation (`magic_intpl`)
- Median filter (`medfilt`)
- Mel-filter-bank extraction (`fbank`)
- Nonrecursive MLPG (`mlpg -R 1`)
- Pitch adaptive spectrum estimation (`pitch_spec`)
- Pitch extraction used in WORLD (`pitch -a 3` and `pitch -a 4`)
- PLP extraction (`plp`)
- Sinusoidal generation from pitch (`pitch2sin`)
- Subband decomposition (`pqmf` and `ipqmf`)
- WORLD synthesis (`world_synth`)
- Windows build support
- Obsoleted commands:
- `acep`, `agcep`, and `amcep` -> `amgcep`
- `bell`
- `c2sp` -> `mgc2sp`
- `cat2` and `echo2`
- `da`
- `ds`, `us`, `us16`, and `uscd` -> `sox`
- `fig`
- `gc2gc` -> `mgc2mgc`
- `gcep`, `mcep`, and `uels` -> `mgcep`
- `glsadf`, `lmadf`, and `mlsadf` -> `mglsadf`
- `ivq` and `vq` -> `imsvq` and `msvq`
- `lsp2sp` -> `mglsp2sp`
- `mgc2mgclsp` and `mgclsp2mgc`
- `psgr` and `xgr`
- `raw2wav`, `wav2raw`, `wavjoin`, and `wavsplit` -> `sox`
- Separated commands:
- `c2ir` -> `c2mpir` and `mpir2c`
- `dtw` -> `dtw` and `dtw_merge`
- `mglsadf` -> `mglsadf` and `imglsadf`
- `train` -> `train` and `mseq`
- `ulaw` -> `ulaw` and `iulaw`
- `vstat` -> `vstat` and `median`
- Renamed commands:
- `mgclsp2sp` -> `mglsp2sp`## Who we are
- **Keiichi Tokuda** - *Produce and Design* - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~tokuda/)
- **Keiichiro Oura** - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~uratec/)
- **Takenori Yoshimura** - *Main Maintainer* - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~takenori/)
- **Takato Fujimoto** - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~taka19/)## Contributors to former versions of SPTK
- Akira Tamamori
- Cassia Valentini
- Chiyomi Miyajima
- Fernando Gil Resende Junior
- Gou Hirabayashi
- Heiga Zen
- Junichi Yamagishi
- Kazuhito Koishida
- Keiichi Tokuda
- Keiichiro Oura
- Kenji Chiba
- Masatsune Tamura
- Naohiro Isshiki
- Noboru Miyazaki
- Satoshi Imai
- Shinji Sako
- Tadashi Kitamura
- Takao Kobayashi
- Takashi Masuko
- Takashi Nose
- Takato Fujimoto
- Takayoshi Yoshimura
- Takenori Yoshimura
- Toru Takahashi
- Toshiaki Fukada
- Toshihiko Kato
- Toshio Kanno
- Yoshihiko Nankaku## License
This software is released under the Apache License 2.0.
## Citation
```bibtex
@InProceedings{sp-nitech2023sptk,
author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
title = {{SPTK4}: An open-source software toolkit for speech signal processing},
booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
pages = {211--217},
year = {2023},
}
```