{"id":25740173,"url":"https://github.com/sp-nitech/sptk","last_synced_at":"2025-12-24T17:45:20.697Z","repository":{"id":41055478,"uuid":"103337202","full_name":"sp-nitech/SPTK","owner":"sp-nitech","description":"A suite of speech signal processing tools","archived":false,"fork":false,"pushed_at":"2025-03-11T05:16:45.000Z","size":5838,"stargazers_count":232,"open_issues_count":0,"forks_count":27,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-12T19:48:16.191Z","etag":null,"topics":["audio-processing","cepstrum","cpp","dsp","lpc","lsp","mfcc","signal-processing","speech","speech-processing","sptk","unix-command"],"latest_commit_sha":null,"homepage":"http://sp-tk.sourceforge.net","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sp-nitech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-13T01:15:34.000Z","updated_at":"2025-03-11T05:16:48.000Z","dependencies_parsed_at":"2022-08-10T01:31:12.606Z","dependency_job_id":"de081bfb-dc3a-42bd-9c6e-56de14ca412b","html_url":"https://github.com/sp-nitech/SPTK","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2FSPTK","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2FSPTK/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2FSPTK/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2F
SPTK/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sp-nitech","download_url":"https://codeload.github.com/sp-nitech/SPTK/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625498,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-processing","cepstrum","cpp","dsp","lpc","lsp","mfcc","signal-processing","speech","speech-processing","sptk","unix-command"],"created_at":"2025-02-26T08:37:09.057Z","updated_at":"2025-12-24T17:45:20.691Z","avatar_url":"https://github.com/sp-nitech.png","language":"C++","readme":"# SPTK\n\nThe Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools.\n\n- Older version: [SPTK3](https://sourceforge.net/projects/sp-tk/)\n- PyTorch version: [diffsptk](https://github.com/sp-nitech/diffsptk)\n\n[![](https://img.shields.io/badge/docs-latest-blue.svg)](https://sp-nitech.github.io/sptk/latest/)\n[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://sp-nitech.github.io/sptk/4.4/)\n[![](https://img.shields.io/badge/license-Apache%202.0-green.svg)](https://github.com/sp-nitech/SPTK/blob/master/LICENSE)\n[![](https://github.com/sp-nitech/SPTK/workflows/build/badge.svg)](https://github.com/sp-nitech/SPTK/actions)\n\n## What is SPTK?\n\n- SPTK consists of over 100 commands for speech signal processing.\n- SPTK uses a raw, headerless data format, i.e., files have no specific structure.\n  Thanks to this format, file contents can be inspected immediately on the command line.\n  ```sh\n  dmp +s data.raw\n  ```\n- Data is passed between commands through standard input/output,\n  so multiple processes can be chained using pipes.\n  ```sh\n  x2x +sd \u003c data.raw | clip | x2x +da | less\n  ```\n- In general, data are 8-byte little-endian doubles.\n- The commands require no interactive user input.\n  Parameters are set beforehand via command-line options.\n  ```sh\n  impulse -l 4 | sopr -m 10 | x2x +da\n  ```\n\n## Documentation\n\n- Refer to the reference [manual](https://sp-nitech.github.io/sptk/latest/).\n- Refer to the tutorial [slides](https://speakerdeck.com/takenori/introduction-to-sptk-a-toolkit-for-speech-signal-processing).\n- Our [paper](https://www.isca-archive.org/ssw_2023/yoshimura23_ssw.html) is available on the ISCA Archive.\n\n## Requirements\n\n- GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+\n- CMake 3.1+\n\n## Installation\n\n### Linux / macOS\n\n\u003cdetails\u003e\u003csummary\u003eexpand\u003c/summary\u003e\u003cdiv\u003e\n\nThe latest version can be downloaded with Git and built as follows:\n\n```sh\ngit clone https://github.com/sp-nitech/SPTK.git\ncd SPTK\nmake\n```\n\nThe SPTK commands can then be used by adding the `bin/` directory to the `PATH` environment variable.\nTo use SPTK functions in your own program, link the static library `lib/libsptk.a`.\n\n\u003c/div\u003e\u003c/details\u003e\n\n### Windows\n\n\u003cdetails\u003e\u003csummary\u003eexpand\u003c/summary\u003e\u003cdiv\u003e\n\nYou may first need to add `cmake` and `MSBuild` to the `PATH` environment variable.\nThen run `make.bat`, or open Command Prompt and follow the procedure below:\n\n```sh\ncd /path/to/SPTK  # Change this to the path of your SPTK directory.\nmkdir build\ncd build\ncmake .. -DCMAKE_INSTALL_PREFIX=..  # Change the install directory as needed.\nMSBuild /p:Configuration=Release INSTALL.vcxproj\n```\n\nInstead of running MSBuild, you can also build SPTK via the GUI by opening the generated project file.\nThe SPTK functions can then be used by linking the static library `lib/sptk.lib`.\n\n\u003c/div\u003e\u003c/details\u003e\n\n## Demonstration\n\n- [Twitter](https://twitter.com/SPTK_DSP)\n- [Tutorial](https://colab.research.google.com/drive/1vmbIJQDhT5F26eCE5iYKQuEEGxYUv-uJ?usp=drive_link) on Google Colab\n\n## Examples\n\nSPTK provides several examples.\nTo run one, go to an example directory and execute `run.sh`, e.g.,\n\n```sh\ncd egs/analysis_synthesis/mgc\n./run.sh\n```\n\nBelow is a simple example that halves the amplitude of the audio in `input.wav`.\n\n```sh\nwav2raw +s input.wav | x2x +sd | sopr -m 0.5 | x2x +ds -r | raw2wav +s -s 16 \u003e output.wav\n```\n\nTo draw figures, first prepare a Python environment:\n\n```sh\ncd tools; make venv PYTHON_VERSION=3.8; cd ..\n. ./tools/venv/bin/activate\nimpulse -l 32 | gseries impulse.png\ndeactivate\n```\n\n## Changes from SPTK3\n\n- **Input and output types are changed to double from float**\n- Signal processing classes are written in C++ instead of C\n- Drawing commands are implemented in Python\n- Some option names have changed\n- No memory leaks\n- Thread-safe\n- New main features:\n  - Aperiodicity extraction (`ap`)\n  - Dynamic range compression (`drc`)\n  - Magic number interpolation (`magic_intpl`)\n  - Median filter (`medfilt`)\n  - Mel-filter-bank extraction (`fbank`)\n  - Nonrecursive MLPG (`mlpg -R 1`)\n  - Pitch adaptive spectrum estimation (`pitch_spec`)\n  - Pitch extraction used in WORLD (`pitch -a 3` and `pitch -a 4`)\n  - PLP extraction (`plp`)\n  - Sinusoidal generation from pitch (`pitch2sin`)\n  - Subband decomposition (`pqmf` and `ipqmf`)\n  - WORLD synthesis (`world_synth`)\n  - Windows build support\n- Obsoleted commands:\n  - `acep`, `agcep`, and `amcep` -\u003e `amgcep`\n  - `bell`\n  - `c2sp` -\u003e 
`mgc2sp`\n  - `cat2` and `echo2`\n  - `da`\n  - `ds`, `us`, `us16`, and `uscd` -\u003e `sox` or `ffmpeg`\n  - `fig`\n  - `gc2gc` -\u003e `mgc2mgc`\n  - `gcep`, `mcep`, and `uels` -\u003e `mgcep`\n  - `glsadf`, `lmadf`, and `mlsadf` -\u003e `mglsadf`\n  - `ivq` and `vq` -\u003e `imsvq` and `msvq`\n  - `lsp2sp` -\u003e `mglsp2sp`\n  - `mgc2mgclsp` and `mgclsp2mgc`\n  - `psgr` and `xgr`\n  - `wavjoin` and `wavsplit`\n- Separated commands:\n  - `c2ir` -\u003e `c2mpir` and `mpir2c`\n  - `dtw` -\u003e `dtw` and `dtw_merge`\n  - `mglsadf` -\u003e `mglsadf` and `imglsadf`\n  - `train` -\u003e `train` and `mseq`\n  - `ulaw` -\u003e `ulaw` and `iulaw`\n  - `vstat` -\u003e `vstat` and `median`\n- Renamed commands:\n  - `mgclsp2sp` -\u003e `mglsp2sp`\n\n## Who we are\n\n- **Keiichi Tokuda** - *Produce and Design* - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~tokuda/)\n- **Keiichiro Oura** - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~uratec/)\n- **Takenori Yoshimura** - *Main Maintainer* - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~takenori/)\n- **Takato Fujimoto** - [Nagoya Institute of Technology](http://www.sp.nitech.ac.jp/~taka19/)\n\n## Contributors to former versions of SPTK\n\n- Akira Tamamori\n- Cassia Valentini\n- Chiyomi Miyajima\n- Fernando Gil Resende Junior\n- Gou Hirabayashi\n- Heiga Zen\n- Junichi Yamagishi\n- Kazuhito Koishida\n- Keiichi Tokuda\n- Keiichiro Oura\n- Kenji Chiba\n- Masatsune Tamura\n- Naohiro Isshiki\n- Noboru Miyazaki\n- Satoshi Imai\n- Shinji Sako\n- Tadashi Kitamura\n- Takao Kobayashi\n- Takashi Masuko\n- Takashi Nose\n- Takato Fujimoto\n- Takayoshi Yoshimura\n- Takenori Yoshimura\n- Toru Takahashi\n- Toshiaki Fukada\n- Toshihiko Kato\n- Toshio Kanno\n- Yoshihiko Nankaku\n\n## License\n\nThis software is released under the Apache License 2.0.\n\n## Third-party software licenses\n\nThis project incorporates the following third-party libraries.\n\n- Pitch extraction\n  - 
[Snack](https://github.com/scottypitcher/tcl-snack) - Tcl/Tk License\n  - [SWIPE'](https://github.com/kylebgorman/swipe) - MIT License\n  - [REAPER](https://github.com/google/REAPER) - Apache License 2.0\n  - [WORLD](https://github.com/mmorise/World) - 3-Clause BSD License\n- Pitch-adaptive spectral estimation / Aperiodicity estimation\n  - [WORLD](https://github.com/mmorise/World) - 3-Clause BSD License\n- Audio format conversion\n  - [dr_libs](https://github.com/mackron/dr_libs) - Public Domain / MIT License\n  - [stb](https://github.com/nothings/stb) - Public Domain / MIT License\n- Command-line parser\n  - [ya_getopt](https://github.com/kubo/ya_getopt) - 2-Clause BSD License\n\n## Citation\n\n```bibtex\n@InProceedings{sp-nitech2023sptk,\n  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},\n  title = {{SPTK4}: An open-source software toolkit for speech signal processing},\n  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},\n  pages = {211--217},\n  year = {2023},\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsp-nitech%2Fsptk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsp-nitech%2Fsptk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsp-nitech%2Fsptk/lists"}