{"id":22199100,"url":"https://github.com/sp-nitech/diffsptk","last_synced_at":"2025-05-16T07:07:14.872Z","repository":{"id":37923750,"uuid":"467545959","full_name":"sp-nitech/diffsptk","owner":"sp-nitech","description":"A differentiable version of SPTK","archived":false,"fork":false,"pushed_at":"2025-05-08T13:19:39.000Z","size":1729,"stargazers_count":182,"open_issues_count":0,"forks_count":16,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-05-08T14:27:24.058Z","etag":null,"topics":["cepstrum","cqt","ddsp","deep-learning","digital-signal-processing","dsp","gmm","k-means","lpc","lsp","mdct","mfcc","nmf","plp","pqmf","python","pytorch","signal-processing","sptk","stft"],"latest_commit_sha":null,"homepage":"http://sp-tk.sourceforge.net","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sp-nitech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-08T14:28:14.000Z","updated_at":"2025-05-08T13:19:39.000Z","dependencies_parsed_at":"2024-01-09T10:46:55.929Z","dependency_job_id":"53f495f1-ef4c-4e32-8e48-904a48588d68","html_url":"https://github.com/sp-nitech/diffsptk","commit_stats":{"total_commits":257,"total_committers":1,"mean_commits":257.0,"dds":0.0,"last_synced_commit":"56f857f32d3b074db3f94671a1639b4187f32ba7"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2Fdiffsptk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2Fdiffsptk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2Fdiffsptk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sp-nitech%2Fdiffsptk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sp-nitech","download_url":"https://codeload.github.com/sp-nitech/diffsptk/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254485065,"owners_count":22078767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cepstrum","cqt","ddsp","deep-learning","digital-signal-processing","dsp","gmm","k-means","lpc","lsp","mdct","mfcc","nmf","plp","pqmf","python","pytorch","signal-processing","sptk","stft"],"created_at":"2024-12-02T15:13:18.857Z","updated_at":"2025-05-16T07:07:09.856Z","avatar_url":"https://github.com/sp-nitech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# diffsptk\n\n*diffsptk* is a differentiable version of [SPTK](https://github.com/sp-nitech/SPTK) based on the PyTorch framework.\n\n[![Manual](https://img.shields.io/badge/docs-stable-blue.svg)](https://sp-nitech.github.io/diffsptk/3.0.2/)\n[![Downloads](https://static.pepy.tech/badge/diffsptk)](https://pepy.tech/project/diffsptk)\n[![ClickPy](https://img.shields.io/badge/downloads-clickpy-yellow.svg)](https://clickpy.clickhouse.com/dashboard/diffsptk)\n[![Python Version](https://img.shields.io/pypi/pyversions/diffsptk.svg)](https://pypi.python.org/pypi/diffsptk)\n[![PyTorch Version](https://img.shields.io/badge/pytorch-2.3.1%20%7C%202.6.0-orange.svg)](https://pypi.python.org/pypi/diffsptk)\n[![PyPI Version](https://img.shields.io/pypi/v/diffsptk.svg)](https://pypi.python.org/pypi/diffsptk)\n[![Codecov](https://codecov.io/gh/sp-nitech/diffsptk/branch/master/graph/badge.svg)](https://app.codecov.io/gh/sp-nitech/diffsptk)\n[![License](https://img.shields.io/github/license/sp-nitech/diffsptk.svg)](https://github.com/sp-nitech/diffsptk/blob/master/LICENSE)\n[![GitHub Actions](https://github.com/sp-nitech/diffsptk/workflows/package/badge.svg)](https://github.com/sp-nitech/diffsptk/actions)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n\n## Requirements\n\n- Python 3.10+\n- PyTorch 2.3.1+\n\n## Documentation\n\n- See [this page](https://sp-nitech.github.io/diffsptk/3.0.2/) for the reference manual.\n- Our [paper](https://www.isca-speech.org/archive/ssw_2023/yoshimura23_ssw.html) is available on the ISCA Archive.\n\n## Installation\n\nThe latest stable release can be installed through PyPI by running\n\n```sh\npip install diffsptk\n```\n\nThe development release can be installed from the master branch:\n\n```sh\npip install git+https://github.com/sp-nitech/diffsptk.git@master\n```\n\n## Examples\n\n### Running on a GPU\n\n```python\nimport diffsptk\n\nstft_params = {\"frame_length\": 400, \"frame_period\": 80, \"fft_length\": 512}\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\", device=\"cuda\")\n\n# Compute spectrogram using a nn.Module class.\nX1 = diffsptk.STFT(**stft_params).to(\"cuda\")(x)\n\n# Compute spectrogram using a functional method.\nX2 = diffsptk.functional.stft(x, **stft_params)\n\nassert X1.device == X2.device\nassert X1.allclose(X2)\n```\n\n### Mel-cepstral analysis and synthesis\n\n```python\nimport diffsptk\n\nfl = 400     # Frame length.\nfp = 80      # Frame period.\nn_fft = 512  # FFT length.\nM = 24       # Mel-cepstrum dimensions.\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Compute STFT amplitude of x.\nstft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)\nX = stft(x)\n\n# Estimate mel-cepstrum of x.\nalpha = diffsptk.get_alpha(sr)\nmcep = diffsptk.MelCepstralAnalysis(\n    fft_length=n_fft,\n    cep_order=M,\n    alpha=alpha,\n    n_iter=10,\n)\nmc = mcep(X)\n\n# Reconstruct x.\nmlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=20)\nx_hat = mlsa(mlsa(x, -mc), mc)\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n\n# Extract pitch of x.\npitch = diffsptk.Pitch(\n    frame_period=fp,\n    sample_rate=sr,\n    f_min=80,\n    f_max=180,\n    voicing_threshold=0.4,\n    out_format=\"pitch\",\n)\np = pitch(x)\n\n# Generate excitation signal.\nexcite = diffsptk.ExcitationGeneration(frame_period=fp)\ne = excite(p)\nn = diffsptk.nrand(x.size(0) - 1)\n\n# Synthesize waveform.\nx_voiced = mlsa(e, mc)\nx_unvoiced = mlsa(n, mc)\n\n# Output analysis-synthesis result.\ndiffsptk.write(\"voiced.wav\", x_voiced, sr)\ndiffsptk.write(\"unvoiced.wav\", x_unvoiced, sr)\n```\n\n### WORLD analysis and mel-cepstral synthesis\n\n```python\nimport diffsptk\n\nfp = 80       # Frame period.\nn_fft = 1024  # FFT length.\nM = 24        # Mel-cepstrum dimensions.\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Extract F0 of x, or prepare well-estimated F0.\npitch = diffsptk.Pitch(\n    frame_period=fp,\n    sample_rate=sr,\n    f_min=80,\n    f_max=180,\n    voicing_threshold=0.4,\n    out_format=\"f0\",\n)\nf0 = pitch(x)\n\n# Extract aperiodicity of x by D4C.\nap = diffsptk.Aperiodicity(\n    frame_period=fp,\n    sample_rate=sr,\n    fft_length=n_fft,\n    algorithm=\"d4c\",\n    out_format=\"a\",\n)\nA = ap(x, f0)\n\n# Extract spectral envelope of x by CheapTrick.\npitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(\n    frame_period=fp,\n    sample_rate=sr,\n    fft_length=n_fft,\n    algorithm=\"cheap-trick\",\n    out_format=\"power\",\n)\nH = pitch_spec(x, f0)\n\n# Estimate mel-cepstrum of x.\nalpha = diffsptk.get_alpha(sr)\nmcep = diffsptk.MelCepstralAnalysis(fft_length=n_fft, cep_order=M, alpha=alpha)\nmc_a = mcep(A)\nmc_h = mcep(H)\n\n# Generate excitation signals.\nexcite = diffsptk.ExcitationGeneration(frame_period=fp, unvoiced_region=\"zeros\")\np = (sr / f0).nan_to_num(posinf=0)\npulse = excite(p)\nnoise = diffsptk.nrand(len(pulse) - 1)\n\n# Make mixed excitation signal and reconstruct x.\nmlsa = diffsptk.MLSA(filter_order=M, frame_period=fp, alpha=alpha, taylor_order=20)\ne_p = pulse - mlsa(pulse, mc_a)\ne_a = mlsa(noise, mc_a)\ne = e_p + e_a\nx_hat = mlsa(e, mc_h)\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n```\n\n### LPC analysis and synthesis\n\n```python\nimport diffsptk\n\nfl = 400  # Frame length.\nfp = 80   # Frame period.\nM = 24    # LPC dimensions.\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Estimate LPC of x.\nframe = diffsptk.Frame(frame_length=fl, frame_period=fp)\nwindow = diffsptk.Window(in_length=fl)\nlpc = diffsptk.LPC(frame_length=fl, lpc_order=M, eps=1e-6)\na = lpc(window(frame(x)))\n\n# Convert to inverse filter coefficients.\nnorm0 = diffsptk.AllPoleToAllZeroDigitalFilterCoefficients(filter_order=M)\nb = norm0(a)\n\n# Reconstruct x.\nzerodf = diffsptk.AllZeroDigitalFilter(filter_order=M, frame_period=fp)\npoledf = diffsptk.AllPoleDigitalFilter(filter_order=M, frame_period=fp)\nx_hat = poledf(zerodf(x, b), a)\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n```\n\n### Mel-spectrogram, MFCC, and PLP extraction\n\n```python\nimport diffsptk\n\nfl = 400        # Frame length\nfp = 80         # Frame period\nn_fft = 512     # FFT length\nn_channel = 80  # Number of channels\nM = 12          # MFCC/PLP dimensions\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Compute STFT amplitude of x.\nstft = diffsptk.STFT(frame_length=fl, frame_period=fp, fft_length=n_fft)\nX = stft(x)\n\n# Extract log mel-spectrogram.\nfbank = diffsptk.MelFilterBankAnalysis(\n    fft_length=n_fft,\n    n_channel=n_channel,\n    sample_rate=sr,\n)\nY = fbank(X)\nprint(Y.shape)\n\n# Extract MFCC.\nmfcc = diffsptk.MFCC(\n    fft_length=n_fft,\n    mfcc_order=M,\n    n_channel=n_channel,\n    sample_rate=sr,\n)\nY = mfcc(X)\nprint(Y.shape)\n\n# Extract PLP.\nplp = diffsptk.PLP(\n    fft_length=n_fft,\n    plp_order=M,\n    n_channel=n_channel,\n    sample_rate=sr,\n)\nY = plp(X)\nprint(Y.shape)\n```\n\n### Subband decomposition\n\n```python\nimport diffsptk\n\nK = 4   # Number of subbands.\nM = 40  # Order of filter.\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Decompose x.\npqmf = diffsptk.PQMF(K, M)\ndecimate = diffsptk.Decimation(K)\ny = decimate(pqmf(x))\n\n# Reconstruct x.\ninterpolate = diffsptk.Interpolation(K)\nipqmf = diffsptk.IPQMF(K, M)\nx_hat = ipqmf(interpolate(K * y)).reshape(-1)\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n```\n\n### Gammatone filter bank analysis and synthesis\n\n```python\nimport diffsptk\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Decompose x.\ngammatone = diffsptk.GammatoneFilterBankAnalysis(sr)\ny = gammatone(x)\n\n# Reconstruct x.\nigammatone = diffsptk.GammatoneFilterBankSynthesis(sr)\nx_hat = igammatone(y).reshape(-1)\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n```\n\n### Constant-Q transform\n\n```python\nimport diffsptk\nimport librosa  # This is to get sample audio.\n\nfp = 128  # Frame period.\nK = 252   # Number of CQ-bins.\nB = 36    # Number of bins per octave.\n\n# Read waveform.\nx, sr = diffsptk.read(librosa.ex(\"trumpet\"))\n\n# Transform x.\ncqt = diffsptk.CQT(fp, sr, n_bin=K, n_bin_per_octave=B)\nc = cqt(x)\n\n# Reconstruct x.\nicqt = diffsptk.ICQT(fp, sr, n_bin=K, n_bin_per_octave=B)\nx_hat = icqt(c, out_length=x.size(0))\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n```\n\n### Modified discrete cosine transform\n\n```python\nimport diffsptk\n\nfl = 512  # Frame length.\n\n# Read waveform.\nx, sr = diffsptk.read(\"assets/data.wav\")\n\n# Transform x.\nmdct = diffsptk.MDCT(fl)\nc = mdct(x)\n\n# Reconstruct x.\nimdct = diffsptk.IMDCT(fl)\nx_hat = imdct(c, out_length=x.size(0))\n\n# Write reconstructed waveform.\ndiffsptk.write(\"reconst.wav\", x_hat, sr)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n```\n\n### Vector quantization\n\n```python\nimport diffsptk\n\nK = 2  # Codebook size.\nM = 4  # Order of vector.\n\n# Prepare input.\nx = diffsptk.nrand(M)\n\n# Quantize x.\nvq = diffsptk.VectorQuantization(M, K)\nx_hat, indices, commitment_loss = vq(x)\n\n# Compute error.\nerror = (x_hat - x).abs().sum()\nprint(error)\n```\n\n## License\n\nThis software is released under the Apache License 2.0.\n\n## Citation\n\n```bibtex\n@InProceedings{sp-nitech2023sptk,\n  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},\n  title = {{SPTK4}: An open-source software toolkit for speech signal processing},\n  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},\n  pages = {211--217},\n  year = {2023},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsp-nitech%2Fdiffsptk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsp-nitech%2Fdiffsptk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsp-nitech%2Fdiffsptk/lists"}