{"id":15896573,"url":"https://github.com/mthrok/tkaldi","last_synced_at":"2026-05-18T11:10:02.149Z","repository":{"id":56004351,"uuid":"314643658","full_name":"mthrok/tkaldi","owner":"mthrok","description":"Kaldi-ASR powered by PyTorch C++ API (Experimental)","archived":false,"fork":false,"pushed_at":"2020-12-01T20:09:56.000Z","size":484,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-19T09:52:26.895Z","etag":null,"topics":["asr","kaldi","pytorch"],"latest_commit_sha":null,"homepage":"https://mthrok.github.io/tkaldi/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mthrok.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-20T18:56:36.000Z","updated_at":"2020-12-02T04:01:51.000Z","dependencies_parsed_at":"2022-08-15T11:10:46.937Z","dependency_job_id":null,"html_url":"https://github.com/mthrok/tkaldi","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mthrok/tkaldi","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mthrok%2Ftkaldi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mthrok%2Ftkaldi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mthrok%2Ftkaldi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mthrok%2Ftkaldi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mthrok","download_url":"https://codeload.github.com/mthrok/tkaldi/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/mthrok%2Ftkaldi/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33175961,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T09:27:30.708Z","status":"ssl_error","status_checked_at":"2026-05-18T09:27:28.300Z","response_time":71,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","kaldi","pytorch"],"created_at":"2024-10-06T09:20:29.786Z","updated_at":"2026-05-18T11:10:02.121Z","avatar_url":"https://github.com/mthrok.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CircleCI](https://circleci.com/gh/mthrok/tkaldi.svg?style=svg)](https://circleci.com/gh/mthrok/tkaldi?branch=main)\n\n# tKaldi\n\nYet Another Approach to Port Kaldi\n\nThis is an experimental attempt to re-write Kaldi's matrix library with PyTorch's C++ API.\n\nNote: This is my Sunday project. 
\n\n## Approach to Port Kaldi\n\nThis project aims to implement the following classes as wrappers around\nPyTorch's `torch::Tensor` class.\n\n**Vector Classes**\n - `kaldi::VectorBase`\n - `kaldi::Vector`\n - `kaldi::SubVector`\n\n**Matrix Classes**\n - `kaldi::MatrixBase`\n - `kaldi::Matrix`\n - `kaldi::SubMatrix`\n\n(You can check out the code from [here](./src/libtkaldi/src).)\n\nTheoretically, by swapping the original source code with these implementations,\nwe should be able to build the rest of the Kaldi libraries.\n(Except the parts related to CUDA and OpenFST, which I have not looked into.)\n\nOnce we build the Kaldi code with PyTorch's backend, it should be fairly easy to\nbuild the PyTorch binding of the resulting library, and this means that we can call\nKaldi functions from PyTorch natively.\n\n## Execution\n\nSince Kaldi's code base is huge, it is difficult to start by forking it and modifying it.\nInstead, I took a bottom-up approach: deciding on a target feature that I want\nto port, and then implementing the necessary interface of the Vector/Matrix classes.\n\nWhen compiling the target feature, its source code is copied to\nthe workspace with minimal modification. Interestingly, all I had to do so far was to \ncomment out some `#include` statements, which are not directly related to the target feature,\nand swap some type definitions. 
You can check these out in [kaldi.patch](./kaldi.patch).\n\nFor the initial target feature, I chose [`ComputeKaldiPitch`](https://github.com/kaldi-asr/kaldi/blob/7fb716aa0f56480af31514c7e362db5c9f787fd4/src/feat/pitch-functions.h#L411-L419) and the corresponding CLI, [`compute-kaldi-pitch-feats`](https://github.com/kaldi-asr/kaldi/blob/7fb716aa0f56480af31514c7e362db5c9f787fd4/src/featbin/compute-kaldi-pitch-feats.cc).\n\nI am porting these features in the following manner.\n\n### Phase 1 - Port `ComputeKaldiPitch`\n\nThe goal of this phase is to have a `ComputeKaldiPitch` function that produces the exact same result \nas the original implementation. The performance of the function does not matter. In fact, since the\nresulting Vector / Matrix classes are wrappers around `torch::Tensor`, and `torch::Tensor` is backed\nby a similar (or the same) BLAS library, while Kaldi's original implementation calls the BLAS \nlibrary directly, the port is expected to be slower, or at best equally fast.\n\n- [x] Implement the minimal set of methods from Vector / Matrix classes. [016ab2e7](https://github.com/mthrok/tkaldi/tree/016ab2e7d757ae654607fc60dfceadc2a6c26ada/src/libtkaldi/src/matrix)\n- [x] Compile `ComputeKaldiPitch`.\n- [x] Bind the resulting `ComputeKaldiPitch` to Python. [src](https://github.com/mthrok/tkaldi/blob/016ab2e7d757ae654607fc60dfceadc2a6c26ada/src/libtkaldi/register.cc#L18-L66)\n- [x] Check the parity of the Python function and `compute-kaldi-pitch-feats` from the original code. 
[test](https://app.circleci.com/pipelines/github/mthrok/tkaldi/50/workflows/d2ba7389-4088-47db-b315-45b3f863c0c3)\n\n### Phase 2 - Port `compute-kaldi-pitch-feats`\n\nThe next step is to port the `compute-kaldi-pitch-feats` CLI so that I can compare the speed of the \noriginal CLI and the ported version.\n\n- [x] Extend the Vector / Matrix classes. [bc8ac3c0](https://github.com/mthrok/tkaldi/tree/bc8ac3c0e85c4cb08242c837f7ccaf39b49ca619/src/libtkaldi/src/matrix)\n- [x] Compile `compute-kaldi-pitch-feats`. (#12)\n- [ ] Compare the speed of the original `compute-kaldi-pitch-feats` and the ported one.\n\n### Phase 3 - Improve the performance of `ComputeKaldiPitch`\n\nThe third step is to improve the speed of `ComputeKaldiPitch` by modifying the implementation to take\nadvantage of PyTorch's C++ API (and potentially getting rid of the Vector / Matrix classes).\n\n- [ ] Vectorize the operation and get rid of sequential element access.\n- [ ] Parallelize operations.\n- [ ] (Optional) Enable GPU support.\n\n## Build\n\nBecause of the approach explained in the previous section, this repository is not a fork of the original Kaldi.\nInstead, this repository references Kaldi as a Git submodule and copies the required source code from it.\n\n[tools.py](./tools.py) facilitates this process.\n\n**Note** When changing the list of source files under source control in [`src/libtkaldi/src`](./src/libtkaldi/src),\nedit [`.gitignore`](.gitignore) and [`tools.py`](./tools.py).\n\n* `./tools.py init`  \nThis will sync the Kaldi submodule (in [`third_party/kaldi`](./third_party)), clean up any changes present there,\nthen apply the patch from [`kaldi.patch`](./kaldi.patch).\n\n* `./tools.py dev`  \nThis will run `git clean` on the current [`src/libtkaldi`](./src/libtkaldi) (so that files that are not\nunder source control will be removed), copy the designated source code from the `third_party/kaldi` directory,\nthen run `python setup.py develop` to build the library.\n\n* `./tools.py stash`  \nThis 
will stash the changes made to the Kaldi submodule to [`kaldi.patch`](./kaldi.patch). When you change\nthe original source code of Kaldi and need to persist the change across commits, you need to check in the patch.\n\n### Getting Started\n\n```\ngit clone https://github.com/mthrok/tkaldi\ncd tkaldi\n./tools.py init\n```\n\n### Building and Running Tests\n\n```\n./tools.py dev\npytest tests\n```\n\n## Requirements\n\n```\npytorch \u003e= 1.7\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmthrok%2Ftkaldi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmthrok%2Ftkaldi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmthrok%2Ftkaldi/lists"}