<p align="center"><img src="docs/_static/pykaldi-logo-dark.png" width="40%"/></p>

--------------------------------------------------------------------------------

[![Build Status]][Travis]

PyKaldi is a Python scripting layer for the [Kaldi] speech recognition toolkit.
It provides easy-to-use, low-overhead, first-class Python wrappers for the C++
code in the Kaldi and [OpenFst] libraries. You can use PyKaldi to write Python
code for things that would otherwise require writing C++ code, such as calling
low-level Kaldi functions, manipulating Kaldi and OpenFst objects in code or
implementing new Kaldi tools.

You can think of Kaldi as a large box of legos that you can mix and match to
build custom speech recognition solutions. The best way to think of PyKaldi is
as a supplement, a sidekick if you will, to Kaldi. In fact, PyKaldi is at its
best when it is used alongside Kaldi.
To that end, replicating the functionality of myriad command-line tools,
utility scripts and shell-level recipes provided by Kaldi is a non-goal for the
PyKaldi project.


## Overview

- [Getting Started](#getting-started)
- [About PyKaldi](#about-pykaldi)
- [Coverage Status](#coverage-status)
- [Installation](#installation)
- [FAQ](#faq)
- [Citing](#citing)
- [Contributing](#contributing)


## Getting Started

Like Kaldi, PyKaldi is primarily intended for speech recognition researchers
and professionals. It is jam-packed with goodies that one would need to build
Python software taking advantage of the vast collection of utilities,
algorithms and data structures provided by the Kaldi and OpenFst libraries.

If you are not familiar with FST-based speech recognition or have no interest
in having access to the guts of Kaldi and OpenFst in Python, but only want to
run a pre-trained Kaldi system as part of your Python application, do not fret.
PyKaldi includes a number of high-level, application-oriented modules, such as
[`asr`], [`alignment`] and [`segmentation`], that should be accessible to most
Python programmers.

If you are interested in using PyKaldi for research or building advanced ASR
applications, you are in luck. PyKaldi comes with everything you need to read,
write, inspect, manipulate or visualize Kaldi and OpenFst objects in Python. It
includes Python wrappers for most functions and methods that are part of the
public APIs of the Kaldi and OpenFst C++ libraries. If you want to read/write
files that are produced/consumed by Kaldi tools, check out the I/O and table
utilities in the [`util`] package. If you want to work with Kaldi matrices and
vectors, e.g. convert them to [NumPy] ndarrays and vice versa, check out the
[`matrix`] package. If you want to use Kaldi for feature extraction and
transformation, check out the [`feat`], [`ivector`] and [`transform`] packages.
If you want to work with lattices or other FST structures produced/consumed by
Kaldi tools, check out the [`fstext`], [`lat`] and [`kws`] packages. If you
want low-level access to Gaussian mixture models, hidden Markov models or
phonetic decision trees in Kaldi, check out the [`gmm`], [`sgmm2`], [`hmm`] and
[`tree`] packages. If you want low-level access to Kaldi neural network models,
check out the [`nnet3`], [`cudamatrix`] and [`chain`] packages. If you want to
use the decoders and language modeling utilities in Kaldi, check out the
[`decoder`], [`lm`], [`rnnlm`], [`tfrnnlm`] and [`online2`] packages.

Interested readers who would like to learn more about Kaldi and PyKaldi might
find the following resources useful:

* [Kaldi Docs]: Read these to learn more about Kaldi.
* [PyKaldi Docs]: Consult these to learn more about the PyKaldi API.
* [PyKaldi Examples]: Check these out to see PyKaldi in action.
* [PyKaldi Paper]: Read this to learn more about the design of PyKaldi.

Since automatic speech recognition (ASR) in Python is undoubtedly the "killer
app" for PyKaldi, we will go over a few ASR scenarios to get a feel for the
PyKaldi API. We should note that PyKaldi does not provide any high-level
utilities for training ASR models, so you need to train your models using Kaldi
recipes or use pre-trained models available online. This is simply because
there is no high-level ASR training API in the Kaldi C++ libraries. Kaldi ASR
models are trained using complex shell-level [recipes][Kaldi Recipes] that
handle everything from data preparation to the orchestration of myriad Kaldi
executables used in training. This is by design and unlikely to change in the
future. PyKaldi does provide wrappers for the low-level ASR training utilities
in the Kaldi C++ libraries, but those are not really useful unless you want to
build an ASR training pipeline in Python from basic building blocks, which is
no easy task. Continuing with the lego analogy, this task is akin to building
[this][Lego Chiron] given access to a truck full of legos you might need. If
you are crazy enough to try though, please don't let this paragraph discourage
you. Before we started building PyKaldi, we thought that was a madman's task
too.

### Automatic Speech Recognition in Python

The PyKaldi [`asr`] module includes a number of easy-to-use, high-level classes
to make it dead simple to put together ASR systems in Python. Ignoring the
boilerplate code needed for setting things up, doing ASR with PyKaldi can be as
simple as the following snippet of code:

```python
asr = SomeRecognizer.from_files("final.mdl", "HCLG.fst", "words.txt", opts)

with SequentialMatrixReader("ark:feats.ark") as feats_reader:
    for key, feats in feats_reader:
        out = asr.decode(feats)
        print(key, out["text"])
```

In this simplified example, we first instantiate a hypothetical recognizer
`SomeRecognizer` with the paths for the model `final.mdl`, the decoding graph
`HCLG.fst` and the symbol table `words.txt`. The `opts` object contains the
configuration options for the recognizer. Then, we instantiate a [PyKaldi table
reader][`util.table`] `SequentialMatrixReader` for reading the feature matrices
stored in the [Kaldi archive][Kaldi Archive Docs] `feats.ark`. Finally, we
iterate over the feature matrices and decode them one by one. Here we are
simply printing the best ASR hypothesis for each utterance, so we are only
interested in the `"text"` entry of the output dictionary `out`. Keep in mind
that the output dictionary contains a bunch of other useful entries, such as
the frame-level alignment of the best hypothesis and a weighted lattice
representing the most likely hypotheses. Admittedly, not all ASR pipelines will
be as simple as this example, but they will often have the same overall
structure.
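That overall structure can be exercised end to end with a stand-in recognizer
(plain Python, no Kaldi required; `StubRecognizer`, the dummy output values and
the inline feature list are stand-ins for `SomeRecognizer` and the table
reader, not real PyKaldi APIs):

```python
class StubRecognizer:
    """Stand-in mimicking the decode() interface of a PyKaldi recognizer.

    A real recognizer returns a dict with "text" plus extras such as the
    best-path alignment and a lattice; the values below are dummies.
    """

    def decode(self, feats):
        return {"text": "HELLO WORLD", "alignment": [], "lattice": None}


def best_hypotheses(asr, feats_reader):
    """Decode every (key, feats) pair and keep the best hypothesis text."""
    return {key: asr.decode(feats)["text"] for key, feats in feats_reader}


# Any iterable of (utterance-id, feature-matrix) pairs can stand in for the
# table reader here.
results = best_hypotheses(StubRecognizer(), [("utt1", [[0.0]]), ("utt2", [[1.0]])])
print(results)
```

Swapping the stub for a real recognizer and the list for a table reader gives
back the snippet above; the loop and the output dictionary keep the same shape.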
In the following sections, we will see how we can adapt the code given above to
implement more complicated ASR pipelines.


#### Offline ASR using Kaldi Models

This is the most common scenario. We want to do offline ASR using pre-trained
Kaldi models, such as [ASpIRE chain models]. Here we are using the term
"models" loosely to refer to everything one would need to put together an ASR
system. In this specific example, we are going to need:

* a [neural network acoustic model][Kaldi Neural Network Docs],
* a [transition model][Kaldi Transition Model Docs],
* a [decoding graph][Kaldi Decoding Graph Docs],
* a [word symbol table][Kaldi Symbol Table Docs],
* and a couple of feature extraction [configs][Kaldi Config Docs].

Note that you can use this example code to decode with [ASpIRE chain models].

```python
from kaldi.asr import NnetLatticeFasterRecognizer
from kaldi.decoder import LatticeFasterDecoderOptions
from kaldi.nnet3 import NnetSimpleComputationOptions
from kaldi.util.table import SequentialMatrixReader, CompactLatticeWriter

# Set the paths and read/write specifiers
model_path = "models/aspire/final.mdl"
graph_path = "models/aspire/graph_pp/HCLG.fst"
symbols_path = "models/aspire/graph_pp/words.txt"
feats_rspec = ("ark:compute-mfcc-feats --config=models/aspire/conf/mfcc.conf "
               "scp:wav.scp ark:- |")
ivectors_rspec = (feats_rspec + "ivector-extract-online2 "
                  "--config=models/aspire/conf/ivector_extractor.conf "
                  "ark:spk2utt ark:- ark:- |")
lat_wspec = "ark:| gzip -c > lat.gz"

# Instantiate the recognizer
decoder_opts = LatticeFasterDecoderOptions()
decoder_opts.beam = 13
decoder_opts.max_active = 7000
decodable_opts = NnetSimpleComputationOptions()
decodable_opts.acoustic_scale = 1.0
decodable_opts.frame_subsampling_factor = 3
asr = NnetLatticeFasterRecognizer.from_files(
    model_path, graph_path, symbols_path,
    decoder_opts=decoder_opts, decodable_opts=decodable_opts)

# Extract the features, decode and write output lattices
with SequentialMatrixReader(feats_rspec) as feats_reader, \
     SequentialMatrixReader(ivectors_rspec) as ivectors_reader, \
     CompactLatticeWriter(lat_wspec) as lat_writer:
    for (fkey, feats), (ikey, ivectors) in zip(feats_reader, ivectors_reader):
        assert fkey == ikey
        out = asr.decode((feats, ivectors))
        print(fkey, out["text"])
        lat_writer[fkey] = out["lattice"]
```

The fundamental difference between this example and the short snippet from the
last section is that for each utterance we are reading the raw audio data from
disk and computing two feature matrices on the fly instead of reading a single
precomputed feature matrix from disk. The [script file][Kaldi Script File Docs]
`wav.scp` contains a list of WAV files corresponding to the utterances we want
to decode. The additional feature matrix we are extracting contains online
i-vectors that are used by the neural network acoustic model to perform channel
and speaker adaptation. The [speaker-to-utterance map][Kaldi Data Docs]
`spk2utt` is used for accumulating separate statistics for each speaker in
online i-vector extraction. It can be a simple identity mapping if the speaker
information is not available. We pack the MFCC features and the i-vectors into
a tuple and pass this tuple to the recognizer for decoding. The neural network
recognizers in PyKaldi know how to handle the additional i-vector features when
they are available. The model file `final.mdl` contains both the transition
model and the neural network acoustic model.
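If speaker information is not available, the identity `spk2utt` mapping
mentioned above is easy to generate from `wav.scp` with a few lines of plain
Python (a sketch, not a Kaldi or PyKaldi utility; the file format follows the
example above):

```python
def identity_spk2utt(wav_scp_lines):
    """Build identity spk2utt lines (each utterance is its own "speaker")
    from wav.scp lines of the form '<utt-id> <wav-path-or-command>'."""
    utt_ids = [line.split(maxsplit=1)[0] for line in wav_scp_lines if line.strip()]
    return ["%s %s" % (utt, utt) for utt in utt_ids]


# Example: two utterances, each mapped to itself.
lines = identity_spk2utt(["utt1 audio/utt1.wav", "utt2 audio/utt2.wav"])
print("\n".join(lines))
```

Writing the returned lines to a `spk2utt` file makes the i-vector extraction
pipeline above treat every utterance as a separate speaker.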
The `NnetLatticeFasterRecognizer` processes feature matrices by first computing
phone log-likelihoods using the neural network acoustic model, then mapping
those to transition log-likelihoods using the transition model, and finally
decoding the transition log-likelihoods into word sequences using the decoding
graph `HCLG.fst`, which has [transition IDs][Kaldi Transition Model Docs] on
its input labels and [word IDs][Kaldi Symbol Table Docs] on its output labels.
After decoding, we save the lattice generated by the recognizer to a Kaldi
archive for future processing.

This example also illustrates the powerful [I/O mechanisms][Kaldi I/O Docs]
provided by Kaldi. Instead of implementing the feature extraction pipelines in
code, we define them as Kaldi read specifiers and compute the feature matrices
simply by instantiating [PyKaldi table readers][`util.table`] and iterating
over them. This is not only the simplest but also the fastest way of computing
features with PyKaldi, since the feature extraction pipeline is run in parallel
by the operating system. Similarly, we use a Kaldi write specifier to
instantiate a [PyKaldi table writer][`util.table`] which writes output lattices
to a compressed Kaldi archive. Note that for these to work, we need
`compute-mfcc-feats`, `ivector-extract-online2` and `gzip` to be on our `PATH`.

#### Offline ASR using a PyTorch Acoustic Model

This is similar to the previous scenario, but instead of a Kaldi acoustic
model, we use a [PyTorch] acoustic model. After computing the features as
before, we convert them to a PyTorch tensor, do the forward pass using a
PyTorch neural network module outputting phone log-likelihoods and finally
convert those log-likelihoods back into a PyKaldi matrix for decoding. The
recognizer uses the transition model to automatically map phone IDs to
transition IDs, the input labels on a typical Kaldi decoding graph.

```python
from kaldi.asr import MappedLatticeFasterRecognizer
from kaldi.decoder import LatticeFasterDecoderOptions
from kaldi.matrix import Matrix
from kaldi.util.table import SequentialMatrixReader, CompactLatticeWriter
from models import AcousticModel  # Import your PyTorch model
import torch

# Set the paths and read/write specifiers
acoustic_model_path = "models/aspire/model.pt"
transition_model_path = "models/aspire/final.mdl"
graph_path = "models/aspire/graph_pp/HCLG.fst"
symbols_path = "models/aspire/graph_pp/words.txt"
feats_rspec = ("ark:compute-mfcc-feats --config=models/aspire/conf/mfcc.conf "
               "scp:wav.scp ark:- |")
lat_wspec = "ark:| gzip -c > lat.gz"

# Instantiate the recognizer
decoder_opts = LatticeFasterDecoderOptions()
decoder_opts.beam = 13
decoder_opts.max_active = 7000
asr = MappedLatticeFasterRecognizer.from_files(
    transition_model_path, graph_path, symbols_path, decoder_opts=decoder_opts)

# Instantiate the PyTorch acoustic model (subclass of torch.nn.Module)
model = AcousticModel(...)
model.load_state_dict(torch.load(acoustic_model_path))
model.eval()

# Extract the features, decode and write output lattices
with SequentialMatrixReader(feats_rspec) as feats_reader, \
     CompactLatticeWriter(lat_wspec) as lat_writer:
    for key, feats in feats_reader:
        feats = torch.from_numpy(feats.numpy())  # Convert to PyTorch tensor
        with torch.no_grad():                    # No autograd graph needed
            loglikes = model(feats)              # Compute log-likelihoods
        loglikes = Matrix(loglikes.numpy())      # Convert to PyKaldi matrix
        out = asr.decode(loglikes)
        print(key, out["text"])
        lat_writer[key] = out["lattice"]
```

#### Online ASR using Kaldi Models

This section is a placeholder.
Check out [this script][PyKaldi Online ASR Example] in the meantime.

#### Lattice Rescoring with a Kaldi RNNLM

Lattice rescoring is a standard technique for using large n-gram language
models or recurrent neural network language models (RNNLMs) in ASR. In this
example, we rescore lattices using a Kaldi RNNLM. We first instantiate a
rescorer by providing the paths for the models. Then we use a table reader to
iterate over the lattices we want to rescore, and finally we use a table writer
to write the rescored lattices back to disk.

```python
from kaldi.asr import LatticeRnnlmPrunedRescorer
from kaldi.fstext import SymbolTable
from kaldi.rnnlm import RnnlmComputeStateComputationOptions
from kaldi.util.table import SequentialCompactLatticeReader, CompactLatticeWriter

# Set the paths, extended filenames and read/write specifiers
symbols_path = "models/tedlium/config/words.txt"
old_lm_path = "models/tedlium/data/lang_nosp/G.fst"
word_feats_path = "models/tedlium/word_feats.txt"
feat_embedding_path = "models/tedlium/feat_embedding.final.mat"
word_embedding_rxfilename = ("rnnlm-get-word-embedding %s %s - |"
                             % (word_feats_path, feat_embedding_path))
rnnlm_path = "models/tedlium/final.raw"
lat_rspec = "ark:gunzip -c lat.gz |"
lat_wspec = "ark:| gzip -c > rescored_lat.gz"

# Instantiate the rescorer
symbols = SymbolTable.read_text(symbols_path)
opts = RnnlmComputeStateComputationOptions()
opts.bos_index = symbols.find_index("<s>")
opts.eos_index = symbols.find_index("</s>")
opts.brk_index = symbols.find_index("<brk>")
rescorer = LatticeRnnlmPrunedRescorer.from_files(
    old_lm_path, word_embedding_rxfilename, rnnlm_path, opts=opts)

# Read the lattices, rescore and write output lattices
with SequentialCompactLatticeReader(lat_rspec) as lat_reader, \
     CompactLatticeWriter(lat_wspec) as lat_writer:
    for key, lat in lat_reader:
        lat_writer[key] = rescorer.rescore(lat)
```

Notice the extended filename we used to compute the word embeddings from the
word features and the feature embeddings on the fly. Also of note are the
read/write specifiers we used to transparently decompress/compress the lattice
archives. For these to work, we need `rnnlm-get-word-embedding`, `gunzip` and
`gzip` to be on our `PATH`.


## About PyKaldi

PyKaldi aims to bridge the gap between Kaldi and all the nice things Python has
to offer. It is more than a collection of bindings into Kaldi libraries. It is
a scripting layer providing first-class support for essential Kaldi and
[OpenFst] types in Python. PyKaldi vector and matrix types are tightly
integrated with [NumPy]. They can be seamlessly converted to NumPy arrays and
vice versa without copying the underlying memory buffers. PyKaldi FST types,
including Kaldi-style lattices, are first-class citizens in Python. The API for
the user-facing FST types and operations is almost entirely defined in Python,
mimicking the API exposed by [pywrapfst], the official Python wrapper for
OpenFst.

PyKaldi harnesses the power of [CLIF] to wrap Kaldi and OpenFst C++ libraries
using simple API descriptions. The CPython extension modules generated by CLIF
can be imported in Python to interact with Kaldi and OpenFst. While CLIF is
great for exposing existing C++ APIs in Python, the wrappers do not always
expose a "Pythonic" API that is easy to use from Python. PyKaldi addresses this
by extending the raw CLIF wrappers in Python (and sometimes in C++) to provide
a more "Pythonic" API. The figure below illustrates where PyKaldi fits in the
Kaldi ecosystem.

<p align="center">
<img src="docs/_static/pykaldi-architecture.png" alt="Architecture" width="400"/>
</p>

PyKaldi has a modular design which makes it easy to maintain and extend.
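The "without copying the underlying memory buffers" claim above is the familiar
NumPy view semantics: two array objects over one buffer. A plain-NumPy sketch
of what that means (no PyKaldi types involved):

```python
import numpy as np

# Two array objects sharing one memory buffer, as with a PyKaldi matrix and
# the ndarray obtained from it: writes through one are visible via the other.
buf = np.zeros((2, 3), dtype=np.float32)
view = buf.reshape(6)         # a view: no data is copied
view[0] = 42.0                # write through the view...
assert buf[0, 0] == 42.0      # ...is visible through the original
assert np.shares_memory(buf, view)
```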
Source files are organized in a directory tree that is a replica of the Kaldi
source tree. Each directory defines a subpackage and contains only the wrapper
code written for the associated Kaldi library. The wrapper code consists of:

* CLIF C++ API descriptions defining the types and functions to be wrapped and
  their Python API,

* C++ headers defining the shims for Kaldi code that is not compliant with the
  Google C++ style expected by CLIF,

* Python modules grouping together related extension modules generated with
  CLIF and extending the raw CLIF wrappers to provide a more "Pythonic" API.

You can read more about the design and technical details of PyKaldi in
[our paper][PyKaldi Paper].

## Coverage Status

The following table shows the status of each PyKaldi package (we currently do
not plan to add support for nnet, nnet2 and online) along the following
dimensions:

* __Wrapped?__: If there are enough CLIF files to make the package usable in
  Python.
* __Pythonic?__: If the package API has a "Pythonic" look-and-feel.
* __Documentation?__: If there is documentation beyond what is automatically
  generated by CLIF. A single checkmark indicates that there is not much
  additional documentation (if any). Three checkmarks indicate that the package
  documentation is complete (or near complete).
* __Tests?__: If there are any tests for the package.

| Package    | Wrapped? | Pythonic? | Documentation?             | Tests?   |
| :--------: | :------: | :-------: | :------------------------: | :------: |
| base       | &#10004; | &#10004;  | &#10004; &#10004; &#10004; | &#10004; |
| chain      | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| cudamatrix | &#10004; |           | &#10004;                   | &#10004; |
| decoder    | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| feat       | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| fstext     | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| gmm        | &#10004; | &#10004;  | &#10004; &#10004;          | &#10004; |
| hmm        | &#10004; | &#10004;  | &#10004; &#10004; &#10004; | &#10004; |
| ivector    | &#10004; |           | &#10004;                   |          |
| kws        | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| lat        | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| lm         | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| matrix     | &#10004; | &#10004;  | &#10004; &#10004; &#10004; | &#10004; |
| nnet3      | &#10004; |           | &#10004;                   | &#10004; |
| online2    | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| rnnlm      | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| sgmm2      | &#10004; |           | &#10004;                   |          |
| tfrnnlm    | &#10004; | &#10004;  | &#10004; &#10004; &#10004; |          |
| transform  | &#10004; | &#10004;  | &#10004;                   |          |
| tree       | &#10004; |           | &#10004;                   |          |
| util       | &#10004; | &#10004;  | &#10004; &#10004; &#10004; | &#10004; |


## Installation

If you are using a relatively recent Linux or macOS, such as Ubuntu >= 16.04,
CentOS >= 7 or macOS >= 10.13, you should be able to install PyKaldi without
too much trouble. Otherwise, you will likely need to tweak the installation
scripts.

### Pip / whl packages

You can now download official whl packages from our [github release page](https://github.com/pykaldi/pykaldi/releases).
We have whl packages for Python 3.7, 3.8, ..., 3.11 on Linux and a few
(experimental) builds for Mac M1/M2.

If you decide to use a whl package, you can skip the next sections and head
straight to "[Starting a new project with a pykaldi whl package](#starting-a-new-project-with-a-pykaldi-whl-package)"
to set up your project. Note that you still need to compile a
PyKaldi-compatible version of Kaldi.

### From Source

To build and install PyKaldi from source, follow the steps given below.

#### Step 1: Clone PyKaldi Repository and Create a New Python Environment

```bash
git clone https://github.com/pykaldi/pykaldi.git
cd pykaldi
```

Although it is not required, we recommend installing PyKaldi and all of its
Python dependencies inside a new isolated Python environment. If you do not
want to create a new Python environment, you can skip the rest of this step.

You can use any tool you like for creating a new Python environment.
Here we use `virtualenv`, but you can use another tool like `conda` if you
prefer. Make sure you activate the new Python environment before continuing
with the rest of the installation.

```bash
virtualenv env
source env/bin/activate
```

#### Step 2: Install Dependencies

Running the commands below will install the system packages needed for building
PyKaldi from source.

```bash
# Ubuntu
sudo apt-get install autoconf automake cmake curl g++ git graphviz \
    libatlas3-base libtool make pkg-config subversion unzip wget zlib1g-dev

# macOS
brew install automake cmake git graphviz libtool pkg-config wget gnu-sed openblas subversion
PATH="/opt/homebrew/opt/gnu-sed/libexec/gnubin:$PATH"
```

Running the commands below will install the Python packages needed for building
PyKaldi from source.

```bash
pip install --upgrade pip
pip install --upgrade setuptools
pip install numpy pyparsing
pip install ninja  # not required but strongly recommended
```

In addition to the packages listed above, we also need PyKaldi-compatible
installations of the following software:

* [Google Protobuf](https://github.com/google/protobuf.git), recommended
v3.5.0. Both the C++ library and the Python package must be installed.

* [PyKaldi compatible fork of CLIF](https://github.com/pykaldi/clif). To
streamline PyKaldi development, we made some changes to the CLIF codebase. We
are hoping to upstream these changes over time. **These changes are in the
pykaldi branch:**
```bash
# This command will be automatically run for you in the tools install scripts.
git clone -b pykaldi https://github.com/pykaldi/clif
```

* [PyKaldi compatible fork of Kaldi](https://github.com/pykaldi/kaldi). To
comply with CLIF requirements, we had to make some changes to the Kaldi
codebase. We are hoping to upstream these changes over time. **These changes
are in the pykaldi branch:**
```bash
# This command will be automatically run for you in the tools install scripts.
git clone -b pykaldi https://github.com/pykaldi/kaldi
```

You can use the scripts in the `tools` directory to install or update this
software locally. Make sure you check the output of these scripts. If you do
not see `Done installing {protobuf,CLIF,Kaldi}` printed at the very end, it
means that the installation has failed for some reason.

```bash
cd tools
./check_dependencies.sh  # checks if system dependencies are installed
./install_protobuf.sh    # installs both the C++ library and the Python package
./install_clif.sh        # installs both the C++ library and the Python package
./install_kaldi.sh       # installs the C++ library
cd ..
```

Note: if you are compiling Kaldi on Apple Silicon and `./install_kaldi.sh` gets
stuck right at the beginning while compiling sctk, you might need to remove
`-march=native` from `tools/kaldi/tools/Makefile`, e.g. by commenting it out
like this:

```bash
SCTK_CXFLAGS = -w #-march=native
```

#### Step 3: Install PyKaldi

If Kaldi is installed inside the `tools` directory and all Python dependencies
(numpy, pyparsing, pyclif, protobuf) are installed in the active Python
environment, you can install PyKaldi with the following command.

```bash
python setup.py install
```

Once installed, you can run PyKaldi tests with the following command.

```bash
python setup.py test
```

You can then also create a whl package. The whl package makes it easy to
install pykaldi into a new project environment for your speech project.

```bash
python setup.py bdist_wheel
```

The whl file can then be found in the `dist` folder. The whl filename depends
on the pykaldi version, your Python version and your architecture.
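The `cp` part of the filename is the CPython tag. To see which tag the
interpreter you are building with produces, a few lines of plain stdlib Python
suffice (`cpython_wheel_tag` is a hypothetical helper name, not a PyKaldi or
pip API):

```python
import sys

def cpython_wheel_tag(major=None, minor=None):
    """Return the CPython wheel tag, e.g. 'cp39' for Python 3.9."""
    if major is None:
        major, minor = sys.version_info.major, sys.version_info.minor
    return "cp%d%d" % (major, minor)

print(cpython_wheel_tag())  # e.g. cp39 when run under Python 3.9
```

A whl file installs only into environments whose interpreter matches its tag,
so check this before picking a package from the release page.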
For a Python 3.9 build on x86_64 with pykaldi 0.2.2 it may look like:
`dist/pykaldi-0.2.2-cp39-cp39-linux_x86_64.whl`

## Starting a new project with a pykaldi whl package

Create a new project folder, for example:

```bash
mkdir -p ~/projects/myASR
cd ~/projects/myASR
```

Create and activate a virtual environment with the same Python version as the
whl package, e.g. for Python 3.9:

```bash
virtualenv -p /usr/bin/python3.9 myasr_env
. myasr_env/bin/activate
```

Install numpy and pykaldi into your myASR environment:

```bash
pip3 install numpy
pip3 install pykaldi-0.2.2-cp39-cp39-linux_x86_64.whl
```

Copy `pykaldi/tools/install_kaldi.sh` to your myASR project. Use the
`install_kaldi.sh` script to install a pykaldi-compatible Kaldi version for
your project:

```bash
./install_kaldi.sh
```

Copy `pykaldi/tools/path.sh` to your project. `path.sh` is used to make
pykaldi find the Kaldi libraries and binaries in the kaldi folder. Source
`path.sh` with:

```bash
. path.sh
```

Congratulations, you are ready to use pykaldi in your project!

Note: Anytime you open a new shell, you need to source the project environment
and `path.sh`:

```bash
. myasr_env/bin/activate
. path.sh
```

### Conda

Note: Unfortunately, the PyKaldi Conda packages are outdated. If you would like
to maintain them, please get in touch with us.

To install PyKaldi with CUDA support:

```bash
conda install -c pykaldi pykaldi
```

To install PyKaldi without CUDA support (CPU only):

```bash
conda install -c pykaldi pykaldi-cpu
```

Note that the PyKaldi conda package does not provide Kaldi executables. If you
would like to use Kaldi executables along with PyKaldi, e.g. as part of
read/write specifiers, you need to install Kaldi separately.

### Docker

Note: The Docker instructions below may be outdated.
If you would like to maintain a docker image for PyKaldi, please get in touch with us.\n\nIf you would like to use PyKaldi inside a Docker container, follow the\ninstructions in the `docker` folder.\n\n## FAQ\n\n### How do I prevent PyKaldi install command from exhausting the system memory?\n\nBy default, PyKaldi install command uses all available (logical) processors to\naccelerate the build process. If the size of the system memory is relatively\nsmall compared to the number of processors, the parallel compilation/linking\njobs might end up exhausting the system memory and result in swapping. You can\nlimit the number of parallel jobs used for building PyKaldi as follows:\n\n```bash\nMAKE_NUM_JOBS=2 python setup.py install\n```\n\n### How do I build PyKaldi on Windows?\n\nWe have no idea what is needed to build PyKaldi on Windows. It would probably\nrequire lots of changes to the build system.\n\n### How do I build PyKaldi using a different Kaldi installation?\n\nAt the moment, PyKaldi is not compatible with the upstream Kaldi repository.\nYou need to build it against [our Kaldi fork](https://github.com/pykaldi/kaldi).\n\nIf you already have a compatible Kaldi installation on your system, you do not\nneed to install a new one inside the `pykaldi/tools` directory. Instead, you\ncan simply set the following environment variable before running the PyKaldi\ninstallation command.\n\n```bash\nexport KALDI_DIR=\u003cdirectory where Kaldi is installed, e.g. \"$HOME/tools/kaldi\"\u003e\n```\n\n### How do I build PyKaldi using a different CLIF installation?\n\nAt the moment, PyKaldi is not compatible with the upstream CLIF repository.\nYou need to build it using [our CLIF fork](https://github.com/pykaldi/clif).\n\nIf you already have a compatible CLIF installation on your system, you do not\nneed to install a new one inside the `pykaldi/tools` directory. 
Instead, you\ncan simply set the following environment variables before running the PyKaldi\ninstallation command.\n\n```bash\nexport PYCLIF=<path to pyclif executable, e.g. \"$HOME/anaconda3/envs/clif/bin/pyclif\">\nexport CLIF_MATCHER=<path to clif-matcher executable, e.g. \"$HOME/anaconda3/envs/clif/clang/bin/clif-matcher\">\n```\n\n### How do I update Protobuf, CLIF, or Kaldi used by PyKaldi?\n\nWhile the need for updating Protobuf and CLIF should not come up very often, you\nmight want or need to update the Kaldi installation used for building PyKaldi.\nRerunning the relevant install script in the `tools` directory should update the\nexisting installation. If this does not work, please open an issue.\n\n### How do I build PyKaldi with TensorFlow RNNLM support?\n\nThe PyKaldi `tfrnnlm` package is built automatically along with the rest of PyKaldi\nif the `kaldi-tensorflow-rnnlm` library can be found among the Kaldi libraries. After\nbuilding Kaldi, go to the `KALDI_DIR/src/tfrnnlm/` directory and follow the\ninstructions given in the Makefile. Make sure the symbolic link for the\n`kaldi-tensorflow-rnnlm` library is added to the `KALDI_DIR/src/lib/` directory.\n\n## Projects using PyKaldi\n\n[Shennong](https://github.com/bootphon/shennong) - a toolbox for speech feature extraction (MFCC, PLP, etc.) built on PyKaldi.\n\n[Kaldi model server](https://github.com/uhh-lt/kaldi-model-server) - a threaded Kaldi model server for live decoding. Can directly decode speech from your microphone with an nnet3-compatible model. Example models for English and German are available. Uses the PyKaldi online2 decoder.\n\n[MeetingBot](https://github.com/uhh-lt/MeetingBot) - an example web application for meeting transcription and summarization that uses a PyKaldi/kaldi-model-server backend to display ASR output in the browser.\n\n[Subtitle2go](https://github.com/uhh-lt/subtitle2go) - automatic subtitle generation for any media file. 
Uses PyKaldi for ASR with a batch decoder.\n\nIf you have a cool open-source project that uses PyKaldi and you would like to showcase it here, let us know!\n\n## Citing\n\nIf you use PyKaldi for research, please cite [our paper][PyKaldi Paper] as\nfollows:\n\n```\n@inproceedings{pykaldi,\n  title = {PyKaldi: A Python Wrapper for Kaldi},\n  author = {Dogan Can and Victor R. Martinez and Pavlos Papadopoulos and\n            Shrikanth S. Narayanan},\n  booktitle = {Acoustics, Speech and Signal Processing (ICASSP),\n               2018 IEEE International Conference on},\n  year = {2018},\n  organization = {IEEE}\n}\n```\n\n## Contributing\n\nWe appreciate all contributions! If you find a bug, feel free to open an issue\nor a pull request. If you would like to request or add a new feature, please open\nan issue for discussion.\n\n\n[ASpIRE chain models]: http://kaldi-asr.org/models/m1\n[Build Status]: https://travis-ci.org/pykaldi/pykaldi.svg?branch=master\n[CLIF]: https://github.com/google/clif\n[Kaldi]: https://github.com/kaldi-asr/kaldi\n[Kaldi Recipes]: https://github.com/kaldi-asr/kaldi/tree/master/egs\n[Kaldi Docs]: http://kaldi-asr.org/doc/\n[Kaldi Script File Docs]: http://kaldi-asr.org/doc/io.html#io_sec_scp\n[Kaldi Archive Docs]: http://kaldi-asr.org/doc/io.html#io_sec_archive\n[Kaldi Transition Model Docs]: http://kaldi-asr.org/doc/hmm.html#transition_model\n[Kaldi Neural Network Docs]: http://kaldi-asr.org/doc/dnn3.html\n[Kaldi Decoding Graph Docs]: http://kaldi-asr.org/doc/graph.html\n[Kaldi Symbol Table Docs]: http://kaldi-asr.org/doc/data_prep.html#data_prep_lang_contents\n[Kaldi Data Docs]: http://kaldi-asr.org/doc/data_prep.html#data_prep_data\n[Kaldi Config Docs]: http://kaldi-asr.org/doc/parse_options.html#parse_options_implicit\n[Kaldi I/O Docs]: http://kaldi-asr.org/doc/io.html\n[Lego Chiron]: https://www.lego.com/en-us/themes/technic/bugatti-chiron/build-for-real\n[NumPy]: http://www.numpy.org\n[OpenFst]: http://www.openfst.org\n[PyKaldi Examples]: 
https://github.com/pykaldi/pykaldi/tree/master/examples/\n[PyKaldi ASR Examples]: https://github.com/pykaldi/pykaldi/tree/master/examples/asr/\n[PyKaldi Online ASR Example]: https://github.com/pykaldi/pykaldi/tree/master/examples/scripts/asr/nnet3-online-recognizer.py\n[PyKaldi Docs]: https://pykaldi.github.io\n[`asr`]: https://pykaldi.github.io/api/kaldi.asr.html\n[`alignment`]: https://pykaldi.github.io/api/kaldi.alignment.html\n[`segmentation`]: https://pykaldi.github.io/api/kaldi.segmentation.html\n[`chain`]: https://pykaldi.github.io/api/kaldi.chain.html\n[`cudamatrix`]: https://pykaldi.github.io/api/kaldi.cudamatrix.html\n[`decoder`]: https://pykaldi.github.io/api/kaldi.decoder.html\n[`feat`]: https://pykaldi.github.io/api/kaldi.feat.html\n[`fstext`]: https://pykaldi.github.io/api/kaldi.fstext.html\n[`gmm`]: https://pykaldi.github.io/api/kaldi.gmm.html\n[`hmm`]: https://pykaldi.github.io/api/kaldi.hmm.html\n[`ivector`]: https://pykaldi.github.io/api/kaldi.ivector.html\n[`kws`]: https://pykaldi.github.io/api/kaldi.kws.html\n[`lat`]: https://pykaldi.github.io/api/kaldi.lat.html\n[`lm`]: https://pykaldi.github.io/api/kaldi.lm.html\n[`matrix`]: https://pykaldi.github.io/api/kaldi.matrix.html\n[`nnet3`]: https://pykaldi.github.io/api/kaldi.nnet3.html\n[`online2`]: https://pykaldi.github.io/api/kaldi.online2.html\n[`rnnlm`]: https://pykaldi.github.io/api/kaldi.rnnlm.html\n[`sgmm2`]: https://pykaldi.github.io/api/kaldi.sgmm2.html\n[`tfrnnlm`]: https://pykaldi.github.io/api/kaldi.tfrnnlm.html\n[`transform`]: https://pykaldi.github.io/api/kaldi.transform.html\n[`tree`]: https://pykaldi.github.io/api/kaldi.tree.html\n[`util`]: https://pykaldi.github.io/api/kaldi.util.html\n[`util.table`]: https://pykaldi.github.io/api/kaldi.util.html#module-kaldi.util.table\n[PyKaldi Paper]: https://github.com/pykaldi/pykaldi/blob/master/docs/pykaldi.pdf\n[PyTorch]: https://pytorch.org\n[pywrapfst]: http://www.openfst.org/twiki/bin/view/FST/PythonExtension\n[Travis]: 
https://travis-ci.org/pykaldi/pykaldi