# CTC Decoding Algorithms

**Update 2021: installable Python package**

Python implementation of some common **Connectionist Temporal Classification (CTC) decoding algorithms**.
A minimalistic **language model** is provided.

## Installation

* Go to the root level of the repository
* Execute `pip install .`
* Go to `tests/` and execute `pytest` to verify that the installation worked


## Usage

### Basic usage

Here is a minimalistic executable example:

````python
import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
chars = 'ab'

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars)}"')
````

The output `mat` (a numpy array, with softmax already applied) of the CTC-trained neural network is expected to have shape TxC
and is passed as the first argument to the decoders.
T is the number of time-steps, and C is the number of characters (the CTC-blank is the last element).
The characters that the neural network can predict are passed as the `chars` string to the decoder.
Decoders return the decoded string.  
Running the code outputs:

````
Best path: ""
Beam search: "a"
````

For more examples of how to use the decoders,
have a look at the scripts in the `tests/` folder.



### Language model and BK-tree

Beam search can optionally integrate a character-level language model.
Text statistics (bigrams) are used by beam search to improve reading accuracy.

````python
from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', chars)

# and use it in the beam search decoder
res = beam_search(mat, chars, lm=lm)
````

The lexicon search decoder computes a first approximation with best path decoding.
Then, it uses a BK-tree to retrieve similar words, scores them, and finally returns the best-scoring word.
The BK-tree is created by providing a list of dictionary words.
A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary
words.

````python
from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, chars, bk_tree, tolerance=2)
````

### Usage with deep learning frameworks

Some notes:
* No adapter for TensorFlow or PyTorch is provided
* Apply softmax already in the model
* Convert the output to a numpy array
* Usually, the output of an RNN layer `rnn_output` has shape TxBxC, with B the batch dimension
  * Decoders work on single batch elements of shape TxC
  * Therefore, iterate over all batch elements and apply the decoder to each of them separately
  * Example: extract the matrix of batch element 0 with `mat = rnn_output[:, 0, :]`
* The CTC-blank is expected to be the last element along the character dimension
  * TensorFlow has the CTC-blank as the last element, so nothing to do here
  * PyTorch, however, has the CTC-blank as the first element by default, so you have to move it to the end, or change the default setting

## List of provided decoders

Recommended decoders:
* `best_path`: best path (or greedy) decoder, the fastest of all algorithms; however, other decoders often perform better
* `beam_search`: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter
* `lexicon_search`: lexicon search decoder, returns the best-scoring word from a dictionary

Other decoders, from my experience not really suited for practical purposes,
but which might be used for experiments or research:
* `prefix_search`: prefix search decoder
* `token_passing`: token passing algorithm
* Best path decoder implementation in OpenCL (see the `extras/` folder)

[This paper](./doc/comparison.pdf) gives suggestions on when to use best path decoding, beam search decoding and token passing.


## Documentation of test cases and data

* Documentation of [test cases](./tests/README.md)
* Documentation
of the [data](./data/README.md)


## References

* [Graves - Supervised sequence labelling with recurrent neural networks](https://www.cs.toronto.edu/~graves/preprint.pdf)
* [Hwang - Character-level incremental speech recognition with recurrent neural networks](https://arxiv.org/pdf/1601.06581.pdf)
* [Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf)
* [Marti - The IAM-database: an English sentence database for offline handwriting recognition](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database)
* [Beam Search Decoding in CTC-trained Neural Networks](https://towardsdatascience.com/5a889a3d85a7)
* [An Intuitive Explanation of Connectionist Temporal Classification](https://towardsdatascience.com/3797e43a86c)
* [Scheidl - Comparison of Connectionist Temporal Classification Decoding Algorithms](./doc/comparison.pdf)
* [Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm](https://repositum.tuwien.ac.at/obvutwoa/download/pdf/2774578)
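
## Appendix: best path decoding in plain numpy

To illustrate what the `best_path` decoder computes, here is a minimal plain-numpy sketch of greedy best-path decoding: take the argmax character per time-step, collapse consecutive repeats, then drop the CTC-blank (which this package assumes is the last index). This is an illustrative stand-in written for this appendix, not the package's actual implementation.

````python
import numpy as np

def greedy_best_path(mat: np.ndarray, chars: str) -> str:
    """Greedy best-path decoding of a TxC matrix (softmax applied).

    Steps: argmax per time-step, collapse consecutive repeats,
    remove the CTC-blank (assumed to be the last index).
    """
    blank = len(chars)                 # blank index = last element along C
    best = np.argmax(mat, axis=1)      # best label per time-step, shape (T,)
    decoded = []
    prev = -1                          # sentinel: no previous label yet
    for idx in best:
        # emit a character only if it differs from the previous label
        # (collapses repeats) and is not the blank
        if idx != prev and idx != blank:
            decoded.append(chars[idx])
        prev = idx
    return ''.join(decoded)

# same toy matrix as in the basic-usage example above
mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
print(f'Best path: "{greedy_best_path(mat, "ab")}"')  # prints: Best path: ""
````

Both time-steps argmax to the blank, so the decoded string is empty, matching the `best_path` output shown in the basic-usage example. Note that repeats are collapsed *before* blanks are removed; a blank between two identical labels therefore still separates them (e.g. "a", blank, "a" decodes to "aa", while "a", "a" decodes to "a").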