{"id":15641831,"url":"https://github.com/csukuangfj/optimized_transducer","last_synced_at":"2025-04-23T04:49:26.906Z","repository":{"id":38145673,"uuid":"441114236","full_name":"csukuangfj/optimized_transducer","owner":"csukuangfj","description":"Memory efficient transducer loss computation","archived":false,"fork":false,"pushed_at":"2022-06-10T03:27:42.000Z","size":137,"stargazers_count":68,"open_issues_count":4,"forks_count":12,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-29T22:13:52.311Z","etag":null,"topics":["memory-efficient","rnn-t","transducer"],"latest_commit_sha":null,"homepage":"","language":"CMake","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/csukuangfj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-23T08:31:51.000Z","updated_at":"2024-07-04T04:55:35.000Z","dependencies_parsed_at":"2022-07-08T01:41:22.128Z","dependency_job_id":null,"html_url":"https://github.com/csukuangfj/optimized_transducer","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csukuangfj%2Foptimized_transducer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csukuangfj%2Foptimized_transducer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csukuangfj%2Foptimized_transducer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csukuangfj%2Foptimized_transducer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/csukuangfj","download_url":"https://codeload.github.com/csuk
uangfj/optimized_transducer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250372943,"owners_count":21419722,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["memory-efficient","rnn-t","transducer"],"created_at":"2024-10-03T11:46:11.485Z","updated_at":"2025-04-23T04:49:26.891Z","avatar_url":"https://github.com/csukuangfj.png","language":"CMake","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Introduction\n\nThis project implements the optimization techniques proposed in\n[Improving RNN Transducer Modeling for End-to-End Speech Recognition](https://arxiv.org/abs/1909.12415)\nto reduce the memory consumption for computing transducer loss.\n\n**HINT**: You can find ASR training code using this repo\nin \u003chttps://github.com/k2-fsa/icefall\u003e. 
You can also find\ndecoding code in [icefall](https://github.com/k2-fsa/icefall).\n\n### How does it differ from the RNN-T loss from torchaudio\n\n\nIt produces the same output as [torchaudio](https://github.com/pytorch/audio)\nfor the same input, so `optimized_transducer` should be equivalent to\n[torchaudio.functional.rnnt_loss()](https://github.com/pytorch/audio/blob/main/torchaudio/functional/functional.py#L1546).\n\nThis project is more memory efficient (see \u003chttps://github.com/csukuangfj/transducer-loss-benchmarking\u003e\nfor benchmark results).\n\nAlso, `torchaudio` accepts only output from `nn.Linear`, but\nwe also support output from `log-softmax` (set the option\n`from_log_softmax` to `True` in this case).\n\nIt also supports a **modified** version of transducer. See [below](#modified-transducer) for what\n**modified transducer** means.\n\n### How does it differ from [warp-transducer](https://github.com/HawkAaron/warp-transducer)\n\nIt borrows the methods of computing alpha and beta from `warp-transducer`. Therefore,\n`optimized_transducer` produces the same `alpha` and `beta` as `warp-transducer`\nfor the same input.\n\n\nHowever, `warp-transducer` produces different gradients for CPU and CUDA\nwhen using the same input. See \u003chttps://github.com/HawkAaron/warp-transducer/issues/93\u003e.\nI also created a [Colab notebook](https://colab.research.google.com/drive/1vMkH8LmiCCOiCo4KTTEcv-NU8_OGn0ie?usp=sharing)\nto reproduce that issue.\n\nThis project produces consistent gradients on CPU and CUDA for the same input, just as\n`torchaudio` does. (We borrow the gradient computation formula from `torchaudio`.)\n\n`optimized_transducer` uses less memory than `warp-transducer`\n(see \u003chttps://github.com/csukuangfj/transducer-loss-benchmarking\u003e for benchmark results).\n\nIt also supports a **modified** version of transducer. 
See [below](#modified-transducer) for what\n**modified transducer** means.\n\n### Modified Transducer\n\nIn the **modified transducer**, we limit the maximum number of symbols per frame to 1. The following\nfigure compares the formulas for the forward and backward procedures of the standard transducer\nand the modified transducer.\n\n\u003cimg src=\"/pic.svg\" width=\"514\" height=\"507.5\" /\u003e\n\n**Note**: The modified transducer was proposed independently by [@danpovey](https://github.com/danpovey).\nWe were later informed that the idea already existed in\n[Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for\nSequence to Sequence Mapping](https://www.isca-speech.org/archive_v0/Interspeech_2017/pdfs/1705.PDF).\n\n\n## Installation\n\nYou can install it via `pip`:\n\n```\npip install optimized_transducer\n```\n\nTo check that `optimized_transducer` was installed successfully, please run\n\n```\npython3 -c \"import optimized_transducer; print(optimized_transducer.__version__)\"\n```\n\nwhich should print the version of the installed `optimized_transducer`, e.g., `1.2`.\n\n### Installation FAQ\n\n### What operating systems are supported ?\n\nIt has been tested on Ubuntu 18.04. It should also work on macOS and other Unix-like systems.\nIt may work on Windows, though it has not been tested.\n\n### How to display installation log ?\n\nUse\n\n```\npip install --verbose optimized_transducer\n```\n\n### How to reduce installation time ?\n\nUse\n\n```\nexport OT_MAKE_ARGS=\"-j\"\npip install --verbose optimized_transducer\n```\n\nIt will pass `-j` to `make`.\n\n### Which version of PyTorch is supported ?\n\nIt has been tested on PyTorch \u003e= 1.5.0. 
It may work on PyTorch \u003c 1.5.0.\n\n\n### How to install a CPU version of `optimized_transducer` ?\n\nUse\n\n```\nexport OT_CMAKE_ARGS=\"-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF\"\nexport OT_MAKE_ARGS=\"-j\"\npip install --verbose optimized_transducer\n```\n\nIt will pass `-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF` to `cmake`.\n\n### What Python versions are supported ?\n\nPython \u003e= 3.6 is known to work. It may work with Python 2.7, though it has not been tested.\n\n### Where to get help if I have problems with the installation ?\n\nPlease file an issue at \u003chttps://github.com/csukuangfj/optimized_transducer/issues\u003e\nand describe your problem there.\n\n## Usage\n\n`optimized_transducer` expects the output of the joint network to have shape\n`(sum_all_TU, V)`, **NOT** `(N, T, U, V)`. It is a concatenation\nof 2-D tensors: `(T_1 * U_1, V)`, `(T_2 * U_2, V)`, ..., `(T_N * U_N, V)`.\n**Note**: `(T_1 * U_1, V)` is just a reshape of a 3-D tensor `(T_1, U_1, V)`.\n\n\nSuppose your original joint network looks somewhat like the following:\n\n```python3\nencoder_out = torch.rand(N, T, D) # from the encoder\ndecoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network\n\nencoder_out = encoder_out.unsqueeze(2) # Now encoder out is (N, T, 1, D)\ndecoder_out = decoder_out.unsqueeze(1) # Now decoder out is (N, 1, U, D)\n\nx = encoder_out + decoder_out # x is of shape (N, T, U, D)\nactivation = torch.tanh(x)\n\nlogits = linear(activation) # linear is an instance of `nn.Linear`.\n\nloss = torchaudio.functional.rnnt_loss(\n    logits=logits,\n    targets=targets,\n    logit_lengths=logit_lengths,\n    target_lengths=target_lengths,\n    blank=blank_id,\n    reduction=\"mean\",\n)\n```\n\nYou need to change it to the following:\n\n```python3\nencoder_out = torch.rand(N, T, D) # from the encoder\ndecoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network\n\nencoder_out_list = [encoder_out[i, 
:logit_lengths[i], :] for i in range(N)]\ndecoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]\n\nx = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]\nx = [p.reshape(-1, D) for p in x]\nx = torch.cat(x)\n\nactivation = torch.tanh(x)\nlogits = linear(activation) # linear is an instance of `nn.Linear`.\n\nloss = optimized_transducer.transducer_loss(\n    logits=logits,\n    targets=targets,\n    logit_lengths=logit_lengths,\n    target_lengths=target_lengths,\n    blank=blank_id,\n    reduction=\"mean\",\n    from_log_softmax=False,\n)\n```\n\n**Caution**: We used `from_log_softmax=False` in the above example since `logits`\nis the output of `nn.Linear`.\n\n**Hint**: If `logits` is the output of `log-softmax`, you should use `from_log_softmax=True`.\n\nIn most cases, you should pass the output of `nn.Linear` to compute the loss, i.e.,\nuse `from_log_softmax=False`, to save memory.\n\nIf you want to do some operations on the output of `log-softmax` before feeding it\nto `optimized_transducer.transducer_loss()`, `from_log_softmax=True` is helpful in\nthis case. 
But be aware that this will increase memory usage.\n\n\nTo use the **modified** transducer, pass an additional argument `one_sym_per_frame=True`\nto `optimized_transducer.transducer_loss()`.\n\nFor more usage examples, see\n\n  - \u003chttps://github.com/csukuangfj/optimized_transducer/blob/master/optimized_transducer/python/optimized_transducer/transducer_loss.py\u003e\n  - \u003chttps://github.com/csukuangfj/optimized_transducer/blob/master/optimized_transducer/python/tests/test_cuda.py\u003e\n  - \u003chttps://github.com/csukuangfj/optimized_transducer/blob/master/optimized_transducer/python/tests/test_compute_transducer_loss.py\u003e\n  - \u003chttps://github.com/csukuangfj/optimized_transducer/blob/master/optimized_transducer/python/tests/test_max_symbol_per_frame.py\u003e\n\n## For developers\n\nAs a developer, you don't need to use `pip install optimized_transducer`.\nTo make development easier, you can use\n\n```\ngit clone https://github.com/csukuangfj/optimized_transducer.git\ncd optimized_transducer\nmkdir build\ncd build\ncmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..\nmake -j\nexport PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH\n```\n\nI usually create a file `path.sh` inside the `build` directory, containing\n\n```\nexport PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH\n```\n\nso all you need to do is\n\n```\ncd optimized_transducer/build\nsource path.sh\n\n# Then you are ready to run Python tests (paths are relative to the build directory)\npython3 ../optimized_transducer/python/tests/test_compute_transducer_loss.py\n\n# You can also use \"import optimized_transducer\" in your Python projects\n```\n\nTo run all Python tests, use\n\n```\ncd optimized_transducer/build\nctest --output-on-failure\n```\n\nAlternatively, you can run all available tests with `make`:\n\n```\nmake -j 
test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcsukuangfj%2Foptimized_transducer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcsukuangfj%2Foptimized_transducer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcsukuangfj%2Foptimized_transducer/lists"}