https://github.com/prouast/ctc-beam-search-op
Custom tensorflow op for ctc beam search that keeps track of best alignment for each beam
https://github.com/prouast/ctc-beam-search-op
ctc ctc-decode tensorflow
Last synced: about 2 months ago
JSON representation
Custom tensorflow op for ctc beam search that keeps track of best alignment for each beam
- Host: GitHub
- URL: https://github.com/prouast/ctc-beam-search-op
- Owner: prouast
- License: apache-2.0
- Created: 2020-02-20T00:57:02.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-10-19T00:30:00.000Z (over 5 years ago)
- Last Synced: 2025-08-23T19:26:25.658Z (10 months ago)
- Topics: ctc, ctc-decode, tensorflow
- Language: C++
- Homepage:
- Size: 43 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# TensorFlow CTC Beam Search Decoder that keeps track of best alignment for each beam
This is a custom version of the [`tf.nn.ctc_beam_search_decoder`](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_beam_search_decoder).
During the beam search, it keeps track of the most probable alignment for each beam, which includes blank labels.
This makes it possible to estimate at which point in the sequence a label might be situated.
This op was implemented for CPU only as a c++ kernel following the [guide](https://github.com/tensorflow/custom-op) provided by the TensorFlow authors.
Refer to the guide for instructions on how to build and test the Op using Docker for Windows or Linux.
### Args
- `inputs`: 3-D `float` `Tensor`, size `[max_time, batch_size, num_classes]`. Input logits.
- `sequence_length`: 1-D `int32` vector containing sequence lengths, having size `[batch_size]`.
- `beam_width`: An int scalar >= 1 (beam search beam width).
- `top_paths`: An int scalar >= 1, <= beam_width (controls output size).
- `merge_repeated`: A boolean indicating whether repeated sequence elements are merged in regular output.
- `blank_index`: An int < `num_classes` indicating which index in the inputs corresponds to blank labels.
- `blank_label`: An int indicating the label that should be used for blanks in alignment.
### Outputs
A tuple `(decoded, alignment, log_probability)` where
- `decoded`: A list of length `top_paths`, where `decoded[j]` is a `SparseTensor` containing the decoded outputs:
* `decoded[j].indices`: Indices matrix `[total_decoded_outputs[j], 2]`; The rows store: `[batch, time]`.
* `decoded[j].values`: Values vector, size `[total_decoded_outputs[j]]`. The vector stores the decoded classes for beam `j`.
* `decoded[j].shape`: Shape vector, size `(2)`. The shape values are: `[batch_size, max_decoded_length[j]]`.
- `alignment`: A list of length `top_paths`, where `decoded[j]` is a `SparseTensor` containing the alignments of decoded outputs:
* `alignment[j].indices`: Indices matrix `[sequence_length[j], 2]`; The rows store: `[batch, time]`.
* `alignment[j].values`: Values vector, size `[sequence_length[j]]`. The vector stores the alignment for beam `j`.
* `alignment[j].shape`: Shape vector, size `(2)`. The shape values are: `[batch_size, sequence_length[j]]`.
- `log_probability`: A float matrix `[batch_size, top_paths]` containing sequence log-probabilities.