https://github.com/alexeytochin/tf_seq2seq_losses
TensorFlow implementations of losses for sequence to sequence machine learning models
https://github.com/alexeytochin/tf_seq2seq_losses
2nd-derivative artificial-intelligence artificial-neural-networks ctc ctc-loss ctc-loss-implemenetation deep-learning fast-ctc-loss fast-ctc-loss-implementation hessian loss loss-functions python second-derivative seq2seq sequence-recognition sequence-to-sequence tensorflow
Last synced: about 1 year ago
JSON representation
TensorFlow implementations of losses for sequence to sequence machine learning models
- Host: GitHub
- URL: https://github.com/alexeytochin/tf_seq2seq_losses
- Owner: alexeytochin
- License: apache-2.0
- Created: 2021-10-29T19:20:13.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-06-22T12:23:00.000Z (almost 2 years ago)
- Last Synced: 2025-03-25T06:24:38.723Z (about 1 year ago)
- Topics: 2nd-derivative, artificial-intelligence, artificial-neural-networks, ctc, ctc-loss, ctc-loss-implemenetation, deep-learning, fast-ctc-loss, fast-ctc-loss-implementation, hessian, loss, loss-functions, python, second-derivative, seq2seq, sequence-recognition, sequence-to-sequence, tensorflow
- Language: Python
- Homepage:
- Size: 73.2 KB
- Stars: 10
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# tf-seq2seq-losses
Tensorflow implementations for
[Connectionist Temporal Classification](file:///home/alexey/Downloads/Connectionist_temporal_classification_Labelling_un.pdf)
(CTC) loss functions that are fast and support second-order derivatives.
## Installation
```bash
$ pip install tf-seq2seq-losses
```
## Why Use This Package?
### 1. Faster Performance
Official CTC loss implementation,
[`tf.nn.ctc_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss),
is significantly slower.
Our implementation is approximately 30 times faster, as shown by the benchmark results:
| Name | Forward Time (ms) | Gradient Calculation Time (ms) |
|:------------------:|:-----------------:|:------------------------------:|
| `tf.nn.ctc_loss` | 13.2 ± 0.02 | 10.4 ± 3 |
| `classic_ctc_loss` | 0.138 ± 0.006 | 0.28 ± 0.01 |
| `simple_ctc_loss` | 0.0531 ± 0.003 | 0.119 ± 0.004 |
Tested on a single GPU: GeForce GTX 970, Driver Version: 460.91.03, CUDA Version: 11.2. For the experimental setup, see
[`benchmark.py`](tests/performance_test.py)
To reproduce this benchmark, run the following command from the project root directory
(install `pytest` and `pandas` if needed):
```bash
$ pytest -o log_cli=true --log-level=INFO tests/benchmark.py
```
Here, `classic_ctc_loss` is the standard version of CTC loss with token collapsing, e.g., `a_bb_ccc_c -> abcc`.
The `simple_ctc_loss` is a simplified version that removes blanks trivially, e.g., `a_bb_ccc_c -> abbcccc`.
### 2. Supports Second-Order Derivatives
This implementation supports second-order derivatives without using TensorFlow's autogradient.
Instead, it uses a custom approach similar to the one described
[here](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss)
with a complexity of
$O(l^4)$,
where
$l$
is the sequence length. The gradient complexity is
$O(l^2)$.
Example usage:
```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss
batch_size = 2
num_tokens = 3
logit_length = 5
labels = tf.constant([[1, 2, 2, 1], [1, 2, 1, 0]], dtype=tf.int32)
label_length = tf.constant([4, 3], dtype=tf.int32)
logits = tf.zeros(shape=[batch_size, logit_length, num_tokens], dtype=tf.float32)
logit_length = tf.constant([5, 4], dtype=tf.int32)
with tf.GradientTape(persistent=True) as tape1:
tape1.watch([logits])
with tf.GradientTape() as tape2:
tape2.watch([logits])
loss = tf.reduce_sum(classic_ctc_loss(
labels=labels,
logits=logits,
label_length=label_length,
logit_length=logit_length,
blank_index=0,
))
gradient = tape2.gradient(loss, sources=logits)
hessian = tape1.batch_jacobian(gradient, source=logits, experimental_use_pfor=False)
# shape = [2, 5, 3, 5, 3]
```
### 3. Numerical Stability
1. The proposed implementation is more numerically stable,
producing reasonable outputs even for logits of order `1e+10` and `-tf.inf`.
2. If the logit length is too short to predict the label output,
the loss is `tf.inf` for that sample, unlike `tf.nn.ctc_loss`, which might output `707.13184`.
### 4. Pure Python Implementation
This is a pure Python/TensorFlow implementation, eliminating the need to build or compile any C++/CUDA components.
## Usage
The interface is identical to `tensorflow.nn.ctc_loss` with `logits_time_major=False`.
Example:
```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss
batch_size = 1
num_tokens = 3 # = 2 tokens + 1 blank token
logit_length = 5
loss = classic_ctc_loss(
labels=tf.constant([[1, 2, 2, 1]], dtype=tf.int32),
logits=tf.zeros(shape=[batch_size, logit_length, num_tokens], dtype=tf.float32),
label_length=tf.constant([4], dtype=tf.int32),
logit_length=tf.constant([logit_length], dtype=tf.int32),
blank_index=0,
)
```
## Under the Hood
The implementation uses TensorFlow operations such as tf.while_loop and tf.TensorArray.
The main computational bottleneck is the iteration over the logit length to calculate α and β
(as described in the original
[CTC paper](file:///home/alexey/Downloads/Connectionist_temporal_classification_Labelling_un.pdf)).
The expected gradient GPU calculation time is linear with respect to the logit length.
## Known Issues
### 1. Warning:
> AutoGraph could not transform and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10
(on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Observed with TensorFlow version 2.4.1.
This warning does not affect performance and is caused by the use of Union in type annotations.
### 2. UnimplementedError:
Using `tf.jacobian` and `tf.batch_jacobian` for the second derivative of classic_ctc_loss with
`experimental_use_pfor=False` in `tf.GradientTape` may cause an unexpected `UnimplementedError`
in TensorFlow version 2.4.1 or later.
This can be avoided by setting `experimental_use_pfor=True`
or by using `ClassicCtcLossData.hessian` directly without `tf.GradientTape`.
Feel free to reach out if you have any questions or need further clarification.