https://github.com/alexeytochin/tf_seq2seq_losses

TensorFlow implementations of losses for sequence to sequence machine learning models

2nd-derivative artificial-intelligence artificial-neural-networks ctc ctc-loss ctc-loss-implemenetation deep-learning fast-ctc-loss fast-ctc-loss-implementation hessian loss loss-functions python second-derivative seq2seq sequence-recognition sequence-to-sequence tensorflow

Host: GitHub
URL: https://github.com/alexeytochin/tf_seq2seq_losses
Owner: alexeytochin
License: apache-2.0
Created: 2021-10-29T19:20:13.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-06-22T12:23:00.000Z (about 2 years ago)
Last Synced: 2025-03-25T06:24:38.723Z (over 1 year ago)
Language: Python
Homepage:
Size: 73.2 KB
Stars: 10
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# tf-seq2seq-losses

TensorFlow implementations of Connectionist Temporal Classification (CTC) loss functions that are fast and support second-order derivatives. CTC was introduced by Graves et al. (2006) in "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks".

## Installation

```bash
$ pip install tf-seq2seq-losses
```

## Why Use This Package?

### 1. Faster Performance

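The official CTC loss implementation, [`tf.nn.ctc_loss`](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss), is significantly slower. Our implementation is at least 30 times faster, as shown by the benchmark results (for `classic_ctc_loss`, roughly 95 times faster in the forward pass and 37 times faster in gradient calculation):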

|        Name        | Forward Time (ms) | Gradient Calculation Time (ms) |
|:------------------:|:-----------------:|:------------------------------:|
|  `tf.nn.ctc_loss`  |    13.2 ± 0.02    |            10.4 ± 3            |
| `classic_ctc_loss` |   0.138 ± 0.006   |          0.28 ± 0.01           |
| `simple_ctc_loss`  |  0.0531 ± 0.003   |         0.119 ± 0.004          |

Tested on a single GPU: GeForce GTX 970, Driver Version: 460.91.03, CUDA Version: 11.2. For the experimental setup, see [`benchmark.py`](tests/performance_test.py).

To reproduce this benchmark, run the following command from the project root directory (install `pytest` and `pandas` if needed):

```bash
$ pytest -o log_cli=true --log-level=INFO tests/benchmark.py
```

Here, `classic_ctc_loss` is the standard version of CTC loss with token collapsing, e.g., `a_bb_ccc_c -> abcc`. 

The `simple_ctc_loss` is a simplified version that removes blanks trivially, e.g., `a_bb_ccc_c -> abbcccc`.
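
The two collapsing rules can be illustrated with a short pure-Python sketch. The helper names `classic_collapse` and `simple_collapse` are hypothetical and are not part of this package; they only mirror the string examples above, with `_` standing for the blank token.

```python
from itertools import groupby


def classic_collapse(path, blank="_"):
    """Merge adjacent repeated tokens, then drop blanks (standard CTC rule)."""
    return "".join(token for token, _ in groupby(path) if token != blank)


def simple_collapse(path, blank="_"):
    """Drop blanks only; repeated tokens are kept as they are."""
    return path.replace(blank, "")


print(classic_collapse("a_bb_ccc_c"))  # abcc
print(simple_collapse("a_bb_ccc_c"))   # abbcccc
```

Under the classic rule a blank is the only way to emit the same token twice in a row, which is why `ccc_c` becomes `cc` rather than `c`.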

### 2. Supports Second-Order Derivatives

This implementation supports second-order derivatives without using TensorFlow's automatic differentiation. Instead, it uses a custom approach similar to the one described [here](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss), with a complexity of $O(l^4)$, where $l$ is the sequence length. The gradient complexity is $O(l^2)$.

Example usage:

```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss

batch_size = 2
num_tokens = 3  # 2 tokens + 1 blank token
max_logit_length = 5

# Labels are padded to a common length; label_length gives the valid prefix.
labels = tf.constant([[1, 2, 2, 1], [1, 2, 1, 0]], dtype=tf.int32)
label_length = tf.constant([4, 3], dtype=tf.int32)
logits = tf.zeros(shape=[batch_size, max_logit_length, num_tokens], dtype=tf.float32)
logit_length = tf.constant([5, 4], dtype=tf.int32)

# The outer tape differentiates the gradient produced by the inner tape.
with tf.GradientTape(persistent=True) as tape1:
    tape1.watch([logits])
    with tf.GradientTape() as tape2:
        tape2.watch([logits])
        loss = tf.reduce_sum(classic_ctc_loss(
            labels=labels,
            logits=logits,
            label_length=label_length,
            logit_length=logit_length,
            blank_index=0,
        ))
    gradient = tape2.gradient(loss, sources=logits)
hessian = tape1.batch_jacobian(gradient, source=logits, experimental_use_pfor=False)
# hessian.shape == [2, 5, 3, 5, 3]
```

### 3. Numerical Stability

1. The proposed implementation is more numerically stable than `tf.nn.ctc_loss`, producing reasonable outputs even for logits of order `1e+10` and for `-inf`.
2. If the logit length is too short to predict the label output, the loss is `inf` for that sample, unlike `tf.nn.ctc_loss`, which might output `707.13184`.
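
To see when a logit sequence is "too short", recall that under the classic collapsing rule every pair of adjacent equal label tokens needs a blank frame between them. The sketch below counts the minimum number of frames a label requires; `min_logit_length` is a hypothetical helper written for illustration and is not part of this package.

```python
def min_logit_length(label):
    """Fewest frames that can emit `label` under classic CTC collapsing.

    Each token needs one frame, and each adjacent repeated pair needs
    one extra blank frame to separate the two copies.
    """
    repeats = sum(1 for prev, cur in zip(label, label[1:]) if prev == cur)
    return len(label) + repeats


print(min_logit_length([1, 2, 2, 1]))  # 5
print(min_logit_length([1, 2, 1]))     # 3
```

A sample whose logit length is below this bound has no valid alignment, so its likelihood is zero and an infinite loss is the mathematically consistent result.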

### 4. Pure Python Implementation

This is a pure Python/TensorFlow implementation, eliminating the need to build or compile any C++/CUDA components.

## Usage

The interface is identical to `tensorflow.nn.ctc_loss` with `logits_time_major=False`.

Example:

```python
import tensorflow as tf
from tf_seq2seq_losses import classic_ctc_loss

batch_size = 1
num_tokens = 3  # = 2 tokens + 1 blank token
logit_length = 5

loss = classic_ctc_loss(
    labels=tf.constant([[1, 2, 2, 1]], dtype=tf.int32),
    logits=tf.zeros(shape=[batch_size, logit_length, num_tokens], dtype=tf.float32),
    label_length=tf.constant([4], dtype=tf.int32),
    logit_length=tf.constant([logit_length], dtype=tf.int32),
    blank_index=0,
)
```

## Under the Hood

The implementation uses TensorFlow operations such as `tf.while_loop` and `tf.TensorArray`. The main computational bottleneck is the iteration over the logit length to calculate α and β (as described in the original CTC paper by Graves et al., 2006). The expected gradient GPU calculation time is linear with respect to the logit length.
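
For readers unfamiliar with the α recursion, here is a minimal pure-Python sketch of the textbook forward pass for a single sample, computed in log space. It is not the code used by this package (which runs batched TensorFlow operations inside `tf.while_loop`); the names `log_sum_exp` and `ctc_forward_loss` are hypothetical and chosen for illustration only.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(values):
    """Stable log(sum(exp(v))); returns -inf when every term is -inf."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_forward_loss(log_probs, label, blank=0):
    """Negative log-likelihood of `label` for one sample (classic CTC).

    log_probs: list of T frames, each a list of per-token log-probabilities.
    label: list of token ids without blanks.
    """
    # Extended label with blanks interleaved: [_, l1, _, l2, ..., _].
    extended = [blank]
    for token in label:
        extended += [token, blank]

    # alpha[s]: log-probability of all partial alignments ending in state s.
    alpha = [NEG_INF] * len(extended)
    alpha[0] = log_probs[0][extended[0]]
    if len(extended) > 1:
        alpha[1] = log_probs[0][extended[1]]

    # One sequential step per frame: the loop over the logit length.
    for frame in log_probs[1:]:
        previous, alpha = alpha, []
        for s, token in enumerate(extended):
            incoming = [previous[s]]
            if s >= 1:
                incoming.append(previous[s - 1])
            # Skipping a blank is allowed only between two different tokens.
            if s >= 2 and token != blank and token != extended[s - 2]:
                incoming.append(previous[s - 2])
            alpha.append(log_sum_exp(incoming) + frame[token])

    # Valid alignments end in the last token or in the trailing blank.
    return -log_sum_exp(alpha[-2:])


uniform = [[math.log(1.0 / 3.0)] * 3 for _ in range(5)]
print(ctc_forward_loss(uniform, [1, 2, 2, 1]))      # about 5.493, i.e. 5 * ln(3)
print(ctc_forward_loss(uniform[:4], [1, 2, 2, 1]))  # inf
```

With five uniform frames exactly one alignment (`1 2 _ 2 1`) produces the label, so the loss is `5 * ln(3)`; with four frames no alignment exists and the loss is `inf`. Each frame depends on the previous one, which is why the iteration over the logit length cannot be parallelized and dominates the running time.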

## Known Issues

### 1. Warning:

> AutoGraph could not transform  and will run it as-is.
> Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10
> (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.

Observed with TensorFlow version 2.4.1. This warning does not affect performance and is caused by the use of `Union` in type annotations.

### 2. UnimplementedError:

Calling `tf.GradientTape.jacobian` or `tf.GradientTape.batch_jacobian` with `experimental_use_pfor=False` to compute the second derivative of `classic_ctc_loss` may raise an unexpected `UnimplementedError` in TensorFlow version 2.4.1 or later. This can be avoided by setting `experimental_use_pfor=True` or by using `ClassicCtcLossData.hessian` directly without `tf.GradientTape`.

Feel free to reach out if you have any questions or need further clarification.
