https://github.com/yoctol/uttut

Utterance utilities.
bert chatbot natural-language-processing


Host: GitHub
URL: https://github.com/yoctol/uttut
Owner: Yoctol
License: mit
Created: 2018-03-26T11:19:39.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2021-04-20T21:58:09.000Z (about 5 years ago)
Last Synced: 2025-02-10T14:03:45.552Z (over 1 year ago)
Topics: bert, chatbot, natural-language-processing
Language: Python
Homepage:
Size: 1.12 MB
Stars: 0
Watchers: 10
Forks: 0
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# UTTUT

[![travis][travis-image]][travis-url]

[![codecov][codecov-image]][codecov-url]

[![pypi][pypi-image]][pypi-url]

![release][release-image]

[travis-image]: https://img.shields.io/travis/Yoctol/uttut.svg?style=flat

[travis-url]: https://travis-ci.org/Yoctol/uttut

[pypi-image]: https://img.shields.io/pypi/v/uttut.svg?style=flat

[pypi-url]: https://pypi.python.org/pypi/uttut

[codecov-image]: https://codecov.io/gh/Yoctol/uttut/branch/master/graph/badge.svg

[codecov-url]: https://codecov.io/gh/Yoctol/uttut

[release-image]: https://img.shields.io/github/release/Yoctol/uttut.svg

UTTerance UTilities for dialogue systems. This package provides general utilities for processing chatbot utterance data.

# BERT Pipe

To create a pipe for BERT preprocessing, please take a look at [BERT](https://github.com/Yoctol/uttut/tree/master/uttut/pipeline/bert).

# Installation

```
$ pip install uttut
```

# Usage

Let's create a Pipe to preprocess a Datum that contains an English utterance.

## Build a Pipe

```python
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe()
>>> p.add('IntTokenWithSpace')
>>> p.add('FloatTokenWithSpace')
>>> p.add('MergeWhiteSpaceCharacters')
>>> p.add('StripWhiteSpaceCharacters')
>>> p.add('EngTokenizer')  # word-level (ref: BERT)
>>> p.add('AddSosEos', checkpoint='result_of_add_sos_eos')
>>> p.add('Pad', {'maxlen': 5})
>>> p.add(
    'Token2Index',
    {
        'token2index': {
            '<sos>': 0, '<eos>': 1,  # for AddSosEos
            '<unk>': 2, '<pad>': 3,  # for Pad
            '_int_': 4,  # for IntTokenWithSpace
            '_float_': 5,  # for FloatTokenWithSpace
            'I': 6,
            'apples': 7,
        },
    },
)
```
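To make the idea of chained steps and named checkpoints concrete, here is a minimal pure-Python sketch. It is not uttut's implementation: `MiniPipe`, the regex tokenizer, the special-token names and the `MAXLEN` value of 7 are all assumptions chosen for illustration.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


class MiniPipe:
    """Hypothetical stand-in for a step pipeline; NOT uttut's Pipe."""

    def __init__(self) -> None:
        self._steps: List[Tuple[Callable, Optional[str]]] = []

    def add(self, step: Callable, checkpoint: Optional[str] = None) -> None:
        # Register a step; optionally name a checkpoint for its output.
        self._steps.append((step, checkpoint))

    def transform_sequence(self, sequence):
        # Run every step in order and record the named intermediate results.
        checkpoints: Dict[str, object] = {}
        for step, checkpoint in self._steps:
            sequence = step(sequence)
            if checkpoint is not None:
                checkpoints[checkpoint] = sequence
        return sequence, checkpoints


# Assumed vocabulary and padding length for this sketch only.
VOCAB = {'<sos>': 0, '<eos>': 1, '<unk>': 2, '<pad>': 3, 'I': 6, 'apples': 7}
MAXLEN = 7

mini = MiniPipe()
mini.add(lambda text: re.findall(r'[A-Za-z]+', text))  # crude word tokenizer
mini.add(lambda toks: ['<sos>'] + toks + ['<eos>'], checkpoint='after_sos_eos')
mini.add(lambda toks: (toks + ['<pad>'] * MAXLEN)[:MAXLEN])  # pad or truncate
mini.add(lambda toks: [VOCAB.get(t, VOCAB['<unk>']) for t in toks])

indices, checkpoints = mini.transform_sequence('I like apples.')
print(indices)                       # [0, 6, 2, 7, 1, 3, 3]
print(checkpoints['after_sos_eos'])  # ['<sos>', 'I', 'like', 'apples', '<eos>']
```

Each step returns a new list, so a stored checkpoint is not altered by later steps.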

## Transform

```python
>>> from uttut.elements import Datum, Entity, Intent
>>> datum = Datum(
    utterance='I like apples.',
    intents=[Intent(label=1), Intent(label=2)],
    entities=[Entity(start=7, end=13, value='apples', label=7)],
)
>>> output_indices, intent_labels, entity_labels, label_aligner, intermediate = p.transform(datum)
>>> output_indices
[0, 6, 2, 7, 1, 3, 3]
>>> intent_labels
[1, 2]
>>> entity_labels
[0, 0, 0, 7, 0, 0, 0]
# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
# label_aligner
>>> label_aligner.inverse_transform(entity_labels)
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
```
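The relationship between the character-level labels (one per character of the raw utterance) and the token-level labels (one per output token) can be sketched in plain Python. This is only an illustration of the idea, not how uttut's `label_aligner` is implemented. The `spans` list and both helper functions are hypothetical, and inserted tokens (start/end markers, padding) are assumed to have no span.

```python
from typing import List, Optional, Tuple

# (start, end) offsets in the raw utterance; None for tokens the pipeline inserted.
Span = Optional[Tuple[int, int]]


def char_to_token_labels(char_labels: List[int], spans: List[Span],
                         default: int = 0) -> List[int]:
    """Give each token the label of its first character (assumed rule)."""
    return [default if span is None else char_labels[span[0]] for span in spans]


def token_to_char_labels(token_labels: List[int], spans: List[Span],
                         n_chars: int, default: int = 0) -> List[int]:
    """Spread each token's label back over the characters it covers."""
    char_labels = [default] * n_chars
    for label, span in zip(token_labels, spans):
        if span is not None:
            start, end = span
            char_labels[start:end] = [label] * (end - start)
    return char_labels


utterance = 'I like apples.'
# Assumed spans for: start marker, 'I', 'like', 'apples', end marker, pad, pad
spans = [None, (0, 1), (2, 6), (7, 13), None, None, None]
char_labels = [0] * 7 + [7] * 6 + [0]  # 'apples' occupies characters 7..12

token_labels = char_to_token_labels(char_labels, spans)
restored = token_to_char_labels(token_labels, spans, len(utterance))
print(token_labels)  # [0, 0, 0, 7, 0, 0, 0]
print(restored)      # [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
```

In this sketch, characters not covered by any token (such as the final period) fall back to the default label on the way back.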

## Transform Sequence

```python
>>> output_sequence, label_aligner, intermediate = p.transform_sequence('I like apples.')
>>> output_sequence
[0, 6, 2, 7, 1, 3, 3]
# label_aligner
>>> label_aligner.transform([0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0])
[0, 0, 0, 7, 0, 0, 0]
>>> label_aligner.inverse_transform([0, 0, 0, 7, 0, 0, 0])
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
```

# Serialization

## Serialize

```python
>>> serialized_str = p.serialize()
```

## Deserialize

```python
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe.deserialize(serialized_str)
```
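One way a pipeline built from named steps can round-trip through a string is to store each step's name and keyword arguments, then rebuild the steps from a registry. The sketch below illustrates that pattern with the standard `json` module. It is an assumption about the general approach, not a description of uttut's serialization format; `NamedPipe`, `REGISTRY` and the two step factories are hypothetical.

```python
import json


def make_lowercase():
    # Hypothetical step: lowercase every token.
    return lambda tokens: [t.lower() for t in tokens]


def make_pad(maxlen, pad_token='<pad>'):
    # Hypothetical step: pad or truncate to exactly `maxlen` tokens.
    return lambda tokens: (tokens + [pad_token] * maxlen)[:maxlen]


# Maps a step name to the factory that builds it.
REGISTRY = {'Lowercase': make_lowercase, 'Pad': make_pad}


class NamedPipe:
    """Hypothetical pipeline that remembers how each step was configured."""

    def __init__(self):
        self._specs = []  # JSON-friendly description of each step
        self._funcs = []  # the callables built from those descriptions

    def add(self, name, kwargs=None):
        kwargs = kwargs or {}
        self._specs.append({'name': name, 'kwargs': kwargs})
        self._funcs.append(REGISTRY[name](**kwargs))

    def transform_sequence(self, tokens):
        for func in self._funcs:
            tokens = func(tokens)
        return tokens

    def serialize(self):
        return json.dumps(self._specs)

    @classmethod
    def deserialize(cls, serialized_str):
        pipe = cls()
        for spec in json.loads(serialized_str):
            pipe.add(spec['name'], spec['kwargs'])
        return pipe


original = NamedPipe()
original.add('Lowercase')
original.add('Pad', {'maxlen': 4})

restored_pipe = NamedPipe.deserialize(original.serialize())
tokens = ['I', 'like', 'apples']
print(restored_pipe.transform_sequence(tokens))  # ['i', 'like', 'apples', '<pad>']
```

Storing only names and arguments keeps the serialized string readable, at the cost of requiring the same registry to be available when deserializing.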
