https://github.com/danieldk/wordpieces

Split tokens into word pieces

# wordpieces

This crate provides a subword tokenizer. A subword tokenizer splits a
token into several smaller pieces, so-called *word pieces*. Word pieces
were popularized by, and are used in, the
[BERT](https://arxiv.org/abs/1810.04805) natural language encoder.
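
To make the idea of splitting a token into word pieces concrete, here is a minimal, standard-library-only sketch of greedy longest-match-first splitting, the strategy BERT applies to its vocabulary. It does not use this crate's API: the function `split_greedy`, the toy vocabulary, and the `##` continuation prefix (BERT's convention) are all assumptions made for illustration, and the crate's own implementation and piece format may differ.

```rust
use std::collections::HashSet;

/// Hypothetical helper, NOT part of the `wordpieces` crate.
/// Greedily splits `token` into the longest pieces found in `vocab`.
/// Non-initial pieces are looked up with a "##" prefix (BERT's convention,
/// assumed here). Returns `None` if part of the token cannot be covered.
fn split_greedy(vocab: &HashSet<&str>, token: &str) -> Option<Vec<String>> {
    // Byte offsets of every character boundary, including the end of the token,
    // so that slicing never cuts a multi-byte character in half.
    let bounds: Vec<usize> = token
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(token.len()))
        .collect();

    let mut pieces = Vec::new();
    let mut start = 0; // index into `bounds`
    while start + 1 < bounds.len() {
        let mut found = None;
        // Try the longest remaining candidate first, then shrink it.
        for end in (start + 1..bounds.len()).rev() {
            let sub = &token[bounds[start]..bounds[end]];
            let candidate = if start == 0 {
                sub.to_string()
            } else {
                format!("##{}", sub)
            };
            if vocab.contains(candidate.as_str()) {
                found = Some((candidate, end));
                break;
            }
        }
        // Give up if no piece matches at this position.
        let (piece, end) = found?;
        pieces.push(piece);
        start = end;
    }
    Some(pieces)
}

fn main() {
    // Toy vocabulary, invented for this example.
    let vocab: HashSet<&str> = ["un", "##afford", "##able", "##a"]
        .iter()
        .copied()
        .collect();

    println!("{:?}", split_greedy(&vocab, "unaffordable"));
    println!("{:?}", split_greedy(&vocab, "unknown"));
}
```

With this toy vocabulary, `unaffordable` is split into `un`, `##afford`, and `##able`, while `unknown` yields `None` because nothing in the vocabulary covers the text after `un`. A real tokenizer would typically map such uncovered tokens to an unknown-token marker instead of discarding them.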