https://github.com/iampara0x/fast-bpe

A fast BPE algorithm that generates byte pair encodings from a text corpus. It is written in Rust and is approximately 20x faster than its Python implementation.

byte-pair-encoding rust tokenizer transformers


# Fast Byte Pair Encoding

This repository contains `bpe_train`, which trains on a text corpus to generate byte pairs. It is approximately 20x faster than the version written in Python.
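
For readers unfamiliar with how BPE training works, the following is a minimal, unoptimized sketch of the general technique: count adjacent symbol pairs, merge the most frequent pair, and repeat. It is not the repository's `bpe_train`. The function names (`train_bpe_sketch`, `count_pairs`, `merge_pair`), the whitespace word splitting, the character-level (rather than byte-level) starting symbols, and the tie-breaking rule are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Count adjacent symbol pairs across all words, weighted by word frequency.
fn count_pairs(words: &[(Vec<String>, usize)]) -> HashMap<(String, String), usize> {
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for (symbols, freq) in words {
        for pair in symbols.windows(2) {
            *counts
                .entry((pair[0].clone(), pair[1].clone()))
                .or_insert(0) += *freq;
        }
    }
    counts
}

/// Replace every non-overlapping occurrence of `pair` with one merged symbol.
fn merge_pair(symbols: &[String], pair: &(String, String)) -> Vec<String> {
    let mut out = Vec::with_capacity(symbols.len());
    let mut i = 0;
    while i < symbols.len() {
        if i + 1 < symbols.len() && symbols[i] == pair.0 && symbols[i + 1] == pair.1 {
            out.push(format!("{}{}", pair.0, pair.1));
            i += 2;
        } else {
            out.push(symbols[i].clone());
            i += 1;
        }
    }
    out
}

/// Hypothetical trainer (an assumption, not the repository's API).
/// Returns the learned merges in the order they were applied.
fn train_bpe_sketch(corpus: &str, num_merges: usize) -> Vec<(String, String)> {
    // Assumption: words are whitespace-separated and start as single characters.
    let mut word_freq: HashMap<&str, usize> = HashMap::new();
    for word in corpus.split_whitespace() {
        *word_freq.entry(word).or_insert(0) += 1;
    }
    let mut words: Vec<(Vec<String>, usize)> = word_freq
        .into_iter()
        .map(|(w, f)| (w.chars().map(|c| c.to_string()).collect(), f))
        .collect();

    let mut merges = Vec::new();
    for _ in 0..num_merges {
        // Most frequent pair wins; ties go to the lexicographically smallest
        // pair so the result does not depend on hash map iteration order.
        let best = count_pairs(&words)
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(&a.0)));
        let pair = match best {
            Some((pair, _)) => pair,
            None => break, // nothing left to merge
        };
        for (symbols, _) in words.iter_mut() {
            *symbols = merge_pair(symbols, &pair);
        }
        merges.push(pair);
    }
    merges
}
```

Under these assumptions, calling `train_bpe_sketch("low low low lower lowest", 3)` learns the merges `("l", "o")`, `("lo", "w")`, and `("low", "e")`, in that order. The sketch recounts every pair on each iteration, which is simple but slow; a fast implementation would typically update counts incrementally.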