https://github.com/iampara0x/fast-bpe
Fast BPE algorithm for generating byte pair encodings from a text corpus. It is written in Rust and is approximately 20x faster than its Python implementation.
byte-pair-encoding rust tokenizer transformers
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/iampara0x/fast-bpe
- Owner: IAmPara0x
- Created: 2024-08-15T12:27:29.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-17T15:04:43.000Z (almost 2 years ago)
- Last Synced: 2024-08-17T16:24:23.679Z (almost 2 years ago)
- Topics: byte-pair-encoding, rust, tokenizer, transformers
- Language: Rust
- Homepage:
- Size: 432 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fast Byte Pair Encoding
This repository contains `bpe_train`, which trains on a text corpus to generate byte pairs. It is approximately 20x faster than the version written in Python.
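
For readers unfamiliar with what training on a corpus to generate byte pairs involves, here is a minimal sketch of the textbook BPE training loop in Rust. It is not the repository's `bpe_train`: the function name `train_bpe_sketch`, its signature, the tie-breaking rule, and the choice to stop once no pair occurs at least twice are all assumptions made for illustration, and the sketch favors readability over speed.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not the repository's `bpe_train`): learns up to
/// `num_merges` merges from `corpus`, starting from one token per byte.
/// Returns each merged pair together with the new token id assigned to it.
pub fn train_bpe_sketch(corpus: &str, num_merges: usize) -> Vec<((u32, u32), u32)> {
    // Initial vocabulary: the 256 possible byte values (ids 0..=255).
    let mut tokens: Vec<u32> = corpus.bytes().map(u32::from).collect();
    let mut merges = Vec::with_capacity(num_merges);
    let mut next_id: u32 = 256;

    for _ in 0..num_merges {
        // Count every adjacent pair of tokens in the current sequence.
        let mut counts: HashMap<(u32, u32), usize> = HashMap::new();
        for w in tokens.windows(2) {
            *counts.entry((w[0], w[1])).or_insert(0) += 1;
        }

        // Pick the most frequent pair. Assumption: ties go to the
        // numerically smaller pair so the result is deterministic.
        let best = counts
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .map(|(&pair, &count)| (pair, count));
        let (pair, count) = match best {
            Some(found) => found,
            None => break, // fewer than two tokens left
        };
        if count < 2 {
            break; // assumption: merging a pair seen once is not useful
        }

        // Replace non-overlapping occurrences of the pair, left to right.
        let mut merged = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() && (tokens[i], tokens[i + 1]) == pair {
                merged.push(next_id);
                i += 2;
            } else {
                merged.push(tokens[i]);
                i += 1;
            }
        }
        tokens = merged;
        merges.push((pair, next_id));
        next_id += 1;
    }
    merges
}
```

Calling `train_bpe_sketch("aaabdaaabac", 3)` first merges the byte pair `(97, 97)` (that is, `aa`) into token 256, then continues on the rewritten sequence. Recounting all pairs on every iteration makes this sketch quadratic in the worst case. Faster implementations typically update pair counts incrementally, but whether this repository does so is not stated here.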