awesome-tokenizers
A curated list of fast tokenizer libraries for NLP.
https://github.com/nlpoptimize/awesome-tokenizers
🔹 **BPE (Byte Pair Encoding) Implementations**
🔹 **SentencePiece Implementations**
🔹 **WordPiece Tokenizer Implementations**
- FlashTokenizer
- FastBertTokenizer
- BertTokenizers
- rust-tokenizers
- tokenizers-cpp
- bertTokenizer (Java)
- ZhuoruLin/fast-wordpiece
- huggingface_tokenizer_cpp
- SeanLee97/BertWordPieceTokenizer.jl
- BlingFire
- transformers BertTokenizer
- Deep Java Library (DJL) BertTokenizer
- tokenizers.net
- Tokenizers.jl
- fast-bert-tokenizer-py
- ml-commons/tokenizer
Keywords (count in parentheses): nlp (6), natural-language-processing (5), deep-learning (4), bert (4), bpe (3), machine-learning (3), python (3), tokenizer (2), transformer (2), natural-language-understanding (2), pytorch (2), language-model (2), huggingface (2), wordpiece (2), wordpiece-tokenization (2), llm (2), tensorflow (2), neural-machine-translation (2), word-segmentation (2), autograd (1), ai (1), deep-neural-networks (1), djl (1), java (1), ml (1), mxnet (1), vlm (1), speech-recognition (1), qwen (1), pytorch-transformers (1), pretrained-models (1), model-hub (1), glm (1), gemma (1), deepseek (1), tokenizers (1), tiktoken (1), rust (1), pypi-package (1), openai (1), byte-pair-tokenizer (1), byte-pair-encoding (1), bpe-tokenizer (1), transfomers (1), trie (1), pybind11 (1), flash (1), cpp17 (1), cpp (1), berttokenizer (1)