awesome-tokenizers
A curated list of fast tokenizer libraries for NLP.
https://github.com/nlpoptimize/awesome-tokenizers
Last synced: 2 days ago
🔹 **WordPiece Tokenizer Implementations**
- Tokenizers.jl
- rust-tokenizers
- tokenizers-cpp
- BertTokenizers
- bertTokenizer (Java)
- ZhuoruLin/fast-wordpiece
- FlashTokenizer
- FastBertTokenizer
- SeanLee97/BertWordPieceTokenizer.jl
- BlingFire
- huggingface_tokenizer_cpp
- Deep Java Library (DJL) BertTokenizer
- tokenizers.net
- transformers BertTokenizer
- fast-bert-tokenizer-py
- ml-commons/tokenizer
🔹 **BPE (Byte Pair Encoding) Implementations**
🔹 **SentencePiece Implementations**
Keywords (grouped by count)
- 6: nlp
- 5: bert, natural-language-processing
- 4: deep-learning
- 3: bpe, language-model, machine-learning, tensorflow, python
- 2: natural-language-understanding, word-segmentation, neural-machine-translation, transformer, tokenizer, pytorch, huggingface, wordpiece, wordpiece-tokenization
- 1: mxnet, ml, java, djl, flax, deep-neural-networks, autograd, ai, jax, language-models, model-hub, speech-recognition, seq2seq, pytorch-transformers, nlp-library, tokenizers, tiktoken, rust, pypi-package, openai, llm, byte-pair-tokenizer, byte-pair-encoding, bpe-tokenizer, transfomers, trie, pybind11, flash, cpp17, cpp, berttokenizer, word-embeddings