Projects in Awesome Lists tagged with byte-pair-encoding
A curated list of projects in awesome lists tagged with byte-pair-encoding.
https://github.com/samber/go-gpt-3-encoder
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
bpe byte-pair-encoding codex decoder encoder go gpt-2 gpt-3 openai token tokenizer transformer
Last synced: 05 Apr 2025
https://github.com/aallam/ktoken
Kotlin multiplatform BPE tokenizer library for OpenAI models
binary-p bpe byte-pair-encoding gpt kotlin openai tiktoken tokenizer
Last synced: 26 Aug 2025
https://github.com/ankane/youtokentome-ruby
High-performance unsupervised text tokenization for Ruby
bpe byte-pair-encoding npl tokenization unsupervised-learning word-segmentation
Last synced: 16 Jul 2025
https://github.com/bnosac/tokenizers.bpe
R package for Byte Pair Encoding based on YouTokenToMe
bpe byte-pair-encoding text-mining tokenization
Last synced: 13 Jun 2025
https://github.com/bobmcdear/minbpe-hs
Byte-level byte pair encoding (BPE) in Haskell
bpe byte-pair-encoding haskell llm tokenizer
Last synced: 10 Apr 2025
https://github.com/akhvorov/vgram
Feature extraction from sequential data
byte-pair-encoding feature-extraction natural-language-processing sequential-data text-classification vgram word-segmentation
Last synced: 28 Jun 2025
https://github.com/theskyinflames/word2png
A tool that encrypts a sequence of words (or pieces of text) using the AES-256 algorithm and encodes the encrypted result into a PNG image by mapping each byte value to a specific color. It also decodes that image to recover the original sequence of words.
aes-256 bip39 bit-manipulation byte-array byte-pair-encoding clean-code cold-wallet encryption encryption-decryption go go-aes-256 golang golang-wasm hexagonal-architecture image-processing png-decoder pterm solid-principles tdd wasm
Last synced: 12 Jan 2026
https://github.com/jmaczan/bpe-tokenizer
Byte-Pair Encoding tokenizer for training large language models on huge datasets
bpe bpe-tokenizer byte-pair-encoding chunking deep-learning from-scratch large-language-models llm machine-learning python tokenizer
Last synced: 18 Sep 2025
https://github.com/andreykolomiets/news_headline_generation
News headline generation
abstractive-summarization byte-pair-encoding pointer-generator pytorch reinforce summarization summatization
Last synced: 30 Oct 2025
https://github.com/crodriguez1a/bpe-summarizer
Auto summarization from BPE tokenization
bart bpe byte-pair-encoding gpt2tokenizer huggingface nlu python summarization
Last synced: 05 Mar 2026
https://github.com/gweidart/rs-bpe
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers
Last synced: 28 Apr 2025
https://github.com/andreimoraru123/neural-machine-translation
Modern Eager TensorFlow implementation of Attention Is All You Need
attention beam-search bleu-score byte-pair-encoding deep-learning dot-product-attention einops embedding-projector embeddings encoder-decoder keras label-smoothing language language-model nlp self-attention tensorflow tokenization transformers translation
Last synced: 20 Jan 2026
https://github.com/zouharvi/tokenization-principle
bpe byte-pair-encoding tokenization
Last synced: 25 Feb 2025
https://github.com/capjamesg/bpe
Byte-pair encoding implementation in Python.
byte-pair-encoding text-encoding
Last synced: 03 Apr 2025
https://github.com/willkirkmanm/byte-pair-encoding
The Large Language Model Tokenizer Algorithm
Last synced: 24 Apr 2025
https://github.com/iampara0x/fast-bpe
Fast BPE algorithm for generating byte pair encodings from a text corpus. It is written in Rust and is approximately 20x faster than its Python implementation.
byte-pair-encoding rust tokenizer transformers
Last synced: 11 Oct 2025
https://github.com/jonasknobloch/tokenizers-mbpe
Morphologically biased byte-pair encoding pre-tokenization
byte-pair-encoding morphological-analysis morphology nlp segmentation tokenizer
Last synced: 16 Apr 2026
https://github.com/jonasknobloch/mbpe
Morphologically biased byte-pair encoding
byte-pair-encoding morphological-analysis morphology nlp segmentation tokenizer
Last synced: 29 Mar 2025
https://github.com/imass2550/token-visualizer
Token Visualizer helps you analyze and optimize your prompts for Large Language Models, saving you time and money. 🚀 With this tool, you can easily see token usage and improve your prompt efficiency. 💻
ai android api bubblemaps byte-pair-encoding docker gpt4 llm nodejs padding part-of-speech-tagger pixels playwright react textappearance tokens vercel word-segmentation
Last synced: 07 Apr 2026
https://github.com/sanatren/legal-document-analyzer
This Legal Document Analyzer is a proof-of-concept NLP project demonstrating the potential of transformers for legal document summarization.
bart byte-pair-encoding deep-learning finetuning-transformers huggingface ppo-algorithm reinforcement-learning-algorithms transformer
Last synced: 13 Jul 2025
https://github.com/dvdagames/pgn-tokenizer
A byte pair encoding (BPE) tokenizer for chess portable game notation (PGN)
bpe byte-pair-encoding chess llm pgn tokenizer
Last synced: 26 Feb 2026
https://github.com/jiauzhang/textok
Text Tokenizer in C++
byte-pair-encoding language-model llm nlp tokenizer
Last synced: 10 Feb 2026
https://github.com/mecanik/tiny-bpe-trainer
Lightweight, header-only Byte Pair Encoding (BPE) trainer in modern C++17. Produces HuggingFace-compatible vocabularies for transformers and integrates with Modern Text Tokenizer.
bpe byte-pair-encoding c17 deep-learning header-only huggingface machine-learning modern-cpp natural-language-processing nlp no-dependencies text-processing tokenization tokenizer transformers vocabulary
Last synced: 15 Sep 2025
https://github.com/lukasdrews97/dumblellm
Decoder-only LLM trained on the Harry Potter books.
byte-pair-encoding flash-attention grouped-query-attention large-language-model rotary-position-embedding transformer
Last synced: 05 Apr 2025
https://github.com/sameermanan/rs-bpe
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers
Last synced: 10 May 2026