An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with byte-pair-encoding

A curated list of projects in awesome lists tagged with byte-pair-encoding.

https://github.com/samber/go-gpt-3-encoder

Go BPE tokenizer (encoder and decoder) for GPT-2 and GPT-3

bpe byte-pair-encoding codex decoder encoder go gpt-2 gpt-3 openai token tokenizer transformer

Last synced: 05 Apr 2025

https://github.com/aallam/ktoken

Kotlin multiplatform BPE tokenizer library for OpenAI models

binary-p bpe byte-pair-encoding gpt kotlin openai tiktoken tokenizer

Last synced: 26 Aug 2025

https://github.com/ankane/youtokentome-ruby

High performance unsupervised text tokenization for Ruby

bpe byte-pair-encoding npl tokenization unsupervised-learning word-segmentation

Last synced: 16 Jul 2025

https://github.com/bnosac/tokenizers.bpe

R package for Byte Pair Encoding based on YouTokenToMe

bpe byte-pair-encoding text-mining tokenization

Last synced: 13 Jun 2025

https://github.com/bobmcdear/minbpe-hs

Byte-level byte pair encoding (BPE) in Haskell

bpe byte-pair-encoding haskell llm tokenizer

Last synced: 10 Apr 2025

https://github.com/theskyinflames/word2png

This tool encrypts a sequence of words (or pieces of text) using the AES-256 algorithm and encodes the encrypted result into a PNG image by mapping each byte value to a specific color. It also decodes that image to recover the original sequence of words.

aes-256 bip39 bit-manipulation byte-array byte-pair-encoding clean-code cold-wallet encryption encryption-decryption go go-aes-256 golang golang-wasm hexagonal-architecture image-processing png-decoder pterm solid-principles tdd wasm

Last synced: 12 Jan 2026

https://github.com/jmaczan/bpe-tokenizer

Byte-Pair Encoding tokenizer for training large language models on huge datasets

bpe bpe-tokenizer byte-pair-encoding chunking deep-learning from-scratch large-language-models llm machine-learning python tokenizer

Last synced: 18 Sep 2025

https://github.com/gweidart/rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers

Last synced: 28 Apr 2025

https://github.com/capjamesg/bpe

Byte-pair encoding implementation in Python.

byte-pair-encoding text-encoding

Last synced: 03 Apr 2025

https://github.com/willkirkmanm/byte-pair-encoding

The Large Language Model Tokenizer Algorithm

byte-pair-encoding parsonlabs

Last synced: 24 Apr 2025

https://github.com/iampara0x/fast-bpe

Fast BPE algorithm to generate byte pair encodings from a text corpus. It is written in Rust and is approximately 20x faster than its Python implementation.

byte-pair-encoding rust tokenizer transformers

Last synced: 11 Oct 2025

https://github.com/jonasknobloch/tokenizers-mbpe

Morphologically biased byte-pair encoding pre-tokenization

byte-pair-encoding morphological-analysis morphology nlp segmentation tokenizer

Last synced: 16 Apr 2026

https://github.com/imass2550/token-visualizer

Token Visualizer helps you analyze and optimize your prompts for Large Language Models, saving you time and money. 🚀 With this tool, you can easily see token usage and improve your prompt efficiency. 💻

ai android api bubblemaps byte-pair-encoding docker gpt4 llm nodejs padding part-of-speech-tagger pixels playwright react textappearance tokens vercel word-segmentation

Last synced: 07 Apr 2026

https://github.com/sanatren/legal-document-analyzer

This Legal Document Analyzer is a proof-of-concept NLP project demonstrating the potential of transformers for legal document summarization.

bart byte-pair-encoding deep-learning finetuning-transformers huggingface ppo-algorithm reinforcement-learning-algorithms transformer

Last synced: 13 Jul 2025

https://github.com/dvdagames/pgn-tokenizer

A byte pair encoding (BPE) tokenizer for chess portable game notation (PGN)

bpe byte-pair-encoding chess llm pgn tokenizer

Last synced: 26 Feb 2026

https://github.com/mecanik/tiny-bpe-trainer

Lightweight, header-only Byte Pair Encoding (BPE) trainer in modern C++17. Produces HuggingFace-compatible vocabularies for transformers and integrates with Modern Text Tokenizer.

bpe byte-pair-encoding c17 deep-learning header-only huggingface machine-learning modern-cpp natural-language-processing nlp no-dependencies text-processing tokenization tokenizer transformers vocabulary

Last synced: 15 Sep 2025

https://github.com/sameermanan/rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers

Last synced: 10 May 2026