Projects in Awesome Lists tagged with tokenisation
A curated list of projects in awesome lists tagged with tokenisation.
https://github.com/alasdairforsythe/tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
text-tokenization tokenisation tokenization tokenize tokenizer tokenizing vocabulary vocabulary-builder vocabulary-generator
Last synced: 16 Jan 2026
https://github.com/checkout/frames-ios
Frames iOS: making native card payments simple
card-payment card-validations checkout credit-card fintech ios mobile-payments payment payments-expert swift tokenisation validation
Last synced: 08 Oct 2025
https://github.com/flammie/omorfi
Open morphology for Finnish
analysis finnish morphological-analysis python python-bindings spell-check tokenisation
Last synced: 05 Jan 2026
https://github.com/andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
hokkien natural-language-processing nlp nlp-library poj python romanisation romanization taigi taiwanese tl tokenisation tokeniser tokenization tokenizer transliteration transliterator zhuyin
Last synced: 21 Mar 2025
https://github.com/checkout/frames-android
Frames Android: making native card payments simple
android card-validations checkout credit-card fintech mobile-payments payment payment-express tokenisation validation
Last synced: 27 Jul 2025
https://github.com/andreihar/taibun.js
Taiwanese Hokkien Transliterator and Tokeniser
hokkien javascript js natural-language-processing nlp nlp-library poj romanisation romanization taigi taiwanese tl tokenisation tokeniser tokenization tokenizer transliteration transliterator zhuyin
Last synced: 26 Aug 2025
https://github.com/raiyanyahya/how-to-train-your-gpt
Build a modern LLM from scratch. Every line commented. Explained like we are five.
attention-mechanism deep-learning educational from-scratch gpt language-model llama llm machine-learning natural-language-processing python pytorch tokenisation transformers tutorial
Last synced: 11 May 2026
https://github.com/micycle1/count-tokens
Ultra-fast, client-side token counter for large text blobs
bpe-tokenizer tokenisation tokenization tokenizer
Last synced: 04 Oct 2025