Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with bpe-tokenizer
A curated list of projects in awesome lists tagged with bpe-tokenizer.
https://github.com/jmaczan/bpe.c
Byte-Pair Encoding tokenizer for training large language models on huge datasets. I don't know C, so most of the code comes from AI :D I hope to learn by rewriting it and making changes, fixes, etc.
bpe bpe-tokenizer c clang llm tokenizer
Last synced: 07 Nov 2024
https://github.com/jmaczan/bpe-tokenizer
Byte-Pair Encoding tokenizer for training large language models on huge datasets
bpe bpe-tokenizer byte-pair-encoding chunking deep-learning from-scratch large-language-models llm machine-learning python tokenizer
Last synced: 07 Nov 2024
https://github.com/shivendrra/tokenizers
Self-made byte-pair-encoding tokenizer
bpe-tokenizer bytepairencoding llm tokenization tokenizer
Last synced: 26 Oct 2024