Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with bpe-tokenizer

A curated list of projects in awesome lists tagged with bpe-tokenizer.

https://github.com/jmaczan/bpe.c

Byte-Pair Encoding tokenizer for training large language models on huge datasets. I don't know C, so most of the code comes from AI :D I hope to learn by rewriting it and making changes, fixes, etc.

bpe bpe-tokenizer c clang llm tokenizer

Last synced: 07 Nov 2024

https://github.com/jmaczan/bpe-tokenizer

Byte-Pair Encoding tokenizer for training large language models on huge datasets

bpe bpe-tokenizer byte-pair-encoding chunking deep-learning from-scratch large-language-models llm machine-learning python tokenizer

Last synced: 07 Nov 2024

https://github.com/shivendrra/tokenizers

Self-made byte-pair encoding tokenizer

bpe-tokenizer bytepairencoding llm tokenization tokenizer

Last synced: 26 Oct 2024