https://github.com/jmaczan/bpe.c

Byte-Pair Encoding tokenizer for training large language models on huge datasets. I don't know C, so most of the code comes from AI :D I hope to learn by rewriting it and making changes, fixes, etc.

Topics: bpe, bpe-tokenizer, c, clang, llm, tokenizer


# bpe.c

Byte-Pair Encoding (BPE) tokenizer for training large language models on huge datasets. I don't know C yet, so **most of the code comes from AI :D**

I hope to learn C by rewriting it and making changes, fixes, etc.
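
For anyone new to BPE, here is a minimal sketch of what a training loop can look like in C. It is not the code from this repository: the names `most_frequent_pair`, `merge_pair` and `bpe_train`, the `MAX_VOCAB` cap, and the tie-breaking rule are all assumptions made for illustration. The idea is to start from raw bytes (ids 0 to 255), repeatedly find the most frequent adjacent pair, and replace it with a new token id.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names and limits for illustration; not taken from bpe.c. */
#define BASE_VOCAB 256 /* one id per byte value */
#define MAX_VOCAB 512  /* assumed cap so the pair-count table stays small */

/* Count adjacent pairs in ids[0..n) and report the most frequent one.
   Returns its count (0 if n < 2). Ties go to the pair that reaches the
   top count first while scanning left to right.
   Every id must be in the range [0, MAX_VOCAB). */
size_t most_frequent_pair(const int *ids, size_t n, int *left, int *right)
{
    static size_t counts[MAX_VOCAB][MAX_VOCAB];
    size_t best = 0;

    memset(counts, 0, sizeof counts);
    for (size_t i = 0; i + 1 < n; i++) {
        size_t c = ++counts[ids[i]][ids[i + 1]];
        if (c > best) {
            best = c;
            *left = ids[i];
            *right = ids[i + 1];
        }
    }
    return best;
}

/* Replace each non-overlapping occurrence of (left, right) with new_id,
   in place, scanning left to right. Returns the new length. */
size_t merge_pair(int *ids, size_t n, int left, int right, int new_id)
{
    size_t out = 0;

    for (size_t i = 0; i < n;) {
        if (i + 1 < n && ids[i] == left && ids[i + 1] == right) {
            ids[out++] = new_id;
            i += 2;
        } else {
            ids[out++] = ids[i];
            i += 1;
        }
    }
    return out;
}

/* Run up to num_merges merge steps. Merge k creates token BASE_VOCAB + k
   and records its two parts in merges[k]. Stops early when no pair occurs
   at least twice or the vocabulary cap is reached.
   Updates *n and returns the number of merges performed. */
size_t bpe_train(int *ids, size_t *n, size_t num_merges, int merges[][2])
{
    size_t k = 0;

    while (k < num_merges && BASE_VOCAB + k < MAX_VOCAB) {
        int left = 0, right = 0;

        if (most_frequent_pair(ids, *n, &left, &right) < 2)
            break;
        merges[k][0] = left;
        merges[k][1] = right;
        *n = merge_pair(ids, *n, left, right, BASE_VOCAB + (int)k);
        k++;
    }
    return k;
}
```

On the bytes of `aaabdaaabac`, this sketch performs three merges and leaves five tokens. The dense count table only works for a tiny vocabulary; a tokenizer meant for huge datasets would likely need a hash map and incremental pair counts instead.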