https://github.com/shivance/minbpe.c

# minbpe.c

minbpe.c is a minimal, clean implementation of the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, written in pure C.
The project is inspired by [minbpe](https://github.com/karpathy/minbpe/tree/master) by @karpathy.

![Basic tokenizer output](./res/basic_tokenizer_out.png)
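
For readers new to BPE, the core training loop can be sketched roughly as follows. This is an illustrative sketch and not the code from this repository: the names `most_frequent_pair`, `merge_pair`, `bpe_train`, and `BYTE_VOCAB` are hypothetical. It assumes byte-level BPE where token ids 0..255 are the raw byte values, new ids are assigned from 256 upward, and ties between equally frequent pairs go to the pair that appears first.

```c
#include <assert.h>
#include <stddef.h>

#define BYTE_VOCAB 256 /* assumption: ids 0..255 are the raw byte values */

/* Find the most frequent adjacent pair in ids[0..n).
 * Writes the pair to *a and *b and returns its count (0 if n < 2).
 * Quadratic counting keeps the sketch short; ties go to the earliest pair. */
size_t most_frequent_pair(const int *ids, size_t n, int *a, int *b)
{
    size_t best = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j + 1 < n; j++) {
            if (ids[j] == ids[i] && ids[j + 1] == ids[i + 1])
                count++;
        }
        if (count > best) {
            best = count;
            *a = ids[i];
            *b = ids[i + 1];
        }
    }
    return best;
}

/* Replace each occurrence of the pair (a, b) with new_id, scanning left to
 * right and rewriting ids in place. Returns the new sequence length. */
size_t merge_pair(int *ids, size_t n, int a, int b, int new_id)
{
    size_t out = 0;
    for (size_t i = 0; i < n;) {
        if (i + 1 < n && ids[i] == a && ids[i + 1] == b) {
            ids[out++] = new_id;
            i += 2;
        } else {
            ids[out++] = ids[i];
            i += 1;
        }
    }
    return out;
}

/* Perform up to num_merges merges on ids (length *n, updated in place).
 * merges[k] records the pair that became token id BYTE_VOCAB + k.
 * Returns the number of merges actually performed. */
size_t bpe_train(int *ids, size_t *n, int merges[][2], size_t num_merges)
{
    assert(ids != NULL && n != NULL && merges != NULL);
    size_t done = 0;
    while (done < num_merges) {
        int a = 0, b = 0;
        if (most_frequent_pair(ids, *n, &a, &b) == 0)
            break; /* fewer than two tokens left, nothing to merge */
        merges[done][0] = a;
        merges[done][1] = b;
        *n = merge_pair(ids, *n, a, b, BYTE_VOCAB + (int)done);
        done++;
    }
    return done;
}
```

Under these assumptions, calling `bpe_train` with three merges on the bytes of `"aaabdaaabac"` leaves the token sequence 258, 100, 258, 97, 99, with the recorded merges (97, 97), (256, 97), and (257, 98).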