Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shivance/minbpe.c
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, written in pure C.
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/shivance/minbpe.c
- Owner: shivance
- Created: 2024-07-06T17:27:16.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-07-06T17:46:05.000Z (4 months ago)
- Last Synced: 2024-10-08T00:42:00.051Z (about 1 month ago)
- Language: C
- Size: 82 KB
- Stars: 21
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# minbpe.c
minbpe.c is a minimal, clean implementation of the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, written in pure C.
The project is inspired by [minbpe](https://github.com/karpathy/minbpe/tree/master) by @karpathy.

![](./res/basic_tokenizer_out.png)
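
For readers new to BPE, one training step can be sketched roughly as follows: find the most frequent adjacent pair of token ids, then replace every occurrence of that pair with a new id. This is an illustrative sketch only and is not taken from the minbpe.c sources; the function names `most_frequent_pair` and `merge_pair` are hypothetical, and the quadratic pair count is chosen for readability rather than speed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not the minbpe.c API. */

/* Find the most frequent adjacent pair in ids[0..n).
 * Writes the pair to *a and *b and returns its count (0 if n < 2).
 * Ties go to the pair that appears first. O(n^2) for clarity. */
size_t most_frequent_pair(const int *ids, size_t n, int *a, int *b)
{
    size_t best = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j + 1 < n; j++) {
            if (ids[j] == ids[i] && ids[j + 1] == ids[i + 1]) {
                count++;
            }
        }
        if (count > best) {
            best = count;
            *a = ids[i];
            *b = ids[i + 1];
        }
    }
    return best;
}

/* Replace each occurrence of the pair (a, b) with new_id, scanning
 * left to right and rewriting ids in place. Returns the new length. */
size_t merge_pair(int *ids, size_t n, int a, int b, int new_id)
{
    assert(ids != NULL || n == 0);
    size_t out = 0;
    size_t i = 0;
    while (i < n) {
        if (i + 1 < n && ids[i] == a && ids[i + 1] == b) {
            ids[out++] = new_id; /* pair found: emit the merged token */
            i += 2;
        } else {
            ids[out++] = ids[i++]; /* no match: copy the token through */
        }
    }
    return out;
}
```

Starting from the raw bytes of a text (ids 0 to 255), a trainer would typically repeat these two calls, assigning new ids 256, 257, and so on, until the desired vocabulary size is reached.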