https://github.com/wassemgtk/supertokenizer
A high-performance tokenizer built to rival GPT-4, trained on the C4 dataset.
- Host: GitHub
- URL: https://github.com/wassemgtk/supertokenizer
- Owner: wassemgtk
- License: MIT
- Created: 2025-03-25T17:51:10.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-25T18:15:49.000Z (2 months ago)
- Last Synced: 2025-03-25T18:42:12.861Z (2 months ago)
- Topics: tokenizer, tokenizer-framework, tokenizers
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0