https://github.com/milansuk/token_go
Simple & fast Encoder/Decoder for tiktoken vocabulary.
- Host: GitHub
- URL: https://github.com/milansuk/token_go
- Owner: MilanSuk
- License: apache-2.0
- Created: 2024-05-30T20:45:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-31T14:04:30.000Z (over 1 year ago)
- Last Synced: 2025-03-25T18:50:32.537Z (9 months ago)
- Topics: llm, tokenizer
- Language: Go
- Homepage:
- Size: 15 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Token_go
Simple & fast Encoder/Decoder for tiktoken vocabulary.
Implemented from scratch (no regex library). The tokenizer lives in vocab.go, which is about 120 lines of code.
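As background, the published `.tiktoken` vocabulary files store one token per line as `<base64 token bytes> <rank>`. The sketch below shows how such a file could be parsed and used for decoding. The names `parseVocab` and `decode` are hypothetical and are not taken from vocab.go; the three-entry vocabulary is a toy example.

```go
package main

import (
	"bufio"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// parseVocab reads lines of the form "<base64 token bytes> <rank>".
// It returns a rank -> bytes table (for decoding) and a
// bytes -> rank map (for encoding). Hypothetical helper.
func parseVocab(data string) (map[int][]byte, map[string]int, error) {
	byRank := make(map[int][]byte)
	byBytes := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines
		}
		b64, rankStr, ok := strings.Cut(line, " ")
		if !ok {
			return nil, nil, fmt.Errorf("malformed line %q", line)
		}
		tok, err := base64.StdEncoding.DecodeString(b64)
		if err != nil {
			return nil, nil, err
		}
		rank, err := strconv.Atoi(rankStr)
		if err != nil {
			return nil, nil, err
		}
		byRank[rank] = tok
		byBytes[string(tok)] = rank
	}
	return byRank, byBytes, sc.Err()
}

// decode concatenates the bytes belonging to each token ID.
func decode(byRank map[int][]byte, toks []int) string {
	var sb strings.Builder
	for _, t := range toks {
		sb.Write(byRank[t])
	}
	return sb.String()
}

func main() {
	// Toy vocabulary: "Hi", " there", "!" (base64-encoded).
	sample := "SGk= 0\nIHRoZXJl 1\nIQ== 2\n"
	byRank, _, err := parseVocab(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(decode(byRank, []int{0, 1, 2}))
}
```

Decoding is a table lookup plus concatenation, which is consistent with the decoder being much faster than the encoder in the figures reported under Performance.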
## Performance
p50k_base.tiktoken:
- Encoder: 4.625M toks/sec, 19.143 MB/sec, 1 thread
- Decoder: 37.817M toks/sec, 156.516 MB/sec, 1 thread
cl100k_base.tiktoken:
- Encoder: 3.949M toks/sec, 16.748 MB/sec, 1 thread
- Decoder: 35.825M toks/sec, 151.952 MB/sec, 1 thread
Server (p50k_base):
- 8 clients each call Encode("Hi there!" + index) 100K times.
- 800K total requests in 26.7 sec => about 30K req/sec.
## Examples
Encode/Decode:

    vb, err := NewVocab("p50k_base.tiktoken", true)
    if err != nil {
        log.Fatal(err)
    }
    toks := vb.Encode("Hi there!") // text -> token IDs
    fmt.Println(toks)
    str := vb.Decode(toks) // token IDs -> text
    fmt.Println(str)
Client/Server:

    go NewServer("8090", true) // run the server in a separate goroutine
    client := NewClient("localhost:8090", "p50k_base")
    toks, err := client.Encode([]byte("Hi there!"))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(toks)
    text, err := client.Decode([]int{17250, 612, 0})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(text)
## Build
Written in Go (https://go.dev/doc/install). No dependencies.

    git clone https://github.com/milansuk/token_go
    cd token_go
    go build
    ./token_go
## Author
Milan Suk
Email: milan@skyalt.com
Twitter: https://twitter.com/milansuk/
**Sponsor**: https://github.com/sponsors/MilanSuk
*Feel free to follow or contact me with any idea, question or problem.*
## Contributing
Your feedback and code are welcome!
For bug reports or questions, please use [GitHub Issues](https://github.com/skyaltlabs/skyalt/issues).
Token_go is licensed under the **Apache v2.0** license. This repository includes 100% of the code.