https://github.com/iconnect/tiktoken-hs
Haskell bindings to an extremely limited subset of tiktoken-rs
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/iconnect/tiktoken-hs
- Owner: iconnect
- License: other
- Created: 2023-09-19T10:13:54.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-02-15T14:33:43.000Z (over 2 years ago)
- Last Synced: 2025-04-08T18:48:22.936Z (about 1 year ago)
- Language: Haskell
- Homepage:
- Size: 38.1 KB
- Stars: 2
- Watchers: 5
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
tiktoken.hs
===========
This library is a binding to an _extremely_ limited (as in, one function) subset of the
`tiktoken-rs` library. It exposes a single function, `countTokens :: Text -> Word64`, which
counts the tokens in a piece of text and returns a result that should match the count reported
by OpenAI itself (see for example [their online tool](https://platform.openai.com/tokenizer)).
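
Because `countTokens` is a pure function of type `Text -> Word64`, it can be passed to any code that expects a token counter. The following is a minimal sketch of that idea, not code from this library: `fitsBudget` is a hypothetical helper, and `wordCount` is a stand-in counter used only to keep the example self-contained (it counts whitespace-separated words, which is not how tiktoken tokenises). In a real program you would import `countTokens` from this package and pass it in place of `wordCount`.

```haskell
module Main (main) where

import Data.Text (Text)
import qualified Data.Text as T
import Data.Word (Word64)

-- | Hypothetical helper (not part of this library): does a prompt fit
-- within a token budget? The counter is a parameter, so the real
-- `countTokens` can be supplied once it is imported.
fitsBudget :: (Text -> Word64) -> Word64 -> Text -> Bool
fitsBudget count budget prompt = count prompt <= budget

-- | Stand-in counter with the same type as `countTokens`.
-- It counts whitespace-separated words and is only an illustration.
wordCount :: Text -> Word64
wordCount = fromIntegral . length . T.words

main :: IO ()
main = do
  let prompt = T.pack "The quick brown fox jumps over the lazy dog"
  print (fitsBudget wordCount 16 prompt) -- 9 words against a budget of 16
  print (fitsBudget wordCount 4 prompt)  -- 9 words against a budget of 4
```

Taking the counter as an argument keeps the budget logic independent of the FFI layer, so it can be exercised without building the Rust wrapper.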
## Library design
This library uses [haskell-foreign-rust](https://github.com/BeFunctional/haskell-foreign-rust)
and [haskell-rust-ffi](https://github.com/BeFunctional/haskell-rust-ffi) to call into [tiktoken-rs](https://github.com/zurawiki/tiktoken-rs),
which is currently the industry standard for tokenisation. Internally, this library consists
of a Rust wrapper and a Haskell library. The former is shipped alongside the latter, and
a Custom setup script builds the Rust wrapper before the Haskell library is built.
For more information see the blog post [Calling Purgatory from Heaven](https://well-typed.com/blog/2023/03/purgatory/).
## Building the project
This project requires a `nightly` version of the Rust toolchain as well as the `cargo-c` applet. You can
install both with:
```
rustup toolchain install nightly
cargo install cargo-c
```
Then, you can build this project like any other Haskell library with `cabal v2-build`.