https://github.com/jawrainey/hfta
Reference implementation: run any huggingface tokenizer in Android (rust).
android machine-learning on-device-ml rust tokenizers
- Host: GitHub
- URL: https://github.com/jawrainey/hfta
- Owner: jawrainey
- Created: 2025-11-15T17:54:33.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-15T19:07:43.000Z (8 months ago)
- Last Synced: 2025-11-15T20:34:19.188Z (8 months ago)
- Topics: android, machine-learning, on-device-ml, rust, tokenizers
- Language: Kotlin
- Homepage:
- Size: 2.05 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# HuggingFace Tokenizers on Android (HFTA)
> Reference implementation for running [HuggingFace's (HF) tokenizers](https://github.com/huggingface/tokenizers) on Android.
## Demo Video
The demo UI converts text to tokens in real time on Android using the tokenizers library. The recording is also available at [`demo/demo.mp4`](demo/demo.mp4):
https://github.com/user-attachments/assets/54deb649-49f4-467e-8df9-dd424d3bed41
### Try a Tokenizer
1. Find a model you want to test on HF, e.g., Google's [gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it/blob/main/tokenizer.json)
2. Download its `tokenizer.json` and add it to [`app/src/main/assets`](app/src/main/assets), renamed to `gemma-3-4b-it.json`
3. Update `SELECTED_TOKENIZER` in [`app/build.gradle.kts`](app/build.gradle.kts) to select the new tokenizer
## Features
- Run any [HuggingFace (HF) tokenizer](https://github.com/huggingface/tokenizers) on-device on Android.
- [`rust` to `java` NDK bindings of HF's tokenizers in `rs-hfta`](./rs-hfta/README.md)
- Use of [JNI bindings between rust and Android](https://github.com/jawrainey/hfta/blob/main/app/src/main/java/com/example/hfta/HFTokenizer.kt#L3)
- [Parameterized instrumentation tests (runs on-device)](./app/src/androidTest/java/com/example/hfta/HFTokenizer.kt)
- [Compiler optimizations to reduce the library's file size](https://github.com/jawrainey/hfta/commit/4cd5cc10a58248827e5c68db62630dc4d8cbdcf5)
## Implementation Details
HFTA runs _any_ HF tokenizer on Android using the model's `tokenizer.json` from `huggingface.co`. To achieve that, the HF library is compiled [from rust](./rs-hfta/README.md) into a [shared library](./app/src/main/jniLibs/arm64-v8a/libhfta.so), which the app loads and calls through the Java Native Interface (JNI).
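
For JNI to connect a Kotlin `external fun` to a function in the shared library without explicit registration, the exported symbol name has to follow the JNI name-mangling rules. The sketch below is not taken from this repository; it only illustrates those rules in plain rust with no dependencies. The class name `com.example.hfta.HFTokenizer` is inferred from the source path, the method name `encode` is hypothetical, and the helpers `jni_escape` and `jni_symbol` are illustrative names.

```rust
/// Escape one part of a JNI symbol (class or method name) following the
/// JNI specification's native-method name resolution rules.
fn jni_escape(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        match c {
            // Package separators become single underscores.
            '.' | '/' => out.push('_'),
            // Characters with dedicated escape sequences.
            '_' => out.push_str("_1"),
            ';' => out.push_str("_2"),
            '[' => out.push_str("_3"),
            // ASCII letters and digits pass through unchanged.
            c if c.is_ascii_alphanumeric() => out.push(c),
            // Everything else becomes _0xxxx per UTF-16 code unit (lowercase hex).
            c => {
                let mut buf = [0u16; 2];
                for unit in c.encode_utf16(&mut buf).iter() {
                    out.push_str(&format!("_0{:04x}", *unit));
                }
            }
        }
    }
    out
}

/// Build the symbol name the JVM looks up for a non-overloaded native method.
/// `class` is the fully qualified class name, `method` the native method name.
fn jni_symbol(class: &str, method: &str) -> String {
    format!("Java_{}_{}", jni_escape(class), jni_escape(method))
}
```

Under these assumptions, `jni_symbol("com.example.hfta.HFTokenizer", "encode")` yields `Java_com_example_hfta_HFTokenizer_encode`, which is the name the rust side would export with `#[no_mangle]`. If the native method were declared inside a Kotlin `companion object` or at file level, the class part of the symbol would differ.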
### Thanks to
- [Hugging Face's `tokenizers` library](https://github.com/huggingface/tokenizers)
- [Qualcomm's Genie Library](https://softwarecenter.qualcomm.com/api/download/software/sdks/Qualcomm_AI_Runtime_Community/All/2.34.0.250424/v2.34.0.250424.zip) has a rust to C++ static library implementation of HF's tokenizers at `qairt/2.34.0.250424/examples/Genie/Genie/src/qualla/tokenizers/rust`
- [Shubham Panchal's `Sentence-Embeddings-Android`](https://github.com/shubham0204/Sentence-Embeddings-Android/)
- [Rust's `profile` docs](https://doc.rust-lang.org/cargo/reference/profiles.html)