{"id":34565753,"url":"https://github.com/nooscraft/tokuin","last_synced_at":"2026-02-20T05:01:21.226Z","repository":{"id":322790186,"uuid":"1090882345","full_name":"nooscraft/tokuin","owner":"nooscraft","description":"CLI tool – estimates LLM tokens/costs and runs provider-aware load tests for OpenAI, Anthropic, OpenRouter, or custom endpoints.","archived":false,"fork":false,"pushed_at":"2026-02-08T11:10:55.000Z","size":278,"stargazers_count":117,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-08T11:15:21.474Z","etag":null,"topics":["llms","prompt-engineering","rust","tokenizer"],"latest_commit_sha":null,"homepage":"https://github.com/nooscraft/tokuin","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nooscraft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-11-06T09:08:48.000Z","updated_at":"2026-02-08T11:10:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nooscraft/tokuin","commit_stats":null,"previous_names":["nooscraft/tally"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/nooscraft/tokuin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nooscraft%2Ftokuin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nooscraft%2Ftokuin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nooscraft%2Ftokuin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nooscraft%2Ftokuin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nooscraft","download_url":"https://codeload.github.com/nooscraft/tokuin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nooscraft%2Ftokuin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29641927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T03:21:14.183Z","status":"ssl_error","status_checked_at":"2026-02-20T03:18:24.455Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llms","prompt-engineering","rust","tokenizer"],"created_at":"2025-12-24T09:04:04.297Z","updated_at":"2026-02-20T05:01:21.219Z","avatar_url":"https://github.com/nooscraft.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# 🧮 Tokuin\n\n[![License: MIT OR Apache-2.0](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](https://github.com/nooscraft/tokuin)\n[![Rust](https://img.shields.io/badge/rust-1.70%2B-orange.svg)](https://www.rust-lang.org/)\n[![CI](https://github.com/nooscraft/tokuin/workflows/CI/badge.svg)](https://github.com/nooscraft/tokuin/actions)\n\nA fast, CLI tool to **estimate token usage**, **control API costs**, and **compress prompts** for LLM providers. Built in Rust for performance, portability, and safety.\n\n## ✨ What's in v0.2.0\n\n- **Prompt Compression** — Reduce prompt tokens by 70-90% using the [Hieratic format](#️-prompt-compression)\n- **Quality Metrics** — Measure compression fidelity (semantic similarity, critical instruction preservation, structural integrity)\n- **LLM-as-a-Judge** — Compare outputs from original vs compressed prompts using an LLM judge\n- **Incremental Compression** — Compress long conversations turn-by-turn without re-processing history\n- **Structured Mode** — Compression-aware handling of JSON, code blocks, HTML tables, and technical docs\n- Everything from v0.1.x: token counting, cost estimation, multi-model comparison, load testing\n\n---\n\n## 🚀 Installation\n\n### Quick Install (macOS \u0026 Linux)\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash\n```\n\nDetects your platform, downloads the latest release, verifies its checksum, and installs `tokuin` to `/usr/local/bin` (or `~/.local/bin` if root access is unavailable).\n\n### What's included per platform\n\nRelease binaries are built with `--features all`. Embedding-based semantic scoring (`compression-embeddings`) requires the ONNX Runtime, which only has compatible prebuilt binaries for some platforms:\n\n| Platform | Token counting | Compression | Embedding scoring | LLM judge |\n|---|---|---|---|---|\n| Linux x86_64 | ✅ | ✅ | ✅ | ✅ |\n| macOS Apple Silicon (aarch64) | ✅ | ✅ | ✅ | ✅ |\n| macOS Intel (x86_64) | ✅ | ✅ | ❌ heuristic only | ✅ |\n| Linux aarch64 | ✅ | ✅ | ❌ heuristic only | ✅ |\n| Windows x86_64 | ✅ | ✅ | ❌ heuristic only | ✅ |\n\nPlatforms without embedding scoring still have full compression — quality metrics use heuristic scoring instead of embedding-based semantic similarity. See [Platform Support for Embedding Model Loading](#platform-support-for-embedding-model-loading) for the technical reasons and how to get full support on any platform.\n\n### Quick Install (Windows PowerShell)\n\n```powershell\nirm https://raw.githubusercontent.com/nooscraft/tokuin/main/install.ps1 | iex\n```\n\nBinary is placed in `%LOCALAPPDATA%\\Programs\\tokuin`. To customize: download the script first and invoke `.\\install.ps1 -InstallDir \"C:\\Tools\"`.\n\n### From Releases\n\nRelease archives for all platforms are at [GitHub Releases](https://github.com/nooscraft/tokuin/releases). Download the archive for your OS/architecture, verify against `checksums.txt`, and place `tokuin` on your `PATH`.\n\n### From Source\n\n**Minimal build (token counting only):**\n```bash\ngit clone https://github.com/nooscraft/tokuin.git\ncd tokuin\ncargo build --release\n```\n\n**Build with all features:**\n```bash\ncargo build --release --features all\n```\n\n**Build with embedding support (semantic scoring):**\n```bash\ncargo build --release --features all,compression-embeddings\n# Then set up models:\n./target/release/tokuin setup models\n```\n\n---\n\n## 📖 Usage\n\n### Token Counting\n\n```bash\necho \"Hello, world!\" | tokuin --model gpt-4\ntokuin prompt.txt --model gpt-4 --price\n```\n\n### Multi-Model Comparison\n\n```bash\necho \"Hello, world!\" | tokuin --compare gpt-4 gpt-3.5-turbo --price\n```\n\n### JSON Output / Breakdown\n\n```bash\necho \"Hello, world!\" | tokuin --model gpt-4 --format json\necho '[{\"role\":\"system\",\"content\":\"You are helpful\"},{\"role\":\"user\",\"content\":\"Hi!\"}]' | \\\n  tokuin --model gpt-4 --breakdown --price\n```\n\n### Diff \u0026 Watch\n\n```bash\ntokuin prompt.txt --model gpt-4 --diff prompt-v2.txt --price\ntokuin prompt.txt --model gpt-4 --watch\n```\n\n---\n\n## 🗜️ Prompt Compression\n\nTokuin includes a prompt compression system using the **Hieratic format** — a structured, LLM-parseable representation that reduces token usage by 70-90% while preserving semantic meaning.\n\n### Why Hieratic?\n\nNamed after ancient Egypt's compressed cursive script (a practical simplification of hieroglyphics), Hieratic represents the same information in far fewer tokens. Any modern LLM reads it directly without decoding.\n\n### Quick Start\n\n```bash\n# Compress a prompt (medium level by default)\ntokuin compress my-prompt.txt\n\n# With quality metrics\ntokuin compress my-prompt.txt --quality\n\n# Aggressive compression\ntokuin compress my-prompt.txt --level aggressive --quality\n\n# Structured mode (JSON, code blocks, HTML tables)\ntokuin compress api-spec.txt --structured --level medium\n```\n\n### Compression Levels\n\n| Level | Token Reduction | Quality Trade-off |\n|---|---|---|\n| `light` | 30–50% | Minimal — safe for most prompts |\n| `medium` | 50–70% | Balanced — good default |\n| `aggressive` | 70–90% | Maximum — verify with `--quality` |\n\n### Hieratic Format Example\n\n**Original (850 tokens):**\n```\nYou are an expert programmer with 10 years of experience in building\ndistributed systems, microservices, and cloud-native applications...\n[full verbose role description]\n\nExample 1: Authentication Bug Fix\nGiven a service that was experiencing session token bypass attacks...\n[detailed example spanning 200 tokens]\n```\n\n**Hieratic (285 tokens — 66% reduction):**\n```\n@HIERATIC v1.0\n\n@ROLE[inline]\n\"Expert engineer: 10y distributed systems, microservices, cloud-native\"\n\n@EXAMPLES[inline]\n1. Auth bug: session bypass → HMAC signing → 94% bot reduction\n2. DB perf: 2.3s queries → pooling+cache → 0.1s, 10x capacity\n\n@TASK\nAnalyze code and provide recommendations\n\n@FOCUS: performance, security, maintainability\n@STYLE: concise, actionable\n```\n\nLLMs read Hieratic natively — no expansion step needed when sending to an API.\n\n### Quality Metrics\n\nUse `--quality` to measure compression fidelity:\n\n```bash\ntokuin compress prompt.txt --quality\n```\n\n```\nQuality Metrics:\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\nOverall Score: 82.3% (Good)\n  ├─ Semantic Similarity:      85.1%\n  ├─ Critical Instructions:    3/3 preserved (100.0%)\n  ├─ Information Retention:    78.2%\n  └─ Structural Integrity:     100.0%\n\n✅ Quality is acceptable (\u003e= 70%)\n```\n\n**Scoring modes:**\n\n| Mode | Speed | Accuracy | Requires |\n|---|---|---|---|\n| `heuristic` | Fast | Keyword-based | Nothing (default) |\n| `semantic` | Slower | Embedding-based | `compression-embeddings` |\n| `hybrid` | Moderate | Best of both | `compression-embeddings` |\n\n```bash\n# Use semantic scoring (on supported platforms)\ntokuin compress prompt.txt --scoring semantic --quality\n\n# Heuristic scoring (always available)\ntokuin compress prompt.txt --scoring heuristic --quality\n```\n\n### Structured Document Mode\n\nFor prompts with JSON, HTML tables, BNF grammars, or code blocks:\n\n```bash\ntokuin compress technical-spec.txt --structured --level medium\n```\n\nStructured mode:\n- Preserves JSON document structure\n- Keeps HTML tables intact\n- Detects and consolidates repetitive instruction patterns\n- Segments by logical sections (definitions, examples, format specs)\n- Applies structure-aware importance scoring\n\nUse structured mode for: extraction prompts, API documentation, data processing specs.  \nUse default mode for: conversational prompts, natural language instructions, role descriptions.\n\n### Incremental Compression\n\nFor multi-turn conversations or continuously growing documents:\n\n```bash\n# First turn — creates conversation-turn1.txt.state.json\ntokuin compress conversation-turn1.txt --incremental\n\n# Subsequent turns — state file is auto-detected\ntokuin compress conversation-turn2.txt --incremental\ntokuin compress conversation-turn3.txt --incremental\n```\n\nOnly the delta since the last anchor is compressed — no re-processing of history.\n\n**Options:**\n- `--anchor-threshold \u003cN\u003e`: Tokens before creating an anchor summary (default: 1000)\n- `--retention-threshold \u003cN\u003e`: Recent tokens kept uncompressed (default: 500)\n- `--previous \u003cPATH\u003e`: Override default state file path\n\n```\nIncremental Mode:\n  Anchors: 3\n  Anchor tokens: 7300\n  Retained tokens: 500\n```\n\n### LLM-as-a-Judge Evaluation\n\nTest whether compressed prompts produce equivalent outputs to the original — the most reliable quality signal:\n\n```bash\nexport OPENROUTER_API_KEY=\"sk-or-...\"\n\ntokuin compress prompt.txt --quality --llm-judge\n```\n\n**How it works:**\n1. Sends the original prompt to the evaluation model → output A\n2. Sends the compressed prompt → output B\n3. Judge model scores A vs B on equivalence, instruction compliance, information completeness\n\n```\nLLM Judge Evaluation:\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\nOutput Equivalence:       92/100\nInstruction Compliance:   95/100\nInformation Completeness: 88/100\nQuality Preservation:     90/100\nOverall Fidelity:         91/100 (Excellent)\nEvaluation Cost: $0.012\n```\n\n**Options:**\n- `--llm-judge`: Enable LLM judge (requires `load-test` feature + API key)\n- `--evaluation-model \u003cMODEL\u003e`: Model for generating outputs (default: `anthropic/claude-3-opus`)\n- `--judge-model \u003cMODEL\u003e`: Model for scoring (default: `anthropic/claude-3-opus`)\n- `--judge-api-key \u003cKEY\u003e`: API key (or use `OPENROUTER_API_KEY` env var)\n- `--judge-provider \u003cPROVIDER\u003e`: `openrouter` (default), `openai`, `anthropic`\n\n**Cost per evaluation:** ~$0.01–0.05 (2 output calls + 1 judge call). Use cheaper models like `anthropic/claude-3-haiku` for evaluation and keep the high-quality model for judging.\n\n### Expand Compressed Prompts\n\n```bash\ntokuin expand compressed.hieratic --output expanded.txt\n\n# Pipe directly to an LLM tool\ntokuin expand compressed.hieratic | your-llm-tool\n```\n\n### Limitations and Known Drawbacks\n\n**Where compression works best:**\n- ✅ Prompts ≥ 100 tokens (70–90% reduction)\n- ✅ Prompts with repetitive role descriptions, examples, or constraints\n- ✅ Technical docs with clear sections (JSON extraction, API specs)\n- ✅ Multi-turn conversations (use `--incremental`)\n\n**Where it may not help or hurt:**\n- ❌ Very short prompts (\u003c 50 tokens) — Hieratic header overhead exceeds savings\n- ❌ Already dense text (URLs, code, minimal prose) — little redundancy to remove\n- ❌ Single-sentence instructions — format overhead not worth it\n- ❌ Prompts where exact wording is critical (legal, medical) — always verify with `--quality`\n\n**General caveats:**\n- Compression quality varies by prompt type and content; results are not guaranteed\n- `aggressive` level may lose nuance; validate outputs on your specific use case\n- Hieratic is a Tokuin-specific format — LLMs understand it, but it is not a standard\n- The compressed output is not human-readable without familiarity with the format\n\n---\n\n### Platform Support for Embedding Model Loading\n\nEmbedding-based semantic scoring (`--scoring semantic` / `--scoring hybrid`) requires loading an ONNX Runtime model at runtime. The ONNX Runtime ships prebuilt native libraries, and not all of them are compatible with every CI build toolchain. As a result, **release binaries only include embedding support on a subset of platforms**.\n\n#### What each platform gets\n\n| Platform | Binary includes embeddings | Scoring available | Model loading |\n|---|---|---|---|\n| **Linux x86_64** | ✅ Yes | semantic, hybrid, heuristic | `tokuin setup models` |\n| **macOS Apple Silicon** (aarch64) | ✅ Yes | semantic, hybrid, heuristic | `tokuin setup models` |\n| **macOS Intel** (x86_64) | ❌ No | heuristic only | N/A — no ONNX prebuilt |\n| **Linux aarch64** | ❌ No | heuristic only | N/A — ABI mismatch in cross build |\n| **Windows x86_64** | ❌ No | heuristic only | N/A — CRT conflict |\n\n#### Why each platform is limited\n\n**macOS Intel (x86_64):** The ONNX Runtime does not publish prebuilt binaries for `x86_64-apple-darwin` with the feature set used in CI builds. Intel Macs are increasingly rare and ort has shifted focus to Apple Silicon.\n\n**Linux aarch64:** The CI builds `aarch64-unknown-linux-gnu` via cross-compilation using a Docker container. The container's C++ standard library is too old to link against the prebuilt ONNX Runtime binary (missing `__cxa_call_terminate` and `std::filesystem` symbols). Native aarch64 Linux hosts (e.g. AWS Graviton, Raspberry Pi 4) can build with full embedding support.\n\n**Windows x86_64:** The ONNX Runtime DLL is linked against the dynamic C runtime (`msvcprt.lib`), but Rust on MSVC links against the static C runtime (`libcpmt.lib`). This causes `LNK2005` multiply-defined symbol errors at link time. There is no simple workaround without switching the entire build to dynamic CRT, which has other tradeoffs.\n\n#### What happens at runtime on unsupported platforms\n\nIf you run `--scoring semantic` or `--scoring hybrid` on a binary built without `compression-embeddings`, Tokuin automatically falls back to heuristic scoring and prints a notice:\n\n```\nℹ️  Embedding scoring requested but not available in this build.\n   Falling back to heuristic scoring.\n   Build with --features compression-embeddings for semantic/hybrid scoring.\n```\n\nQuality metrics still work — they just use keyword matching and position-based weighting instead of embedding cosine similarity.\n\n#### Getting full embedding support on any platform\n\nBuild from source on a **native** host (not cross-compiled):\n\n```bash\ngit clone https://github.com/nooscraft/tokuin.git\ncd tokuin\n\n# Build with embedding support\ncargo build --release --features all,compression-embeddings\n\n# Download embedding models\n./target/release/tokuin setup models\n\n# Verify\n./target/release/tokuin compress prompt.txt --scoring semantic --quality\n```\n\nThe `setup models` command downloads `all-MiniLM-L6-v2` from HuggingFace and caches it at `~/.cache/tokuin/models/`. It requires internet access on first run only.\n\n#### Model setup options\n\n```bash\n# Download tokenizer (required for semantic scoring)\ntokuin setup models\n\n# Also download ONNX model file (optional, higher quality inference)\ntokuin setup models --onnx\n\n# Force re-download\ntokuin setup models --force\n```\n\n**Troubleshooting model download:**\n- If `setup models` fails, check your internet connection and try again\n- The models are fetched directly from HuggingFace over HTTPS using rustls (no system TLS dependency)\n- If the download consistently fails, you can manually place `tokenizer.json` at `~/.cache/tokuin/models/all-MiniLM-L6-v2/tokenizer.json`\n- Compression itself (`--scoring heuristic`) never requires model download\n\n---\n\n## 🧪 Load Testing\n\nRun concurrent load tests against LLM APIs:\n\n```bash\n# Basic load test\nexport OPENAI_API_KEY=\"sk-openai-...\"\necho \"What is 2+2?\" | tokuin load-test \\\n  --model gpt-4 \\\n  --runs 100 \\\n  --concurrency 10\n\n# With OpenRouter (400+ models)\nexport OPENROUTER_API_KEY=\"sk-or-...\"\necho \"Hello!\" | tokuin load-test \\\n  --model openai/gpt-4 \\\n  --runs 50 \\\n  --concurrency 5 \\\n  --provider openrouter\n\n# Dry run — cost estimate only, no API calls\necho \"Test prompt\" | tokuin load-test \\\n  --model gpt-4 \\\n  --runs 1000 \\\n  --concurrency 50 \\\n  --dry-run \\\n  --estimate-cost\n\n# With think time and retries\ntokuin load-test \\\n  --model gpt-4 \\\n  --runs 200 \\\n  --concurrency 20 \\\n  --think-time \"250-750ms\" \\\n  --retry 3 \\\n  --prompt-file prompts.txt\n```\n\n**Output:**\n```\n=== Load Test Results ===\nTotal Requests: 100\nSuccessful: 98 (98.0%)\nFailed: 2 (2.0%)\n\nLatency (ms):\n  Average: 1234.56\n  p50:     1200\n  p95:     1850\n\nCost Estimation:\n  Input tokens:   5000\n  Output tokens: 12000\n  Total cost:    $0.870000\n```\n\n**Environment Variables:**\n- `OPENAI_API_KEY`\n- `ANTHROPIC_API_KEY`\n- `OPENROUTER_API_KEY`\n- `API_KEY` — generic (provider auto-detected unless `--provider` is set)\n\n---\n\n## 📋 Command Reference\n\n### Token Estimate (default)\n\n```\ntokuin [OPTIONS] [FILE|TEXT]\ntokuin estimate [OPTIONS] [FILE|TEXT]\n\nOptions:\n  -m, --model \u003cMODEL\u003e          Model for tokenization (e.g. gpt-4)\n  -c, --compare \u003cMODELS\u003e...    Compare across multiple models\n  -b, --breakdown              Show breakdown by role\n  -f, --format \u003cFORMAT\u003e        text | json | markdown  [default: text]\n  -p, --price                  Show cost estimate\n      --pricing-file \u003cFILE\u003e    Custom pricing TOML (or TOKUIN_PRICING_FILE)\n      --minify                 Strip markdown (requires markdown feature)\n      --diff \u003cFILE\u003e            Compare with another file\n  -w, --watch                  Re-run on file change (requires watch feature)\n```\n\n### Compress\n\n```\ntokuin compress \u003cFILE\u003e [OPTIONS]\n\nOptions:\n  -o, --output \u003cFILE\u003e                   Output file [default: \u003cinput\u003e.hieratic]\n  -l, --level \u003cLEVEL\u003e                   light | medium | aggressive  [default: medium]\n      --structured                       Enable structured document mode\n      --inline                           Force inline mode\n      --incremental                      Incremental compression (delta only)\n      --previous \u003cFILE\u003e                  Custom state file path\n      --anchor-threshold \u003cN\u003e             Anchor interval in tokens [default: 1000]\n      --retention-threshold \u003cN\u003e          Recent tokens kept uncompressed [default: 500]\n  -m, --model \u003cMODEL\u003e                   Tokenizer model [default: gpt-4]\n  -f, --format \u003cFORMAT\u003e                 hieratic | expanded | json  [default: hieratic]\n      --scoring \u003cMODE\u003e                   heuristic | semantic | hybrid  [default: heuristic]\n      --quality                          Show quality metrics\n      --llm-judge                        LLM judge evaluation (requires load-test + API key)\n      --evaluation-model \u003cMODEL\u003e         Model for LLM judge output generation\n      --judge-model \u003cMODEL\u003e              Model for judging [default: anthropic/claude-3-opus]\n      --judge-api-key \u003cKEY\u003e              API key for judge\n      --judge-provider \u003cPROVIDER\u003e        openrouter | openai | anthropic  [default: openrouter]\n      --pricing-file \u003cFILE\u003e              Custom pricing TOML\n```\n\n### Load Test\n\n```\ntokuin load-test [OPTIONS] --model \u003cMODEL\u003e --runs \u003cRUNS\u003e\n\nOptions:\n  -m, --model \u003cMODEL\u003e              Model name\n      --endpoint \u003cURL\u003e             API endpoint (optional)\n      --api-key \u003cKEY\u003e              Generic API key\n      --openai-api-key \u003cKEY\u003e       OpenAI API key\n      --anthropic-api-key \u003cKEY\u003e    Anthropic API key\n      --openrouter-api-key \u003cKEY\u003e   OpenRouter API key\n  -c, --concurrency \u003cN\u003e            Concurrent requests [default: 10]\n  -r, --runs \u003cN\u003e                   Total requests\n  -p, --prompt-file \u003cFILE\u003e         Prompt file (or stdin)\n      --think-time \u003cTIME\u003e          Think time e.g. \"250-750ms\"\n      --retry \u003cN\u003e                  Retries on failure [default: 3]\n  -f, --output-format \u003cFORMAT\u003e     text | json | csv | prometheus | markdown\n      --dry-run                    Estimate cost, no API calls\n      --estimate-cost              Show cost in results\n      --pricing-file \u003cFILE\u003e        Custom pricing TOML\n```\n\n---\n\n## 🔧 Features \u0026 Build Flags\n\n| Feature | Flag | What it adds |\n|---|---|---|\n| Token counting | *(default)* | Core estimation, cost, comparison |\n| Compression | `compression` | Hieratic compress/expand, heuristic quality metrics |\n| Embedding scoring | `compression-embeddings` | Semantic + hybrid scoring via ONNX |\n| Load testing | `load-test` | Concurrent API testing, LLM judge |\n| Markdown | `markdown` | Markdown output format, `--minify` |\n| Watch mode | `watch` | `--watch` file change detection |\n| Gemini | `gemini` | Google Gemini tokenization |\n| All (release) | `all` | Everything above except `compression-embeddings`* |\n\n\\* `compression-embeddings` is excluded from `all` because ONNX Runtime prebuilt binaries are not available for all cross-compiled release targets. Add it explicitly with `--features all,compression-embeddings` on a native host.\n\n```bash\n# Standard release build\ncargo build --release --features all\n\n# With embedding support (native host only)\ncargo build --release --features all,compression-embeddings\n\n# Minimal compression (no embeddings)\ncargo build --release --features compression\n\n# Load testing only\ncargo build --release --features load-test\n```\n\n### Post-Build: Set Up Embedding Models\n\nOnly needed when building with `compression-embeddings`:\n\n```bash\n./target/release/tokuin setup models        # download tokenizer\n./target/release/tokuin setup models --onnx # also download ONNX model\n./target/release/tokuin setup models --force # re-download\n```\n\nModels are stored in `~/.cache/tokuin/models/` and reused across sessions.\n\n---\n\n## 🎯 Supported Models\n\n### OpenAI\n- `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, `gpt-3.5-turbo-16k`\n\n### Google Gemini (requires `gemini` feature)\n- `gemini-pro`, `gemini-2.5-pro`, `gemini-2.5-flash`\n\n### OpenRouter (requires `load-test` feature)\nUnified access to 400+ models via `provider/model` format:\n- `anthropic/claude-3-opus`, `openai/gpt-4`, `meta-llama/llama-2-70b-chat`, `google/gemini-pro`, and hundreds more\n\n### Anthropic (requires `load-test` feature)\n- `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`\n\n### Generic Endpoint\nUse `--provider generic` with `--endpoint` to target any OpenAI-compatible API.\n\n### Planned\nMistral, Cohere, AI21 Labs, Meta LLaMA. See [PROVIDERS_PLAN.md](PROVIDERS_PLAN.md).\n\n---\n\n## 🏗️ Architecture\n\n```\nsrc/\n├── cli.rs                  # Argument parsing and command dispatch\n├── error.rs                # Central error type (thiserror)\n├── tokenizers/             # OpenAI, Gemini tokenizer implementations\n├── models/                 # Model registry and pricing tables\n├── parsers/                # Input parsing (plain text, JSON chat)\n├── output/                 # Formatters (text, JSON, Markdown)\n├── http/                   # HTTP client layer (load-test feature)\n│   └── providers/          # OpenAI, OpenRouter, Anthropic, stubs\n├── simulator/              # Load test engine, concurrency, metrics\n├── compression/            # Hieratic compression pipeline\n│   ├── compressor.rs       # Orchestration and compression levels\n│   ├── parser.rs           # Hieratic format parser\n│   ├── hieratic_encoder.rs # Encode to Hieratic\n│   ├── hieratic_decoder.rs # Expand Hieratic back to text\n│   ├── quality.rs          # Quality metrics\n│   ├── embeddings.rs       # ONNX embedding provider\n│   ├── llm_judge.rs        # LLM-as-a-judge evaluation\n│   ├── pattern_extractor.rs\n│   └── context_library.rs\n└── utils/                  # Shared helpers\n```\n\n---\n\n## 🧪 Testing\n\n```bash\ncargo test\ncargo test --features all\ncargo test -- --nocapture  # with output\ncargo fmt\ncargo clippy -- -D warnings\n```\n\n### MSRV\n\nTargets Rust 1.70+. CI runs on stable and beta. Due to dependency constraints, a specific MSRV is not enforced. See [CONTRIBUTING.md](CONTRIBUTING.md).\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n\u003e Vibe coding welcome — if you collaborate with an AI pair programmer, skim `AGENTS.md` for a quick project brief and mention the session in your PR.\n\n[![Contributors](https://contrib.rocks/image?repo=nooscraft/tokuin)](https://github.com/nooscraft/tokuin/graphs/contributors)\n\n---\n\n## 📄 License\n\nLicensed under either of:\n\n- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))\n- MIT License ([LICENSE-MIT](LICENSE-MIT))\n\nat your option.\n\n## 🙏 Acknowledgments\n\n- Built with [tiktoken-rs](https://github.com/zurawiki/tiktoken-rs) for OpenAI tokenization\n- Inspired by the need for accurate token estimation and cost control in LLM development\n- Hieratic format named after the ancient Egyptian script that compressed hieroglyphics into practical everyday writing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnooscraft%2Ftokuin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnooscraft%2Ftokuin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnooscraft%2Ftokuin/lists"}