{"id":47676216,"url":"https://github.com/tokenpak/tokenpak","last_synced_at":"2026-04-24T00:01:17.878Z","repository":{"id":353420848,"uuid":"1185689628","full_name":"tokenpak/tokenpak","owner":"tokenpak","description":"Drop-in HTTP proxy that compresses LLM context, optimizes cache hits, routes smart, and tracks every dollar. Zero SDK changes required.","archived":false,"fork":false,"pushed_at":"2026-04-23T21:02:49.000Z","size":7947,"stargazers_count":0,"open_issues_count":9,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T21:28:56.777Z","etag":null,"topics":["ai","anthropic","compression","context-window","cost-tracking","developer-tools","gemini","llm","openai","proxy","python","token-optimization"],"latest_commit_sha":null,"homepage":"http://tokenpak.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tokenpak.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"tokenpak"}},"created_at":"2026-03-18T21:05:10.000Z","updated_at":"2026-04-23T20:58:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tokenpak/tokenpak","commit_stats":null,"previous_names":["tokenpak/tokenpak"],"tags_count":28,"template":false,"template_full_name":null,"purl":"pkg:github/tokenpak/tokenpak","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenpak%2Ftokenpak","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenpak%2Ftokenpak/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenpak%2Ftokenpak/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenpak%2Ftokenpak/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tokenpak","download_url":"https://codeload.github.com/tokenpak/tokenpak/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenpak%2Ftokenpak/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32203362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T20:19:26.138Z","status":"ssl_error","status_checked_at":"2026-04-23T20:19:23.520Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","anthropic","compression","context-window","cost-tracking","developer-tools","gemini","llm","openai","proxy","python","token-optimization"],"created_at":"2026-04-02T13:31:51.876Z","updated_at":"2026-04-24T00:01:17.872Z","avatar_url":"https://github.com/tokenpak.png","language":"Python","readme":"# TokenPak — Up to 90%+ savings on direct API/CLI and other favorable uncached workloads. One command to configure your LLM proxy.\n\n[![PyPI version](https://img.shields.io/pypi/v/tokenpak.svg)](https://pypi.org/project/tokenpak/)\n[![Python 3.10+](https://img.shields.io/pypi/pyversions/tokenpak.svg)](https://pypi.org/project/tokenpak/)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n\u003c!-- CI badge: pending initial workflow setup — add after .github/workflows/ lands --\u003e\n\nTokenPak is a local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results. No code changes, no cloud, no credentials stored.\n\n**Observed savings vary by integration path and provider-side cache behavior.** On direct API / CLI / uncached workloads the deterministic pipeline routinely lands in the 90%+ band. On provider-cached flows (Claude Code and similar) observed incremental savings can be much lower, because the provider's own cache already absorbs most of the token pool and TokenPak only optimizes the user-controlled portion. See [How we report savings](#how-we-report-savings) for the full framing.\n\n\u003e **Status: early preview.** Core compression engine and proxy are in place. `tokenpak setup` is the interactive wizard that detects your API keys, picks a compression profile, and starts the proxy. Per-client auto-integration (the forthcoming `tokenpak integrate` command) is not yet shipped — after `tokenpak setup` runs, point your client at `http://127.0.0.1:8766` via the one-line `export` below. See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart).\n\n---\n\n## Quick start\n\n```bash\npip install tokenpak\ntokenpak setup                      # interactive wizard — detects keys, picks a profile, starts the proxy\n```\n\nThen point your LLM client at the proxy with one env var. For the Anthropic SDK:\n\n```bash\nexport ANTHROPIC_BASE_URL=http://127.0.0.1:8766\n```\n\nOr for OpenAI-compatible clients:\n\n```bash\nexport OPENAI_BASE_URL=http://127.0.0.1:8766\n```\n\nThen use your client normally. 
Then use your client normally. TokenPak compresses requests on the way out and logs savings to a local SQLite ledger.

If you prefer manual configuration (no wizard), `tokenpak start` brings the proxy up with defaults and you set `ANTHROPIC_BASE_URL` / `OPENAI_BASE_URL` yourself.

Reproduce the savings floor locally: `make benchmark-headline` asserts ≥30% reduction on a pinned agent-style fixture. That's the CI-enforced floor, not the ceiling.

See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) for per-client setup (Claude Code, Cursor, Aider, and others).

---

## What savings look like

After a few proxied requests, `tokenpak savings` reports the cumulative reduction:

```
┌──────────────────────────────────────────────────────┐
│  TokenPak — Savings                                  │
├──────────────────────────────────────────────────────┤
│  Sample scenario       DevOps agent (config + logs)  │
│  Savings drivers                      dedup + alias  │
├──────────────────────────────────────────────────────┤
│  Original                                747 tokens  │
│  Compressed                              502 tokens  │
│  Saved                          245 tokens  (32.8%)  │
│  Cost saved (est.)                $0.00073 per call  │
├──────────────────────────────────────────────────────┤
│  Stages: dedup, alias, segmentize, directives        │
└──────────────────────────────────────────────────────┘
```

Actual numbers depend on your workload. Agent-style prompts with lots of repeated context see the biggest gains.
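The numbers in that panel come from the local SQLite ledger (it lives at `~/.tokenpak/monitor.db`; see the feature list below), so you can query it with ordinary SQL. The table layout isn't documented in this README, so rather than guess at column names, a safe first step is to dump the schema; a minimal sketch:

```python
# Sketch: inspecting TokenPak's local savings ledger directly. The database
# path comes from the README; the table and column names are not documented,
# so discover them from sqlite_master before writing real queries.
import sqlite3
from pathlib import Path

# Assumes the proxy has handled at least one request, so the file exists.
db = sqlite3.connect(Path.home() / ".tokenpak" / "monitor.db")
for name, sql in db.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
):
    print(name)  # real table names to aggregate savings per model/session
    print(sql)
db.close()
```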
---

## Works with

Any LLM client that respects a custom base URL:

**Claude Code** · **Cursor** · **Cline** · **Continue.dev** · **Aider** · **OpenAI SDK** · **Anthropic SDK** · **LiteLLM** · **Codex**

Per-client configuration steps are in QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart). Auto-wiring via a single `tokenpak integrate <client>` command is tracked for a future release.

---

## Install

```bash
pip install tokenpak
```

TokenPak's runtime dependencies include `anthropic`, `openai`, `fastapi`, `flask`, `litellm`, `llmlingua`, `pandas`, `pydantic`, `requests`, `rich`, `scipy`, `sentence-transformers`, `tree-sitter-languages`, `watchdog`, and a few others — all installed automatically. Note that `sentence-transformers` and `scipy` are large (several hundred MB of dependencies), so expect the first `pip install` to take a few minutes.

Requires Python 3.10+.

See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) for virtual-env setup and first-run details.

---

## What's included

- **Context compression** — deterministic pipeline (dedup → alias → segmentize → directives). Up to 90%+ reduction on direct-API / uncached workloads; lower on provider-cached flows. CI-enforced ≥30% floor on the pinned agent-style fixture.
- **Local proxy** — runs at `127.0.0.1:8766`; zero cloud component.
- **Model routing** — configurable rules with fallback chains. Route-class policy presets ship under `tokenpak/services/policy_service/presets/`, covering Claude Code (TUI/CLI/TMUX/SDK/IDE/CRON), Anthropic SDK, OpenAI SDK, and generic.
- **Cost & savings tracking** — per model, per session, per agent; local SQLite (`~/.tokenpak/monitor.db`).
- **Dashboard** — local web UI for visualizing savings at `http://127.0.0.1:8766/dashboard` (also reachable via `tokenpak dashboard`).
- **Vault indexing + semantic search** — index a directory; search without an LLM call.
- **A/B testing and request replay** — compare compression configs; re-run past requests.
- **Custom compression recipes** — author your own via `tokenpak recipe create/validate/test/benchmark`.

## How we report savings

TokenPak's savings aren't a single number, because they shouldn't be: the real savings depend on where in your stack the compression runs.

- **Direct API calls, CLI tools, SDK integrations, and any uncached workload** — the compression pipeline operates on the full token pool. Observed savings routinely reach **90%+** on realistic agent-style prompts. The pinned CI benchmark measures this path; `make benchmark-headline` reproduces it.
- **Provider-cached flows (Claude Code and similar integrations)** — your client uses the provider's server-side prompt cache for most of the prompt (system prompt, tool definitions, historical turns). TokenPak only optimizes the **user-controlled portion** of the token pool. Observed incremental savings on these paths can be much lower — sometimes a few percent of total spend — because the provider cache already did the heavy lifting. This isn't TokenPak failing; it's an honest division of labor.

If you're evaluating TokenPak, start with a direct-API workload to see the compression pipeline's actual effectiveness, then layer in your cached flows to see the marginal contribution on top of the provider cache.

See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) and the API reference at https://github.com/tokenpak/docs (rendered at tokenpak.ai/api) to get started.

---

## Pro tier

**Pro** adds team-scale features on top of the OSS core. The OSS proxy, compression engine, client integrations, local dashboard, telemetry store, and route-class policy presets all stay in place — Pro layers on the features a team running TokenPak at scale needs:

- **Team-scale cost attribution** — shared dashboard with multi-seat cost attribution: which workloads, which engineers, which tools drive spend, without wiring up a BI pipeline.
- **Budget enforcement** — monthly budget caps map to hard `429 budget_exceeded` responses at request time (see the sketch after this list). Not a passive dashboard: the proxy refuses to forward the request instead of silently burning through the rest of your budget.
- **Advanced routing policies** — cost-aware fallbacks, SLA-aware failover across providers, tiered routing by workload kind. OSS gives you the rule surface; Pro gives you the policy tooling that runs on top.
- **Enterprise credential management** — credential-rotation hooks, audit-log export, compliance rulesets that plug into the compression pipeline.
- **SSO, priority support, SLA** — dashboard SSO, priority support with an SLA, team onboarding.
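From the client side, a hard budget stop surfaces as an ordinary HTTP 429. A hypothetical sketch of handling it through the OpenAI SDK: the `429`/`budget_exceeded` pair comes from the bullet above; the exact response body and error mapping are assumptions.

```python
# Hypothetical sketch: what Pro's hard budget cap looks like to a client
# talking to the proxy through the OpenAI SDK. A 429 surfaces as
# openai.RateLimitError; "budget_exceeded" is the documented code above,
# while the precise response shape is an assumption.
import openai

client = openai.OpenAI(base_url="http://127.0.0.1:8766")  # key read from OPENAI_API_KEY

try:
    client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.RateLimitError as err:
    if "budget_exceeded" in str(err):
        # The proxy refused to forward the request: the monthly cap is spent.
        # Retrying won't help until the window resets or an admin raises the cap.
        raise RuntimeError("TokenPak budget cap hit; request not forwarded") from err
    raise  # a genuine provider rate limit; handle separately
```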
Pro ships as the `tokenpak-paid` package via a private license-gated index at `pypi.tokenpak.ai`. See [tokenpak.ai/paid](https://tokenpak.ai/paid) to request access.

Installing the Pro package (after you have a license key):

```bash
pip install --index-url https://pypi.tokenpak.ai --extra-index-url https://pypi.org/simple tokenpak-paid
tokenpak activate <your-license-key>
```

Running `pip install tokenpak-paid-stub` from public PyPI fetches a discovery stub that prints these install instructions — so `pip` works as a learning path, not a dead end. The real paid code stays license-gated.

---

## Privacy + compliance

TokenPak is local-first by design. Every prompt, response, and API key stays on your machine. The only data that ever leaves your machine is the LLM request you were going to send anyway — forwarded to your configured provider using your own credentials.

- [**Privacy**](https://tokenpak.ai/compliance/privacy) — what's stored locally, what leaves, and every optional debug-logging escape hatch disclosed in full.
- [**Data Processing Agreement (template)**](https://tokenpak.ai/compliance/dpa) — IAPP-based template; marked pending legal review.
- [**Sub-processors**](https://tokenpak.ai/compliance/sub-processors) — Stripe / Cloudflare / Fly.io / GitHub / PyPI for the Pro-tier infrastructure; the OSS proxy has none.

To verify the runtime posture, run `tokenpak doctor --privacy` (plain-English summary) or `tokenpak doctor --conformance` (executes the TIP-1.0 self-conformance suite).

---

## Current limitations

An honest list of what isn't ready yet:

- **No `tokenpak integrate <client>` auto-wire command** — configure clients by env var as shown above. Auto-wire is planned.
- **No published CI/CD** — releases are manual; automation is tracked in the release-workflow standards.
- **`tokenpak demo` is a compression-recipes demo** (it shows recipes applied to a sample input), not the decorated savings panel above. The panel shows what `tokenpak savings` output can look like after real usage.

We'd rather ship an honest preview than an advertised product that doesn't match install-time reality.

---

## Non-localhost access

TokenPak is localhost-only by default. If you want to expose the proxy to other machines on your LAN, set an auth token:

```bash
export TOKENPAK_PROXY_AUTH_TOKEN=$(openssl rand -hex 32)
tokenpak start                # or tokenpak setup for first-time config
```

Clients then include the token on non-localhost requests:

```
X-TokenPak-Auth: <your-token>
```

Localhost (`127.0.0.1`, `::1`) traffic bypasses auth — your local tools keep working without changes. If `TOKENPAK_PROXY_AUTH_TOKEN` is unset, non-localhost requests return `403 forbidden`; if it is set, requests with a missing or wrong header return `401 unauthorized`. The token is stripped from the request before any upstream forward, so provider APIs (Anthropic, OpenAI, etc.) never see it.
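From another machine, attaching the header to a proxied request looks roughly like this. The header name and status codes come from the section above; the LAN hostname, endpoint path, and payload are illustrative assumptions:

```python
# Sketch: calling the proxy from another LAN machine with the auth header.
# "X-TokenPak-Auth" and the 401/403 behavior are documented above; the host
# name ("tokenpak-host.lan"), endpoint path, and payload are assumptions.
import os

import requests

resp = requests.post(
    "http://tokenpak-host.lan:8766/v1/chat/completions",  # hypothetical LAN address
    headers={
        # Same token the proxy was started with; stripped before upstream forward.
        "X-TokenPak-Auth": os.environ["TOKENPAK_PROXY_AUTH_TOKEN"],
        # Your own provider key, forwarded upstream as usual.
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    },
    json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hello"}]},
    timeout=60,
)
resp.raise_for_status()  # 401: missing/wrong header; 403: proxy has no token configured
```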
---

## Support

- **Docs:** QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) · API reference at https://github.com/tokenpak/docs (rendered at tokenpak.ai/api) · FAQ at https://github.com/tokenpak/docs (rendered at tokenpak.ai/faq)
- **Issues:** [github.com/tokenpak/tokenpak/issues](https://github.com/tokenpak/tokenpak/issues)
- **Discussions:** [github.com/tokenpak/tokenpak/discussions](https://github.com/tokenpak/tokenpak/discussions)
- **Email:** hello@tokenpak.ai

---

## License

Apache 2.0. See [LICENSE](LICENSE).