https://github.com/thetokencompany/the-token-company-python
Python SDK for The Token Company. Compress LLM prompts to reduce costs and latency
- Host: GitHub
- URL: https://github.com/thetokencompany/the-token-company-python
- Owner: TheTokenCompany
- License: mit
- Created: 2026-06-03T00:32:04.000Z (20 days ago)
- Default Branch: main
- Last Pushed: 2026-06-05T05:41:56.000Z (17 days ago)
- Last Synced: 2026-06-08T18:28:14.776Z (14 days ago)
- Topics: ai, compression, llm, middleware, optimization, pip, python
- Language: Python
- Homepage: https://thetokencompany.com/docs
- Size: 35.2 KB
- Stars: 5
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# The Token Company Python SDK
Compress LLM prompts to reduce costs and latency. 100K tokens compressed in ~85ms.
[Docs](https://thetokencompany.com/docs) · [Website](https://thetokencompany.com) · [Dashboard](https://app.thetokencompany.com) · [Node.js SDK](https://github.com/TheTokenCompany/the-token-company-node)
## Install
```bash
pip install the-token-company
```
## Quick start
```python
from thetokencompany import TheTokenCompany
client = TheTokenCompany(api_key="ttc-...")
result = client.compress("Your long prompt text here...", model="bear-2")
print(result.output) # compressed text
print(result.tokens_saved) # tokens removed
print(result.compression_ratio) # e.g. 1.8
```
## SDK wrappers
Drop-in wrappers that auto-compress all non-assistant messages before sending to your LLM. Assistant messages pass through unchanged so the provider's KV cache stays warm.
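To make the pass-through rule concrete, here is a minimal sketch of the filtering described above. It is not the SDK's implementation: `compress_messages` and the stand-in `fake_compress` are hypothetical names, and the sketch assumes every message has a plain string `content`.

```python
from typing import Callable

Message = dict[str, str]


def compress_messages(
    messages: list[Message], compress: Callable[[str], str]
) -> list[Message]:
    """Return a copy of messages with non-assistant content compressed."""
    result = []
    for message in messages:
        if message["role"] == "assistant":
            # Assistant turns are copied untouched so a cached prefix still matches.
            result.append(dict(message))
        else:
            result.append({**message, "content": compress(message["content"])})
    return result


def fake_compress(text: str) -> str:
    """Hypothetical stand-in for a real compression call: collapses whitespace."""
    return " ".join(text.split())


history = [
    {"role": "system", "content": "You are   a helpful   assistant."},
    {"role": "assistant", "content": "Sure,   here is the   summary."},
    {"role": "user", "content": "Now   shorten it."},
]
compressed = compress_messages(history, fake_compress)
```

In real use the `compress` argument would be a call into the SDK client rather than `fake_compress`; the original `history` list is left unmodified.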
### OpenAI / OpenRouter
```python
from openai import OpenAI
from thetokencompany.openai import with_compression
client = with_compression(OpenAI(), compression_api_key="ttc-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant..."},
        {"role": "user", "content": "Summarize these results..."},
    ],
)
```
Works with `AsyncOpenAI` too — the wrapper detects async automatically.
### Anthropic
```python
from anthropic import Anthropic
from thetokencompany.anthropic import with_compression
client = with_compression(Anthropic(), compression_api_key="ttc-...")
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant...",
    messages=[{"role": "user", "content": "Summarize these results..."}],
)
```
Both `messages` and the `system` parameter are compressed.
## Async
```python
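import asyncio
from thetokencompany import AsyncTheTokenCompany

async def main():
    # `async with` must run inside a coroutine; the client closes on exit
    async with AsyncTheTokenCompany(api_key="ttc-...") as client:
        result = await client.compress("Your long prompt text...")

asyncio.run(main())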
```
## Models
| Model | Description |
|------------|------------------------|
| `bear-2` | Latest, recommended |
| `bear-1.2` | Previous generation |
## Aggressiveness
Control compression intensity with `aggressiveness` (0.0 – 1.0, default 0.5):
```python
result = client.compress(text, model="bear-2", aggressiveness=0.8)
```
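If the aggressiveness value comes from configuration or user input, a small range check before the request can catch mistakes early. This is only a sketch: `check_aggressiveness` is a hypothetical helper, and whether the SDK validates the range itself is not stated here.

```python
def check_aggressiveness(value: float) -> float:
    """Return value unchanged if it lies in [0.0, 1.0], otherwise raise ValueError."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"aggressiveness must be between 0.0 and 1.0, got {value}")
    return value


# Intended use (requires a real client and API key):
# result = client.compress(text, model="bear-2",
#                          aggressiveness=check_aggressiveness(level))
```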
## App ID
Tag compression requests with an application identifier for usage tracking:
```python
# Set on the client — applies to all requests
client = TheTokenCompany(api_key="ttc-...", app_id="my-chatbot")
# Or per-request (overrides the client-level value)
result = client.compress(text, model="bear-2", app_id="my-chatbot")
```
Also supported in wrappers:
```python
client = with_compression(OpenAI(), compression_api_key="ttc-...", app_id="my-chatbot")
```
## Gzip
Enable gzip compression of request payloads for better performance on large inputs (up to 2.2x faster on 1M+ tokens):
```python
client = TheTokenCompany(api_key="ttc-...", gzip=True)
```
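For intuition about why gzipping the request body helps on large inputs, the standard-library sketch below compares raw and gzipped sizes of a JSON payload. The payload shape is hypothetical (the real wire format is not documented here), and the repetitive sample text compresses far better than a typical prompt would.

```python
import gzip
import json

# Hypothetical request body, used only to illustrate the size difference.
payload = json.dumps({"model": "bear-2", "input": "The quick brown fox. " * 5000})
raw = payload.encode("utf-8")
packed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")

# Gzip is lossless: decompressing returns the original payload.
restored = gzip.decompress(packed).decode("utf-8")
```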
## Protect text from compression
Use `protect()` to mark content as protected. Protected text passes through compression unchanged:
```python
from thetokencompany import protect
prompt = f"{protect('system:')} You are a helpful assistant.\n{protect('user:')} Hello!"
result = client.compress(prompt, model="bear-2")
```
## Response
`CompressResponse` fields:
| Field | Type | Description |
|--------------------|---------|------------------------------------|
| `output` | `str` | Compressed text |
| `output_tokens` | `int` | Token count after compression |
| `input_tokens` | `int` | Token count before compression |
| `tokens_saved` | `int` | Tokens removed |
| `compression_ratio` | `float` | Ratio (e.g. 1.8x) |
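The sketch below shows how these fields plausibly relate to one another. It assumes `tokens_saved` is `input_tokens - output_tokens` and `compression_ratio` is `input_tokens / output_tokens`; neither formula is confirmed above, and `FakeCompressResponse` is a hypothetical stand-in, not the SDK's class.

```python
from dataclasses import dataclass


@dataclass
class FakeCompressResponse:
    """Hypothetical stand-in that mirrors the documented field names."""

    output: str
    output_tokens: int
    input_tokens: int

    @property
    def tokens_saved(self) -> int:
        # Assumed: tokens removed = tokens before - tokens after.
        return self.input_tokens - self.output_tokens

    @property
    def compression_ratio(self) -> float:
        # Assumed: ratio = tokens before / tokens after.
        return self.input_tokens / self.output_tokens


example = FakeCompressResponse(output="...", output_tokens=1000, input_tokens=1800)
print(example.tokens_saved, example.compression_ratio)
```

Under these assumptions, an 1800-token prompt compressed to 1000 tokens saves 800 tokens at a ratio of 1.8.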
## License
MIT