{"id":51007429,"url":"https://github.com/vjalbert/pith-skill","last_synced_at":"2026-06-20T22:01:15.383Z","repository":{"id":363846884,"uuid":"1265106204","full_name":"VjAlbert/pith-skill","owner":"VjAlbert","description":"PITH - Inter-Agent Payload Compressor. Compresses agent-to-agent handoff payloads using Zipf scoring + Benford Law validation.","archived":false,"fork":false,"pushed_at":"2026-06-10T15:07:56.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T17:08:09.869Z","etag":null,"topics":["agent-compression","benford","claude-code","claude-skill","llm","multi-agent","nlp","python","token-optimization","zipf"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VjAlbert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-10T13:21:59.000Z","updated_at":"2026-06-10T15:15:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/VjAlbert/pith-skill","commit_stats":null,"previous_names":["vjalbert/pith-skill"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/VjAlbert/pith-skill","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VjAlbert%2Fpith-skill","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VjAlbert%2Fpith-skill/tags","releases_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/VjAlbert%2Fpith-skill/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VjAlbert%2Fpith-skill/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VjAlbert","download_url":"https://codeload.github.com/VjAlbert/pith-skill/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VjAlbert%2Fpith-skill/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34586666,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-20T02:00:06.407Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-compression","benford","claude-code","claude-skill","llm","multi-agent","nlp","python","token-optimization","zipf"],"created_at":"2026-06-20T22:01:10.049Z","updated_at":"2026-06-20T22:01:15.373Z","avatar_url":"https://github.com/VjAlbert.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PITH — Inter-Agent Payload Compressor\n\n\u003e *\"Why use many token when few do trick?\"*\n\u003e — but make it mathematically principled.\n\n---\n\n## Origin\n\nPITH was born from a conversation about compression, mathematics, and a gap nobody had filled.\n\nI'm a developer with a deep interest in **game theory** and **Benford's Law**, and I was exploring why 
inter-agent communication in multi-agent AI systems was so wasteful. Caveman compressed what agents *say to users*. LLMLingua compressed what users *say to agents*. But the enormous payload exchanged *between agents* (tool results, reasoning traces, intermediate context) was being passed raw, untouched, token-heavy.\n\nThe insight came from game theory: in any multi-player system, the Nash equilibrium of communication is the strategy where each player transmits the minimum information necessary for the next player to act optimally. Every token above that minimum is a deviation from equilibrium: a pure cost with no strategic value.\n\nThe question became: *how do you find that equilibrium automatically, without destroying meaning?*\n\n---\n\n## The Mathematical Foundation\n\n### Why Zipf?\n\nGeorge Kingsley Zipf observed in 1949 that in natural language, word frequency follows a power law: the most common word appears roughly twice as often as the second most common, three times as often as the third, and so on.\n\nThe consequence for compression is elegant: **rare words carry more information**. A sentence dense with unusual, technical, specific vocabulary is doing more work per token than a sentence full of common connectives and transitional phrases. PITH uses word length ≥ 7 characters as a zero-dependency proxy for Zipf rarity — rare words are systematically longer. No external corpus, no model call, no latency.\n\n### Why Benford?\n\nFrank Benford observed in 1938 that in naturally occurring datasets, the leading digit follows a logarithmic distribution: about 30.1% of numbers begin with 1, 17.6% with 2, 12.5% with 3, and so on, decreasing to about 4.6% for 9.\n\nWhat does this have to do with text? Sentence lengths in natural human writing follow this same distribution. Short sentences (1–9 words) are most common, then medium sentences, then long ones — and the distribution of their first digits approximates Benford's Law. 
This is a signature of *organic writing* — the natural rhythm of human thought.\n\nAI-generated text and over-compressed text deviate from this signature. They tend toward uniform sentence lengths, producing a flatter distribution. The Mean Absolute Deviation (MAD) from the expected Benford distribution is therefore a structural integrity signal: low MAD = natural text, high MAD = artificial or damaged.\n\n**Empirical validation (5 texts, 82 segments):**\n\n| Text type | Benford MAD | Verdict |\n|-----------|------------|---------|\n| Darwin (1859) | 5.0% | ✓ natural |\n| Melville (1851) | 3.1% | ✓ natural |\n| AI scientific | 7.5% | ✗ artificial |\n| AI narrative | 13.7% | ✗ artificial |\n\nThe gap between human and AI text is consistent and measurable. PITH uses this as its compression quality gate: if compression increases MAD beyond 2× the original, it relaxes the ratio and retries. The compressor cannot accidentally produce text more artificial than what it started with.\n\n### The Game Theory Connection\n\nIn information-theoretic game theory, optimal communication strategies produce power-law distributions — this is not coincidence. Benford's Law and Zipf's Law are both manifestations of the same underlying principle: natural systems that have evolved toward efficiency follow logarithmic distributions. Language has been optimized over millennia. The distribution of sentence lengths and word frequencies reflects that optimization.\n\nPITH's design hypothesis: **an agent communicating at its Nash equilibrium produces Benford-compliant text**. 
Deviation from Benford in the compressed output signals a departure from that equilibrium — either the compressor cut too much (over-compressed) or the original was already inflated (padding).\n\n---\n\n## What PITH Does\n\n```\nAGENT A — verbose output (487 tokens)\n    ↓\n┌──────────────────────────────────────────────┐\n│  PITH Compression Pipeline                   │\n│                                              │\n│  1. PARSER                                   │\n│     Extract: code, JSON, URLs, paths, nums   │\n│     These are quarantined — never touched    │\n│                                              │\n│  2. ZIPF SCORER                              │\n│     Score each sentence by vocabulary rarity │\n│     Keep top 60% by density (default)        │\n│                                              │\n│  3. BENFORD GATE                             │\n│     Compute MAD before and after             │\n│     If MAD \u003e 2× original → relax + retry    │\n│     Max 3 attempts                           │\n│                                              │\n│  4. 
REASSEMBLER                              │\n│     Restore original sentence order          │\n│     Reinsert preserved blocks                │\n│     Add metadata header                      │\n└──────────────────────────────────────────────┘\n    ↓\nAGENT B — compressed payload (284 tokens, -42%)\n[PITH | ✓ | -42% tokens | benford:4.3% | compressed]\n```\n\n---\n\n## Installation\n\n### As a Claude Code skill\n\nPlace the `pith/` directory in your skills folder or install the `.skill` file through Claude Code's skill manager.\n\n### Standalone\n\n```bash\n# No dependencies required — pure Python stdlib\npython3 scripts/compress.py --help\n```\n\n---\n\n## Usage\n\n```bash\n# Basic compression\necho \"\u003cverbose agent output\u003e\" | python3 scripts/compress.py\n\n# Explicit ratio\npython3 scripts/compress.py --payload \"\u003ctext\u003e\" --ratio 0.5\n\n# JSON output for programmatic use\npython3 scripts/compress.py --payload \"\u003ctext\u003e\" --json\n```\n\n### In a Python pipeline\n\n```python\nimport subprocess, json\n\ndef pith(payload: str, ratio: float = 0.6) -\u003e tuple[str, dict]:\n    \"\"\"Compress an inter-agent payload with PITH.\"\"\"\n    result = subprocess.run(\n        [\"python3\", \"path/to/pith/scripts/compress.py\",\n         \"--ratio\", str(ratio), \"--json\"],\n        input=payload, capture_output=True, text=True\n    )\n    data = json.loads(result.stdout)\n    return data[\"compressed\"], data[\"meta\"]\n\n# In your agent pipeline\nraw_output = agent_research.run(\"Find information about X\")\ncompressed, meta = pith(raw_output)\nprint(f\"Saved {meta['saved_pct']:.0f}% ({meta['original_tokens']} → {meta['compressed_tokens']} tokens)\")\nagent_synthesis.run(compressed)\n```\n\n---\n\n## Compression Modes\n\n| Mode | Ratio | Best For |\n|------|-------|----------|\n| Conservative | `--ratio 0.8` | Sensitive reasoning traces |\n| Default | `--ratio 0.6` | Most agent tool results |\n| Aggressive | `--ratio 0.4` | Bulk search 
results, summaries |\n| Maximum | `--ratio 0.3` | Context window critical |\n\n---\n\n## Benchmarks\n\n| Payload type | Default ratio | Savings | Benford |\n|---|---|---|---|\n| Web search result (verbose) | 0.6 | 34% | ✓ |\n| Web search result (aggressive) | 0.4 | 60% | ✓ |\n| Code execution result + explanation | 0.6 | 30% | ✓ |\n| Short payload (\u003c 5 sentences) | — | 0% passthrough | ✓ |\n| Pure JSON | — | 0% passthrough | ✓ |\n\n---\n\n## What Makes PITH Different\n\n| Tool | What it compresses |\n|------|--------------------|\n| Caveman | Agent → User output |\n| LLMLingua | User → Agent prompt |\n| Selective Context | Retrieved documents |\n| **PITH** | **Agent → Agent handoff payloads** |\n\nPITH fills a specific gap: the payload exchanged *between* agents in a pipeline. This is where token waste compounds — each agent inherits the verbosity of the previous one, and over a 5-agent chain, this can mean thousands of wasted tokens before the final answer is produced.\n\n---\n\n## Limitations\n\n- Requires ≥ 5 sentences for meaningful compression (shorter payloads pass through automatically)\n- Zipf proxy (word length) is an approximation — semantic importance may differ from lexical rarity in edge cases\n- Not suitable for legally sensitive content where exact phrasing is required\n- Benford validation works best on text with 8+ sentences; shorter compressed outputs may show higher MAD\n\n---\n\n## Author\n\nCreated by **Albert** ([@VjAlbert](https://github.com/VjAlbert)) — developer, game theory enthusiast, and Benford's Law advocate. PITH emerged from the belief that AI multi-agent systems should communicate at their Nash equilibrium: minimum tokens, maximum information, validated by the mathematical signatures of natural language.\n\n\u003e *\"Natural systems that evolve toward efficiency follow logarithmic distributions.\n\u003e Language did. 
Our agents should too.\"*\n\n---\n\n## License\n\nMIT\n\n## Related\n\n- [video-analyzer](https://github.com/VjAlbert/video-analyzer) — another skill by the same author, bridging video files and Claude Projects\n- [Anthropic Skills](https://github.com/anthropics/skills) — the official skills repository\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvjalbert%2Fpith-skill","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvjalbert%2Fpith-skill","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvjalbert%2Fpith-skill/lists"}