{"id":47640519,"url":"https://github.com/vectorarc/avp-python","last_synced_at":"2026-04-26T17:01:33.489Z","repository":{"id":338902455,"uuid":"1159600714","full_name":"VectorArc/avp-python","owner":"VectorArc","description":"Python SDK for Agent Vector Protocol – transfer KV-cache between LLM agents instead of text","archived":false,"fork":false,"pushed_at":"2026-03-30T06:25:08.000Z","size":1120,"stargazers_count":16,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-02T08:49:17.996Z","etag":null,"topics":["ai-agents","inference","kv-cache","llm","machine-learning","multi-agent","protocol","python","transformers","vllm"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VectorArc.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-16T23:24:32.000Z","updated_at":"2026-03-31T06:55:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/VectorArc/avp-python","commit_stats":null,"previous_names":["vectorarc/avp-python"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/VectorArc/avp-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VectorArc%2Favp-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VectorArc%2Favp-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VectorArc%2Favp-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VectorArc%2Favp-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VectorArc","download_url":"https://codeload.github.com/VectorArc/avp-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VectorArc%2Favp-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32305039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T09:34:17.070Z","status":"ssl_error","status_checked_at":"2026-04-26T09:34:00.993Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","inference","kv-cache","llm","machine-learning","multi-agent","protocol","python","transformers","vllm"],"created_at":"2026-04-02T00:50:41.048Z","updated_at":"2026-04-26T17:01:33.484Z","avatar_url":"https://github.com/VectorArc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AVP – Agents Share Thoughts, Not Text\n\n[![PyPI](https://img.shields.io/pypi/v/avp.svg)](https://pypi.org/project/avp/)\n[![CI](https://github.com/VectorArc/avp-python/actions/workflows/ci.yml/badge.svg)](https://github.com/VectorArc/avp-python/actions/workflows/ci.yml)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://python.org)\n[![Spec](https://img.shields.io/badge/spec-v0.4-blue.svg)](https://github.com/VectorArc/avp-spec)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VectorArc/avp-python/blob/main/notebooks/avp_quick_start.ipynb)\n\nWhen LLM agents hand off work as text, the next agent re-processes everything from scratch. AVP (Agent Vector Protocol) transfers the actual computation (KV-cache, hidden states, attention) so the receiving agent picks up where the sender left off. Zero tokens between agents, 2-3x faster pipelines, same or better accuracy. Built on [LatentMAS](https://arxiv.org/abs/2511.20639), extended with cross-model vocabulary-mediated projection. Zero training, works across model families.\n\n```bash\npip install avp[hf]\n```\n\n\u003e **Requires self-hosted models on GPUs.** AVP accesses model internals (KV-cache, hidden states) that cloud APIs don't expose. Other engines: `avp[ollama]`, `avp[llamacpp]`, `avp[vllm]` – see [Works With](#works-with).\n\n## Quick Start\n\n**Same model** – two agents share a KV-cache:\n\n```python\nfrom avp import HuggingFaceConnector\n\nconnector = HuggingFaceConnector.from_pretrained(\"Qwen/Qwen2.5-7B-Instruct\")\n\n# Agent A thinks (builds KV-cache, no text output)\ncontext = connector.think(\"Analyze this math problem: 24 * 17 + 3\", steps=20)\n\n# Agent B generates using Agent A's KV-cache\nanswer = connector.generate(\"Solve step by step: 24 * 17 + 3\", context=context)\n```\n\n**Cross-model** – different architectures, zero training:\n\n```python\nresearcher = HuggingFaceConnector.from_pretrained(\"Qwen/Qwen2.5-7B-Instruct\")\nsolver = HuggingFaceConnector.from_pretrained(\"meta-llama/Llama-3.2-3B-Instruct\")\n\ncontext = researcher.think(\"Analyze this problem\", steps=20)\nanswer = solver.generate(\"Solve it\", context=context, source=researcher, cross_model=True)\n```\n\n**Cross-process** – serialize context over any transport:\n\n```python\n# Process A\nwire_bytes = context.to_bytes(session_id=\"s1\", source_agent_id=\"agent-a\")\n\n# Process B\nrestored = AVPContext.from_bytes(wire_bytes, device=\"cuda\")\nanswer = connector.generate(prompt, context=restored)\n```\n\nYou don't choose the transfer mode. The handshake auto-negotiates based on model compatibility: same model → full KV-cache, different models → vocabulary-mediated projection (~6 KB), incompatible models → JSON text fallback.\n\n## Results\n\n**Direct** = single model, no pipeline. **Latent** = AVP transfer. **Text Chain** = standard text handoff between agents.\n\n| | Direct | Latent (AVP) | Text Chain |\n|---|--------|--------------|------------|\n| **HumanEval** (Qwen 7B, n=164) | 58.5% | **67.1%** | 53.0% |\n| **GSM8K** (Qwen 7B, n=200) | 91.0% | 90.5% | 87.0% |\n| **DebugBench** (Qwen 7B, n=100) | 50.0% | 51.0% | 49.0% |\n| **GSM8K** (Llama 3B, n=200) | 74.5% | 76.0% | 79.0% |\n\nHumanEval: +12.4pp vs text across 4 seeds (p=0.004). GSM8K and DebugBench: neutral across all modes, but the pipeline runs 3x faster (7.6s vs 22.8s end-to-end on DebugBench). Llama 3B: text wins on GSM8K; latent overhead has more impact on smaller models. All benchmarks used `steps=20` on NVIDIA A100.\n\n**Trade-off:** 20 latent steps cost ~0.9s on A100. If Agent A would normally generate 22+ tokens of text, latent is faster.\n\n**Cross-model (zero training):**\n\n| Source → Target | GSM8K (Rosetta / Text) | HumanEval (Rosetta / Text) |\n|-----------------|------------------------|----------------------------|\n| Qwen 7B → Qwen 3B | 82.5% / **88.5%** | **66.5%** / 62.2% |\n| Qwen 7B → Llama 3B | 77.0% / **86.5%** | 47.0% / **57.9%** |\n| Llama 3B → Qwen 7B | **90.0%** / 82.0% | **79.3%** / 61.6% |\n\nTarget solo baselines: Qwen 3B = 82.5% / 61.0%, Llama 3B = 76.0% / 50.6%, Qwen 7B = 91.0% / 58.5%.\n\nFull results: **[Benchmarks](docs/BENCHMARKS.md)** – 7 benchmarks, 5 models, 2 families, reproducible.\n\n## How It Works\n\n![How AVP works](assets/how_it_works_diagram.svg)\n\nAVP auto-negotiates the transfer mode via a handshake at connection time. You write the same `think()` / `generate()` code regardless of which mode is selected:\n\n| Mode | When | What transfers | Size |\n|------|------|----------------|------|\n| **Latent** | Same model | Full KV-cache | ~390 MB for 7B |\n| **Cross-model** | Different model or family | Projected hidden state via shared vocabulary | ~6 KB |\n| **JSON fallback** | No compatible projection path | Plain text | Varies |\n\nThe handshake checks model hash → structural match → shared tokenizer → vocabulary overlap (≥100 BPE tokens) → JSON. You never configure this manually.\n\n## Works With\n\n### Engines\n\n| Engine | Latent Pipeline | Cross-model |\n|--------|----------------|-------------|\n| **[HuggingFace](docs/FRAMEWORK_INTEGRATION.md)** `avp[hf]` | Full think/generate | Yes |\n| **[Ollama](docs/FRAMEWORK_INTEGRATION.md)** `avp[ollama]` | Full think/generate, auto-resolves GGUF | Yes |\n| **[llama.cpp](docs/FRAMEWORK_INTEGRATION.md)** `avp[llamacpp]` | Full think/generate on GGUF | Yes |\n| **[vLLM](docs/FRAMEWORK_INTEGRATION.md)** `avp[vllm]` | KV connector + model plugin | Yes |\n\n### Frameworks\n\n| Framework | Integration | Extra |\n|-----------|-------------|-------|\n| **[LangChain](docs/FRAMEWORK_INTEGRATION.md)** | `ChatAVP` BaseChatModel | `avp[langchain]` |\n| **[CrewAI](docs/FRAMEWORK_INTEGRATION.md)** | `AVPLLM` BaseLLM | `avp[crewai]` |\n| **[AutoGen](docs/FRAMEWORK_INTEGRATION.md)** | `AVPChatCompletionClient` | `avp[autogen]` |\n| **A2A / MCP** | Complementary: AVP handles tensor transfer, they handle routing | – |\n\nSee **[Framework Integration Guide](docs/FRAMEWORK_INTEGRATION.md)** for per-engine code examples.\n\n## Roadmap\n\n- Bidirectional latent communication (both agents share thinking, not just one)\n- CacheGen-style KV-cache compression (3-4x reduction)\n\n## Documentation\n\n- **[AVP Specification](https://github.com/VectorArc/avp-spec)** – binary format, handshake, transport\n- **[Benchmarks](docs/BENCHMARKS.md)** – 7 benchmarks, 5 models, 2 families\n- **[Framework Integration](docs/FRAMEWORK_INTEGRATION.md)** – engines, frameworks, per-engine examples\n- **[Examples](examples/)** – quickstart, cross-model, and agent demos\n- **[CHANGELOG](CHANGELOG.md)**\n\n## License\n\nApache 2.0 – see [LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvectorarc%2Favp-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvectorarc%2Favp-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvectorarc%2Favp-python/lists"}