Drop-in OIDC & Google A2A auth + Weaviate memory for Ollama, vLLM and any local LLM server.

- Host: GitHub
- URL: https://github.com/attach-dev/attach-gateway
- Owner: attach-dev
- Homepage: http://attach.dev
- License: MIT
- Language: Python
- Default Branch: main
- Created: 2025-04-29T19:55:23.000Z
- Last Pushed: 2025-08-13T00:59:13.000Z
- Last Synced: 2025-09-27T06:12:14.996Z
- Topics: a2a, attach-dev, auth0, gateway, jwt, llm, memory, ollama, proxy, vllm, weaviate
- Size: 6.67 MB
- Stars: 12
- Watchers: 1
- Forks: 2
- Open Issues: 3
- Metadata Files: README.md, LICENSE
# Attach Gateway
> **Identity & Memory side‑car** for every LLM engine and multi‑agent framework. Add OIDC / DID SSO, A2A hand‑off, and a pluggable memory bus (Weaviate today) – all with one process.
[attach-dev on PyPI](https://pypi.org/project/attach-dev/)
---
## Why it exists
LLM engines such as **Ollama** or **vLLM** ship with **zero auth**. Agent‑to‑agent protocols (Google **A2A**, MCP, OpenHands) assume a *Bearer token* is already present but don't tell you how to issue or validate it. Teams end up wiring ad‑hoc reverse proxies, leaking ports, and copy‑pasting JWT code.
**Attach Gateway** is that missing resource‑server:
* ✅ Verifies **OIDC / JWT** or **DID‑JWT**
* ✅ Stamps `X-Attach-User` + `X-Attach-Session` headers so every downstream agent/tool sees the same identity
* ✅ Implements `/a2a/tasks/send` + `/tasks/status` for Google A2A & OpenHands hand‑off
* ✅ Mirrors prompts & responses to a memory backend (Weaviate Docker container by default)
* ✅ Emits workflow traces via the optional Temporal adapter
Run it next to any model server and get secure, shareable context in under 1 minute.
---
## 60‑second Quick‑start (local laptop)
### Option 1: Install from PyPI (Recommended)
```bash
# 0) prerequisites: Python 3.12, Ollama installed, Auth0 account or DID token
# Install the package
pip install attach-dev
# 1) start memory in Docker (background tab)
# Mac M1/M2 users: use manual Docker command (see examples/README.md)
docker run --rm -d -p 6666:8080 \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
semitechnologies/weaviate:1.30.5
# 2) export your short‑lived token
export JWT=""
export OIDC_ISSUER=https://YOUR_DOMAIN.auth0.com
export OIDC_AUD=ollama-local
export MEM_BACKEND=weaviate
export WEAVIATE_URL=http://127.0.0.1:6666
# 3) run gateway
attach-gateway --port 8080 &
# 4) make a protected Ollama call via the gateway
curl -H "Authorization: Bearer $JWT" \
-d '{"model":"tinyllama","messages":[{"role":"user","content":"hello"}]}' \
http://localhost:8080/api/chat | jq .
```
### Option 2: Install from Source
```bash
# 0) prerequisites: Python 3.12, Ollama installed, Auth0 account or DID token
git clone https://github.com/attach-dev/attach-gateway.git && cd attach-gateway
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 1) start memory in Docker (background tab)
python script/start_weaviate.py &
# 2) export your short‑lived token
export JWT=""
export OIDC_ISSUER=https://YOUR_DOMAIN.auth0.com
export OIDC_AUD=ollama-local
export MEM_BACKEND=weaviate
export WEAVIATE_URL=http://127.0.0.1:6666
# 3) run gateway
uvicorn main:app --port 8080 &
# The gateway exposes your Auth0 credentials for the demo UI at
# `/auth/config`. The values are read from `AUTH0_DOMAIN`,
# `AUTH0_CLIENT` and `OIDC_AUD`.
# 4) make a protected Ollama call via the gateway
curl -H "Authorization: Bearer $JWT" \
-d '{"model":"tinyllama","messages":[{"role":"user","content":"hello"}]}' \
http://localhost:8080/api/chat | jq .
```
In another terminal, try the Temporal demo:
```bash
pip install temporalio # optional workflow engine
python examples/temporal_adapter/worker.py &
python examples/temporal_adapter/client.py
```
You should see a JSON response plus an `X-ATTACH-Session-Id` header – proof the pipeline works.
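The same check can be scripted. The sketch below mirrors step 4 with `requests`; it assumes the gateway listens on `localhost:8080` and that `"stream": false` is passed through to Ollama so the body parses as a single JSON object.

```python
# A minimal sketch of step 4 in Python instead of curl. Assumes the gateway is
# on localhost:8080 and that "stream": false reaches Ollama unchanged so a
# single JSON object comes back (both are assumptions, not gateway guarantees).
import os
import requests

resp = requests.post(
    "http://localhost:8080/api/chat",
    headers={"Authorization": f"Bearer {os.environ['JWT']}"},
    json={
        "model": "tinyllama",
        "stream": False,  # ask Ollama for one JSON object rather than NDJSON chunks
        "messages": [{"role": "user", "content": "hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.headers.get("X-Attach-Session-Id"))  # session header stamped by the gateway
print(resp.json())
```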
---
## Use in your project
1. Copy `.env.example` → `.env` and fill in OIDC + backend URLs
2. `pip install attach-dev python-dotenv`
3. `attach-gateway` (reads .env automatically)
→ See **[docs/configuration.md](docs/configuration.md)** for framework integration and **[examples/](examples/)** for code samples.
---
## Architecture (planner → coder hand‑off)
```mermaid
flowchart TD
    %%────────────────────────────────
    %% COMPONENTS
    %%────────────────────────────────
    subgraph Front-end
        UI["Browser<br/>demo.html"]
    end
    subgraph Gateway
        GW["Attach Gateway<br/>(OIDC SSO + A2A)"]
    end
    subgraph Agents
        PL["Planner Agent<br/>FastAPI :8100"]
        CD["Coder Agent<br/>FastAPI :8101"]
    end
    subgraph Memory
        WV["Weaviate (Docker)<br/>class MemoryEvent"]
    end
    subgraph Engine
        OL["Ollama / vLLM<br/>:11434"]
    end

    %%────────────────────────────────
    %% USER FLOW
    %%────────────────────────────────
    UI -- "① POST /a2a/tasks/send<br/>Bearer JWT, prompt" --> GW

    %%─ Planner hop
    GW -- "② Proxy → planner<br/>(X-Attach-User, Session)" --> PL
    PL -- "③ Write plan doc" --> WV
    PL -- "④ /a2a/tasks/send<br/>body:{mem_id}" --> GW

    %%─ Coder hop
    GW -- "⑤ Proxy → coder" --> CD
    CD -- "⑥ GET plan by mem_id" --> WV
    CD -- "⑦ POST /api/chat<br/>prompt(plan)" --> GW
    GW -- "⑧ Proxy → Ollama" --> OL
    OL -- "⑨ JSON response" --> GW
    GW -- "⑩ Write response to Weaviate" --> WV
    GW -- "⑪ /a2a/tasks/status = done" --> UI
```
**Key headers**
| Header | Meaning |
|--------|---------|
| `Authorization: Bearer <token>` | OIDC or DID token verified by the gateway |
| `X-Attach-User` | stable user ID (`auth0\|123` or `did:pkh:…`) |
| `X-Attach-Session` | deterministic hash (user + UA) for request tracing |
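A downstream agent only needs to read these headers to key its work to the caller. The sketch below is illustrative rather than code from this repo: the route mirrors the A2A endpoint name, but the payload shape and response are placeholders.

```python
# Illustrative only: a downstream agent that trusts the gateway's identity
# headers. The header names come from the table above; the route, payload
# shape, and response are placeholders, not attach-gateway APIs.
from fastapi import FastAPI, Header

app = FastAPI()

@app.post("/a2a/tasks/send")
async def handle_task(
    payload: dict,
    x_attach_user: str = Header(..., alias="X-Attach-User"),
    x_attach_session: str = Header(..., alias="X-Attach-Session"),
):
    # Every hop sees the same user and session, so memory writes and traces
    # can be keyed consistently without re-validating the JWT here.
    return {"user": x_attach_user, "session": x_attach_session, "echo": payload}
```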
---
## Live two‑agent demo
```bash
# pane 1 – memory (Docker)
python script/start_weaviate.py
# pane 2 – gateway
uvicorn main:app --port 8080
# pane 3 – planner agent
uvicorn examples.agents.planner:app --port 8100
# pane 4 – coder agent
uvicorn examples.agents.coder:app --port 8101
# pane 5 – static chat UI
cd examples/static && python -m http.server 9000
open http://localhost:9000/demo.html
```
Type a request like *"Write Python to sort a list."* The browser shows:
1. Planner message → logged in gateway, plan row appears in memory.
2. Coder reply → code response, second memory row, status `done`.
---
## Directory map
| Path | Purpose |
|------|---------|
| `auth/` | OIDC & DID‑JWT verifiers |
| `middleware/` | JWT middleware, session header, mirror trigger |
| `a2a/` | `/tasks/send` & `/tasks/status` routes |
| `mem/` | pluggable memory writers (`weaviate.py`, `sakana.py`) |
| `proxy/` | Engine-agnostic HTTP proxy logic |
| `examples/agents/` | *examples* – Planner & Coder FastAPI services |
| `examples/static/` | `demo.html` chat page |
---
## Auth core
`auth.verify_jwt()` accepts three token formats and routes them automatically:
1. Standard OIDC JWTs
2. `did:key` tokens
3. `did:pkh` tokens
Example DID-JWT request:
```bash
curl -X POST http://localhost:8080/v1/resource \
  -H "Authorization: Bearer did:key:z6Mki....."
```
Example OIDC JWT request:
```bash
curl -X POST http://localhost:8080/v1/resource \
  -H "Authorization: Bearer $JWT"
```
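From Python, both formats travel in the same `Authorization` header. The sketch below mirrors the two curl calls, assuming the gateway runs on `localhost:8080` and reusing the placeholder `/v1/resource` path.

```python
# Same two calls as above, from Python. Assumes the gateway is on
# localhost:8080; /v1/resource is the placeholder path from the curl examples.
import os
import requests

GATEWAY = "http://localhost:8080"

def call(token: str) -> int:
    r = requests.post(
        f"{GATEWAY}/v1/resource",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    return r.status_code

print(call(os.environ["JWT"]))     # standard OIDC JWT
print(call("did:key:z6Mki....."))  # DID token: same header, no other change
```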
### Other IdPs (Descope)
To use your OIDC provider, set the following environment variables:
```bash
OIDC_ISSUER=
OIDC_AUD=
AUTH_BACKEND=
```
Descope-specific environment variables:
```bash
DESCOPE_BASE_URL=https://api.descope.com   # or your custom Descope base URL
DESCOPE_PROJECT_ID=
DESCOPE_CLIENT_ID=
DESCOPE_CLIENT_SECRET=
```
## 💾 Memory: logs
Send Sakana-formatted logs to the gateway and they will be stored as
`MemoryEvent` objects in Weaviate.
```bash
curl -X POST /v1/logs \
-H "Authorization: Bearer $JWT" \
-d '{"run_id":"abc","level":"info","message":"hi"}'
# => HTTP/1.1 202 Accepted
```
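From agent code the same call is a short `requests` snippet; the payload fields mirror the curl above, and `localhost:8080` is assumed for the gateway address.

```python
# Mirrors the curl above: post a Sakana-style log event to the gateway, which
# stores it as a MemoryEvent in Weaviate. localhost:8080 is an assumption.
import os
import requests

resp = requests.post(
    "http://localhost:8080/v1/logs",
    headers={"Authorization": f"Bearer {os.environ['JWT']}"},
    json={"run_id": "abc", "level": "info", "message": "hi"},
    timeout=30,
)
assert resp.status_code == 202  # accepted for asynchronous write
```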
## Usage hooks
Emit token usage metrics for every request. Choose a backend via
`USAGE_METERING` (alias `USAGE_BACKEND`):
```bash
export USAGE_METERING=prometheus # or null
```
A Prometheus counter `attach_usage_tokens_total{user,direction,model}` is
exposed for Grafana dashboards.
Set `USAGE_METERING=null` (the default) to disable metering entirely.
> **⚠️ Usage hooks depend on the quota middleware.**
> Make sure `MAX_TOKENS_PER_MIN` is set (any positive number) so the
> `TokenQuotaMiddleware` is enabled; the middleware is what records usage
> events that feed Prometheus.
```bash
# Enable usage tracking (set any reasonable limit)
export MAX_TOKENS_PER_MIN=60000
export USAGE_METERING=prometheus
```
### OpenMeter (Stripe / ClickHouse)
```bash
# No additional dependencies needed - uses direct HTTP API
export MAX_TOKENS_PER_MIN=60000 # Required: enables quota middleware
export USAGE_METERING=openmeter # Required: activates OpenMeter backend
export OPENMETER_API_KEY=your-api-key-here # Required: API authentication
export OPENMETER_URL=https://openmeter.cloud # Optional: defaults to https://openmeter.cloud
```
Events are sent directly to OpenMeter's HTTP API and are processed by the LLM tokens meter for billing integration with Stripe.
> **⚠️ All three variables are required for OpenMeter to work:**
> - `MAX_TOKENS_PER_MIN` enables the quota middleware that records usage events
> - `USAGE_METERING=openmeter` activates the OpenMeter backend
> - `OPENMETER_API_KEY` provides authentication to OpenMeter's API
The gateway gracefully falls back to `NullUsageBackend` if any required variable is missing.
### Scraping metrics
```bash
curl -H "Authorization: Bearer $JWT" http://localhost:8080/metrics
```
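To inspect the counter without Grafana, the plain-text exposition can be parsed with `prometheus_client`. The sketch below sums tokens per user; the counter name and labels come from the section above, while the gateway address and `JWT` env var are assumptions.

```python
# Parse the /metrics text and sum attach_usage_tokens_total per user.
# Counter name and labels (user, direction, model) are taken from the section
# above; the gateway address and bearer-token env var are assumptions.
import os
from collections import Counter

import requests
from prometheus_client.parser import text_string_to_metric_families

text = requests.get(
    "http://localhost:8080/metrics",
    headers={"Authorization": f"Bearer {os.environ['JWT']}"},
    timeout=30,
).text

per_user: Counter = Counter()
for family in text_string_to_metric_families(text):
    for sample in family.samples:
        if sample.name == "attach_usage_tokens_total":
            per_user[sample.labels.get("user", "unknown")] += sample.value

print(per_user)
```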
## Token quotas
Attach Gateway can enforce per-user token limits. Install the optional
dependency with `pip install attach-dev[quota]` and set
`MAX_TOKENS_PER_MIN` in your environment to enable the middleware. The
counter defaults to the `cl100k_base` encoding; override with
`QUOTA_ENCODING` if your model uses a different tokenizer. The default
in-memory store works in a single process and is not shared between
workers—requests retried across processes may be double-counted. Use Redis
for production deployments.
If `tiktoken` is missing, a byte-count fallback is used which counts about
four times more tokens than the `cl100k` tokenizer – install `tiktoken` in
production.
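The difference is easy to demonstrate: the snippet below counts a sample string with the `cl100k_base` encoding and with raw UTF-8 bytes, which approximates the fallback (the gateway's exact fallback logic may differ).

```python
# Rough illustration of why tiktoken matters for quotas: byte counts run about
# 4x higher than cl100k_base token counts for English text. This approximates
# the fallback, not the gateway's exact code path.
import tiktoken

text = "Write Python to sort a list." * 20
enc = tiktoken.get_encoding("cl100k_base")

tokens = len(enc.encode(text))
approx_bytes = len(text.encode("utf-8"))

print(f"cl100k_base tokens: {tokens}")
print(f"byte-count fallback: {approx_bytes} (~{approx_bytes / tokens:.1f}x higher)")
```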
### Enable token quotas
```bash
# Optional: Enable token quotas
export MAX_TOKENS_PER_MIN=60000
pip install tiktoken # or pip install attach-dev[quota]
```
To customize the tokenizer:
```bash
export QUOTA_ENCODING=cl100k_base # default
```
---
## License
MIT