https://github.com/barney-w/surf

The open framework for extensible & grounded AI agent orchestration.
agent-framework agent-orchestration ai-agent ai-framework azure multi-agent
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/barney-w/surf
- Owner: barney-w
- License: other
- Created: 2026-03-08T12:55:42.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-26T23:38:21.000Z (2 months ago)
- Last Synced: 2026-03-27T02:12:03.490Z (2 months ago)
- Topics: agent-framework, agent-orchestration, ai-agent, ai-framework, azure, multi-agent
- Language: Python
- Homepage:
- Size: 8.51 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 3
- Metadata Files:
  - Readme: README.md
  - Contributing: CONTRIBUTING.md
  - License: LICENSE
  - Code of conduct: CODE_OF_CONDUCT.md
  - Security: SECURITY.md

          


  



The open framework for AI agent orchestration.

Build multi-agent systems that route queries to specialist agents, ground every answer in your own knowledge base, and ship across web, desktop, and mobile from a single codebase.

Quickstart • How it works • Features • Agents • Deep Dive • Contributing



---

## Quickstart

### Prerequisites

- [Python 3.12+](https://www.python.org/)
- [uv](https://docs.astral.sh/uv/)
- [just](https://just.systems/)
- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/) (logged in)
- Azure subscription with OpenAI access

### Setup

```bash
az login
cd api && uv sync && cd ../ingestion && uv sync && cd ..
just setup-dev          # deploy dev Azure resources + generate .env
just dev                # start API with hot reload (auto-starts Postgres, runs migrations)
```

### Verify

```bash
curl http://localhost:8090/api/v1/health
```

> **Note:** RBAC role propagation can take a few minutes. If you get 403 errors, wait and retry.

### Run with DevUI / Web / Desktop

```bash
just devui              # interactive agent chat with tool call visibility (port 8091)
just web                # full SPA with auth, conversation history, debug panels (port 3000)
just desktop            # Tauri desktop app with native window management
```

---

## How it works

```mermaid
graph TD
  web["Web / Desktop / Mobile<br/>surf-kit + React"]
  nginx["nginx<br/>reverse proxy"]
  api["FastAPI API"]
  coordinator["Coordinator Agent<br/>claude-haiku-4-5"]
  hr["HR Agent<br/>claude-sonnet-4-6"]
  it["IT Agent<br/>claude-sonnet-4-6"]
  website["Website Agent<br/>claude-sonnet-4-6"]
  rag["Azure AI Search<br/>BM25 + Vector"]
  proofread["Proofreader<br/>claude-haiku-4-5"]
  qg["Quality Gate"]
  postgres["PostgreSQL<br/>conversations + feedback"]
  otel["OpenTelemetry<br/>Azure Monitor / OTLP"]
  langfuse["Langfuse<br/>LLM tracing"]
  keyvault["Key Vault"]

  web -->|SSE| nginx --> api
  api --> coordinator
  coordinator -->|handoff| hr
  coordinator -->|handoff| it
  coordinator -->|handoff| website
  hr --> rag
  it --> rag
  website --> rag
  hr --> qg --> proofread
  api --> postgres
  api --> otel
  api --> langfuse
  api --> keyvault
```

---

## What you get

|                              |                                                                                                             |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------- |
| **Zero-registration agents** | Subclass `DomainAgent` and the framework discovers, registers, and wires it automatically. No config files. |
| **Auth-filtered routing**    | Agents are invisible to users who lack the required auth level. The coordinator can't even describe them.   |
| **3-strategy RAG**           | Hybrid search with broadened-filter fallback, keyword-only rescue, and post-response quality gates.         |
| **Prompt injection defence** | Four independent layers: domain-isolated RAG, structured JSON, quality gate, source-pollution guard.         |
| **Multi-model routing**      | Haiku for fast coordinator decisions, Sonnet for specialist agents. Direct Anthropic or Azure AI Foundry.   |
| **Ship everywhere**          | Web, desktop, and mobile from one React codebase via the shared `surf-kit` component library.               |

---

## Agents

| Agent           | Purpose                                          | RAG Scope                | Model        | Auth Level        |
| --------------- | ------------------------------------------------ | ------------------------ | ------------ | ----------------- |
| **Coordinator** | Routes queries, synthesises multi-domain answers | Unscoped                 | Haiku (fast) | Public            |
| **HR**          | Leave, onboarding, performance, L&D policies     | `domain=hr`              | Sonnet       | Microsoft Account |
| **IT**          | VPN, passwords, software, hardware, security     | `domain=it`              | Sonnet       | Organisational    |
| **Website**     | Public-facing content, services, events          | `content_source=website` | Sonnet       | Public            |
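
The Auth Level column suggests an ordering in which a caller sees every agent at or below their own level. A minimal sketch of that filter, assuming a three-tier ordered enum; the member names, their ordering, and the agent names below are hypothetical and may not match the `AuthLevel` enum in `api/src/agents/_base.py`:

```python
from dataclasses import dataclass
from enum import IntEnum


class AuthLevel(IntEnum):
    # Hypothetical members and ordering: a higher value means a stronger identity.
    PUBLIC = 0
    MICROSOFT_ACCOUNT = 1
    ORGANISATIONAL = 2


@dataclass(frozen=True)
class AgentInfo:
    name: str
    required_level: AuthLevel


# Hypothetical registry mirroring the table above.
AGENTS = [
    AgentInfo("coordinator", AuthLevel.PUBLIC),
    AgentInfo("hr_agent", AuthLevel.MICROSOFT_ACCOUNT),
    AgentInfo("it_agent", AuthLevel.ORGANISATIONAL),
    AgentInfo("website_agent", AuthLevel.PUBLIC),
]


def visible_agents(caller_level: AuthLevel, agents: list[AgentInfo] = AGENTS) -> list[str]:
    """Return only the agents whose required level the caller meets."""
    return [a.name for a in agents if caller_level >= a.required_level]


print(visible_agents(AuthLevel.PUBLIC))  # ['coordinator', 'website_agent']
```

Filtering before the coordinator's handoff graph is assembled keeps a hidden agent out of the coordinator's view entirely, rather than merely refusing the handoff at call time.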

### Adding a new agent

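```python
# api/src/agents/finance/agent.py
# Assumption: DomainAgent and RAGScope are importable from the agents base module
# (see api/src/agents/_base.py); adjust the import path to match your checkout.
from agents._base import DomainAgent, RAGScope


class FinanceAgent(DomainAgent):
    @property
    def name(self) -> str:
        # Unique identifier used in the coordinator's handoff graph
        return "finance_agent"

    @property
    def description(self) -> str:
        # Shown to the coordinator when it decides where to route a query
        return "Handles budget and procurement queries"

    @property
    def rag_scope(self) -> RAGScope:
        # Restricts this agent's searches to finance policies and procedures
        return RAGScope(domain="finance", document_types=["policy", "procedure"])

    @property
    def system_prompt(self) -> str:
        return "You are a finance specialist..."
```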

That's it. No registration, no config changes. The framework discovers the subclass at startup, creates its RAG tool with domain-isolated filters, and adds it to the coordinator's handoff graph. See `api/src/agents/_base.py` for the full interface and `api/src/agents/_discovery.py` for the discovery mechanism.
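
Subclass auto-discovery of this kind is commonly built from two standard-library pieces: importing every module in a package so that class definitions run, then walking `__subclasses__()`. The sketch below illustrates that general pattern under those assumptions; it is not a copy of `_discovery.py`, and `import_submodules`, `find_concrete_subclasses`, and `DomainAgentBase` are hypothetical names.

```python
import abc
import importlib
import inspect
import pkgutil


def import_submodules(package_name: str) -> None:
    """Import every module under a package so its class definitions execute."""
    package = importlib.import_module(package_name)
    for module_info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        importlib.import_module(module_info.name)


def find_concrete_subclasses(base: type) -> list[type]:
    """Recursively collect subclasses of `base` that have no abstract members left."""
    found: list[type] = []
    for sub in base.__subclasses__():
        if not inspect.isabstract(sub):
            found.append(sub)
        found.extend(find_concrete_subclasses(sub))
    return found


class DomainAgentBase(abc.ABC):  # hypothetical stand-in for DomainAgent
    @property
    @abc.abstractmethod
    def name(self) -> str: ...


class FinanceAgent(DomainAgentBase):
    @property
    def name(self) -> str:
        return "finance_agent"


# In a real layout you would first call import_submodules("<agents package>").
registry = {cls().name: cls for cls in find_concrete_subclasses(DomainAgentBase)}
print(sorted(registry))  # ['finance_agent']
```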

---

## Deep Dive

### Project Structure

```
surf/
  api/                  FastAPI backend: agents, orchestrator, RAG, middleware
    src/
      agents/           Domain agents + coordinator (auto-discovered)
      orchestrator/     Workflow builder, PDF processing, middleware pipeline
      rag/              Search execution, 3-strategy tool, quality gate
      routes/           Chat, auth, user profile, admin, agent listing
      services/         Conversation persistence, Graph API, streaming, response pipeline
      middleware/       Auth, rate limiting, body limits, telemetry, input validation
      config/           Settings with environment-aware validation
    tests/
      unit/             28 test modules (~7K lines)
      security/         JWT bypass, prompt injection, conversation isolation
      integration/      Multi-turn flows against real Postgres
      eval/             LLM-judged response quality suite
      load/             Locust load testing
  web/                  React 19 + Vite 7 + TailwindCSS 4 frontend
    src-tauri/          Tauri desktop app (Rust shell)
  mobile/               React Native + Expo (iOS / Android)
  ingestion/            Document pipeline: PDF, DOCX, TXT, CSV connectors
  infra/                Azure IaC: 19 Bicep modules, 1,200+ lines
    modules/            Application Insights custom module
    environments/       dev / staging / prod parameter files
    workbooks/          Azure Monitor telemetry workbook
  data/                 Sample documents and ingestion manifests
```

### Architecture

[Figure: Surf architecture diagram (SVG)]

### RAG Pipeline

The RAG tool (`api/src/rag/tools.py`) implements a multi-strategy search pipeline:

1. **Primary hybrid search** — BM25 + vector (text-embedding-3-large) with domain-scoped OData filters

2. **Broadened filter fallback** — relaxes non-identity filters when primary returns too few results

3. **Keyword-only rescue** — drops vector search entirely for edge cases where embeddings miss
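
The three strategies form a fallback chain. A minimal sketch, assuming a search callable that takes a query, a filter mapping, and a vector on/off flag; `SearchRequest`, `IDENTITY_FILTERS`, and the `min_results` threshold are hypothetical, and the real tool builds OData filter strings for Azure AI Search rather than Python dicts:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchRequest:
    query: str
    filters: dict[str, str]
    use_vector: bool = True  # False means BM25 keyword search only


SearchFn = Callable[[SearchRequest], list[dict]]

# Assumption: filters that enforce domain isolation and must never be relaxed.
IDENTITY_FILTERS = {"domain", "content_source"}


def search_with_fallback(
    search: SearchFn, query: str, filters: dict[str, str], min_results: int = 3
) -> tuple[str, list[dict]]:
    # 1. Primary hybrid search with the full domain-scoped filter set.
    results = search(SearchRequest(query, dict(filters)))
    if len(results) >= min_results:
        return "primary", results

    # 2. Broadened fallback: keep identity filters, drop everything else.
    identity_only = {k: v for k, v in filters.items() if k in IDENTITY_FILTERS}
    results = search(SearchRequest(query, identity_only))
    if len(results) >= min_results:
        return "broadened", results

    # 3. Keyword-only rescue: same identity filters, vector search disabled.
    return "keyword_only", search(SearchRequest(query, identity_only, use_vector=False))
```

Keeping the identity filters in every step is what preserves domain isolation even when the search is widened.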

Additional pipeline features:

- **LLM query rewriting** — rewrites conversational questions into keyword-rich search queries

- **Chunk merging** — consecutive chunks from the same document are merged to give the LLM complete context

- **Score normalisation** — normalises across BM25 and RRF score scales

- **Quality gate** — post-response validation catches infrastructure errors, skipped searches, ignored results, and missing sources (`api/src/rag/quality_gate.py`)

- **Source recovery** — extracts and deduplicates source references from raw agent output (`api/src/agents/_output.py`)

- **Proofreading pass** — a fast Haiku model fixes generation artefacts before final delivery (`api/src/agents/_proofread.py`)
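
Chunk merging can be sketched as a sort-and-sweep over (document, position) pairs. This assumes each search hit carries a document ID and a sequential chunk index; those field names, and keeping the highest score of a merged run, are illustrative choices rather than Surf's documented behaviour:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str   # assumed: identifier of the source document
    index: int    # assumed: position of the chunk within that document
    text: str
    score: float


def merge_consecutive_chunks(chunks: list[Chunk]) -> list[Chunk]:
    """Merge chunks that are adjacent in the same document into one passage."""
    ordered = sorted(chunks, key=lambda c: (c.doc_id, c.index))
    merged: list[Chunk] = []
    last_index = -1
    for chunk in ordered:
        prev = merged[-1] if merged else None
        if prev is not None and prev.doc_id == chunk.doc_id and chunk.index == last_index + 1:
            # Extend the running passage and keep its best relevance score.
            merged[-1] = Chunk(prev.doc_id, prev.index, prev.text + "\n" + chunk.text,
                               max(prev.score, chunk.score))
        else:
            merged.append(chunk)
        last_index = chunk.index
    # Present the most relevant passages first.
    return sorted(merged, key=lambda c: c.score, reverse=True)
```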

### API Reference

| Method   | Endpoint                                  | Description                                               |
| -------- | ----------------------------------------- | --------------------------------------------------------- |
| `POST`   | `/api/v1/chat`                            | Chat (returns a JSON response)                            |
| `POST`   | `/api/v1/chat/stream`                     | Chat via Server-Sent Events with real-time streaming      |
| `GET`    | `/api/v1/chat/{conversation_id}`          | Load conversation history                                 |
| `DELETE` | `/api/v1/chat/{conversation_id}`          | Delete a conversation                                     |
| `POST`   | `/api/v1/chat/{conversation_id}/feedback` | Record thumbs up/down + comment                           |
| `GET`    | `/api/v1/agents`                          | List available agents (filtered by caller's auth level)   |
| `POST`   | `/api/v1/auth/guest`                      | Issue a guest access token                                |
| `GET`    | `/api/v1/me`                              | User profile (JWT claims + Graph API enrichment)          |
| `GET`    | `/api/v1/me/photo`                        | User profile photo (via Graph API OBO)                    |
| `GET`    | `/api/v1/conversations`                   | List conversations for the authenticated user             |
| `GET`    | `/api/v1/health`                          | Health check (supports `?deep=true` for component checks) |
| `GET`    | `/api/v1/admin/`                          | Dev-only conversation browser dashboard                   |

#### SSE Event Protocol

```
phase(thinking) → agent(name) → phase(generating) → delta* → phase(verifying) →
confidence → verification → usage → done → [DONE]
```

- `:keepalive` comments every 5 seconds

- `phase(waiting)` after 10 seconds of no output (e.g. during upstream 429 retry)

- `debug` events with RAG search details (dev mode + `X-Surf-Debug` header)

- `error` events with structured codes for client-side handling
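
A client consumes this stream with ordinary SSE framing: `field: value` lines, a blank line ending each event, and lines starting with `:` treated as comments (which is how `:keepalive` is skipped). The sketch below assumes the event name travels in the `event:` field, that `delta` payloads are JSON with a `content` key, and that `[DONE]` arrives as a `data:` line; verify these against `api/src/services/streaming.py` before relying on them.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carried data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, e.g. the ":keepalive" heartbeat
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    # Per the SSE spec, an event not terminated by a blank line is discarded.


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate streamed answer text until the [DONE] sentinel (assumed format)."""
    parts = []
    for event, data in parse_sse(lines):
        if data == "[DONE]":
            break
        if event == "delta":
            parts.append(json.loads(data).get("content", ""))  # payload key assumed
        elif event == "error":
            raise RuntimeError(data)
    return "".join(parts)
```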

#### PDF Attachments

The chat endpoint accepts PDF file attachments with tiered processing (`api/src/orchestrator/pdf.py`):

- **Tier 1 (direct vision)**: PDFs up to 30 pages are sent as native document content blocks

- **Tier 2 (text extraction)**: Larger PDFs get text extracted and sent as text blocks

- Size limit: 100 MB with decompression bomb protection
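
The tier decision reduces to a page-count threshold behind a size check. A minimal sketch, assuming "100 MB" means 100 MiB and that the page count has already been read (for example with PyMuPDF, which the ingestion connectors use); `choose_pdf_tier` and `PdfTier` are hypothetical names, and decompression-bomb protection is out of scope here:

```python
from enum import Enum

MAX_DIRECT_PAGES = 30                 # Tier 1 page limit stated above
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" interpreted as MiB


class PdfTier(Enum):
    DIRECT_VISION = 1    # send the PDF as a native document content block
    TEXT_EXTRACTION = 2  # extract text and send it as text blocks


def choose_pdf_tier(page_count: int, size_bytes: int) -> PdfTier:
    """Reject oversized uploads, then pick a processing tier by page count."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"PDF exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    if page_count <= MAX_DIRECT_PAGES:
        return PdfTier.DIRECT_VISION
    return PdfTier.TEXT_EXTRACTION
```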

### Security Model

Surf implements defence-in-depth. The full model is documented in [`docs/security-model.md`](./docs/security-model.md).

| Layer                 | Mechanism                                                                              | Location                                                                     |
| --------------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **Authentication**    | Entra ID (RS256 JWKS) + guest tokens (HS256 HMAC) + dev bypass                         | `api/src/middleware/auth.py`                                                 |
| **Authorisation**     | 3-tier AuthLevel enum; agent graphs filtered per auth level                            | `api/src/agents/_base.py`, `api/src/orchestrator/builder.py`                 |
| **Rate limiting**     | Per-user limits on every endpoint (slowapi)                                            | `api/src/middleware/rate_limit.py`                                           |
| **Input validation**  | Message length cap (10K chars), control character stripping, body size limits          | `api/src/middleware/input_validation.py`, `api/src/middleware/body_limit.py` |
| **Prompt injection**  | Domain-isolated RAG, structured JSON enforcement, quality gate, source-pollution guard | `api/src/rag/tools.py`, `api/src/services/streaming.py`                      |
| **Production guards** | App refuses to start with auth disabled, debug on, wildcard CORS, or no Postgres SSL   | `api/src/main.py`                                                            |
| **Data isolation**    | All queries scoped to `user_id`; CASCADE deletes; conversation TTL expiry              | `api/src/services/conversation.py`                                           |
| **Secret management** | Key Vault for runtime secrets; managed identity for Azure services; OIDC for CI/CD     | `infra/main.bicep`                                                           |
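
The input-validation row can be illustrated with a small normaliser. This sketch assumes the cap is exactly 10,000 characters, that newline, carriage return, and tab are preserved, and that the length check runs after stripping; `sanitize_message` is a hypothetical name, not the function in `input_validation.py`:

```python
import unicodedata

MAX_MESSAGE_CHARS = 10_000            # assumption: "10K chars" read as 10,000
ALLOWED_CONTROL = {"\n", "\r", "\t"}  # assumption: ordinary whitespace controls are kept


def sanitize_message(message: str) -> str:
    """Strip Unicode control characters (category Cc), then enforce the length cap."""
    cleaned = "".join(
        ch for ch in message
        if ch in ALLOWED_CONTROL or unicodedata.category(ch) != "Cc"
    )
    if len(cleaned) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    return cleaned
```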

Security tests in `api/tests/security/` cover JWT bypass attempts, input injection vectors, and conversation isolation.

### Observability

| Signal          | Backend                                         | Detail                                                                                |
| --------------- | ----------------------------------------------- | ------------------------------------------------------------------------------------- |
| **Traces**      | OpenTelemetry → Azure Monitor or OTLP collector | Spans across routes, agent handoffs, RAG search, persistence                          |
| **Metrics**     | OTel histograms + counters                      | Chat duration, token usage (in/out per agent), quality gate triggers, rate limit hits |
| **LLM tracing** | Langfuse v3                                     | Per-call tracing with cost tracking; local dev stack included in `docker-compose.yml` |
| **Dashboards**  | Application Insights workbook                   | Pre-built telemetry workbook in `infra/workbooks/api-telemetry.json`                  |
| **Alerts**      | Azure metric alerts                             | Container restart, 5xx rate, CPU threshold (all in `infra/main.bicep`)                |

Telemetry configuration: `api/src/middleware/telemetry.py`. Langfuse integration: `api/src/middleware/langfuse_utils.py`.

### Infrastructure

Surf's Azure infrastructure is defined in a single `infra/main.bicep` orchestrator (1,200+ lines) using [Azure Verified Modules](https://azure.github.io/Azure-Verified-Modules/):

| Resource             | Module                               | Purpose                                              |
| -------------------- | ------------------------------------ | ---------------------------------------------------- |
| Log Analytics        | `avm/operational-insights/workspace` | OpenTelemetry traces + structured logs               |
| Application Insights | `modules/application-insights.bicep` | APM, telemetry workbook                              |
| Managed Identity     | `avm/managed-identity`               | App identity + CI identity (WIF)                     |
| Azure OpenAI         | `avm/cognitive-services/account`     | text-embedding-3-large (ingestion only)              |
| Azure AI Search      | `avm/search/search-service`          | Hybrid BM25 + vector retrieval                       |
| Key Vault            | `avm/key-vault/vault`                | Secrets (API keys, client secrets, guest token HMAC) |
| VNet + NSGs          | `avm/network/virtual-network`        | Private networking with subnet isolation             |
| Private DNS Zones    | `avm/network/private-dns-zone`       | DNS for Search, Storage, OpenAI private endpoints    |
| Storage              | `avm/storage/storage-account`        | Document blob storage for ingestion                  |
| Container Registry   | `avm/container-registry`             | Container image hosting                              |
| Container Apps       | Native Bicep resource                | API (0-3 replicas), web (nginx), ingestion (0-1)     |
| Metric Alerts        | `avm/insights/metric-alert`          | Restart, 5xx, and CPU alerts                         |

Three environments: `dev.bicepparam`, `staging.bicepparam`, `prod.bicepparam`.

### CI/CD

Both GitHub Actions and GitLab CI/CD pipelines are maintained:

| Pipeline      | GitHub Actions                       | GitLab CI                     | Trigger                         |
| ------------- | ------------------------------------ | ----------------------------- | ------------------------------- |
| **API**       | `.github/workflows/api-ci.yml`       | `.gitlab/ci/api-ci.yml`       | Push to `main` (`api/**`)       |
| **Web**       | `.github/workflows/web-ci.yml`       | `.gitlab/ci/web-ci.yml`       | Push to `main` (`web/**`)       |
| **Ingestion** | `.github/workflows/ingestion-ci.yml` | `.gitlab/ci/ingestion-ci.yml` | Push to `main` (`ingestion/**`) |
| **Infra**     | `.github/workflows/infra-deploy.yml` | `.gitlab/ci/infra-deploy.yml` | Push to `main` (`infra/**`)     |
| **PR Checks** | `.github/workflows/pr-checks.yml`    | `.gitlab/ci/pr-checks.yml`    | Pull/merge request              |

Key properties:

- **Zero stored secrets** — GitHub uses OIDC federation; GitLab uses Workload Identity Federation via a dedicated CI managed identity provisioned in Bicep

- **Path-filtered** — only relevant pipelines run per commit

- **Security scanning** — Gitleaks secret scanning, pip-audit dependency auditing

- **Docker builds** with BuildKit and multi-platform support

### Ingestion Pipeline

The ingestion service (`ingestion/`) transforms raw documents into searchable index entries:

| Stage               | Description                                                                       |
| ------------------- | --------------------------------------------------------------------------------- |
| **Connectors**      | PDF (PyMuPDF), DOCX (python-docx), TXT, CSV parsers (`ingestion/src/connectors/`) |
| **SharePoint sync** | Graph API integration for syncing files and pages to blob storage                 |
| **Chunking**        | Token-aware text splitting with tiktoken                                          |
| **Embedding**       | Azure OpenAI text-embedding-3-large via managed identity                          |
| **Indexing**        | Azure AI Search with hybrid (BM25 + vector) index schema                          |
| **Scheduling**      | Hourly indexer runs via Azure AI Search indexer pipeline                          |
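
Token-aware chunking is typically a sliding window over token IDs. A minimal sketch, with the encoder injected so it runs without tiktoken; the 512-token window and 64-token overlap are placeholder values, not Surf's configuration:

```python
from typing import Callable, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], list[int]],
    decode: Callable[[Sequence[int]], str],
    max_tokens: int = 512,  # placeholder window size
    overlap: int = 64,      # placeholder overlap between consecutive windows
) -> list[str]:
    """Split text into windows of at most `max_tokens` tokens, overlapping by `overlap`."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = encode(text)
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

With tiktoken installed, `enc = tiktoken.get_encoding("cl100k_base")` followed by `chunk_by_tokens(text, enc.encode, enc.decode)` would split on real model tokens. Overlap keeps context that would otherwise be cut at a window boundary, at the cost of some duplicated tokens in the index.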

### Testing

| Suite           | Location                   | What it covers                                                                                 |
| --------------- | -------------------------- | ---------------------------------------------------------------------------------------------- |
| **Unit**        | `api/tests/unit/`          | 28 modules: agents, routes, middleware, RAG tool, config, output parsing, telemetry, Langfuse  |
| **Security**    | `api/tests/security/`      | JWT bypass, prompt injection, conversation isolation                                           |
| **Integration** | `api/tests/integration/`   | Multi-turn conversation flows against real Postgres                                            |
| **Eval**        | `api/tests/eval/`          | LLM-judged response quality with dataset-driven parametrisation and weighted rubric scoring    |
| **Load**        | `api/tests/load/`          | Locust load testing (`locustfile.py`)                                                          |
| **Smoke**       | `web/playwright.config.ts` | Playwright browser smoke tests                                                                 |
| **Ingestion**   | `ingestion/tests/`         | Connector and pipeline tests                                                                   |

Run with: `just test` (unit + security), `just test-integration`, `just eval`, `just smoke`.
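
Weighted rubric scoring, as mentioned for the eval suite, combines per-criterion judge scores into a single number. A minimal sketch; the criteria names and weights in the example are hypothetical, and Surf's rubric may normalise scores differently:

```python
def weighted_rubric_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion judge scores."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"judge did not score: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[criterion] * w for criterion, w in weights.items()) / total_weight


# Hypothetical criteria: groundedness counts double.
print(weighted_rubric_score(
    {"groundedness": 1.0, "relevance": 0.5, "tone": 1.0},
    {"groundedness": 2, "relevance": 1, "tone": 1},
))  # → 0.875
```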

---

## Development

| Command                 | Description                                                                   |
| ----------------------- | ----------------------------------------------------------------------------- |
| `just dev`              | Run API with hot reload (port 8090); auto-starts Postgres and runs migrations |
| `just devui`            | Launch DevUI: interactive agent chat with tool call tracing (port 8091)       |
| `just web`              | Run web frontend (port 3000)                                                  |
| `just desktop`          | Run Tauri desktop app                                                         |
| `just test`             | Run unit + security tests                                                     |
| `just test-integration` | Run integration tests against real Postgres                                   |
| `just eval`             | Run LLM-judged eval suite                                                     |
| `just smoke`            | Run Playwright smoke tests                                                    |
| `just lint`             | Lint all Python code (ruff)                                                   |
| `just typecheck`        | Type-check all Python code (pyright)                                          |
| `just format`           | Format all Python code                                                        |
| `just audit`            | Run pip-audit security scanning                                               |
| `just otel`             | Start OpenTelemetry collector for local telemetry                             |
| `just langfuse`         | Start local Langfuse trace viewer at http://localhost:3100                    |
| `just admin`            | Open the dev admin dashboard                                                  |
| `just ask "question"`   | Ask the dev agent about the codebase                                          |
| `just ask-repl`         | Start interactive dev agent session                                           |
| `just setup-dev`        | Deploy dev Azure resources + generate .env                                    |
| `just teardown-dev`     | Delete dev Azure resources                                                    |
| `just deploy`           | Deploy API + web containers to Azure                                          |
| `just deploy-all`       | Deploy infrastructure + all containers                                        |

---

## Links

|                     |                                                          |
| ------------------- | -------------------------------------------------------- |
| **Security Model**  | [docs/security-model.md](./docs/security-model.md)       |
| **Desktop App**     | [docs/tauri-desktop-app.md](./docs/tauri-desktop-app.md) |
| **Load Testing**    | [api/tests/load/README.md](./api/tests/load/README.md)   |
| **Contributing**    | [CONTRIBUTING.md](./CONTRIBUTING.md)                     |
| **Code of Conduct** | [CODE_OF_CONDUCT.md](./CODE_OF_CONDUCT.md)               |
| **Security Policy** | [SECURITY.md](./SECURITY.md)                             |

### Tech Stack

| Layer             | Technology                                                                             |
| ----------------- | -------------------------------------------------------------------------------------- |
| **API**           | Python 3.12, FastAPI 0.115+, Pydantic 2, agent-framework                               |
| **LLM**           | Anthropic Claude (Haiku routing, Sonnet specialist); direct API or Azure AI Foundry    |
| **RAG**           | Azure AI Search (hybrid BM25 + vector), Azure OpenAI text-embedding-3-large            |
| **Database**      | PostgreSQL 17 with Alembic migrations                                                  |
| **Web**           | React 19, Vite 7, TailwindCSS 4, TypeScript strict                                     |
| **Desktop**       | Tauri 2 (Rust shell + shared web frontend)                                             |
| **Mobile**        | React Native + Expo 54, NativeWind                                                     |
| **Shared UI**     | [surf-kit](https://github.com/barney-w/surf-kit) (hooks, theme, icons, agent protocol) |
| **Auth**          | Microsoft Entra ID (JWKS) + HMAC guest tokens + MSAL                                   |
| **Observability** | OpenTelemetry, Azure Monitor, Langfuse v3                                              |
| **Infra**         | Bicep (Azure Verified Modules), Container Apps, VNet, Key Vault                        |
| **CI/CD**         | GitHub Actions + GitLab CI (OIDC / WIF, zero stored secrets)                           |
| **Testing**       | pytest, Playwright, Locust, LLM eval judge                                             |
| **Quality**       | ruff (lint + format), pyright (strict types), pip-audit, Gitleaks                      |

---

[Apache-2.0](./LICENSE)