An open API service indexing awesome lists of open source software.

https://github.com/vamsiramakrishnan/gemini-rs

Full Rust SDK for the Gemini Multimodal Live API — wire protocol, agent runtime, and fluent DX in three layered crates.
https://github.com/vamsiramakrishnan/gemini-rs

adk agent-framework async-rust function-calling gemini gemini-api google-ai llm multimodal real-time rust tokio vertex-ai voice-agents websocket

Last synced: 2 months ago
JSON representation

Full Rust SDK for the Gemini Multimodal Live API — wire protocol, agent runtime, and fluent DX in three layered crates.

Awesome Lists containing this project

README

          

# gemini-rs

> Full Rust SDK for the Gemini Multimodal Live API -- wire protocol, agent runtime, and fluent DX in three layered crates.

[![CI](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/ci.yml)
[![Docs](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/docs.yml/badge.svg)](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/docs.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![crates.io](https://img.shields.io/crates/v/gemini-live.svg)](https://crates.io/crates/gemini-live)
[![Rust](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org)

---

## Why gemini-rs?

Google's Gemini Multimodal Live API enables full-duplex, real-time voice and
text conversations with tool calling, streaming audio, and mid-session
instruction updates. Building on it raw means wrestling with WebSocket frame
parsing, binary/text codec differences between Google AI and Vertex AI,
authentication token management, voice activity detection, barge-in handling,
and turn lifecycle -- before you write a single line of agent logic.

**gemini-rs** eliminates that friction. It gives you a layered Rust SDK where
each crate adds exactly the abstraction you need:

- **Wire-level access** for custom transports, proxies, or non-standard
deployments (`gemini-live`).
- **Agent runtime** with typed state, phase machines, tool dispatch, text agent
combinators, and a three-lane processor architecture (`gemini-adk`).
- **Fluent builder API** where a production voice agent is 20 lines of
declarative Rust, not 200 lines of boilerplate (`gemini-adk-fluent`).

Every layer is independently usable. Pick the altitude that fits your problem.

### Raw WebSocket vs. Fluent API

Raw WebSocket (L0 only)Fluent API (L2)

```rust
// Connect, subscribe, send, match events,
// handle tool calls, manage turns, track
// state, parse audio frames ...
let session = quick_connect(
"KEY", "gemini-2.0-flash-live-001"
).await?;
session.send_text("Hello").await?;
let mut events = session.subscribe();
while let Ok(event) = events.recv().await {
match event {
SessionEvent::Audio(data) => {
/* decode, buffer, play */
}
SessionEvent::TextDelta(t) => {
print!("{t}");
}
SessionEvent::ToolCall(calls) => {
// dispatch, build responses,
// send back ...
}
SessionEvent::TurnComplete => break,
_ => {}
}
}
```

```rust
let handle = Live::builder()
.instruction("You are a helpful assistant.")
.greeting("Say hello to the user.")
.on_audio(|data| speaker.send(data))
.on_text(|t| print!("{t}"))
.on_tool_call(|calls, state| async move {
// auto-dispatched with .tools()
None
})
.connect_google_ai("KEY")
.await?;

handle.send_text("Hello").await?;
```

---

## Architecture

```
+----------------------------------------------------------------------+
| gemini-adk-fluent (L2 -- Fluent DX) |
| |
| Live::builder() . AgentBuilder . S.C.T.P.M.A operators |
| PhaseBuilder . WatchBuilder . Temporal patterns |
+----------------------------------------------------------------------+
| gemini-adk (L1 -- Agent Runtime) |
| |
| LiveSessionBuilder . LiveHandle . Three-lane processor |
| State (prefix-scoped) . PhaseMachine . ToolDispatcher |
| TextAgent combinators . Extractors . Watchers . Telemetry |
| LlmAgent . Runner . SessionService . MCP . A2A |
+----------------------------------------------------------------------+
| gemini-live (L0 -- Wire Protocol) |
| |
| Transport (WebSocket + Mock) . Codec (JSON) . Auth providers |
| SessionHandle . Protocol types . VAD . Jitter buffer |
| Telemetry (OTel + Prometheus) . REST APIs (feature-gated) |
+----------------------------------------------------------------------+
```

Each layer depends only on the one below it. Application code imports from the
highest layer it needs (`gemini_adk_fluent::prelude::*` re-exports all three).

---

## Core Concepts & How They Interplay

A gemini-rs voice session is built from six core concepts that work together.
This section shows what each one does and how they connect.

```
+------------------+
| Live::builder | (L2 Fluent API)
+--------+---------+
| configures
+-----------+-----------+-----------+-----------+
| | | | |
+----v---+ +----v----+ +--v---+ +----v----+ +--v--------+
| Phases | |Extractors| | Tools | |Watchers | | Telemetry |
+----+---+ +----+----+ +--+---+ +----+----+ +-----+-----+
| | | | |
+-----+-----+----+----+-----+-----+ |
| | | |
+-----v----------v----------v-----+ +-----v-----+
| State | | Signals & |
| (prefix-scoped, concurrent) |<-------+ Counters |
+---------------------------------+ +-----------+
```

### 1. State -- The Shared Spine

Everything reads from and writes to `State`. It is the single source of truth
for a session -- a concurrent, typed key-value store with prefix-scoped
namespaces.

```
State
|
+-- app:caller_name = "Alice" (application state)
+-- session:turn_count = 5 (auto-tracked by SessionSignals)
+-- session:total_token_count = 1284 (auto-tracked from UsageMetadata)
+-- derived:risk_level = "high" (computed variable, read-only)
+-- turn:transcript = "I need help" (cleared each turn)
+-- bg:verification_status = "pending" (background agent result)
```

**Why it matters:** Phase transitions check state. Extractors write to state.
Watchers fire when state changes. Computed variables derive from state.
Telemetry auto-populates state. Everything converges here.

### 2. Phases -- Conversation Structure

Phases define the *shape* of a conversation: what the model should do, what
tools are available, and when to move on.

```
[greeting] ---> [identify_caller] ---> [handle_request] ---> [farewell]
| | | |
instruction: instruction: instruction: instruction:
"Welcome..." "Get name..." "Help with..." "Say goodbye"
| | |
tools: [] tools: [lookup] tools: [search, calc]
| | |
transition: transition: transition:
caller_name request_type resolved == true
is_some() is_some()
```

Each phase declares:
- **Instruction**: what the model should do (static or state-driven dynamic)
- **Tools**: which tools are available in this phase
- **Transitions**: state predicates that trigger moves to the next phase
- **Guards**: predicates that must be true before entering a phase
- **Needs**: state keys still required (drives navigation context)
- **Lifecycle hooks**: `on_enter` / `on_exit` for side effects

Phases don't micromanage the model. They set guardrails -- the LLM naturally
asks follow-up questions until the transition predicate becomes true.

### 3. Extractors -- Structured Data from Conversation

Extractors run out-of-band LLM calls to pull structured data from the
conversation transcript and write it into State.

```
Conversation transcript OOB LLM call State
+-----------------------+ +---------------+ +------------------+
| "Hi, I'm Alice from | --> | Extract with | --> | caller_name: |
| Acme Corp, I need | | JSON Schema | | "Alice" |
| help with billing." | +---------------+ | caller_org: |
+-----------------------+ | "Acme Corp" |
| request_type: |
| "billing" |
+------------------+
|
triggers phase
transition!
```

**Extraction triggers** control *when* extractors fire:

| Trigger | When it fires | Use case |
|---------|--------------|----------|
| `EveryTurn` | After every TurnComplete | Default, high-frequency extraction |
| `Interval(n)` | Every N turns | Reduce LLM costs for slow-changing data |
| `AfterToolCall` | After tool dispatch completes | Extract from tool results |
| `OnPhaseChange` | When phase transitions fire | Re-extract on context shift |

### 4. Watchers & Temporal Patterns -- Reactive State

Watchers observe state changes and fire callbacks. Temporal patterns detect
conditions that persist over time or turns.

```
State change: app:score = 0.85 --> 0.95
|
+-------v--------+
| Watcher: |
| crossed_above |
| threshold=0.9 |
+-------+--------+
|
fires callback:
state.set("alert", true)

Condition held for 30s: 3 consecutive turns:
+-------------------------+ +-------------------------+
| when_sustained: | | when_turns: |
| confused == true | | repeating == true |
| for 30 seconds | | for 3 turns |
| --> offer help | | --> break loop |
+-------------------------+ +-------------------------+
```

### 5. Tools -- Model Actions

Tools give the model the ability to take actions. gemini-rs supports typed
tools (auto-schema from Rust structs), simple tools (raw JSON), built-in
tools (Google Search, code execution), and agent-as-tool (text agent pipelines
callable by the live model).

```
Model decides to call tool
|
+--------v---------+
| ToolDispatcher | Routes by function name
+--+-----+-----+---+
| | |
+--v-+ +-v--+ +v---------+
|get_| |calc| |verify_ |
|wx | |pay | |identity |
+----+ +----+ +----------+
Simple Typed AgentTool
Tool Tool (text agent
pipeline)

Background tools: model continues talking
while the tool executes asynchronously.
```

**Background tool execution** eliminates dead air in voice sessions. Mark
tools as background and the model receives a "processing" acknowledgment
immediately, continuing the conversation while the tool runs:

```rust
Live::builder()
.tools(dispatcher)
.tool_background("search_kb") // runs async, no dead air
```

### 6. Telemetry -- Observability Pipeline

Telemetry flows through two complementary systems, both running on the
telemetry lane (off the hot path):

```
SessionEvent stream
|
+-----v--------------+ +------------------+
| SessionSignals | | SessionTelemetry |
| (State keys) | | (Atomic counters)|
+-----+---------------+ +--------+---------+
| |
v v
session:turn_count audio_chunks_out: 1482
session:total_token_count avg_latency_ms: 340
session:is_speaking interruptions: 3
session:silence_ms total_token_count: 5280
| |
v v
Available to phases, snapshot() --> JSON
watchers, extractors, for devtools UI
transition guards
```

**SessionSignals** writes to State -- so phases, watchers, and extractors can
react to session-level metrics (e.g., transition after N turns, alert when
tokens exceed budget).

**SessionTelemetry** tracks lock-free atomic counters (~1ns per operation) for
performance metrics: audio throughput, response latency (min/avg/max via CAS),
turn duration, token usage, and interruption counts.

**UsageMetadata** from the Gemini API is automatically tracked at all layers:
- L0 emits `SessionEvent::Usage(UsageMetadata)` with full token breakdowns
- L1 records in both SessionSignals (state keys) and SessionTelemetry (atomics)
- L2 exposes `.on_usage(|metadata| ...)` callback for real-time observation

### How They Work Together

Here's the flow for a single model turn in a phased conversation:

```
User speaks: "I'm Alice from Acme Corp"
|
[1] v Fast lane: on_audio, on_input_transcript (sync, <1ms)
|
[2] v Model responds, turn completes
|
[3] v Control lane: TranscriptBuffer records the turn
|
[4] v Extractors run (OOB LLM call)
| --> writes caller_name="Alice", caller_org="Acme Corp" to State
|
[5] v Watchers fire on state changes
| --> crossed_above, became_true, changed_to callbacks
|
[6] v Computed variables recompute
| --> derived:risk_level updates based on new state
|
[7] v Phase machine evaluates transitions
| --> caller_name.is_some() == true
| --> transition: identify_caller --> handle_request
|
[8] v Phase on_exit / on_enter hooks fire
| --> instruction updated, navigation context regenerated
|
[9] v Telemetry lane: SessionSignals + SessionTelemetry update
--> session:turn_count++, latency recorded, tokens tracked
```

---

## Quick Start

### Google AI (API Key)

```rust
use gemini_adk_fluent::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box> {
let handle = Live::builder()
.model(GeminiModel::Gemini2_0FlashLive)
.instruction("You are a friendly assistant.")
.on_text(|t| print!("{t}"))
.on_turn_complete(|| async { println!("\n---") })
.connect_google_ai(std::env::var("GEMINI_API_KEY")?)
.await?;

handle.send_text("What is the speed of light?").await?;
tokio::signal::ctrl_c().await?;
handle.disconnect().await?;
Ok(())
}
```

### Vertex AI

```rust
let handle = Live::builder()
.model(GeminiModel::Gemini2_0FlashLive)
.voice(Voice::Kore)
.instruction("You are a customer support agent.")
.on_audio(|data| playback_tx.send(data.clone()).ok())
.on_text(|t| print!("{t}"))
.connect_vertex("my-project", "us-central1", access_token)
.await?;
```

### Wire Level Only (L0)

```rust
use gemini_live::prelude::*;

let session = gemini_live::quick_connect(
"API_KEY", "gemini-2.0-flash-live-001"
).await?;
session.send_text("What is the speed of light?").await?;

let mut events = session.subscribe();
while let Ok(event) = events.recv().await {
if let SessionEvent::TextDelta(ref text) = event {
print!("{text}");
}
if let SessionEvent::TurnComplete = event { break; }
}
```

---

## Crate Overview

| Crate | Layer | Description |
|-------|-------|-------------|
| [`gemini-live`](crates/gemini-live) | L0 -- Wire | Protocol types, WebSocket transport, auth providers, VAD, jitter buffer, REST APIs (feature-gated). Full Rust equivalent of Google's `@google/genai`. |
| [`gemini-adk`](crates/gemini-adk) | L1 -- Runtime | Agent runtime with state management, phase machines, tool dispatch, text agent combinators, extractors, watchers, telemetry. Full Rust equivalent of Google's `@google/adk`. |
| [`gemini-adk-fluent`](crates/gemini-adk-fluent) | L2 -- Fluent | `Live::builder()` API, `AgentBuilder`, S.C.T.P.M.A operator algebra, composition patterns, test utilities. |

---

## Features

### Voice / Live Sessions

Build full-duplex voice sessions with callbacks for every event type. Audio,
text, transcription, interruptions, and turn lifecycle are all handled.

```rust
let handle = Live::builder()
.model(GeminiModel::GeminiLive2_5FlashNativeAudio)
.voice(Voice::Puck)
.instruction("You are a weather assistant.")
.greeting("Greet the user and ask how you can help.")
.transcription(true, true) // input + output transcription
.thinking(1024) // enable thinking with token budget
.include_thoughts() // receive thought summaries
.affective_dialog(true) // emotionally expressive responses
.context_compression(4000, 2000) // auto-compress context window
.on_audio(|data| speaker.write(data))
.on_thought(|text| println!("[Thought] {text}"))
.on_input_transcript(|text, _final| println!("[User] {text}"))
.on_output_transcript(|text, _final| println!("[Agent] {text}"))
.on_interrupted(|| async { speaker.flush().await })
.on_turn_complete(|| async { println!("--- turn complete ---") })
.on_usage(|usage| {
if let Some(total) = usage.total_token_count {
println!("Tokens used: {total}");
}
})
.connect_vertex(project, location, token)
.await?;
```

**Available voices:** `Aoede`, `Charon`, `Fenrir`, `Kore`, `Puck` (default), or `Voice::Custom("name")`.

### Thinking (Gemini 2.5+)

The `gemini-2.5-flash-native-audio-preview-12-2025` model supports thinking
capabilities with dynamic thinking enabled by default. Control the thinking
budget and receive thought summaries in your session:

```rust
let handle = Live::builder()
.model(GeminiModel::Custom(
"models/gemini-2.5-flash-native-audio-preview-12-2025".into(),
))
.thinking(1024) // set thinking token budget (0 = disable)
.include_thoughts() // receive thought summaries via on_thought
.on_thought(|text| println!("[Thought] {text}"))
.on_text(|t| print!("{t}"))
.connect_google_ai(api_key)
.await?;
```

**How it works in the three-lane architecture:**

- `thinkingConfig` (`thinkingBudget`, `includeThoughts`) is sent in the setup
message's `generationConfig`
- When `includeThoughts` is true, thought parts arrive as `Part::Thought` in
`model_turn` content — emitted as `SessionEvent::Thought(String)`
- Thought events are routed to the **fast lane** and delivered via the
`on_thought` sync callback (< 1ms, no allocations)

**Platform support:** Google AI only. On Vertex AI, `thinkingConfig` is
automatically stripped from the setup message — no code changes needed.

### Tool Calling

Declare function tools with JSON Schema parameters. The SDK auto-dispatches
tool calls when you provide a `ToolDispatcher`, or you can handle them manually
in `on_tool_call`.

```rust
let handle = Live::builder()
.instruction("You can check the weather and do math.")
.on_tool_call(|calls, state| async move {
let responses: Vec = calls.iter().map(|call| {
let result = match call.name.as_str() {
"get_weather" => json!({"temp": 22, "condition": "sunny"}),
_ => json!({"error": "unknown tool"}),
};
FunctionResponse {
name: call.name.clone(),
response: result,
id: call.id.clone(),
scheduling: None,
}
}).collect();
Some(responses)
})
.connect_google_ai(api_key)
.await?;
```

Or use built-in tools directly:

```rust
Live::builder()
.google_search() // Google Search grounding
.code_execution() // Sandbox code execution
.url_context() // URL content retrieval
```

### State Management

A concurrent, type-safe `State` container with prefix-scoped namespaces,
atomic read-modify-write, delta tracking, and transparent derived fallbacks.

```rust
use gemini_adk::State;
use gemini_adk::state::StateKey;

// Typed keys eliminate typo bugs
const TURN_COUNT: StateKey = StateKey::new("session:turn_count");
const SENTIMENT: StateKey = StateKey::new("derived:sentiment");

let state = State::new();

// Prefix-scoped accessors
state.app().set("flag", true); // writes to "app:flag"
state.user().set("name", "Alice"); // writes to "user:name"
state.session().set("turn_count", 0u32); // writes to "session:turn_count"
state.turn().set("transcript", "hello"); // writes to "turn:transcript"

// Atomic read-modify-write
state.modify("session:turn_count", 0u32, |n| n + 1);

// Transparent derived fallback: get("risk") auto-checks "derived:risk"
state.set("derived:risk", 0.85);
let risk: Option = state.get("risk"); // returns Some(0.85)

// Delta tracking for transactional state
let tracked = state.with_delta_tracking();
tracked.set("temp:scratch", 42);
tracked.commit(); // merge into main store
// or: tracked.rollback();
```

**Prefix namespaces:**

| Prefix | Purpose | Lifetime |
|--------|---------|----------|
| `session:` | Auto-tracked signals (turn count, tokens, timing) | Session |
| `derived:` | Read-only computed variables | Session |
| `turn:` | Cleared each turn | Turn |
| `app:` | Application state | Session |
| `bg:` | Background task state | Session |
| `user:` | User-scoped state | Session |
| `temp:` | Scratch space | Explicit |

### Phase System

Declarative conversation phase management with guard-based transitions,
per-phase tool filtering, instruction composition, and async lifecycle callbacks.

```rust
let handle = Live::builder()
.phase("greeting")
.instruction("Welcome the user warmly.")
.prompt_on_enter(true)
.transition_with("identify", |s| {
s.get::("caller_name").is_some()
}, "when caller provides their name")
.done()
.phase("identify")
.instruction("Confirm the caller's identity.")
.needs(&["caller_name", "caller_org"])
.tools(vec!["lookup_contact".into()])
.transition_with("handle", |s| {
s.get::("verified").unwrap_or(false)
}, "when identity is verified")
.done()
.phase("handle")
.dynamic_instruction(|s| {
let topic: String = s.get("topic").unwrap_or_default();
format!("Help the caller with: {topic}")
})
.tools(vec!["search".into(), "calc".into()])
.transition_with("farewell", |s| {
s.get::("resolved").unwrap_or(false)
}, "when the request is resolved")
.done()
.phase("farewell")
.instruction("Say goodbye and provide a reference number.")
.terminal()
.done()
.initial_phase("greeting")
// Phase defaults inherited by all phases
.phase_defaults(|p| {
p.with_state(&["caller_name", "caller_org"])
.navigation() // inject phase navigation context
})
// Recommended: set persona once, steer via context injection
.steering_mode(SteeringMode::ContextInjection)
.connect_vertex(project, location, token)
.await?;
```

#### Steering Modes

Control how the SDK delivers phase instructions to the model. This is the most
impactful configuration choice for multi-phase apps:

| Mode | System Instruction | Phase Instructions | Best For |
|------|--------------------|--------------------|----------|
| `ContextInjection` | Set once at connect | Delivered as model-role context turns | Multi-phase apps with stable persona (**recommended**) |
| `InstructionUpdate` | Replaced on every transition | Baked into system instruction | Agents with radically different personas per phase |
| `Hybrid` | Replaced on transition | Modifiers as context turns | Persona shifts + per-turn steering |

```rust
// Recommended: base persona at connect, phase context injected per turn
Live::builder()
.instruction("You are a helpful assistant.")
.steering_mode(SteeringMode::ContextInjection)
```

#### Context Delivery Timing

Control when model-role context turns hit the wire:

| Mode | Behavior | Best For |
|------|----------|----------|
| `Immediate` (default) | Send as single batched frame during TurnComplete | Low-latency, text-only apps |
| `Deferred` | Queue until next user send (audio/text/video) | Voice apps — eliminates mid-silence frames |

```rust
// Voice app: flush context alongside user audio, not during silence
Live::builder()
.steering_mode(SteeringMode::ContextInjection)
.context_delivery(ContextDelivery::Deferred)
```

With `Deferred`, the `DeferredWriter` wraps the session writer and drains pending context before each `send_audio`/`send_text`/`send_video`. Context that requires a prompt (e.g. `prompt_on_enter`) is always sent immediately.

See the [Steering Modes guide](docs/user-guide/steering-modes.md) for the full
decision matrix, anti-patterns, and implementation details.

#### Phase Navigation Context

The `.navigation()` modifier injects a structured description of the current
phase graph into the model's instruction, giving it awareness of where it is,
what it still needs, and where it can go:

```
[Navigation]
Current phase: identify -- Confirm the caller's identity.
Previous: greeting (turn 2)
Still needed: caller_org
Possible next:
-> handle: when identity is verified
```

This is auto-generated from `.needs()`, `.transition_with()` descriptions, and
phase history. The model can use this to guide the conversation naturally.

### Extraction Pipeline

Run out-of-band LLM calls to extract structured data from the conversation
transcript. Schema-guided via `schemars::JsonSchema`.

```rust
use schemars::JsonSchema;

#[derive(Deserialize, Serialize, JsonSchema)]
struct CallerInfo {
caller_name: Option,
caller_org: Option,
request_type: Option,
}

let handle = Live::builder()
.instruction("You are a receptionist.")
// Extract every 2 turns instead of every turn (reduces LLM costs)
.extract_turns_triggered::(
flash_llm,
"Extract caller name, organization, and request type",
5, // transcript window size
ExtractionTrigger::Interval(2),
)
.on_extracted(|name, value| async move {
println!("Extracted {name}: {value}");
})
.connect_vertex(project, location, token)
.await?;

// Read latest extraction at any time
let info: Option = handle.extracted("CallerInfo");
```

Extractors automatically enable transcription and warm up the OOB LLM
connection at session start for fast first-extraction latency.

### State Watchers & Temporal Patterns

React to state changes and time-based conditions declaratively:

```rust
Live::builder()
// Fire when app:score crosses above 0.9
.watch("app:score")
.crossed_above(0.9)
.then(|_old, _new, state| async move {
state.set("high_score_alert", true);
})
// Fire when a boolean becomes true
.watch("app:escalated")
.became_true()
.blocking() // block turn processing until complete
.then(|_old, _new, _state| async move {
notify_supervisor().await;
})
// Fire when condition holds for 30 seconds continuously
.when_sustained("user_confused",
|s| s.get::("confused").unwrap_or(false),
Duration::from_secs(30),
|_state, writer| async move { /* offer help */ },
)
// Fire after 3 consecutive turns matching condition
.when_turns("stuck_in_loop",
|s| s.get::("repeating").unwrap_or(false),
3,
|_state, writer| async move { /* break loop */ },
)
```

### Computed (Derived) State

Register reactive computed variables that update when their dependencies change:

```rust
Live::builder()
.computed("risk_level", &["app:sentiment_score"], |state| {
let score: f64 = state.get("app:sentiment_score")?;
if score < 0.3 { Some(json!("high")) }
else { Some(json!("low")) }
})
// Read transparently: state.get("risk_level") auto-checks "derived:risk_level"
```

### Text Agent Combinators

Build complex request/response LLM pipelines that can be dispatched from
Live session hooks. These use standard `generate()` calls (not WebSocket
sessions), enabling background processing during a voice conversation.

| Combinator | Purpose |
|-----------|---------|
| `LlmTextAgent` | Core agent -- generate, tool dispatch, loop |
| `FnTextAgent` | Zero-cost state transform (no LLM call) |
| `SequentialTextAgent` | Run children in order, state flows forward |
| `ParallelTextAgent` | Run children concurrently via `tokio::spawn` |
| `LoopTextAgent` | Repeat until max iterations or predicate |
| `FallbackTextAgent` | Try each child, first success wins |
| `RouteTextAgent` | State-driven deterministic branching |
| `RaceTextAgent` | Run concurrently, first to finish wins |
| `TimeoutTextAgent` | Wrap an agent with a time limit |
| `MapOverTextAgent` | Iterate an agent over a list in state |
| `TapTextAgent` | Read-only observation (no mutation) |
| `DispatchTextAgent` | Fire-and-forget background tasks |
| `JoinTextAgent` | Wait for dispatched tasks |

Register text agents as tools the live model can call. The agent shares
the session's `State`, so mutations are visible to watchers and phase
transitions:

```rust
Live::builder()
.agent_tool("verify_identity", "Verify caller identity", verifier_agent)
.agent_tool("calc_payment", "Calculate payment plans", calc_pipeline)
```

### S.C.T.P.M.A Composition

Six operator namespaces for composing different aspects of agent configuration:

| Namespace | Operator | Purpose | Example |
|-----------|----------|---------|---------|
| `S::` | `>>` | State transforms | `S::set("key", val) >> S::rename("a", "b")` |
| `C::` | `+` | Context engineering | `C::last_n(5) + C::system_only()` |
| `T::` | `\|` | Tool composition | `T::function(search) \| T::google_search()` |
| `P::` | `+` | Prompt composition | `P::role("assistant") + P::task("summarize")` |
| `M::` | `\|` | Middleware composition | `M::log() \| M::rate_limit(10)` |
| `A::` | `+` | Artifact schemas | `A::produces(schema) + A::consumes(schema)` |

**Prompt composition example:**

```rust
use gemini_adk_fluent::prelude::*;

let prompt = P::role("a customer support agent for Acme Corp")
+ P::task("help customers with billing inquiries")
+ P::constraint("never reveal internal pricing formulas")
+ P::guidelines(vec![
"Be empathetic and professional",
"Confirm resolution before closing",
]);

let instruction = prompt.render();
```

### Callback Modes

Control-lane callbacks support two execution modes:

| Mode | Method suffix | Behavior |
|------|--------------|----------|
| **Blocking** | `.on_turn_complete()` | Awaited inline -- event loop waits |
| **Concurrent** | `.on_turn_complete_concurrent()` | Spawned as detached task -- fire and forget |

Use concurrent mode for logging, analytics, webhook dispatch, or background
agent triggering where you don't need ordering guarantees.

### REST APIs (Feature-Gated)

The L0 crate also provides feature-gated access to Gemini REST APIs beyond
the Live WebSocket connection:

```toml
[dependencies]
gemini-live = { version = "0.1", features = ["generate", "embed", "files"] }
# Or enable everything:
# gemini-live = { version = "0.1", features = ["all-apis"] }
```

| Feature | API |
|---------|-----|
| `generate` | Content generation (`generateContent`) |
| `embed` | Text embeddings |
| `files` | File upload and management |
| `models` | Model listing and info |
| `tokens` | Token counting |
| `caches` | Context caching |
| `tunings` | Fine-tuning jobs |
| `batches` | Batch prediction |
| `chats` | Multi-turn chat sessions |

---

## Three-Lane Processor Architecture

All Live session events are routed through a zero-copy dispatcher into three
independent lanes, each optimized for its latency profile:

```
SessionEvent (broadcast from L0)
|
+----+----+
| Router | Zero-work dispatcher -- NO state access on hot path
+--+--+--+-+
| | |
| | +------------------------------+
| +----------------+ |
| | |
+----v---------+ +-----v----------+ +---v--------------+
| Fast Lane | | Control Lane | | Telemetry Lane |
| (sync <1ms) | | (async) | | (own broadcast) |
+--------------+ +-------------- + +------------------+
| on_audio | | on_tool_call | | SessionSignals |
| on_text | | on_interrupted | | (State keys) |
| on_vad_* | | Phase trans. | | SessionTelemetry |
| on_input_ | | Extractors | | (AtomicU64) |
| transcript | | (concurrent) | | on_usage cb |
| on_output_ | | Watchers | | Debounced 100ms |
| transcript | | Computed state | | flush |
+--------------+ | Temporal ptns | +------------------+
| TranscriptBuf |
| (owned, no |
| mutex) |
+----------------+
```

**Design constraints:**
- Fast lane callbacks must be sync and complete in < 1ms (no allocations, no locks, no async)
- Control lane owns the `TranscriptBuffer` exclusively (no `Arc>`)
- Telemetry lane runs on its own broadcast receiver (never blocks the router)
- Extractors run concurrently via `futures::future::join_all`

---

## Examples

The `examples/` directory contains runnable examples organized by complexity.
Each demonstrates specific SDK features at the layer you need.

### Getting Started

```bash
# 1. Configure credentials
cp .env.example .env
# Edit .env: set GEMINI_API_KEY (Google AI) or GOOGLE_CLOUD_PROJECT + GOOGLE_CLOUD_LOCATION (Vertex AI)

# 2. Run a standalone example
cargo run -p text-chat # http://127.0.0.1:3001
cargo run -p voice-chat # http://127.0.0.1:3002
cargo run -p tool-calling # http://127.0.0.1:3003
cargo run -p transcription # http://127.0.0.1:3004

# 3. Run the multi-app Web UI (all apps + devtools panel)
cargo run -p gemini-adk-web # http://127.0.0.1:3000
```

### Standalone Examples

These run independently with their own Axum server and minimal UI.

| Example | Port | Layer | What You Learn |
|---------|------|-------|----------------|
| [`text-chat`](examples/text-chat) | 3001 | L0 | Wire protocol basics — connect, send text, receive streaming deltas |
| [`voice-chat`](examples/voice-chat) | 3002 | L0 | Bidirectional audio, voice selection, VAD events, transcription |
| [`tool-calling`](examples/tool-calling) | 3003 | L1 | `TypedTool` with auto-generated JSON Schema, `ToolDispatcher` routing |
| [`transcription`](examples/transcription) | 3004 | L0 | Every Gemini Live config option: VAD, activity handling, affective dialog, context compression, session resumption |
| [`agents`](examples/agents) | CLI | L1/L2 | Text agent combinators (`>>`, `\|`, `/`), `TypedTool`, copy-on-write builders |

### ADK Web UI (`gemini-adk-web`)

The Web UI bundles all apps below into a single Axum server with a shared
devtools panel showing real-time state, timeline, transcript, and telemetry.

#### Crawl (Beginner)

| App | What It Demonstrates | Key SDK Features |
|-----|---------------------|-----------------|
| **text-chat** | Minimal text-only session — no microphone needed | `Live::builder().text_only()`, text streaming |
| **voice-chat** | Native audio chat with real-time transcription | `Modality::Audio`, voice selection, input/output transcription |
| **tool-calling** | Three demo tools: weather, time, calculator | `FunctionDeclaration`, `on_tool_call`, `NonBlocking` behavior, `WhenIdle` scheduling |

#### Walk (Intermediate)

| App | What It Demonstrates | Key SDK Features |
|-----|---------------------|-----------------|
| **all-config** | Configuration playground — every Gemini Live option in one app | Dynamic tool creation, modality switching, Google Search, code execution, context compression |
| **guardrails** | Real-time policy monitoring with corrective injection | `RegexExtractor`, `.watch()` state reactions, `.instruction_amendment()`, PII/off-topic/sentiment detection |
| **playbook** | 6-phase customer support flow with state extraction | `.phase()` chains, `.transition_with()` guards, `.greeting()`, `.with_context()`, `RegexExtractor` |

#### Run (Advanced)

| App | What It Demonstrates | Key SDK Features |
|-----|---------------------|-----------------|
| **support-assistant** | Multi-agent handoff between billing and technical support | Dual state machines (10 phases), `.computed()` derived state, cross-agent transitions, telemetry |
| **call-screening** | Incoming call screening with sentiment analysis and smart routing | Phase machine, tool calling (`check_contact_list`, `check_calendar`, `take_message`, `transfer_call`, `block_caller`), `NonBlocking` tools |
| **clinic** | HIPAA-aware telehealth scheduling with clinical triage | 8 tools (`verify_patient`, `check_availability`, `book_appointment`, etc.), patient intake flow, department routing |
| **restaurant** | Restaurant reservation and ordering system | 6 tools (`check_availability`, `make_reservation`, `get_menu`, etc.), dietary handling, occasion tracking |
| **debt-collection** | FDCPA-compliant debt collection with compliance gates | `StateKey`, identity verification, payment negotiation, cease-and-desist handling, compliance watchers |

### Platform Support

All examples work with both **Google AI** (API key) and **Vertex AI** (project/location).
The SDK auto-strips unsupported features on Vertex AI — no code changes needed:

| Feature | Google AI | Vertex AI |
|---------|-----------|-----------|
| Async tool calling (`NonBlocking`, `WhenIdle`/`Silent`) | Supported | Stripped automatically |
| Thinking (`thinkingConfig`) | Supported | Stripped automatically |

---

## Common Errors & Solutions

### Vertex AI sends binary WebSocket frames

**Symptom:** `serde_json::from_str` fails on messages from Vertex AI.

**Cause:** Vertex AI sends Binary WebSocket frames, not Text frames (unlike
Google AI).

**Solution:** Already handled by `TungsteniteTransport::recv()`. If you build a
custom transport, handle both `Message::Text` and `Message::Binary`.

### Native audio model only supports AUDIO output modality

**Symptom:** Error when requesting `Modality::Text` with
`GeminiLive2_5FlashNativeAudio`.

**Solution:** Use `Modality::Audio` only, or switch to `Gemini2_0FlashLive`
which supports text output:

```rust
// Correct for native audio model:
config.response_modalities(vec![Modality::Audio])

// For text output, use the non-native model:
.model(GeminiModel::Gemini2_0FlashLive)
```

### Vertex AI endpoint URL

**Symptom:** Connection fails to `global-aiplatform.googleapis.com`.

**Solution:** Use `aiplatform.googleapis.com` (no `global-` prefix). The SDK
handles this automatically via the `Platform` enum.

### Tool declarations cannot be updated mid-session

**Symptom:** Attempting to add or remove tools after `connect()`.

**Cause:** The Gemini Live API does not support updating tool definitions after
session setup.

**Solution:** Declare all tools upfront. Use per-phase `tools_enabled` to
control which tools the model can call at any given point in the conversation.

### Extraction returns stale data

**Symptom:** `handle.extracted::(name)` returns the previous turn's data.

**Cause:** Extractors run asynchronously on the control lane after each turn
completes.

**Solution:** Use the `on_extracted` callback for real-time notifications, or
poll `handle.extracted()` after the turn-complete event.

### State key not found despite being set

**Symptom:** `state.get("risk")` returns `None` even though you called
`state.set("derived:risk", 0.85)`.

**Solution:** The derived fallback works correctly: `get("risk")` checks
`derived:risk` automatically. However, `get("app:risk")` does NOT trigger the
fallback -- prefixed keys are looked up exactly as specified.

### Session disconnects after inactivity

**Symptom:** Server sends `GoAway` and closes the connection.

**Solution:** Handle gracefully with `.on_go_away(|ttl| async move { ... })`.
Enable session resumption with `.session_resume(true)` for transparent reconnect
support.

### Context window fills up in long conversations

**Symptom:** Model responses degrade in quality after many turns.

**Solution:** Enable context window compression:

```rust
Live::builder()
.context_compression(4000, 2000) // trigger at 4k tokens, compress to 2k
```

---

## Development

### Prerequisites

| Requirement | Version | Purpose |
|------------|---------|---------|
| **Rust** | 1.75+ | Language toolchain ([install](https://rustup.rs/)) |
| **cargo** | (bundled) | Build system and package manager |
| **pkg-config** | any | Locates system libraries |
| **OpenSSL** | 1.1+ | TLS for WebSocket connections |
| **ALSA dev** (Linux) | any | Audio I/O for voice examples |

**Quick setup (Ubuntu/Debian):**

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Install system dependencies
sudo apt-get update
sudo apt-get install -y pkg-config libssl-dev libasound2-dev build-essential
```

**Quick setup (macOS):**

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# System deps (OpenSSL via Homebrew)
brew install openssl pkg-config
```

**Environment variables:**

```bash
# Google AI (API key auth)
export GEMINI_API_KEY="your-api-key"

# Vertex AI (service account auth)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"
```

### Build

```bash
cargo build --workspace
```

### Test

```bash
cargo test --workspace
```

### Lint

```bash
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all -- --check
```

### Run the Web UI

```bash
cd apps/gemini-adk-web
GEMINI_API_KEY="your-key" cargo run
# Open http://localhost:3000
```

### Generate documentation

```bash
cargo doc --workspace --no-deps --open
```

### Feature flags (gemini-live)

```bash
# Default: live + vad + tracing
cargo build -p gemini-live

# With REST APIs
cargo build -p gemini-live --features generate,embed,files

# Everything
cargo build -p gemini-live --features all-apis,metrics,opus
```

---

## Project Structure

```
gemini-rs/
crates/
gemini-live/ L0: Wire protocol, transport, types
gemini-adk/ L1: Agent runtime, state, phases, tools
gemini-adk-fluent/ L2: Fluent builder API, operators
examples/
text-chat/ Minimal text-only session (L0)
voice-chat/ Bidirectional audio chat (L0)
tool-calling/ TypedTool + ToolDispatcher (L1)
transcription/ Every Gemini Live config option (L0)
agents/ Text agent combinators (L1/L2)
INDEX.md Full example reference with per-app docs
apps/
gemini-adk-web/ Multi-app Web UI with devtools (L2)
src/apps/ 13 showcase apps (see examples/INDEX.md)
tools/
gemini-adk-transpiler/ Python ADK to Rust transpiler
Cargo.toml Workspace root
```

---

## License

Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for
details.