https://github.com/vamsiramakrishnan/gemini-rs

Full Rust SDK for the Gemini Multimodal Live API — wire protocol, agent runtime, and fluent DX in three layered crates.
adk agent-framework async-rust function-calling gemini gemini-api google-ai llm multimodal real-time rust tokio vertex-ai voice-agents websocket
Last synced: 3 months ago
Host: GitHub
URL: https://github.com/vamsiramakrishnan/gemini-rs
Owner: vamsiramakrishnan
License: mit
Created: 2026-03-01T07:06:43.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-17T12:39:33.000Z (4 months ago)
Last Synced: 2026-03-17T22:03:45.973Z (4 months ago)
Topics: adk, agent-framework, async-rust, function-calling, gemini, gemini-api, google-ai, llm, multimodal, real-time, rust, tokio, vertex-ai, voice-agents, websocket
Language: Rust
Homepage: https://crates.io/crates/rs-genai
Size: 1.8 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

README

# gemini-rs

> Full Rust SDK for the Gemini Multimodal Live API -- wire protocol, agent runtime, and fluent DX in three layered crates.

[![CI](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/ci.yml)

[![Docs](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/docs.yml/badge.svg)](https://github.com/vamsiramakrishnan/gemini-rs/actions/workflows/docs.yml)

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

[![crates.io](https://img.shields.io/crates/v/gemini-live.svg)](https://crates.io/crates/gemini-live)

[![Rust](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org)

---

## Why gemini-rs?

Google's Gemini Multimodal Live API enables full-duplex, real-time voice and
text conversations with tool calling, streaming audio, and mid-session
instruction updates. Building on it raw means wrestling with WebSocket frame
parsing, binary/text codec differences between Google AI and Vertex AI,
authentication token management, voice activity detection, barge-in handling,
and turn lifecycle -- before you write a single line of agent logic.

**gemini-rs** eliminates that friction. It gives you a layered Rust SDK where
each crate adds exactly the abstraction you need:

- **Wire-level access** for custom transports, proxies, or non-standard
  deployments (`gemini-live`).
- **Agent runtime** with typed state, phase machines, tool dispatch, text agent
  combinators, and a three-lane processor architecture (`gemini-adk`).
- **Fluent builder API** where a production voice agent is 20 lines of
  declarative Rust, not 200 lines of boilerplate (`gemini-adk-fluent`).

Every layer is independently usable. Pick the altitude that fits your problem.

### Raw WebSocket vs. Fluent API

The first block below uses the raw WebSocket layer (L0 only); the second shows the same session with the Fluent API (L2).

```rust

// Connect, subscribe, send, match events,

// handle tool calls, manage turns, track

// state, parse audio frames ...

let session = quick_connect(

    "KEY", "gemini-2.0-flash-live-001"

).await?;

session.send_text("Hello").await?;

let mut events = session.subscribe();

while let Ok(event) = events.recv().await {

    match event {

        SessionEvent::Audio(data) => {

            /* decode, buffer, play */

        }

        SessionEvent::TextDelta(t) => {

            print!("{t}");

        }

        SessionEvent::ToolCall(calls) => {

            // dispatch, build responses,

            // send back ...

        }

        SessionEvent::TurnComplete => break,

        _ => {}

    }

}

```

```rust

let handle = Live::builder()

    .instruction("You are a helpful assistant.")

    .greeting("Say hello to the user.")

    .on_audio(|data| speaker.send(data))

    .on_text(|t| print!("{t}"))

    .on_tool_call(|calls, state| async move {

        // auto-dispatched with .tools()

        None

    })

    .connect_google_ai("KEY")

    .await?;

handle.send_text("Hello").await?;

```

---

## Architecture

```

+----------------------------------------------------------------------+

|  gemini-adk-fluent  (L2 -- Fluent DX)                                    |

|                                                                      |

|  Live::builder()  .  AgentBuilder  .  S.C.T.P.M.A operators         |

|  PhaseBuilder  .  WatchBuilder  .  Temporal patterns                 |

+----------------------------------------------------------------------+

|  gemini-adk  (L1 -- Agent Runtime)                                       |

|                                                                      |

|  LiveSessionBuilder  .  LiveHandle  .  Three-lane processor          |

|  State (prefix-scoped)  .  PhaseMachine  .  ToolDispatcher           |

|  TextAgent combinators  .  Extractors  .  Watchers  .  Telemetry    |

|  LlmAgent  .  Runner  .  SessionService  .  MCP  .  A2A            |

+----------------------------------------------------------------------+

|  gemini-live  (L0 -- Wire Protocol)                                     |

|                                                                      |

|  Transport (WebSocket + Mock)  .  Codec (JSON)  .  Auth providers    |

|  SessionHandle  .  Protocol types  .  VAD  .  Jitter buffer         |

|  Telemetry (OTel + Prometheus)  .  REST APIs (feature-gated)         |

+----------------------------------------------------------------------+

```

Each layer depends only on the one below it. Application code imports from the
highest layer it needs (`gemini_adk_fluent::prelude::*` re-exports all three).

---

## Core Concepts & How They Interplay

A gemini-rs voice session is built from six core concepts that work together.
This section shows what each one does and how they connect.

```

                         +------------------+

                         |   Live::builder  |  (L2 Fluent API)

                         +--------+---------+

                                  |  configures

          +-----------+-----------+-----------+-----------+

          |           |           |           |           |

     +----v---+  +----v----+  +--v---+  +----v----+  +--v--------+

     | Phases |  |Extractors| | Tools |  |Watchers |  | Telemetry |

     +----+---+  +----+----+  +--+---+  +----+----+  +-----+-----+

          |           |          |           |              |

          +-----+-----+----+----+-----+-----+              |

                |          |          |                     |

          +-----v----------v----------v-----+        +-----v-----+

          |            State                |        | Signals & |

          |  (prefix-scoped, concurrent)    |<-------+ Counters  |

          +---------------------------------+        +-----------+

```

### 1. State -- The Shared Spine

Everything reads from and writes to `State`. It is the single source of truth
for a session -- a concurrent, typed key-value store with prefix-scoped
namespaces.

```

State

  |

  +-- app:caller_name = "Alice"          (application state)

  +-- session:turn_count = 5             (auto-tracked by SessionSignals)

  +-- session:total_token_count = 1284   (auto-tracked from UsageMetadata)

  +-- derived:risk_level = "high"        (computed variable, read-only)

  +-- turn:transcript = "I need help"    (cleared each turn)

  +-- bg:verification_status = "pending" (background agent result)

```

**Why it matters:** Phase transitions check state. Extractors write to state.
Watchers fire when state changes. Computed variables derive from state.
Telemetry auto-populates state. Everything converges here.
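To make the prefix idea concrete, here is a minimal sketch of a prefix-scoped store. It is not the SDK's `State`: `SketchState` and its methods are hypothetical, it holds only strings, and it is not concurrent. It shows just two behaviours described in this README: the `derived:` fallback on reads and the clearing of `turn:` keys.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the SDK's `State`:
/// string values only, single-threaded, no typing.
#[derive(Default)]
struct SketchState {
    entries: HashMap<String, String>,
}

impl SketchState {
    /// Store a value under a fully qualified key such as "app:caller_name".
    fn set(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), value.to_string());
    }

    /// Look up the exact key first, then fall back to the "derived:" namespace.
    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .get(key)
            .or_else(|| self.entries.get(&format!("derived:{key}")))
            .map(String::as_str)
    }

    /// Drop every key in the "turn:" namespace, as happens between turns.
    fn clear_turn(&mut self) {
        self.entries.retain(|k, _| !k.starts_with("turn:"));
    }
}
```

Because every subsystem goes through the same `get`/`set` surface, a value written by one of them (say, an extractor) is immediately visible to the others.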

### 2. Phases -- Conversation Structure

Phases define the *shape* of a conversation: what the model should do, what
tools are available, and when to move on.

```

  [greeting] ---> [identify_caller] ---> [handle_request] ---> [farewell]

       |               |                       |                    |

   instruction:    instruction:            instruction:         instruction:

   "Welcome..."   "Get name..."          "Help with..."       "Say goodbye"

       |               |                       |

   tools: []       tools: [lookup]         tools: [search, calc]

       |               |                       |

   transition:     transition:             transition:

   caller_name     request_type            resolved == true

   is_some()       is_some()

```

Each phase declares:

- **Instruction**: what the model should do (static or state-driven dynamic)
- **Tools**: which tools are available in this phase
- **Transitions**: state predicates that trigger moves to the next phase
- **Guards**: predicates that must be true before entering a phase
- **Needs**: state keys still required (drives navigation context)
- **Lifecycle hooks**: `on_enter` / `on_exit` for side effects

Phases don't micromanage the model. They set guardrails -- the LLM naturally
asks follow-up questions until the transition predicate becomes true.
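The transition mechanism can be pictured as a lookup over state predicates. The sketch below is an illustration only, assuming a plain `HashMap` as state; `Phase`, `Transition`, `step`, and `receptionist_phases` are hypothetical names, not the SDK's `PhaseMachine` API, and guards, needs, and hooks are left out.

```rust
use std::collections::HashMap;

type SketchState = HashMap<String, String>;

/// One outgoing edge: the target phase plus the predicate that enables it.
struct Transition {
    target: &'static str,
    when: fn(&SketchState) -> bool,
}

struct Phase {
    name: &'static str,
    transitions: Vec<Transition>,
}

/// Evaluate the current phase's transitions once and return the phase to be in next.
/// If no predicate is true (or the phase is unknown), stay where we are.
fn step(phases: &[Phase], current: &'static str, state: &SketchState) -> &'static str {
    phases
        .iter()
        .find(|p| p.name == current)
        .and_then(|p| p.transitions.iter().find(|t| (t.when)(state)))
        .map(|t| t.target)
        .unwrap_or(current)
}

fn has_caller_name(s: &SketchState) -> bool {
    s.contains_key("caller_name")
}

fn has_request_type(s: &SketchState) -> bool {
    s.contains_key("request_type")
}

/// The first two phases of the diagram above, with their transition predicates.
fn receptionist_phases() -> Vec<Phase> {
    vec![
        Phase {
            name: "greeting",
            transitions: vec![Transition { target: "identify_caller", when: has_caller_name }],
        },
        Phase {
            name: "identify_caller",
            transitions: vec![Transition { target: "handle_request", when: has_request_type }],
        },
    ]
}
```

Calling `step` after each turn is enough to advance the conversation: until `caller_name` appears in state the machine stays in `greeting`, which is what lets the model keep asking follow-up questions.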

### 3. Extractors -- Structured Data from Conversation

Extractors run out-of-band LLM calls to pull structured data from the
conversation transcript and write it into State.

```

 Conversation transcript        OOB LLM call           State

 +-----------------------+     +---------------+     +------------------+

 | "Hi, I'm Alice from   | --> | Extract with  | --> | caller_name:     |

 |  Acme Corp, I need    |     | JSON Schema   |     |   "Alice"        |

 |  help with billing."  |     +---------------+     | caller_org:      |

 +-----------------------+                           |   "Acme Corp"    |

                                                     | request_type:    |

                                                     |   "billing"      |

                                                     +------------------+

                                                           |

                                                    triggers phase

                                                    transition!

```

**Extraction triggers** control *when* extractors fire:

| Trigger | When it fires | Use case |
|---------|--------------|----------|
| `EveryTurn` | After every TurnComplete | Default, high-frequency extraction |
| `Interval(n)` | Every N turns | Reduce LLM costs for slow-changing data |
| `AfterToolCall` | After tool dispatch completes | Extract from tool results |
| `OnPhaseChange` | When phase transitions fire | Re-extract on context shift |
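One way to read the table is as a pure decision function from (trigger, event) to "run the extractor or not". The sketch below uses hypothetical `TriggerSketch` and `EventSketch` enums rather than the SDK's own types, and it assumes turns are numbered from 1 and that `Interval(n)` fires on multiples of `n`; the real scheduling may differ.

```rust
/// Hypothetical mirror of the four triggers in the table above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TriggerSketch {
    EveryTurn,
    Interval(u32),
    AfterToolCall,
    OnPhaseChange,
}

/// Hypothetical session events relevant to extraction.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EventSketch {
    TurnComplete { turn: u32 },
    ToolCallDone,
    PhaseChanged,
}

/// Decide whether an extractor with the given trigger should run for this event.
fn should_extract(trigger: TriggerSketch, event: EventSketch) -> bool {
    match (trigger, event) {
        (TriggerSketch::EveryTurn, EventSketch::TurnComplete { .. }) => true,
        // Assumption: turns count from 1 and the extractor runs on every n-th turn.
        (TriggerSketch::Interval(n), EventSketch::TurnComplete { turn }) => n > 0 && turn % n == 0,
        (TriggerSketch::AfterToolCall, EventSketch::ToolCallDone) => true,
        (TriggerSketch::OnPhaseChange, EventSketch::PhaseChanged) => true,
        _ => false,
    }
}
```

With `Interval(2)`, the out-of-band LLM call runs on turns 2, 4, 6, and so on, which halves extraction cost relative to `EveryTurn`.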

### 4. Watchers & Temporal Patterns -- Reactive State

Watchers observe state changes and fire callbacks. Temporal patterns detect
conditions that persist over time or turns.

```

  State change: app:score = 0.85 --> 0.95

                    |

            +-------v--------+

            | Watcher:       |

            | crossed_above  |

            | threshold=0.9  |

            +-------+--------+

                    |

            fires callback:

            state.set("alert", true)

  Condition held for 30s:          3 consecutive turns:

  +-------------------------+     +-------------------------+

  | when_sustained:         |     | when_turns:             |

  | confused == true        |     | repeating == true       |

  | for 30 seconds          |     | for 3 turns             |

  | --> offer help          |     | --> break loop           |

  +-------------------------+     +-------------------------+

```
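The two ideas in the diagram reduce to small predicates. The sketch below is illustrative only: `crossed_above` here is a free function and `ConsecutiveTurns` a hypothetical helper, not the SDK's watcher or `when_turns` implementation. It assumes an edge-triggered watcher (fires only on the change that crosses the threshold) and a turn pattern that fires once, on the turn the streak reaches its target.

```rust
/// Edge-triggered check: true only when the value moves from at-or-below
/// the threshold to strictly above it.
fn crossed_above(old: f64, new: f64, threshold: f64) -> bool {
    old <= threshold && new > threshold
}

/// Tracks how many consecutive turns a condition has held.
struct ConsecutiveTurns {
    required: u32,
    streak: u32,
}

impl ConsecutiveTurns {
    fn new(required: u32) -> Self {
        Self { required, streak: 0 }
    }

    /// Call once per completed turn. Returns true on the turn where the
    /// streak first reaches `required`; any turn where the condition is
    /// false resets the streak.
    fn observe(&mut self, condition_holds: bool) -> bool {
        if condition_holds {
            self.streak += 1;
        } else {
            self.streak = 0;
        }
        self.streak == self.required
    }
}
```

A time-based pattern such as `when_sustained` follows the same shape, with a timestamp of when the condition first became true in place of the streak counter.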

### 5. Tools -- Model Actions

Tools give the model the ability to take actions. gemini-rs supports typed
tools (auto-schema from Rust structs), simple tools (raw JSON), built-in
tools (Google Search, code execution), and agent-as-tool (text agent pipelines
callable by the live model).

```

  Model decides to call tool

           |

  +--------v---------+

  |  ToolDispatcher   |  Routes by function name

  +--+-----+-----+---+

     |     |     |

  +--v-+ +-v--+ +v---------+

  |get_| |calc| |verify_   |

  |wx  | |pay | |identity  |

  +----+ +----+ +----------+

  Simple  Typed   AgentTool

  Tool    Tool    (text agent

                   pipeline)

  Background tools: model continues talking

  while the tool executes asynchronously.

```
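Routing by function name, as the diagram shows, is essentially a map from name to handler. The following is a sketch under simplifying assumptions: `DispatcherSketch` is hypothetical, handlers are synchronous closures over string arguments, and there is no schema validation or background execution. It is not the SDK's `ToolDispatcher`.

```rust
use std::collections::HashMap;

/// A tool handler: takes raw arguments, returns a raw result.
type ToolFn = Box<dyn Fn(&str) -> String>;

/// Hypothetical name-based router for tool calls.
#[derive(Default)]
struct DispatcherSketch {
    tools: HashMap<String, ToolFn>,
}

impl DispatcherSketch {
    /// Register a handler under the function name the model will call.
    fn register(&mut self, name: &str, handler: impl Fn(&str) -> String + 'static) {
        self.tools.insert(name.to_string(), Box::new(handler));
    }

    /// Route a call by name; unknown names produce an error the caller can
    /// send back to the model instead of panicking.
    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => Ok(tool(args)),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}
```

Returning an error value for unknown names keeps the session alive when the model calls a tool that is not registered in the current phase.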

**Background tool execution** eliminates dead air in voice sessions. Mark
tools as background and the model receives a "processing" acknowledgment
immediately, continuing the conversation while the tool runs:

```rust

Live::builder()

    .tools(dispatcher)

    .tool_background("search_kb")  // runs async, no dead air

```

### 6. Telemetry -- Observability Pipeline

Telemetry flows through two complementary systems, both running on the
telemetry lane (off the hot path):

```

  SessionEvent stream

        |

  +-----v--------------+     +------------------+

  | SessionSignals      |     | SessionTelemetry |

  | (State keys)        |     | (Atomic counters)|

  +-----+---------------+     +--------+---------+

        |                              |

        v                              v

  session:turn_count          audio_chunks_out: 1482

  session:total_token_count   avg_latency_ms: 340

  session:is_speaking         interruptions: 3

  session:silence_ms          total_token_count: 5280

        |                              |

        v                              v

  Available to phases,         snapshot() --> JSON

  watchers, extractors,        for devtools UI

  transition guards

```

**SessionSignals** writes to State -- so phases, watchers, and extractors can
react to session-level metrics (e.g., transition after N turns, alert when
tokens exceed budget).

**SessionTelemetry** tracks lock-free atomic counters (~1ns per operation) for
performance metrics: audio throughput, response latency (min/avg/max via CAS),
turn duration, token usage, and interruption counts.
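As an illustration of "min/avg/max via CAS", here is a minimal lock-free latency tracker built only on `std::sync::atomic`. `LatencySketch` is a hypothetical type, not the SDK's `SessionTelemetry`. The minimum uses an explicit compare-and-swap loop to show the technique; the maximum uses `AtomicU64::fetch_max`, which performs the equivalent update in one call.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical lock-free latency tracker (milliseconds).
struct LatencySketch {
    count: AtomicU64,
    sum_ms: AtomicU64,
    min_ms: AtomicU64,
    max_ms: AtomicU64,
}

impl LatencySketch {
    fn new() -> Self {
        Self {
            count: AtomicU64::new(0),
            sum_ms: AtomicU64::new(0),
            min_ms: AtomicU64::new(u64::MAX),
            max_ms: AtomicU64::new(0),
        }
    }

    /// Record one sample without taking a lock.
    fn record(&self, ms: u64) {
        self.count.fetch_add(1, Ordering::Relaxed);
        self.sum_ms.fetch_add(ms, Ordering::Relaxed);

        // CAS loop: retry until our sample is stored or a smaller one is seen.
        let mut seen = self.min_ms.load(Ordering::Relaxed);
        while ms < seen {
            match self.min_ms.compare_exchange_weak(seen, ms, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => break,
                Err(actual) => seen = actual,
            }
        }

        self.max_ms.fetch_max(ms, Ordering::Relaxed);
    }

    /// Returns (min, avg, max), or None if nothing was recorded yet.
    fn snapshot(&self) -> Option<(u64, u64, u64)> {
        let count = self.count.load(Ordering::Relaxed);
        if count == 0 {
            return None;
        }
        let avg = self.sum_ms.load(Ordering::Relaxed) / count;
        Some((self.min_ms.load(Ordering::Relaxed), avg, self.max_ms.load(Ordering::Relaxed)))
    }
}
```

Relaxed ordering is enough here because each counter is independent; a snapshot taken while writers are active may be slightly inconsistent across fields, which is usually acceptable for telemetry.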

**UsageMetadata** from the Gemini API is automatically tracked at all layers:

- L0 emits `SessionEvent::Usage(UsageMetadata)` with full token breakdowns
- L1 records in both SessionSignals (state keys) and SessionTelemetry (atomics)
- L2 exposes `.on_usage(|metadata| ...)` callback for real-time observation

### How They Work Together

Here's the flow for a single model turn in a phased conversation:

```

  User speaks: "I'm Alice from Acme Corp"

       |

  [1]  v  Fast lane: on_audio, on_input_transcript (sync, <1ms)

       |

  [2]  v  Model responds, turn completes

       |

  [3]  v  Control lane: TranscriptBuffer records the turn

       |

  [4]  v  Extractors run (OOB LLM call)

       |    --> writes caller_name="Alice", caller_org="Acme Corp" to State

       |

  [5]  v  Watchers fire on state changes

       |    --> crossed_above, became_true, changed_to callbacks

       |

  [6]  v  Computed variables recompute

       |    --> derived:risk_level updates based on new state

       |

  [7]  v  Phase machine evaluates transitions

       |    --> caller_name.is_some() == true

       |    --> transition: identify_caller --> handle_request

       |

  [8]  v  Phase on_exit / on_enter hooks fire

       |    --> instruction updated, navigation context regenerated

       |

  [9]  v  Telemetry lane: SessionSignals + SessionTelemetry update

            --> session:turn_count++, latency recorded, tokens tracked

```

---

## Quick Start

### Google AI (API Key)

```rust

use gemini_adk_fluent::prelude::*;

#[tokio::main]

async fn main() -> Result<(), Box<dyn std::error::Error>> {

    let handle = Live::builder()

        .model(GeminiModel::Gemini2_0FlashLive)

        .instruction("You are a friendly assistant.")

        .on_text(|t| print!("{t}"))

        .on_turn_complete(|| async { println!("\n---") })

        .connect_google_ai(std::env::var("GEMINI_API_KEY")?)

        .await?;

    handle.send_text("What is the speed of light?").await?;

    tokio::signal::ctrl_c().await?;

    handle.disconnect().await?;

    Ok(())

}

```

### Vertex AI

```rust

let handle = Live::builder()

    .model(GeminiModel::Gemini2_0FlashLive)

    .voice(Voice::Kore)

    .instruction("You are a customer support agent.")

    .on_audio(|data| playback_tx.send(data.clone()).ok())

    .on_text(|t| print!("{t}"))

    .connect_vertex("my-project", "us-central1", access_token)

    .await?;

```

### Wire Level Only (L0)

```rust

use gemini_live::prelude::*;

let session = gemini_live::quick_connect(

    "API_KEY", "gemini-2.0-flash-live-001"

).await?;

session.send_text("What is the speed of light?").await?;

let mut events = session.subscribe();

while let Ok(event) = events.recv().await {

    if let SessionEvent::TextDelta(ref text) = event {

        print!("{text}");

    }

    if let SessionEvent::TurnComplete = event { break; }

}

```

---

## Crate Overview

| Crate | Layer | Description |
|-------|-------|-------------|
| [`gemini-live`](crates/gemini-live) | L0 -- Wire | Protocol types, WebSocket transport, auth providers, VAD, jitter buffer, REST APIs (feature-gated). Full Rust equivalent of Google's `@google/genai`. |
| [`gemini-adk`](crates/gemini-adk) | L1 -- Runtime | Agent runtime with state management, phase machines, tool dispatch, text agent combinators, extractors, watchers, telemetry. Full Rust equivalent of Google's `@google/adk`. |
| [`gemini-adk-fluent`](crates/gemini-adk-fluent) | L2 -- Fluent | `Live::builder()` API, `AgentBuilder`, S.C.T.P.M.A operator algebra, composition patterns, test utilities. |

---

## Features

### Voice / Live Sessions

Build full-duplex voice sessions with callbacks for every event type. Audio,
text, transcription, interruptions, and turn lifecycle are all handled.

```rust

let handle = Live::builder()

    .model(GeminiModel::GeminiLive2_5FlashNativeAudio)

    .voice(Voice::Puck)

    .instruction("You are a weather assistant.")

    .greeting("Greet the user and ask how you can help.")

    .transcription(true, true)          // input + output transcription

    .thinking(1024)                     // enable thinking with token budget

    .include_thoughts()                 // receive thought summaries

    .affective_dialog(true)             // emotionally expressive responses

    .context_compression(4000, 2000)    // auto-compress context window

    .on_audio(|data| speaker.write(data))

    .on_thought(|text| println!("[Thought] {text}"))

    .on_input_transcript(|text, _final| println!("[User] {text}"))

    .on_output_transcript(|text, _final| println!("[Agent] {text}"))

    .on_interrupted(|| async { speaker.flush().await })

    .on_turn_complete(|| async { println!("--- turn complete ---") })

    .on_usage(|usage| {

        if let Some(total) = usage.total_token_count {

            println!("Tokens used: {total}");

        }

    })

    .connect_vertex(project, location, token)

    .await?;

```

**Available voices:** `Aoede`, `Charon`, `Fenrir`, `Kore`, `Puck` (default), or `Voice::Custom("name")`.

### Thinking (Gemini 2.5+)

The `gemini-2.5-flash-native-audio-preview-12-2025` model supports thinking
capabilities with dynamic thinking enabled by default. Control the thinking
budget and receive thought summaries in your session:

```rust

let handle = Live::builder()

    .model(GeminiModel::Custom(

        "models/gemini-2.5-flash-native-audio-preview-12-2025".into(),

    ))

    .thinking(1024)           // set thinking token budget (0 = disable)

    .include_thoughts()       // receive thought summaries via on_thought

    .on_thought(|text| println!("[Thought] {text}"))

    .on_text(|t| print!("{t}"))

    .connect_google_ai(api_key)

    .await?;

```

**How it works in the three-lane architecture:**

- `thinkingConfig` (`thinkingBudget`, `includeThoughts`) is sent in the setup
  message's `generationConfig`
- When `includeThoughts` is true, thought parts arrive as `Part::Thought` in
  `model_turn` content -- emitted as `SessionEvent::Thought(String)`
- Thought events are routed to the **fast lane** and delivered via the
  `on_thought` sync callback (< 1ms, no allocations)

**Platform support:** Google AI only. On Vertex AI, `thinkingConfig` is
automatically stripped from the setup message -- no code changes needed.

### Tool Calling

Declare function tools with JSON Schema parameters. The SDK auto-dispatches
tool calls when you provide a `ToolDispatcher`, or you can handle them manually
in `on_tool_call`.

```rust

let handle = Live::builder()

    .instruction("You can check the weather and do math.")

    .on_tool_call(|calls, state| async move {

        let responses: Vec<FunctionResponse> = calls.iter().map(|call| {

            let result = match call.name.as_str() {

                "get_weather" => json!({"temp": 22, "condition": "sunny"}),

                _ => json!({"error": "unknown tool"}),

            };

            FunctionResponse {

                name: call.name.clone(),

                response: result,

                id: call.id.clone(),

                scheduling: None,

            }

        }).collect();

        Some(responses)

    })

    .connect_google_ai(api_key)

    .await?;

```

Or use built-in tools directly:

```rust

Live::builder()

    .google_search()        // Google Search grounding

    .code_execution()       // Sandbox code execution

    .url_context()          // URL content retrieval

```

### State Management

A concurrent, type-safe `State` container with prefix-scoped namespaces,
atomic read-modify-write, delta tracking, and transparent derived fallbacks.

```rust

use gemini_adk::State;

use gemini_adk::state::StateKey;

// Typed keys eliminate typo bugs

const TURN_COUNT: StateKey = StateKey::new("session:turn_count");

const SENTIMENT: StateKey = StateKey::new("derived:sentiment");

let state = State::new();

// Prefix-scoped accessors

state.app().set("flag", true);              // writes to "app:flag"

state.user().set("name", "Alice");          // writes to "user:name"

state.session().set("turn_count", 0u32);    // writes to "session:turn_count"

state.turn().set("transcript", "hello");    // writes to "turn:transcript"

// Atomic read-modify-write

state.modify("session:turn_count", 0u32, |n| n + 1);

// Transparent derived fallback: get("risk") auto-checks "derived:risk"

state.set("derived:risk", 0.85);

let risk: Option<f64> = state.get("risk");  // returns Some(0.85)

// Delta tracking for transactional state

let tracked = state.with_delta_tracking();

tracked.set("temp:scratch", 42);

tracked.commit();   // merge into main store

// or: tracked.rollback();

```

**Prefix namespaces:**

| Prefix | Purpose | Lifetime |
|--------|---------|----------|
| `session:` | Auto-tracked signals (turn count, tokens, timing) | Session |
| `derived:` | Read-only computed variables | Session |
| `turn:` | Cleared each turn | Turn |
| `app:` | Application state | Session |
| `bg:` | Background task state | Session |
| `user:` | User-scoped state | Session |
| `temp:` | Scratch space | Explicit |

### Phase System

Declarative conversation phase management with guard-based transitions,
per-phase tool filtering, instruction composition, and async lifecycle callbacks.

```rust

let handle = Live::builder()

    .phase("greeting")

        .instruction("Welcome the user warmly.")

        .prompt_on_enter(true)

        .transition_with("identify", |s| {

            s.get::<String>("caller_name").is_some()

        }, "when caller provides their name")

        .done()

    .phase("identify")

        .instruction("Confirm the caller's identity.")

        .needs(&["caller_name", "caller_org"])

        .tools(vec!["lookup_contact".into()])

        .transition_with("handle", |s| {

            s.get::<bool>("verified").unwrap_or(false)

        }, "when identity is verified")

        .done()

    .phase("handle")

        .dynamic_instruction(|s| {

            let topic: String = s.get("topic").unwrap_or_default();

            format!("Help the caller with: {topic}")

        })

        .tools(vec!["search".into(), "calc".into()])

        .transition_with("farewell", |s| {

            s.get::<bool>("resolved").unwrap_or(false)

        }, "when the request is resolved")

        .done()

    .phase("farewell")

        .instruction("Say goodbye and provide a reference number.")

        .terminal()

        .done()

    .initial_phase("greeting")

    // Phase defaults inherited by all phases

    .phase_defaults(|p| {

        p.with_state(&["caller_name", "caller_org"])

         .navigation()  // inject phase navigation context

    })

    // Recommended: set persona once, steer via context injection

    .steering_mode(SteeringMode::ContextInjection)

    .connect_vertex(project, location, token)

    .await?;

```

#### Steering Modes

Control how the SDK delivers phase instructions to the model. This is the most
impactful configuration choice for multi-phase apps:

| Mode | System Instruction | Phase Instructions | Best For |
|------|--------------------|--------------------|----------|
| `ContextInjection` | Set once at connect | Delivered as model-role context turns | Multi-phase apps with stable persona (**recommended**) |
| `InstructionUpdate` | Replaced on every transition | Baked into system instruction | Agents with radically different personas per phase |
| `Hybrid` | Replaced on transition | Modifiers as context turns | Persona shifts + per-turn steering |

```rust

// Recommended: base persona at connect, phase context injected per turn

Live::builder()

    .instruction("You are a helpful assistant.")

    .steering_mode(SteeringMode::ContextInjection)

```

#### Context Delivery Timing

Control when model-role context turns hit the wire:

| Mode | Behavior | Best For |
|------|----------|----------|
| `Immediate` (default) | Send as single batched frame during TurnComplete | Low-latency, text-only apps |
| `Deferred` | Queue until next user send (audio/text/video) | Voice apps -- eliminates mid-silence frames |

```rust

// Voice app: flush context alongside user audio, not during silence

Live::builder()

    .steering_mode(SteeringMode::ContextInjection)

    .context_delivery(ContextDelivery::Deferred)

```

With `Deferred`, the `DeferredWriter` wraps the session writer and drains pending context before each `send_audio`/`send_text`/`send_video`. Context that requires a prompt (e.g. `prompt_on_enter`) is always sent immediately.
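The deferred behaviour described above can be sketched as a queue that is drained just before each user send. `DeferredWriterSketch` and `Frame` below are hypothetical and record frames in a `Vec` instead of writing to a WebSocket; they only illustrate the ordering, not the SDK's `DeferredWriter`.

```rust
/// What ends up on the (simulated) wire.
#[derive(Clone, Debug, PartialEq)]
enum Frame {
    Context(String),
    UserText(String),
}

/// Hypothetical writer that holds context back until the next user send.
#[derive(Default)]
struct DeferredWriterSketch {
    pending: Vec<String>,
    wire: Vec<Frame>,
}

impl DeferredWriterSketch {
    /// Queue a context turn. Context that must prompt the model right away
    /// (for example on phase entry) bypasses the queue.
    fn queue_context(&mut self, context: &str, requires_prompt: bool) {
        if requires_prompt {
            self.wire.push(Frame::Context(context.to_string()));
        } else {
            self.pending.push(context.to_string());
        }
    }

    /// Drain pending context first, then send the user's text.
    fn send_text(&mut self, text: &str) {
        for context in self.pending.drain(..) {
            self.wire.push(Frame::Context(context));
        }
        self.wire.push(Frame::UserText(text.to_string()));
    }
}
```

The effect is that nothing is written while the line is silent: context frames always travel immediately ahead of user input, in the order they were queued.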

See the [Steering Modes guide](docs/user-guide/steering-modes.md) for the full
decision matrix, anti-patterns, and implementation details.

#### Phase Navigation Context

The `.navigation()` modifier injects a structured description of the current
phase graph into the model's instruction, giving it awareness of where it is,
what it still needs, and where it can go:

```

[Navigation]

Current phase: identify -- Confirm the caller's identity.

Previous: greeting (turn 2)

Still needed: caller_org

Possible next:

  -> handle: when identity is verified

```

This is auto-generated from `.needs()`, `.transition_with()` descriptions, and
phase history. The model can use this to guide the conversation naturally.
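Generating that block is plain string assembly over the phase's metadata. The sketch below reproduces the example text shown earlier in this section; `NavSketch` and `render` are hypothetical names, and the exact formatting rules of the real generator (for instance, what it prints when nothing is still needed) are assumptions.

```rust
/// Hypothetical snapshot of the information the navigation block needs.
struct NavSketch<'a> {
    phase: &'a str,
    description: &'a str,
    /// Previous phase name and the turn on which it was left.
    previous: Option<(&'a str, u32)>,
    /// State keys from `.needs()` that are not yet set.
    still_needed: Vec<&'a str>,
    /// (target phase, human-readable transition description).
    next: Vec<(&'a str, &'a str)>,
}

/// Render the navigation context as text for injection into the prompt.
fn render(nav: &NavSketch<'_>) -> String {
    let mut out = String::from("[Navigation]\n");
    out.push_str(&format!("Current phase: {} -- {}\n", nav.phase, nav.description));
    if let Some((name, turn)) = nav.previous {
        out.push_str(&format!("Previous: {name} (turn {turn})\n"));
    }
    // Assumption: empty sections are omitted rather than printed as "none".
    if !nav.still_needed.is_empty() {
        out.push_str(&format!("Still needed: {}\n", nav.still_needed.join(", ")));
    }
    if !nav.next.is_empty() {
        out.push_str("Possible next:\n");
        for (target, why) in &nav.next {
            out.push_str(&format!("  -> {target}: {why}\n"));
        }
    }
    out
}
```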

### Extraction Pipeline

Run out-of-band LLM calls to extract structured data from the conversation
transcript. Schema-guided via `schemars::JsonSchema`.

```rust

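use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

// Every field is optional so a partial extraction still deserializes.
#[derive(Deserialize, Serialize, JsonSchema)]
struct CallerInfo {
    caller_name: Option<String>,
    caller_org: Option<String>,
    request_type: Option<String>,
}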

let handle = Live::builder()

    .instruction("You are a receptionist.")

    // Extract every 2 turns instead of every turn (reduces LLM costs)

    .extract_turns_triggered::<CallerInfo>(

        flash_llm,

        "Extract caller name, organization, and request type",

        5,  // transcript window size

        ExtractionTrigger::Interval(2),

    )

    .on_extracted(|name, value| async move {

        println!("Extracted {name}: {value}");

    })

    .connect_vertex(project, location, token)

    .await?;

// Read latest extraction at any time

let info: Option<CallerInfo> = handle.extracted("CallerInfo");

```

Extractors automatically enable transcription and warm up the OOB LLM
connection at session start for fast first-extraction latency.

### State Watchers & Temporal Patterns

React to state changes and time-based conditions declaratively:

```rust

Live::builder()

    // Fire when app:score crosses above 0.9

    .watch("app:score")

        .crossed_above(0.9)

        .then(|_old, _new, state| async move {

            state.set("high_score_alert", true);

        })

    // Fire when a boolean becomes true

    .watch("app:escalated")

        .became_true()

        .blocking()   // block turn processing until complete

        .then(|_old, _new, _state| async move {

            notify_supervisor().await;

        })

    // Fire when condition holds for 30 seconds continuously

    .when_sustained("user_confused",

        |s| s.get::<bool>("confused").unwrap_or(false),

        Duration::from_secs(30),

        |_state, writer| async move { /* offer help */ },

    )

    // Fire after 3 consecutive turns matching condition

    .when_turns("stuck_in_loop",

        |s| s.get::<bool>("repeating").unwrap_or(false),

        3,

        |_state, writer| async move { /* break loop */ },

    )

```

### Computed (Derived) State

Register reactive computed variables that update when their dependencies change:

```rust

Live::builder()

    .computed("risk_level", &["app:sentiment_score"], |state| {

        let score: f64 = state.get("app:sentiment_score")?;

        if score < 0.3 { Some(json!("high")) }

        else { Some(json!("low")) }

    })

    // Read transparently: state.get("risk_level") auto-checks "derived:risk_level"

```

### Text Agent Combinators

Build complex request/response LLM pipelines that can be dispatched from
Live session hooks. These use standard `generate()` calls (not WebSocket
sessions), enabling background processing during a voice conversation.

| Combinator | Purpose |
|-----------|---------|
| `LlmTextAgent` | Core agent -- generate, tool dispatch, loop |
| `FnTextAgent` | Zero-cost state transform (no LLM call) |
| `SequentialTextAgent` | Run children in order, state flows forward |
| `ParallelTextAgent` | Run children concurrently via `tokio::spawn` |
| `LoopTextAgent` | Repeat until max iterations or predicate |
| `FallbackTextAgent` | Try each child, first success wins |
| `RouteTextAgent` | State-driven deterministic branching |
| `RaceTextAgent` | Run concurrently, first to finish wins |
| `TimeoutTextAgent` | Wrap an agent with a time limit |
| `MapOverTextAgent` | Iterate an agent over a list in state |
| `TapTextAgent` | Read-only observation (no mutation) |
| `DispatchTextAgent` | Fire-and-forget background tasks |
| `JoinTextAgent` | Wait for dispatched tasks |
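To show what "combinator" means here, the sketch below models an agent as a boxed function from input text to a result and builds sequential and fallback composition on top of it. This is a deliberately simplified, synchronous illustration: `Agent`, `sequential`, and `fallback` are hypothetical, there is no LLM call or shared `State`, and the SDK's own combinator types are not used.

```rust
/// A toy agent: takes input text, returns output text or an error message.
type Agent = Box<dyn Fn(&str) -> Result<String, String>>;

/// Run children in order, feeding each output into the next child.
/// The first error aborts the pipeline.
fn sequential(children: Vec<Agent>) -> Agent {
    Box::new(move |input: &str| {
        let mut current = input.to_string();
        for child in &children {
            current = child(&current)?;
        }
        Ok(current)
    })
}

/// Try each child with the same input; the first success wins.
/// If all fail, return the last error.
fn fallback(children: Vec<Agent>) -> Agent {
    Box::new(move |input: &str| {
        let mut last_err = String::from("no agents configured");
        for child in &children {
            match child(input) {
                Ok(output) => return Ok(output),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    })
}
```

Because each combinator returns the same `Agent` type it accepts, compositions nest freely, for example a `fallback` whose first child is itself a `sequential` pipeline.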

Register text agents as tools the live model can call. The agent shares
the session's `State`, so mutations are visible to watchers and phase
transitions:

```rust

Live::builder()

    .agent_tool("verify_identity", "Verify caller identity", verifier_agent)

    .agent_tool("calc_payment", "Calculate payment plans", calc_pipeline)

```

### S.C.T.P.M.A Composition

Six operator namespaces for composing different aspects of agent configuration:

| Namespace | Operator | Purpose | Example |
|-----------|----------|---------|---------|
| `S::` | `>>` | State transforms | `S::set("key", val) >> S::rename("a", "b")` |
| `C::` | `+` | Context engineering | `C::last_n(5) + C::system_only()` |
| `T::` | `\|` | Tool composition | `T::function(search) \| T::google_search()` |
| `P::` | `+` | Prompt composition | `P::role("assistant") + P::task("summarize")` |
| `M::` | `\|` | Middleware composition | `M::log() \| M::rate_limit(10)` |
| `A::` | `+` | Artifact schemas | `A::produces(schema) + A::consumes(schema)` |

**Prompt composition example:**

```rust

use gemini_adk_fluent::prelude::*;

let prompt = P::role("a customer support agent for Acme Corp")

    + P::task("help customers with billing inquiries")

    + P::constraint("never reveal internal pricing formulas")

    + P::guidelines(vec![

        "Be empathetic and professional",

        "Confirm resolution before closing",

    ]);

let instruction = prompt.render();

```
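For readers curious how `+` composition can work in Rust, the following is a small sketch built on `std::ops::Add`. The `Prompt` type and the exact rendered wording are assumptions made for illustration; the crate's `P` namespace may store and render sections differently.

```rust
use std::ops::Add;

/// Hypothetical prompt builder: an ordered list of rendered sections.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Prompt {
    sections: Vec<String>,
}

impl Prompt {
    pub fn role(text: &str) -> Self {
        Prompt { sections: vec![format!("You are {text}.")] }
    }

    pub fn task(text: &str) -> Self {
        Prompt { sections: vec![format!("Your task is to {text}.")] }
    }

    pub fn constraint(text: &str) -> Self {
        Prompt { sections: vec![format!("Constraint: {text}.")] }
    }

    /// Joins the sections with newlines, in composition order.
    pub fn render(&self) -> String {
        self.sections.join("\n")
    }
}

/// `a + b` appends the sections of `b` after those of `a`.
impl Add for Prompt {
    type Output = Prompt;

    fn add(mut self, rhs: Prompt) -> Prompt {
        self.sections.extend(rhs.sections);
        self
    }
}
```

Implementing `Add` by value keeps the call site free of clones and makes composition order explicit: the left operand's sections always come first.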

### Callback Modes

Control-lane callbacks support two execution modes:

| Mode | Example method | Behavior |
|------|----------------|----------|
| **Blocking** | `.on_turn_complete()` (no suffix) | Awaited inline -- event loop waits |
| **Concurrent** | `.on_turn_complete_concurrent()` (`_concurrent` suffix) | Spawned as detached task -- fire and forget |

Use concurrent mode for logging, analytics, webhook dispatch, or background

agent triggering where you don't need ordering guarantees.
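The difference between the two modes can be illustrated with a standard-library analogue. This sketch uses OS threads instead of async tasks, and the `Mode` enum and `dispatch` function are hypothetical names, not part of the SDK.

```rust
use std::thread;

/// How a callback is scheduled (hypothetical enum for illustration).
pub enum Mode {
    Blocking,
    Concurrent,
}

/// Runs `callback` inline or on a separate thread.
///
/// In the concurrent case a join handle is returned; dropping it detaches
/// the thread, which is the fire-and-forget behavior described above.
pub fn dispatch<F>(mode: Mode, callback: F) -> Option<thread::JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    match mode {
        Mode::Blocking => {
            // The caller (the event loop) does not continue until this returns.
            callback();
            None
        }
        Mode::Concurrent => Some(thread::spawn(callback)),
    }
}
```

With blocking dispatch, callbacks for successive events finish in event order. With concurrent dispatch, completion order depends on scheduling, which is why it suits side effects that do not need ordering.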

### REST APIs (Feature-Gated)

The L0 crate also provides feature-gated access to Gemini REST APIs beyond

the Live WebSocket connection:

```toml

[dependencies]

gemini-live = { version = "0.1", features = ["generate", "embed", "files"] }

# Or enable everything:

# gemini-live = { version = "0.1", features = ["all-apis"] }

```

| Feature | API |
|---------|-----|
| `generate` | Content generation (`generateContent`) |
| `embed` | Text embeddings |
| `files` | File upload and management |
| `models` | Model listing and info |
| `tokens` | Token counting |
| `caches` | Context caching |
| `tunings` | Fine-tuning jobs |
| `batches` | Batch prediction |
| `chats` | Multi-turn chat sessions |

---

## Three-Lane Processor Architecture

All Live session events are routed through a zero-copy dispatcher into three

independent lanes, each optimized for its latency profile:

```
  SessionEvent (broadcast from L0)
         |
    +----+-----+
    |  Router  |   Zero-work dispatcher -- NO state access on hot path
    +--+--+--+-+
       |  |  |
       |  |  +------------------------------+
       |  +----------------+                 |
       |                   |                 |
  +----v---------+   +-----v----------+  +---v--------------+
  | Fast Lane    |   | Control Lane   |  | Telemetry Lane   |
  | (sync <1ms)  |   | (async)        |  | (own broadcast)  |
  +--------------+   +----------------+  +------------------+
  | on_audio     |   | on_tool_call   |  | SessionSignals   |
  | on_text      |   | on_interrupted |  |  (State keys)    |
  | on_vad_*     |   | Phase trans.   |  | SessionTelemetry |
  | on_input_    |   | Extractors     |  |  (AtomicU64)     |
  |   transcript |   |  (concurrent)  |  | on_usage cb      |
  | on_output_   |   | Watchers       |  | Debounced 100ms  |
  |   transcript |   | Computed state |  |   flush          |
  +--------------+   | Temporal ptns  |  +------------------+
                     | TranscriptBuf  |
                     |  (owned, no    |
                     |   mutex)       |
                     +----------------+
```

**Design constraints:**

- Fast lane callbacks must be sync and complete in < 1ms (no allocations, no locks, no async)

- Control lane owns the `TranscriptBuffer` exclusively (no `Arc<Mutex<_>>`)

- Telemetry lane runs on its own broadcast receiver (never blocks the router)

- Extractors run concurrently via `futures::future::join_all`
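
The routing idea can be sketched with standard-library channels. Everything here (`Event`, `Lane`, `lane_for`, `Router`) is a hypothetical simplification: the real router consumes a broadcast of `SessionEvent`s, and the telemetry lane subscribes with its own receiver rather than being fed by the router as it is below.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified event type (hypothetical; the SDK's `SessionEvent` is richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Audio(Vec<u8>),
    Text(String),
    ToolCall(String),
    Interrupted,
    Usage(u64),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    Fast,
    Control,
    Telemetry,
}

/// Classifies an event by its variant only: no state access, no locks.
pub fn lane_for(event: &Event) -> Lane {
    match event {
        Event::Audio(_) | Event::Text(_) => Lane::Fast,
        Event::ToolCall(_) | Event::Interrupted => Lane::Control,
        Event::Usage(_) => Lane::Telemetry,
    }
}

/// Forwards each event to exactly one lane.
pub struct Router {
    fast: Sender<Event>,
    control: Sender<Event>,
    telemetry: Sender<Event>,
}

impl Router {
    /// Creates a router plus one receiver per lane (fast, control, telemetry).
    pub fn with_lanes() -> (Router, Receiver<Event>, Receiver<Event>, Receiver<Event>) {
        let (fast, fast_rx) = channel();
        let (control, control_rx) = channel();
        let (telemetry, telemetry_rx) = channel();
        (Router { fast, control, telemetry }, fast_rx, control_rx, telemetry_rx)
    }

    pub fn route(&self, event: Event) {
        let target = match lane_for(&event) {
            Lane::Fast => &self.fast,
            Lane::Control => &self.control,
            Lane::Telemetry => &self.telemetry,
        };
        // Unbounded send never blocks; a closed lane must not stall the router.
        let _ = target.send(event);
    }
}
```

Each lane then drains its own receiver at its own pace, so slow control-lane work cannot delay fast-lane delivery.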

---

## Examples

The `examples/` directory contains runnable examples organized by complexity.

Each demonstrates specific SDK features at the layer you need.

### Getting Started

```bash

# 1. Configure credentials

cp .env.example .env

# Edit .env: set GEMINI_API_KEY (Google AI) or GOOGLE_CLOUD_PROJECT + GOOGLE_CLOUD_LOCATION (Vertex AI)

# 2. Run a standalone example

cargo run -p text-chat       # http://127.0.0.1:3001

cargo run -p voice-chat      # http://127.0.0.1:3002

cargo run -p tool-calling    # http://127.0.0.1:3003

cargo run -p transcription   # http://127.0.0.1:3004

# 3. Run the multi-app Web UI (all apps + devtools panel)

cargo run -p gemini-adk-web         # http://127.0.0.1:3000

```

### Standalone Examples

These run independently with their own Axum server and minimal UI.

| Example | Port | Layer | What You Learn |
|---------|------|-------|----------------|
| [`text-chat`](examples/text-chat) | 3001 | L0 | Wire protocol basics -- connect, send text, receive streaming deltas |
| [`voice-chat`](examples/voice-chat) | 3002 | L0 | Bidirectional audio, voice selection, VAD events, transcription |
| [`tool-calling`](examples/tool-calling) | 3003 | L1 | `TypedTool` with auto-generated JSON Schema, `ToolDispatcher` routing |
| [`transcription`](examples/transcription) | 3004 | L0 | Every Gemini Live config option: VAD, activity handling, affective dialog, context compression, session resumption |
| [`agents`](examples/agents) | CLI | L1/L2 | Text agent combinators (`>>`, `\|`, `/`), `TypedTool`, copy-on-write builders |

### ADK Web UI (`gemini-adk-web`)

The Web UI bundles all apps below into a single Axum server with a shared

devtools panel showing real-time state, timeline, transcript, and telemetry.

#### Crawl (Beginner)

| App | What It Demonstrates | Key SDK Features |
|-----|---------------------|-----------------|
| **text-chat** | Minimal text-only session -- no microphone needed | `Live::builder().text_only()`, text streaming |
| **voice-chat** | Native audio chat with real-time transcription | `Modality::Audio`, voice selection, input/output transcription |
| **tool-calling** | Three demo tools: weather, time, calculator | `FunctionDeclaration`, `on_tool_call`, `NonBlocking` behavior, `WhenIdle` scheduling |

#### Walk (Intermediate)

| App | What It Demonstrates | Key SDK Features |
|-----|---------------------|-----------------|
| **all-config** | Configuration playground -- every Gemini Live option in one app | Dynamic tool creation, modality switching, Google Search, code execution, context compression |
| **guardrails** | Real-time policy monitoring with corrective injection | `RegexExtractor`, `.watch()` state reactions, `.instruction_amendment()`, PII/off-topic/sentiment detection |
| **playbook** | 6-phase customer support flow with state extraction | `.phase()` chains, `.transition_with()` guards, `.greeting()`, `.with_context()`, `RegexExtractor` |

#### Run (Advanced)

| App | What It Demonstrates | Key SDK Features |
|-----|---------------------|-----------------|
| **support-assistant** | Multi-agent handoff between billing and technical support | Dual state machines (10 phases), `.computed()` derived state, cross-agent transitions, telemetry |
| **call-screening** | Incoming call screening with sentiment analysis and smart routing | Phase machine, tool calling (`check_contact_list`, `check_calendar`, `take_message`, `transfer_call`, `block_caller`), `NonBlocking` tools |
| **clinic** | HIPAA-aware telehealth scheduling with clinical triage | 8 tools (`verify_patient`, `check_availability`, `book_appointment`, etc.), patient intake flow, department routing |
| **restaurant** | Restaurant reservation and ordering system | 6 tools (`check_availability`, `make_reservation`, `get_menu`, etc.), dietary handling, occasion tracking |
| **debt-collection** | FDCPA-compliant debt collection with compliance gates | `StateKey`, identity verification, payment negotiation, cease-and-desist handling, compliance watchers |

### Platform Support

All examples work with both **Google AI** (API key) and **Vertex AI** (project/location).

The SDK auto-strips unsupported features on Vertex AI — no code changes needed:

| Feature | Google AI | Vertex AI |
|---------|-----------|-----------|
| Async tool calling (`NonBlocking`, `WhenIdle`/`Silent`) | Supported | Stripped automatically |
| Thinking (`thinkingConfig`) | Supported | Stripped automatically |

---

## Common Errors & Solutions

### Vertex AI sends binary WebSocket frames

**Symptom:** `serde_json::from_str` fails on messages from Vertex AI.

**Cause:** Vertex AI sends Binary WebSocket frames, not Text frames (unlike

Google AI).

**Solution:** Already handled by `TungsteniteTransport::recv()`. If you build a

custom transport, handle both `Message::Text` and `Message::Binary`.
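
A custom transport could normalize both frame kinds before JSON parsing, roughly as follows. The `Frame` enum is a local stand-in for a WebSocket library's message type (for example tungstenite's `Message::Text` and `Message::Binary`), and the sketch assumes binary frames carry UTF-8 encoded JSON.

```rust
use std::string::FromUtf8Error;

/// Local stand-in for the two data-carrying WebSocket frame kinds.
pub enum Frame {
    Text(String),
    Binary(Vec<u8>),
}

/// Returns the JSON payload as a `String` regardless of frame kind.
///
/// Binary payloads are validated as UTF-8, so a malformed frame surfaces
/// as an error here instead of as a confusing JSON parse failure later.
pub fn frame_to_json_text(frame: Frame) -> Result<String, FromUtf8Error> {
    match frame {
        Frame::Text(text) => Ok(text),
        Frame::Binary(bytes) => String::from_utf8(bytes),
    }
}
```

The returned string can then be passed to `serde_json::from_str` exactly as it would be for a text frame.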

### Native audio model only supports AUDIO output modality

**Symptom:** Error when requesting `Modality::Text` with

`GeminiLive2_5FlashNativeAudio`.

**Solution:** Use `Modality::Audio` only, or switch to `Gemini2_0FlashLive`

which supports text output:

```rust

// Correct for native audio model:

config.response_modalities(vec![Modality::Audio])

// For text output, use the non-native model:

.model(GeminiModel::Gemini2_0FlashLive)

```

### Vertex AI endpoint URL

**Symptom:** Connection fails to `global-aiplatform.googleapis.com`.

**Solution:** Use `aiplatform.googleapis.com` (no `global-` prefix). The SDK

handles this automatically via the `Platform` enum.

### Tool declarations cannot be updated mid-session

**Symptom:** Attempting to add or remove tools after `connect()`.

**Cause:** The Gemini Live API does not support updating tool definitions after

session setup.

**Solution:** Declare all tools upfront. Use per-phase `tools_enabled` to

control which tools the model can call at any given point in the conversation.
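
Conceptually, per-phase gating is an allowlist lookup keyed by the current phase. The sketch below shows that idea with a hypothetical `PhaseTools` type; it is not the SDK's `tools_enabled` API, whose signature is not shown here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-phase tool allowlist.
#[derive(Default)]
pub struct PhaseTools {
    enabled: HashMap<String, HashSet<String>>,
}

impl PhaseTools {
    pub fn new() -> Self {
        Self::default()
    }

    /// Allows `tools` during `phase`; repeated calls for a phase accumulate.
    pub fn enable(mut self, phase: &str, tools: &[&str]) -> Self {
        self.enabled
            .entry(phase.to_string())
            .or_default()
            .extend(tools.iter().map(|t| t.to_string()));
        self
    }

    /// A tool is callable only if the current phase lists it.
    pub fn is_allowed(&self, phase: &str, tool: &str) -> bool {
        self.enabled
            .get(phase)
            .map_or(false, |tools| tools.contains(tool))
    }
}
```

All tools are still declared once at session setup; the allowlist only decides whether a call made during a given phase is honored.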

### Extraction returns stale data

**Symptom:** `handle.extracted::<T>(name)` returns the previous turn's data.

**Cause:** Extractors run asynchronously on the control lane after each turn

completes.

**Solution:** Use the `on_extracted` callback for real-time notifications, or

poll `handle.extracted()` after the turn-complete event.

### State key not found despite being set

**Symptom:** `state.get("risk")` returns `None` even though you called

`state.set("derived:risk", 0.85)`.

**Solution:** Check whether the lookup key carries a prefix. An unprefixed
`get("risk")` falls back to `derived:risk` automatically. However,
`get("app:risk")` does NOT trigger the fallback -- prefixed keys are looked up
exactly as specified.
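
The lookup rule described above can be modeled in a few lines. This `State` is a hypothetical, simplified model (values are `f64` only) that encodes the stated behavior; it is not the SDK's implementation.

```rust
use std::collections::HashMap;

/// Simplified model of prefixed state keys (hypothetical).
#[derive(Default)]
pub struct State {
    values: HashMap<String, f64>,
}

impl State {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn set(&mut self, key: &str, value: f64) {
        self.values.insert(key.to_string(), value);
    }

    /// Prefixed keys (containing `:`) are looked up exactly as given.
    /// Unprefixed keys try an exact match first, then `derived:<key>`.
    pub fn get(&self, key: &str) -> Option<f64> {
        if key.contains(':') {
            return self.values.get(key).copied();
        }
        self.values
            .get(key)
            .or_else(|| self.values.get(&format!("derived:{key}")))
            .copied()
    }
}
```

Under this model, a value stored as `derived:risk` is visible through `get("risk")` and `get("derived:risk")`, but not through `get("app:risk")`.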

### Session disconnects after inactivity

**Symptom:** Server sends `GoAway` and closes the connection.

**Solution:** Handle gracefully with `.on_go_away(|ttl| async move { ... })`.

Enable session resumption with `.session_resume(true)` for transparent reconnect

support.

### Context window fills up in long conversations

**Symptom:** Model responses degrade in quality after many turns.

**Solution:** Enable context window compression:

```rust

Live::builder()

    .context_compression(4000, 2000)  // trigger at 4k tokens, compress to 2k

```

---

## Development

### Prerequisites

| Requirement | Version | Purpose |
|------------|---------|---------|
| **Rust** | 1.75+ | Language toolchain ([install](https://rustup.rs/)) |
| **cargo** | (bundled) | Build system and package manager |
| **pkg-config** | any | Locates system libraries |
| **OpenSSL** | 1.1+ | TLS for WebSocket connections |
| **ALSA dev** (Linux) | any | Audio I/O for voice examples |

**Quick setup (Ubuntu/Debian):**

```bash

# Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

source "$HOME/.cargo/env"

# Install system dependencies

sudo apt-get update

sudo apt-get install -y pkg-config libssl-dev libasound2-dev build-essential

```

**Quick setup (macOS):**

```bash

# Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# System deps (OpenSSL via Homebrew)

brew install openssl pkg-config

```

**Environment variables:**

```bash

# Google AI (API key auth)

export GEMINI_API_KEY="your-api-key"

# Vertex AI (service account auth)

export GOOGLE_CLOUD_PROJECT="your-project-id"

export GOOGLE_CLOUD_LOCATION="us-central1"

```

### Build

```bash

cargo build --workspace

```

### Test

```bash

cargo test --workspace

```

### Lint

```bash

cargo clippy --workspace --all-targets -- -D warnings

cargo fmt --all -- --check

```

### Run the Web UI

```bash

cd apps/gemini-adk-web

GEMINI_API_KEY="your-key" cargo run

# Open http://localhost:3000

```

### Generate documentation

```bash

cargo doc --workspace --no-deps --open

```

### Feature flags (gemini-live)

```bash

# Default: live + vad + tracing

cargo build -p gemini-live

# With REST APIs

cargo build -p gemini-live --features generate,embed,files

# Everything

cargo build -p gemini-live --features all-apis,metrics,opus

```

---

## Project Structure

```
gemini-rs/
  crates/
    gemini-live/              L0: Wire protocol, transport, types
    gemini-adk/               L1: Agent runtime, state, phases, tools
    gemini-adk-fluent/        L2: Fluent builder API, operators
  examples/
    text-chat/                Minimal text-only session (L0)
    voice-chat/               Bidirectional audio chat (L0)
    tool-calling/             TypedTool + ToolDispatcher (L1)
    transcription/            Every Gemini Live config option (L0)
    agents/                   Text agent combinators (L1/L2)
    INDEX.md                  Full example reference with per-app docs
  apps/
    gemini-adk-web/           Multi-app Web UI with devtools (L2)
      src/apps/               13 showcase apps (see examples/INDEX.md)
  tools/
    gemini-adk-transpiler/    Python ADK to Rust transpiler
  Cargo.toml                  Workspace root
```

---

## License

Licensed under the MIT License. See [LICENSE](LICENSE) for details.