https://github.com/maximbilan/habla-core-gemini

FastAPI backend for Habla using Gemini Live native audio for real-time translation and AI agent calls via Twilio on Cloud Run

Topics: fastapi, gemini, gemini-live, google-cloud-run, python, speech-to-speech, translation, twilio, voice-agent, websocket

# Habla Core

Habla Core is the FastAPI backend for Habla. It provides:

- real-time speech-to-speech translation over live phone calls
- outbound AI agent phone calls with transcript and verification events
- caller-ID verification with device-scoped ownership

It uses Google Gemini Live native audio for real-time translation and agent conversations, and Twilio Programmable Voice for PSTN call control and media streams.

Architecture and runtime diagrams: [`architecture.md`](architecture.md).
Agent runtime sequence: [`architecture.md#71-agent-mode-runtime-sequence`](architecture.md#71-agent-mode-runtime-sequence).

## Current Implementation Summary

### Live Call Mode (audio-first path)

- Full-duplex translation with two Gemini sessions per call
- iOS translation WebSocket uses binary audio only:
  - iOS -> backend: PCM16 mono 16 kHz
  - backend -> iOS: PCM16 mono 16 kHz
- Bounded queues preserve low latency under backpressure

### Agent Mode

- Autonomous outbound agent calls over Twilio
- iOS WebSocket receives events including:
  - `status`
  - `agent_status`
  - `transcript`
  - `transcript_update`
  - `critical_confirmation`
  - `verified_facts_summary`
- Supports live instruction injection and interruption/barge-in handling
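A client can route the agent events listed above with a small dispatch table. This sketch assumes each event arrives as a JSON text frame with a `type` field naming the event; that envelope shape, the `text` field, and the function names are hypothetical, so check `websocket_handler.py` and `agent_manager.py` for the real schema.

```python
import json
from typing import Any, Callable

# Event names from the README; the {"type": ...} envelope is an assumption.
KNOWN_EVENTS = {
    "status",
    "agent_status",
    "transcript",
    "transcript_update",
    "critical_confirmation",
    "verified_facts_summary",
}

Handler = Callable[[dict[str, Any]], None]


def dispatch(raw_message: str, handlers: dict[str, Handler]) -> str:
    """Parse one WebSocket text frame and call the handler for its event type.

    Returns the event type. Unknown or unhandled types are ignored instead of
    raising, so the server can add events without breaking older clients.
    """
    event = json.loads(raw_message)
    event_type = event.get("type", "")
    if event_type in KNOWN_EVENTS and event_type in handlers:
        handlers[event_type](event)
    return event_type


if __name__ == "__main__":
    seen: list[dict[str, Any]] = []
    handlers = {"transcript": seen.append}
    dispatch('{"type": "transcript", "text": "Hello"}', handlers)
    dispatch('{"type": "some_future_event"}', handlers)
    print(len(seen))  # prints: 1
```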

### Caller ID Isolation

- Caller ID verification, listing, and deletion use Twilio outgoing caller IDs
- Access is isolated per device via `X-Habla-Device-ID`
- Ownership state is stored in `habla-accounts` (`HABLA_ACCOUNTS_*`)
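Device-scoped isolation amounts to filtering the account-wide Twilio caller-ID list down to entries the requesting device owns, and refusing deletes for anything else. A minimal sketch, assuming ownership can be looked up as a set of caller-ID SIDs per device; the real lookup goes through `habla-accounts`, and the in-memory `OWNERSHIP` dict, the `CallerId` type, and the function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallerId:
    sid: str           # Twilio outgoing caller ID SID
    phone_number: str  # E.164 number


# Hypothetical stand-in for the habla-accounts store: device ID -> owned SIDs.
OWNERSHIP: dict[str, set[str]] = {}


def visible_caller_ids(device_id: str, all_ids: list[CallerId]) -> list[CallerId]:
    """Return only the caller IDs owned by this device (list route)."""
    owned = OWNERSHIP.get(device_id, set())
    return [cid for cid in all_ids if cid.sid in owned]


def can_delete(device_id: str, sid: str) -> bool:
    """A device may delete a caller ID only if it owns that SID (delete route)."""
    return sid in OWNERSHIP.get(device_id, set())
```

The device ID comes from the `X-Habla-Device-ID` header, so a device that has never verified a number simply sees an empty list.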

## API Surface

OpenAPI/Swagger:

- `GET /docs`
- `GET /openapi.json`

### Client-Facing Routes

- `GET /` (health)
- `GET /translation/languages`
- `POST /call`
- `POST /call/{call_sid}/end`
- `GET /call/{call_sid}/status`
- `WS /ws/{call_sid}`
- `POST /agent/call`
- `POST /agent/call/{call_sid}/end`
- `GET /agent/call/{call_sid}/status`
- `WS /agent/ws/{call_sid}`
- `POST /caller-id/verify/start`
- `GET /caller-id/verify/status/{phone_number:path}`
- `GET /caller-id/list`
- `DELETE /caller-id/{sid}`

### Twilio Callback Routes

- `POST /twilio/webhook/{call_sid}`
- `WS /twilio/media-stream/{call_sid}`
- `POST /twilio/status/{call_sid}`
- `POST /agent/twilio/webhook/{call_sid}`
- `WS /agent/twilio/media-stream/{call_sid}`

## Request Authentication

If `HABLA_SECRET` is set, iOS-facing REST and WebSocket routes require an `Authorization` header.

Accepted header formats:

- `Authorization: <digest>`
- `Authorization: Bearer <digest>`

Expected digest:

- `HMAC_SHA256(HABLA_SECRET, HABLA_APP_BUNDLE_ID)`

Caller ID ownership-sensitive routes also require:

- `X-Habla-Device-ID`

Twilio callback routes are server-to-server and are not protected by the Habla authorization header.

## Supported Languages

- `en-US`, `en-GB`, `en-AU`, `en-IN`
- `es-US`, `fr-FR`, `de-DE`, `it-IT`, `pt-BR`, `hi-IN`

Supported `voice_gender` values:

- `female`
- `male`
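A client can pre-validate a call request against these lists before sending it. The field names `source_lang` and `target_lang` are hypothetical (the real request schema is in `/openapi.json`), and `GET /translation/languages` remains the authoritative list at runtime.

```python
# Values copied from the README's "Supported Languages" section.
SUPPORTED_LANGUAGES = frozenset({
    "en-US", "en-GB", "en-AU", "en-IN",
    "es-US", "fr-FR", "de-DE", "it-IT", "pt-BR", "hi-IN",
})
SUPPORTED_VOICE_GENDERS = frozenset({"female", "male"})


def validate_call_options(source_lang: str, target_lang: str, voice_gender: str) -> list[str]:
    """Return a list of problems; an empty list means the options are acceptable."""
    problems = []
    for field, code in (("source_lang", source_lang), ("target_lang", target_lang)):
        if code not in SUPPORTED_LANGUAGES:
            problems.append(f"unsupported {field}: {code!r}")
    if voice_gender not in SUPPORTED_VOICE_GENDERS:
        problems.append(f"unsupported voice_gender: {voice_gender!r}")
    return problems
```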

## Prerequisites

- Python `3.11+` (CI uses `3.13`)
- Twilio account + number with Voice + Media Streams
- Gemini credentials, either:
  - AI Studio (`GOOGLE_API_KEY`), or
  - Vertex AI (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`)
- Public HTTPS URL for Twilio callbacks (`PUBLIC_URL`)

## Local Development

### 1) Install

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
```

### 2) Configure

```bash
cp .env.example .env
```

Choose one Gemini auth mode:

- AI Studio: `GOOGLE_GENAI_USE_VERTEXAI=FALSE` + `GOOGLE_API_KEY`
- Vertex AI: `GOOGLE_GENAI_USE_VERTEXAI=TRUE` + `GOOGLE_CLOUD_PROJECT` + `GOOGLE_CLOUD_LOCATION`
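Since the two modes are mutually exclusive, a startup check can fail fast when the selected mode is missing a variable. A sketch under the assumption that `TRUE`, `true`, or `1` selects Vertex AI and any other value (or an unset flag) selects AI Studio; `app/config.py` may parse the flag differently, and the function name is hypothetical.

```python
from collections.abc import Mapping


def gemini_auth_mode(env: Mapping[str, str]) -> str:
    """Return "vertex" or "ai_studio", raising if the chosen mode is incomplete."""
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "FALSE").strip().lower()
    if flag in {"true", "1"}:
        mode, required = "vertex", ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
    else:
        mode, required = "ai_studio", ("GOOGLE_API_KEY",)
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"{mode} mode is missing: {', '.join(missing)}")
    return mode
```

At startup this could be called as `gemini_auth_mode(os.environ)`.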

Also configure:

- Twilio (`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` or API key pair, `TWILIO_FROM_NUMBER`)
- `PUBLIC_URL` reachable by Twilio
- optional iOS auth (`HABLA_SECRET`, `HABLA_APP_BUNDLE_ID`)
- caller-ID ownership service (`HABLA_ACCOUNTS_BASE_URL`, `HABLA_ACCOUNTS_SERVICE_TOKEN`)
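The remaining settings can be checked the same way. This is a sketch, not the project's validation logic: the API key pair variable names `TWILIO_API_KEY` and `TWILIO_API_SECRET` are hypothetical (the README does not name them), and treating the `habla-accounts` settings as required is an assumption based on the list above.

```python
from collections.abc import Mapping
from urllib.parse import urlsplit


def config_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable configuration problems; empty means none found."""
    problems = []
    for name in ("TWILIO_ACCOUNT_SID", "TWILIO_FROM_NUMBER", "PUBLIC_URL"):
        if not env.get(name):
            problems.append(f"missing {name}")
    # Either an auth token or an API key pair (key-pair names are hypothetical).
    has_token = bool(env.get("TWILIO_AUTH_TOKEN"))
    has_key_pair = bool(env.get("TWILIO_API_KEY")) and bool(env.get("TWILIO_API_SECRET"))
    if not (has_token or has_key_pair):
        problems.append("missing TWILIO_AUTH_TOKEN (or an API key pair)")
    public_url = env.get("PUBLIC_URL", "")
    if public_url and urlsplit(public_url).scheme != "https":
        problems.append("PUBLIC_URL should be HTTPS so Twilio can reach it")
    # iOS auth is optional, but the digest needs both values once HABLA_SECRET is set.
    if env.get("HABLA_SECRET") and not env.get("HABLA_APP_BUNDLE_ID"):
        problems.append("HABLA_SECRET is set but HABLA_APP_BUNDLE_ID is missing")
    if not env.get("HABLA_ACCOUNTS_BASE_URL") or not env.get("HABLA_ACCOUNTS_SERVICE_TOKEN"):
        problems.append("caller-ID ownership service is not configured")
    return problems
```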

### 3) Run

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```

## Validation

```bash
ruff check app tests
vulture --config pyproject.toml
python -m compileall app
PYTHONPATH=. pytest -q
```

## Deployment (Cloud Run)

### Manual

```bash
./deploy.sh
```

### CI/CD

`.github/workflows/deploy-main.yml` deploys on `main` using GitHub OIDC + Workload Identity Federation.

It forwards runtime vars/secrets for Gemini, Twilio, iOS auth, and `habla-accounts` integration.

### Public Deployment Notes

- Keep `HABLA_SECRET` set outside local development
- Store Twilio/Gemini secrets in secret storage (for example, Secret Manager)
- Keep `PUBLIC_URL` HTTPS and reachable by Twilio webhook/media callbacks
- Runtime call registries are held in memory per instance, so all traffic for a given call must reach the same instance; use sticky routing/session affinity when scaling
- Add Twilio request-signature validation in front of callback routes if your deployment requires it
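Twilio signs webhook requests with the `X-Twilio-Signature` header: the base64-encoded HMAC-SHA1, keyed by the account auth token, of the full request URL followed by each POST parameter name and value, sorted by name. The official `twilio` Python library provides `RequestValidator` for this; the stdlib sketch below only illustrates the algorithm for form-encoded POST callbacks with single-valued parameters (such as `/twilio/status/{call_sid}`). It does not cover JSON bodies or the WebSocket media-stream routes, and the token and URL in the test are placeholders.

```python
import base64
import hashlib
import hmac
from collections.abc import Mapping


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute X-Twilio-Signature for a form-encoded POST (single-valued params)."""
    # Full URL, then each parameter name immediately followed by its value, sorted by name.
    payload = url + "".join(key + params[key] for key in sorted(params))
    mac = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1)
    return base64.b64encode(mac.digest()).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], signature: str
) -> bool:
    """Constant-time comparison of the received signature with the expected one."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))
```

The URL must match exactly what Twilio requested (scheme, host, path, query), which is another reason to keep `PUBLIC_URL` consistent with the callback URLs the server registers.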

## Repository Layout

```text
app/
  main.py
  config.py
  models.py
  call_manager.py
  translation_call_bridge.py
  websocket_handler.py
  gemini_client.py
  audio_utils.py
  language_support.py
  request_auth.py
  twilio_client.py
  caller_id_router.py
  caller_id_ownership_client.py
  agent_manager.py
  agent_gemini_session.py
  agent_prompts.py
  agent_transcript.py
  critical_info.py
  extraction_patterns.py
```