https://github.com/maximbilan/habla-core-gemini
FastAPI backend for Habla using Gemini Live native audio for real-time translation and AI agent calls via Twilio on Cloud Run
fastapi gemini gemini-live google-cloud-run python speech-to-speech translation twilio voice-agent websocket
- Host: GitHub
- URL: https://github.com/maximbilan/habla-core-gemini
- Owner: maximbilan
- License: mit
- Created: 2026-02-22T11:14:08.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-07T14:55:41.000Z (3 months ago)
- Last Synced: 2026-03-07T20:58:20.205Z (3 months ago)
- Topics: fastapi, gemini, gemini-live, google-cloud-run, python, speech-to-speech, translation, twilio, voice-agent, websocket
- Language: Python
- Homepage:
- Size: 138 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Habla Core
FastAPI backend for Habla. It provides:
- real-time speech-to-speech translation over live phone calls
- outbound AI agent phone calls with transcript and verification events
- caller-ID verification with device-scoped ownership
It uses Google Gemini Live for audio/model behavior and Twilio Programmable Voice for PSTN call control + media streams.
Architecture and runtime diagrams: [`architecture.md`](architecture.md).
Agent runtime sequence: [`architecture.md#71-agent-mode-runtime-sequence`](architecture.md#71-agent-mode-runtime-sequence).
## Current Implementation Summary
### Live Call Mode (audio-first path)
- Full-duplex translation with two Gemini sessions per call
- iOS translation WebSocket uses binary audio only:
  - iOS -> backend: PCM16 mono 16 kHz
  - backend -> iOS: PCM16 mono 16 kHz
- Bounded queues preserve low latency under backpressure
### Agent Mode
- Autonomous outbound agent calls over Twilio
- iOS WebSocket receives events including:
  - `status`
  - `agent_status`
  - `transcript`
  - `transcript_update`
  - `critical_confirmation`
  - `verified_facts_summary`
- Supports live instruction injection and interruption/barge-in handling
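A client consuming the agent WebSocket would typically branch on the event name. The following is a minimal sketch that assumes events arrive as JSON text with the event name in a `type` field; that field name and the `classify_event` helper are hypothetical, and the real message shape should be checked against the backend models.

```python
import json

# Event names listed in this README for the agent WebSocket.
KNOWN_EVENTS = frozenset({
    "status",
    "agent_status",
    "transcript",
    "transcript_update",
    "critical_confirmation",
    "verified_facts_summary",
})


def classify_event(raw: str) -> tuple[str, dict]:
    """Parse one JSON message and return (event_name, payload).

    Unrecognized or missing event names map to "unknown" so that a client
    can ignore events added later instead of failing.
    """
    message = json.loads(raw)
    event_name = message.get("type")  # "type" is an assumed field name
    if event_name not in KNOWN_EVENTS:
        return "unknown", message
    return event_name, message
```

Treating unrecognized events as `unknown` keeps older clients working if the event list grows.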
### Caller ID Isolation
- Caller ID verify/list/delete uses Twilio outgoing caller IDs
- Access is isolated per device via `X-Habla-Device-ID`
- Ownership state is stored in `habla-accounts` (`HABLA_ACCOUNTS_*`)
## API Surface
OpenAPI/Swagger:
- `GET /docs`
- `GET /openapi.json`
### Client-Facing Routes
- `GET /` (health)
- `GET /translation/languages`
- `POST /call`
- `POST /call/{call_sid}/end`
- `GET /call/{call_sid}/status`
- `WS /ws/{call_sid}`
- `POST /agent/call`
- `POST /agent/call/{call_sid}/end`
- `GET /agent/call/{call_sid}/status`
- `WS /agent/ws/{call_sid}`
- `POST /caller-id/verify/start`
- `GET /caller-id/verify/status/{phone_number:path}`
- `GET /caller-id/list`
- `DELETE /caller-id/{sid}`
### Twilio Callback Routes
- `POST /twilio/webhook/{call_sid}`
- `WS /twilio/media-stream/{call_sid}`
- `POST /twilio/status/{call_sid}`
- `POST /agent/twilio/webhook/{call_sid}`
- `WS /agent/twilio/media-stream/{call_sid}`
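Because Twilio reaches these routes through `PUBLIC_URL`, the callback addresses can be derived from it. The sketch below shows one plausible derivation for the translation-call routes; the helper name `twilio_callback_urls` is hypothetical and the backend may compose its URLs differently.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def twilio_callback_urls(public_url: str, call_sid: str) -> dict[str, str]:
    """Build the webhook, status, and media-stream URLs for one call.

    The media stream is a WebSocket, so an https base becomes wss.
    """
    parts = urlsplit(public_url.rstrip("/"))
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    sid = quote(call_sid, safe="")  # keep the path segment URL-safe

    def build(scheme: str, route: str) -> str:
        return urlunsplit((scheme, parts.netloc, f"{parts.path}{route}/{sid}", "", ""))

    return {
        "webhook": build(parts.scheme, "/twilio/webhook"),
        "status": build(parts.scheme, "/twilio/status"),
        "media_stream": build(ws_scheme, "/twilio/media-stream"),
    }
```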
## Request Authentication
If `HABLA_SECRET` is set, iOS-facing REST + WebSocket routes require `Authorization`.
Accepted header formats:
- `Authorization: <digest>`
- `Authorization: Bearer <digest>`
Expected digest:
- `HMAC_SHA256(HABLA_SECRET, HABLA_APP_BUNDLE_ID)`
Caller ID ownership-sensitive routes also require:
- `X-Habla-Device-ID`
Twilio callback routes are server-to-server and are not protected by the Habla authorization header.
## Supported Languages
- `en-US`, `en-GB`, `en-AU`, `en-IN`
- `es-US`, `fr-FR`, `de-DE`, `it-IT`, `pt-BR`, `hi-IN`
Supported `voice_gender` values:
- `female`
- `male`
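A client can check its options against these lists before placing a call. The sketch below is a hedged illustration: the function and the `source_language` / `target_language` parameter names are hypothetical, and `GET /translation/languages` remains the authoritative source for the supported set.

```python
SUPPORTED_LANGUAGES = frozenset({
    "en-US", "en-GB", "en-AU", "en-IN",
    "es-US", "fr-FR", "de-DE", "it-IT", "pt-BR", "hi-IN",
})
SUPPORTED_VOICE_GENDERS = frozenset({"female", "male"})


def validate_translation_options(
    source_language: str, target_language: str, voice_gender: str
) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for label, code in (("source", source_language), ("target", target_language)):
        if code not in SUPPORTED_LANGUAGES:
            errors.append(f"unsupported {label} language: {code}")
    if voice_gender not in SUPPORTED_VOICE_GENDERS:
        errors.append(f"unsupported voice_gender: {voice_gender}")
    return errors
```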
## Prerequisites
- Python `3.11+` (CI uses `3.13`)
- Twilio account + number with Voice + Media Streams
- Gemini credentials:
  - AI Studio (`GOOGLE_API_KEY`) or
  - Vertex AI (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`)
- Public HTTPS URL for Twilio callbacks (`PUBLIC_URL`)
## Local Development
### 1) Install
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
```
### 2) Configure
```bash
cp .env.example .env
```
Choose one Gemini auth mode:
- AI Studio: `GOOGLE_GENAI_USE_VERTEXAI=FALSE` + `GOOGLE_API_KEY`
- Vertex AI: `GOOGLE_GENAI_USE_VERTEXAI=TRUE` + `GOOGLE_CLOUD_PROJECT` + `GOOGLE_CLOUD_LOCATION`
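A quick startup check can catch a half-configured auth mode early. The sketch below reports which variables are missing for the selected mode; the helper name is hypothetical, and treating an unset `GOOGLE_GENAI_USE_VERTEXAI` as AI Studio mode is an assumption rather than documented behavior.

```python
from collections.abc import Mapping


def missing_gemini_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    # Assumption: anything other than TRUE (case-insensitive) means AI Studio.
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "FALSE").strip().upper() == "TRUE"
    if use_vertex:
        required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
    else:
        required = ("GOOGLE_API_KEY",)
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_gemini_settings(os.environ)` after loading `.env` gives an empty list when the chosen mode is fully configured.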
Also configure:
- Twilio (`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` or API key pair, `TWILIO_FROM_NUMBER`)
- `PUBLIC_URL` reachable by Twilio
- optional iOS auth (`HABLA_SECRET`, `HABLA_APP_BUNDLE_ID`)
- caller-ID ownership service (`HABLA_ACCOUNTS_BASE_URL`, `HABLA_ACCOUNTS_SERVICE_TOKEN`)
### 3) Run
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```
## Validation
```bash
ruff check app tests
vulture --config pyproject.toml
python -m compileall app
PYTHONPATH=. pytest -q
```
## Deployment (Cloud Run)
### Manual
```bash
./deploy.sh
```
### CI/CD
`.github/workflows/deploy-main.yml` deploys on `main` using GitHub OIDC + Workload Identity Federation.
It forwards runtime vars/secrets for Gemini, Twilio, iOS auth, and `habla-accounts` integration.
### Public Deployment Notes
- Keep `HABLA_SECRET` set outside local development
- Store Twilio/Gemini secrets in secret storage (for example, Secret Manager)
- Keep `PUBLIC_URL` HTTPS and reachable by Twilio webhook/media callbacks
- Runtime call registries are in memory; use sticky routing/session affinity when scaling
- Add Twilio request-signature validation in front of callback routes if your deployment requires it
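For the last note, Twilio signs form-encoded POST webhooks by appending the sorted parameter names and values to the full request URL, computing HMAC-SHA1 with the account auth token, and sending the Base64 result in `X-Twilio-Signature`. The following standard-library sketch illustrates that scheme; it is not code from this repository, the function names are hypothetical, and in practice the official SDK's `twilio.request_validator.RequestValidator` is the safer choice.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the signature Twilio would send for a form-encoded POST."""
    # URL followed by each key and value, with keys in sorted order.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], header_signature: str
) -> bool:
    """Compare the X-Twilio-Signature header against the expected value."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )
```

The URL must match exactly what Twilio requested, including scheme and any query string, so behind a proxy it should be rebuilt from `PUBLIC_URL` rather than from the internal request URL.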
## Repository Layout
```text
app/
  main.py
  config.py
  models.py
  call_manager.py
  translation_call_bridge.py
  websocket_handler.py
  gemini_client.py
  audio_utils.py
  language_support.py
  request_auth.py
  twilio_client.py
  caller_id_router.py
  caller_id_ownership_client.py
  agent_manager.py
  agent_gemini_session.py
  agent_prompts.py
  agent_transcript.py
  critical_info.py
  extraction_patterns.py
```