{"id":50525555,"url":"https://github.com/maximbilan/habla-core-gemini","last_synced_at":"2026-06-03T07:31:24.877Z","repository":{"id":342826501,"uuid":"1163921895","full_name":"maximbilan/habla-core-gemini","owner":"maximbilan","description":"FastAPI backend for Habla using Gemini Live native audio for real-time translation and AI agent calls via Twilio on Cloud Run","archived":false,"fork":false,"pushed_at":"2026-03-07T14:55:41.000Z","size":141,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-07T20:58:20.205Z","etag":null,"topics":["fastapi","gemini","gemini-live","google-cloud-run","python","speech-to-speech","translation","twilio","voice-agent","websocket"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maximbilan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-22T11:14:08.000Z","updated_at":"2026-03-07T14:55:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/maximbilan/habla-core-gemini","commit_stats":null,"previous_names":["maximbilan/habla-core-gemini"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/maximbilan/habla-core-gemini","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximbilan%2Fhabla-core-gemini","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximbilan%2Fhabla-core-gemini/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximbilan%2Fhabla-core-gemini/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximbilan%2Fhabla-core-gemini/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maximbilan","download_url":"https://codeload.github.com/maximbilan/habla-core-gemini/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximbilan%2Fhabla-core-gemini/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33853984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-03T02:00:06.370Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","gemini","gemini-live","google-cloud-run","python","speech-to-speech","translation","twilio","voice-agent","websocket"],"created_at":"2026-06-03T07:31:24.355Z","updated_at":"2026-06-03T07:31:24.872Z","avatar_url":"https://github.com/maximbilan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Habla Core\n\n- real-time speech-to-speech translation over live phone calls\n- outbound AI agent phone calls with transcript and verification events\n- caller-ID verification with device-scoped ownership\n\nIt uses Google Gemini Live for audio/model behavior and Twilio Programmable Voice for PSTN call control + media streams.\n\nArchitecture and runtime diagrams: [`architecture.md`](architecture.md).  \nAgent runtime sequence: [`architecture.md#71-agent-mode-runtime-sequence`](architecture.md#71-agent-mode-runtime-sequence).\n\n## Current Implementation Summary\n\n### Live Call Mode (audio-first path)\n\n- Full-duplex translation with two Gemini sessions per call\n- iOS translation websocket uses binary audio only:\n  - iOS -\u003e backend: PCM16 mono 16 kHz\n  - backend -\u003e iOS: PCM16 mono 16 kHz\n- Bounded queues preserve low latency under backpressure\n\n### Agent Mode\n\n- Autonomous outbound agent calls over Twilio\n- iOS WebSocket receives events including:\n  - `status`\n  - `agent_status`\n  - `transcript`\n  - `transcript_update`\n  - `critical_confirmation`\n  - `verified_facts_summary`\n- Supports live instruction injection and interruption/barge-in handling\n\n### Caller ID Isolation\n\n- Caller ID verify/list/delete uses Twilio outgoing caller IDs\n- Access is isolated per device via `X-Habla-Device-ID`\n- Ownership state is stored in `habla-accounts` (`HABLA_ACCOUNTS_*`)\n\n## API Surface\n\nOpenAPI/Swagger:\n\n- `GET /docs`\n- `GET /openapi.json`\n\n### Client-Facing Routes\n\n- `GET /` (health)\n- `GET /translation/languages`\n- `POST /call`\n- `POST /call/{call_sid}/end`\n- `GET /call/{call_sid}/status`\n- `WS /ws/{call_sid}`\n- `POST /agent/call`\n- `POST /agent/call/{call_sid}/end`\n- `GET /agent/call/{call_sid}/status`\n- `WS /agent/ws/{call_sid}`\n- `POST /caller-id/verify/start`\n- `GET /caller-id/verify/status/{phone_number:path}`\n- `GET /caller-id/list`\n- `DELETE /caller-id/{sid}`\n\n### Twilio Callback Routes\n\n- `POST /twilio/webhook/{call_sid}`\n- `WS /twilio/media-stream/{call_sid}`\n- `POST /twilio/status/{call_sid}`\n- `POST /agent/twilio/webhook/{call_sid}`\n- `WS /agent/twilio/media-stream/{call_sid}`\n\n## Request Authentication\n\nIf `HABLA_SECRET` is set, iOS-facing REST + WebSocket routes require `Authorization`.\n\nAccepted header formats:\n\n- `Authorization: \u003chex_hmac_digest\u003e`\n- `Authorization: Bearer \u003chex_hmac_digest\u003e`\n\nExpected digest:\n\n- `HMAC_SHA256(HABLA_SECRET, HABLA_APP_BUNDLE_ID)`\n\nCaller ID ownership-sensitive routes also require:\n\n- `X-Habla-Device-ID`\n\nTwilio callback routes are server-to-server and are not protected by the Habla authorization header.\n\n## Supported Languages\n\n- `en-US`, `en-GB`, `en-AU`, `en-IN`\n- `es-US`, `fr-FR`, `de-DE`, `it-IT`, `pt-BR`, `hi-IN`\n\nSupported `voice_gender` values:\n\n- `female`\n- `male`\n\n## Prerequisites\n\n- Python `3.11+` (CI uses `3.13`)\n- Twilio account + number with Voice + Media Streams\n- Gemini credentials:\n  - AI Studio (`GOOGLE_API_KEY`) or\n  - Vertex AI (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`)\n- Public HTTPS URL for Twilio callbacks (`PUBLIC_URL`)\n\n## Local Development\n\n### 1) Install\n\n```bash\npython3 -m venv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\npip install -r requirements-dev.txt\n```\n\n### 2) Configure\n\n```bash\ncp .env.example .env\n```\n\nChoose one Gemini auth mode:\n\n- AI Studio: `GOOGLE_GENAI_USE_VERTEXAI=FALSE` + `GOOGLE_API_KEY`\n- Vertex AI: `GOOGLE_GENAI_USE_VERTEXAI=TRUE` + `GOOGLE_CLOUD_PROJECT` + `GOOGLE_CLOUD_LOCATION`\n\nAlso configure:\n\n- Twilio (`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` or API key pair, `TWILIO_FROM_NUMBER`)\n- `PUBLIC_URL` reachable by Twilio\n- optional iOS auth (`HABLA_SECRET`, `HABLA_APP_BUNDLE_ID`)\n- caller-ID ownership service (`HABLA_ACCOUNTS_BASE_URL`, `HABLA_ACCOUNTS_SERVICE_TOKEN`)\n\n### 3) Run\n\n```bash\nuvicorn app.main:app --host 0.0.0.0 --port 8080 --reload\n```\n\n## Validation\n\n```bash\nruff check app tests\nvulture --config pyproject.toml\npython -m compileall app\nPYTHONPATH=. pytest -q\n```\n\n## Deployment (Cloud Run)\n\n### Manual\n\n```bash\n./deploy.sh\n```\n\n### CI/CD\n\n`.github/workflows/deploy-main.yml` deploys on `main` using GitHub OIDC + Workload Identity Federation.\n\nIt forwards runtime vars/secrets for Gemini, Twilio, iOS auth, and `habla-accounts` integration.\n\n### Public Deployment Notes\n\n- Keep `HABLA_SECRET` set outside local development\n- Store Twilio/Gemini secrets in secret storage (for example, Secret Manager)\n- Keep `PUBLIC_URL` HTTPS and reachable by Twilio webhook/media callbacks\n- Runtime call registries are in memory; use sticky routing/session affinity when scaling\n- Add Twilio request-signature validation in front of callback routes if your deployment requires it\n\n## Repository Layout\n\n```text\napp/\n  main.py\n  config.py\n  models.py\n  call_manager.py\n  translation_call_bridge.py\n  websocket_handler.py\n  gemini_client.py\n  audio_utils.py\n  language_support.py\n  request_auth.py\n  twilio_client.py\n  caller_id_router.py\n  caller_id_ownership_client.py\n  agent_manager.py\n  agent_gemini_session.py\n  agent_prompts.py\n  agent_transcript.py\n  critical_info.py\n  extraction_patterns.py\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaximbilan%2Fhabla-core-gemini","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaximbilan%2Fhabla-core-gemini","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaximbilan%2Fhabla-core-gemini/lists"}