{"id":50508957,"url":"https://github.com/gsans/gemini-3-live-angular","last_synced_at":"2026-06-02T18:31:30.491Z","repository":{"id":277232594,"uuid":"931746312","full_name":"gsans/gemini-3-live-angular","owner":"gsans","description":"This project showcases Gemini 3 real-time multimodal AI capabilities in a web application using Angular.","archived":false,"fork":false,"pushed_at":"2026-05-23T16:11:06.000Z","size":1208,"stargazers_count":28,"open_issues_count":0,"forks_count":13,"subscribers_count":3,"default_branch":"js-genai","last_synced_at":"2026-05-23T18:10:12.602Z","etag":null,"topics":["angular","gemini","gemini-api","gemini-live","google-ai"],"latest_commit_sha":null,"homepage":"https://aistudio.google.com/live","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gsans.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-12T19:37:38.000Z","updated_at":"2026-05-23T16:11:10.000Z","dependencies_parsed_at":"2025-07-07T22:01:41.313Z","dependency_job_id":"ee1629c3-d137-4d3d-ad9c-fa4553825dfc","html_url":"https://github.com/gsans/gemini-3-live-angular","commit_stats":null,"previous_names":["gsans/gemini-2-live-angular","gsans/gemini-3-live-angular"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gsans/gemini-3-live-angular","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsans%2Fgemini-3-live-angular","tags_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/gsans%2Fgemini-3-live-angular/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsans%2Fgemini-3-live-angular/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsans%2Fgemini-3-live-angular/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gsans","download_url":"https://codeload.github.com/gsans/gemini-3-live-angular/tar.gz/refs/heads/js-genai","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsans%2Fgemini-3-live-angular/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33833277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["angular","gemini","gemini-api","gemini-live","google-ai"],"created_at":"2026-06-02T18:31:29.421Z","updated_at":"2026-06-02T18:31:30.486Z","avatar_url":"https://github.com/gsans.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gemini 3.1 Flash Live Preview Demo\n\nhttps://github.com/user-attachments/assets/c4a0ebaa-fc1a-486f-89da-26b5c26c6dd6\n\n## Overview\nThis project showcases Gemini 3.1 real-time multimodal AI capabilities in a web application using Angular. 
Currently, the Live API is set to use `Gemini 3.1 Flash Live Preview`.\n\n![diagram](https://i.imgur.com/74hv0ay.png)\n\nThis project demonstrates integration with Google's Gemini AI models through the `@google/genai` library, now in (Technical) [Preview](https://github.com/googleapis/js-genai/commit/da38b6df88705c8ff1ea9a2e1c5ffa596054b382).\n\n\u003e This project started as a migration to Angular of the [Live API - Web console](https://github.com/google-gemini/multimodal-live-api-web-console), which is only available in React at the moment.\n\n## What's new?\n\n[19th May 2026]\n- New model: `gemini-3.1-flash-live-preview` replaces `gemini-2.5-flash-native-audio-latest`.\n- Features Real-Time Multimodal Interaction with full barge-in and conversational rhythm adaptation.\n- Explicit Voice Activity Detection (VAD) toggle added to the UI (`explicitVadSignal`).\n- Deep conversational session context natively supported.\n- Check `gemini31-flash-live.md` for a full breakdown of the new features' configuration and status!\n\n[5th March 2026]\n- Upgraded `@google/genai` SDK from `1.8.0` to `2.4.0`.\n- Updated model to `gemini-2.5-flash-native-audio-latest` (latest stable native audio model).\n- Enabled **native Gemini transcription** for both user and model audio via `inputAudioTranscription` and `outputAudioTranscription` in the Live API config. 
No third-party service (Deepgram) is required.\n- Transcription output is now **buffered by turn**: fragments accumulate until turn-complete, interruption, or a 2-second inactivity timeout, then emit as a single log entry.\n- Fixed MCP tool response handling: parse `content[0].text` from MCP `callTool` responses, with graceful fallback for non-JSON error strings.\n- Fixed `functionResponses` payload to use an array as required by the updated SDK.\n\n[30th September]\n- New model: `Gemini 2.5 Flash Native Audio Preview` replaces `Gemini 2.0 Flash Live`.\n- Updated to latest model `gemini-2.5-flash-native-audio-preview-09-2025`.\n- Updated all dependencies.\n- Known issue: MCP Server specs are not compatible with GenAI SDK (1.9.0 or later). See details [here](https://github.com/googleapis/js-genai/issues/990).\n- Known issue: MCP Weather Server definition for `getCurrentTemperature` is not working with MCP SDK (1.8.2); downgraded to MCP SDK (1.5.0) until resolved.\n\n[8th July]\n- Added MCP support. Integrated Model Context Protocol SDK with access to two servers: weather and multiplication.\n- Function calling is not available for native audio. Make sure the `affective` and `proactive` flags are disabled. To try it, use prompts like `What's the temperature in Barcelona?` or `Multiply 2 by 2`. You can inspect the `tool call` and `tool responses` by expanding the left side panel.\n\n[7th July]\n- New model: `Gemini 2.5 Flash Live` replaces `Gemini 2.0 Flash Live`.\n- Native audio: 30 voices, 24 languages, accents and voice effects (whispering, laughing). 
Tool usage is limited to function calling and search.\n- Live configuration options:\n  - Native audio: [affective dialog](https://ai.google.dev/gemini-api/docs/live-guide#affective-dialog) and [proactive audio](https://ai.google.dev/gemini-api/docs/live-guide#proactive-audio) options.\n  - Cascade audio: new [language support](https://ai.google.dev/gemini-api/docs/live-guide#supported-languages).\n- Previous models are referred to as half-cascade or cascade audio: `gemini-live-2.5-flash-preview` and `gemini-2.0-flash-live-001`. As opposed to the new native audio models, these models go through a two-step process: native audio input and text-to-speech output. All tool usage options are available. More details about how to choose your audio architecture are [here](https://ai.google.dev/gemini-api/docs/live#audio-generation).\n\n[10th April]\n- New model: `Gemini 2.0 Flash Live` replaces `Gemini 2.0 Flash Experimental`.\n- 3 more voices: Leda, Orus, and Zephyr.\n- Live configuration options:\n  - Set up automatic context window compression via `config.contextWindowCompression`.\n  - Adjust Gemini's voice quality output: 16kHz (low) and 24kHz (medium) via `config.generationConfig.mediaResolution`.\n\n[26th March]\n- Enable transcripts for both user and Gemini via a third-party API (Deepgram).\n\n## Core Features\n- Starter kit based on [Live API - Web console](https://github.com/google-gemini/multimodal-live-api-web-console)\n- TypeScript GenAI SDK for Gemini 3.1 API\n- MCP support: TypeScript MCP SDK\n- Real-time streaming voice from and to Gemini 3.1 Live API\n- Real-time streaming video from webcam or screen to Gemini 3.1 Live API\n- Support for both native and cascade audio models\n- Natural language text generation\n- Interactive chat functionality\n- Google Search integration for current information\n- Secure Python code execution in sandbox\n- Automated function calling for API integration\n- Live transcription for streamed audio (user and model) via native Gemini 
transcription (built-in)\n- Legacy Deepgram transcription support (optional, disabled by default)\n\n### Potential Future Extensions (Gemini 3.1)\n- **Stream Translation**: Configure stream-level audio translations to target specific languages on the fly using `streamTranslationConfig`.\n- **Session Resumption**: Persist and seamlessly resume dropped or paused multimodal sessions natively via `sessionResumption`.\n- **Avatar Configuration**: Integrate with real-time visual avatars driven directly by the model's responses and emotions via `avatarConfig`.\n\n## Gemini Intelligence and the Future Roadmap\n\n**Gemini Intelligence (Expected Summer 2026)** is Google's new initiative that helps you automate tedious tasks so you can focus on what matters.\n\n- **Automate multi-step tasks across your apps**: App automation is even more powerful when you add screen or image context, instead of manually switching between apps and copying data.\n- **Gemini can turn visual context into instant action**: Get instant summaries, generate replies, or find information without ever leaving your app.\n\n\u003e It is rolling out to Samsung Galaxy and Google Pixel phones and will become available across Android devices, including watches, cars, glasses, and laptops.\n\n[![Gemini Intelligence](https://files.catbox.moe/7ouckm.webp)](https://www.youtube.com/watch?v=4f7VamjPHaM)\n\n### How Gemini 3.1 Live Fits In\n\nThe **Gemini 3.1 Live API** serves as the underlying real-time, multimodal engine that makes these dynamic experiences possible. 
By providing continuous bidirectional streaming, robust voice activity detection (VAD), and deep contextual awareness, Gemini 3.1 Live enables developers to build the same seamless, intelligent interactions that power the Gemini ecosystem.\n\n**The Gemini App (available for Android and iOS)** leverages these capabilities to power innovative applications across devices and platforms:\n\n- **Hands-free AI Assistance**: Users interact naturally through voice while cooking, driving, or multitasking\n- **Real-time Visual Understanding**: Get instant AI responses as you show objects, documents, or scenes through your camera\n- **Smart Home Automation**: Control your environment with natural voice commands - from adjusting lights to managing thermostats\n- **Seamless Shopping**: Browse products, compare options, and complete purchases through conversation\n- **Live Problem Solving**: Share your screen to get real-time guidance, troubleshooting, or explanations\n- **Integration with Google services**: Leverage existing Google services like Search or Maps to enhance capabilities\n\n## Setup Instructions\n\n### System Requirements\n- Node.js and npm (latest stable version)\n- Angular CLI (globally installed via `npm install -g @angular/cli`)\n- Google AI API key from [Google AI Studio](https://makersuite.google.com/)\n- Deepgram API key from [Deepgram](https://deepgram.com/) (optional)\n\n\u003e As of the March 2026 update, native Gemini transcription is enabled by default via `inputAudioTranscription` and `outputAudioTranscription` in the Live API config. Deepgram is no longer required. Legacy Deepgram support remains in the codebase but is disabled by default.\n\n### Installation Steps\n\n1. 
**Set Up Environment Variables**\n   ```bash\n   ng g environments\n   ```\n   Create `environment.development.ts` in `src/environments/` with:\n   ```typescript\n   export const environment = {\n     API_KEY: 'YOUR_GOOGLE_AI_API_KEY',\n     DEEPGRAM_API_KEY: 'YOUR_DEEPGRAM_API_KEY', // optional\n   };\n   ```\n\n2. **Install Dependencies**\n   ```bash\n   npm install\n   ```\n\n## Usage Guide\n\n### Getting Started\n1. Launch the application and click the `Connect` button under `Connection Status`\n2. The demo uses Gemini 3.1 Live API which requires a WebSocket connection\n3. Monitor the browser's Developer Tools Console for connection issues\n4. Before diving into development, explore Gemini 3.1's Live capabilities (voice interactions, webcam, and screen sharing) using [Google AI Studio Live](https://aistudio.google.com/live). This interactive playground will help you understand the available features and integration options before implementing them in your project.\n\n### Feature Testing Examples\nTest the various capabilities using these example prompts:\n\n1. **Google Search Integration**\n   - \"Tell me the scores for the last 3 games of FC Barcelona.\"\n\n2. **Code Execution**\n   - \"What's the 50th prime number?\"\n   - \"What's the square root of 342.12?\"\n\n3. **Function Calling**\n   - \"What's the weather in London?\" (Note: Currently returns mock data of 25 degrees)\n\n### Configuration Options\n\nThe main configuration is handled in `src/gemini/gemini-client.service.ts` within the `MultimodalLiveService` class. You can customize the `LiveConnectConfig` settings, including response modalities (e.g. text vs. 
audio), proactivity features, and tools:\n\n```typescript\npublic config: LiveConnectConfig = {\n  // responseModalities: [Modality.TEXT],\n  responseModalities: [Modality.AUDIO], // note \"audio\" doesn't send a text response over\n\n  //maxOutputTokens: 100,\n  mediaResolution: MediaResolution.MEDIA_RESOLUTION_MEDIUM, // API only supports \"low\" and \"medium\" for now\n  contextWindowCompression: {\n    triggerTokens: '25600',\n    slidingWindow: { targetTokens: '12800' },\n  },\n  // Native Gemini transcription (no Deepgram needed)\n  inputAudioTranscription: {},\n  outputAudioTranscription: {},\n};\n```\n\n### Usage Limits\n- Daily and session-based limits apply\n- Token count restrictions to prevent abuse\n- If limits are exceeded, wait until the next day to resume\n\n## Development Guide\n\n### Local Development\nStart the development server:\n```bash\nng serve\n```\nAccess the application at `http://localhost:4200/`\n\n### Available Commands\n\n1. **Generate New Components**\n   ```bash\n   ng generate component component-name\n   ```\n\n2. **Build Project**\n   ```bash\n   ng build\n   ```\n   Build artifacts will be stored in the `dist/` directory\n\n3. 
**Run Tests**\n   - Unit Tests:\n     ```bash\n     ng test\n     ```\n   - E2E Tests:\n     ```bash\n     ng e2e\n     ```\n     Note: Select and install your preferred E2E testing framework\n\n## Project Information\n- Built with Angular CLI version 20.3.26\n- State management for logging, including Dev Tools, with NgRx version 20.1.0\n- TypeScript GenAI SDK version 2.4.0\n- TypeScript SDK for Model Context Protocol version 1.15.0\n- Native Gemini transcription (no third-party dependency required)\n- Features automatic reload during development\n- Includes production build optimizations\n\n## Additional Resources\n- [Angular CLI Documentation](https://angular.dev/tools/cli)\n- [Google AI Studio](https://makersuite.google.com/)\n- Browser Developer Tools for debugging\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsans%2Fgemini-3-live-angular","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsans%2Fgemini-3-live-angular","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsans%2Fgemini-3-live-angular/lists"}