https://github.com/viaanthroposbenevolentia/gemini-2-live-api-demo

Vanilla JS web interface for Gemini 2.0 flash-exp Multimodal API with text, audio, camera, screen inputs and audio responses and function calling
https://github.com/viaanthroposbenevolentia/gemini-2-live-api-demo

function-calling gemini-api gemini-flash google-api vanilla-javascript websocket

Last synced: 24 days ago
JSON representation

Vanilla JS web interface for Gemini 2.0 flash-exp Multimodal API with text, audio, camera, screen inputs and audio responses and function calling

Host: GitHub
URL: https://github.com/viaanthroposbenevolentia/gemini-2-live-api-demo
Owner: ViaAnthroposBenevolentia
License: mit
Created: 2024-12-20T11:48:26.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-02-26T01:53:46.000Z (2 months ago)
Last Synced: 2025-04-01T19:19:26.113Z (about 1 month ago)
Topics: function-calling, gemini-api, gemini-flash, google-api, vanilla-javascript, websocket
Language: JavaScript
Homepage: https://viaanthroposbenevolentia.github.io/gemini-2-live-api-demo/
Size: 86.9 KB
Stars: 331
Watchers: 9
Forks: 142
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Gemini 2.0 Flash Multimodal Live API Client

A lightweight vanilla JavaScript implementation of the Gemini 2.0 Flash Multimodal Live API client. This project provides real-time interaction with Gemini's API through text, audio, video, and screen sharing capabilities.

This is a simplified version of [Google's original React implementation](https://github.com/google-gemini/multimodal-live-api-web-console), created in response to [this issue](https://github.com/google-gemini/multimodal-live-api-web-console/issues/19).

## Live Demo on GitHub Pages

[Live Demo](https://viaanthroposbenevolentia.github.io/gemini-2-live-api-demo/)

## Key Features

- Real-time chat with Gemini 2.0 Flash Multimodal Live API
- Real-time audio responses from the model
- Real-time audio input from the user, allowing interruptions
- Real-time video streaming from the user's webcam
- Real-time screen sharing from the user's screen
- Function calling
- Transcription of the model's audio (if Deepgram API key provided)
- Built with vanilla JavaScript (no dependencies)
- Mobile-friendly

## Prerequisites

- Modern web browser with WebRTC, WebSocket, and Web Audio API support
- Google AI Studio API key
- `python -m http.server` or `npx http-server` or Live Server extension for VS Code (to host a server for index.html)

## Quick Start

1. Get your API key from Google AI Studio
2. Clone the repository

```bash
git clone https://github.com/ViaAnthroposBenevolentia/gemini-2-live-api-demo.git
```

3. Start the development server (adjust port if needed):

```bash
cd gemini-2-live-api-demo
python -m http.server 8000 # or npx http-server 8000 or Open with Live Server extension for VS Code
```

4. Access the application at `http://localhost:8000`

5. Open the settings at the top right, paste your API key, and click "Save"
6. Get free API key from [Deepgram](https://deepgram.com/pricing) and paste in the settings to get real-time transcript (Optional).

## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

## License

This project is licensed under the MIT License.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/viaanthroposbenevolentia/gemini-2-live-api-demo

Awesome Lists containing this project

README