# Amazon IVS Python Demo Scripts
A comprehensive collection of Python demo scripts demonstrating various Amazon IVS (Interactive Video Service) capabilities across both **Real-Time Stages** and **Channels** (low-latency HLS). This project showcases **publishing**, **subscribing**, **transcription**, **AI video analysis**, **AI-powered speech-to-speech**, and **timed metadata publishing** functionality.
**This project is intended for educational purposes only and not for production use.**
## Table of Contents
- [Overview](#overview)
- [Project Structure](#project-structure)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Configuration](#configuration)
- [Sub-Projects](#sub-projects)
- [Channels Subscribe](#channels-subscribe)
- [Stages Publish](#stages-publish)
- [Stages Subscribe](#stages-subscribe)
- [Stages Nova Speech-to-Speech](#stages-nova-speech-to-speech)
- [Stages OpenAI Real-time API](#stages-openai-real-time-api)
- [Stages SEI Publishing](#stages-sei-publishing)
- [Usage Examples](#usage-examples)
- [Troubleshooting](#troubleshooting)
- [Dependencies](#dependencies)
- [Contributing](#contributing)
- [License](#license)
- [Support](#support)
## Overview
This project demonstrates how to integrate Amazon IVS services with various AI and media processing capabilities:
### IVS Real-Time Stages (WebRTC)
- **WebRTC Publishing**: Stream video/audio content to IVS stages
- **WebRTC Subscribing**: Receive and process streams from IVS stages
- **AI Speech-to-Speech**: Integrate Amazon Nova Sonic for conversational AI
- **SEI Publishing**: Embed metadata directly into H.264 video streams using SEI NAL units
- **Event Handling**: Process real-time stage events via WebSocket connections
- **Audio Visualization**: Generate dynamic audio visualizations
### IVS Channels (Low-Latency HLS)
- **Channel Subscription**: Subscribe to and analyze IVS channel streams
- **Frame Analysis**: AI-powered video frame analysis using Amazon Bedrock Claude
- **Video Analysis**: Comprehensive video segment analysis using TwelveLabs Pegasus
- **Real-time Transcription**: Convert speech to text using OpenAI Whisper
- **Timed Metadata Publishing**: Publish analysis results back to IVS as timed metadata
- **Rendition Selection**: Automatic or manual selection of stream quality
> [!IMPORTANT]
> Using these demos with your AWS account will create and consume AWS resources, which will cost money.
## Project Structure
```
amazon-ivs-python-demos/
├── README.md # This file
├── requirements.txt # Python dependencies
├── channels-subscribe/ # IVS Channel analysis tools
│ ├── README.md # Channel tools documentation
│ ├── ivs-channel-subscribe-analyze-frames.py # Frame analysis with Claude
│ ├── ivs-channel-subscribe-analyze-video.py # Video analysis with Pegasus
│ ├── ivs-channel-subscribe-analyze-audio-video.py # Combined audio/video analysis
│ ├── ivs-channel-subscribe-transcribe.py # Real-time transcription
│ └── ivs_metadata_publisher.py # Timed metadata publisher
├── stages-publish/ # Real-Time Stages publishing
│ ├── ivs-stage-publish.py # Basic media publishing
│ ├── ivs-stage-publish-events.py # Publishing with event handling
│ └── ivs-stage-pub-sub.py # Simultaneous publish/subscribe
├── stages-subscribe/ # Real-Time Stages subscribing
│ ├── ivs-stage-subscribe-transcribe.py # Subscribe with transcription
│ ├── ivs-stage-subscribe-analyze-frames.py # Subscribe with AI frame analysis
│ └── ivs-stage-subscribe-analyze-video.py # Subscribe with AI video analysis
├── stages-nova-s2s/ # AI Speech-to-Speech
│ └── ivs-stage-nova-s2s.py # Nova Sonic integration
├── stages-gpt-realtime/ # GPT RealTime API
│ └── ivs-stage-gpt-realtime.py # gpt-realtime integration
└── stages_sei/ # SEI Publishing System
├── SEI.md # SEI documentation and usage guide
├── sei_publisher.py # High-level SEI message publishing
└── h264_sei_patch.py # Low-level H.264 encoder patching
```
## Prerequisites
- Python 3.8 or higher
- AWS CLI configured with appropriate credentials
- Amazon IVS Real-Time Stage ARN and participant tokens
- FFmpeg (for media processing when using the transcription demos; not necessary otherwise)
- Audio input/output devices (for speech-to-speech functionality)
### AWS Permissions Required
Your AWS credentials need the following permissions:
**For IVS Real-Time Stages:**
- `ivs:CreateParticipantToken`
- `bedrock:InvokeModel` (for video frame analysis with Claude)
- `bedrock:InvokeModelWithBidirectionalStream` (for Nova Sonic)
- Access to Amazon IVS Real-Time Stages
**For IVS Channels:**
- `ivs:PutMetadata` (for publishing timed metadata)
- `bedrock:InvokeModel` (for Claude frame analysis and TwelveLabs Pegasus video analysis)
- Access to Amazon IVS Channels
## Installation
1. **Clone and navigate to the project directory:**
```bash
git clone https://github.com/aws-samples/sample-amazon-ivs-python-demos.git
cd sample-amazon-ivs-python-demos
```
2. **Create and activate a virtual environment:**
```bash
python3 -m venv .venv
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate # On Windows
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
4. **Install system dependencies:**
**macOS:**
```bash
brew install ffmpeg portaudio
```
**Ubuntu/Debian:**
```bash
sudo apt-get update
sudo apt-get install ffmpeg portaudio19-dev
```
**Windows:**
```bash
# Install FFmpeg
# Download from https://ffmpeg.org/download.html and add to PATH
# Or use chocolatey:
choco install ffmpeg
# PortAudio is typically installed automatically with pyaudio
# If you encounter issues, you may need to install Microsoft Visual C++ Build Tools
```
## Configuration
### Environment Variables
Set the following environment variables or ensure AWS CLI is configured:
```bash
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
# Optional: For weather functionality in Nova speech-to-speech
export WEATHER_API_KEY=your_weather_api_key
# Optional: For web search functionality in Nova speech-to-speech
export BRAVE_API_KEY=your_brave_api_key
```
### Weather API (Optional)
The Nova speech-to-speech script supports weather queries through WeatherAPI.com:
1. Sign up at [WeatherAPI.com](https://www.weatherapi.com/) for a free account
2. Get your API key from the dashboard
3. Set the `WEATHER_API_KEY` environment variable
4. The AI assistant will then be able to answer weather-related questions
### Web Search API (Optional)
The Nova speech-to-speech script supports web search capabilities through Brave Search API:
1. Sign up at [Brave Search API](https://api.search.brave.com/) for a free account
2. Get your API key from the dashboard
3. Set the `BRAVE_API_KEY` environment variable or use the `--brave-api-key` command line argument
4. The AI assistant will then be able to search the web for current information, news, and facts
## Sub-Projects
### Channels Subscribe
The `channels-subscribe/` directory contains scripts for subscribing to and analyzing Amazon IVS Channels (low-latency HLS streams).
#### Key Features
- **Frame Analysis**: Analyze individual video frames using Amazon Bedrock Claude models
- **Video Analysis**: Process video segments using TwelveLabs Pegasus for comprehensive content analysis
- **Audio/Video Analysis**: Combined audio and video processing with proper synchronization using PyAV
- **Real-Time Transcription**: Live speech-to-text using OpenAI Whisper with multi-language support
- **Timed Metadata Publishing**: Publish analysis results back to IVS channels as timed metadata
- **Rendition Selection**: Automatic or manual selection of stream quality
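Rendition selection comes down to parsing the master playlist's `#EXT-X-STREAM-INF` tags and picking a variant by advertised bandwidth. A minimal sketch of the idea (the scripts' actual selection logic may differ):
```python
import re
from typing import Optional
from urllib.parse import urljoin

import requests

def pick_highest_rendition(master_url: str) -> Optional[str]:
    """Return the variant URI with the highest advertised BANDWIDTH
    from a master M3U8 playlist (naive line-based parsing)."""
    lines = requests.get(master_url, timeout=5).text.splitlines()
    best_bw, best_uri = -1, None
    for i, line in enumerate(lines):
        match = re.search(r"BANDWIDTH=(\d+)", line)
        if match and i + 1 < len(lines):
            bandwidth = int(match.group(1))
            if bandwidth > best_bw:
                # The variant URI follows its #EXT-X-STREAM-INF tag and may be relative.
                best_bw, best_uri = bandwidth, urljoin(master_url, lines[i + 1].strip())
    return best_uri
```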
#### Scripts Overview
**ivs-channel-subscribe-analyze-frames.py**
- Analyzes individual video frames at configurable intervals using Amazon Bedrock Claude
- Supports multiple Claude models (Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku)
- Configurable analysis intervals for cost control
- Optional video display and rendition quality selection
**ivs-channel-subscribe-analyze-video.py**
- Records and analyzes video segments using TwelveLabs Pegasus
- Encodes video chunks to MP4 for comprehensive analysis
- OpenCV-based video capture with configurable recording duration
**ivs-channel-subscribe-analyze-audio-video.py**
- Advanced script using PyAV for proper audio/video stream handling
- Native audio capture and encoding with H.264 video and AAC audio
- Complete media analysis with TwelveLabs Pegasus
**ivs-channel-subscribe-transcribe.py**
- Real-time audio transcription using OpenAI Whisper
- Support for 99+ languages with auto-detection
- Multiple Whisper models from tiny to large-v3
- Optional publishing of transcripts as IVS timed metadata
**ivs_metadata_publisher.py**
- Reusable module for publishing timed metadata to IVS channels
- Automatic channel ARN extraction from M3U8 playlist URLs
- Rate limiting compliance and automatic payload splitting
- Support for transcripts, events, and custom metadata
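The rate limiting and payload splitting mentioned above can be sketched in a few lines with boto3's `put_metadata` call. This is a naive version assuming the documented 1 KB PutMetadata payload limit; the bundled module is more careful (for example, it also extracts the channel ARN from the playlist URL for you):
```python
import json
import time

import boto3

ivs = boto3.client("ivs", region_name="us-east-1")

MAX_PAYLOAD_BYTES = 1024  # PutMetadata accepts at most 1 KB per call
MIN_INTERVAL = 1.0 / 5    # stay under the 5 requests/second per-channel limit
_last_call = 0.0

def publish_metadata(channel_arn: str, payload: dict) -> None:
    """Naively split a JSON payload into 1 KB chunks and publish each chunk,
    sleeping between calls to respect the per-channel rate limit."""
    global _last_call
    data = json.dumps(payload)  # character-based split; exact for ASCII JSON
    for i in range(0, len(data), MAX_PAYLOAD_BYTES):
        wait = MIN_INTERVAL - (time.monotonic() - _last_call)
        if wait > 0:
            time.sleep(wait)
        ivs.put_metadata(channelArn=channel_arn, metadata=data[i:i + MAX_PAYLOAD_BYTES])
        _last_call = time.monotonic()
```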
#### Usage Examples
```bash
# Frame analysis with Claude Sonnet 4
python channels-subscribe/ivs-channel-subscribe-analyze-frames.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality
# Real-time transcription with metadata publishing
python channels-subscribe/ivs-channel-subscribe-transcribe.py \
--playlist-url "https://example.com/playlist.m3u8" \
--language en \
--whisper-model base \
--publish-transcript-as-timed-metadata
# Video analysis with TwelveLabs Pegasus
python channels-subscribe/ivs-channel-subscribe-analyze-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--analysis-duration 15 \
--show-video
```
For detailed documentation, see [`channels-subscribe/README.md`](channels-subscribe/README.md).
### Stages Publish
The `stages-publish/` directory contains scripts for publishing media content to IVS Real-Time Stages from MP4 files or live HLS streams.
#### ivs-stage-publish.py
Basic media publishing script that streams video/audio content to an IVS stage from MP4 files or HLS streams.
**Features:**
- Publishes video and audio tracks from MP4 files or M3U8 HLS streams to IVS Real-Time Stages
- JWT token validation and capability checking
- WebRTC connection management
- Option to publish video-only streams
- Optional HLS stream health monitoring with automatic exit when stream ends
- Configurable stream check intervals for cost-effective monitoring
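The JWT validation mentioned above can be approximated locally by decoding the token's payload segment, without verifying the signature. A minimal sketch (the claim names inside IVS participant tokens are an assumption here; print the decoded payload to confirm what your token carries):
```python
import base64
import json

def decode_token_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

claims = decode_token_payload("eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9...")  # use a real token
# "capabilities" is an assumed claim name; inspect `claims` to see the actual keys.
print(claims.get("capabilities"))
```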
**Usage:**
```bash
cd stages-publish
# Publish MP4 file
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/video.mp4"
# Publish HLS stream
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/stream.m3u8"
# Publish HLS stream with automatic exit when stream ends
python ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/stream.m3u8" \
--stream-check-interval 30
```
**Command-line Arguments:**
- `--token`: JWT participant token with publish capabilities (required)
- `--path-to-mp4`: Path to MP4 file to publish (mutually exclusive with --m3u8-url)
- `--m3u8-url`: M3U8 playlist URL for HLS stream to publish (mutually exclusive with --path-to-mp4)
- `--video-only`: Publish video only, no audio (optional flag)
- `--stream-check-interval`: Interval in seconds to check HLS stream health - enables automatic exit when stream ends (optional, HLS only)
**HLS Stream Monitoring:**
When using `--stream-check-interval`, the script monitors HLS stream health by periodically checking if the M3U8 playlist is still accessible:
- **Automatic Exit**: Script gracefully exits when the HLS stream stops broadcasting
- **Rapid Verification**: After a health check failure, the next 2 checks use a 1-second interval for quick verification
- **Consecutive Failures**: Requires 3 consecutive failures before declaring the stream offline
- **Cost Control**: Only makes HTTP requests when explicitly enabled with the parameter
- **No Interference**: Stream monitoring doesn't affect video/audio quality or WebRTC performance
**Stream Monitoring Behavior:**
```
Normal check (30s) → ✅ Healthy → Wait 30s
Normal check (30s) → ❌ Failed → Wait 1s (rapid check 1/2)
Rapid check (1s) → ❌ Failed → Wait 1s (rapid check 2/2)
Rapid check (1s) → ❌ Failed → Stream declared offline, exit gracefully
```
**Without `--stream-check-interval`**: Script runs indefinitely until manually stopped (Ctrl+C), regardless of stream status.
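A minimal sketch of the check-and-confirm policy described above, using `requests` (the demo's actual implementation may differ in its details):
```python
import time

import requests

def monitor_stream(playlist_url: str, interval: float = 30.0) -> None:
    """Block until the playlist fails three consecutive health checks.
    After a failure, re-check on a 1-second interval for quick confirmation."""
    failures = 0
    while failures < 3:
        time.sleep(1.0 if failures else interval)
        try:
            requests.get(playlist_url, timeout=5).raise_for_status()
            failures = 0  # healthy again; return to the normal interval
        except requests.RequestException:
            failures += 1
    print("Stream declared offline; exiting gracefully.")
```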
#### ivs-stage-publish-events.py
Enhanced publishing script with real-time event handling via WebSocket connections.
**Features:**
- All features of basic publisher
- Real-time stage event monitoring via WebSocket
- Participant join/leave notifications
- Stage state change handling
**Usage:**
```bash
cd stages-publish
python ivs-stage-publish-events.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/video.mp4"
```
**Command-line Arguments:**
- `--token`: JWT participant token with publish capabilities (required)
- `--path-to-mp4`: Path to MP4 file to publish (required)
- `--video-only`: Publish video only, no audio (optional flag)
#### ivs-stage-pub-sub.py
Advanced script that demonstrates simultaneous publishing and subscribing capabilities.
**Features:**
- Publishes audio from MP4 file while subscribing to other participants
- Demonstrates bidirectional communication
- Audio/video track management
- SDP (Session Description Protocol) handling
**Usage:**
```bash
cd stages-publish
python ivs-stage-pub-sub.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "path/to/audio.mp4"
```
**Command-line Arguments:**
- `--token`: JWT participant token with both publish and subscribe capabilities (required)
- `--path-to-mp4`: Path to MP4 file to publish audio from (required)
- `--video-only`: Publish video only, no audio (optional flag)
- `--subscribe-to`: List of participant IDs to subscribe to (optional)
### Stages Subscribe
The `stages-subscribe/` directory contains scripts for receiving and processing streams from IVS Real-Time Stages.
#### ivs-stage-subscribe-transcribe.py
Subscribes to IVS stage audio streams and provides real-time speech-to-text transcription using OpenAI Whisper.
**Features:**
- Subscribes to audio tracks from specific participants in IVS Real-Time Stages
- Real-time speech transcription using Whisper
- Audio chunk processing and buffering
- Multiple language support
- Audio format conversion and normalization
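The per-chunk transcription step looks roughly like this: buffered WebRTC audio (48 kHz) is resampled to the 16 kHz mono float32 input Whisper expects, then passed to `model.transcribe()`. A minimal sketch, assuming `samples` is already resampled:
```python
import numpy as np
import whisper

model = whisper.load_model("tiny")  # pick a larger model for better accuracy

def transcribe_chunk(samples: np.ndarray, language: str = "en") -> str:
    """Transcribe one chunk of mono float32 audio at 16 kHz, normalized to [-1, 1]."""
    result = model.transcribe(samples, language=language, fp16=False)
    return result["text"].strip()
```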
**Usage:**
```bash
cd stages-subscribe
python ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..."
```
**Command-line Arguments:**
- `--participant-id`: ID of the participant to subscribe to (required)
- `--token`: JWT participant token with subscribe capabilities (required)
- `--whisper-model`: Whisper model size - "tiny", "base", "small", "medium", "large" (default: "tiny")
- `--fp16`: Enable FP16 precision for faster processing (default: true)
- `--language`: Language code for transcription (default: "en")
- `--chunk-duration`: Audio chunk duration in seconds (default: 5)
**Supported Languages:**
- English ("en")
- Spanish ("es")
- French ("fr")
- German ("de")
- Italian ("it")
- Portuguese ("pt")
- And many more supported by Whisper
#### ivs-stage-subscribe-analyze-frames.py
Subscribes to IVS stage video streams and provides AI-powered video frame analysis using Amazon Bedrock Claude models for content discovery, moderation, and accessibility.
**Features:**
- Subscribes to video tracks from specific participants in IVS Real-Time Stages
- AI-powered video frame analysis using Claude Sonnet 4
- Configurable analysis intervals to control costs
- Support for multiple Claude models (Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku)
- Detailed frame descriptions for content moderation and accessibility
- Background processing to avoid blocking video streams
- Cost-conscious design with smart frame sampling
**Usage:**
```bash
cd stages-subscribe
python ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
```
**Command-line Arguments:**
- `--token`: JWT participant token with subscribe capabilities (required)
- `--subscribe-to`: Participant ID to subscribe to (required)
- `--analysis-interval`: Time in seconds between frame analyses (default: 30.0)
- `--bedrock-region`: AWS region for Bedrock service (default: "us-east-1")
- `--bedrock-model-id`: Bedrock model ID for analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- `--disable-analysis`: Disable video frame analysis, just subscribe to video (optional flag)
**Supported Models:**
- **Claude Sonnet 4** (default): `us.anthropic.claude-sonnet-4-20250514-v1:0` - Most capable, best for complex analysis
- **Claude 3.5 Sonnet**: `anthropic.claude-3-5-sonnet-20241022-v2:0` - Very capable, good balance of performance and cost
- **Claude 3.5 Haiku**: `anthropic.claude-3-5-haiku-20241022-v1:0` - Fastest and cheapest, good for basic content moderation
**Use Cases:**
- **Content Moderation**: Automatically detect inappropriate content in live streams
- **Content Discovery**: Generate descriptions and tags for video content
- **Accessibility**: Create detailed descriptions for visually impaired users
- **Analytics**: Track objects, activities, and engagement in video streams
- **Compliance**: Monitor streams for regulatory compliance
**Cost Control Features:**
- Configurable analysis intervals (default 30 seconds to minimize costs)
- Background processing doesn't block video streaming
- Option to disable analysis entirely for testing
- Smart error handling prevents failed analyses from crashing streams
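Under the hood, each sampled frame is JPEG-encoded and sent to Bedrock as an image message. A minimal sketch of that round-trip (the script's actual prompt and parameters may differ):
```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def describe_frame(jpeg_bytes: bytes, model_id: str) -> str:
    """Send one JPEG-encoded frame to a Claude model on Bedrock and
    return the generated text description."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 300,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode(),
                }},
                {"type": "text", "text": "Describe this video frame briefly."},
            ],
        }],
    }
    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    return json.loads(response["body"].read())["content"][0]["text"]
```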
#### ivs-stage-subscribe-analyze-video.py
Subscribes to IVS stage audio and video streams and provides AI-powered video analysis using Amazon Bedrock TwelveLabs Pegasus for comprehensive video understanding.
**Features:**
- Subscribes to both audio and video tracks from specific participants
- Records short video clips (configurable duration) for analysis
- Encodes audio and video to MP4 format in memory
- AI-powered video analysis using TwelveLabs Pegasus model
- Detailed video content descriptions including people, objects, activities, and text
- Asynchronous processing to maintain stream performance
- Configurable analysis duration and frequency
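The in-memory MP4 encoding mentioned above can be sketched with PyAV. This simplified version handles video only, while the script also muxes AAC audio:
```python
import io

import av

def encode_frames_to_mp4(frames: list, fps: int = 30) -> bytes:
    """Encode a list of av.VideoFrame objects into an in-memory MP4 (video only)."""
    buf = io.BytesIO()
    with av.open(buf, mode="w", format="mp4") as container:
        stream = container.add_stream("h264", rate=fps)
        stream.pix_fmt = "yuv420p"
        stream.width, stream.height = frames[0].width, frames[0].height
        for frame in frames:
            # h264 requires yuv420p input, so reformat each frame defensively.
            for packet in stream.encode(frame.reformat(format="yuv420p")):
                container.mux(packet)
        for packet in stream.encode():  # flush buffered packets
            container.mux(packet)
    return buf.getvalue()
```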
**Usage:**
```bash
cd stages-subscribe
python ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
```
**Command-line Arguments:**
- `--token`: JWT participant token with subscribe capabilities (required)
- `--subscribe-to`: Participant ID to subscribe to (required)
- `--analysis-duration`: Duration in seconds for video recording before analysis (default: 10.0)
- `--bedrock-region`: AWS region for Bedrock service (default: "us-west-2")
- `--bedrock-model-id`: Bedrock model ID for analysis (default: "us.twelvelabs.pegasus-1-2-v1:0")
- `--disable-analysis`: Disable video analysis, just subscribe to video (optional flag)
### Stages Nova Speech-to-Speech
The `stages-nova-s2s/` directory contains the most advanced script integrating Amazon Nova Sonic for AI-powered speech-to-speech functionality.
#### ivs-stage-nova-s2s.py
A comprehensive script that combines IVS Real-Time Stages with Amazon Nova Sonic for conversational AI experiences.
**Features:**
- Bidirectional audio streaming with IVS participants
- Amazon Nova Sonic integration for AI responses
- Real-time waveform visualization
- Audio resampling and format conversion
- WebRTC track management for both publishing and subscribing
- Dynamic audio visualization with gradient colormaps
- AI-powered video frame analysis using Amazon Bedrock Claude models
- Built-in tools for date/time, weather, and visual analysis
- Configurable frame analysis with multiple Claude model options
**Usage:**
```bash
cd stages-nova-s2s
python ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
```
**Command-line Arguments:**
- `--token`: JWT participant token with both publish and subscribe capabilities (required)
- `--subscribe-to`: Participant ID to subscribe to (required)
- `--nova-model-id`: Amazon Nova model identifier (default: "amazon.nova-sonic-v1:0")
- `--nova-region`: AWS region for Nova service (default: "us-east-1")
- `--disable-frame-analysis`: Disable video frame analysis (default: enabled)
- `--bedrock-model-id`: Bedrock model ID for frame analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- `--bedrock-region`: AWS region for Bedrock service (default: "us-east-1")
- `--weather-api-key`: Weather API key for weather tool functionality (overrides WEATHER_API_KEY environment variable)
- `--brave-api-key`: Brave Search API key for web search tool functionality (overrides BRAVE_API_KEY environment variable)
- `--ice-timeout`: ICE gathering timeout in seconds (default: 1, original: 5) - Lower values speed up connection establishment
**Key Components:**
1. **AgentAudioTrack**: Custom audio track for streaming Nova responses
2. **AgentVideoTrack**: Dynamic waveform visualization with thinking states
3. **BedrockStreamManager**: Manages bidirectional Nova Sonic streaming
4. **Audio Processing**: Handles resampling between IVS (48kHz) and Nova (16kHz)
5. **Tool Support**: Built-in tools for date/time, weather, and video frame analysis
6. **Frame Analysis**: Non-blocking AI-powered video frame analysis using Claude models
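Component 4 (audio resampling) can be sketched with PyAV's `AudioResampler`; this assumes 16-bit mono output and that each received WebRTC frame converts independently:
```python
import av

# Downsample 48 kHz WebRTC audio to the 16 kHz mono PCM that Nova Sonic consumes.
to_nova = av.AudioResampler(format="s16", layout="mono", rate=16000)

def frame_to_nova_pcm(frame: av.AudioFrame) -> bytes:
    """Convert one received AudioFrame into 16 kHz mono 16-bit PCM bytes."""
    resampled = to_nova.resample(frame)  # PyAV >= 10 returns a list of frames
    return b"".join(bytes(f.planes[0]) for f in resampled)
```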
**Available Tools:**
- **Date/Time Tool**: Get current date and time information for specific locations with timezone support
- **Weather Tool**: Get current weather and 5-day forecast (requires `WEATHER_API_KEY`)
- **Web Search Tool**: Search the web for current information, news, and facts (requires `BRAVE_API_KEY`)
- **Frame Analysis Tool**: Analyze video frames for visual assistance and content description
#### Assistant Management
For automated management of multiple Nova assistant instances via WebSocket integration with IVS Chat, see:
**[IVS Stage Assistant Manager Documentation](stages-nova-s2s/MANAGING_ASSISTANT_DEMO.md)**
This companion tool allows you to dynamically launch and manage multiple Nova S2S instances based on chat messages, perfect for scaling AI assistants across multiple participants.
#### OpenAI Assistant Management
For automated management of multiple OpenAI assistant instances via WebSocket integration with IVS Chat, see:
**[IVS Stage OpenAI Assistant Manager Documentation](stages-gpt-realtime/MANAGING_OPENAI_ASSISTANT_DEMO.md)**
This companion tool allows you to dynamically launch and manage multiple OpenAI real-time instances based on chat messages, with full control over voice, VAD settings, and vision capabilities.
### Stages OpenAI Real-time API
The `stages-gpt-realtime/` directory contains integration with OpenAI's real-time API for speech-to-speech conversations with IVS Real-Time Stages.
#### ivs-stage-gpt-realtime.py
A comprehensive script that integrates OpenAI's gpt-realtime API with IVS Real-Time Stages for conversational AI experiences.
**Features:**
- Bidirectional audio streaming with IVS participants
- OpenAI real-time API integration for AI responses
- WebSocket-based real-time communication with OpenAI
- Real-time audio visualization
- Audio resampling and format conversion (24kHz for OpenAI)
- WebRTC track management for both publishing and subscribing
- Voice activity detection and interruption handling
- Multiple voice options (alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar)
**Usage:**
```bash
cd stages-gpt-realtime
python ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--openai-key "sk-..."
```
**Command-line Arguments:**
- `--token`: JWT participant token with both publish and subscribe capabilities (required)
- `--subscribe-to`: Participant ID to subscribe to (required)
- `--openai-key`: OpenAI API key (optional, uses OPENAI_API_KEY environment variable if not provided)
- `--model`: OpenAI model to use (default: "gpt-realtime")
- `--voice`: Voice to use for responses - "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse", "marin", "cedar" (default: "cedar")
- `--disable-frame-analysis`: Disable video frame analysis (default: enabled)
- `--bedrock-region`: AWS region for Bedrock service (default: "us-east-1")
- `--bedrock-model-id`: Bedrock model ID for frame analysis (default: "us.anthropic.claude-sonnet-4-20250514-v1:0")
- `--ice-timeout`: ICE gathering timeout in seconds (default: 1, original: 5)
**Key Components:**
1. **OpenAIAudioTrack**: Custom audio track for streaming OpenAI responses
2. **OpenAIVideoTrack**: Dynamic audio visualization with OpenAI branding
3. **OpenAIRealtimeManager**: Manages bidirectional OpenAI real-time API streaming
4. **Audio Processing**: Handles resampling for OpenAI's 24kHz requirement
5. **WebSocket Management**: Handles OpenAI real-time API WebSocket connection
6. **Vision Capabilities**: AI-powered video frame analysis using Amazon Bedrock Claude models
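Components 3 and 5 ride on a single WebSocket session. A minimal sketch of opening one and appending an audio chunk, with the caveats that event names follow OpenAI's public Realtime API and the header keyword is `extra_headers` in websockets 11-13 but `additional_headers` from version 14:
```python
import asyncio
import base64
import json
import os

import websockets

async def send_audio_chunk(pcm_24k: bytes) -> None:
    """Open a Realtime API session and append one base64-encoded PCM chunk."""
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(url, extra_headers=headers) as ws:
        print(json.loads(await ws.recv())["type"])  # expect "session.created"
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm_24k).decode(),
        }))

asyncio.run(send_audio_chunk(b"\x00\x00" * 2400))  # 100 ms of 24 kHz silence
```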
**Available Voices:**
- **cedar** (default): Warm and conversational
- **alloy**: Balanced and natural
- **ash**: Clear and articulate
- **ballad**: Smooth and melodic
- **coral**: Bright and engaging
- **echo**: Clear and articulate
- **sage**: Wise and thoughtful
- **shimmer**: Soft and gentle
- **verse**: Expressive and dynamic
- **marin**: Professional and polished
**Prerequisites:**
- OpenAI API key with real-time API access
- IVS stage token with both publish and subscribe capabilities
- Python 3.8+ with required dependencies
**Known Limitations:**
- **Semantic VAD Transcriptions**: When using `--vad-mode semantic_vad`, user input transcriptions may not be generated. Use `--vad-mode server_vad` if you need reliable user transcriptions for SEI metadata or logging.
**Environment Variables:**
```bash
export OPENAI_API_KEY="sk-your-openai-api-key-here"
```
**Example Usage:**
```bash
# Basic OpenAI conversation
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Using different voice and model
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--voice "nova" \
--model "gpt-4o-realtime-preview-2024-10-01"
# With explicit API key
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--openai-key "sk-..."
```
### Stages SEI Publishing
The `stages_sei/` directory contains a comprehensive SEI (Supplemental Enhancement Information) publishing system for embedding metadata directly into H.264 video streams.
**What is SEI?**
SEI NAL units are part of the H.264/AVC video compression standard that allow embedding additional metadata within the video stream itself. This metadata travels with the video frames, ensuring perfect synchronization between video content and associated data.
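For orientation, here is a minimal sketch of what a `user_data_unregistered` SEI unit (payload type 5) looks like on the wire. Emulation-prevention bytes are omitted, which the bundled `h264_sei_patch.py` presumably handles:
```python
import uuid

SEI_UUID = uuid.uuid4().bytes  # 16-byte identifier marking our payloads

def build_sei_nal(payload: bytes) -> bytes:
    """Build a user_data_unregistered SEI NAL unit with Annex B framing.
    Payload sizes >= 255 use 0xFF run-length coding per the H.264 spec."""
    body = SEI_UUID + payload
    size = len(body)
    size_bytes = b"\xff" * (size // 255) + bytes([size % 255])
    nal = bytes([0x06, 0x05]) + size_bytes + body + b"\x80"  # NAL type 6, rbsp stop bit
    return b"\x00\x00\x00\x01" + nal  # Annex B start code
```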
**Key Features:**
- **Perfect Synchronization**: Metadata is embedded directly in video frames
- **Low Latency**: No separate data channels needed
- **Standards Compliant**: Uses official H.264 specification
- **Multi-format Support**: Handles Annex B, AVCC, and RTP H.264 formats
- **Automatic Integration**: Patches aiortc and PyAV encoders automatically
- **Reliable Delivery**: 3x repetition with client-side deduplication
**Components:**
- **`sei_publisher.py`**: High-level interface for publishing SEI messages
- **`h264_sei_patch.py`**: Low-level H.264 encoder patching system
- **`SEI.md`**: Comprehensive documentation and usage guide
**Usage Example:**
```python
from stages_sei import SeiPublisher, patch_h264_encoder, set_global_sei_publisher
# Apply H.264 encoder patch (do this early in your application)
patch_h264_encoder()
# Create and configure SEI publisher
sei_publisher = SeiPublisher()
set_global_sei_publisher(sei_publisher)
# Publish metadata
await sei_publisher.publish_json({
"type": "chat_message",
"user": "alice",
"message": "Hello world!",
"timestamp": time.time()
})
```
**Integration:**
The Nova speech-to-speech script (`stages-nova-s2s/ivs-stage-nova-s2s.py`) demonstrates SEI publishing in action, embedding AI assistant responses directly into the video stream for synchronized delivery.
**For detailed documentation, see [`stages_sei/SEI.md`](stages_sei/SEI.md).**
### Utility Scripts
_Note: Utility scripts are excluded from this documentation as they are development/testing tools._
## Usage Examples
### IVS Channel Examples
```bash
# Subscribe to IVS channel and analyze frames with Claude
python channels-subscribe/ivs-channel-subscribe-analyze-frames.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality \
--analysis-interval 30
# Real-time transcription of IVS channel audio
python channels-subscribe/ivs-channel-subscribe-transcribe.py \
--playlist-url "https://example.com/playlist.m3u8" \
--language en \
--whisper-model base \
--publish-transcript-as-timed-metadata
# Comprehensive video analysis with TwelveLabs Pegasus
python channels-subscribe/ivs-channel-subscribe-analyze-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--analysis-duration 10 \
--bedrock-region us-west-2
# Combined audio/video analysis using PyAV
python channels-subscribe/ivs-channel-subscribe-analyze-audio-video.py \
--playlist-url "https://example.com/playlist.m3u8" \
--highest-quality \
--analysis-duration 15
```
### IVS Real-Time Stages Examples
#### Basic Publishing Examples
```bash
# Publish MP4 file to IVS stage
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "sample-video.mp4"
# Publish HLS stream to IVS stage
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8"
# Publish HLS stream with automatic exit when stream ends (check every 30 seconds)
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8" \
--stream-check-interval 30
# Publish HLS stream with frequent monitoring (check every 10 seconds)
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM8NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8" \
--stream-check-interval 10
# Publish video-only HLS stream
python stages-publish/ivs-stage-publish.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--m3u8-url "https://example.com/live/stream.m3u8" \
--video-only
```
#### Publishing with Events Example
```bash
# Publish with real-time event monitoring
python stages-publish/ivs-stage-publish-events.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "sample-video.mp4"
```
#### Transcription Example
```bash
# Subscribe and transcribe audio in Spanish
python stages-subscribe/ivs-stage-subscribe-transcribe.py \
--participant-id "participant123" \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--language "es" \
--whisper-model "medium"
```
#### Video Frame Analysis Examples
```bash
# Basic video frame analysis (every 30 seconds with Claude Sonnet 4)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Frequent analysis for real-time moderation (every 5 seconds)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--analysis-interval 5.0
# Cost-effective analysis using Claude 3.5 Haiku (every 60 seconds)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-haiku-20241022-v1:0" \
--analysis-interval 60.0
# Analysis in different AWS region
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-region "eu-west-1"
# Subscribe to video without analysis (testing connectivity)
python stages-subscribe/ivs-stage-subscribe-analyze-frames.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-analysis
```
#### Video Analysis Examples
```bash
# Basic video analysis with TwelveLabs Pegasus
python stages-subscribe/ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Shorter video clips for more frequent analysis
python stages-subscribe/ivs-stage-subscribe-analyze-video.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--analysis-duration 5.0
```
#### AI Speech-to-Speech Example
```bash
# Start Nova Sonic conversation with frame analysis
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--nova-model-id "amazon.nova-sonic-v1:0" \
--nova-region "us-east-1"
# Nova conversation without frame analysis
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-frame-analysis
# Nova conversation with custom Bedrock model and region
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-sonnet-20241022-v2:0" \
--bedrock-region "us-west-2"
# Nova conversation with fast connection setup
python stages-nova-s2s/ivs-stage-nova-s2s.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--ice-timeout 1
```
#### OpenAI Real-time API Examples
```bash
# Basic OpenAI real-time conversation
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123"
# Using different voice
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--voice "nova"
# With explicit OpenAI API key
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--openai-key "sk-your-key-here"
# With vision capabilities disabled
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--disable-frame-analysis
# With custom Bedrock model for vision
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--bedrock-model-id "anthropic.claude-3-5-sonnet-20241022-v2:0"
# Fast connection setup
python stages-gpt-realtime/ivs-stage-gpt-realtime.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--subscribe-to "participant123" \
--ice-timeout 1
```
#### Publish and Subscribe Example
```bash
# Simultaneously publish and subscribe
python stages-publish/ivs-stage-pub-sub.py \
--token "eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9..." \
--path-to-mp4 "audio-file.mp4" \
--subscribe-to "participant1" "participant2"
```
#### Creating Participant Tokens
Use the AWS CLI to create participant tokens:
```bash
# Create a token with publish capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user123" \
--capabilities PUBLISH \
--duration 720
# Create a token with subscribe capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user456" \
--capabilities SUBSCRIBE \
--duration 720
# Create a token with both publish and subscribe capabilities
aws ivs-realtime create-participant-token \
--stage-arn "arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh" \
--user-id "user789" \
--capabilities PUBLISH SUBSCRIBE \
--duration 720
```
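If you prefer minting tokens from Python, the boto3 `ivs-realtime` client exposes the same operation. A minimal sketch:
```python
import boto3

ivs_rt = boto3.client("ivs-realtime", region_name="us-east-1")

response = ivs_rt.create_participant_token(
    stageArn="arn:aws:ivs:us-east-1:123456789012:stage/abcdefgh",
    userId="user789",
    capabilities=["PUBLISH", "SUBSCRIBE"],
    duration=720,  # token lifetime in minutes
)
print(response["participantToken"]["token"])
```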
## Troubleshooting
### Common Issues
#### IVS Channels Issues
1. **"No audio stream found"**
- Check if the M3U8 stream contains audio using `ffprobe`
- Try different rendition quality options
- Verify stream accessibility with `curl`
2. **"Unable to open video stream"**
- Verify M3U8 URL is accessible
- Check network connectivity and firewall settings
- Try different rendition selections
3. **Whisper Model Issues**
- Clear Whisper cache: `rm -rf ~/.cache/whisper/`
- Use smaller models for memory-constrained environments
- Enable FP16 for faster processing
4. **Timed Metadata Publishing Issues**
- Verify AWS credentials have `ivs:PutMetadata` permissions
- Check rate limiting (5 RPS per channel, 155 RPS per account)
- Ensure channel ARN extraction is working correctly
#### IVS Real-Time Stages Issues
1. **Audio Quality Problems**
- Ensure consistent chunk sizes (512 samples recommended)
- Check audio resampling configuration
- Verify WebRTC connection stability
2. **WebRTC Connection Failures**
- Verify JWT token has correct capabilities
- Check network connectivity and firewall settings
- Ensure SDP munging is applied correctly
3. **Nova Sonic Issues**
- Verify AWS credentials have Bedrock permissions
- Check model availability in your region
- Ensure proper event sequence (START_SESSION → START_PROMPT → content)
#### General Issues
1. **Video Frame Analysis Issues**
- Verify AWS credentials have `bedrock:InvokeModel` permissions
- Check Claude/Pegasus model availability in your region
- Monitor analysis costs with appropriate intervals
- Ensure video track is receiving frames before analysis begins
2. **Transcription Accuracy**
- Use appropriate Whisper model size for your use case
- Ensure clean audio input
- Consider language-specific models
### Debug Mode
Enable debug logging for detailed troubleshooting:
```bash
export PYTHONPATH=$PYTHONPATH:.
# Configure DEBUG logging in the same process that runs the script;
# a separate `python -c` invocation only affects its own process.
python - <<'EOF'
import logging, runpy, sys
logging.basicConfig(level=logging.DEBUG)
sys.argv = ["your-script.py", "--your-args"]
runpy.run_path("your-script.py", run_name="__main__")
EOF
```
### Performance Optimization
#### IVS Channels Optimization
1. **For Channel Transcription:**
- Use `--whisper-model tiny` or `--whisper-model base` for real-time processing
- Enable FP16: `--fp16 true`
- Use shorter chunks: `--chunk-duration 3`
- Specify language: `--language en` (faster than auto-detect)
2. **For Channel Video Analysis:**
- Use `--lowest-quality` for faster processing
- Adjust `--analysis-duration` based on content complexity
- Run without `--show-video` for headless operation
3. **For Channel Frame Analysis:**
- Increase `--analysis-interval` for less frequent analysis (cost control)
- Use `--lowest-quality` for faster frame processing
- Choose appropriate Claude model for your use case
#### IVS Real-Time Stages Optimization
1. **Connection Speed:**
- Use `--ice-timeout 1` for faster WebRTC connection establishment (default)
- Original WebRTC ICE timeout is 5 seconds, optimized to 1 second for better user experience
- Increase timeout if experiencing connection issues in poor network conditions
- This optimization reduces startup time from ~11 seconds to ~3 seconds
2. **For Nova Sonic:**
- Use consistent 1ms delays between audio chunks
- Implement proper buffering strategies
- Monitor memory usage during long sessions
3. **For Stage Transcription:**
- Choose appropriate chunk duration (5-10 seconds)
- Use smaller Whisper models for real-time processing
- Consider GPU acceleration for large models
#### General Optimization
1. **For Video Frame Analysis:**
- Use longer analysis intervals (30+ seconds) to control costs
- Choose appropriate Claude model for your use case:
- Claude 3.5 Haiku for basic content moderation
- Claude 3.5 Sonnet for balanced performance
- Claude Sonnet 4 for complex analysis requiring highest accuracy
- Monitor Bedrock usage and costs in AWS console
- Consider regional model availability and latency
## Dependencies
### Core Dependencies
- `aiortc>=1.12.0` - WebRTC implementation
- `av>=10.0.0` - Media processing
- `requests>=2.28.0` - HTTP client
- `websockets>=11.0.0` - WebSocket client
- `numpy>=1.21.0` - Numerical computing
### AI/ML Dependencies
- `whisper` (from GitHub) - Speech recognition
- `boto3>=1.34.0` - AWS SDK for Bedrock and IVS
- `aws-sdk-bedrock-runtime` - Amazon Bedrock client
- `smithy-aws-core>=0.0.1` - AWS SDK core
- `pyaudio>=0.2.13` - Audio I/O
- `rx>=3.2.0` - Reactive extensions
- `Pillow>=10.0.0` - Image processing for video frame analysis
- `opencv-python>=4.8.0` - Computer vision for video processing
### Utility Dependencies
- `pytz` - Timezone handling
- `tzlocal` - Local timezone detection
### System Requirements
- Python 3.8+
- FFmpeg
- PortAudio (for audio I/O)
- Sufficient bandwidth for WebRTC streams
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md) for more information.
## License
This library is licensed under the MIT-0 License. See the [LICENSE](./LICENSE) file.
## Support
For issues related to:
- **Amazon IVS Real-Time Stages**: Check the [IVS Real-Time Streaming documentation](https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/)
- **Amazon IVS Channels**: Check the [IVS Low-Latency Streaming documentation](https://docs.aws.amazon.com/ivs/latest/LowLatencyUserGuide/)
- **Amazon Nova**: Check the [Bedrock documentation](https://docs.aws.amazon.com/bedrock/)
- **Amazon Bedrock**: Check the [Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/)
- **aiortc**: Check the [aiortc documentation](https://aiortc.readthedocs.io/)
- **OpenAI Whisper**: Check the [Whisper repository](https://github.com/openai/whisper)
---
_This project demonstrates advanced integration patterns between Amazon IVS services and AI capabilities. From real-time conversational AI with Nova Sonic to comprehensive video analysis with Claude and TwelveLabs Pegasus, these demos showcase the power of combining live video streaming with cutting-edge AI services._