https://github.com/cdpath/g4fd
- Host: GitHub
- URL: https://github.com/cdpath/g4fd
- Owner: cdpath
- License: apache-2.0
- Created: 2024-11-20T11:48:48.000Z
- Default Branch: master
- Last Pushed: 2024-11-20T11:50:34.000Z
- Last Synced: 2025-03-20T09:51:15.326Z
- Language: Swift
- Size: 48.8 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
A real-time voice interaction solution built on WebRTC, AI agents, and function calling.
```mermaid
sequenceDiagram
participant User (iOS App)
participant Frontend
participant LiveKit Cloud
participant Backend Agent
participant Silero VAD
participant Deepgram
participant OpenAI
participant MyEnvironmentAPI
participant ElevenLabs
User (iOS App)->>MyEnvironmentAPI: Sends photo every 10s
User (iOS App)->>Frontend: Clicks "Start conversation"
Frontend->>Frontend: Generates room name & token
Frontend->>LiveKit Cloud: Connects to room with token
Backend Agent->>LiveKit Cloud: Monitors for new rooms
LiveKit Cloud->>Backend Agent: Notifies of new room
Backend Agent->>LiveKit Cloud: Joins room
Note over Frontend,Backend Agent: WebRTC connection established
Frontend->>Backend Agent: Streams audio
Backend Agent->>Silero VAD: Detects voice activity
Silero VAD->>Backend Agent: Returns speech segments
Backend Agent->>Deepgram: Audio for STT
Deepgram->>Backend Agent: Transcribed text
Backend Agent->>OpenAI: Sends text to LLM
OpenAI->>Backend Agent: LLM response
Backend Agent->>MyEnvironmentAPI: Fetches current environment
MyEnvironmentAPI->>Backend Agent: Returns environment data
Backend Agent->>ElevenLabs: Sends response for TTS
ElevenLabs->>Backend Agent: Synthesized audio
Backend Agent->>Frontend: Streams audio response
```
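The order of operations for one conversational turn on the backend agent can be sketched in a few lines of Python. This is only an illustration of the flow in the diagram, not the repository's code: every function below (`detect_speech`, `transcribe`, `ask_llm`, `get_environment`, `synthesize`, `run_turn`) is a hypothetical stand-in for the corresponding service (Silero VAD, Deepgram, OpenAI, MyEnvironmentAPI, ElevenLabs), and no vendor SDK is called. The sketch also assumes that the environment lookup is triggered by a function call from the LLM and that the result is passed back to the LLM before speech synthesis, which the diagram does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Turn:
    transcript: str   # text recognized from the user's audio
    audio: bytes      # synthesized reply streamed back to the frontend
    trace: List[str]  # order in which the pipeline stages ran


def detect_speech(frames: List[bytes]) -> List[bytes]:
    """Stand-in for VAD: treat non-empty frames as speech."""
    return [frame for frame in frames if frame]


def transcribe(segments: List[bytes]) -> str:
    """Stand-in for STT: the fake 'audio' is already UTF-8 text."""
    return b" ".join(segments).decode("utf-8")


def ask_llm(text: str, tool_result: Optional[str] = None) -> dict:
    """Stand-in for the LLM: request the tool once, then answer."""
    if tool_result is None:
        return {"tool_call": "get_environment"}
    return {"text": f"You said '{text}'. I can see: {tool_result}."}


def get_environment() -> str:
    """Stand-in for the environment API fed by the periodic photos."""
    return "a desk with a laptop"


def synthesize(text: str) -> bytes:
    """Stand-in for TTS: encode the reply text as bytes."""
    return text.encode("utf-8")


# Registry of functions the LLM is allowed to call (hypothetical).
TOOLS: Dict[str, Callable[[], str]] = {"get_environment": get_environment}


def run_turn(frames: List[bytes]) -> Turn:
    trace: List[str] = []
    segments = detect_speech(frames)
    trace.append("vad")
    transcript = transcribe(segments)
    trace.append("stt")
    response = ask_llm(transcript)
    trace.append("llm")
    if "tool_call" in response:
        # Function call: run the requested tool, then ask the LLM again.
        result = TOOLS[response["tool_call"]]()
        trace.append("tool")
        response = ask_llm(transcript, tool_result=result)
        trace.append("llm")
    audio = synthesize(response["text"])
    trace.append("tts")
    return Turn(transcript=transcript, audio=audio, trace=trace)


if __name__ == "__main__":
    turn = run_turn([b"hello", b"", b"there"])
    print(turn.trace)
    print(turn.audio.decode("utf-8"))
```

Keeping each stage behind a plain function makes it easy to swap a stub for a real client later without changing `run_turn`.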