https://github.com/darren277/gpt-realtime
https://github.com/darren277/gpt-realtime
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/darren277/gpt-realtime
- Owner: darren277
- Created: 2024-12-19T00:50:56.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-02-11T11:26:18.000Z (4 months ago)
- Last Synced: 2025-02-11T12:31:16.747Z (4 months ago)
- Language: JavaScript
- Size: 1.54 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# gpt-realtime
## About
This repository serves the purpose of providing a space to experiment with the OpenAI Realtime API.
While there are both a WebRTC and a WebSockets version, I will be primarily exploring the WebSockets version.
The plan is to include both some Python and JavaScript code showcasing the core functionality of the API, especially with regards to voice interaction.
### Some resources:
https://platform.openai.com/docs/guides/realtime-model-capabilities
https://platform.openai.com/docs/guides/realtime-websocket
https://platform.openai.com/docs/api-reference/realtime
### Images
#### All Events

#### Server Events: Session and Error and Conversation Created

#### Server Events: Conversation Item

#### Server Events: Input Audio Buffer

#### Server Events: Response

#### Client Events

## Python version
An implementation of the GPT Real Time API using WebSockets. It is built in Flask and has a very simple single page for the front end that uses vanilla HTML and JavaScript.
So far, it successfully interacts with the Real Time API (using the `gpt-4o-realtime-preview-2024-12-17` beta model). It handles voice that is recorded via the browser and converted using `ffmpeg` (server side as a subprocess).
It still needs some tweaking to get the live prompts working right as the last testing resulted in it simply repeating back what I had said to it.
## JavaScript version
There is now a Node/Express/React version of the same application. It has the same functionality as the Python version currently, only, for whatever reason, the first test case worked flawlessly without having to make any adjustments to the intercepted prompts. Perhaps my first test case for the Python version was just too trivial to expect any results (I simply uttered "Testing speech to text. Testing speech to text." - not much of a prompt, afterall).
### Tool Calls
...