https://github.com/aavato-c/ai-voice-annotation-from-webcam
A set of scripts to capture pictures from a webcam feed, get descriptions of them from OpenAI, and then convert the text to speech using ElevenLabs.
api elevenlabs narration openai python tts video webcam
- Host: GitHub
- URL: https://github.com/aavato-c/ai-voice-annotation-from-webcam
- Owner: Aavato-c
- License: MIT
- Created: 2023-11-24T20:33:09.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-02T03:21:57.000Z (about 1 year ago)
- Last Synced: 2025-01-17T22:24:03.045Z (9 months ago)
- Topics: api, elevenlabs, narration, openai, python, tts, video, webcam
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
README
# Elevenlabs webcam descriptor
The source is quite simple. I don't think it's necessary to explain everything here, but in short:
- The script will save frames from your webcam and store them in the media folder
- The most recent frame will be used to make a call to an OpenAI multimodal endpoint
- As the descriptions are generated, the dubs are then generated via the ElevenLabs API
- The sound files will also be stored

To start using this little thing, do the following:
* Make a venv using Python 3.8
* `python3.8 -m venv venv`
* Activate the environment (Linux/macOS)
* `source venv/bin/activate`
* Install dependencies
* `pip install -r requirements.txt`
* Create your own `.env` file
* You can use the `env.example` provided: remove the `.example` extension and fill in your API keys
* Run the save_video_frames.py
* `python3 save_video_frames.py`
* Run the main_app.py
* `python3 main_app.py`
* Enjoy
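
The capture → describe → narrate flow above can be sketched roughly as follows. This is a hedged reconstruction, not the repo's actual code: the model name `gpt-4o`, the voice ID, and the helper names are illustrative, and running the camera/API part requires `opencv-python`, `openai`, and `requests` plus valid `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` environment variables.

```python
import base64
import os


def build_vision_messages(jpeg_bytes: bytes, prompt: str) -> list:
    """Package a JPEG frame as a base64 data URL for OpenAI's multimodal chat endpoint."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]


def describe_and_narrate(jpeg_bytes: bytes) -> bytes:
    """Describe the frame with OpenAI, then synthesize speech with ElevenLabs."""
    from openai import OpenAI  # requires `pip install openai`
    import requests

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=build_vision_messages(jpeg_bytes, "Describe this webcam frame."),
    )
    description = resp.choices[0].message.content

    voice_id = "21m00Tcm4TlvDq8ikWAM"  # illustrative ElevenLabs voice ID
    tts = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": description},
        timeout=60,
    )
    tts.raise_for_status()
    return tts.content  # MP3 bytes


if __name__ == "__main__" and os.environ.get("RUN_CAMERA"):
    import cv2  # requires `pip install opencv-python`

    cap = cv2.VideoCapture(0)  # default webcam
    ok, frame = cap.read()
    cap.release()
    if ok:
        ok2, jpeg = cv2.imencode(".jpg", frame)
        audio = describe_and_narrate(jpeg.tobytes())
        with open("narration.mp3", "wb") as f:
            f.write(audio)
```

The camera and network calls are gated behind a `RUN_CAMERA` environment variable so the module can be imported without hardware or API keys; `build_vision_messages` is pure and can be reused by both scripts.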