An open API service indexing awesome lists of open source software.

https://github.com/aavato-c/ai-voice-annotation-from-webcam

A set of scripts to capture pictures from a webcam feed and then getting the description of them from openai. After that, we covert the text to speech using Elevenlabs.
https://github.com/aavato-c/ai-voice-annotation-from-webcam

api elevenlabs narration openai python tts video webcam

Last synced: 3 months ago
JSON representation

A set of scripts to capture pictures from a webcam feed and then getting the description of them from openai. After that, we covert the text to speech using Elevenlabs.

Awesome Lists containing this project

README

          

# Elevenlabs webcam descriptor

The source is quite simple. I don't think it's necessary to explain everything here but in short:
- The script will save frames from your webcam and store them in the media -folder
- The most recent frame will be used to make a call to an OpenAI multimodal-endpoint
- As the descriptions are generated, the dubs are then generated via ElevenLabs API
- The sound files will also be stored

To start using this little thing do the following:
* Make a venv using python3.8
* `python3.8 venv -m venv`
* Activate the environment (Linux/MacOs)
* `source venv/bin/activate`
* Install dependencies
* `pip install -r requirements.txt`
* Make your own env
* You can use the env.example provided, remove the .example extension and fill in your api keys
* Run the save_video_frames.py
* `python3 save_video_frames.py`
* Run the main_app.py
* `python3 main_app.py`
* Enjoy