Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/semaj87/image-to-text-to-speech
An app that uses Hugging Face AI models together with OpenAI & LangChain, to generate text from an image, which then generates audio from the text
generative-ai gpt huggingface-transformers langchain openai prompt-engineering python3 streamlit-webapp
- Host: GitHub
- URL: https://github.com/semaj87/image-to-text-to-speech
- Owner: semaj87
- License: mit
- Created: 2023-11-21T10:56:39.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-28T18:22:33.000Z (about 1 year ago)
- Last Synced: 2023-11-29T12:47:13.152Z (about 1 year ago)
- Topics: generative-ai, gpt, huggingface-transformers, langchain, openai, prompt-engineering, python3, streamlit-webapp
- Language: Python
- Homepage: https://generative-stories.streamlit.app/
- Size: 163 KB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Generating text & audio from images
An app that uses Hugging Face AI models to generate text from an image, then generates audio from that text.
The application works in three steps:
* An image-to-text model captions the photo so the machine can understand the scene it shows
* An LLM generates a short story from the caption produced in step 1
* A text-to-speech model generates audio for the story produced by the LLM
## System Design
![system-design](img/system-design.drawio.png)
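The three steps listed above form a simple chain, where each step's output is the next step's input. The following is a minimal sketch of that chain, not the repository's actual code: the names `run_pipeline`, `StoryResult`, `caption_image`, `write_story`, and `synthesize_speech` are hypothetical, and the model calls are replaced by stand-in functions so the example runs without downloading anything.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryResult:
    """Outputs of the three steps (hypothetical container, not from the repo)."""
    caption: str  # text produced from the image (step 1)
    story: str    # short story produced by the LLM (step 2)
    audio: bytes  # speech audio for the story (step 3)


def run_pipeline(
    image_path: str,
    caption_image: Callable[[str], str],
    write_story: Callable[[str], str],
    synthesize_speech: Callable[[str], bytes],
) -> StoryResult:
    """Chain the steps: image -> caption -> story -> audio."""
    caption = caption_image(image_path)   # step 1: image to text
    story = write_story(caption)          # step 2: caption to short story
    audio = synthesize_speech(story)      # step 3: story to speech
    return StoryResult(caption=caption, story=story, audio=audio)


if __name__ == "__main__":
    # Stand-in steps (assumptions for illustration only); a real app would
    # wrap the image-captioning model, the LLM, and the text-to-speech model.
    result = run_pipeline(
        "photo.jpg",
        caption_image=lambda path: "a dog running on a beach",
        write_story=lambda caption: f"Once upon a time, {caption}.",
        synthesize_speech=lambda story: story.encode("utf-8"),
    )
    print(result.caption)
    print(result.story)
    print(len(result.audio), "bytes of placeholder audio")
```

Passing the steps in as callables keeps the orchestration independent of any particular model, so each stage can be swapped or tested in isolation. How the repository itself wires the models together is not shown in this README.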
## Models
- Image to text model: [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning)
- Text to story model: [gpt-3.5-turbo](https://platform.openai.com/docs/models/gpt-3-5)
- Story to speech model: [bark](https://huggingface.co/suno/bark)
## Requirements
```bash
python -m pip install -r requirements.txt
```
## Run App Locally
```bash
source build.sh
```
## Run App with Streamlit Cloud
[Launch App](https://generative-stories.streamlit.app/)
## License
Distributed under the MIT License. See `LICENSE` for more information.