Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gurpreetkaurjethra/image-to-speech-genai-tool-using-llm
AI tool that generates an Audio short story based on the context of an uploaded image by prompting a GenAI LLM model, Hugging Face AI models together with OpenAI & LangChain
- Host: GitHub
- URL: https://github.com/gurpreetkaurjethra/image-to-speech-genai-tool-using-llm
- Owner: GURPREETKAURJETHRA
- License: mit
- Created: 2024-01-09T19:45:41.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-11T20:01:55.000Z (about 1 year ago)
- Last Synced: 2024-05-06T17:23:34.073Z (9 months ago)
- Topics: generative-ai, gpt-3, huggingface, huggingface-transformers, image-to-speech, langchain, large-language-models, llm, llms, openai, openai-api, project, prompt-engineering, python-3, streamlit-webapp
- Language: Python
- Homepage: https://image-to-speech-genai-tool-using-llm.streamlit.app/
- Size: 3.43 MB
- Stars: 11
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🖼️Image to Speech GenAI Tool Using LLM 🌟♨️
An AI tool that generates an audio short story based on the context of an uploaded image by prompting a GenAI LLM, using Hugging Face AI models together with OpenAI and LangChain. It is deployed separately on Streamlit Cloud and Hugging Face Spaces.

## 📢Run App with Streamlit Cloud
[Launch App On Streamlit](https://image-to-speech-genai-tool-using-llm.streamlit.app/)
## 📢Run App with HuggingFace Space Cloud
[Launch App On HuggingFace Space](https://huggingface.co/spaces/GurpreetKJ/Image-to-SpeechStory_GenAI-Tool)
## 🎯 Demo:
![Demo 1: Couple Test Image Output](img-audio/CoupleOutput.jpg)

You can listen to the audio file for each demo image in the `img-audio` folder.
## 📈System Design
![system-design](img/system-design.drawio.png)
## 🏆Approach
An app that uses Hugging Face AI models to generate text from an image and then generate audio from that text. Execution is divided into 3 parts:
- **Image to text:**
An image-to-text transformer model ([Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)) generates a text scenario based on the model's understanding of the image context.
- **Text to story:**
An OpenAI LLM ([gpt-3.5-turbo](https://platform.openai.com/docs/models/gpt-3-5)) is prompted to create a short story (50 words by default; adjustable as required) based on the generated scenario.
- **Story to speech:**
A text-to-speech transformer model ([espnet/kan-bayashi_ljspeech_vits](https://huggingface.co/espnet/kan-bayashi_ljspeech_vits)) converts the generated short story into a voice-narrated audio file.
- A user interface is built with Streamlit to enable uploading the image and playing the audio file.
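The three stages above form a linear pipeline: image → scenario → story → audio. The following is a minimal sketch of that flow, assuming each stage is a plain Python callable; the names `build_story_prompt`, `run_pipeline`, and `PipelineResult` are hypothetical and not taken from the repository's `app.py`, and the prompt wording is illustrative only. Passing the stages in as arguments keeps the orchestration runnable without downloading models or calling the OpenAI API; in the real app they would wrap the BLIP captioner, the gpt-3.5-turbo call, and the VITS text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable


def build_story_prompt(scenario: str, max_words: int = 50) -> str:
    """Hypothetical prompt for the "text to story" stage (50-word default)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    return (
        "You are a storyteller. Write a short story based on the following "
        f"scenario. The story should be no more than {max_words} words.\n\n"
        f"SCENARIO: {scenario.strip()}"
    )


@dataclass
class PipelineResult:
    scenario: str  # caption from the image-to-text model
    story: str     # short story returned by the LLM
    audio: bytes   # audio produced by the text-to-speech model


def run_pipeline(
    image: bytes,
    image_to_text: Callable[[bytes], str],
    text_to_story: Callable[[str], str],
    story_to_speech: Callable[[str], bytes],
    max_words: int = 50,
) -> PipelineResult:
    """Run the stages in order; text_to_story receives the full prompt."""
    scenario = image_to_text(image)
    story = text_to_story(build_story_prompt(scenario, max_words))
    audio = story_to_speech(story)
    return PipelineResult(scenario=scenario, story=story, audio=audio)


if __name__ == "__main__":
    # Stub stages stand in for the real models so the sketch runs offline.
    result = run_pipeline(
        b"fake-image-bytes",
        image_to_text=lambda img: "a couple walking on a beach",
        text_to_story=lambda prompt: "Once upon a time, a couple walked on a beach.",
        story_to_speech=lambda story: story.encode("utf-8"),
    )
    print(result.scenario)
    print(result.story)
```

Swapping a stub for a real model call changes only the callable passed in, not the pipeline itself.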
![Demo 3: Family Test Image Output](img-audio/FamilyOutput.jpg)
You can listen to the audio file for this test image in the `img-audio` folder.

## 🌟Requirements
- os (part of the Python standard library; no install needed)
- python-dotenv
- transformers
- torch
- langchain
- openai
- requests
- streamlit
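Some entries in the list above are pip distribution names rather than import names (for example, `python-dotenv` is imported as `dotenv`), and `os` ships with Python itself. A minimal sketch for checking which of these modules are importable in the current environment; the helper name `missing_modules` and the mapping table are illustrative assumptions, not part of the repository.

```python
import importlib.util

# Requirement name (as listed above) -> module name used in `import ...`.
# "os" is in the standard library; the others are installed with pip.
IMPORT_NAMES = {
    "os": "os",
    "python-dotenv": "dotenv",
    "transformers": "transformers",
    "torch": "torch",
    "langchain": "langchain",
    "openai": "openai",
    "requests": "requests",
    "streamlit": "streamlit",
}


def missing_modules(requirements: dict[str, str] = IMPORT_NAMES) -> list[str]:
    """Return the requirement names whose top-level module cannot be found."""
    return [
        req
        for req, module in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates the module; it does not import it, so the check is cheap even for heavy packages such as `torch`.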
## 🚀Usage
- Before using the app, the user should have personal tokens for Hugging Face and OpenAI.
- To run the app locally from an IDE, the user should set up a virtual environment (venv) and install the ipykernel library.
- The user should save the personal tokens as strings in a `.env` file within the package, under the variable names `HUGGINGFACE_TOKEN` and `OPENAI_TOKEN`.
- The user can then run the app with the command `streamlit run app.py`.
- Once the app is running on Streamlit, the user can upload the target image.
- Execution starts automatically and may take a few minutes to complete.
- Once completed, the app displays:
  - The scenario text generated by the image-to-text Hugging Face transformer model
  - The short story generated by prompting the OpenAI LLM
  - The audio file narrating the short story, generated by the text-to-speech transformer model
- The GenAI app is deployed on Streamlit Cloud and Hugging Face Spaces.

![Demo 2: Picnic Vacation Test Image Output](img-audio/PicnicOutput.jpg)
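The Hugging Face token from the Usage steps is what authorizes calls to hosted models. A minimal sketch of how a text-to-speech request could be assembled, assuming the legacy serverless Inference API URL pattern `https://api-inference.huggingface.co/models/<model-id>`, a bearer-token `Authorization` header, and a JSON body of the form `{"inputs": ...}`. The helper `build_tts_request` is hypothetical, and the repository's `app.py` may call the model differently (for example through `requests`). The request is only built here, never sent, so no network access or real token is needed.

```python
import json
import os
import urllib.request

# Assumed (legacy) serverless Inference API base URL; check the current
# Hugging Face documentation before relying on it.
API_BASE = "https://api-inference.huggingface.co/models/"
TTS_MODEL = "espnet/kan-bayashi_ljspeech_vits"


def build_tts_request(
    story: str, token: str, model_id: str = TTS_MODEL
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the model to speak `story`."""
    if not token:
        raise ValueError("A Hugging Face token is required")
    body = json.dumps({"inputs": story}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + model_id,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # HUGGINGFACE_TOKEN is the variable name given in the Usage steps;
    # "hf_dummy" is a placeholder so the sketch runs without a real token.
    req = build_tts_request(
        "Once upon a time...", os.environ.get("HUGGINGFACE_TOKEN", "hf_dummy")
    )
    print(req.get_method(), req.full_url)
    # Sending it (not done here) would be urllib.request.urlopen(req);
    # for a text-to-speech model the response body is expected to be audio.
```

Keeping construction separate from sending makes it easy to inspect the URL and headers before spending API quota.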
## ▶️Installation
Clone the repository:
`git clone https://github.com/GURPREETKAURJETHRA/Image-to-Speech-GenAI-Tool-Using-LLM.git`
Install the required Python packages:
`pip install -r requirements.txt`
Set up your OpenAI API key & Hugging Face Token by creating a .env file in the root directory of the project with the following contents:
`OPENAI_API_KEY=`
`HUGGINGFACE_API_TOKEN=`

Run the Streamlit app:
`streamlit run app.py`
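The `.env` file is simply a list of `KEY=value` lines; python-dotenv's `load_dotenv()` reads it and copies the values into `os.environ`, where the app can look them up. The stdlib-only parser below is a simplified stand-in meant only to illustrate that format: it is not python-dotenv, it ignores features such as variable expansion and multi-line values, and the names `parse_env_text` and `parse_env_file` are hypothetical.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. KEY="abc".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def parse_env_file(path: str = ".env") -> dict[str, str]:
    """Read a .env file from disk and parse it."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = 'OPENAI_API_KEY=sk-placeholder\nHUGGINGFACE_API_TOKEN="hf_placeholder"\n'
    print(parse_env_text(sample))
```

An empty line such as `OPENAI_API_KEY=` parses to an empty string, which is a quick way to spot a key that was never filled in.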
## ©️ License
Distributed under the MIT License. See `LICENSE` for more information.
---
#### **If you like this LLM project, please drop a ⭐ on this repo. Contributions are welcome! If you have any suggestions for improving this AI image-to-speech converter, please submit a pull request.💁**
#### Follow me on [![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/gurpreetkaurjethra/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/GURPREETKAURJETHRA/)

---