Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/abdelrahman-amen/gemini_image_insights

This project leverages Google's Gemini LLM to analyze uploaded images and generate detailed descriptions or insights based on user prompts. It combines advanced AI capabilities with Streamlit for an interactive and user-friendly experience.
https://github.com/abdelrahman-amen/gemini_image_insights

api dotenv gemini imagecaptioning os pil python streamlit

Last synced: 9 days ago
JSON representation

This project leverages Google's Gemini LLM to analyze uploaded images and generate detailed descriptions or insights based on user prompts. It combines advanced AI capabilities with Streamlit for an interactive and user-friendly experience.

Awesome Lists containing this project

README

        

# 🌟 Gemini Image Demo 🌟
This project is a Streamlit-based web application that integrates with Google's Gemini AI models to analyze images and respond to user prompts. It's an interactive and intuitive app where you can upload an image 📷, provide an optional text prompt ✍️, and get AI-generated insights 💡.

![Image](https://github.com/user-attachments/assets/220ee2cd-84a4-402e-8d3d-6f2134e8ca18)

# Key Features 🔑

• 📤 Upload Images: Supports popular formats like JPG, JPEG, and PNG for seamless image handling.

• ✍️ Input Prompts: Add custom text prompts to refine AI responses.

• 🤖 Gemini AI Integration: Leverages Google's Generative AI models for image-text interaction.

• 📄 AI Output: Displays the model's response to the uploaded image and/or text input directly in the app.

# How It Works 🚀

1. Upload an image 📷 using the file uploader.

2. Optionally, enter a text prompt ✍️ to provide context for the AI model.

3. Click the "Tell me about the image" button 🖱️.

4. View the AI-generated response 🧠 in the app.

# Gemini Model Comparison ⚙️

### Gemini-1.5-Flash

• Supported Inputs: Audio, images, videos, and text.

• Primary Focus: General-purpose tasks like content analysis and versatile image-to-text responses.

• Performance: Optimized for speed and efficiency, making it ideal for dynamic and real-time applications.

### Gemini-1.5-Pro

• Supported Inputs: Audio, images, videos, and text.

• Primary Focus: Complex tasks like code generation, text editing, and problem-solving.

• Performance: Designed for advanced reasoning and in-depth content generation, albeit with slightly higher processing times.

In this project, we use Gemini-1.5-Flash for its speed and versatility 🏃‍♂️. If your use case involves complex reasoning, you may consider switching to Gemini-1.5-Pro.

# Code Workflow 🔧

### 1.Environment Setup:

• Load sensitive information, such as the Google API key, from a .env file to ensure security 🔒.

• Use the dotenv library to streamline environment variable management.

### 2.Image Handling:

• The Pillow (PIL) library handles image uploads and ensures compatibility with the Gemini API.

• Uploaded images are displayed in the app for visual confirmation 🖼️.

### 3.Gemini AI Response:

• The Generative AI model processes the image and optional text prompt to generate a meaningful response.

• The output is displayed as plain text in the app for easy reading.

### 4.Streamlit UI:

• Intuitive input fields for uploading images 📂 and entering prompts ✍️.

• Buttons and sections to display the generated AI response seamlessly.

# Possible Use Cases 📋

• Nutrition analysis 🥗 (Identify food items and calculate calories).

• Image description for accessibility 🌟.

• Generating textual insights from complex visual content.

## Demo 📽

Below is a demonstration of how the application works:

![Demo of the Application](https://github.com/Abdelrahman-Amen/Gemini_Image_Insights/blob/main/Demo.gif)