https://github.com/bigsk1/vision-image-gen

Openai GPT Vision - Dalle3 - CLI & Streamlit UI Image generator based on your input
https://github.com/bigsk1/vision-image-gen

dalle-3 gpt gpt-4-vision-preview image-generation-ai openai openai-api python streamlit-webapp

Last synced: 2 days ago
JSON representation

Openai GPT Vision - Dalle3 - CLI & Streamlit UI Image generator based on your input

Host: GitHub
URL: https://github.com/bigsk1/vision-image-gen
Owner: bigsk1
Created: 2024-02-07T03:24:19.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-02-07T09:22:53.000Z (over 1 year ago)
Last Synced: 2025-04-16T05:19:51.724Z (6 months ago)
Topics: dalle-3, gpt, gpt-4-vision-preview, image-generation-ai, openai, openai-api, python, streamlit-webapp
Language: Python
Homepage: https://www.youtube.com/watch?v=Eh7atfdpRAo
Size: 5.36 MB
Stars: 5
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.MD
- Funding: .github/FUNDING.yml

Awesome Lists containing this project

README

          ## AI Image Generator

This project leverages OpenAI's GPT Vision and DALL-E models to analyze images and generate new ones based on user modifications. It provides two interfaces: a web UI built with Streamlit for interactive use and a command-line interface (CLI) for direct script execution.

Features

-    Image Analysis: Automatically describes images using GPT-4 Vision.

-    Image Generation: Generates modified images based on user inputs using DALL-E 3.

-    Web Interface: Interactive web UI for easy operation.

-    CLI: Command-line version for script or batch processing.

![Vision Image Gen](https://imagizer.imageshack.com/img924/4660/6TcFez.jpg)

### How it Works:

 - The app first downloads the image from the provided URL or path locally and analyzes it using the pre-trained AI model gpt-4-vision-preview to generate a description.

- You're then given the opportunity to modify this description to guide the image generation process, the original description from the vision model and your included description are used.

- Finally, the app uses DALL-E 3 to generate a new image 1790x1024 based on the modified description.

- You can see the original image and then newly created image. Right click to save. 

[Youtube Video Showing how it works](https://www.youtube.com/watch?v=Eh7atfdpRAo)

## Installation

Tested in Python 3.11.4

Clone the repository to your local machine:

```bash

git clone https://github.com/bigsk1/vision-image-gen.git

cd vision-image-gen

```

Install the required dependencies:

pip install -r requirements.txt

## Usage

Web UI

To start the web interface, run:

```bash

streamlit run vision_image_gen_ui_local.py

```

Navigate to the URL provided by Streamlit,  http://localhost:8501, in your web browser. Enter you Open AI API Key or Have your Open Ai Api key added to your system enviroment variables in PATH

-    Upload an Image: Use the provided input to upload an image or specify an image URL.

-    View Analysis: See the AI-generated description of the image.

-    Modify and Generate: Enter modifications to the original description and generate a new image.

-    View and Save: The generated image will be displayed, and you can save it locally.

## CLI Version

The CLI version allows you to process images directly from your terminal.

```bash

python vision_image_gen.py

```

## Using Streamlit Cloud Sharing

Use the vision_image_gen_ui.py for Streamlit Cloud sharing, in the settings just add 

```bash

[openai]

api_key = "sk-paste-your-api-key"

```

## Example of output

```python

==================================================

Vision Response:

==================================================

The image shows a computer terminal interface with ASCII art and text. At the top would be ASCII art resembling a face with a pattern of "#" and "." characters. Below it, within a minimalist window frame, is a navigation menu with options depicted as a pixel-style globe icon labeled "sumfetch," a document icon labeled "ABOUT," a link icon labeled "Website," a folder icon labeled "This Repo," and a series of contact methods including an email address, GitHub URL, and Twitter handle, all associated with the username "bigsk1". The central feature is a bold ASCII art logo or emblem saying "BIGSK1" inside a stylized circular border.

For a text-to-image model, you could describe it as follows:

"Create an image of a dark computer terminal screen with a pixelated face made out of ASCII characters at the top. Include a stylized ASCII art logo that says 'BIGSK1' in the center, enclosed in a circular patterned border. Below the logo, depict a simple user interface with text and monochrome icons signifying navigation options, including a globe for 'sumfetch,' a document for 'ABOUT,' a link chain for 'Website,' and a folder for 'This Repo.' Add additional details

==================================================

User's Modification Input:

==================================================

make it in the style of an american flag

==================================================

Final Prompt Sent to DALL-E 3:

==================================================

The image shows a computer terminal interface with ASCII art and text. At the top would be ASCII art resembling a face with a pattern of characters. Below it, within a minimalist window frame, is a navigation menu with options depicted as a pixel-style globe icon labeled "sumfetch," a document icon labeled "ABOUT," a link icon labeled "Website," a folder icon labeled "This Repo," and a series of contact methods including an email address, GitHub URL, and Twitter handle, all associated with the username "bigsk1". The central feature is a bold ASCII art logo or emblem saying "BIGSK1" inside a stylized circular border.

For a text-to-image model, you could describe it as follows:

"Create an image of a dark computer terminal screen with a pixelated face made out of ASCII characters at the top. Include a stylized ASCII art logo that says 'BIGSK1' in the center, enclosed in a circular patterned border. Below the logo, depict a simple user interface with text and monochrome icons signifying navigation options, including a globe for 'sumfetch,' a document for 'ABOUT,' a link chain for 'Website,' and a folder for 'This Repo.' Add additional details make it in the style of an american flag

```

Example image in the original_image folder this is were your downloaded images will end up.  

The generated_images folder is were the new Dalle generated image will end up.

This is a work in progress, more to add soon.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/bigsk1/vision-image-gen

Awesome Lists containing this project

README