https://github.com/loglux/shutterstockimageanalyzer

Shutterstock Image Analyzer is a Python tool that leverages AI to analyze images, generate Shutterstock-compatible metadata (description, keywords, categories), and save the output in CSV format for easy image contribution.
https://github.com/loglux/shutterstockimageanalyzer

ai artificial-intelligence llama3-2-vision ollama ollama-api ollama-client photography shutterstock

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/loglux/shutterstockimageanalyzer
Owner: loglux
Created: 2024-12-27T11:48:51.000Z (5 months ago)
Default Branch: master
Last Pushed: 2025-01-06T14:04:15.000Z (4 months ago)
Last Synced: 2025-02-19T18:16:33.681Z (3 months ago)
Topics: ai, artificial-intelligence, llama3-2-vision, ollama, ollama-api, ollama-client, photography, shutterstock
Language: Python
Homepage:
Size: 62.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# **Shutterstock Metadata Generator Powered by AI**

This project uses **Ollama** and advanced AI models to automate metadata generation for Shutterstock images. By analyzing images and extracting meaningful details, it streamlines the process of creating descriptive text, keywords, and categories, saving time and reducing manual effort.

This tool is ideal for photographers, content creators, and agencies aiming to focus more on creativity and less on tedious metadata tasks.

### **What It Does**

1. **Description**: Generates a descriptive text summarizing the image.
2. **Keywords**: Produces a list of relevant keywords.
3. **Categories**: Suggests appropriate categories for the image.
4. **Image Classification**: Identifies if the image is commercial or editorial.
5. **Mature Content Detection**: Flags images with nudity, violence, or unsuitable content.
6. **Illustration Detection**: Recognizes digitally created or heavily manipulated images.

----------

## **Features**

- Utilizes the `llama3.2-vision` model hosted locally.
- Parses AI responses to provide clean and consistent metadata.
- Outputs results in Shutterstock-compatible CSV format.
- Supports customizable prompts and advanced analysis options.
- Recursive directory processing for batch operations.

----------

## **Prerequisites**

- **Ollama** installed and configured locally with GPU support.
- Recommended hardware: Modern NVIDIA RTX series GPU.
- Tested and works flawlessly on **NVIDIA RTX 4070**.
- **Python 3.10+**.
- Required Python libraries:
- `ollama`
- `pandas`
- `pydantic`

----------

## **Installation**

1. Clone the repository:

```bash
git clone https://github.com/loglux/ShutterstockImageAnalyzer.git
cd ShutterstockImageAnalyzer`
```

2. Install dependencies:

```bash
pip install ollama pandas pydantic
```

3. Ensure **Ollama** is running on your local machine:

```bash
ollama run llama3.2-vision
```

----------

## **Usage**

### **1. Configuration**

Update `base_url` to match your Ollama instance:
```python
base_url = "http://localhost:11434/"
```

Set the paths for the image directory and output file:
```python
image_directory_path = r"ShutterstockImageAnalyzer"
csv_file_path = r"shutterstock.csv"
```
Enable or disable recursive processing:
```python
recursive=True # Process subfolders
recursive=False # Only process files in the main directory
```

### **2. Run the Script**
Execute the script to process images:
```bash
python image_analyzer.py`
```

### **3. View the Results**

The metadata will be saved to `shutterstock.csv` in the project directory.

### **4. Customize Prompts and Options**

Modify the prompt or advanced options to fine-tune model behavior:

- **Custom Prompt**:
```python
prompt = "Describe the image, highlighting its objects and their relationships."
```

- **Advanced Options**:
```python
advanced_options = {"temperature": 0.7, "top_p": 0.95}
```

### **5. Process a Directory**

Analyze all images in a directory:
```python
analyzer.process_images_in_directory(
directory_path="path/to/images",
file_path="results.csv",
prompt="Analyze this image and provide details...",
advanced_options={"temperature": 0.6},
recursive=True # Enable subfolder processing
)
```

----------

## **Default Behaviors**

- **Recursive Directory Search**: Enabled by default.
- **Prompt and Options**: Defaults to a generic prompt and balanced settings unless specified.

### **Custom Execution**

To analyze a single image, modify the script’s `__main__` block or use the following method:

```python
analyzer.start_analysis(
image_path="path/to/image.jpg",
file_path="results.csv",
prompt="Analyze this image and generate metadata."
)
```
----------

## **File Details**

- **`image_analyzer.py`**: Core script for metadata generation and processing.
- **`shutterstock.csv`**: Output file containing structured metadata.
----------
## Optional `Hint` Feature for Image Analysis

### Overview:
This tool now supports an optional `hint` parameter when processing multiple images in a directory. The `hint` provides additional context (such as location or event) that helps guide the model during the analysis, ensuring that the generated metadata is more relevant and accurate.

### How to use the hint:
To use the `hint` feature, simply pass a string that describes the context or location when calling the `process_images_in_directory` method. For example:

```python
hint = "White Park Bay, Atlantic Ocean"
analyzer.process_images_in_directory(
image_directory_path,
csv_file_path,
prompt=None,
advanced_options=None,
recursive=False,
hint=hint # Provide the location-based hint for all images in the directory
)
```
If the hint is not provided (Default: `hint=None`), the analysis will proceed without the context, and the model will generate metadata based only on the images themselves.

----------

## Prompt Compliance Evaluation
The evaluation method provides a summary of how well the generated results adhere to the prompt's requirements. Please note:
- **Compliance may not always be 100%** due to factors like repetitive starting phrases, insufficient keyword counts, or model-specific constraints.
- The provided statistics are intended to guide improvements in model settings or prompt design.

### Example Usage
```python
compliance_stats = analyzer.evaluate_prompt_compliance(csv_file_path)
print("Compliance with prompt requirements:", compliance_stats)
```
### What the Metrics Represent:
The evaluation method provides detailed metrics to assess how well the generated results adhere to the prompt's requirements. Below is a breakdown of each metric:

- **`description_compliance`**:
- Percentage of descriptions with a length not exceeding 200 characters.
- **Why it matters**: Ensures descriptions are concise and meet platform constraints.

- **`keyword_min_compliance`**:
- Percentage of cases where at least 7 keywords are generated.
- **Why it matters**: Guarantees a minimum level of diversity and relevance in keywords.

- **`keyword_max_compliance`**:
- Percentage of cases where the number of keywords does not exceed 50.
- **Why it matters**: Prevents overly verbose outputs while keeping the keywords manageable.

- **`category_compliance`**:
- Percentage of cases where exactly 1 or 2 categories are returned.
- **Why it matters**: Ensures strict adherence to the requirement of selecting categories from a predefined list without exceeding the allowed number.

- **`description_uniqueness`**:
- Percentage of unique descriptions. The higher, the better.
- **Why it matters**: Encourages variety and creativity in generated outputs, avoiding repetition across multiple results.

- **`duplicate_start_phrases`**:
- Percentage of descriptions starting with the same first 5 words.
- **Why it matters**: A high percentage might indicate formulaic or repetitive outputs, but this metric is less critical for some use cases. Depending on the application, repetitive starting phrases may or may not impact the overall quality of the results.

### Key Notes:
1. **Focus on the critical metrics**:
- Metrics like `description_compliance`, `keyword_min_compliance`, `keyword_max_compliance`, and `category_compliance` are generally more important for ensuring adherence to platform and prompt requirements.

2. **Secondary metrics (`description_uniqueness`, `duplicate_start_phrases`)**:
- While these provide insights into the variety and diversity of outputs, they may not be crucial for all applications.
- For instance, repetitive starting phrases (`duplicate_start_phrases`) might not matter as long as the descriptions are otherwise unique and meet length requirements.

3. **Using the metrics effectively**:
- Use the compliance metrics as a diagnostic tool to identify areas for improvement in the prompt or model settings.
- Treat lower scores on `description_uniqueness` or `duplicate_start_phrases` as a potential improvement area, rather than a strict requirement.

----------
### **Alternative Models**

While the tool is optimized for **`llama3.2-vision`**, it also supports other vision models, such as **`llava-7b`** and **`llava-llama3-8b`**, which come with their own advantages and limitations:

#### **Model Options**

1. **Llama3.2-vision**:

- **Strengths**: Highly accurate, capable of recognizing well-known objects (e.g., "Eiffel Tower" instead of "tower" or "landmark").
- **Weaknesses**: Slightly slower compared to other models.
2. **Llava**:

- **Strengths**: Faster than `llama3.2-vision`, suitable for larger datasets.
- **Weaknesses**: Limited to general concepts like "bird" or "monkey" without specifying exact species or object types.
3. **Llava-llama3**:

- **Strengths**: Extremely fast, capable of creating more emotional and engaging descriptions.
- **Weaknesses**: May not follow specific instructions (e.g., generating at least 7 keywords) and struggles with detailed object recognition.

----------

### **Customizing for Other Models**

When using alternative models, some parameters or parts of the prompt may need to be adapted. For example:

- **Keywords**: Models like `llava` may not generate a specific number of keywords as instructed. You might need to post-process or fine-tune prompts to address this.
- **Descriptions**: While less precise in identifying objects, `llava` models excel at creating emotional and atmospheric descriptions.
- **Categories**: Consider simplifying the list of categories, as some models may perform better with fewer options.

To switch models, update the `model` parameter in the script:

```python
analyzer = ImageAnalyzer(model="llava", base_url="http://localhost:11434/")
```

----------

### **Choosing the Right Model**

If speed is a priority (e.g., for processing thousands of images), models like `llava` or `llava-llama3` might be more suitable. However, if accuracy in recognizing real-world objects and providing detailed classifications (e.g., identifying specific landmarks, species, or distinct features) is essential, `llama3.2-vision` remains the best choice.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/loglux/shutterstockimageanalyzer

Awesome Lists containing this project

README