Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/manuelfay/imagesearcher

This repository aims to implement an Image Search engine powered by the CLIP model.
https://github.com/manuelfay/imagesearcher

clip google google-image-search image image-search-engine neural-image-search openai python search search-engine

Last synced: 2 months ago
JSON representation

This repository aims to implement an Image Search engine powered by the CLIP model.

Host: GitHub
URL: https://github.com/manuelfay/imagesearcher
Owner: ManuelFay
License: mit
Created: 2021-07-05T21:05:34.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2022-07-15T14:44:57.000Z (over 2 years ago)
Last Synced: 2024-11-15T23:10:42.566Z (3 months ago)
Topics: clip, google, google-image-search, image, image-search-engine, neural-image-search, openai, python, search, search-engine
Language: Python
Homepage:
Size: 2.1 MB
Stars: 39
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

README

        # ImageSearcher

### Leveraging CLIP to perform image search on personal pictures

This repository implements an Image Search engine on local photos powered by the CLIP model.

It is surprisingly precise and is able to find images given complex queries. For more information, refer to

the Medium blogpost [here](https://medium.com/@manuelfaysse/building-a-powerful-image-search-engine-for-your-pictures-using-deep-learning-16d06df10385?source=friends_link&sk=ca5130cb63a1fcb3a3e3f54ff494e56b).

The added functionality of classifying pictures depending on the persons portrayed is implemented 

with the `face_recognition` library. Several filters are also available, enabling you to find your 

group pictures, screenshots, etc...

## Setup

In a Python 3.8+ virtual environment, install either from PIP or from source:

#### Installation from the PIP package:

```bash

pip install image-searcher

pip install face_recognition  # Optional to enable face features

pip install flask flask_cors  # Optional to enable a flask api

```

#### Installation from source

```bash

pip install -r dev_requirements.txt

pip install face_recognition  # Optional to enable face features

pip install flask flask_cors  # Optional to enable a flask api

```

Troubleshooting: If problems are encountered building wheels for dlib during the face_recognition installation, make sure to install the `python3.8-dev`

package (respectively `python3.x-dev`) and recreate the virtual environment from scratch with the aforementionned command 

once it is installed.

## Usage

Currently, the usage is as follows. The library first computes the embeddings of all images one by one, and stores them in 

a picked dictionary for further reference. To compute and store information about the persons in 

the picture, enable the `include_faces` flag (note that it makes the indexing process up to 10x slower).

```python

from image_searcher import Search

searcher = Search(image_dir_path="/home/manu/perso/ImageSearcher/data/", 

                  traverse=True, 

                  include_faces=False)

```

Once this process has been done once, through Python, the library is used as such:

```python

from image_searcher import Search

searcher = Search(image_dir_path="/home/manu/perso/ImageSearcher/data/", 

                  traverse=True, 

                  include_faces=False)

# Option 1: Pythonic API

from PIL import Image

ranked_images = searcher.rank_images("A photo of a bird.", n=5)

for image in ranked_images:

    Image.open(image.image_path).convert('RGB').show()

# Option 2: Launch Flask api from code

from image_searcher.api import run

run(searcher=searcher)

```

### Using tags in the query

Adding tags at the end of the query (example: `A bird singing #photo`) will filter the search based on the tag list.

Supported tags for the moment are:

  - \#{category}: Amongst "screenshot", "drawing", "photo", "schema", "selfie"

  - \#groups: Group pictures (more than 5 people)

To come is support for:

  

  - \#dates: Filtering based on the time period

### Running through the API for efficient use

After having indexed the images of interest, a Flask API can be used to load models once and then search efficiently.

#### Specify a Config YAML file:

  

```yaml

image_dir_path: /home/manu/Downloads/facebook_logs/messages/inbox/

save_path: /home/manu/

traverse: true

include_faces: true

reindex: false

n: 42

port:

host:

debug:

threaded:

```

#### Start a server:

```python

from image_searcher.api import run

# Option 1: Through a config file

run(config_path="path_to_config_file.yml")

# Option 2: Through an instanciated Search object

from image_searcher import Search

run(searcher=Search(image_dir_path="/home/manu/perso/ImageSearcher/data/", 

                    traverse=True, 

                    include_faces=False))

```

A gunicorn process can also be launched locally with:

```bash

gunicorn "api.run_flask_gunicorn:create_app('path_to_config_file.yml')" \

    --name image_searcher \

    --bind 0.0.0.0:${GUNICORN_PORT:-5000} \

    --worker-tmp-dir /dev/shm \

    --workers=${GUNICORN_WORKERS:-2} \

    --threads=${GUNICORN_THREADS:-4} \

    --worker-class=gthread \

    --log-level=info \

    --log-file '-' \

    --timeout 30

```

Note: Adapt the timeout parameter (in seconds) if a lot of new images are being indexed/

#### Query it:

- By opening in a browser the webpage with the demo search engine `search.html`.

- Through the API endpoint online: `http://127.0.0.1:5000/get_best_images?q=a+photo+of+a+bird`

- In Python:

```python

import requests

import json

import urllib.parse

query = "a photo of a bird"

r = requests.get(f"http://127.0.0.1:5000/get_best_images?q={urllib.parse.quote(query)}")

print(json.loads(r.content)["results"])

```

### Tips

Using this tool with vacation photos, or Messenger and Whatsapp photo archives leads to rediscovering 

old photos and is amazing at locating long lost ones.

## Tests

Run the tests with 

```bash

python -m unittest

```

and lint with:

```bash

pylint image_searcher

```

## Contributing

This repo is a work in progress that has recently been started. As is, it computes about 10 images per second during the initial indexing phase, then is almost instantaneous during the querying phase.

Feature requests and contributions are welcomed. Improvements to the Search Web interface would also

be greatly appreciated !

## Todo list

Simplify and robustify the Search class instanciation:

- Check indexation arguments are compatible with pre-loaded file

- Store indexation arguments in pre-loaded file and give option to index new pictures with these options

- Add the option to index for faces on previously CLIP indexed images

Speed:

- Parallel indexation / dynamic batching based on image size

- Data loader before indexation

- Optimized vector computation with optimized engine (FAISS)

Features:

- Image auto-tagging (screenshot, drawing, photo, nature, group picture, selfie, etc)

- Image deduplication (perceptual hashing)

Embedding files:

- Integrate with local version control (git-lfs ?)

Frontend:

- Overall UX and design changes

- Enable Image upload

Deployment:

- Dockerize and orchestrate containers (image uploader, storage, indexation pipeline, inference)