Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rom1504/image_embeddings
Using efficientnet to provide embeddings for retrieval
https://github.com/rom1504/image_embeddings
computer-vision efficientnet embeddings recommendation-system representation-learning retrieval
Last synced: 13 days ago
JSON representation
Using efficientnet to provide embeddings for retrieval
- Host: GitHub
- URL: https://github.com/rom1504/image_embeddings
- Owner: rom1504
- License: mit
- Created: 2020-06-05T20:51:55.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-05-28T11:15:14.000Z (over 1 year ago)
- Last Synced: 2024-10-23T04:14:37.267Z (21 days ago)
- Topics: computer-vision, efficientnet, embeddings, recommendation-system, representation-learning, retrieval
- Language: Jupyter Notebook
- Homepage: https://rom1504.github.io/image_embeddings/
- Size: 16.2 MB
- Stars: 153
- Watchers: 3
- Forks: 32
- Open Issues: 18
-
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- License: LICENSE
Awesome Lists containing this project
README
# image_embeddings
[![pypi](https://img.shields.io/pypi/v/image_embeddings.svg)](https://pypi.python.org/pypi/image_embeddings)
[![ci](https://github.com/rom1504/image_embeddings/workflows/Continuous%20integration/badge.svg)](https://github.com/rom1504/image_embeddings/actions?query=workflow%3A%22Continuous+integration%22)Using efficientnet to provide embeddings for retrieval. Read the blog post at https://medium.com/@rom1504/image-embeddings-ed1b194d113e
Why this repo ? Embeddings are a widely used technique that is well known in scientific circles. But it seems to be underused and not very well known for most engineers. I want to show how easy it is to represent things as embeddings, and how many application this unlocks. Checkout the [demo](https://rom1504.github.io/image_embeddings/) first!
![knn example](knn_example.png)
## Workflow
1. download some pictures
2. run inference on them to get embeddings
3. simple knn example, to understand what's the point : click on some pictures and see KNN## Simple Install
Run `pip install image_embeddings`
## Example workflow
1. run `image_embeddings save_examples_to_folder --images_count=1000 --output_folder=tf_flower_images`, this will retrieve 1000 image files from https://www.tensorflow.org/datasets/catalog/tf_flowers (but you can also pick any other dataset)
2. produce tf records with `image_embeddings write_tfrecord --image_folder=tf_flower_images --output_folder=tf_flower_tf_records --shards=10`
3. run the inference with `image_embeddings run_inference --tfrecords_folder=tf_flower_tf_records --output_folder=tf_flower_embeddings`
4. run a random knn search on them `image_embeddings random_search --path=tf_flower_embeddings`Optionally if you want to use the embeddings in numpy (in other languages), run `image_embeddings embeddings_to_numpy --input_path=tf_flower_embeddings --output_path=tf_flower_numpy`. In particular this can be used in the [web demo](https://github.com/rom1504/image_embeddings/tree/web)
```
$ image_embeddings random_search --path=tf_flower_embeddings
image_roses_261
160.83 image_roses_261
114.36 image_roses_118
102.77 image_roses_537
92.95 image_roses_659
88.49 image_roses_197
```Explore the [Simple notebook](notebooks/using_the_lib.ipynb) for more details.
You can try it locally or [try it in colab](https://colab.research.google.com/github/rom1504/image_embeddings/blob/master/notebooks/using_the_lib.ipynb)
The [From scratch](notebooks/from_scratch.ipynb) notebook provides an explanation on how to build this from scratch.
## API
### image_embeddings.downloader
Downloader from tensorflow datasets. Any other set of images could be used instead
#### image_embeddings.downloader.save_examples_to_folder(output_folder, images_count=1000, dataset="tf_flowers")
Save https://www.tensorflow.org/datasets/catalog/tf_flowers to folder
Also works with other tf datasets### image_embeddings.inference
Create tf recors from images files, and apply inference with an efficientnet model. Other models could be used.
#### image_embeddings.inference.write_tfrecord(image_folder, output_folder, num_shards=100)
Write tf records from an image folders
#### image_embeddings.inference.run_inference(tfrecords_folder, output_folder, batch_size=1000)
Run inference on provided tf records and save to folder the embeddings
### image_embeddings.knn
Convenience methods to read, build indices and apply search on them. These methods are provided as example.
Use [faiss](https://github.com/facebookresearch/faiss) directly for bigger datasets.#### image_embeddings.knn.read_embeddings(path)
Run embeddings from path and return a tuple with
* embeddings as a numpy matrix
* an id to name dictionary
* a name to id dictionary#### image_embeddings.knn.build_index(emb)
Build a simple faiss inner product index using the provided matrix of embeddings
#### image_embeddings.knn.search(index, id_to_name, emb, k=5)
Search the query embeddings and return an array of (distance, name) images
#### image_embeddings.knn.display_picture(image_path, image_name)
Display one picture from the given path and image name in jupyter
#### image_embeddings.knn.display_results(image_path, results)
Display the results from search method
#### image_embeddings.knn.random_search(path)
Load the embeddings, apply a random search on them and display the result
#### image_embeddings.knn.embeddings_to_numpy(input_path, output_folder)
Load the embeddings from the input folder as parquet and save them as
* json for the id -> name mapping
* numpy for the embeddingsParticularly useful to read the embeddings from other languages
## Advanced Installation
### Prerequisites
Make sure you use `python>=3.6` and an up-to-date version of `pip` and
`setuptools`python --version
pip install -U pip setuptoolsIt is recommended to install `image_embeddings` in a new virtual environment. For
examplepython3 -m venv image_embeddings_env
source image_embeddings_env/bin/activate
pip install -U pip setuptools
pip install image_embeddings### Using Pip
pip install image_embeddings
### From Source
First, clone the `image_embeddings` repo on your local machine with
git clone https://github.com/rom1504/image_embeddings.git
cd image_embeddings
make installTo install development tools and test requirements, run
make install-dev
## Test
To run unit tests in your current environment, run
make test
To run lint + unit tests in a fresh virtual environment,
runmake venv-lint-test
## Lint
To run `black --check`:
make lint
To auto-format the code using `black`
make black
## Tasks
* [x] simple downloader in python
* [x] simple inference in python using https://github.com/qubvel/efficientnet
* [x] build python basic knn example using https://github.com/facebookresearch/faiss
* [x] build basic ui using lit element and some brute force knn to show what it does, put in github pages
* [x] use to illustrate embeddings blogpost