{"id":13469331,"url":"https://github.com/algolia/color-extractor","last_synced_at":"2025-06-19T15:44:44.200Z","repository":{"id":10021997,"uuid":"60194806","full_name":"algolia/color-extractor","owner":"algolia","description":"Extract the dominant color(s) of your fashion articles!","archived":false,"fork":false,"pushed_at":"2022-11-22T01:11:58.000Z","size":487,"stargazers_count":282,"open_issues_count":11,"forks_count":71,"subscribers_count":80,"default_branch":"master","last_synced_at":"2025-04-25T11:53:16.018Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://algolia.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/algolia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-01T16:49:23.000Z","updated_at":"2025-04-01T20:08:11.000Z","dependencies_parsed_at":"2023-01-11T17:45:30.190Z","dependency_job_id":null,"html_url":"https://github.com/algolia/color-extractor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/algolia/color-extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fcolor-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fcolor-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fcolor-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fcolor-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/algolia","download_url":"https://codeload.github.com/algolia/color-extractor/tar.gz/
refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fcolor-extractor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260781833,"owners_count":23062310,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:01:34.103Z","updated_at":"2025-06-19T15:44:39.180Z","avatar_url":"https://github.com/algolia.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Color Extractor\n\nThis project is both a library and a CLI tool to extract the dominant colors of\nthe main object of an image. Most of the preprocessing steps assume that the\nimages are related to e-commerce, meaning that the objects targeted by the\nalgorithms are supposed to be mostly centered and with a fairly simple\nbackground (single color, gradient, low contrast, etc.). 
The algorithm may\nstill work if either of those two conditions is not met, but be aware that its\nprecision will certainly suffer.\n\n***A blog post describing this experiment can be found [here](https://blog.algolia.com/how-we-handled-color-identification/).***\n\n\u003e Note: this project is released as-is and is no longer maintained by us;\n\u003e however, feel free to edit the code and use it as you see fit.\n\n## Installation\n\nThe script and the library currently target Python 3 and won't work with\nPython 2.\n\nMost of the dependencies can be installed using\n\n```sh\npip install -r requirements.txt\n```\n\nNote that the library and the CLI tool also depend on OpenCV 3.1.0 and its Python 3\nbindings.\nFor Linux users, the steps to install it are available\n[here](http://www.pyimagesearch.com/2015/07/20/install-opencv-3-0-and-python-3-4-on-ubuntu/).\nFor OSX users, the steps to install it are available\n[here](http://www.pyimagesearch.com/2015/06/29/install-opencv-3-0-and-python-3-4-on-osx/).\n\nYou then just have to ensure that the root of this repository is present in your\n`PYTHONPATH`.\n\n## Color tagging\n\nSearching for objects by color is a common practice while browsing e-commerce\nweb sites, and relying only on the description and the title of the object may\nnot be enough to provide top-notch relevancy.\nWe propose this tool to automatically associate color tags with an image by\ntrying to guess the main object of the picture and extracting its dominant\ncolor(s).\n\nThe design of the library can be viewed as a pipeline composed of several\nsequential processing steps. Each of these steps accepts several options in order\nto tune its behavior to better fit your catalog.\nThose steps are (in order):\n\n1. Resizing and cropping\n\n2. Background detection\n\n3. Skin detection\n\n4. Clustering of remaining pixels\n\n5. Selection of the _best_ clusters\n\n6. 
Giving color names to clusters\n\n### Usage\n\nThe library can be used as simply as this:\n\n```python\nimport cv2\nimport numpy as np\n\nfrom color_extractor import ImageToColor\n\nnpz = np.load('color_names.npz')\nimg_to_color = ImageToColor(npz['samples'], npz['labels'])\n\nimg = cv2.imread('image.jpg')\nprint(img_to_color.get(img))\n```\n\nThe CLI tool can be used just as simply:\n\n```sh\n./color-extractor color_names.npz image.jpg\n\u003e red,black\n```\n\nThe file `color_names.npz` can be found in this repository.\n\n### Passing Settings\n\nAll algorithms can be used right out of the box thanks to settings tuned for\nthe widest possible range of images. Because these settings don't target any\nparticular kind of catalog, adjusting them to yours may improve precision.\n\nSettings can be passed at three different levels.\n\nThe lowest level is the algorithm level. Each algorithm is embodied by a\nPython class which accepts a `settings` dictionary. This dictionary is then\nmerged with its default settings. The given settings take precedence over the\ndefault ones.\n\nA slightly higher level still concerns library users. The process of chaining\nall those algorithms together is also embedded in three classes called `FromJson`,\n`FromFile` and `ImageToColor`. Those three classes also take a `settings`\nparameter, composed of several dictionaries to be forwarded to each algorithm.\n\nThe highest level is to pass those settings to the CLI tool. When passing the\n`--settings` option with a JSON file, the latter is parsed as a dictionary and\ngiven to the underlying `FromJson` or `FromFile` object (which in turn will\nforward it to the individual algorithms).\n\n### Resizing and Cropping\n\nThis step is available as the `Resize` class.\n\nPictures with too high a resolution have too many details that can be considered\nas noise when the goal is to find the most dominant colors. Moreover, smaller\nimages mean faster processing time. 
Most of the testing has been done on\n`100x100` images, and this size is usually the best compromise between precision and\nspeed.\nBecause the object of the picture is centered most of the time, cropping can make sense\nin order to reduce the amount of background and ease its removal.\n\nThe available settings are:\n\n- `'crop'` sets the cropping ratio. A ratio of `1.` means no cropping.\n  Default is `0.9`.\n\n- `'rows'` gives the number of rows to reduce the image to. The number of columns is\n  computed to keep the same aspect ratio.\n  Default is `100`.\n\n### Background Detection\n\nThis step is available as the `Back` class.\n\nThis algorithm tries to separate the background from the foreground by combining\ntwo simple algorithms.\n\nThe first algorithm takes the colors of the four corners of the image and treats\nas background all pixels _close_ to those colors.\n\nThe second algorithm uses a Sobel filter to detect edges and then runs a\nflood fill algorithm from all four corners. All pixels touched by the flood fill\nare considered background.\n\nThe masks created by the two algorithms are then combined with a\nlogical `or`.\n\nThe available settings are:\n\n- `'max_distance'` sets the maximum distance for two colors to be considered\n  close by the first algorithm. A higher value means more pixels will be\n  considered as background.\n  Default is `5`.\n\n- `'use_lab'` converts pixels to the LAB color space before using the first\n  algorithm. The conversion makes the process a bit more expensive but the\n  computed distances are closer to human perception.\n  Default is `True`.\n\n### Skin Detection\n\nThis step is available as the `Skin` class.\n\nWhen working with fashion pictures, models are usually present in the picture.\nThe main problem is that their skin color can be confused with the object color\nand lead to incorrect tags. 
One way to avoid that is to ignore ranges of colors\ncorresponding to common skin colors.\n\nThe available settings are:\n\n- `'skin_type'` The skin type to target. At the moment only `'general'` and\n  `'none'` are supported. `'none'` returns an empty mask every time,\n  deactivating skin detection.\n  Default is `'general'`.\n\n\n### Clustering\n\nThis step is available as the `Cluster` class.\n\nAs we want to find the most dominant color(s) of an object, grouping its pixels into\nbuckets allows us to retain only a few colors and to get a sense of which are the\nmost present.\nThe clustering is done using the K-Means algorithm. K-Means doesn't result\nin the most accurate clusterings (compared to Mean Shift for example) but its\nspeed certainly compensates. Because all images are different, it's hard to\nuse a fixed number of clusters for the entire catalog. We implemented a method,\ncalled the\n[jump](https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set#An_Information_Theoretic_Approach)\nmethod, that tries to find an optimal number of clusters.\n\nThe available settings are:\n\n- `'min_k'` The minimum number of clusters to consider.\n  Default is `2`.\n\n- `'max_k'` The maximum number of clusters to consider. Allowing more clusters\n  results in longer computing times.\n  Default is `7`.\n\n### Selection of Clusters\n\nThis step is available as the `Selector` class.\n\nOnce clusters are made, not all of them may be worth a color tag: some may be\nvery tiny for example. The purpose of this step is to keep only the clusters\nthat are worth it.\nWe implemented several ways of selecting clusters:\n\n- `'all'` keeps all clusters.\n\n- `'largest'` keeps only the largest cluster.\n\n- `'ratio'` keeps the biggest clusters until their total number of pixels\n  exceeds a certain percentage of all clustered pixels.\n\nWhile the outcome of `all` is quite obvious, the use of `largest` versus\n`ratio` is trickier. 
`largest` will yield very few colors, meaning the chance\nof assigning an irrelevant tag is greatly diminished. On the other\nhand, objects with two colors in equal quantity will see one of them discarded.\nIt's up to you to decide which one behaves best with your catalog.\n\nThe available settings are:\n\n- `'strategy'`: The strategy to use among `'all'`, `'largest'` and `'ratio'`.\n  Default is `'largest'`.\n\n- `'ratio.threshold'`: The percentage of clustered pixels to target while\n  selecting clusters with the `'ratio'` strategy.\n  Default is `0.75`.\n\n### Naming Color Values\n\nThis step is available as the `Name` class.\n\nThe last step is to give human-readable color names to RGB values. To solve\nthis last step we use a K Nearest Neighbors algorithm applied to a large\ndictionary of colors taken from the XKCD color survey. Because of the erratic\ndistribution of colors (some colors are far more represented than others) a\nKNN behaves in most cases better than more statistical classifiers.\nThe \"learning\" phase of the classifier is done when the object is built, and\nrequires that two arrays are passed to its constructor: an array of BGR colors\nand an array of the corresponding names. When using the CLI tool, the path\nto an `.npz` numpy archive containing those two arrays must be given.\n\nEven though the algorithm used defaults to KNN, it's still possible to use a custom\nclass to do it. The supplied class must support a `fit` method for the\ntraining phase and a `predict` method for the actual classification.\n\nThe available settings are:\n\n- `'algorithm'` The algorithm to use to perform the classification. Must be\n  either `'knn'` or `'custom'`. If `'custom'` is given, `'classifier.class'` must\n  also be given.\n  Default is `'knn'`.\n\n- `'hard_monochrome'` Monochrome colors (especially gray) may be hard to\n  classify. This option makes use of a built-in way of qualifying colors as\n  \"white\", \"gray\" or \"black\". 
It computes the rejection of the color vector from\n  the gray axis and compares it to a threshold to determine whether or not the color can\n  be considered monochrome, then uses the luminance to classify it as \"black\", \"white\"\n  or \"gray\".\n  Default is `True`.\n\n- `'{gray,white,black}_name'` When using `'hard_monochrome'`, changes the names\n  actually given to \"gray\", \"white\" and \"black\" respectively. Useful when\n  wanting color names in another language.\n  Default is `\"gray\"`, `\"white\"` and `\"black\"`.\n\n- `'classifier.args'` Arguments passed to the classifier constructor. Defaults\n  are provided for `'knn'`:\n  `{\"n_neighbors\": 50, \"weights\": \"distance\", \"n_jobs\": -1}`. The possible\n  arguments are the ones available to the scikit-learn implementation of the\n  `KNeighborsClassifier`.\n\n- `'classifier.scale'` Many classification algorithms make strong assumptions\n  regarding the distribution of the samples, and may need some kind of\n  standardization of the data to behave better. This setting controls the\n  application of such a standardization before training and prediction.\n  Default is `True` but is ignored when using `'knn'`.\n\n### Complete Processing\n\nInstead of instantiating each of the aforementioned classes, you can simply use\n`ImageToColor` or `FromFile`. 
Those two classes take the same\nconstructor arguments:\n\n- An array of BGR colors to learn how to associate color names to color values.\n\n- An array of strings corresponding to the labels of the previous array.\n\n- A dictionary of settings to be passed to each processing step.\n\nThe dictionary can have the following keys:\n\n- `'resize'` settings to be given to the `Resize` object\n\n- `'back'` settings to be given to the `Back` object\n\n- `'skin'` settings to be given to the `Skin` object\n\n- `'cluster'` settings to be given to the `Cluster` object\n\n- `'selector'` settings to be given to the `Selector` object\n\n- `'name'` settings to be given to the `Name` object\n\n\nThe main difference is the source of the image used. `ImageToColor` expects a\nnumpy array while `FromFile` expects either a local path or a URL from which the\nimage can be (down)loaded.\n\n### Enriching JSON\n\nBecause we want Algolia customers to be able to enrich their JSON records easily,\nwe provide a class able to stream JSON and add color tags on the fly.\nThe object is initialized with the same arguments as `FromFile` plus the name\nof the field where the URI of the images can be found. While reading the JSON\nfile, if the given name is encountered, the corresponding image is downloaded and\nits colors computed. Those colors are then added to the JSON object under the\nfield `_color_tags`. The name of this field can be changed thanks to an optional\nparameter of the constructor.\n\nEnriching JSON can be done directly from the command line like this:\n\n```sh\n./color-extractor -j color_names.npz file.json\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falgolia%2Fcolor-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falgolia%2Fcolor-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falgolia%2Fcolor-extractor/lists"}