{"id":17679994,"url":"https://github.com/costajob/inci_score","last_synced_at":"2025-10-29T21:10:41.061Z","repository":{"id":56877525,"uuid":"56043112","full_name":"costajob/inci_score","owner":"costajob","description":"A library that computes the hazard of cosmetic products components, based on the Biodizionario data","archived":false,"fork":false,"pushed_at":"2023-01-09T13:43:38.000Z","size":699,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-27T22:42:25.155Z","etag":null,"topics":["hazard","inci-catalog","inci-score","ingredients","levenshtein","ruby"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/costajob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-12T08:01:12.000Z","updated_at":"2023-01-04T11:58:19.000Z","dependencies_parsed_at":"2023-02-08T12:00:16.084Z","dependency_job_id":null,"html_url":"https://github.com/costajob/inci_score","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/costajob/inci_score","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Finci_score","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Finci_score/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Finci_score/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Finci_score/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/costajob","download_u
rl":"https://codeload.github.com/costajob/inci_score/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Finci_score/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278847695,"owners_count":26056354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hazard","inci-catalog","inci-score","ingredients","levenshtein","ruby"],"created_at":"2024-10-24T09:05:04.495Z","updated_at":"2025-10-13T18:45:12.408Z","avatar_url":"https://github.com/costajob.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Table of Contents\n\n* [Scope](#scope)\n* [INCI catalog](#inci-catalog)\n* [Computation](#computation)\n  * [Component matching](#component-matching)\n  * [Sources](#sources)\n* [Installation](#installation)\n* [Usage](#usage)\n  * [Library](#library)\n  * [CLI](#cli)\n* [Benchmarks](#benchmarks)\n  * [Levenshtein in C](#levenshtein-in-c)\n  * [Run benchmarks](#run-benchmarks)\n\n## Scope\nThis gem computes the score of cosmetic components based on the information provided by the [Biodizionario site](http://www.biodizionario.it/) by Fabrizio Zago.\n\n## INCI catalog\nThe [INCI](https://en.wikipedia.org/wiki/International_Nomenclature_of_Cosmetic_Ingredients) catalog is fetched 
directly from the Biodizionario site and kept in memory.  \nCurrently there are more than 5000 components with a hazard score that ranges from 0 (safe) to 4 (dangerous).\n\n## Computation\nThe computation scores each component of the cosmetic based on:\n* its hazard, according to the Biodizionario score\n* its position in the list of ingredients\n\nThe total score is then calculated as a percentage.\n\n### Component matching\nSince the ingredients list could come from an unreliable source (e.g. data scanned from a captured image), the gem tries to fuzzy match the ingredients by using different algorithms:\n* exact matching\n* [edit distance](https://en.wikipedia.org/wiki/Levenshtein_distance) within a specified tolerance\n* known hazards (i.e. ending in `ethicone`)\n* first relevant matching digits\n* matching split tokens\n\n### Sources\nThe library accepts the list of ingredients as a single string of text.  \nSince this source could come from an OCR program, the library performs a normalization by stripping invalid characters and removing the unimportant parts.  \nThe ingredients are typically separated by commas, although the normalizer will detect the most appropriate separator:\n\n```\n\"Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\\nBetaine\"\n```\n\n## Installation\nInstall the gem from your shell:\n\n```shell\ngem install inci_score\n```\n\n## Usage\n\n### Library\nYou can include this gem in your own library and start computing the INCI score:\n\n```ruby\nrequire \"inci_score\"\n\ninci = InciScore::Computer.new(src: 'aqua, dimethicone').call\ninci.score # 56.25\ninci.precision # 100.0\n```\n\nAs you can see, the results are wrapped in an *InciScore::Response* object; this is useful when dealing with the CLI (read below).\n\n#### Unrecognized components\nThe API treats unrecognized components as a common case by just marking the object as invalid.  
\nIn such a case the score is still computed, considering only the recognized components.  \nYou can check the `precision` value, which is zero for unrecognized components and otherwise depends on the applied recognizer rule (100% for an exact match).\n\n```ruby\ninci = InciScore::Computer.new(src: 'ingredients:aqua,noent1,noent2').call\ninci.valid? # false\ninci.score # 100.0\ninci.precision # 33.33\ninci.unrecognized # [\"noent1\", \"noent2\"]\n```\n\n### CLI\nYou can collect INCI data by using the provided CLI:\n\n```shell\ninci_score --src=\"ingredients: aqua, dimethicone, pej-10, noent\"\n\nTOTAL SCORE:\n      \t53.22\nPRECISION:\n      \t71.54\nCOMPONENTS:\n      \taqua (0), dimethicone (4), peg-10 (3)\nUNRECOGNIZED:\n      \tnoent\n```\n\n#### Getting help\nYou can get CLI help by passing the `-h` or `--help` flag:\n\n```shell\nUsage: inci_score --src=\"aqua, parfum, etc\"\n    -s, --src=SRC                    The INCI list: \"aqua, parfum, etc\"\n    -h, --help                       Prints this help\n```\n\n## Benchmarks\n\n### Levenshtein in C\nI noticed the API slows down dramatically when it has to fuzzy match unrecognized components.  \nI profiled the code using the [benchmark-ips](https://github.com/evanphx/benchmark-ips) gem and found that the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.  
\n\nAfter some fruitless optimization attempts, I replaced this routine with a C implementation: I opted for the straightforward [Ruby Inline](https://github.com/seattlerb/rubyinline) library to call the C code straight from Ruby, gaining an order of magnitude in speed (x30).\n\n### Run benchmarks\nOnce you have downloaded the source code, run the benchmarks with:\n\n```shell\nbundle exec rake bench\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcostajob%2Finci_score","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcostajob%2Finci_score","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcostajob%2Finci_score/lists"}