{"id":17832421,"url":"https://github.com/q-m/food-ingredient-parser-ruby","last_synced_at":"2025-03-19T10:30:58.080Z","repository":{"id":56847416,"uuid":"136165747","full_name":"q-m/food-ingredient-parser-ruby","owner":"q-m","description":"Extract the structure of ingredient lists on food products","archived":false,"fork":false,"pushed_at":"2024-10-24T12:56:26.000Z","size":12812,"stargazers_count":16,"open_issues_count":10,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-28T01:37:11.947Z","etag":null,"topics":["food-additives","food-products","ingredient-lists","ingredients","parser","ruby","ruby-gem","structured-data","treetop"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/q-m.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-05T11:15:01.000Z","updated_at":"2024-10-24T12:56:30.000Z","dependencies_parsed_at":"2024-10-24T16:39:22.358Z","dependency_job_id":"475f2301-bf03-4443-b634-47670951c539","html_url":"https://github.com/q-m/food-ingredient-parser-ruby","commit_stats":{"total_commits":166,"total_committers":2,"mean_commits":83.0,"dds":"0.0060240963855421326","last_synced_commit":"3c4f2a3e8285da1cf304c820dd8c64f9466d51db"},"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-ingredient-parser-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-ingredient-parser-ruby/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-ingredient-parser-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-ingredient-parser-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/q-m","download_url":"https://codeload.github.com/q-m/food-ingredient-parser-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243982182,"owners_count":20378605,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["food-additives","food-products","ingredient-lists","ingredients","parser","ruby","ruby-gem","structured-data","treetop"],"created_at":"2024-10-27T19:56:51.319Z","updated_at":"2025-03-19T10:30:55.562Z","avatar_url":"https://github.com/q-m.png","language":"Ruby","readme":"# Food ingredient parser\n\n[![Gem Version](https://badge.fury.io/rb/food_ingredient_parser.svg)](https://rubygems.org/gems/food_ingredient_parser)\n\nIngredients are listed on food products in various ways. 
This [Ruby](https://www.ruby-lang.org/)\ngem and program parses the ingredient text and returns a structured representation.\n\n## Installation\n\n```\ngem install food_ingredient_parser\n```\n\nThis will also install the dependency [treetop](http://cjheath.github.io/treetop).\nIf you want colored output for the test program, also install [pry](http://pryrepl.org/): `gem install pry`.\n\n## Example\n\n```ruby\nrequire 'food_ingredient_parser'\n\ns = \"Water* 60%, suiker 30%, voedingszuren: citroenzuur, appelzuur, zuurteregelaar: E576/E577, \" \\\n    + \"natuurlijke citroen-limoen aroma's 0,2%, zoetstof: steviolglycosiden, * = Biologisch. \" \\\n    + \"E = door de E.U. goedgekeurde toevoeging.\"\nparser = FoodIngredientParser::Strict::Parser.new\nputs parser.parse(s).to_h.inspect\n```\nResults in\n```ruby\n{\n  :contains=\u003e[\n    {:name=\u003e\"Water\", :amount=\u003e\"60%\", :marks=\u003e[\"*\"]},\n    {:name=\u003e\"suiker\", :amount=\u003e\"30%\"},\n    {:name=\u003e\"voedingszuren\", :contains=\u003e[\n      {:name=\u003e\"citroenzuur\"}\n    ]},\n    {:name=\u003e\"appelzuur\"},\n    {:name=\u003e\"zuurteregelaar\", :contains=\u003e[\n      {:name=\u003e\"E576\"},\n      {:name=\u003e\"E577\"}\n    ]},\n    {:name=\u003e\"natuurlijke citroen-limoen aroma's\", :amount=\u003e\"0,2%\"},\n    {:name=\u003e\"zoetstof\", :contains=\u003e[\n      {:name=\u003e\"steviolglycosiden\"}\n    ]}\n  ],\n  :notes=\u003e[\n    \"* = Biologisch\",\n    \"E = door de E.U. goedgekeurde toevoeging\"\n  ]\n}\n```\n\n## Test tool\n\nThe executable `food_ingredient_parser` is available after installing the gem. 
If you're\nrunning this from the source tree, use `bin/food_ingredient_parser` instead.\n\n```\n$ food_ingredient_parser -h\nUsage: bin/food_ingredient_parser [options] --file|-f \u003cfilename\u003e\n       bin/food_ingredient_parser [options] --string|-s \u003cingredients\u003e\n\n    -f, --file FILE                  Parse all lines of the file as ingredient lists.\n    -s, --string INGREDIENTS         Parse specified ingredient list.\n    -q, --[no-]quiet                 Only show summary.\n    -p, --parsed                     Only show lines that were successfully parsed.\n    -n, --noresult                   Only show lines that had no result.\n    -r, --parser PARSER              Use specific parser (strict, loose).\n    -e, --[no-]escape                Escape newlines\n    -c, --[no-]color                 Use color\n        --[no-]html                  Print as HTML with parsing markup\n    -v, --[no-]verbose               Show more data (parsed tree).\n        --version                    Show program version.\n    -h, --help                       Show this help\n\n$ food_ingredient_parser -v -s \"tomato\"\n\"tomato\"\nRootNode+Root3 offset=0, \"tomato\" (contains,notes):\n  SyntaxNode offset=0, \"\"\n  SyntaxNode offset=0, \"\"\n  SyntaxNode offset=0, \"\"\n  ListNode+List13 offset=0, \"tomato\" (contains):\n    SyntaxNode+List12 offset=0, \"tomato\" (ingredient):\n      SyntaxNode+Ingredient0 offset=0, \"tomato\":\n        SyntaxNode offset=0, \"\"\n        IngredientNode+IngredientSimpleWithAmount3 offset=0, \"tomato\" (ing):\n          IngredientNode+IngredientSimple5 offset=0, \"tomato\" (name):\n            SyntaxNode+IngredientSimple4 offset=0, \"tomato\" (word):\n              SyntaxNode offset=0, \"tomato\":\n                SyntaxNode offset=0, \"t\"\n                SyntaxNode offset=1, \"o\"\n                SyntaxNode offset=2, \"m\"\n                SyntaxNode offset=3, \"a\"\n                SyntaxNode offset=4, \"t\"\n                
SyntaxNode offset=5, \"o\"\n              SyntaxNode offset=6, \"\"\n        SyntaxNode offset=6, \"\"\n      SyntaxNode offset=6, \"\"\n  SyntaxNode+Root2 offset=6, \"\":\n    SyntaxNode offset=6, \"\"\n    SyntaxNode offset=6, \"\"\n    SyntaxNode offset=6, \"\"\n  SyntaxNode offset=6, \"\"\n{:contains=\u003e[{:name=\u003e\"tomato\"}]}\n\n$ food_ingredient_parser --html -s \"tomato\"\n\u003cdiv class=\"root\"\u003e\u003cspan class='depth0'\u003e\u003cspan class='name'\u003etomato\u003c/span\u003e\u003c/span\u003e\u003c/div\u003e\n\n$ food_ingredient_parser -v -r loose -s \"tomato\"\n\"tomato\"\nNode interval=0..5\n  Node interval=0..5, name=\"tomato\"\n{:contains=\u003e[{:name=\u003e\"tomato\"}]}\n\n$ food_ingredient_parser -q -f data/test-cases\nparsed 35 (100.0%), no result 0 (0.0%)\n```\n\nIf you want to use the output in (shell) scripts, the options `-e -c` may be quite useful.\n\n## `to_html`\n\nWhen ingredient lists are entered manually, it can be very useful to show how the text is\nrecognized. This can help in understanding why a certain ingredient list cannot be parsed.\n\nFor this you can use the `to_html` method on the parsed output, which returns the original\ntext, augmented with CSS classes for different parts.\n\n```ruby\nrequire 'food_ingredient_parser'\n\nparsed = FoodIngredientParser::Strict::Parser.new.parse(\"Saus (10% tomaat*, zout). 
* = bio\")\nputs parsed.to_html\n```\n\n```html\n\u003cspan class='depth0'\u003e\n  \u003cspan class='name'\u003eSaus\u003c/span\u003e (\n  \u003cspan class='contains depth1'\u003e\n    \u003cspan class='amount'\u003e10%\u003c/span\u003e \u003cspan class='name'\u003etomaat\u003c/span\u003e\u003cspan class='mark'\u003e*\u003c/span\u003e,\n    \u003cspan class='name'\u003ezout\u003c/span\u003e\n  \u003c/span\u003e)\n\u003c/span\u003e.\n\u003cspan class='note'\u003e* = bio\u003c/span\u003e\n```\n\nFor an example of an interactive editor, see [examples/editor.rb](examples/editor.rb).\n\n![editor example screenshot](examples/editor-screenshot.png)\n\n## Loose parser\n\nThe strict parser only parses ingredient lists that conform to one of the many different\nformats expected. If you'd like to always get a result, even if it is not necessarily\ncompletely correct, you can use the _loose_ parser. This does not use Treetop, but looks\nat the input character by character and tries to make the best of it. Nevertheless, if you\njust want to have _some_ result, this can still be very useful.\n\n```ruby\nrequire 'food_ingredient_parser'\n\nparsed = FoodIngredientParser::Loose::Parser.new.parse(\"Saus [10% tomaat*, (zout); peper.\")\nputs parsed.to_h\n```\n\nEven though the strict parser would not give a result, the loose parser returns:\n```ruby\n{\n  :contains=\u003e[\n    {:name=\u003e\"Saus\", :contains=\u003e[\n      {:name=\u003e\"tomaat\", :marks=\u003e[\"*\"], :amount=\u003e\"10%\", :contains=\u003e[\n        {:name=\u003e\"zout\"}\n      ]},\n      {:name=\u003e\"peper\"}\n    ]}\n  ]\n}\n```\n\n## Compatibility\n\nFrom the 1.0.0 release, the main interface will be stable. This comprises the two parsers' `parse`\nmethods (incl. documented options), their `nil` result when parsing failed, and the parsed output's\n`to_h` and `to_html` methods. Please note that parsed node trees may be subject to change, even within\na major release. 
Within a minor release, node trees are expected to remain stable.\n\nSo if you only use the stable interface (`parse`, `to_h` and `to_html`), you can lock your version\nto e.g. `~\u003e 1.0`. If you depend on more, lock your version against e.g. `~\u003e 1.0.0` and test when you\nupgrade to `1.1`.\n\n## Languages\n\nWhile most of the parsing is language-independent, some parts need knowledge about certain words\n(like abbreviations and amount specifiers). The gem was developed with ingredient lists in Dutch (nl),\nplus a bit of English and German. Support for other languages is already good, but is lacking in certain\nareas; improvements are welcome (starting with a corpus in [data/](data/)).\n\nMany ingredient lists from the USA are structured a bit differently than those from Europe, so they\nparse less well (this is probably a matter of fine-tuning).\n\n## Test data\n\n[`data/ingredient-samples-qm-nl`](data/ingredient-samples-qm-nl) contains about 150k\nreal-world ingredient lists found on the Dutch market. Each line contains one ingredient\nlist (newlines are encoded as `\\n`, empty lines and those starting with `#` are ignored).\nThe strict parser currently parses 80% of them, while the loose parser returns something for all of them.\n\n## License\n\nThis software is distributed under the [MIT license](LICENSE). Data may have a [different license](data/README.md).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fq-m%2Ffood-ingredient-parser-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fq-m%2Ffood-ingredient-parser-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fq-m%2Ffood-ingredient-parser-ruby/lists"}