{"id":19059940,"url":"https://github.com/q-m/food-fish-parser-ruby","last_synced_at":"2026-05-12T20:30:16.327Z","repository":{"id":56847407,"uuid":"247702232","full_name":"q-m/food-fish-parser-ruby","owner":"q-m","description":"Extract fish details from food product descriptions","archived":false,"fork":false,"pushed_at":"2021-03-26T09:10:46.000Z","size":387,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-03T05:16:49.904Z","etag":null,"topics":["aquaculture","fao","fisheries","food-products","ingredients","parser","ruby","ruby-gem","species","structured-data"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/q-m.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-16T12:56:15.000Z","updated_at":"2021-03-26T09:10:49.000Z","dependencies_parsed_at":"2022-09-09T06:51:41.261Z","dependency_job_id":null,"html_url":"https://github.com/q-m/food-fish-parser-ruby","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-fish-parser-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-fish-parser-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-fish-parser-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/q-m%2Ffood-fish-parser-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/q-m","download_url":"https://codel
oad.github.com/q-m/food-fish-parser-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240114326,"owners_count":19749837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aquaculture","fao","fisheries","food-products","ingredients","parser","ruby","ruby-gem","species","structured-data"],"created_at":"2024-11-09T00:12:14.760Z","updated_at":"2026-05-12T20:30:16.267Z","avatar_url":"https://github.com/q-m.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Food fish parser\n\n[![Gem Version](https://badge.fury.io/rb/food_fish_parser.svg)](https://rubygems.org/gems/food_fish_parser)\n\nFood products with fish in them often list some details about the particular species,\nfishing method and origin. This [Ruby](https://www.ruby-lang.org/) gem and program parses\nthe text found on the product and returns a structured representation.\n\nAt this moment, the parser mostly recognises Dutch-language text.\n\nPlease note that this code is in an early stage of development.\n\n## Installation\n\n```\ngem install food_fish_parser\n```\n\n## Example\n\n### Strict parser\n\n```ruby\nrequire 'food_fish_parser'\n\ns = \u003c\u003cEOT.gsub(/\\n/, '').strip\n  zalm (salmo salar), gekweekt in noorwegen, kweekmethode: kooien.pangasius\n  (pangasius spp), gekweekt in vietnam,  kweekmethode: vijver. 
coquilles\n  (placopecten magellanicus), vangstgebied noordwest atlantische oceaan fao 21,\n  kabeljauw (gadus macrocephalus), vangstgebied stille oceaan fao 67, garnaal\n  (litopenaeus vannamei), gekweekt in ecuador, kweekmethode: vijver.\nEOT\nparser = FoodFishParser::Strict::Parser.new\nputs parser.parse(s).to_a.inspect\n```\n\nResults in a list of detected fishes\n\n```ruby\n[\n  {\n    :names =\u003e               [{ :common=\u003e\"zalm\", :latin=\u003e\"salmo salar\" }],\n    :catch_areas =\u003e         [],\n    :catch_methods =\u003e       [],\n    :aquaculture_areas =\u003e   [{ :text=\u003e\"noorwegen\", :fao_codes=\u003e[] }],\n    :aquaculture_methods =\u003e [{ :text=\u003e\"kooien\" }]\n  },\n  {\n    :names =\u003e               [{ :common=\u003e\"pangasius\", :latin=\u003e\"pangasius spp\" }],\n    :catch_areas =\u003e         [],\n    :catch_methods =\u003e       [],\n    :aquaculture_areas =\u003e   [{ :text=\u003e\"vietnam\", :fao_codes=\u003e[] }],\n    :aquaculture_methods =\u003e [{ :text=\u003e\"vijver\" }]\n  },\n  {\n    :names =\u003e               [{ :common=\u003e\"coquilles\", :latin=\u003e\"placopecten magellanicus\" }],\n    :catch_areas =\u003e         [{ :text=\u003e\"noordwest atlantische oceaan\", :fao_codes=\u003e[\"21\"] }],\n    :catch_methods =\u003e       [],\n    :aquaculture_areas =\u003e   [],\n    :aquaculture_methods =\u003e []\n  },\n  {\n    :names =\u003e               [{ :common=\u003e\"kabeljauw\", :latin=\u003e\"gadus macrocephalus\" }],\n    :catch_areas =\u003e         [{ :text=\u003e\"stille oceaan\", :fao_codes=\u003e[\"67\"] }],\n    :catch_methods =\u003e       [],\n    :aquaculture_areas =\u003e   [],\n    :aquaculture_methods =\u003e []\n  },\n  {\n    :names =\u003e               [{ :common=\u003e\"garnaal\", :latin=\u003e\"litopenaeus vannamei\" }],\n    :catch_areas =\u003e         [],\n    :catch_methods =\u003e       [],\n    :aquaculture_areas =\u003e   [{ :text=\u003e\"ecuador\", :fao_codes=\u003e[] 
}],\n    :aquaculture_methods =\u003e [{ :text=\u003e\"vijver\" }]\n  }\n]\n```\n\n### Anywhere\n\nWhen you have a piece of text and don't know where (or if) any fish details are\npresent, you can use the `anywhere` option.\n\n```ruby\nrequire 'food_fish_parser'\n\nparser = FoodFishParser::Strict::Parser.new\ns = \"tomaat, vis (zalm (salmo salar) gevangen in Noorwegen), zout\"\nputs parser.parse(s, anywhere: true).to_a.inspect\n```\n\nThis will find as many occurrences as possible. It is assumed that all fish details\nin the text have the same amount of information (so fish name plus catch or aquaculture\ninformation, or only fish names, or only catch or aquaculture information).\nWhile the parser would normally return nothing, with `anywhere` it returns:\n\n```ruby\n[\n  {\n    :names               =\u003e [{ :common=\u003e\"zalm\", :latin=\u003e\"salmo salar\" }],\n    :catch_areas         =\u003e [{ :text=\u003e\"Noorwegen\", :fao_codes=\u003e[] }],\n    :catch_methods       =\u003e [],\n    :aquaculture_areas   =\u003e [],\n    :aquaculture_methods =\u003e []\n  }\n]\n```\n\nPlease note that the `anywhere` option can make the parser much slower.\n\n### Flat parser\n\nWhile the strict parser can recognize the structure of multiple fishes, it is really\nstrict about what it expects. Many cases are not recognized, or are only partially recognized.\n\nThe flat parser does basic named entity recognition anywhere in the text. Any structure\nis lost, so it always returns an array with one or zero items - but you get all the\nFAO regions and fish names found.\n\n```ruby\nrequire 'food_fish_parser'\n\nparser = FoodFishParser::Flat::Parser.new\ns = \"Foobar zalm (salmo salar) *\u0026! 
gevangen met lijnen pangasius spp FAO 61 ?or ?FAO 67 what more.\"\nputs parser.parse(s).to_a.inspect\n```\n\n```ruby\n[\n  {\n    :names =\u003e [\n      { :common=\u003e\"zalm\", :latin=\u003e\"salmo salar\" },\n      { :common=\u003enil, :latin=\u003e\"pangasius spp\" }\n    ],\n    :catch_areas =\u003e [\n      { :name=\u003enil, :fao_codes=\u003e[\"61\"] },\n      { :name=\u003enil, :fao_codes=\u003e[\"67\"] }\n    ],\n    :catch_methods       =\u003e [{ :text=\u003e\"lijnen\" }],\n    :aquaculture_areas   =\u003e [],\n    :aquaculture_methods =\u003e []\n  }\n]\n```\n\nThis might be expanded to more information at some point.\n\n\n## Test tool\n\nThe executable `food_fish_parser` is available after installing the gem. If you're\nrunning from the source tree, use `bin/food_fish_parser` instead.\n\n```\n$ food_fish_parser -h\nUsage: bin/food_fish_parser [options] --file|-f \u003cfilename\u003e\n       bin/food_fish_parser [options] --string|-s \u003ctext\u003e\n\n    -f, --file FILE                  Parse all lines of the file as fish detail text.\n    -s, --string TEXT                Parse specified fish detail text.\n    -q, --[no-]quiet                 Only show summary.\n    -p, --parsed                     Only show lines that were successfully parsed.\n    -n, --noresult                   Only show lines that had no result.\n    -r, --parser PARSER              Use specific parser (strict, flat).\n    -a, --[no-]anywhere              Search for fish details anywhere in the text.\n    -e, --[no-]escape                Escape newlines\n    -c, --[no-]color                 Use color\n    -v, --[no-]verbose               Show more data (parsed tree).\n        --version                    Show program version.\n    -h, --help                       Show this help\n\n$ food_fish_parser -v -s \"salmo salar\"\n\"salmo salar\"\nSyntaxNode+Root6+RootNode+SyntaxNodeAdditions offset=0, \"salmo salar\" (to_a,to_a_deep):\n  SyntaxNode+Root3 offset=0, \"salmo salar\" 
(fish_only_names):\n    SyntaxNode+FishNode+SyntaxNodeAdditions+FishNameList1 offset=0, \"salmo salar\" (to_h,to_a_deep,fish_name):\n      SyntaxNode+FishNameNode+SyntaxNodeAdditions+FishNameLatin1+FishNameLatinNode offset=0, \"salmo salar\" (to_h,to_a_deep,fish_name_latin_first):\n        SyntaxNode offset=0, \"salmo\"\n        SyntaxNode+FishNameLatin0 offset=5, \" salar\" (fish_name_latin_second):\n          SyntaxNode offset=5, \" \":\n            SyntaxNode offset=5, \" \"\n          SyntaxNode offset=6, \"salar\"\n      SyntaxNode offset=11, \"\"\n    SyntaxNode offset=11, \"\"\n  SyntaxNode offset=11, \"\"\n  SyntaxNode offset=11, \"\"\n  SyntaxNode offset=11, \"\"\n[\n  {\n    :names=\u003e[{:common=\u003enil, :latin=\u003e\"salmo salar\"}],\n    :catch_areas=\u003e[],\n    :catch_methods=\u003e[],\n    :aquaculture_areas=\u003e[],\n    :aquaculture_methods=\u003e[]\n  }\n]\n\n$ food_fish_parser -q -f data/test-cases\nparsed 51 (100.0%), no result 0 (0.0%)\n```\n\nIf you want to use the output in (shell)scripts, the options `-e -c` may be quite useful.\n\n\n## Test data\n\n[`data/fish-ingredient-samples-qm-nl`](data/fish-ingredient-samples-qm-nl) contains about 2k\nreal-world ingredient lists with fish found on the Dutch market. Each line contains one ingredient\nlist (newlines are encoded as `\\n`, empty lines and those starting with `#` are ignored). Of those,\nsomething is returned for 99.8% of them (with the `anywhere` option), but quality varies greatly.\n\n\n## Species\n\nThis gem does very basic named entity recognition of fish names. There are more fish names than the\nparser can handle, so the detected fish names are limited to those actually found in packaged food products.\nAt the moment only a very limited number of names is detected. To add more, expand the _species-found_ text\nfiles in [species/](species/) and run `species/species-treetop-gen.sh`. 
This updates the fish name grammars.\n\n\n## License\n\nThis software is distributed under the [MIT license](LICENSE). Data may have a [different license](data/README.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fq-m%2Ffood-fish-parser-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fq-m%2Ffood-fish-parser-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fq-m%2Ffood-fish-parser-ruby/lists"}