{"id":15673210,"url":"https://github.com/rugk/crops-parser","last_synced_at":"2026-03-11T05:30:57.665Z","repository":{"id":22345779,"uuid":"95978588","full_name":"rugk/crops-parser","owner":"rugk","description":"🌱🍎🍆 A shell script to parse the data by the Food and Agriculture Organization of the United Nations on crops/fruits.","archived":false,"fork":false,"pushed_at":"2022-03-07T21:57:14.000Z","size":52488,"stargazers_count":15,"open_issues_count":5,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-19T11:16:13.606Z","etag":null,"topics":["agriculture","agriculture-research","crop","crops","data-analysis","data-science","food","fruit","fruits","statistics","streetcomplete","tree","vegetables"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rugk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-data.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-01T17:32:14.000Z","updated_at":"2022-12-05T15:19:04.000Z","dependencies_parsed_at":"2022-08-07T10:15:31.667Z","dependency_job_id":null,"html_url":"https://github.com/rugk/crops-parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rugk/crops-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rugk%2Fcrops-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rugk%2Fcrops-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rugk%2Fcrops-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rugk%2Fcrops-parser
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rugk","download_url":"https://codeload.github.com/rugk/crops-parser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rugk%2Fcrops-parser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30372161,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T21:41:54.280Z","status":"online","status_checked_at":"2026-03-11T02:00:07.027Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agriculture","agriculture-research","crop","crops","data-analysis","data-science","food","fruit","fruits","statistics","streetcomplete","tree","vegetables"],"created_at":"2024-10-03T15:38:30.463Z","updated_at":"2026-03-11T05:30:57.631Z","avatar_url":"https://github.com/rugk.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crops parser\n\nThis shell script parses data from the [Food and Agriculture Organization of the United Nations](https://www.fao.org/faostat/en/#data/QCL) about the cultivated/planted plants/fruits on the world into a YAML file, which groups them per country to see the top 15.\n\nIt has been created for the OpenStreetMap mapping app [StreetComplete](https://github.com/westnordost/StreetComplete), see [this issue](https://github.com/westnordost/StreetComplete/issues/368) for details.\n\n## How to download 
data?\n\nGo to the [FAO website](https://www.fao.org/faostat/en/#data/QCL) and download the FAO data. Things to remember:\n1. Select all countries and **make sure to select the FAO** coding system.\n2. Either select the area harvested (in ha) or the production quantity (in tonnes) to get useful results.\n3. Select all crops in the items list. (The new FAO website merged crops [C] and livestock [L].)\n4. Save the data.\n\n![screenshot of the FAO website export with important things to select highlighted as explained above](./fao-website-guide.png)\n\n## How to run it?\n\nThe script is mostly POSIX-compliant, so it should work on most systems, but the CLI tool [csvtool](https://github.com/Chris00/ocaml-csv) has to be installed, as it is used as the CSV parser.\n\nOnce it is installed, you can just execute the script:\n```shell\n$ ./parseCrops.sh source/area_harvested_2019+2020.csv result/OsmOnly/mostAreaHarvest_2019+2020.yml\nPrepare CSV…\nAdjusting datasets…\nSum up duplicate elements…\nSummed up 289 duplicates.\nCalculate yearly average…\nSort data…\nEvaluate data…\nWARNING: No language code for China could be found. Skip.\nFinish processing…\n```\n\nThe language code warning for China is to be expected, see [the contributing guide for details](./CONTRIBUTING.md).\n\n## What does it do?\n\nThis is an overview of what happens:\n* `Prepare CSV…` – It strips the table header and extracts the columns of interest.\n* `Adjusting datasets…` – It adjusts each dataset, e.g. it strips commas for easier processing, applies the blacklist and (optionally) converts the crop names to OSM keys.\n* `Sum up duplicate elements…` – It finds exact duplicates (considering the year too) and sums them up. Afterwards, it reports how many duplicates were summed up. 
(Usually, items should only be summed up when converting to OSM tags.)\n* `Calculate yearly average…` – It calculates the average tonnes/area in production when multiple years are given.\n* `Sort data…` – It sorts the whole dataset by the tonnes of produced crops, independent of the country.\n* `Evaluate data…` – It extracts all crops for each country and transforms the first fifteen crops listed into the YAML format. Additionally, it replaces the country name with the 2-letter country code (ISO 3166).\n* `Finish processing…` – It adds the header and default crops and sorts the YAML once more, so that the countries are sorted.\n\n## Result\n\nThe results can be seen in the directory [result](result). Both legacy and more up-to-date data are included.\n\nThe script can handle data from multiple years. After summing up equal items per year (and country), it calculates the average of the production numbers across those years.\n\n## Extras\n\nAdditionally, there is a collection of square images of all \"OSM fruits\" that are included in the top 15. You can find it in the directory [`images`](images/).\n\n## Legal stuff\n\nThe data taken from the FAO is [licensed under the terms they describe](https://www.fao.org/contact-us/terms/db-terms-of-use/en/), i.e. CC BY-NC-SA 3.0 IGO. This is described [in detail in this document](LICENSE-data.md).\n\n![This work is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO license (CC BY-NC-SA 3.0 IGO; https://creativecommons.org/licenses/by-nc-sa/3.0/igo). 
In addition to this license, some database specific terms of use are listed in the Terms of Use of Datasets.](https://www.fao.org/faostat/en/src/images/creative_commons.png)\n\nApart from that, all code is licensed [under the MIT license](LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frugk%2Fcrops-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frugk%2Fcrops-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frugk%2Fcrops-parser/lists"}