{"id":13546975,"url":"https://github.com/FrontierDevelopmentLab/CUMULO","last_synced_at":"2025-04-02T19:32:08.990Z","repository":{"id":73747123,"uuid":"218525794","full_name":"FrontierDevelopmentLab/CUMULO","owner":"FrontierDevelopmentLab","description":"a benchmark dataset for training and evaluating global cloud classification models.","archived":false,"fork":false,"pushed_at":"2023-07-06T21:27:11.000Z","size":1175,"stargazers_count":29,"open_issues_count":1,"forks_count":12,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-03T15:38:20.903Z","etag":null,"topics":["atmospheric-physics","climate-change","esa","machine-learning","nasa-cloudsat","nasa-modis"],"latest_commit_sha":null,"homepage":"https://www.dropbox.com/sh/i3s9q2v2jjyk2it/AACxXnXfMF5wuIqLXqH4NJOra?dl=0","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FrontierDevelopmentLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-10-30T12:46:58.000Z","updated_at":"2024-08-27T16:22:41.000Z","dependencies_parsed_at":"2023-07-09T18:02:24.287Z","dependency_job_id":null,"html_url":"https://github.com/FrontierDevelopmentLab/CUMULO","commit_stats":{"total_commits":268,"total_committers":5,"mean_commits":53.6,"dds":0.5634328358208955,"last_synced_commit":"3373a389dab0f47f0771a8f095328fd8174a3ae0"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2FCUMULO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2FCUMULO/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2FCUMULO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2FCUMULO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FrontierDevelopmentLab","download_url":"https://codeload.github.com/FrontierDevelopmentLab/CUMULO/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246880140,"owners_count":20848819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atmospheric-physics","climate-change","esa","machine-learning","nasa-cloudsat","nasa-modis"],"created_at":"2024-08-01T12:00:49.063Z","updated_at":"2025-04-02T19:32:08.395Z","avatar_url":"https://github.com/FrontierDevelopmentLab.png","language":"Jupyter Notebook","funding_links":[],"categories":["Atmosphere"],"sub_categories":[],"readme":"\u003cimg src=\"https://github.com/FrontierDevelopmentLab/CUMULO/blob/master/docs/images/cumulo.png\" width=\"300\"\u003e\n\na benchmark dataset for training and evaluating global cloud classification models. 
\nIt merges two satellite products from the [A-train constellation](https://atrain.nasa.gov/): \nthe [Moderate Resolution Imaging Spectroradiometer (MODIS) from the Aqua satellite](https://modis.gsfc.nasa.gov/about/) and the [2B-CLDCLASS-LIDAR product](http://www.cloudsat.cira.colostate.edu/data-products/level-2b/2b-cldclass-lidar) derived from the combination of CloudSat Cloud Profiling Radar (CPR) and CALIPSO Cloud‐Aerosol Lidar with Orthogonal Polarization (CALIOP).\n\n[FULL README](https://www.dropbox.com/sh/6gca7f0mb3b0ikz/AAAeTWF21WGZ7-y9MpSiL9P3a/CUMULO?dl=0\u0026preview=README.pdf\u0026subfolder_nav_tracking=1)\n\n# Dataset\n\nThe dataset is hosted [here](https://www.dropbox.com/sh/6gca7f0mb3b0ikz/AADq2lk4u7k961Qa31FwIDEpa?dl=0).\nIt contains over 300k annotated multispectral images at 1 km x 1 km resolution, providing daily full coverage of the Earth for 2008, 2009 and 2016.\n\n## Download\n\n#### Option 1: syncing with your Dropbox Account\n1. add [CUMULO](https://www.dropbox.com/sh/i3s9q2v2jjyk2it/AACxXnXfMF5wuIqLXqH4NJOra?dl=0) to your Dropbox account\n2. use [rclone](https://rclone.org/dropbox/) to sync it to your machine\n\n#### Option 2: direct download -- DEPRECATED!\n1. use one of these download [scripts](https://www.dropbox.com/sh/i3s9q2v2jjyk2it/AADpo6AhMk7OgkB2yWHkM_E2a/download-scripts_2008?dl=0\u0026subfolder_nav_tracking=1)\n\n### File Format\n\nData is stored in **Network Common Data Form (NetCDF)** following this [convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html).\n\nThere is 1 NetCDF file per swath of 1354x2030 pixels, 1 every 5 minutes, named:\n\n```\nfilename = AYYYYDDD.HHMM.nc\n\nYYYY =\u003e year\nDDD =\u003e day of year, counted from 01.01.YYYY \nHH =\u003e hour of day\nMM =\u003e minute of hour    \n```\n\n### File Content\n\nTo see the variables available in a NetCDF file and their descriptions, run: \n\n```bash\nncdump -h netcdf/cumulo.nc\n```\n\n## Code Source\n\n1. 
The script [pipeline.py](pipeline.py) extracts one CUMULO swath (as a NetCDF file) from the corresponding MODIS MYD02, MYD03, MYD06 and MYD35 files, and the CloudSat CS_2B-CLDCLASS and/or CS_2B-CLDCLASS-LIDAR files.\n\n```bash\npython3 pipeline.py \u003csave-dir\u003e \u003cmyd02-filename\u003e\n```\n\n2. [src/](src/) contains the source code for extracting the different CUMULO features, for aligning them and for filling in missing values when possible.\n\n### Dependencies\n\n```bash\npip install gcsfs\nconda install -c conda-forge pyhdf  # The pip wheels are broken at the time of writing\npip install satpy\npip install satpy[modis_l1b]\npip install -r requirements.txt\n```\n\n## Machine Learning Baselines\nExamples for training models on CUMULO are provided [here](ml-examples/).\n\n## Cite\nIf you find this work useful, please cite the [original paper](https://arxiv.org/abs/1911.04227):\n\n```\n@article{zantedeschi2019cumulo,\n        title={Cumulo: A Dataset for Learning Cloud Classes},\n        author={Zantedeschi, Valentina and Falasca, Fabrizio and Douglas, Alyson and Strange, Richard and Kusner, Matt J and Watson-Parris, Duncan},\n        journal={Tackling Climate Change with Machine Learning Workshop, NeurIPS},\n        year={2019}}\n```\n\n## Acknowledgments\n\nThis work is the result of the 2019 ESA [Frontier Development Lab](https://fdleurope.org/) Atmospheric Phenomena and Climate Variability challenge. \nWe are grateful to all organisers, mentors and sponsors for providing us with this opportunity. We thank Google Cloud for providing computing and storage resources to complete this work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFrontierDevelopmentLab%2FCUMULO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFrontierDevelopmentLab%2FCUMULO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFrontierDevelopmentLab%2FCUMULO/lists"}