{"id":13448962,"url":"https://github.com/pangeo-data/WeatherBench","last_synced_at":"2025-03-22T18:32:16.463Z","repository":{"id":38779995,"uuid":"209004244","full_name":"pangeo-data/WeatherBench","owner":"pangeo-data","description":"A benchmark dataset for data-driven weather forecasting","archived":false,"fork":false,"pushed_at":"2023-12-08T09:54:39.000Z","size":18203,"stargazers_count":708,"open_issues_count":17,"forks_count":165,"subscribers_count":32,"default_branch":"master","last_synced_at":"2024-10-28T15:42:18.626Z","etag":null,"topics":["benchmark","dataset","deep-learning","weather-forecast"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pangeo-data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-09-17T08:49:33.000Z","updated_at":"2024-10-25T11:39:08.000Z","dependencies_parsed_at":"2023-12-13T21:12:41.518Z","dependency_job_id":null,"html_url":"https://github.com/pangeo-data/WeatherBench","commit_stats":{"total_commits":91,"total_committers":6,"mean_commits":"15.166666666666666","dds":0.08791208791208793,"last_synced_commit":"c8f53a9b243453fef3edecf5fdc1150b6b0f8f32"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangeo-data%2FWeatherBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangeo-data%2FWeatherBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangeo-data%2FWeatherBench/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/pangeo-data%2FWeatherBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pangeo-data","download_url":"https://codeload.github.com/pangeo-data/WeatherBench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245002941,"owners_count":20545519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","dataset","deep-learning","weather-forecast"],"created_at":"2024-07-31T06:00:26.411Z","updated_at":"2025-03-22T18:32:11.422Z","avatar_url":"https://github.com/pangeo-data.png","language":"Jupyter Notebook","funding_links":[],"categories":["Weather","Datasets","PFMs for Video Data","🔬 Domain-Specific Applications","Domain-Specific Resources"],"sub_categories":["🌊 Computational Fluid Dynamics, PDE \u0026 Engineering Datasets","✨Benchmarks","🌍 Earth \u0026 Climate Science","Climate \u0026 Environmental Science"],"readme":"![Logo](https://github.com/ai4environment/WeatherBench/blob/master/figures/logo_text_left.png?raw=true)\n# WeatherBench: A benchmark dataset for data-driven weather forecasting\n\n**🚨🚨🚨 [WeatherBench 2](https://github.com/google-research/weatherbench2) has been released. It provides an updated and much-improved benchmark, including more comprehensive and more easily accessible datasets. 🚨🚨🚨**\n\n[![Binder](https://binder.pangeo.io/badge_logo.svg)](https://binder.pangeo.io/v2/gh/pangeo-data/WeatherBench/master?filepath=quickstart.ipynb)\n\nIf you use this dataset, please cite:\n\u003e Stephan Rasp, Peter D. 
Dueben, Sebastian Scher, Jonathan A. Weyn, Soukayna Mouatadid, and Nils Thuerey, 2020.\n\u003e WeatherBench: A benchmark dataset for data-driven weather forecasting.\n\u003e arXiv: [https://arxiv.org/abs/2002.00469](https://arxiv.org/abs/2002.00469)\n\nThis repository contains all the code for downloading and processing the data, as well as the code for the baseline models in the paper.\n\n---\n*Note! The data has changed since the original release. Here is a list of changes:*\n- *New vertical levels. The levels used to be [1, 10, 100, 200, 300, 400, 500, 600, 700, 850, 1000] and are now [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000], for compatibility with CMIP output. The new levels include all of the old ones except 1 and 10.*\n- *CMIP data. Regridded CMIP data for some variables was added, taken from the historical simulation of the MPI-ESM-HR model.*\n\n---\n\nIf you have any questions about this dataset, please use the [GitHub Issues](https://github.com/pangeo-data/WeatherBench/issues) feature on this page!\n\n## Leaderboard\n| Model | Z500 RMSE (3 / 5 days) [m\u003csup\u003e2\u003c/sup\u003e/s\u003csup\u003e2\u003c/sup\u003e] | T850 RMSE (3 / 5 days) [K] | Notes | Reference |\n|--------------------|----------------------------------|----------------------------|----------------------|------------------|\n| Operational IFS | 154 / 334 | 1.36 / 2.03 | ECMWF physical model (10 km) | [Rasp et al. 2020](https://arxiv.org/abs/2002.00469) |\n| Rasp and Thuerey 2020 (direct/continuous) | **268 / 499** | **1.65 / 2.41** | ResNet with CMIP pretraining (5.625 deg) | [Rasp and Thuerey 2020](http://arxiv.org/abs/2008.08626) |\n| IFS T63 | 268 / 463 | 1.85 / 2.52 | Lower-resolution physical model (approx. 1.9 deg) | [Rasp et al. 2020](https://arxiv.org/abs/2002.00469) |\n| Weyn et al. 2020 (iterative) | **373 / 611** | **1.98 / 2.87** | UNet with cubed-sphere mapping (2 deg) | [Weyn et al. 
2020](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2020MS002109) |\n| Clare et al. 2021 (direct) | 375 / 627 | 2.11 / 2.91 | Stacked ResNets with probabilistic output (5.625 deg) | [Clare et al. 2021](https://rmets.onlinelibrary.wiley.com/doi/full/10.1002/qj.4180) |\n| IFS T42 | 489 / 743 | 3.09 / 3.83 | Lower-resolution physical model (approx. 2.8 deg) | [Rasp et al. 2020](https://arxiv.org/abs/2002.00469) |\n| Weekly climatology | 816 | 3.50 | Climatology for each calendar week | [Rasp et al. 2020](https://arxiv.org/abs/2002.00469) |\n| Persistence | 936 / 1033 | 4.23 / 4.56 |  | [Rasp et al. 2020](https://arxiv.org/abs/2002.00469) |\n| Climatology | 1075 | 5.51 |  | [Rasp et al. 2020](https://arxiv.org/abs/2002.00469) |\n\n\n## Quick start\nYou can follow the quickstart guide in [this notebook](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb) or launch it directly from [Binder](https://binder.pangeo.io/v2/gh/pangeo-data/WeatherBench/master?filepath=quickstart.ipynb).\n\n## Download the data\nThe data is hosted [here](https://mediatum.ub.tum.de/1524895) with the following directory structure:\n\n```\n.\n|-- 1.40625deg\n|   |-- 10m_u_component_of_wind\n|   |-- 10m_v_component_of_wind\n|   |-- 2m_temperature\n|   |-- constants\n|   |-- geopotential\n|   |-- old\n|   |   `-- temperature\n|   |-- potential_vorticity\n|   |-- relative_humidity\n|   |-- specific_humidity\n|   |-- temperature\n|   |-- toa_incident_solar_radiation\n|   |-- total_cloud_cover\n|   |-- total_precipitation\n|   |-- u_component_of_wind\n|   |-- v_component_of_wind\n|   `-- vorticity\n|-- 2.8125deg\n|   |-- 10m_u_component_of_wind\n|   |-- 10m_v_component_of_wind\n|   |-- 2m_temperature\n|   |-- constants\n|   |-- geopotential\n|   |-- potential_vorticity\n|   |-- relative_humidity\n|   |-- specific_humidity\n|   |-- temperature\n|   |-- toa_incident_solar_radiation\n|   |-- total_cloud_cover\n|   |-- total_precipitation\n|   |-- u_component_of_wind\n|   |-- 
v_component_of_wind\n|   `-- vorticity\n|-- 5.625deg\n|   |-- 10m_u_component_of_wind\n|   |-- 10m_v_component_of_wind\n|   |-- 2m_temperature\n|   |-- constants\n|   |-- geopotential\n|   |-- geopotential_500\n|   |-- potential_vorticity\n|   |-- relative_humidity\n|   |-- specific_humidity\n|   |-- temperature\n|   |-- temperature_850\n|   |-- toa_incident_solar_radiation\n|   |-- total_cloud_cover\n|   |-- total_precipitation\n|   |-- u_component_of_wind\n|   |-- v_component_of_wind\n|   `-- vorticity\n|-- baselines\n|   `-- saved_models\n|-- CMIP\n|   `-- MPI-ESM\n|       |-- 2.8125deg\n|       |   |-- geopotential\n|       |   |-- specific_humidity\n|       |   |-- temperature\n|       |   |-- u_component_of_wind\n|       |   `-- v_component_of_wind\n|       `-- 5.625deg\n|           |-- geopotential\n|           |-- specific_humidity\n|           |-- temperature\n|           |-- u_component_of_wind\n|           `-- v_component_of_wind\n|-- IFS_T42\n|   `-- raw\n|-- IFS_T63\n|   `-- raw\n`-- tigge\n    |-- 1.40625deg\n    |   |-- geopotential_500\n    |   `-- temperature_850\n    |-- 2.8125deg\n    |   |-- geopotential_500\n    |   `-- temperature_850\n    `-- 5.625deg\n        |-- 2m_temperature\n        |-- geopotential_500\n        |-- temperature_850\n        `-- total_precipitation\n```\n\nTo get started, download either the entire 5.625-degree dataset (175 GB) using\n```shell\nwget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg\u0026files=all_5.625deg.zip" -O all_5.625deg.zip\n```\nor just the single-level (500 hPa) geopotential data using\n```shell\nwget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg%2Fgeopotential_500\u0026files=geopotential_500_5.625deg.zip" -O geopotential_500_5.625deg.zip\n```\nand then unzip the files using `unzip \u003cfile\u003e.zip`. You can also use `ftp` or `rsync` to download the data. 
For instructions, follow the [download link](https://mediatum.ub.tum.de/1524895).\n\n\n## Baselines and evaluation\n**IMPORTANT:** The predictions file should be a NetCDF dataset with dimensions `[init_time, lead_time, lat, lon]`; consult the notebooks for examples. You are strongly encouraged to format your predictions in the same way and to use the same evaluation functions, to ensure consistent evaluation.\n\n### Baselines\nThe baselines are created using the Jupyter notebooks in `notebooks/`. In all notebooks, the forecasts are saved as a NetCDF file in the `predictions` directory of the dataset.\n\n### CNN baselines\nAn example of how to load the data and train a CNN using Keras is given in `notebooks/3-cnn-example.ipynb`. In addition, a command-line script for training CNNs is provided in `src/train_nn.py`. The config files for the baseline CNNs in the paper are given in `src/nn_configs/`. To reproduce the results in the paper, run, e.g., `python -m src.train_nn -c src/nn_configs/fccnn_3d.yml`.\n\n### Evaluation\nEvaluation and comparison of the different baselines is done in `notebooks/4-evaluation.ipynb`. The scoring is done using the functions in `src/score.py`. The RMSE values for the baseline models are also saved in the `predictions` directory of the dataset, which is useful for plotting your own models alongside the baselines.\n\n\n## Data processing\nThe dataset already contains the most important processed data. If you would like to download a different variable, regrid to a different resolution, or extract single levels from the 3D files, here is how to do that!\n\n### Downloading and processing the raw data from the ERA5 archive\n\nThe workflow that produced the processed data in the data repository above is:\n1. Download monthly files from the ERA5 archive (`src/download.py`)\n2. Regrid the raw data to the required resolutions (`src/regrid.py`)\n\nThe raw data is from the ERA5 reanalysis archive. 
Information on how to download the data can be found [here](https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5) and [here](https://cds.climate.copernicus.eu/api-how-to).\n\nBecause downloading the data can take a long time (several weeks), the workflow is encoded using [Snakemake](https://snakemake.readthedocs.io/). See `Snakefile` and the configuration file for each variable in `scripts/config_{variable}.yml`. These files can be modified if additional variables are required. To execute Snakemake for a particular variable, type: `snakemake -p -j 4 all --configfile scripts/config_toa_incident_solar_radiation.yml`.\n\nIn addition to the time-dependent fields, the constant fields were downloaded and processed using `scripts/download_and_regrid_constants.sh`.\n\n### Downloading the TIGGE IFS baseline\n\nTo obtain the operational IFS baseline, we use the [TIGGE Archive](https://confluence.ecmwf.int/display/TIGGE). Downloading the data for Z500 and T850 is done in `scripts/download_tigge.py`; regridding is done in `scripts/convert_and_regrid_tigge.sh`.\n\n### Regridding the lower-resolution IFS baselines (T42 and T63)\n\nThe T42 and T63 baselines were created by Peter Dueben. The raw output can be found in the dataset. The data was regridded using `scripts/convert_and_regrid_IFS_TXX.sh`.\n\n### Downloading and regridding CMIP historical climate model data\n\nTo download historical climate model data, use the Snakemake file in `snakemake_configs_CMIP`. Here, we downloaded data from the `MPI-ESM-HR` model. To download data from other models, search for the download links on the CMIP website and modify the scripts accordingly.\n\n### Extracting single levels from 3D files\n\nIf you would like to extract a single level from 3D data, e.g. 850 hPa temperature, you can use `src/extract_level.py`. This could be useful to reduce the amount of data that needs to be loaded into RAM. 
An example usage would be: `python extract_level.py --input_fns DATADIR/5.625deg/temperature/*.nc --output_dir OUTDIR --level 850`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpangeo-data%2FWeatherBench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpangeo-data%2FWeatherBench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpangeo-data%2FWeatherBench/lists"}