{"id":17332854,"url":"https://github.com/raybellwaves/xskillscore-gpu","last_synced_at":"2025-03-27T06:26:03.825Z","repository":{"id":226969999,"uuid":"770073736","full_name":"raybellwaves/xskillscore-gpu","owner":"raybellwaves","description":null,"archived":false,"fork":false,"pushed_at":"2024-03-15T01:13:14.000Z","size":799,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-01T11:41:24.502Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raybellwaves.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-10T20:53:26.000Z","updated_at":"2024-03-11T02:32:09.000Z","dependencies_parsed_at":"2025-02-01T11:40:21.374Z","dependency_job_id":"9852747b-9d77-4443-8f8e-d79fe589e3d6","html_url":"https://github.com/raybellwaves/xskillscore-gpu","commit_stats":null,"previous_names":["raybellwaves/xskillscore-gpu"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raybellwaves%2Fxskillscore-gpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raybellwaves%2Fxskillscore-gpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raybellwaves%2Fxskillscore-gpu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raybellwaves%2Fxskillscore-gpu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/raybellwaves","download_url":"https://codeload.github.com/raybellwaves/xskillscore-gpu/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245794139,"owners_count":20673129,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T14:58:59.460Z","updated_at":"2025-03-27T06:26:03.806Z","avatar_url":"https://github.com/raybellwaves.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# xskillscore-gpu\n\nWIP: Run [xskillscore](https://github.com/xarray-contrib/xskillscore) on a GPU.\n\n## How to implement\n\n![alt text](xs-gpu-meme.png \"Title\")\n\nIdeally things work with zero code changes up the stack:\n - import numpy as np; gives you speed up for `numpy` operations if you have a GPU. 
Probably with some fallback to CPU in case of memory issues etc.\n - You can currently dispatch in `numpy` if you pass a cupy array to `numpy` functions (`np.absolute(cupy_array_forecasts - cupy_array_observations)`), which is a step towards zero code changes.\n - xarray has [cupy-xarray](https://github.com/xarray-contrib/cupy-xarray) but I would say it needs some TLC.\n\nA simple way to proceed here is custom code which hopefully demonstrates a speed-up using a GPU:\n - for loop using numpy functions (CPU)\n - `xr.apply_ufunc`, which has some acceleration built in and fits my data structure (CPU)\n - for loop using cupy functions (GPU)\n - vectorized and parallel version using cupy and/or numba?\n\n## Problem to solve\n\n[weatherbench2](https://weatherbench2.readthedocs.io/en/latest/) offers data which represents the scale of the problem.\n\nHere I want my data to fit into GPU memory (~20 GB).\n\nFrom taking a peek at the data:\n```\nimport xarray as xr\nxr.open_zarr(\"gs://weatherbench2/datasets/era5/1959-2022-6h-1440x721.zarr\")[\"2m_temperature\"]\n```\n\nWe can generate medium-sized data using an array size of (time: 1000, latitude: 721, longitude: 1440),\nwhich is around 4 GB using float32.\n\n\n### Setup\n\nAWS EC2 machine g5.2xlarge with specs:\n - A10 (Compute Capability 8.6), 24 GB GPU memory\n - 8 virtual CPUs, 32 GB memory\n\nAMI: Deep Learning AMI GPU PyTorch 2.1.0 (Ubuntu 20.04) (20240208; ami-0da80daf69cab6d24)\n\nLog in using\n`ssh -v -i ~/.aws/KEYNAME.pem -L 8000:localhost:8000 -L 8787:localhost:8787 -L 8888:localhost:8888 ubuntu@IPADDRESS`\n\nIn `~/.condarc`, change `channel_priority: strict` to `channel_priority: flexible`.\n\n```\n$ conda init bash\n$ exec bash\n$ git clone https://github.com/raybellwaves/xskillscore-gpu.git\n$ cd xskillscore-gpu\n$ mamba env create -f env.yml\n$ conda activate xskillscore-gpu-dev\n$ jupyter lab\n```\n\n## Why?\n\nGoal of zero code changes if you are an xskillscore user and you have a GPU.\n\n## Background\n\n### What is 
xskillscore?\n\nA generic library to calculate skill scores, mostly for weather/climate forecasts (xarray).\n\n### How does xskillscore work?\n\nxskillscore is mostly a library of ufuncs that\nare passed to [`xarray.apply_ufunc`](https://docs.xarray.dev/en/stable/generated/xarray.apply_ufunc.html).\n\nFunctions are written using `numpy` for `xarray.Dataset` objects and there\nis acceleration built into xarray using `dask` and `numba`.\n\nTraditional calculation of skill scores uses for loops. The example below shows how someone could compute the\nmean absolute error (MAE) for a few weather forecasts to create a 2D map (latitude, longitude) of MAE.\nHere MAE is computed over the time dimension of two arrays (observations and forecasts).\n\n```\nimport numpy as np\n\n# forecasts and observations are assumed to have shape (latitude, longitude, time)\nmae = np.zeros((len(latitudes), len(longitudes)))\nfor i in range(len(latitudes)):\n    for j in range(len(longitudes)):\n        # mean absolute error over time at grid point (i, j)\n        mae[i, j] = np.mean(np.absolute(forecasts[i, j, :] - observations[i, j, :]))\n```\n\nThese for loops are essentially an embarrassingly parallel problem, as each grid point can be computed independently.\n`xarray.apply_ufunc` takes advantage of this.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraybellwaves%2Fxskillscore-gpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraybellwaves%2Fxskillscore-gpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraybellwaves%2Fxskillscore-gpu/lists"}