{"id":17405027,"url":"https://github.com/answerquest/imd-grid-data-work","last_synced_at":"2025-04-15T19:41:10.048Z","repository":{"id":45206797,"uuid":"513411103","full_name":"answerquest/IMD-grid-data-work","owner":"answerquest","description":"Some work on IMD Pune's gridded data sets","archived":false,"fork":false,"pushed_at":"2023-01-15T11:03:02.000Z","size":3232,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T00:22:43.579Z","etag":null,"topics":["gis","gridded-data","india","postgresql","python","rainfall-data","weather-data"],"latest_commit_sha":null,"homepage":"https://server.nikhilvj.co.in/imd_data/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/answerquest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-13T06:46:54.000Z","updated_at":"2024-08-25T17:16:58.000Z","dependencies_parsed_at":"2023-02-09T22:00:54.276Z","dependency_job_id":null,"html_url":"https://github.com/answerquest/IMD-grid-data-work","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/answerquest%2FIMD-grid-data-work","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/answerquest%2FIMD-grid-data-work/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/answerquest%2FIMD-grid-data-work/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/answerquest%2FIMD-grid-data-work/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/answerquest","download_url":"https://codeload.github.com/answerquest/IMD-grid-data-work/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249140384,"owners_count":21219285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gis","gridded-data","india","postgresql","python","rainfall-data","weather-data"],"created_at":"2024-10-16T20:22:37.461Z","updated_at":"2025-04-15T19:41:10.025Z","avatar_url":"https://github.com/answerquest.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IMD-grid-data-work\nSome work on IMD Pune's gridded data sets  \nAuthor: Nikhil VJ, https://nikhilvj.co.in  \n\nSource URL: https://imdpune.gov.in/Clim_Pred_LRF_New/Grided_Data_Download.html | Alternate: https://imdpune.gov.in/lrfindex.php -\u003e See under Gridded data in side menu.  \n\nIntentions of this project: To make this data more accessible for people, to show much simpler code to extract the data than what I've seen online. 
And to get my hands dirty on a large trove of Indian open data :)\n\n\n## Website\n- See https://server.nikhilvj.co.in/imd_data/ -\u003e Has the code in the imd_data_api/ folder deployed.\n- You can select a location and year, and download the data as a simple flat CSV (table) file.\n\n\n## Direct data extract from .GRD files\nNote: All code is in Python  \nInstall the [imdlib](https://pypi.org/project/imdlib/) package  \nAssuming you've downloaded the 2010 data file and saved it as \"2010.grd\" in a \"rain\" folder next to the program:  \n```\nimport imdlib\n\n# Read rain/2010.grd and flatten the gridded data into a table\nrain1 = imdlib.open_data('rain', 2010, 2010, 'yearwise').get_xarray().to_dataframe()\n# Keep only cells with valid rainfall values (drops no-data cells)\nrain2 = rain1[rain1['rain'] \u003e -100].reset_index()\nrain2.to_csv('rain_2010.csv', index=False)\n```\n\nThe functions `.get_xarray()`, `.to_dataframe()` and `.reset_index()` do the job of converting the multi-dimensional dataset into a flat table that can be saved to CSV, Excel, etc.\n\nTip: In a Jupyter Notebook, run just up to `.get_xarray()` and then print the variable directly in a cell. It's beautiful.\n\nA more extended tutorial is on the imdlib author's blog: https://saswatanandi.github.io/softwares/imdlib/\n\n## Database loading program : imd_grid_import\n- A script (2 actually) to fetch the gridded data downloads from IMD, process them and load them into a local dockerized PostgreSQL DB. See the Readme in the imd_grid_import/ folder for more details.\n\n## Database structure explainer\n- Update: Separate temperature data tables made; same structure, but with just temperature data, which is smaller and will be faster to query. \n- Even after removing all junk data, there's a very large number of datapoints per year: around 1.81 million. Granularity: per date and location.\n- Loading each of these into the DB as its own row takes more time, occupies huge space, and even fetching them takes very long.\n- Nature of fetching data: Most likely we'll never be fetching just one date's data (like: 2020-01-14) at a time. 
More likely we'll be fetching for a whole month at a go at least, but for an individual location.\n- So, it makes sense to group the data by : Year + month + Location, and store the grouped data in a JSON column.\n- Sample data in DB for Grid location (28.5,72.5) , Jan 2020:\n```\n{\"2020-01-01\": {\"rain\": 0.0, \"tmax\": 17.580678939819336, \"tmin\": 3.262223720550537},\n \"2020-01-02\": {\"rain\": 0.0, \"tmax\": 20.557445526123047, \"tmin\": 6.12726354598999},\n \"2020-01-03\": {\"rain\": 0.0, \"tmax\": 21.97892951965332, \"tmin\": 7.391693115234375},\n \"2020-01-04\": {\"rain\": 0.0, \"tmax\": 22.463241577148438, \"tmin\": 7.331312656402588},\n \"2020-01-05\": {\"rain\": 0.0, \"tmax\": 22.185802459716797, \"tmin\": 7.799867630004883},\n \"2020-01-06\": {\"rain\": 0.0, \"tmax\": 19.416574478149414, \"tmin\": 10.629579544067383},\n \"2020-01-07\": {\"rain\": 0.0, \"tmax\": 18.216487884521484, \"tmin\": 10.342350959777832},\n \"2020-01-08\": {\"rain\": 0.2756316661834717, \"tmax\": 17.91546058654785, \"tmin\": 9.598325729370117},\n \"2020-01-09\": {\"rain\": 0.0, \"tmax\": 18.36847496032715, \"tmin\": 4.737661838531494},\n \"2020-01-10\": {\"rain\": 0.0, \"tmax\": 19.597763061523438, \"tmin\": 3.922478437423706},\n \"2020-01-11\": {\"rain\": 0.0, \"tmax\": 21.578903198242188, \"tmin\": 6.793002605438232},\n \"2020-01-12\": {\"rain\": 0.0, \"tmax\": 23.62282943725586, \"tmin\": 8.204022407531738},\n \"2020-01-13\": {\"rain\": 7.230362892150879, \"tmax\": 17.89722442626953, \"tmin\": 11.31865119934082},\n \"2020-01-14\": {\"rain\": 0.5124186873435974, \"tmax\": 17.625137329101562, \"tmin\": 5.582608222961426},\n \"2020-01-15\": {\"rain\": 0.0, \"tmax\": 17.577001571655273, \"tmin\": 4.793914794921875},\n \"2020-01-16\": {\"rain\": 0.0, \"tmax\": 16.759170532226562, \"tmin\": 6.7936177253723145},\n \"2020-01-17\": {\"rain\": 0.0, \"tmax\": 19.58401870727539, \"tmin\": 5.236929416656494},\n \"2020-01-18\": {\"rain\": 0.0, \"tmax\": 19.54751205444336, \"tmin\": 
5.679737567901611},\n \"2020-01-19\": {\"rain\": 0.0, \"tmax\": 18.521821975708008, \"tmin\": 5.7684712409973145},\n \"2020-01-20\": {\"rain\": 0.0, \"tmax\": 19.22909164428711, \"tmin\": 6.524430751800537},\n \"2020-01-21\": {\"rain\": 0.0, \"tmax\": 21.767934799194336, \"tmin\": 8.751236915588379},\n \"2020-01-22\": {\"rain\": 0.0, \"tmax\": 21.532318115234375, \"tmin\": 8.174297332763672},\n \"2020-01-23\": {\"rain\": 0.0, \"tmax\": 21.776113510131836, \"tmin\": 7.345406532287598},\n \"2020-01-24\": {\"rain\": 0.0, \"tmax\": 22.189123153686523, \"tmin\": 6.468899250030518},\n \"2020-01-25\": {\"rain\": 0.0, \"tmax\": 24.130014419555664, \"tmin\": 6.97148323059082},\n \"2020-01-26\": {\"rain\": 0.0, \"tmax\": 25.91004180908203, \"tmin\": 7.372500896453857},\n \"2020-01-27\": {\"rain\": 0.0, \"tmax\": 23.614274978637695, \"tmin\": 11.573892593383789},\n \"2020-01-28\": {\"rain\": 7.257974147796631, \"tmax\": 19.39422607421875, \"tmin\": 11.302903175354004},\n \"2020-01-29\": {\"rain\": 0.0, \"tmax\": 21.57762336730957, \"tmin\": 6.99928092956543},\n \"2020-01-30\": {\"rain\": 0.0, \"tmax\": 21.596620559692383, \"tmin\": 7.587302207946777},\n \"2020-01-31\": {\"rain\": 0.0, \"tmax\": 21.153125762939453, \"tmin\": 6.719666004180908}}\n ```\n - With one line, this dict can be turned into a flat pandas dataframe table in python:  \n `df = pd.DataFrame(data).transpose().reset_index().rename(columns={'index':'date'})`\n- Like this, the number of rows in DB for one year reduces from 1.81M to around 60k : reduction to around 3% or by 30x.\n- This results in a lot faster speed in retrieving the data from DB, doing geospatial queries etc.\n- The entire IMD gridded dataset is in DB in 7,247,891 (~7.2M) rows.\n\nNote: tmax and tmin were available at lower grid resolution than rainfall data, so in the DB table imd_data there will be locations that only have rainfall data.\n\n### Update: separate temperature tables added\n`imd_temp_data` and `temp_grid` tables contain data and 
grid locations respectively of just the temperature records. They hold far fewer records than the rain tables, so use these if you only want temperature data.\n\n## Downloaded data checksums\n- See the [sha256_checksum_rain.txt](sha256_checksum_rain.txt), [sha256_checksum_tmax.txt](sha256_checksum_tmax.txt), [sha256_checksum_tmin.txt](sha256_checksum_tmin.txt) files in this repo for the checksums of the .grd data downloaded from the IMD site. \n- These can be used to cross-check data authenticity and to detect whether the data published on the website has changed since July 2022, when I downloaded it.\n- It's good practice for the publishing site to publish these checksums next to their data, to give end users a way to ensure there's been no file corruption or tampering in transit. I recommend that the IMD site do this.\n\n## Sample notebooks\nCheck out the .ipynb Jupyter notebooks (python3 programs) here showing sample code to work with the data in the database once you have it ready.\n\n## Sample viz\nFor location [18.5,74.0 near Pune, India](https://www.openstreetmap.org/#map=11/18.5/74/0), cumulative monthly rainfall from 1901 to 2021:\n\n![rainV_18.5,74.0.png](rainV_18.5,74.0.png)\n\nSee [2022-07-14 rainfal viz 1.ipynb](https://github.com/answerquest/IMD-grid-data-work/blob/main/2022-07-14%20rainfal%20viz%201.ipynb) for the code that made this.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanswerquest%2Fimd-grid-data-work","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanswerquest%2Fimd-grid-data-work","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanswerquest%2Fimd-grid-data-work/lists"}