{"id":22410979,"url":"https://github.com/e3sm-project/hiccup","last_synced_at":"2025-10-25T20:02:48.830Z","repository":{"id":45501913,"uuid":"233124100","full_name":"E3SM-Project/HICCUP","owner":"E3SM-Project","description":"Hindcast Initial Condition Creation Utility/Processor","archived":false,"fork":false,"pushed_at":"2024-11-05T19:00:08.000Z","size":214157,"stargazers_count":10,"open_issues_count":0,"forks_count":3,"subscribers_count":134,"default_branch":"main","last_synced_at":"2024-11-05T20:19:25.217Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/E3SM-Project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-10T20:20:33.000Z","updated_at":"2024-11-05T19:00:13.000Z","dependencies_parsed_at":"2024-08-23T21:27:54.738Z","dependency_job_id":"bccf6628-1803-4fe1-b37f-0325d60b58ca","html_url":"https://github.com/E3SM-Project/HICCUP","commit_stats":{"total_commits":317,"total_committers":4,"mean_commits":79.25,"dds":0.03785488958990535,"last_synced_commit":"3a7a2d592b1373596b02aa9372c1f2851cfae7e5"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/E3SM-Project%2FHICCUP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/E3SM-Project%2FHICCUP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/E3SM-Project%2FHICCUP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/E3SM-Project%2FHICCUP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/E3SM-Project","download_url":"https://codeload.github.com/E3SM-Project/HICCUP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228304099,"owners_count":17898920,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-05T13:12:37.071Z","updated_at":"2025-10-10T03:13:43.333Z","avatar_url":"https://github.com/E3SM-Project.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## HINDCAST INITIAL CONDITION CREATION UTILITY/PROCESSOR (HICCUP)\n\nThis is a tool for creating [E3SM](https://e3sm.org/) initial condition files from reanalysis with \na focus on modularity and portability.\n\nThe tool is used by selecting one of the template scripts, such as:\n\n  `template_scripts/create_initial_condition_from_obs.py`\n\nMake a copy of this in the `user_scripts` directory, which will be ignored by git. 
This template\nscript copy then needs to be edited to update paths and configure the list of tasks to fit the\nuser's needs. The new user script can then be executed after loading a suitable conda environment\n(see Setup Notes).\n\n--------------------------------------------------------------------------------\n\n### TABLE OF CONTENTS\n  - [Setup Notes](#setup-notes)\n  - [Obtaining Input Data](#obtaining-input-data)\n  - [Vertical Grid Files](#vertical-grid-files)\n  - [SST and Sea Ice Initial Conditions](#sst-and-sea-ice-initial-conditions)\n  - [Land Model Initial Conditions](#land-model-initial-conditions)\n  - [Generating HICCUP Initial Conditions](#generating-hiccup-initial-conditions)\n  - [Running a Hindcast](#running-a-hindcast)\n  - [Plotting Initial Condition Data](#plotting-initial-condition-data)\n  - [Hindcast Analysis and Validation](#hindcast-analysis-and-validation)\n  - [Testing](#testing)\n  - [Development Plans](#development-plans)\n\n--------------------------------------------------------------------------------\n\n### Setup Notes\n\nDependencies:\n  * [NCO](http://nco.sourceforge.net/)\n  * [TempestRemap](https://github.com/ClimateGlobalChange/tempestremap)\n  * Python modules:\n    * [xarray](http://xarray.pydata.org/en/stable/) - *primary data manipulation tool*\n    * [pandas](https://pandas.pydata.org/) - *helpful for handling the time coordinate*\n    * [netcdf4](https://unidata.github.io/netcdf4-python/) - *needed for writing netCDF4 files with xarray*\n    * [hdf5](https://www.h5py.org/) - *needed for the netCDF4 format, which is important for fine grids like ne1024*\n    * [scipy](https://www.scipy.org/) - *needed to fill missing SST data around the poles*\n    * [cdsapi](https://pypi.org/project/cdsapi/) - *for obtaining ECMWF data*\n    * [ftplib](https://docs.python.org/3/library/ftplib.html) - *for obtaining NOAA SST/ice data*\n\nIt is convenient to create a conda env that includes all these 
dependencies:\n  ```\n  conda create --name hiccup_env -c conda-forge xarray dask pandas numpy scipy netcdf4 hdf5 cdsapi tempest-remap \"nco\u003e=5.3.1\" \n  ```\n\nAfter creating the environment, it can be activated via:\n\n  `source activate hiccup_env`\n\nAlternatively, you can use the E3SM Unified environment.\n\nOn Perlmutter:\n\n`source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh`\n\nOn Chrysalis:\n\n`source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh`\n\nYou can then install HICCUP into your Python environment by running:\n```\n   pip install ./\n```\nwhich will allow you to import `hiccup` from any directory.\n\nNote that if you are making changes to the underlying HICCUP code, the `pip install ./` step needs to be repeated for the changes to take effect.\n\nTempestRemap and NCO may already be available if you are working on a machine at a supercomputing center. They can also be installed manually, but we recommend including them in the HICCUP conda environment to avoid conflicts.\n\nThe default paths for grid files, mapping files, and output data are set to local directories. However, when working on a machine at a supercomputing center, like NERSC or OLCF, it is useful to avoid filling up your home directory with this data, especially for high-resolution output. We recommend creating a folder on scratch space and using it to set the file path variables when calling `create_hiccup_data()`.\n\n--------------------------------------------------------------------------------\n\n### Obtaining Input Data\n\nCurrently, 
ERA5 + NOAA SST/ice is the preferred input data option.\nTo acquire new ERA5 data, be sure `cdsapi` is in your conda environment\nand you've set up your CDS API key in `~/.cdsapirc`.\n\nYou can then use the `get_hindcast_data.ERA5.py` tool to obtain a single pair of \nERA5 pressure level and surface data files with:\n\n  `python get_hindcast_data.ERA5.py --start-date=\u003cyyyymmdd\u003e --output-root=\u003cpath\u003e`\n\nAlternatively, you can obtain ERA5 files over a range of dates with a specified\nhourly frequency with:\n\n  `python get_hindcast_data.ERA5.py --start-date=\u003cyyyymmdd\u003e --final-date=\u003cyyyymmdd\u003e --start-hour=\u003chh\u003e --final-hour=\u003chh\u003e --data-freq=3h --output-root=\u003cpath\u003e`\n\nNote that while the `--output-root` argument is optional, we recommend pointing it \nto a location on a scratch disk with sufficient space for large data files.\n\nSimilarly, 0.25-degree NOAA OI daily SST and sea ice data can be obtained in\nyearly files by using the `get_hindcast_data.NOAA_SSTICE.py` tool with command\nline arguments to specify a year or a range of years as follows:\n\n  `python get_hindcast_data.NOAA_SSTICE.py --start-year=\u003cyyyy\u003e --final-year=\u003cyyyy\u003e --output-root=\u003cpath\u003e`\n\nFor a single year, omit the `--final-year` argument.\n\n--------------------------------------------------------------------------------\n\n### Vertical Grid Files\n\nThe current E3SM vertical grid was created through an iterative process \ninvolving numerous subjective decisions, mainly by Phil Rasch and Po-Lun Ma, \nthat were never documented, so there is no recipe to \nrecreate the grid from scratch. 
\n\nA vertical grid file for the L80 grid used by the E3SMv3 atmosphere is included in the HICCUP repository:\n  \n  `files_vert/L80_for_E3SMv3.nc`\n\nThe repository also includes vertical grid files for other E3SM atmosphere configurations.\n\nA new vertical coordinate file can be extracted from a \npre-existing model data file as follows:\n\n  1. Dump the vertical grid data into a text file using ncdump:\n     \n     `ncdump -v P0,hyam,hybm,hyai,hybi,lev,ilev \u003chistory_file\u003e \u003e vert_coord.txt`\n\n  2. Manually edit the file to remove extra header info,\n     but keep the general CDL format created by ncdump.\n\n  3. Generate a new netCDF file from the edited text file using ncgen:\n     \n     `ncgen -o vert_coord.nc vert_coord.txt`\n\nVertical grid information can also be constructed procedurally. Future HICCUP updates are planned to include template scripts for modifying existing grids and generating new vertical grids from scratch.\n\n--------------------------------------------------------------------------------\n\n### SST and Sea Ice Initial Conditions\n\nHICCUP can generate a file with SST and sea ice data that matches the format E3SM expects when running a hindcast with prescribed ocean/ice conditions. NOAA OI data is currently the only supported option for this, although HICCUP could be extended to use ERA5 SST and sea ice data. \n\nSeveral options are implemented in the `sstice_slice_and_remap()` routine to control how the time coordinate of this data is handled:\n```\ntime_slice_method='match_atmos'   match the time coordinate with the date of the atmospheric initial condition\ntime_slice_method='initial'       use the first time index of the SST and sea ice data\ntime_slice_method='use_all'       remap all times provided for transient SSTs\n```\n\nFor more information on the difference between these approaches, see the wiki page [Fixed vs. 
Transient SST](https://github.com/E3SM-Project/HICCUP/wiki/Fixed-vs.-Transient-SST).\n\nThe first two methods will yield a simulation with SST and sea ice conditions that are \"fixed\" at the time of initialization, while the third option provides a simple way to produce a simulation with transient SST and sea ice conditions. \n\nWe plan to implement other methods in the future for handling the time dimension of SST data \nto be more flexible for high-res runs, such as specifying a window of \nSST/ice data to remap and include in the output file. \n\n--------------------------------------------------------------------------------\n\n### Land Model Initial Conditions\n\nHICCUP does not currently support the generation of land model initial condition\nfiles. This might be possible with the data available from ERA5, but the current\nrecommendation is to spin up the land model for 5-10 years leading up to the \ndesired initialization date using the standalone land model forced by the data\natmosphere component. \n\n--------------------------------------------------------------------------------\n\n### Generating HICCUP Initial Conditions\n\nAfter the input data is acquired, HICCUP can be used to generate initial conditions \nby editing and running the `template_scripts/create_initial_condition_from_obs.py` \nscript. This script controls the workflow for generating the atmosphere initial \ncondition as well as the SST/sea-ice data file.\n\nThe HICCUP workflow centers on a `hiccup_data` object that carries the information \nneeded for processing the data, along with class methods that perform the processing. \nThere is also a Python dictionary of temporary file names that are used to store \nthe data for each variable during processing. Storing each variable in a separate \nfile may seem odd, but it is necessary for very large datasets, so it \nwas adopted for all cases to avoid supporting multiple workflows. 
Currently, this dict of files\nand the final output file are separate from the `hiccup_data` object, but we are \nconsidering putting them into the `hiccup_data` object to simplify the workflow.\n\nHICCUP is designed to be as modular as possible, but the order in which the input \ndata are processed is very important, especially for the \nregridding and surface adjustment steps. The process must start with the \nhorizontal regridding, which alters the surface topography and requires an \nadjustment of surface temperature and pressure. The variable renaming and \nadjustment of time and date information are also done after the horizontal \nregridding. The vertical regridding is the last step in this process because it \nmust follow the surface adjustment.\n\n--------------------------------------------------------------------------------\n\n### Running a Hindcast\n\nAfter using HICCUP to generate the atmosphere and SST/ice files, an E3SM \nhindcast can be run by following the steps for a typical \"F-compset\" run, \nusing a compset such as FC5AV1C-L. The initialization files need to be copied \nto the scratch space of the machine to ensure they are accessible to the compute \nnodes. \n\nThe atmospheric initial condition file is specified by editing the `user_nl_eam`\nfile found in the case directory to include:\n  \n  `ncdata = '\u003cpath to HICCUP atmosphere initial condition file\u003e'`\n\nThe SST file and start date values also need to be specified by modifying the \n`env_run.xml` file in the case directory. 
The preferred method for doing this is \nto use the `xmlchange` command from the case directory as in the example below:\n\n  ```\n  ./xmlchange SSTICE_DATA_FILENAME=\u003cpath to SST file\u003e\n  ./xmlchange RUN_STARTDATE=2016-08-01\n  ./xmlchange SSTICE_YEAR_ALIGN=2016\n  ./xmlchange SSTICE_YEAR_START=2016\n  ./xmlchange SSTICE_YEAR_END=2017\n  ```\n\nIf using a Python script to run the hindcast, here is a snippet of code that \nmakes the modifications described above:\n\n  ```python\n  ################################################\n  # python code to set up hindcast files\n  ################################################\n  import os\n  # placeholder: path to the E3SM case directory\n  case_directory = '\u003cpath-to-case-directory\u003e'\n  iyr,imn,idy = 2016,8,1\n  init_date = f'{iyr}-{imn:02d}-{idy:02d}'\n  init_file_atm = f'\u003cpath-to-hiccup-data\u003e/HICCUP.atm_era5.{init_date}.ne30np4.L72.nc'\n  init_file_sst = f'\u003cpath-to-hiccup-data\u003e/HICCUP.sst_noaa.{init_date}.nc'\n  # xmlchange must be run from within the case directory\n  os.chdir(case_directory)\n  os.system(f'./xmlchange RUN_STARTDATE={init_date}')\n  os.system(f'./xmlchange SSTICE_DATA_FILENAME={init_file_sst}')\n  os.system(f'./xmlchange SSTICE_YEAR_ALIGN={iyr}')\n  os.system(f'./xmlchange SSTICE_YEAR_START={iyr}')\n  os.system(f'./xmlchange SSTICE_YEAR_END={iyr+1}')\n\n  # append the atmosphere initial condition path to the EAM namelist\n  with open('user_nl_eam','a') as nl_file:\n    nl_file.write(f' ncdata = \\'{init_file_atm}\\'\\n')\n  ################################################\n  ################################################\n  ```\n\n--------------------------------------------------------------------------------\n\n### Plotting Initial Condition Data\n\nA plotting script is also included (`plot.sanity_check.py`), but it requires \n[PyNGL](https://www.pyngl.ucar.edu/) to be installed in the Python environment.\nPyNGL was chosen because it has excellent support for plotting data on \nunstructured grids. 
However, PyNGL has been put into \"maintenance mode\", so in \nthe future these scripts will need to be migrated to [Matplotlib](https://matplotlib.org/) \nor [GeoCAT](https://geocat.ucar.edu/).\n\n--------------------------------------------------------------------------------\n\n### Hindcast Analysis and Validation\n\nAnalyzing the hindcast output data is currently left to the user, although \nwe may include some simple skill/error metrics in the future. In the meantime, we have \nincluded a few simple scripts for obtaining and remapping ERA5 validation data:\n\n  `get_validation_data.ERA5.py`\n\n  `remap.validation_data.ERA5.py`\n\n\nThese scripts are configured to obtain a set of atmospheric fields on common \npressure levels, like U200 and Z500, that are typically used for calculating \nforecast skill. The remap script is configured to put the data on a relatively \ncoarse 2-degree grid in order to simplify the calculation of global metrics. \n\n--------------------------------------------------------------------------------\n\n### Testing\n\nFor simple testing of HICCUP functionality, the repo includes low-resolution test data from ERA5 and NOAA in the `test_data` folder. These files are used by the `test_scripts/test.*` scripts to exercise the typical HICCUP workflow for generating model input data from observations and reanalysis. There are also remapping scripts that can be used to regenerate the low-res test data. A unit test script, `unit_test.state_adjustment.py`, is also provided to directly test the surface adjustment routines, and more unit tests may be added in the future. \n\n--------------------------------------------------------------------------------\n\n### Development Plans\n\nBelow are some ideas for future HICCUP enhancements:\n\n- **Fix EAMxx/SCREAM support** - The current workflow is tailored for EAM, and the user needs to use the conversion script in the `SCREAM_utils` folder to rename and reshape the data in the initial condition file. 
This step could be eliminated entirely by adding special logic to the EAMxx HICCUP data class to do this renaming and reshaping. \n- **Detect whether input files are packed** - The ERA5 data from CDS come \"packed\" and the NCO unpacking command takes quite a while, even for small files that are already unpacked. The current workflow requires the user to know the state of the input files, so a method for automatically checking whether unpacking needs to be done would be very helpful. This would simplify the workflow because we could remove the unpacking flag and the corresponding line, and instead have the unpacking done when the `hiccup_data` object is created.\n- **Add simple run script templates for hindcasts and land spin-up** - I have many scripts for these tasks that are much simpler than the standard (monolithic) E3SM run script for production coupled runs, but they are specific to individual machines and file systems. A more general, simplified script would be helpful. \n- **Validation data framework?** - Currently there are a few Python scripts for obtaining ERA5 validation data, but this could be improved by developing a separate, polished workflow, perhaps including an automated calculation of forecast error.\n- **Fix support for using ERA5 SST and sea ice** - I forget what the issue was here, but this has been requested a few times and I think it would be a valuable feature to have.\n- **Add support for CFSR / GFS / MERRA / JRA55** - This has proven difficult due to the ways these datasets are organized. ERA5 offers a lot of flexibility to facilitate an automated workflow, but other datasets have a single format that must be accommodated. 
For example, if files are only offered as one variable per file with multiple time steps, then a user who needs a single initial condition file at 00Z will have to download orders of magnitude more data than they need. The HICCUP back-end would also require special exceptions for how to load each dataset and how the input arguments are structured, along with many more specialized checks to ensure the data are self-consistent, which seems error-prone. \n\n\n--------------------------------------------------------------------------------\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fe3sm-project%2Fhiccup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fe3sm-project%2Fhiccup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fe3sm-project%2Fhiccup/lists"}