# DiffESM

Diffusion model software to emulate Earth System Models (ESMs) for daily temperature and precipitation. The software can generate new daily precipitation or temperature data for previously unseen scenarios, with many potential applications (e.g., estimating or characterizing extreme weather phenomena such as heat waves or dry spells under hypothetical future climate scenarios).

## Setup Instructions

1. We use Weights & Biases (wandb) for logging. To use it, you will need to create an account: [https://wandb.ai/site](https://wandb.ai/site)

2. To set up your environment using Conda, follow these steps:

   a. First, ensure you have Conda installed on your system. If not, download and install it from [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/products/individual).

   b. Clone the repository to your local machine:
      ```bash
      git clone https://github.com/JGCRI/diffesm.git
      cd diffesm
      ```

   c. Create a Conda environment using the `environment.yml` file provided in the repository:
      ```bash
      conda env create -f environment.yml
      ```

   d. Activate the newly created environment:
      ```bash
      conda activate diffesm
      ```

   e. With the environment active, you can proceed with the rest of the setup.

## Preparing the Data
Before training the diffusion model, the data must be preprocessed into the format the training script expects. Follow these steps to preprocess and organize your data:

### Step 1: Consolidate Dataset
Collect all data files into a single directory. Ensure each file is in NetCDF (`.nc`) format.

### Step 2: Create Dataset Description
Create a JSON file that describes your dataset's structure and its variables. This file should outline at least three realizations for each of the training, validation, and testing sets.

The example below lists precipitation (`pr`) and temperature (`tas`) files for each realization, under different scenarios and time frames. `...` marks entries omitted for brevity, and `r2` and `r3` follow the same structure as `r1`. Standard JSON does not allow comments, so keep explanatory notes out of the file itself.

Example JSON structure:
```json
{
    "load_dir": "/path/to/data_directory/",
    "realizations": {
        "r1": {
            "pr": ["file1_1850_1950.nc", "file2_1950_2100.nc", ...],
            "tas": ["file3_1850_2006.nc", "file4_2006_2100.nc", ...]
        },
        "r2": {...},
        "r3": {...}
    }
}
```

### Step 3: Save JSON File
Store the JSON file using the following directory layout:
`/{path_to_directory}/{ESM_name}/{scenario_name}/data.json`

### Step 4: Update Configuration Paths
Modify `configs/paths/default.yaml` (or an alternative configuration file in the `configs/paths/` directory) to include:

- **json_data_dir**: The leading path to the JSON files.
- **data_dir**: The path to the directory where processed data will be stored.

### Step 5: Run Preprocessing Script
In `configs/prepare_data.yaml`, specify:
- The Earth System Model (e.g., IPSL, CESM)
- The scenario (e.g., rcp85, rcp45)
- The dataset's start and end years
- The number of chunks to split the data into

Finally, run `make prepare_data` to start processing. Note: this may take up to an hour for large datasets, but it only needs to be run once.

## Training the Model

Once your data is ready, follow these steps to train the diffusion model, which learns to approximate the Earth System Model (ESM):

### Configuration Setup
The `configs` directory contains all configuration files needed for training. `configs/train.yaml` selects the default configuration for each of the following options:

- **Model Architecture:** Located in `configs/model/`. These files define the structure of the diffusion model.
- **Scheduler:** Located in `configs/scheduler/`. These files select the diffusion scheduler, which defines the noising and denoising process.
- **Dataset Configuration:** Located in `configs/data/`. These files specify which ESM and variables to use for your dataset.
- **Training Hyperparameters:** Located in `configs/trainer/`. These files set the batch size, learning rate, and other training parameters.

### Hyperparameter Customization
Adjust the hyperparameters in the configuration files to suit your training requirements.

### Training Script Configuration
Set the number of GPUs for training in the `scripts/train.sh` script.

### Start Training
Once all configurations are set, start training by running:
```bash
make train
```

This trains the model using the configurations you specified.

## Evaluation

Evaluation generates 20 years of daily data and compares it against the ESM to assess the model's performance. This is done in two main steps:

### Step 1: Generating Validation and Test Sets
- **Configure Data Generation:** Use the `generate.yaml` file to specify the data to generate: the ESM, the scenario, the start and end years, and whether to produce the validation or the test set.
- **Initial Generation:** First, generate the validation and test sets from the original Earth System Model (ESM) data. This step does not use the trained model; it produces the baseline datasets that the model's output is compared against.

### Step 2: Generating Data with the Trained Model
- **Run Model Generation:** After creating the baseline datasets, run the same generation process again, this time using your trained model, to produce the emulated data.
- **Saving Generated Data:** The output is saved automatically to the directory specified as `save_dir` in the paths configuration file.

### Executing the Generation Script
Both steps are run with the same command:
```bash
make generate
```

**Additional Notes:**

- **Process Configuration:** The number of processes used during generation is set in the `scripts/gen_sample.sh` script.
- **Time Consideration:** Depending on your hardware, generation may take several minutes.

## Visualization
Finally, you are ready to visualize your results! Most of the visualization is done in the `notebooks/data-viz.ipynb` notebook, which is configured by `configs/data_viz.yaml`. Currently, the notebook only supports visualizations for temperature and precipitation, although other variables may be added in the future.
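The dataset description from Step 2 of *Preparing the Data* can be sanity-checked before running `make prepare_data`. Below is a minimal sketch that is not part of DiffESM: the helpers `data_json_path` and `check_dataset_description` are hypothetical, and the checks (a `load_dir` that exists, at least three realizations, `.nc` filenames that exist under `load_dir`) are assumptions drawn from the steps above rather than the preprocessing script's actual validation.

```python
import json
from pathlib import Path

# Hypothetical helpers for illustration; DiffESM's preprocessing may validate differently.
MIN_REALIZATIONS = 3  # Step 2 asks for at least three realizations


def data_json_path(root: str, esm: str, scenario: str) -> Path:
    """Build the Step 3 layout: {root}/{ESM_name}/{scenario_name}/data.json."""
    return Path(root) / esm / scenario / "data.json"


def check_dataset_description(json_path: Path) -> list[str]:
    """Return the problems found in a dataset description (an empty list if none).

    Raises KeyError if the "load_dir" or "realizations" keys are missing.
    """
    spec = json.loads(Path(json_path).read_text())
    load_dir = Path(spec["load_dir"])
    realizations = spec["realizations"]
    problems = []

    if not load_dir.is_dir():
        problems.append(f"load_dir is not a directory: {load_dir}")
    if len(realizations) < MIN_REALIZATIONS:
        problems.append(
            f"expected at least {MIN_REALIZATIONS} realizations, found {len(realizations)}"
        )

    # Every listed file should be a NetCDF file that exists in load_dir (Step 1).
    for name, variables in realizations.items():
        for var, files in variables.items():
            for fname in files:
                if not fname.endswith(".nc"):
                    problems.append(f"{name}/{var}: not a .nc file: {fname}")
                elif not (load_dir / fname).is_file():
                    problems.append(f"{name}/{var}: missing file: {fname}")
    return problems
```

For example, `check_dataset_description(data_json_path("/path/to/json_root", "IPSL", "rcp85"))` returns an empty list when nothing looks wrong; the root directory, ESM, and scenario here are placeholders.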
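The scheduler selected under the scheduler configuration defines the noising and denoising process used in training. To illustrate what the noising half means, here is a minimal sketch assuming a standard DDPM-style scheduler with a linear beta schedule (1,000 steps, beta from 1e-4 to 0.02). These values, the function names, and the toy array shape are assumptions; the scheduler DiffESM actually uses may differ. Because `alpha_bar` is a cumulative product, a noisy sample at any step `t` can be drawn directly from the clean field `x0` without iterating through the earlier steps.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t, increasing linearly (an assumed schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Sample x_t ~ N(sqrt(alpha_bar[t]) * x0, (1 - alpha_bar[t]) * I) in a single step.

    Returns the noisy field and the Gaussian noise that was added.
    """
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


betas = linear_beta_schedule(1000)
# alpha_t = 1 - beta_t; alpha_bar_t is the running product, so it shrinks toward 0 as t grows.
alpha_bar = np.cumprod(1.0 - betas)

# Usage on a toy field (hypothetical shape: 8 days x 16 x 16 grid, already normalized).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 16, 16))
xt, noise = add_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

In a DDPM-style setup, training typically picks a random `t` per example, noises the field with a function like `add_noise`, and trains the network to predict `noise`; generation then runs the learned denoising steps in reverse from pure noise.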
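The introduction mentions characterizing dry spells, and the evaluation compares 20 years of daily data from the ESM baseline and the trained model. As one example of a statistic you might compute on both, here is a minimal sketch of the longest dry spell per grid cell. It is not taken from `notebooks/data-viz.ipynb`; the 1 mm/day dry-day threshold, the `(time, ...)` array layout, and the helper names are assumptions. The unit conversion relies on 1 kg m-2 of water being a 1 mm layer, so a flux in kg m-2 s-1 times 86,400 s/day gives mm/day.

```python
import numpy as np

DRY_THRESHOLD_MM = 1.0  # assumption: a "dry day" has less than 1 mm of precipitation
SECONDS_PER_DAY = 86_400


def to_mm_per_day(pr_kg_m2_s):
    """Convert a precipitation flux in kg m-2 s-1 to mm/day."""
    return np.asarray(pr_kg_m2_s) * SECONDS_PER_DAY


def longest_dry_spell(pr_mm_day, threshold=DRY_THRESHOLD_MM):
    """Longest run of consecutive days with pr < threshold, per grid cell.

    pr_mm_day: array shaped (time, ...) of daily precipitation in mm/day.
    Returns an integer array with the trailing (spatial) shape.
    """
    dry = np.asarray(pr_mm_day) < threshold
    current = np.zeros(dry.shape[1:], dtype=int)  # length of the ongoing dry run
    longest = np.zeros(dry.shape[1:], dtype=int)  # best run seen so far
    for day in dry:  # iterate over the time axis
        current = np.where(day, current + 1, 0)  # extend the run on dry days, reset on wet days
        longest = np.maximum(longest, current)
    return longest
```

Applying `longest_dry_spell` to the baseline and the generated data for the same 20-year window gives two maps that can be compared cell by cell or as distributions.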