{"id":21318811,"url":"https://github.com/sydney-informatics-hub/agrefed-ml","last_synced_at":"2025-07-12T03:31:10.095Z","repository":{"id":157021129,"uuid":"424401741","full_name":"Sydney-Informatics-Hub/AgReFed-ML","owner":"Sydney-Informatics-Hub","description":"Machine learning tools for modelling and predicting agriculture systems and their uncertainties.","archived":false,"fork":false,"pushed_at":"2024-05-06T23:22:37.000Z","size":49692,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-05-07T06:25:42.285Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sydney-Informatics-Hub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/Contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-03T22:41:57.000Z","updated_at":"2024-05-06T23:22:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"07614207-b4f2-4746-9dc4-e412422c50b6","html_url":"https://github.com/Sydney-Informatics-Hub/AgReFed-ML","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sydney-Informatics-Hub%2FAgReFed-ML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sydney-Informatics-Hub%2FAgReFed-ML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sydney-Informatics-Hub%2FAgReFed-ML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/Sydney-Informatics-Hub%2FAgReFed-ML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sydney-Informatics-Hub","download_url":"https://codeload.github.com/Sydney-Informatics-Hub/AgReFed-ML/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225789424,"owners_count":17524430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-21T19:23:25.310Z","updated_at":"2024-11-21T19:23:25.909Z","avatar_url":"https://github.com/Sydney-Informatics-Hub.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AgReFed-ML\nData-driven Machine Learning for modelling and predicting agriculture systems and their uncertainties.\n\n\u003c!-- Badges  start --\u003e\n\n[![License](https://img.shields.io/badge/License-LGPL3-blue)](#license)\n[![DOI](https://zenodo.org/badge/424401741.svg)](https://zenodo.org/badge/latestdoi/424401741)\n\n\u003c!-- Badges end --\u003e\n\n## Content\n\n- [Introduction](#introduction)\n- [Method](#method)\n- [Functionality](#functionality)\n- [Installation](#installation)\n- [Use Case Scenarios](#use-case-scenarios)\n- [Contributions](#contributions)\n- [Attribution and Acknowledgments](#attribution-and-acknowledgments)\n- [License](#license)\n- [References](#references)\n\n\n## Introduction\n\nMachine learning (ML) models have emerged as a powerful approach for building agriculture soil models, allowing researchers to analyze large and complex spatiotemporal datasets to make predictions about soil 
properties and processes. The outputs of these models, such as spatiotemporal predictions, are used for a wide range of applications (e.g., soil, yield, crops, carbon cycle). These models need to take into account multiple data sources, such as ground and subsurface soil measurements, satellite imagery, climate data, and other remote data sources. Fusing these data sources is challenging, as it requires interoperable probabilistic data models and uncertainty quantification. The AgReFed-ML project contributes software tools that provide reproducible machine learning workflows for agricultural researchers, with a focus on mapping soil properties from sparse and uncertain input data. These data-driven models are not limited to soil modeling; they can be applied to a wide range of environmental applications.\n\n## Method\n\nThis model uses Gaussian Process (GP) regression with a complex base function. It is particularly well suited to agricultural applications because it captures the underlying patterns and trends in soil data as well as the inherent uncertainties associated with soil properties. Using such a probabilistic mixture model, we can generate more accurate and reliable predictions of soil properties, which can inform decision-making and optimize crop management. 
More details on the probabilistic model and feature selection can be found in the method description [Probabilistic Machine Learning for Modeling Environmental Systems and their Uncertainties](docs/Method.md).\n\nEach workflow consists of the following main steps:\n\n0) data preprocessing (the included sample data are already preprocessed)\n1) feature analysis and selection\n2) model training, optimization, evaluation, and model selection\n3) generating geo-referenced prediction and uncertainty maps\n\n\n## Functionality\n\nThe main functions supported by the workflow scripts are:\n\n- automatic feature importance analysis and ranking using a multi-model approach\n- multiple machine learning models for soil properties under sparse and uncertain input:\n    - static 3D properties\n    - change model and temporal covariances\n    - spatial-temporal model\n- multi-model testing and automatic cross-validation on subsets of training and test data\n- visualisation of prediction maps of soil properties and uncertainties\n- support for importing/saving settings using YAML settings files for reproducible workflows\n- support for different spatial prediction types (e.g., points, blocks, polygons)\n- support for measurement errors in the observations (if provided as part of the input data)\n- generator function for a range of synthetic data for testing\n\n\u003cfigure\u003e\n    \u003cimg src=\"figures/feature_importance.jpg\" alt=\"Feature Importance\"\u003e\n    \u003cfigcaption\u003eExample plot of feature importance scores for multiple models.\u003c/figcaption\u003e\n\u003c/figure\u003e \n\nThe modelling approach includes the following features:\n\n- accommodate the spatial(-temporal) support of the observations\n- accommodate the spatial(-temporal) autocorrelation of the observations\n- accommodate measurement error of the observations\n- incorporate numerous variables as predictors (covariates)\n- prediction of heteroscedastic uncertainty 
estimates\n\nFor a complete overview of all functions, please refer to the [API reference documentation](https://sydney-informatics-hub.github.io/AgReFed-ML/python_scripts/index.html).\n\n\n## Installation\n\n### Local Installation\n\n1) Download or clone the GitHub repository\n2) Unzip `samples.zip` in the folder `notebooks`, which creates the folder `notebooks/samples` with all sample data files\n3) Set up the AgReFed environment with conda/mamba:\n    - if conda is not installed yet, please install it first (e.g., via conda-forge Miniforge: [https://github.com/conda-forge/miniforge](https://github.com/conda-forge/miniforge))\n    - run the following commands in your terminal, as shown here for conda (if you use a different environment manager, please adjust accordingly):\n        ```bash\n        conda env create --file env_agrefed_combined.yaml\n\n        conda activate agrefed\n\n        cd notebooks\n        ```\n4) Open the notebooks (see [Use Case Scenarios](#use-case-scenarios) below). Notebooks can be run, for example, in JupyterLab, within VS Code (using the Jupyter or Quarto extension), or via `jupyter notebook`.\n\nThe environment file `env_agrefed_combined.yaml` includes all dependencies for this AgReFed Machine Learning project plus all dependencies for the AgReFed Harvester project, so both projects can be run in the same environment.\n\n### AgReFed Nectar Cloud Environment\n\nAs a playground for testing the AgReFed-ML notebooks, we provide a pre-installed cloud Python JupyterLab environment that does not require any local installation. This Jupyter environment is hosted on the ARDC Nectar Research Cloud in partnership with AgReFed and the Australian Research Data Commons (ARDC). 
Note that this sandbox is currently hosted for test purposes only and generated data are not permanently stored.\n\nTo log in to this platform, please follow these steps:\n- log in to the [AgReFed Nectar Cloud](https://jupyterhub.rc.nectar.org.au/hub/login?next=%2Fhub%2F).\n- select `AgReFed Python environment` as the Server Option\n- open a new Jupyter notebook and run the following code to clone the AgReFed-ML repo to your cloud home directory and unzip the sample data:\n    ```python\n    # clone AgReFed-ML repo\n    !git clone https://github.com/Sydney-Informatics-Hub/AgReFed-ML\n    # unzip sample data\n    import zipfile\n    with zipfile.ZipFile(\"./AgReFed-ML/notebooks/samples.zip\", 'r') as zip_file:\n        zip_file.extractall(\"./AgReFed-ML/notebooks/\")\n    ```\n- open an AgReFed-ML notebook in the `notebooks` folder to get started\n\nThe AgReFed cloud environment is pre-installed with all dependencies for this AgReFed-ML project plus all dependencies for the AgReFed Harvester project, so both projects can be run in the same environment. If any additional packages are required, please contact us. Alternatively, additional packages can be installed via `!pip install` in a new notebook cell.\n\n\n## Use Case Scenarios\n\nThis project aims to demonstrate ML workflows for three use-case scenarios as example applications for agricultural research. Each scenario is described by a reproducible workflow that includes feature engineering, model selection and validation, and prediction mapping/cubing. The workflows are implemented in Jupyter notebooks and can be run locally or in the AgReFed Nectar cloud environment. The notebooks are configured using YAML settings files, which can be used to adapt the workflow to different use cases. For instructions on running the notebooks, see the [Notebooks Guide](notebooks/README.md).\n\n### A) Static Soil Model\n\nThe static model is a spatial model for generating prediction maps of soil properties at a single point in time. 
The outputs are geo-referenced prediction and uncertainty maps (2D) at multiple soil depths. The soil model accounts for spatial and depth correlations via a joint 3D GP kernel with two lengthscale hyperparameters (spatial and depth).\nAs an example use case, a spatial probabilistic model is trained and predictions are produced for multiple soil properties across a farm area (see figure below). \n\n\u003cfigure\u003e\n    \u003cimg src=\"figures/Map_data.jpg\" alt=\"Data Map\"\u003e\n    \u003cfigcaption\u003eMap of probe locations for the included sample data.\u003c/figcaption\u003e\n\u003c/figure\u003e \n\n\n### B) Change Model for Carbon Accounting Mapping\n\nThis workflow generates prediction and uncertainty maps for the change in soil properties over a given time period. The goal of this example use case is to model the change in the Organic Carbon (OC) stock volume for a farm. A particular focus is modelling the uncertainty of the change, which must take into account the spatial and temporal covariances of the predictions, because the variance of a difference between two correlated predictions includes a covariance term: `Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B)`. \n\n\u003cfigure\u003e\n    \u003cimg src=\"figures/prediction_change.png\" alt=\"Change Prediction\"\u003e\n    \u003cfigcaption\u003eChange prediction for Organic Carbon\u003c/figcaption\u003e\n\u003c/figure\u003e \n\n\n### C) Spatial-Temporal Model\n\nThis workflow generates soil moisture prediction maps (for the topsoil layer) and their uncertainties for multiple time intervals. The model is trained on daily and weekly averaged soil moisture probe data from sample sites for 2020-2022, together with multiple spatiotemporally varying covariates. 
\n\n\u003cfigure\u003e\n    \u003cimg src=\"figures/prediction_st.jpg\" alt=\"Spatial Temporal Prediction\"\u003e\n    \u003cfigcaption\u003eSpatial-temporal predictions and uncertainty for Organic Carbon at different dates.\u003c/figcaption\u003e\n\u003c/figure\u003e \n\n\n## Contributions\nWe welcome any contribution to this project, from feedback and bug reports via GitHub Issues and use-case examples via notebook contributions, to improvements of the source code and new data examples.\n\nFor more details on how to contribute to the development, please see the [AgReFed-ML contribution guidelines](docs/Contributing.md).\n\n\n## Attribution and Acknowledgments\n\nThis software was developed by the Sydney Informatics Hub, a core research facility of the University of Sydney, as part of the project `Mechanistic and data-driven models under uncertainty for agricultural systems` for the Agricultural Research Federation (AgReFed).\n\nAcknowledgments are an important way for us to demonstrate the value we bring to your research. 
Your research outcomes are vital for ongoing funding of the Sydney Informatics Hub.\n\nIf you make use of this software for your research project, please include the following acknowledgment:\n\n\"This research was supported by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney, and the Agricultural Research Federation (AgReFed).\"\n\nAgReFed is supported by the Australian Research Data Commons (ARDC) and the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).\n\nTo reference this software, please use the latest Zenodo DOI at the top or the following BibTeX entry:\n```bibtex\n@software{seb_haan_2023_7939459,\n  author       = {Sebastian Haan},\n  title        = {Sydney-Informatics-Hub/AgReFed-ML: v0.2.0},\n  month        = may,\n  year         = 2023,\n  publisher    = {Zenodo},\n  version      = {v0.2.0},\n}\n```\n\n## License\n\nCopyright 2023 The University of Sydney\n\nThis is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL version 3) as published by the Free Software Foundation.\n\nThis program is distributed in the hope that it will be useful, but\nWITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.\n\nYou should have received a copy of the GNU Lesser General Public License\nalong with this program (see LICENSE). 
If not, see\n\u003chttps://www.gnu.org/licenses/\u003e.\n\n## References\n\n- [API references](https://sydney-informatics-hub.github.io/AgReFed-ML/python_scripts/index.html)\n\n- [AgReFed Homepage](https://agrefed.org.au/)\n\n- [AgReFed Geodata-Harvester Overview](https://sydney-informatics-hub.github.io/geodata-harvester/py_dataharvester.html)\n\n- [AgReFed Geodata-Harvester Python GitHub](https://github.com/Sydney-Informatics-Hub/geodata-harvester)\n\n- [Method documentation](docs/Method.pdf)\n\n- [Feature Importance and Selection](https://pypi.org/project/selectio/)\n\n## Authors\n\n- [Sebastian Haan](https://github.com/sebhaan)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsydney-informatics-hub%2Fagrefed-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsydney-informatics-hub%2Fagrefed-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsydney-informatics-hub%2Fagrefed-ml/lists"}