{"id":13488574,"url":"https://github.com/gabrieltseng/pycrop-yield-prediction","last_synced_at":"2026-01-18T22:07:59.261Z","repository":{"id":41125560,"uuid":"171045076","full_name":"gabrieltseng/pycrop-yield-prediction","owner":"gabrieltseng","description":"A PyTorch Implementation of Jiaxuan You's Deep Gaussian Process for Crop Yield Prediction","archived":false,"fork":false,"pushed_at":"2022-01-17T15:32:56.000Z","size":5711,"stargazers_count":182,"open_issues_count":2,"forks_count":62,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-28T01:46:12.526Z","etag":null,"topics":["agriculture","deep-learning","gaussian-processes","machine-learning","remote-sensing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gabrieltseng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-16T19:50:51.000Z","updated_at":"2025-02-21T05:40:18.000Z","dependencies_parsed_at":"2022-08-29T22:22:40.172Z","dependency_job_id":null,"html_url":"https://github.com/gabrieltseng/pycrop-yield-prediction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gabrieltseng/pycrop-yield-prediction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrieltseng%2Fpycrop-yield-prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrieltseng%2Fpycrop-yield-prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrieltseng%2Fpycrop-yield-prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/gabrieltseng%2Fpycrop-yield-prediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gabrieltseng","download_url":"https://codeload.github.com/gabrieltseng/pycrop-yield-prediction/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrieltseng%2Fpycrop-yield-prediction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28552374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T20:59:07.572Z","status":"ssl_error","status_checked_at":"2026-01-18T20:59:02.799Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agriculture","deep-learning","gaussian-processes","machine-learning","remote-sensing"],"created_at":"2024-07-31T18:01:18.264Z","updated_at":"2026-01-18T22:07:59.245Z","avatar_url":"https://github.com/gabrieltseng.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# PyCrop Yield Prediction\n\nA PyTorch implementation of [Jiaxuan You](https://cs.stanford.edu/~jiaxuan/)'s 2017 Crop Yield Prediction Project.\n\n\u003e [Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data](https://cs.stanford.edu/~ermon/papers/cropyield_AAAI17.pdf)\n\nThis paper won the Food Security Category from the World Bank's \n[2017 
Big Data Innovation Challenge](http://www.worldbank.org/en/news/feature/2017/03/27/and-the-winners-of-the-big-data-innovation-challenge-are).\n\n## Introduction\n\nThis repo contains a PyTorch implementation of the Deep Gaussian Process for Crop Yield Prediction. It draws from the\noriginal [TensorFlow implementation](https://github.com/JiaxuanYou/crop_yield_prediction).\n\nDeep Gaussian Processes combine the expressivity of Deep Neural Networks with Gaussian Processes' ability to leverage\nspatial and temporal correlations between data points.\n\nIn this pipeline, a Deep Gaussian Process is used to predict soybean yields in US counties.\n\n### Results\n\nThese results were generated using early stopping with a patience of 10. They can be replicated by running the pipeline\nwith all the default arguments.\n\n* A comparison of RMSE of the two models, with and without the Gaussian Process. As in the original paper, this was\ngenerated by averaging the results of two runs, to account for random initialization in the neural network:\n\n| Year | LSTM    | LSTM + GP | CNN    | CNN + GP |\n|:----:|:-------:|:---------:|:------:|:--------:|\n|2009  |**5.18** |6.37       |6.07    |5.56      |\n|2010  |7.27     |7.30       |**6.75**|7.03      |\n|2011  |6.82     |6.72       |6.77    |**6.40**  |\n|2012  |7.01     |6.46       |5.91    |**5.72**  |\n|2013  |5.91     |**5.83**   |6.41    |6.00      |\n|2014  |5.99     |**4.65**   |5.28    |4.87      |\n|2015  |6.14     |**5.13**   |6.18    |5.36      |\n\n* A plot of errors of the CNN model for the year 2014, with and without the Gaussian Process. The color represents prediction error, \nin bushels per acre:\n\n\u003cimg src=\"diagrams/2014_cnn_errors.png\" alt=\"CNN errors\" height=\"400px\"/\u003e\n\n\n## Pipeline\n\nThe main entrypoint into the pipeline is [`run.py`](run.py). The pipeline is split into 4 major components. 
Note that\neach component reads files from the previous step, and saves all files that later steps will need, into the \n[`data`](data) folder.\n\nParameters which can be passed in each step are documented in [`run.py`](run.py). The default parameters are all taken\nfrom the original repository.\n\n[Python Fire](https://github.com/google/python-fire) is used to generate command line interfaces.\n\n#### Exporting\n\n```bash\npython run.py export\n```\n\nExports data from the Google Earth Engine to Google Drive. Note that to make the export more efficient, all the bands\nfrom a county - across all the export years - are concatenated, reducing the number of files to be exported.\n\nDownloading the data used in the paper ([MODIS](data/README.md#MODIS) images of the top 11 soybean producing states in the US) requires\njust over **110 GB** of storage. This can be done in steps - the export class allows for checkpointing.\n\n#### Preprocessing\n\n```bash\npython run.py process\n```\n\nTakes the exported and downloaded data, and splits the data by year. In addition, the temperature and reflection `tif` \nfiles are merged, and the mask is applied so only farmland is considered. Files are saved as `.npy` files.\n\nThe size of the processed files is **97 GB**. Running with the flag `delete_when_done=True` will \ndelete the `.tif` files as they get processed.\n\n#### Feature Engineering\n\n```bash\npython run.py engineer\n``` \nTakes the processed `.npy` files and generates histograms which can be input into the models.\n\n#### Model training\n\n```bash\npython run.py train_cnn\n```\nand\n```bash\npython run.py train_rnn\n```\n\nTrains CNN and RNN models, respectively, with a Gaussian Process. The trained models are saved in \n`data/models/\u003cmodel_type\u003e` and results are saved in CSV files in those folders. 
If a Gaussian Process is used, the\nresults of the model without a Gaussian Process are also saved for analysis.\n\n## Setup\n\n[Anaconda](https://www.anaconda.com/download/#macos) running Python 3.7 is used as the package manager. To get set up\nwith an environment, install Anaconda from the link above, and (from this directory) run\n\n```bash\nconda env create -f environment.yml\n```\nThis will create an environment named `crop_yield_prediction` with all the necessary packages to run the code. To \nactivate this environment, run\n\n```bash\nconda activate crop_yield_prediction\n```\n\nRunning this code also requires you to sign up to [Earth Engine](https://developers.google.com/earth-engine/). Once you \nhave done so, activate the `crop_yield_prediction` environment and run\n\n```bash\nearthengine authenticate\n```\n\nand follow the instructions. To test that everything has worked, run\n\n```bash\npython -c \"import ee; ee.Initialize()\"\n```\n\nNote that Earth Engine exports files to Google Drive by default (to the same Google account used to sign up to Earth Engine).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabrieltseng%2Fpycrop-yield-prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgabrieltseng%2Fpycrop-yield-prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabrieltseng%2Fpycrop-yield-prediction/lists"}