https://github.com/sustainlab-group/sustainbench

deep-learning machine-learning sdgs sustainability

Last synced: 8 months ago
JSON representation

Host: GitHub
URL: https://github.com/sustainlab-group/sustainbench
Owner: sustainlab-group
License: mit
Created: 2021-08-28T10:11:50.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2022-08-30T06:01:17.000Z (almost 4 years ago)
Last Synced: 2025-11-19T23:03:37.969Z (8 months ago)
Topics: deep-learning, machine-learning, sdgs, sustainability
Language: Python
Homepage: https://sustainlab-group.github.io/sustainbench
Size: 27.7 MB
Stars: 118
Watchers: 5
Forks: 25
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

open-sustainable-technology - SustainBench - Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning. (Sustainable Development / Sustainable Development Goals)

README

          [**Datasets**](https://sustainlab-group.github.io/sustainbench/docs/datasets/) |

[**Website**](https://sustainlab-group.github.io/sustainbench/) |

[**Raw Data**](https://drive.google.com/drive/folders/1jyjK5sKGYegfHDjuVBSxCoj49TD830wL?usp=sharing) |

[**OpenReview**](https://openreview.net/forum?id=5HR3vCylqD)

# SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

[Christopher Yeh](https://chrisyeh96.github.io/), [Chenlin Meng](https://cs.stanford.edu/~chenlin/), [Sherrie Wang](http://stanford.edu/~sherwang/), Anne Driscoll, [Erik Rozi](https://www.linkedin.com/in/erik-rozi/), Patrick Liu, [Jihyeon Lee](https://jlee24.github.io/), [Marshall Burke](http://web.stanford.edu/~mburke/), [David B. Lobell](https://fse.fsi.stanford.edu/people/david_lobell), [Stefano Ermon](https://cs.stanford.edu/~ermon/)

**California Institute of Technology, Stanford University, and UC Berkeley**

SustainBench is a collection of 15 benchmark tasks across 7 SDGs, including tasks related to economic development, agriculture, health, education, water and sanitation, climate action, and life on land. **Datasets for 11 of the 15 tasks are released publicly for the first time.** Our goals for SustainBench are to

1. lower the barriers to entry for the machine learning community to contribute to measuring and achieving the SDGs;

2. provide standard benchmarks for evaluating machine learning models on tasks across a variety of SDGs; and

3. encourage the development of novel machine learning methods where improved model performance facilitates progress towards the SDGs.

## Table of Contents

* [Overview](#overview)

* [Dataloaders](#dataloaders)

* [Running Baseline Models](#running-baseline-models)

* [Dataset Preprocessing](#dataset-preprocessing)

* [Computing Requirements](#computing-requirements)

* [Code Formatting and Type Checking](#code-formatting-and-type-checking)

* [Citation](#citation)

## Overview

SustainBench provides datasets and standardized benchmarks for 15 SDG-related tasks, listed below. Details for each dataset and task can be found in our [**paper**](https://openreview.net/forum?id=5HR3vCylqD) and on our [**website**](https://sustainlab-group.github.io/sustainbench/). The raw data can be downloaded from [**Google Drive**](https://drive.google.com/drive/folders/1jyjK5sKGYegfHDjuVBSxCoj49TD830wL?usp=sharing) and is released under a [CC-BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/).



- **SDG 1: No Poverty**

  - [Task 1A](https://sustainlab-group.github.io/sustainbench/docs/datasets/dhs.html): Predicting poverty over space

  - [Task 1B](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg1/change_in_poverty.html): Predicting change in poverty over time

- **SDG 2: Zero Hunger**

  - [Task 2A](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg2/weakly_supervised_cropland.html): Cropland mapping

  - [Task 2B1](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg2/crop_type_mapping_ghana-ss.html): Crop type mapping, in Ghana in South Sudan

  - [Task 2B2](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg2/crop_type_mapping_kenya.html): Crop type mapping, in Kenya

  - [Task 2C](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg2/crop_yield.html): Crop yield prediction

  - [Task 2D](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg2/field_delineation.html): Field delineation

- **SDG 3: Good Health and Well-being**

  - [Task 3A](https://sustainlab-group.github.io/sustainbench/docs/datasets/dhs.html): Child mortality rate

  - [Task 3B](https://sustainlab-group.github.io/sustainbench/docs/datasets/dhs.html): Women BMI

- **SDG 4: Quality Education**

  - [Task 4A](https://sustainlab-group.github.io/sustainbench/docs/datasets/dhs.html): Women educational attainment

- **SDG 6: Clean Water and Sanitation**

  - [Task 6A](https://sustainlab-group.github.io/sustainbench/docs/datasets/dhs.html): Clean water

  - [Task 6B](https://sustainlab-group.github.io/sustainbench/docs/datasets/dhs.html): Sanitation

- **SDG 13: Climate Action**

  - [Task 13A](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg13/brick_kiln.html): Brick kiln classification

- **SDG 15: Life on Land**

  - [Task 15A](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg15/land_cover_representation.html): Feature learning for land cover classification

  - [Task 15B](https://sustainlab-group.github.io/sustainbench/docs/datasets/sdg15/out_of_domain_land_cover.html): Out-of-domain land cover classification

## Dataloaders

For each dataset, we provide Python dataloaders that load the data as PyTorch tensors. Please see the `sustainbench` folder as well as our [website](https://sustainlab-group.github.io/sustainbench/) for detailed documentation.

## Running Baseline Models

We provide baseline models for many of the benchmark tasks included in SustainBench. See the `baseline_models` folder for the code and detailed instructions to reproduce our results.

## Dataset Preprocessing

11 of the 15 SustainBench benchmark tasks involve data that is being publicly released for the first time. We release the processed versions of our datasets on [Google Drive](https://drive.google.com/drive/folders/1jyjK5sKGYegfHDjuVBSxCoj49TD830wL?usp=sharing). However, we also provide code and detailed instructions for how we preprocessed the datasets in the `dataset_preprocessing` folder. You do NOT need anything from the `dataset_preprocessing` folder for downloading the processed datasets or running our baseline models.

## Computing Requirements

This code was tested on a system with the following specifications:

- operating system: Ubuntu 16.04.7 LTS

- CPU: Intel(R) Xeon(R) CPU E5-2620 v4

- memory (RAM): 125 GB

- disk storage: 5 TB

- GPU: NVIDIA P100 GPU

The main software requirements are Python 3.7 with TensorFlow r1.15, PyTorch 1.9, and R 4.1. The complete list of required packages and library are listed in the two conda environment YAML files (`env_create.yml` and `env_bench.yml`), which are meant to be used with `conda` (version 4.10). See [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) for instructions on installing conda via Miniconda. Once conda is installed, run one of the following commands to set up the desired conda environment:

```bash

conda env update -f env_create.yml --prune

conda env update -f env_bench.yml --prune

```

The conda environment files default to CPU-only packages. If you have a GPU, please comment/uncomment the appropriate lines in the environment files; you may need to also install CUDA 10 or 11 and cuDNN 7.

## Code Formatting and Type Checking

This repo uses [flake8](https://flake8.pycqa.org/) for Python linting and [mypy](https://mypy.readthedocs.io/) for type-checking. Configuration files for each are included in this repo: `.flake8` and `mypy.ini`.

To run either code linting or type checking, set the current directory to the repo root directory. Then run any of the following commands:

```bash

# LINTING

# =======

# entire repo

flake8

# all modules within utils directory

flake8 utils

# a single module

flake8 path/to/module.py

# a jupyter notebook - ignore these error codes, in addition to the ignored codes in .flake8:

# - E305: expected 2 blank lines after class or function definition

# - E402: Module level import not at top of file

# - F404: from __future__ imports must occur at the beginning of the file

# - W391: Blank line at end of file

jupyter nbconvert path/to/notebook.ipynb --stdout --to script | flake8 - --extend-ignore=E305,E402,F404,W391

# TYPE CHECKING

# =============

# entire repo

mypy .

# all modules within utils directory

mypy -p utils

# a single module

mypy path/to/module.py

# a jupyter notebook

mypy -c "$(jupyter nbconvert path/to/notebook.ipynb --stdout --to script)"

```

## Citation

Please cite this article as follows, or use the BibTeX entry below.

> C. Yeh, C. Meng, S. Wang, A. Driscoll, E. Rozi, P. Liu, J. Lee, M. Burke, D. B. Lobell, and S. Ermon, "SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning," in _Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)_, Dec. 2021. [Online]. Available: [https://openreview.net/forum?id=5HR3vCylqD](https://openreview.net/forum?id=5HR3vCylqD).

```tex

@inproceedings{

    yeh2021sustainbench,

    title = {{SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning}},

    author = {Christopher Yeh and Chenlin Meng and Sherrie Wang and Anne Driscoll and Erik Rozi and Patrick Liu and Jihyeon Lee and Marshall Burke and David B. Lobell and Stefano Ermon},

    booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},

    year = {2021},

    month = {12},

    url = {https://openreview.net/forum?id=5HR3vCylqD}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sustainlab-group/sustainbench

Awesome Lists containing this project

README