https://github.com/esipfed/data-readiness

# ESIP Data Readiness Cluster

This is the repository for the Data Readiness Cluster to maintain the AI-ready data checklist. The cluster is a community-driven group focusing on developing
recommendations and community standards on AI-ready open environmental data. Although the work currently focuses on environmental data, the product could be
extended to data from other domains.

## The goal of AI-readiness assessment:
- For data producers/providers, the purpose is to understand to what extent the data being assessed meets the common research data management practices and
principles that are relevant to AI/ML application development. The assessment result can be used to justify targeted improvements
for the dataset when resources become available.
- For projects generating new datasets, the AI-readiness checklist can be used to guide the development of the dataset.
  For example:
  - What documentation do you want to provide accompanying the dataset?
  - Do you have a proper data quality assessment that will make the development of downstream AI/ML applications efficient?

## How to use the checklist for assessment:
The current version of the checklist is available [here](https://github.com/ESIPFed/data-readiness/blob/main/checklist-published/ai-ready-data-checklist-v.1.0.md) (last updated 2023-12-20).
The checklist will be maintained and updated by the community.

To assist with the assessment, we have created a fillable [Google sheet template](https://docs.google.com/spreadsheets/d/1OZDknI1UN8iJjX-SHlnb92QYXqMGvFJrpELeh9yNif0/edit?usp=sharing).
You can make a copy of the Google sheet for your assessment. Each dataset should be assessed separately as the checklist is designed for individual datasets.
More effort is ongoing to address the need for linked datasets.
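
If you would rather keep assessment results in code than in a spreadsheet, the one-record-per-dataset rule above can be sketched roughly as follows. This is only an illustrative sketch: the `DatasetAssessment` class, its method names, and the two example questions (paraphrased from this README, not taken from the published checklist) are all assumptions, and the checklist itself does not define a numeric score.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetAssessment:
    """Hypothetical record of one assessment; the checklist is applied per dataset."""

    dataset_name: str
    # Maps a checklist question to the assessor's answer (True = met).
    answers: dict[str, bool] = field(default_factory=dict)

    def unmet(self) -> list[str]:
        """Return the questions answered 'no', i.e. candidate improvements."""
        return [question for question, met in self.answers.items() if not met]

    def fraction_met(self) -> float:
        """Share of answered questions that are met (0.0 if none are answered)."""
        if not self.answers:
            return 0.0
        return sum(self.answers.values()) / len(self.answers)


# Placeholder questions for illustration only; replace them with the
# items from the published checklist when doing a real assessment.
assessment = DatasetAssessment(
    dataset_name="example-dataset",
    answers={
        "Documentation accompanies the dataset": True,
        "A data quality assessment is available": False,
    },
)

print(assessment.unmet())
print(assessment.fraction_met())
```

Keeping one record per dataset mirrors the design of the checklist, and the list of unmet items can serve as a starting point for the targeted improvements mentioned above.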

If you are in the early stages of developing AI/ML applications with open environmental datasets, we encourage you to assess the input data used for
your applications. Although you may not be able to change other people’s datasets, the assessment will help you document the effort spent on preparing
the dataset for your development.

## How to provide your feedback:
If you have any questions or suggestions related to the checklist or the assessment tool, you can provide feedback through either of the two options below:
- Contact Douglas Rao (douglas.rao@noaa.gov), cluster chair
- Open an issue in this GitHub repo.

## How to cite the checklist:
ESIP Data Readiness Cluster (2023). Checklist to Examine AI-readiness for Open Environmental Datasets. Version 1.0. Earth Science Information Partners. https://github.com/ESIPFed/data-readiness [date accessed].

## Relevant references:
- Mills, A. (2022). [Are Your Data Ready? Take Stock with ESIP’s New AI-Ready Checklist](https://www.esipfed.org/merge/collaboration-updates/checklist-ai-ready-data). Earth Science Information Partners. [Retrieved on 2023-10-13].
- Long, S. and Romanoff, T. (2023). [AI-Ready Open Data](https://bipartisanpolicy.org/explainer/ai-ready-open-data/). Bipartisan Policy Center. [Retrieved on 2023-10-13].