https://github.com/opengeos/aws-open-data
A list of open datasets on AWS
amazon-web-services aws data-science deep-learning geospatial machine-learning open-data
- Host: GitHub
- URL: https://github.com/opengeos/aws-open-data
- Owner: opengeos
- License: mit
- Created: 2022-12-18T22:39:14.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2025-07-09T05:02:21.000Z (11 months ago)
- Last Synced: 2025-07-09T06:22:45.939Z (11 months ago)
- Topics: amazon-web-services, aws, data-science, deep-learning, geospatial, machine-learning, open-data
- Language: Python
- Homepage:
- Size: 6.96 MB
- Stars: 42
- Watchers: 2
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# aws-open-data
[Open in Colab](https://colab.research.google.com/github/giswqs/aws-open-data/blob/master/aws_open_datasets.ipynb)
[Launch Binder](https://mybinder.org/v2/gh/giswqs/aws-open-data/HEAD?labpath=aws_open_datasets.ipynb)
[License: MIT](https://opensource.org/licenses/MIT)
## Introduction
The [AWS Open Data](https://registry.opendata.aws/) program hosts many publicly available datasets. This repo compiles the list of all datasets on AWS as a tab-separated values (TSV) file and as a JSON file, making it easier to find and use them programmatically. The list is updated daily.
A complete list of AWS open datasets, stored as individual YAML files, is available in the [awslabs/open-data-registry](https://github.com/awslabs/open-data-registry) repository.
## Usage
This repo provides the list of AWS open datasets in two formats:
- Tab-separated values (TSV) file: [aws_open_datasets.tsv](https://github.com/giswqs/aws-open-data/blob/master/aws_open_datasets.tsv)
- JSON file: [aws_open_datasets.json](https://github.com/giswqs/aws-open-data/blob/master/aws_open_datasets.json)
The TSV file can be read into a pandas DataFrame with the following code:
```python
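import pandas as pd

# Raw URL of the TSV file in this repository
url = 'https://github.com/giswqs/aws-open-data/raw/master/aws_open_datasets.tsv'
# The file is tab-separated, so set the delimiter explicitly
df = pd.read_csv(url, sep='\t')
df.head()  # preview the first five rows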
```
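Once the list is in a DataFrame, a common next step is to search it by keyword. The following is a minimal sketch rather than part of this repo: `search_datasets` is a hypothetical helper, and the `Name` and `Tags` columns in the in-memory sample are assumptions made for illustration, so check `df.columns` on the real file before relying on specific column names. The helper itself matches against every column, so it does not depend on those names.

```python
import io

import pandas as pd


def search_datasets(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return the rows in which any column contains `keyword` (case-insensitive)."""
    # Join every cell of a row into one string so the search is column-agnostic.
    row_text = df.astype(str).agg(" ".join, axis=1)
    # regex=False treats the keyword as a literal substring, not a pattern.
    mask = row_text.str.contains(keyword, case=False, regex=False)
    return df[mask]


# Tiny in-memory stand-in for the real TSV file; the column names are hypothetical.
sample_tsv = (
    "Name\tTags\n"
    "Dataset A\tsatellite imagery, geospatial\n"
    "Dataset B\tgenomics\n"
)
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

matches = search_datasets(sample, "geospatial")
print(matches["Name"].tolist())
```

The same helper can be applied to the full list, for example `search_datasets(pd.read_csv(url, sep='\t'), 'geospatial')`, where `url` points to the TSV file in this repo.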
## Related Projects
- A list of open datasets on AWS: [aws-open-data](https://github.com/giswqs/aws-open-data)
- A list of open geospatial datasets on AWS: [aws-open-data-geo](https://github.com/giswqs/aws-open-data-geo)
- A list of open geospatial datasets on AWS with a STAC endpoint: [aws-open-data-stac](https://github.com/giswqs/aws-open-data-stac)
- A list of STAC endpoints from stacindex.org: [stac-index-catalogs](https://github.com/giswqs/stac-index-catalogs)
- A list of geospatial datasets on Microsoft Planetary Computer: [Planetary-Computer-Catalog](https://github.com/giswqs/Planetary-Computer-Catalog)
- A list of geospatial datasets on Google Earth Engine: [Earth-Engine-Catalog](https://github.com/giswqs/Earth-Engine-Catalog)
- A list of geospatial datasets on NASA's Common Metadata Repository (CMR): [NASA-CMR-STAC](https://github.com/giswqs/NASA-CMR-STAC)
- A list of geospatial data catalogs: [geospatial-data-catalogs](https://github.com/giswqs/geospatial-data-catalogs)