Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/googlecloudplatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
- Host: GitHub
- URL: https://github.com/googlecloudplatform/public-datasets-pipelines
- Owner: GoogleCloudPlatform
- License: apache-2.0
- Created: 2021-04-09T19:17:21.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-01-17T10:10:50.000Z (4 days ago)
- Last Synced: 2025-01-18T06:05:43.566Z (3 days ago)
- Topics: airflow, bigquery, cloud-composer, cloud-native, cloud-storage, data-architecture, data-engineering, data-pipelines, datasets, google-cloud, open-data
- Language: Python
- Homepage: https://cloud.google.com/solutions/datasets
- Size: 7.09 MB
- Stars: 156
- Watchers: 26
- Forks: 68
- Open Issues: 137
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Google Cloud Datasets: Data Pipelines and Documentation Set
![public-datasets-pipelines](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/images/architecture.png)
This repository contains the following:
- Cloud-native, data pipeline architecture for onboarding public datasets to [Google Cloud Datasets](https://cloud.google.com/datasets).
- Documentation set containing tutorials, samples, and other articles making use of the datasets hosted by the program.

For detailed documentation, please see the [Wiki Pages](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/wiki).
## Datasets
Here are some of the featured datasets onboarded using this repository/architecture.
- [Google Search Trends](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl)
- [Political Advertising on Google](https://console.cloud.google.com/marketplace/product/transparency-report/google-political-ads)
- [DeepMind AlphaFold](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold)
- [Google's Diversity Annual Report](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-diversity-annual-report)
- [Google Cloud Release Notes](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google_cloud_release_notes)
- [Google's Open Source Insights (deps.dev)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deps-dev)
- [Global Biodiversity Information Facility (GBIF)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/gbif-occurrences)
- [Cancer Imaging Data from Imaging Data Commons (IDC)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/nci-idc-data)
- [The New York Times US Coronavirus Database](https://console.cloud.google.com/marketplace/product/the-new-york-times/covid19_us_cases)