https://github.com/GoogleCloudPlatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
- Host: GitHub
- URL: https://github.com/GoogleCloudPlatform/public-datasets-pipelines
- Owner: GoogleCloudPlatform
- License: apache-2.0
- Created: 2021-04-09T19:17:21.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-17T01:26:27.000Z (7 months ago)
- Last Synced: 2024-04-17T07:10:16.953Z (7 months ago)
- Topics: airflow, bigquery, cloud-composer, cloud-native, cloud-storage, data-architecture, data-engineering, data-pipelines, datasets, google-cloud, open-data
- Language: Python
- Homepage: https://cloud.google.com/solutions/datasets
- Size: 7.12 MB
- Stars: 136
- Watchers: 15
- Forks: 61
- Open Issues: 74
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-apache-airflow - Google Cloud Platform Public Datasets Pipelines - Cloud-native, data pipeline architecture for onboarding datasets to the Google Cloud Public Datasets Program. (Sample projects)
README
# Google Cloud Datasets: Data Pipelines and Documentation Set
![public-datasets-pipelines](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/images/architecture.png)
This repository contains the following:
- Cloud-native, data pipeline architecture for onboarding public datasets to [Google Cloud Datasets](https://cloud.google.com/datasets).
- Documentation set containing tutorials, samples, and other articles that use the datasets hosted by the program.
For detailed documentation, please see the [Wiki Pages](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/wiki).
## Datasets
Here are some of the featured datasets onboarded using this repository/architecture.
- [Google Search Trends](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl)
- [Political Advertising on Google](https://console.cloud.google.com/marketplace/product/transparency-report/google-political-ads)
- [DeepMind AlphaFold](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold)
- [Google's Diversity Annual Report](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-diversity-annual-report)
- [Google Cloud Release Notes](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google_cloud_release_notes)
- [Google's Open Source Insights (deps.dev)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deps-dev)
- [Global Biodiversity Information Facility (GBIF)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/gbif-occurrences)
- [Cancer Imaging Data from Imaging Data Commons (IDC)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/nci-idc-data)
- [The New York Times US Coronavirus Database](https://console.cloud.google.com/marketplace/product/the-new-york-times/covid19_us_cases)