{"id":15060714,"url":"https://github.com/googlecloudplatform/public-datasets-pipelines","last_synced_at":"2025-04-12T19:47:50.194Z","repository":{"id":37072405,"uuid":"356376019","full_name":"GoogleCloudPlatform/public-datasets-pipelines","owner":"GoogleCloudPlatform","description":"Cloud-native, data onboarding architecture for Google Cloud Datasets","archived":false,"fork":false,"pushed_at":"2025-02-18T11:42:42.000Z","size":6959,"stargazers_count":160,"open_issues_count":139,"forks_count":70,"subscribers_count":30,"default_branch":"main","last_synced_at":"2025-04-12T19:47:36.747Z","etag":null,"topics":["airflow","bigquery","cloud-composer","cloud-native","cloud-storage","data-architecture","data-engineering","data-pipelines","datasets","google-cloud","open-data"],"latest_commit_sha":null,"homepage":"https://cloud.google.com/solutions/datasets","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoogleCloudPlatform.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-09T19:17:21.000Z","updated_at":"2025-04-10T14:44:47.000Z","dependencies_parsed_at":"2023-02-18T01:16:32.926Z","dependency_job_id":"ab3c98a0-c66d-4f4c-ab8d-a9782c32772f","html_url":"https://github.com/GoogleCloudPlatform/public-datasets-pipelines","commit_stats":null,"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fpublic-datasets-pipelines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fpublic-datasets-pipelines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fpublic-datasets-pipelines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fpublic-datasets-pipelines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoogleCloudPlatform","download_url":"https://codeload.github.com/GoogleCloudPlatform/public-datasets-pipelines/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625497,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","bigquery","cloud-composer","cloud-native","cloud-storage","data-architecture","data-engineering","data-pipelines","datasets","google-cloud","open-data"],"created_at":"2024-09-24T23:03:29.449Z","updated_at":"2025-04-12T19:47:50.161Z","avatar_url":"https://github.com/GoogleCloudPlatform.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Google Cloud Datasets: Data Pipelines and Documentation Set\n\n![public-datasets-pipelines](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/images/architecture.png)\n\nThis repository contains the following:\n\n- Cloud-native, data pipeline architecture for onboarding public datasets to [Google Cloud Datasets](https://cloud.google.com/datasets).\n- Documentation set containing tutorials, samples, and other articles making use of the datasets hosted by the program.\n\nFor detailed documentation, please see the [Wiki Pages](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/wiki).\n\n## Datasets\n\nHere are some of the featured datasets onboarded using this repository/architecture.\n\n- [Google Search Trends](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl)\n- [Political Advertising on Google](https://console.cloud.google.com/marketplace/product/transparency-report/google-political-ads)\n- [DeepMind AlphaFold](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold)\n- [Google's Diversity Annual Report](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-diversity-annual-report)\n- [Google Cloud Release Notes](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google_cloud_release_notes)\n- [Google's Open Source Insights (deps.dev)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deps-dev)\n- [Global Biodiversity Information Facility (GBIF)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/gbif-occurrences)\n- [Cancer Imaging Data from Imaging Data Commons (IDC)](https://console.cloud.google.com/marketplace/product/bigquery-public-data/nci-idc-data)\n- [The New York Times US Coronavirus Database](https://console.cloud.google.com/marketplace/product/the-new-york-times/covid19_us_cases)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fpublic-datasets-pipelines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgooglecloudplatform%2Fpublic-datasets-pipelines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fpublic-datasets-pipelines/lists"}