{"id":15222227,"url":"https://github.com/googlecloudplatform/datashare-toolkit","last_synced_at":"2025-04-05T20:07:32.204Z","repository":{"id":38705771,"uuid":"206008476","full_name":"GoogleCloudPlatform/datashare-toolkit","owner":"GoogleCloudPlatform","description":"DIY commercial datasets on Google Cloud Platform","archived":false,"fork":false,"pushed_at":"2025-03-28T15:11:08.000Z","size":59426,"stargazers_count":88,"open_issues_count":112,"forks_count":25,"subscribers_count":40,"default_branch":"main","last_synced_at":"2025-03-30T15:42:38.490Z","etag":null,"topics":["bigquery","fsi","gcp","gcp-cloud-functions","gcp-marketplace-listing","gcp-pubsub","gcp-storage","google-cloud","google-cloud-platform","google-cloud-pubsub","google-cloud-storage","google-marketplace","marketplace","pubsub","sharing","sharing-data","sharing-economy","sharing-information","sharing-platform"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoogleCloudPlatform.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-03T06:50:33.000Z","updated_at":"2025-01-07T09:42:21.000Z","dependencies_parsed_at":"2024-12-15T17:12:04.827Z","dependency_job_id":"bd95dd5f-36c4-4aa6-a9ef-e72c072ba3b1","html_url":"https://github.com/GoogleCloudPlatform/datashare-toolkit","commit_stats":{"total_commits":377,"total_committers":9,"mean_commits":"41.888888888888886","dds":"0.32625994694960214","last_synced_commit":"0db4c82cc02b4c9a846846c3579f4bfe5d83463d"},"previous_names":[
],"tags_count":36,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdatashare-toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdatashare-toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdatashare-toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdatashare-toolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoogleCloudPlatform","download_url":"https://codeload.github.com/GoogleCloudPlatform/datashare-toolkit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393570,"owners_count":20931812,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","fsi","gcp","gcp-cloud-functions","gcp-marketplace-listing","gcp-pubsub","gcp-storage","google-cloud","google-cloud-platform","google-cloud-pubsub","google-cloud-storage","google-marketplace","marketplace","pubsub","sharing","sharing-data","sharing-economy","sharing-information","sharing-platform"],"created_at":"2024-09-28T15:11:08.588Z","updated_at":"2025-04-05T20:07:32.186Z","avatar_url":"https://github.com/GoogleCloudPlatform.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ```Datashare Toolkit```\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"card.png\" alt=\"Datashare\" height=\"175\"/\u003e\n\u003c/p\u003e\n\n## 
_DIY commercial datasets on Google Cloud Platform_\n\n_This is not an officially supported Google product._\n\nThe ```Datashare Toolkit``` is a solution for data publishers to easily manage datasets residing within [BigQuery](https://cloud.google.com/bigquery/). The toolkit includes functionality to ingest and entitle data, relieving consumers from much of the toil involved in onboarding datasets from a variety of providers. Publishers upload data files to a storage bucket and allocate permissioned datasets for their consumers to use with BigQuery [authorized views](https://cloud.google.com/bigquery/docs/authorized-views).\n\nWhile these tools are used for data management and entitlement, they follow a bring-your-own-license (BYOL) model for entitling publisher data. Hence, publishers should already have licensing arrangements for those consumers wishing to access their data within GCP, and the consumers can furnish the GCP account IDs corresponding to their entitled user principals. These account IDs are required for the creation of the authorized views.\n\nThe toolkit is open source. Some supporting infrastructure, such as [storage buckets](https://cloud.google.com/storage/), serverless functions, and BigQuery datasets, must be maintained within GCP by publishers as a prerequisite. As a consumer, once your GCP accounts are added to the publisher entitlements, the published data can be queried directly within BigQuery, ready to integrate into your analytics workflow, machine learning model, or runtime application. Publishers are responsible for managing the limited support infrastructure necessary. 
While consumers are billed for BigQuery compute and networking, publishers incur costs only for the storage of their data in BigQuery and Cloud Storage.\n\n## Key Features\n- Publisher UI for creating data sharing policies, managing user accounts, and creating views\n- Ingestion performed by a [Google Cloud Function](https://cloud.google.com/functions/)\n- [GCP Marketplace integration](./frontend/user-guide/MARKETPLACE_INTEGRATION.md) for selling your data\n- [Multicast client](./client/README.md)\n\n## Getting started with Datashare\nIf you plan to use GCP Marketplace integration, the production project that you install and manage Datashare from must follow the required naming convention (punctuation and spaces not allowed): ```[yourcompanyname]-public```.\n\n1. [Install Datashare](./INSTALLING.md)\n2. [Initialize Schema](./frontend/user-guide/ADMIN.md#initialize_schema)\n\nTo get started, see the [User Guide](./frontend/README.md) for usage information.\n\n## Requirements\n\n### Publishers\n\n- A GCP account with billing enabled\n- A Google Cloud Storage bucket to store staged data\n\n### Consumers\n\n- A valid Google Account or Google Group [email address](https://cloud.google.com/iam/docs/overview#google_account) (which includes G Suite and Gmail email addresses). \\\n  **Note**: Consumers can create a Google account with an existing email address [here](https://support.google.com/accounts/answer/27441)\n- Entitlements granted by the publisher to your specific licensed datasets\n\n## Architecture\n\n![Architecture](architecture.png \"Architecture\")\n\n## Disclaimers\n\n_This is not an officially supported Google product._\n\nDatashare is under active development. Interfaces and functionality may change at any time.\n\n## License\n\nThis repository is licensed under the Apache 2.0 license (see [LICENSE](LICENSE.txt)).\n\nContributions are welcome. 
See [CONTRIBUTING](CONTRIBUTING.md) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fdatashare-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgooglecloudplatform%2Fdatashare-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fdatashare-toolkit/lists"}