{"id":13791152,"url":"https://github.com/miraisolutions/sparkbq","last_synced_at":"2025-09-04T03:44:51.285Z","repository":{"id":27335031,"uuid":"108243972","full_name":"miraisolutions/sparkbq","owner":"miraisolutions","description":"Sparklyr extension package to connect to Google BigQuery","archived":false,"fork":false,"pushed_at":"2024-10-29T08:44:25.000Z","size":30926,"stargazers_count":19,"open_issues_count":5,"forks_count":3,"subscribers_count":8,"default_branch":"develop","last_synced_at":"2025-08-12T16:58:33.823Z","etag":null,"topics":["bigquery","r","spark","sparklyr"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miraisolutions.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-25T08:48:56.000Z","updated_at":"2024-09-30T08:30:33.000Z","dependencies_parsed_at":"2024-10-28T12:26:49.774Z","dependency_job_id":null,"html_url":"https://github.com/miraisolutions/sparkbq","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/miraisolutions/sparkbq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miraisolutions%2Fsparkbq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miraisolutions%2Fsparkbq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miraisolutions%2Fsparkbq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miraisolutions%2Fsparkbq/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/miraisolutions","download_url":"https://codeload.github.com/miraisolutions/sparkbq/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miraisolutions%2Fsparkbq/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273548924,"owners_count":25125256,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","r","spark","sparklyr"],"created_at":"2024-08-03T22:00:56.626Z","updated_at":"2025-09-04T03:44:51.255Z","avatar_url":"https://github.com/miraisolutions.png","language":"R","funding_links":[],"categories":["Sparklyr Infrastructure","R"],"sub_categories":["Big Query"],"readme":"\u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"15%\" height=\"15%\"/\u003e\n\n# sparkbq: Google BigQuery Support for sparklyr\n\n[![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version/sparkbq)](https://cran.r-project.org/package=sparkbq) [![Rdoc](http://www.rdocumentation.org/badges/version/sparkbq)](http://www.rdocumentation.org/packages/sparkbq)\n\n**sparkbq** is a [sparklyr](https://spark.rstudio.com/) [extension](https://spark.rstudio.com/articles/guides-extensions.html) package providing an integration with [Google BigQuery](https://cloud.google.com/bigquery/). 
It builds on top of [spark-bigquery](https://github.com/miraisolutions/spark-bigquery), which provides a Google BigQuery data source for [Apache Spark](https://spark.apache.org/).\n\n\n## Version Information\n\nYou can install the released version of **sparkbq** from CRAN via\n``` r\ninstall.packages(\"sparkbq\")\n```\nor the latest development version through\n``` r\ndevtools::install_github(\"miraisolutions/sparkbq\", ref = \"develop\")\n```\n\n\nThe following table provides an overview of the supported versions of Apache Spark, Scala, and [Google Dataproc](https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions):\n\n| sparkbq | spark-bigquery | Apache Spark    | Scala | Google Dataproc |\n| :-----: | -------------- | --------------- | ----- | --------------- |\n| 0.1.x   | 0.1.0          | 2.2.x and 2.3.x | 2.11  | 1.2.x and 1.3.x |\n\n**sparkbq** is based on the Spark package [spark-bigquery](https://spark-packages.org/package/miraisolutions/spark-bigquery), which is available in a separate [GitHub repository](https://github.com/miraisolutions/spark-bigquery).\n\n\n## Example Usage\n\n``` r\nlibrary(sparklyr)\nlibrary(sparkbq)\nlibrary(dplyr)\n\nconfig \u003c- spark_config()\n\nsc \u003c- spark_connect(master = \"local[*]\", config = config)\n\n# Set Google BigQuery default settings\nbigquery_defaults(\n  billingProjectId = \"\u003cyour_billing_project_id\u003e\",\n  gcsBucket = \"\u003cyour_gcs_bucket\u003e\",\n  datasetLocation = \"US\",\n  serviceAccountKeyFile = \"\u003cyour_service_account_key_file\u003e\",\n  type = \"direct\"\n)\n\n# Reading the public shakespeare data table\n# https://cloud.google.com/bigquery/public-data/\n# https://cloud.google.com/bigquery/sample-tables\nhamlet \u003c- \n  spark_read_bigquery(\n    sc,\n    name = \"hamlet\",\n    projectId = \"bigquery-public-data\",\n    datasetId = \"samples\",\n    tableId = \"shakespeare\") %\u003e%\n  filter(corpus == \"hamlet\") # NOTE: predicate pushdown to BigQuery!\n  \n# 
Retrieve results into a local tibble\nhamlet %\u003e% collect()\n\n# Write result into \"mysamples\" dataset in our BigQuery (billing) project\nspark_write_bigquery(\n  hamlet,\n  datasetId = \"mysamples\",\n  tableId = \"hamlet\",\n  mode = \"overwrite\")\n```\n\n## Authentication\n\nWhen running outside of Google Cloud it is necessary to specify a service account JSON key file. The service account key file can be passed as parameter `serviceAccountKeyFile` to `bigquery_defaults` or directly to `spark_read_bigquery` and `spark_write_bigquery`.\n\nAlternatively, an environment variable `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json` can be set (see https://cloud.google.com/docs/authentication/getting-started for more information). Make sure the variable is set before starting the R session.\n\nWhen running on Google Cloud, e.g. Google Cloud Dataproc, application default credentials (ADC) may be used in which case it is not necessary to specify a service account key file.\n\n## Further Information\n\n* [spark-bigquery on GitHub](https://github.com/miraisolutions/spark-bigquery)\n* [spark-bigquery on Spark Packages](https://spark-packages.org/package/miraisolutions/spark-bigquery)\n\n* [BigQuery pricing](https://cloud.google.com/bigquery/pricing)\n* [BigQuery dataset locations](https://cloud.google.com/bigquery/docs/dataset-locations)\n* [General authentication](https://cloud.google.com/docs/authentication/)\n* [BigQuery authentication](https://cloud.google.com/bigquery/docs/authentication/)\n* [BigQuery: authenticating with a service account key file](https://cloud.google.com/bigquery/docs/authentication/service-account-file)\n* [Cloud Storage 
authentication](https://cloud.google.com/storage/docs/authentication/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiraisolutions%2Fsparkbq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiraisolutions%2Fsparkbq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiraisolutions%2Fsparkbq/lists"}