{"id":15060710,"url":"https://github.com/embulk/embulk-output-bigquery","last_synced_at":"2025-05-15T05:07:42.000Z","repository":{"id":28790173,"uuid":"32312972","full_name":"embulk/embulk-output-bigquery","owner":"embulk","description":"Embulk output plugin to load/insert data into Google BigQuery","archived":false,"fork":false,"pushed_at":"2025-05-14T11:28:09.000Z","size":560,"stargazers_count":126,"open_issues_count":6,"forks_count":60,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-05-15T05:07:33.110Z","etag":null,"topics":["bigquery","embulk","jruby"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-03-16T09:05:39.000Z","updated_at":"2025-05-14T11:25:36.000Z","dependencies_parsed_at":"2023-12-03T11:21:17.344Z","dependency_job_id":"c1cf7e85-6bea-4931-8fcb-651adbfcb20d","html_url":"https://github.com/embulk/embulk-output-bigquery","commit_stats":{"total_commits":324,"total_committers":32,"mean_commits":10.125,"dds":0.5277777777777778,"last_synced_commit":"c265236551726a33e373ed11d8821f5cf574ecca"},"previous_names":[],"tags_count":62,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-output-bigquery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-output-bigquery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
embulk%2Fembulk-output-bigquery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-output-bigquery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-output-bigquery/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254276447,"owners_count":22043867,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","embulk","jruby"],"created_at":"2024-09-24T23:03:25.503Z","updated_at":"2025-05-15T05:07:36.990Z","avatar_url":"https://github.com/embulk.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"# embulk-output-bigquery\n\n[Embulk](https://github.com/embulk/embulk/) output plugin to load/insert data into [Google BigQuery](https://cloud.google.com/bigquery/) using [direct insert](https://cloud.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest)\n\n## Overview\n\nLoads data into Google BigQuery as batch jobs, suited to large amounts of data.\nhttps://developers.google.com/bigquery/loading-data-into-bigquery\n\n* **Plugin type**: output\n* **Resume supported**: no\n* **Cleanup supported**: no\n* **Dynamic table creation**: yes\n\n### Supported Embulk\n\n| gem version      | Embulk version     |\n|------------------|--------------------|\n| 0.7.0 and higher | v0.11.0 and higher |\n| 0.6.9 and lower  | v0.9.X and lower   |\n\n### NOT IMPLEMENTED\n* Inserting data via streaming inserts\n  * for continuous real-time insertions\n  * 
Please use another product, such as [fluent-plugin-bigquery](https://github.com/kaizenplatform/fluent-plugin-bigquery)\n  * https://developers.google.com/bigquery/streaming-data-into-bigquery#usecases\n\nThe current version of this plugin supports the Google API with service account authentication, but does not support the\nOAuth flow for installed applications.\n\n## Configuration\n\n#### Original options\n\n| name                                 | type        | required?  | default                  | description            |\n|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|\n|  mode                                | string      | optional   | \"append\"                 | See [Mode](#mode)      |\n|  auth_method                         | string      | optional   | \"application\\_default\"   | See [Authentication](#authentication) |\n|  json_keyfile                        | string      | optional   |                          | Key file path, or `content` |\n|  project                             | string      | required unless service\\_account's `json_keyfile` is given. | | project\\_id |\n|  destination_project                 | string      | optional   | `project` value         |  A destination project to which the data will be loaded. Use this if you want to separate a billing project (the `project` value) and a destination project (the `destination_project` value). |\n|  dataset                             | string      | required   |                          | dataset |\n|  location                            | string      | optional   | nil                      | Geographic location of the dataset. 
See [Location](#location) |\n|  table                               | string      | required   |                          | table name, or table name with a partition decorator such as `table_name$20160929`|\n|  auto_create_dataset                 | boolean     | optional   | false                    | Automatically create the dataset |\n|  auto_create_table                   | boolean     | optional   | true                     | `false` is available only for `append_direct` mode. Other modes require `true`. See [Dynamic Table Creating](#dynamic-table-creating) and [Time Partitioning](#time-partitioning) |\n|  schema_file                         | string      | optional   |                          | /path/to/schema.json |\n|  template_table                      | string      | optional   |                          | template table name. See [Dynamic Table Creating](#dynamic-table-creating) |\n|  job_status_max_polling_time         | int         | optional   | 3600 sec                 | Max job status polling time |\n|  job_status_polling_interval         | int         | optional   | 10 sec                   | Job status polling interval |\n|  is_skip_job_result_check            | boolean     | optional   | false                    | Skip waiting for the load job to finish. Available in `append` or `delete_in_advance` mode |\n|  with_rehearsal                      | boolean     | optional   | false                    | Load `rehearsal_counts` records as a rehearsal. The rehearsal loads into a REHEARSAL temporary table, which is deleted at the end. 
Use this option to detect data errors as early as possible |\n|  rehearsal_counts                    | integer     | optional   | 1000                     | Number of records to load in a rehearsal |\n|  abort_on_error                      | boolean     | optional   | true if max_bad_records is 0, otherwise false | Raise an error if the number of input rows and the number of output rows do not match |\n|  column_options                      | array       | optional   |                          | See [Column Options](#column-options) |\n|  default_timezone                    | string      | optional   | UTC                      | Default timezone for `column_options` |\n|  default_timestamp_format            | string      | optional   | %Y-%m-%d %H:%M:%S.%6N    | Default timestamp format for `column_options` |\n|  payload_column                      | string      | optional   | nil                      | See [Formatter Performance Issue](#formatter-performance-issue) |\n|  payload_column_index                | integer     | optional   | nil                      | See [Formatter Performance Issue](#formatter-performance-issue) |\n|  gcs_bucket                          | string      | optional   | nil                      | See [GCS Bucket](#gcs-bucket) |\n|  auto_create_gcs_bucket              | boolean     | optional   | false                    | See [GCS Bucket](#gcs-bucket) |\n|  progress_log_interval               | float       | optional   | nil (Disabled)           | Progress log interval. Setting nil (the default) disables the progress log. NOTE: This option may be removed in the future because a filter plugin can achieve the same goal |\n|  description                         | string      | optional   | nil                      | Description of the table |\n\nClient or request options\n\n| name                                 | type        | required?  
| default                  | description            |\n|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|\n|  open_timeout_sec                    | integer     | optional   | 300                      | Seconds to wait for the connection to open |\n|  timeout_sec                         | integer     | optional   | 300                      | Seconds to wait for one block to be read (google-api-ruby-client \u003c v0.11.0) |\n|  send_timeout_sec                    | integer     | optional   | 300                      | Seconds to wait to send a request (google-api-ruby-client \u003e= v0.11.0) |\n|  read_timeout_sec                    | integer     | optional   | 300                      | Seconds to wait to read a response (google-api-ruby-client \u003e= v0.11.0) |\n|  retries                             | integer     | optional   | 5                        | Number of retries |\n|  application_name                    | string      | optional   | \"Embulk BigQuery plugin\" | User-Agent |\n|  sdk_log_level                       | string      | optional   | nil (WARN)               | Log level of the Google API client library |\n\nOptions for intermediate local files\n\n| name                                 | type        | required?  | default                  | description            |\n|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|\n|  path_prefix                         | string      | optional   |                          | Path prefix of local files such as \"/tmp/prefix_\". 
By default, a random prefix is generated with [tempfile](http://ruby-doc.org/stdlib-2.2.3/libdoc/tempfile/rdoc/Tempfile.html) |\n|  sequence_format                     | string      | optional   | .%d.%d                   | Sequence format for the pid and thread id |\n|  file_ext                            | string      | optional   |                          | The file extension of local files such as \".csv.gz\" or \".json.gz\". By default, it is derived from `source_format` and `compression`|\n|  skip_file_generation                | boolean     | optional   |                          | Load already generated local files into BigQuery if available. Specify the correct `path_prefix` and `file_ext`. |\n|  delete_from_local_when_job_end      | boolean     | optional   | true                     | If set to true, delete the generated local files when the job ends |\n|  compression                         | string      | optional   | \"NONE\"                   | Compression of local files (`GZIP` or `NONE`) |\n\nOptions for intermediate tables on BigQuery\n\n| name                                 | type        | required?  | default                  | description            |\n|:-------------------------------------|:------------|:-----------|:-------------------------|:-----------------------|\n|  temporary_table_expiration          | integer     | optional   |                          | Temporary table's expiration time in seconds |\n\n`source_format` is also used to determine the formatter (CSV or JSONL).\n\n#### Same options as the bq command-line tool or BigQuery job properties\n\nThe following options are the same as those of the [bq command-line tool](https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile) or BigQuery [job properties](https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource).\n\n| name                              | type     | required? 
| default | description            |\n|:----------------------------------|:---------|:----------|:--------|:-----------------------|\n|  source_format                    | string   | required  | \"CSV\"   |   File type (`NEWLINE_DELIMITED_JSON` or `CSV`) |\n|  max_bad_records                  | int      | optional  | 0       | |\n|  field_delimiter                  | char     | optional  | \",\"     | |\n|  encoding                         | string   | optional  | \"UTF-8\" | `UTF-8` or `ISO-8859-1` |\n|  ignore_unknown_values            | boolean  | optional  | false   | |\n|  allow_quoted_newlines            | boolean  | optional  | false   | Set to true if the data contains newline characters inside quoted values. It may slow down processing |\n|  time_partitioning                | hash     | optional  | `{\"type\":\"DAY\"}` if the `table` parameter has a partition decorator, otherwise nil | See [Time Partitioning](#time-partitioning) |\n|  time_partitioning.type           | string   | required  | nil     | The only type supported is DAY, which will generate one partition per day based on data loading time. |\n|  time_partitioning.expiration_ms  | int      | optional  | nil     | Number of milliseconds for which to keep the storage for a partition. |\n|  time_partitioning.field          | string   | optional  | nil     | `DATE` or `TIMESTAMP` column used for partitioning |\n|  clustering                       | hash     | optional  | nil     | Currently, clustering is supported only for partitioned tables, so it must be used with the `time_partitioning` option. See [clustered tables](https://cloud.google.com/bigquery/docs/clustered-tables) |\n|  clustering.fields                | array    | required  | nil     | One or more fields on which data should be clustered. The order of the specified columns determines the sort order of the data. |\n|  schema_update_options            | array    | optional  | nil     | (Experimental) List of `ALLOW_FIELD_ADDITION` or `ALLOW_FIELD_RELAXATION` or both. 
See [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions). NOTE: currently, `schema_update_options` does not work for `copy` jobs, so it is not effective for most modes, such as `append`, `replace`, and `replace_backup`. `delete_in_advance` deletes the original table, so it does not need to update the schema. Only `append_direct` can use schema updates. |\n\n### Example\n\n```yaml\nout:\n  type: bigquery\n  mode: append\n  auth_method: service_account\n  json_keyfile: /path/to/json_keyfile.json\n  project: your-project-000\n  dataset: your_dataset_name\n  table: your_table_name\n  compression: GZIP\n  source_format: NEWLINE_DELIMITED_JSON\n```\n\n### Location\n\nThe geographic location of the dataset. Required except for US and EU.\n\nWhen you use `gcs_bucket`, the GCS bucket should be in the same region as the dataset.\n\nSee also [Dataset Locations | BigQuery | Google Cloud](https://cloud.google.com/bigquery/docs/dataset-locations)\n\n### Mode\n\nFive modes are provided.\n\n##### append\n\n1. Load into a temporary table (Create and WRITE_APPEND in parallel)\n2. Copy the temporary table to the destination table (or partition). (WRITE_APPEND)\n\n##### append_direct\n\n1. Insert data into the existing table (or partition) directly. (WRITE_APPEND in parallel)\n\nThis is not transactional: if the load fails, the target table could have some rows inserted.\n\n##### replace\n\n1. Load into a temporary table (Create and WRITE_APPEND in parallel)\n2. Copy the temporary table to the destination table (or partition). (WRITE_TRUNCATE)\n\n`is_skip_job_result_check` must be false in `replace` mode.\n\nNOTE: BigQuery does not support atomically replacing (actually, copying into) a non-partitioned table with a partitioned table. You must first delete the non-partitioned table; otherwise, you get the `Incompatible table partitioning specification when copying to the column partitioned table` error.\n\n##### replace_backup\n\n1. 
Load into a temporary table (Create and WRITE_APPEND in parallel)\n2. Copy the destination table (or partition) to a backup table (or partition). (dataset_old, table_old)\n3. Copy the temporary table to the destination table (or partition). (WRITE_TRUNCATE)\n\n`is_skip_job_result_check` must be false in `replace_backup` mode.\n\n##### delete_in_advance\n\n1. Delete the destination table (or partition), if it exists.\n2. Load into the destination table (or partition).\n\n### Authentication\n\nThere are four authentication methods:\n\n1. `service_account` (or `json_key` for backward compatibility)\n1. `authorized_user`\n1. `compute_engine`\n1. `application_default`\n\n#### service\\_account (or json\\_key)\n\nUse GCP service account credentials.\nYou first need to create a service account, download its JSON key, and deploy the key with embulk.\n\n```yaml\nout:\n  type: bigquery\n  auth_method: service_account\n  json_keyfile: /path/to/json_keyfile.json\n```\n\nYou can also embed the contents of `json_keyfile` in config.yml.\n\n```yaml\nout:\n  type: bigquery\n  auth_method: service_account\n  json_keyfile:\n    content: |\n      {\n          \"private_key_id\": \"123456789\",\n          \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nABCDEF\",\n          \"client_email\": \"...\"\n      }\n```\n\n#### authorized\\_user\n\nUse Google user credentials.\nYou can get your credentials at `~/.config/gcloud/application_default_credentials.json` by running `gcloud auth application-default login`.\n\n```yaml\nout:\n  type: bigquery\n  auth_method: authorized_user\n  json_keyfile: /path/to/credentials.json\n```\n\nYou can also embed the contents of `json_keyfile` in config.yml.\n\n```yaml\nout:\n  type: bigquery\n  auth_method: authorized_user\n  json_keyfile:\n    content: |\n      {\n        \"client_id\":\"xxxxxxxxxxx.apps.googleusercontent.com\",\n        \"client_secret\":\"xxxxxxxxxxx\",\n        \"refresh_token\":\"xxxxxxxxxxx\",\n        \"type\":\"authorized_user\"\n      }\n```\n\n#### compute\\_engine\n\nOn the other 
hand, you don't need to explicitly create a service account for embulk when you\nrun embulk in Google Compute Engine. With this third authentication method, you need to\nadd the API scope \"https://www.googleapis.com/auth/bigquery\" to the scope list of your\nCompute Engine VM instance; then you can configure embulk like this:\n\n```yaml\nout:\n  type: bigquery\n  auth_method: compute_engine\n```\n\n#### application\\_default\n\nUse Application Default Credentials (ADC). ADC is a strategy to locate Google Cloud service account credentials.\n\n1. ADC checks whether the environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set. If the variable is set, ADC uses the service account file that the variable points to.\n2. ADC checks whether `~/.config/gcloud/application_default_credentials.json` exists. This file is created by running `gcloud auth application-default login`.\n3. ADC uses the default service account if the application is running on Compute Engine, App Engine, Kubernetes Engine, Cloud Functions, or Cloud Run.\n\nSee https://cloud.google.com/docs/authentication/production for details.\n\n```yaml\nout:\n  type: bigquery\n  auth_method: application_default\n```\n\n### Table id formatting\n\n`table` and option accept [Time#strftime](http://ruby-doc.org/core-1.9.3/Time.html#method-i-strftime)\nformat to construct table ids.\nTable ids are formatted at runtime\nusing the local time of the embulk server.\n\nFor example, with the configuration below,\ndata is inserted into tables `table_20150503`, `table_20150504` and so on.\n\n```yaml\nout:\n  type: bigquery\n  table: table_%Y%m%d\n```\n\n### Dynamic table creating\n\nThere are three ways to set the schema.\n\n#### Set schema.json\n\nSet the file path of schema.json with `schema_file`.\n\n```yaml\nout:\n  type: bigquery\n  auto_create_table: true\n  table: table_%Y%m%d\n  schema_file: /path/to/schema.json\n```\n\n#### Set template_table in dataset\n\nThe plugin will try to read the schema from an existing table and use it as the schema 
template.\n\n```yaml\nout:\n  type: bigquery\n  auto_create_table: true\n  table: table_%Y%m%d\n  template_table: existing_table_name\n```\n\n#### Guess from Embulk Schema\n\nThe plugin will try to guess the BigQuery schema from the Embulk schema. The result can be customized with `column_options`. See [Column Options](#column-options).\n\n### Column Options\n\nColumn options are used to aid in guessing the BigQuery schema, or to define conversions of values:\n\n- **column_options**: advanced: an array of options for columns\n  - **name**: column name\n  - **type**: BigQuery type such as `BOOLEAN`, `INTEGER`, `FLOAT`, `STRING`, `TIME`, `TIMESTAMP`, `DATETIME`, `DATE`, and `RECORD`. The supported conversions for each Embulk type are listed below.\n    - boolean:   `BOOLEAN`, `STRING` (default: `BOOLEAN`)\n    - long:      `BOOLEAN`, `INTEGER`, `FLOAT`, `STRING`, `TIMESTAMP` (default: `INTEGER`)\n    - double:    `INTEGER`, `FLOAT`, `STRING`, `TIMESTAMP` (default: `FLOAT`)\n    - string:    `BOOLEAN`, `INTEGER`, `FLOAT`, `STRING`, `TIME`, `TIMESTAMP`, `DATETIME`, `DATE`, `RECORD` (default: `STRING`)\n    - timestamp: `INTEGER`, `FLOAT`, `STRING`, `TIME`, `TIMESTAMP`, `DATETIME`, `DATE` (default: `TIMESTAMP`)\n    - json:      `STRING`, `RECORD` (default: `STRING`)\n  - **mode**: BigQuery mode such as `NULLABLE`, `REQUIRED`, and `REPEATED` (string, default: `NULLABLE`)\n  - **fields**: Describes the nested schema fields if the type property is set to RECORD. 
Note that this is **required** for `RECORD` columns.\n  - **description**: description (string, default is `None`).\n  - **timestamp_format**: timestamp format to convert into/from `timestamp` (string, default is `default_timestamp_format`)\n  - **timezone**: timezone to convert into/from `timestamp`, `date` (string, default is `default_timezone`).\n- **default_timestamp_format**: default timestamp format for column_options (string, default is \"%Y-%m-%d %H:%M:%S.%6N\")\n- **default_timezone**: default timezone for column_options (string, default is \"UTC\")\n\nExample:\n\n```yaml\nout:\n  type: bigquery\n  auto_create_table: true\n  column_options:\n    - {name: date, type: STRING, timestamp_format: \"%Y-%m-%d\", timezone: \"Asia/Tokyo\"}\n    - name: json_column\n      type: RECORD\n      fields:\n        - {name: key1, type: STRING}\n        - {name: key2, type: STRING}\n```\n\nNOTE: Type conversion is done in this JRuby plugin and can be slow. See [Formatter Performance Issue](#formatter-performance-issue) for ways to improve performance.\n\n### Formatter Performance Issue\n\nembulk-output-bigquery supports formatting records into CSV or JSON (including formatting timestamp columns).\nHowever, this plugin is written in JRuby, and JRuby plugins are generally slower than Java plugins.\n\nTherefore, it is recommended to format records with filter plugins written in Java, such as [embulk-filter-to_json](https://github.com/civitaspo/embulk-filter-to_json):\n\n```yaml\nfilters:\n  - type: to_json\n    column: {name: payload, type: string}\n    default_format: \"%Y-%m-%d %H:%M:%S.%6N\"\nout:\n  type: bigquery\n  payload_column_index: 0 # or, payload_column: payload\n```\n\nFurthermore, if your input files are already JSONL or CSV files, you can even skip parsing with [embulk-parser-none](https://github.com/sonots/embulk-parser-none):\n\n```yaml\nin:\n  type: file\n  path_prefix: example/example.jsonl\n  parser:\n    type: none\n    column_name: payload\nout:\n  
type: bigquery\n  payload_column_index: 0 # or, payload_column: payload\n```\n\n### GCS Bucket\n\nThis is useful for reducing the number of consumed jobs, which is limited to [100,000 jobs per project per day](https://cloud.google.com/bigquery/quotas#load_jobs).\n\nBy default, this plugin loads local files into BigQuery in parallel, which consumes multiple jobs, for example 24 jobs on a 24-core machine (this depends on embulk parameters such as `min_output_tasks` and `max_threads`).\n\nBigQuery supports loading multiple files from GCS in a single job. Therefore, uploading local files to GCS in parallel and then loading them from GCS into BigQuery reduces the number of consumed jobs to one.\n\nThe `gcs_bucket` option enables this strategy. You may also use `auto_create_gcs_bucket` to create the specified GCS bucket automatically.\n\n```yaml\nout:\n  type: bigquery\n  gcs_bucket: bucket_name\n  auto_create_gcs_bucket: true\n```\n\nTODO: Use https://cloud.google.com/storage/docs/streaming if google-api-ruby-client supports streaming transfers into GCS.\n\n### Time Partitioning\n\nSince 0.4.0, embulk-output-bigquery supports loading into partitioned tables.\nSee also [Creating and Updating Date-Partitioned Tables](https://cloud.google.com/bigquery/docs/creating-partitioned-tables).\n\nTo load into a partition, specify the `table` parameter with a partition decorator:\n\n```yaml\nout:\n  type: bigquery\n  table: table_name$20160929\n```\n\nYou may also configure the `time_partitioning` parameter together with it:\n\n```yaml\nout:\n  type: bigquery\n  table: table_name$20160929\n  time_partitioning:\n    type: DAY\n    expiration_ms: 259200000\n```\n\nYou can also create a column-based partitioned table:\n\n```yaml\nout:\n  type: bigquery\n  mode: replace\n  table: table_name\n  time_partitioning:\n    type: DAY\n    field: timestamp\n```\n\nNote that `time_partitioning.field` must be a top-level `DATE` or `TIMESTAMP` column.\n\nUse [Tables: 
patch](https://cloud.google.com/bigquery/docs/reference/v2/tables/patch) API to update the schema of a partitioned table; embulk-output-bigquery itself does not support this.\nNote that only adding new columns and relaxing `REQUIRED` columns to `NULLABLE` are currently supported. Deleting and renaming columns are not supported.\n\nMEMO: [jobs#configuration.load.schemaUpdateOptions](https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.schemaUpdateOptions) is available\nto update the schema of the destination table as a side effect of the load job, but it is not available for copy jobs.\nThus, it is not suitable for the idempotent modes of embulk-output-bigquery: `append`, `replace`, and `replace_backup`.\n\n## Development\n\n### Run example:\n\nPrepare a json\\_keyfile at example/your-project-000.json, then run:\n\n```\n$ embulk bundle install --path vendor/bundle\n$ embulk run -X page_size=1 -b . -l trace example/example.yml\n```\n\n### Run test:\n\nDownload embulk as a file with the `.jar` extension:\n\n```\n$ curl -o embulk.jar --create-dirs -L \"http://dl.embulk.org/embulk-latest.jar\"\n$ chmod a+x embulk.jar\n```\n\nCheck the JRUBY\\_VERSION and Bundler::VERSION bundled in embulk.jar:\n\n```\n$ echo JRUBY_VERSION | ./embulk.jar irb\n2019-08-10 00:59:11.866 +0900: Embulk v0.9.17\nSwitch to inspect mode.\nJRUBY_VERSION\n\"X.X.X.X\"\n\n$ echo \"require 'bundler'; Bundler::VERSION\" | ./embulk.jar irb\n2019-08-10 01:59:10.460 +0900: Embulk v0.9.17\nSwitch to inspect mode.\nrequire 'bundler'; Bundler::VERSION\n\"Y.Y.Y\"\n```\n\nInstall the same versions of JRuby (replace X.X.X.X with the version shown above) and Bundler:\n\n```\n$ rbenv install jruby-X.X.X.X\n$ rbenv local jruby-X.X.X.X\n$ gem install bundler -v Y.Y.Y\n```\n\nInstall dependencies (NOTE: use the Bundler included in embulk.jar; otherwise, `gem 'embulk'` is not found):\n\n```\n$ ./embulk.jar bundle install --path vendor/bundle\n```\n\nRun tests with `env RUBYOPT=\"-r 
./embulk.jar\"`:\n\n```\n$ bundle exec env RUBYOPT=\"-r ./embulk.jar\" rake test\n```\n\nTo run tests that actually connect to BigQuery, such as test/test\\_bigquery\\_client.rb,\nprepare a json\\_keyfile at example/your-project-000.json, then run:\n\n```\n$ bundle exec env RUBYOPT=\"-r ./embulk.jar\" ruby test/test_bigquery_client.rb\n$ bundle exec env RUBYOPT=\"-r ./embulk.jar\" ruby test/test_example.rb\n```\n\n### Release gem:\n\nUpdate the version in the gemspec and write CHANGELOG.md. Then run:\n\n```\n$ bundle exec rake release\n```\n\n## ChangeLog\n\n[CHANGELOG.md](CHANGELOG.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-output-bigquery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-output-bigquery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-output-bigquery/lists"}