{"id":13414004,"url":"https://github.com/OTA-Insight/bqwriter","last_synced_at":"2025-03-14T20:30:57.468Z","repository":{"id":40563137,"uuid":"416359424","full_name":"OTA-Insight/bqwriter","owner":"OTA-Insight","description":"Stream data into Google BigQuery concurrently using InsertAll() or BQ Storage.","archived":false,"fork":false,"pushed_at":"2023-09-11T20:32:08.000Z","size":347,"stargazers_count":15,"open_issues_count":1,"forks_count":2,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-07-31T20:53:18.714Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OTA-Insight.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS"}},"created_at":"2021-10-12T13:58:18.000Z","updated_at":"2024-03-07T16:06:27.000Z","dependencies_parsed_at":"2024-01-09T08:11:54.706Z","dependency_job_id":null,"html_url":"https://github.com/OTA-Insight/bqwriter","commit_stats":null,"previous_names":[],"tags_count":41,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OTA-Insight%2Fbqwriter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OTA-Insight%2Fbqwriter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OTA-Insight%2Fbqwriter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OTA-Insight%2Fbqwriter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OTA-Insight","download_url":"https://codeload.github.com/OTA-Insight/bqwriter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243642055,"owners_count":20323953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:01:54.966Z","updated_at":"2025-03-14T20:30:56.921Z","avatar_url":"https://github.com/OTA-Insight.png","language":"Go","readme":"# bqwriter [![Go Workflow Status](https://github.com/OTA-Insight/bqwriter/workflows/Go/badge.svg)](https://github.com/OTA-Insight/bqwriter/actions/workflows/go.yml)\u0026nbsp;[![GoDoc](https://godoc.org/github.com/OTA-Insight/bqwriter?status.svg)](https://godoc.org/github.com/OTA-Insight/bqwriter)\u0026nbsp;[![Go Report Card](https://goreportcard.com/badge/github.com/OTA-Insight/bqwriter)](https://goreportcard.com/report/github.com/OTA-Insight/bqwriter)\u0026nbsp;[![license](https://img.shields.io/github/license/OTA-Insight/bqwriter.svg)](https://github.com/OTA-Insight/bqwriter/blob/master/LICENSE.txt)\u0026nbsp;[![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/OTA-Insight/bqwriter?include_prereleases)](https://github.com/OTA-Insight/bqwriter/releases)\u0026nbsp;[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)  \n\n\nA Go package to write data into [Google BigQuery](https://cloud.google.com/bigquery/)\nconcurrently with a high throughput. By default [the InsertAll() API](https://cloud.google.com/bigquery/streaming-data-into-bigquery)\nis used (REST API under the hood), but you can configure to use [the Storage Write API](https://cloud.google.com/bigquery/docs/write-api) (GRPC under the hood) as well.\n\nThe InsertAll API is easier to configure and can work pretty much out of the box without any configuration.\nIt is recommended to use the Storage API as it is faster and comes with a lower cost. The latter does however\nrequire a bit more configuration on your side, including a Proto schema file as well.\nSee [the Storage example below](#Storage-Streamer) on how to do this.\n\nA third API is available as well, and is a bit different than the other ones. A streamer using the batch API\nexpects the data to be written be an io.Reader of which encoded rows can be read from, in order to be batch\nloaded into BigQuery. As such its purpose is different from the other 2 clients, and is meant more in an\nenvironment where a lot of data has to be loaded at once into big query, rather than being part of a constant\nhigh-speed high-throughput environment.\n\nSee https://cloud.google.com/bigquery/docs/batch-loading-data for more information about the Batch API,\nand see [the Batch example below](#Batch-Streamer) on how to do create and use the Batch-driven Streamer.\n\nNote that (gcloud) [Authorization](#Authorization) is implemented in the most basic manner.\n[Write Error Handling](#write-error-handling) is currently not possible either, making the Streamer\na fire-and-forget BQ writer. Please read the sections on these to topics for more information\nand please consult he [Contributing section](#Contributing) section explains how you can actively\nhelp to get this supported if desired.\n\n## Install\n\n```go\nimport \"github.com/OTA-Insight/bqwriter\"\n```\n\nTo install the packages on your system, do not clone the repo. Instead:\n\n1. Change to your project directory:\n\n```bash\ncd /path/to/my/project\n```\n\n2. Get the package using the official Go tooling, which will also add it to your `Go.mod` file for you:\n\n```bash\ngo get github.com/OTA-Insight/bqwriter\n```\n\nNOTE: This package is under development, and may occasionally make backwards-incompatible changes.\n\n## Go Versions Supported\n\nWe currently support Go versions 1.17 and newer.\n\n## Examples\n\nIn this section you'll find some quick examples to help you get started\ntogether with the official documentation which you can find at \u003chttps://pkg.go.dev/github.com/OTA-Insight/bqwriter\u003e.\n\nThe `Streamer` client is safe for concurrent use and can be used from as many go routines as you wish.\nNo external locking or other concurrency-safe mechanism is required from your side. To keep these examples\nas small as possible however they are written in a linear synchronous fashion, but it is encouraged to use the\n`Streamer` client from multiple go routines, in order to be able to write rows at a sufficiently high throughput.\n\nNote that for the Batch-driven `Streamer` it is not abnormal to force it to run with a single worker routine.\nThe batch delay can also be disabled for it as no flushing is required for it anyhow.\n\nPlease also note that errors are not handled gracefully in these examples as ot keep them small and narrow in scope.\n\nFor extra reference you can also find some more examples, be it less pragmatic,\nin the [./internal/test/integration](./internal/test/integration) directory.\n\n### Basic InsertAll Streamer\n\n```go\nimport (\n    \"context\"\n\n    \"github.com/OTA-Insight/bqwriter\"\n)\n\n// TODO: use more specific context\nctx := context.Background()\n\n// create a BQ (stream) writer thread-safe client,\nbqWriter, err := bqwriter.NewStreamer(\n    ctx,\n    \"my-gcloud-project\",\n    \"my-bq-dataset\",\n    \"my-bq-table\",\n    nil, // use default config\n)\nif err != nil {\n    // TODO: handle error gracefully\n    panic(err)\n}\n// do not forget to close, to close all background resources opened\n// when creating the BQ (stream) writer client\ndefer bqWriter.Close()\n\n// You can now start writing data to your BQ table\nbqWriter.Write(\u0026myRow{Timestamp: time.UTC().Now(), Username: \"test\"})\n// NOTE: only write one row at a time using `(*Streamer).Write`,\n// multiple rows can be written using one `Write` call per row.\n```\n\nYou build a `Streamer` client using optionally the `StreamerConfig` as you can see in the above example.\nThe entire config is optional and has sane defaults, but note that there is a lot you can configure in this config prior to actually building the streamer. Please consult the \u003chttps://pkg.go.dev/github.com/OTA-Insight/bqwriter#StreamerConfig\u003e for more information.\n\nThe `myRow` structure used in this example is one way to pass in the information\nof a single row to the `(*Streamer).Write` method. This structure implements the\n[`ValueSaver`](https://pkg.go.dev/cloud.google.com/go/bigquery#ValueSaver) interface.\nAn example of this:\n\n```go\nimport (\n\t\"cloud.google.com/go/bigquery\"\n\t\"cloud.google.com/go/civil\"\n)\n\ntype myRow struct {\n\tTimestamp time.Time\n\tUsername  string\n}\n\nfunc (mr *myRow) Save() (row map[string]bigquery.Value, insertID string, err error) {\n\treturn map[string]bigquery.Value{\n\t\t\"timestamp\": civil.DateTimeOf(rr.Timestamp),\n\t\t\"username\":  mr.Username,\n\t}, \"\", nil\n}\n```\n\nYou can also pass in a `struct` directly and the schema will be inferred automatically\nbased on its public items. This flexibility has a runtime cost by having to apply reflection.\n\nA raw `struct` can also be stored by using the [`StructSaver`](https://pkg.go.dev/cloud.google.com/go/bigquery#StructSaver) interface,\nin which case you get the benefit of being able to write any kind of `struct` while at the same time being able to pass\nin the to be used scheme already such that it doesn't have to be inferred and giving you exact controls for each field on top of that.\n\nIf you have the choice however than we do recommend to implement the `ValueSaver` for your row `struct` as this gives you the best of both worlds,\nwhile at the same time also giving you the easy built-in ability to define a unique `insertID` per row which will help prevent potential duplicates\nthat can otherwise happen while retrying to write rows which have failed temporarily.\n\n### Custom InsertAll Streamer\n\nUsing the same `myRow` structure from previous example,\nhere is how we create a `Streamer` client with a more\ncustom configuration:\n\n```go\nimport (\n    \"context\"\n\n    \"github.com/OTA-Insight/bqwriter\"\n)\n\n// TODO: use more specific context\nctx := context.Background()\n\n// create a BQ (stream) writer thread-safe client,\nbqWriter, err := bqwriter.NewStreamer(\n    ctx,\n    \"my-gcloud-project\",\n    \"my-bq-dataset\",\n    \"my-bq-table\",\n    \u0026bqwriter.StreamerConfig{\n        // use 5 background worker threads\n        WorkerCount: 5,\n        // ignore errors for invalid/unknown rows/values,\n        // by default these errors make a write fail\n        InsertAllClient: \u0026bqwriter.InsertAllClientConfig{\n             // Write rows fail for invalid/unknown rows/values errors,\n             // rather than ignoring these errors and skipping the faulty rows/values.\n             // These errors are logged using the configured logger,\n             // and the faulty (batched) rows are dropped silently.\n            FailOnInvalidRows:    true,\n            FailForUnknownValues: true, \n        },\n    },\n)\nif err != nil {\n    // TODO: handle error gracefully\n    panic(err)\n}\n// do not forget to close, to close all background resources opened\n// when creating the BQ (stream) writer client\ndefer bqWriter.Close()\n\n// You can now start writing data to your BQ table\nbqWriter.Write(\u0026myRow{Timestamp: time.UTC().Now(), Username: \"test\"})\n// NOTE: only write one row at a time using `(*Streamer).Write`,\n// multiple rows can be written using one `Write` call per row.\n```\n\n### Storage Streamer\n\nIf you can you should use the StorageStreamer. The InsertAll API is now considered legacy\nand is more expensive and less efficient to use compared to the storage API.\n\nHere follows an example on how you can create such a storage API driven BigQuery streamer.\n\n```go\nimport (\n    \"context\"\n\n    \"github.com/OTA-Insight/bqwriter\"\n    \"google.golang.org/protobuf/reflect/protodesc\"\n\n    // TODO: define actual path to pre-compiled protobuf Go code\n    \"path/to/my/proto/package/protodata\"\n)\n\n// TODO: use more specific context\nctx := context.Background()\n\n// create proto descriptor to use for storage client\nprotoDescriptor := protodesc.ToDescriptorProto((\u0026protodata.MyCustomProtoMessage{}).ProtoReflect().Descriptor())\n// NOTE:\n//  - storage writer API expects proto2 semantics, proto3 shouldn't be used (yet);\n//  - the [normalizeDescriptor](https://pkg.go.dev/cloud.google.com/go/bigquery/storage/managedwriter/adapt#NormalizeDescriptor)\n//    should be used to get a descriptor with nested types in order to have it work nicely with nested types;\n//    - this means the line above would change to:\n//      `protoDescriptor := adapt.NormalizeDescriptor((\u0026protodata.MyCustomProtoMessage{}).ProtoReflect().Descriptor())`,\n//      which does require the `\"cloud.google.com/go/bigquery/storage/managedwriter/adapt\"` package to be imported;\n//  - known types cannot be used, you'll need to use type conversions instead\n//    https://cloud.google.com/bigquery/docs/write-api#data_type_conversions,\n//    e.g. int64 (micro epoch) instead of the known Google Timestamp proto type;\n\n// create a BQ (stream) writer thread-safe client,\nbqWriter, err := bqwriter.NewStreamer(\n    ctx,\n    \"my-gcloud-project\",\n    \"my-bq-dataset\",\n    \"my-bq-table\",\n    \u0026bqwriter.StreamerConfig{\n        // use 5 background worker threads\n        WorkerCount: 5,\n        // create the streamer using a Protobuf message encoder for the data\n        StorageClient: \u0026bqwriter.StorageClientConfig{\n            ProtobufDescriptor: protoDescriptor,\n        },\n    },\n)\n)\nif err != nil {\n    // TODO: handle error gracefully\n    panic(err)\n}\n// do not forget to close, to close all background resources opened\n// when creating the BQ (stream) writer client\ndefer bqWriter.Close()\n\n// TOOD: populate fields of the proto message\nmsg := new(protodata.MyCustomProtoMessage)\n\n// You can now start writing data to your BQ table\nbqWriter.Write(msg)\n// NOTE: only write one row at a time using `(*Streamer).Write`,\n// multiple rows can be written using one `Write` call per row.\n```\n\nYou must define the `StorageClientConfig`, as demonstrated in previous example,\nin order to be create a Streamer client using the Storage API.\nNote that you cannot create a blank `StorageClientConfig` or any kind of default,\nas you are required to configure it with either a `bigquery.Schema` or a `descriptorpb.DescriptorProto`,\nwith the latter being preferred and used of the first.\n\nThe schema or Protobuf descriptor are used to be able to encode the data prior to writing\nin the correct format as Protobuf encoded binary data.\n\n- `BigQuerySchema` can be used in order to use a data encoder for the StorageClient\n  based on a dynamically defined BigQuery schema in order to be able to encode any struct,\n  JsonMarshaler, Json-encoded byte slice, Stringer (text proto) or string (also text proto)\n  as a valid protobuf message based on the given BigQuery Schema;\n- `ProtobufDescriptor` can be used in order to use a data encoder for the StorageClient\n  based on a pre-compiled protobuf schema in order to be able to encode any proto Message\n  adhering to this descriptor;\n\n`ProtobufDescriptor` is preferred as you might have to pay a performance penalty\nshould you want to use the `BigQuerySchema` instead.\n\nYou can check out [./internal/test/integration/temporary_data_proto2.proto](./internal/test/integration/temporary_data_proto2.proto) for an example of a proto message that can be sent over the wire. The BigQuery\nschema for that definition can be found in [./internal/test/integration/tmpdata.go](./internal/test/integration/tmpdata.go). Finally, you can get inspired by [./internal/test/integration/generate.go](./internal/test/integration/generate.go) to know how to generate the required Go code in order for you to configure your streamer with the right proto descriptor and being able to send rows of data using your proto definitions.\n\n### Batch Streamer\n\nThe batch streamer can be used if you want to upload a big dataset of data to bigquery without any additional cost.\n\nHere follows an example on how you can create such a batch API driven BigQuery client.\n\n```go\nimport (\n    \"bytes\"\n    \"context\"\n    \"encoding/json\"\n    \"path/filepath\"\n\n    \"github.com/OTA-Insight/bqwriter\"\n\n    \"cloud.google.com/go/bigquery\"\n)\n\nfunc main() {\n    ctx := context.Background()\n\t\n    // By using new(bqwriter.BatchClientConfig) we will create a config with bigquery.JSON as default format\n    // And the schema will be autodetected via the data.\n    // Possible options are: \n    // - BigQuerySchema: Schema to use to upload to bigquery.\n    // - SourceFormat: Format of the data we want to send.\n    // - FailForUnknownValues: will treat records that have unknown values as invalid records.\n    // - WriteDisposition: Defines what the write disposition should be to the bigquery table.\n    batchConfig := new(bqwriter.BatchClientConfig)\n\t\n    // create a BQ (stream) writer thread-safe client.\n    bqWriter, err := bqwriter.NewStreamer(\n        ctx,\n        \"my-gcloud-project\",\n        \"my-bq-dataset\",\n        \"my-bq-table\",\n        \u0026bqwriter.StreamerConfig{\n            BatchClient: batchConfig\n        },\n    )\n\n    if err != nil {\n        // TODO: handle error gracefully\n        panic(err)\n    }\n\t\t\n    // do not forget to close, to close all background resources opened\n    // when creating the BQ (stream) writer client\n    defer bqWriter.Close()\n\n    // a batch-driven BQ Streamer expects an io.Reader,\n    // the source of the data isn't strictly defined as long as the source\n    // format is supported. Usually you would fetch the data from large files\n    // as this is where the batch client really shines\n    files, err := filepath.Glob(\"/usr/joe/my/data/path/exported_data_*.json\")\n    if err != nil {\n        // TODO: handle error gracefully\n        panic(err)\n    }\n    for _, fp := range files {\n        file, err := os.Open(fp)\n        if err != nil {\n            // TODO: handle error gracefully\n            panic(err)\n        }\n\n        // Write the data to bigquery.\n        err := bqWriter.Write(file)\n        if err != nil {\n            // TODO: handle error gracefully\n            panic(err)\n        }\n    }\n}\n```\n\nYou must define the `BatchClientConfig`, as demonstrated in previous example,\nin order to create a Batch client.\n\nNote that you cannot create a blank `BatchClientConfig` or any kind of default,\nas you are required to configure it with at least a `SourceFormat`.\n\nWhen using the Json format make sure the casing of your fields matches exactly the\nfields defined in your BigQuery schema of the desired target table. While field names\nare normally considered case insensitive, they do seem to cause \"duplicate field\" issues\nas part of the batch load io.Reader decode process such as the following ones:\n\n```\nJob returned an error status {Location: \"query\"; Message: \"Duplicate(Case Insensitive) field names: value and Value. Table: tmp_2e6895b9_b44b_4b5c_9941_def9a10e85d5_source\"; Reason: \"invalidQuery\"}\n```\n\nFix the casing of your json definition and this error should go away.\n\n**BatchClientConfig options**:\n\n- `BigQuerySchema` can be used in order to use a data encoder for the batchClient\n  based on a dynamically defined BigQuery schema in order to be able to encode any struct,\n  JsonMarshaler, Json-encoded byte slice, Stringer (text proto) or string (also text proto)\n  as a valid protobuf message based on the given BigQuery Schema.\n  \n  The `BigQuerySchema` is required for all `SourceFormat` except for `bigquery.CSV` and `bigquery.JSON` as these\n  2 formats will auto detect the schema via the content.\n\n- `SourceFormat` is used to define the format that the data is that we will send.\n  Possible options are:\n  - `bigquery.CSV`\n  - `bigquery.Avro`\n  - `bigquery.JSON`\n  - `bigquery.Parquet`\n  - `bigquery.ORC`\n\n- `FailForUnknownValues` causes records containing such values\n  to be treated as invalid records.\n  \n  Defaults to false, making it ignore any invalid values, silently ignoring these errors,\n  and publishing the rows with the unknown values removed from them.\n\n- `WriteDisposition` can be used to define what the write disposition should be to the bigquery table.\n  Possible options are:\n    - `bigquery.WriteAppend`\n    - `bigquery.WriteTruncate`\n    - `bigquery.WriteEmpty`\n  \n  Defaults to `bigquery.WriteAppend`, which will append the data to the table.\n\n#### Future improvements\n\nCurrently, the package does not support any additional options that the different `SourceFormat` could have, feel free to\nopen a feature request to add support for these.\n\n## Authorization\n\nThe streamer client will use [Google Application Default Credentials](https://developers.google.com/identity/protocols/application-default-credentials) for authorization credentials used in calling the API endpoints.\nThis will allow your application to run in many environments without requiring explicit configuration.\n\nPlease open an issue should you require more advanced forms of authorization. The issue should come with an example,\na clear statement of intention and motivation on why this is a useful contribution to this package. Even if you wish\nto contribute to this project by implementing this patch yourself, it is none the less best to create an issue prior to it,\nsuch that we can all be aligned on the specifics. Good communication is key here.\n\nIt was a choice to not support these advanced authorization methods for now. The reasons being that the package\nauthors didn't have a need for it and it allowed to keep the API as simple and small as possible. There however some\nadvanced authorizations still possible:\n\n- Authorize using [a custom Json key file path](https://cloud.google.com/iam/docs/creating-managing-service-account-keys);\n- Authorize with more control by using the [`https://pkg.go.dev/golang.org/x/oauth2`](https://pkg.go.dev/golang.org/x/oauth2) package\n  to create an `oauth2.TokenSource`;\n\nTo conclude. We currently do not support advanced ways for Authorization, but we're open to include support for these,\nif there is sufficient interest for it. The [Contributing section](#Contributing) section explains how you can actively\nhelp to get this supported if desired.\n\n## Instrumentation\n\nWe currently support the ability to implement your logger which can be used instead of the standard logger which prints\nto STDERR. It is used for debug statements as well as unhandled errors. Debug statements aren't used everywhere, any unhandled error that isn't propagated is logged using the used logger.\n\n\u003e You can find the interface you would need to implement to support your own Logger at\n\u003e \u003chttps://godoc.org/github.com/OTA-Insight/bqwriter/log#Logger\u003e.\n\nThe internal client of the Storage-API driven Streamer also provides the tracking of stats regarding its GRPC functionality.\nThis is implemented and utilized via the \u003chttps://github.com/census-instrumentation/opencensus-go\u003e package.\n\nIf you use `OpenCensus` for your own project it will work out of the box.\n\nIn case your project uses another data ingestion system you can none the less get these statistics within your system\nof choice by registering an exporter which exports the stats to the system used by your project. Please see\nhttps://github.com/census-instrumentation/opencensus-go#views as a starting point on how to register a view yourself.\nOpenCensus comes with a bunch of exporters already, all listed in https://github.com/census-instrumentation/opencensus-go#exporters.\nYou can however also implement your own one.\n\nThe official google cloud API will most likely switch to OpenCensus's successor OpenTelemetry once the latter becomes stable.\nFor now however it is OpenCensus that is used.\n\nNote that this extra form of instrumentation is only applicable to a Streamer using the Storage API. The InsertAll-\nand Batch-driven Streamers do not provide any form of stats tracking.\n\nPlease see also \u003chttps://github.com/googleapis/google-cloud-go/issues/5100#issuecomment-966461501\u003e for more information\non how you can hook up a built-in or your own system into the tracking system for any storage API driven streamer.\n\n## Write Error handling\n\nThe current version of the bqwriter is written with a fire-and-forget philosophy in mind.\nActual write errors occur on async worker goroutines and are only logged. Already today,\nyou can plugin your own logger implementation in order to get these logs in your alerting systems.\n\nPlease file a detailed feature request with a real use case as part of the verbose description\nshould you be in need of being able to handle errors.\n\nOne possible approach would be to allow a channel or callback to be defined in the `StreamerConfig`\nwhich would get a specific data structure for any write failure. This could contain the data which failed to write,\nany kind of offset/insertID as well as the actual error which occurred. The details would however to be worked out\nas part of the proposal.\n\nBesides a valid use case to motivate this proposal we would also need to think carefully about how\nwe can make the returned errors actionable. Returning it only to allow the user to log/print it is a bit silly,\nas that is anyway already the behavior today. The real value from this proposal would come from the fact that\nthe data can be retried to be inserted (if it makes sense within its context, as defined by at the very least the error type),\nand done so in an easy and safe manner, and with actual aid to help prevent duplicates. The Google Cloud API provides for this purpose\nthe offsets and insertID's, but the question is how we would integrate this and also to double check that this really does prevent\nduplicates or not.\n\nThe [Contributing section](#Contributing) section explains how you can actively\nhelp to get this supported if desired.\n\n## Contributing\n\nContributions are welcome. Please, see the [CONTRIBUTING](/CONTRIBUTING.md) document for details.\n\nPlease note that this project is released with a Contributor Code of Conduct.\nBy participating in this project you agree to abide by its terms.\nSee [Contributor Code of Conduct](/CONTRIBUTING.md#contributor-code-of-conduct) for more information.\n\n## Developer Instructions\n\nAs a developer you need to agree to the\n[Contributor Code of Conduct](/CONTRIBUTING.md#contributor-code-of-conduct) for more information.\nSee [the previous Contributing section](#contributing) for more info in regards of contributing to this project.\nIn this section we'll also assume that you've read \u0026 understood the [Install](#install) and [Examples](#examples) sections.\n\nPlease take your time and complete the forms with sufficient details when filing issues and proposals. Pull requests (PRs) should only be created once a related issue/proposal has been created and agreed upon. Also take your time and complete the PR description with sufficient detail when you're ready to create a PR.\n\n### Tests\n\nUsing [GitHub actions](./.github/workflows/go.yml) this codebase is being tested automatically for each commit/PR.\n- `$ go test -v ./...`:\n  - run against the Min and Max Go versions\n  - all tests are expected to pass\n- `$ golangci-lint run`:\n  - run against latest Go version only\n  - is expected to generate no warnings or errors of any kind\n\nFor each contribution that you do you'll have to make sure that all these tests pass.\nPlease do not modify any existing tests unless required because some kind of breaking change. If you do have to modify (or delete) existing tests than please document this in full detail with proper motivation as part of your PR description. Ensure your added and modified code is also sufficiently tested and covered.\n\nNext to this, the maintainers of this repository (see [CODEOWNERS](CODEOWNERS)) also run [integration tests](./internal/test/integration) against a real production-like BigQuery table within the actual Google Cloud infrastructure. These test the streamer for all implementations: `insertAll`, `storage`, `storage-json` (a regular `storage` client but using a bigQuery.Schema as to be able to insert JsonMarshalled data) and `batch`.\n\nYou can run these tests yourself as well using the following internal cmd tool:\n\n```batch\n$ go run ./internal/test/integration --help\nUsage of ./internal/test/integration/tmp/exe:\n  -dataset string\n        BigQuery dataset to write data to (default \"benchmarks_bqwriter\")\n  -debug\n        enable to show debug logs\n  -iterations int\n        how many values to write to each of the different streamer tests (default 100)\n  -project string\n        BigQuery project to write data to (default \"oi-bigquery\")\n  -streamers string\n        csv of streamers to test, one or multiple of following options: insertall, storage, storage-json, batch\n  -table string\n        BigQuery table to write data to (default \"tmp\")\n  -workers int\n        how many workers to use to run tests in parallel (default 12)\n```\n\nMost likely you'll need to pass the `--project`, `--dataset` and `--table` flag to use \na BigQuery table for which you have sufficient permissions and that is used\nonly for temporary testing purposes such as these.\n\nRunning these tests yourself is not required as part of a contribution,\nbut it can be run by you in case you are interested in doing so for whatever reason.\n\n## FAQ\n\n\u003e My insertAll streamer seems to insert 1 row per request instead of batching, how is this possible?\n\nMake sure your configuration matches the needs of your bandwidth.\nDo not use more workers than you need for example (`WorkerCount`). Also make\nsure the `MaxBatchDelay` and `BatchSize` values are configured appropriately.\n","funding_links":[],"categories":["Third-party APIs","Utility","第三方api"],"sub_categories":["Utility/Miscellaneous","实用程序/Miscellaneous","Fail injection"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOTA-Insight%2Fbqwriter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOTA-Insight%2Fbqwriter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOTA-Insight%2Fbqwriter/lists"}