{"id":47928188,"url":"https://github.com/vinted/flink-big-query-connector","last_synced_at":"2026-04-04T07:01:03.807Z","repository":{"id":183461278,"uuid":"670183800","full_name":"vinted/flink-big-query-connector","owner":"vinted","description":"Flink connector for BigQuery","archived":false,"fork":false,"pushed_at":"2023-11-29T11:12:52.000Z","size":159,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2023-11-29T16:54:49.519Z","etag":null,"topics":["bigquery","flink","flink-connector","flink-connector-bigquery","streaming"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vinted.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-07-24T13:27:10.000Z","updated_at":"2024-03-07T15:20:16.369Z","dependencies_parsed_at":"2023-10-04T15:12:17.330Z","dependency_job_id":"f2eb7a05-5fa9-416d-9e89-6116f70f307b","html_url":"https://github.com/vinted/flink-big-query-connector","commit_stats":null,"previous_names":["vinted/flink-big-query-connector"],"tags_count":14,"template":null,"template_full_name":null,"purl":"pkg:github/vinted/flink-big-query-connector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinted%2Fflink-big-query-connector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinted%2Fflink-big-query-connector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinted%2Fflink-big-query-connector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/vinted%2Fflink-big-query-connector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vinted","download_url":"https://codeload.github.com/vinted/flink-big-query-connector/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinted%2Fflink-big-query-connector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31390695,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T04:26:24.776Z","status":"ssl_error","status_checked_at":"2026-04-04T04:23:34.147Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","flink","flink-connector","flink-connector-bigquery","streaming"],"created_at":"2026-04-04T07:00:53.222Z","updated_at":"2026-04-04T07:01:03.789Z","avatar_url":"https://github.com/vinted.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Flink BigQuery Connector ![Build](https://github.com/vinted/flink-big-query-connector/actions/workflows/gradle.yml/badge.svg) [![](https://jitpack.io/v/com.vinted/flink-big-query-connector.svg)](https://jitpack.io/#com.vinted/flink-big-query-connector)\n\nThis project provides a BigQuery sink that allows writing data with exactly-once or at-least-once guarantees.\n\n## Usage\n\nThere are builder classes to simplify constructing a BigQuery sink. 
The code snippet below shows an example of building a BigQuery sink in Java:\n\n```java\nvar credentials = new JsonCredentialsProvider(\"key\");\n\nvar clientProvider = new BigQueryProtoClientProvider\u003cString\u003e(credentials,\n    WriterSettings.newBuilder()\n                 .build()\n);\n\nvar bigQuerySink = BigQueryStreamSink.\u003cString\u003enewBuilder()\n    .withClientProvider(clientProvider)\n    .withDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)\n    .withRowValueSerializer(new NoOpRowSerializer\u003c\u003e())\n    .build();\n```\n\nThe async connector provides at-least-once delivery:\n\n```java\nvar credentials = new JsonCredentialsProvider(\"key\");\n\nvar clientProvider = new AsyncClientProvider\u003cString\u003e(credentials,\n    WriterSettings.newBuilder()\n                 .build()\n);\n\nvar sink = AsyncBigQuerySink.builder()\n        .setRowSerializer(new NoOpRowSerializer\u003c\u003e())\n        .setClientProvider(clientProvider)\n        .setMaxBatchSize(30)\n        .setMaxBufferedRequests(10)\n        .setMaxBatchSizeInBytes(10000)\n        .setMaxInFlightRequests(4)\n        .setMaxRecordSizeInBytes(10000)\n        .build();\n```\n\nThe sink takes in a batch of records. Batching happens outside the sink by opening a window. 
Batched records need to implement the BigQueryRecord interface.\n\n```java\nvar trigger = BatchTrigger.\u003cRecord, GlobalWindow\u003ebuilder()\n    .withCount(100)\n    .withTimeout(Duration.ofSeconds(1))\n    .withSizeInMb(1)\n    .withResetTimerOnNewRecord(true)\n    .build();\n\nvar processor = new BigQueryStreamProcessor()\n    .withDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)\n    .build();\n\nsource.keyBy(s -\u003e s)\n    .window(GlobalWindows.create())\n    .trigger(trigger)\n    .process(processor);\n```\n\nTo write to BigQuery, you need to:\n\n- Define credentials\n- Create a client provider\n- Batch records\n- Create a value serializer\n- Sink to BigQuery\n\n# Credentials\n\nCredentials can be supplied in two ways:\n\n- Loading from a file\n\n```java\nnew FileCredentialsProvider(\"/path/to/file\")\n```\n\n- Passing as a JSON string\n\n```java\nnew JsonCredentialsProvider(\"key\")\n```\n\n# Types of Streams\n\nBigQuery supports two data formats: JSON and proto. When creating a stream, choose between them by creating the appropriate client provider and using the builder methods.\n\n- JSON\n\n```java\nvar clientProvider = new BigQueryJsonClientProvider\u003cString\u003e(credentials,\n    WriterSettings.newBuilder()\n                 .build()\n);\n\nvar bigQuerySink = BigQueryStreamSink.\u003cString\u003enewBuilder();\n```\n\n- Proto\n\n```java\nvar clientProvider = new BigQueryProtoClientProvider\u003cString\u003e(credentials,\n    WriterSettings.newBuilder()\n                 .build()\n);\n\nvar bigQuerySink = BigQueryStreamSink.\u003cString\u003enewBuilder();\n```\n\n# Exactly once\n\nExactly-once delivery uses a [buffered stream](https://cloud.google.com/bigquery/docs/write-api#buffered_type), managed by the BigQueryStreamProcessor, to assign and process data batches. If a stream is inactive or closed, a new stream is created automatically. 
The BigQuery sink writer appends and flushes data to the latest offset upon checkpoint commit.\n\n# At least once\n\nData is written to the [default stream](https://cloud.google.com/bigquery/docs/write-api#default_stream) and handled by the BigQueryStreamProcessor, which batches and sends rows to the sink for processing.\n\n# Serializers\n\nFor the proto stream, you need to implement `ProtoValueSerializer`, and for the JSON stream, you need to implement `JsonRowValueSerializer`.\n\n# Metrics\n\n\u003ctable class=\"table table-bordered\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth class=\"text-left\" style=\"width: 15%\"\u003eScope\u003c/th\u003e\n      \u003cth class=\"text-left\" style=\"width: 18%\"\u003eMetrics\u003c/th\u003e\n      \u003cth class=\"text-left\" style=\"width: 39%\"\u003eDescription\u003c/th\u003e\n      \u003cth class=\"text-left\" style=\"width: 10%\"\u003eType\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n        \u003cth rowspan=\"8\"\u003eStream\u003c/th\u003e\n        \u003ctd\u003estream_offset\u003c/td\u003e\n        \u003ctd\u003eCurrent offset for the stream. 
With at-least-once delivery, the offset is always 0\u003c/td\u003e\n        \u003ctd\u003eGauge\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003ebatch_count\u003c/td\u003e\n        \u003ctd\u003eNumber of records in the appended batch\u003c/td\u003e\n        \u003ctd\u003eGauge\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003ebatch_size_mb\u003c/td\u003e\n        \u003ctd\u003eAppended batch size in MB\u003c/td\u003e\n        \u003ctd\u003eGauge\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003esplit_batch_count\u003c/td\u003e\n        \u003ctd\u003eNumber of times the batch hit the BigQuery limit and was split into two parts\u003c/td\u003e\n        \u003ctd\u003eGauge\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinted%2Fflink-big-query-connector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvinted%2Fflink-big-query-connector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinted%2Fflink-big-query-connector/lists"}