{"id":20087230,"url":"https://github.com/conduitio-labs/conduit-connector-google-cloudstorage","last_synced_at":"2026-03-11T14:07:23.950Z","repository":{"id":58945993,"uuid":"506935603","full_name":"conduitio-labs/conduit-connector-google-cloudstorage","owner":"conduitio-labs","description":"Conduit connector for Google Cloud Storage","archived":false,"fork":false,"pushed_at":"2025-05-12T19:27:19.000Z","size":533,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-07-14T07:52:35.001Z","etag":null,"topics":["conduit","gcp","go","golang","google-cloud-storage"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/conduitio-labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-06-24T08:29:54.000Z","updated_at":"2025-05-12T19:27:22.000Z","dependencies_parsed_at":"2024-06-28T13:56:56.411Z","dependency_job_id":"42ee9e0c-31d6-43f9-a2b6-b1bd3df037a9","html_url":"https://github.com/conduitio-labs/conduit-connector-google-cloudstorage","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/conduitio-labs/conduit-connector-google-cloudstorage","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduitio-labs%2Fconduit-connector-google-cloudstorage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduitio-labs%2Fconduit-connector-google-cloudstorage/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduitio-labs%2Fconduit-connector-google-cloudstorage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduitio-labs%2Fconduit-connector-google-cloudstorage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/conduitio-labs","download_url":"https://codeload.github.com/conduitio-labs/conduit-connector-google-cloudstorage/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/conduitio-labs%2Fconduit-connector-google-cloudstorage/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266405405,"owners_count":23923536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-21T11:47:31.412Z","response_time":64,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conduit","gcp","go","golang","google-cloud-storage"],"created_at":"2024-11-13T16:04:35.089Z","updated_at":"2026-03-11T14:07:18.914Z","avatar_url":"https://github.com/conduitio-labs.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Conduit Connector Google Cloud Storage\n\n### General\n\nThe Google Cloud Storage connector is one of [Conduit](https://github.com/ConduitIO/conduit) plugins. 
It provides a GCS source connector.\n\n### How to build it\n\nRun `make`.\n\n### Testing\n\nRun `make test`, optionally with `GOTEST_FLAGS` set, to run all the tests. You must set the environment variables `GCP_ServiceAccount_Key`,\n`GCP_ProjectID` and `GCP_Bucket` before you run all the tests. If they are not set, the tests that use these variables will be ignored.\n\nScenarios that depend on a GCS error response that cannot be reproduced are not covered by the test cases.\n \n## GCS Source\n\nThe Google Cloud Storage Source Connector connects to a GCS bucket using the `serviceAccountKey` and `bucket` values from the configuration. `Configure` is called first to parse the configuration. \nAfter that, the\n`Open` method is called to make sure the bucket exists and to start reading from the provided position. If the bucket doesn't exist, or the permissions are insufficient, an error is returned.\n\n### Change Data Capture (CDC)\n\nThis connector implements CDC features for GCS by scanning the bucket for changes every\n`pollingPeriod` and detecting any change that happened after a certain timestamp. 
These changes (update, delete, insert)\nare then inserted into a buffer that is checked on each Read request.\n\n* To capture \"delete\" actions, versioning must be enabled on the GCS bucket.\n* \"insert\" and \"update\" actions are captured whether or not bucket versioning is enabled.\n\nIf an object has multiple versions in a bucket, only the live version is treated as a record.\n\n#### Position Handling\n\nThe position is built from the custom type below. It holds the key of the object that was last read (compared lexicographically when timestamps are equal), the timestamp of the event on that object, and the type of the reading mode.\n\n```\ntype Position struct {\n\tKey       string    `json:\"key\"`\n\tTimestamp time.Time `json:\"timestamp\"`\n\tType      Type      `json:\"type\"`\n}\n```\n\nThe connector goes through two reading modes.\n\n* Snapshot mode (value 0): loops through the GCS bucket and returns the objects that are already in it. The _Position Type_ during this mode is 0, which tells the connector which mode it is in and which object it last\n  read. The _Position Timestamp_ is used when switching to CDC mode: the iterator captures changes that\n  happened after it.\n\n* CDC mode (value 1): iterates through the GCS bucket every `pollingPeriod` and captures new actions made on the bucket.\n  The _Position Type_ during this mode is 1. 
This position is used to return only the\n  actions with a _Position Timestamp_ later than that of the last record returned. If the timestamps are equal, the decision is based on a lexicographical comparison of the object keys, which ensures that no duplicates are\n  produced.\n\n  CDC mode starts after the last snapshot item has been read.\n \n  CDC mode uses the default Google iterator provided by the Go SDK with a configurable `pollingPeriod`. It doesn't support notifications through Pub/Sub, because handling them in a separate service would be more flexible.\n\n### Record Keys\n\nThe GCS object key uniquely identifies an object in a Google Cloud Storage bucket, which is why the record key is the object key read from\nthe GCS bucket.\n\n\n### Configuration\n\nThe config passed to `Configure` can contain the following fields.\n\n| name                  | description                                                                            | required  | example                |\n|-----------------------|----------------------------------------------------------------------------------------|-----------|------------------------|\n| `serviceAccountKey`   | GCP service account key in JSON                                                        | yes       | `{\"key\":\"value\",....}` |\n| `bucket`              | the GCS bucket name                                                                    | yes       | `bucket_name`          |\n| `pollingPeriod`       | polling period for the CDC mode, formatted as a time.Duration string. default is \"1s\"  | no        | `2s`, `500ms`          |\n\nWhen testing using Swagger or any REST client, provide the JSON-stringified value in `serviceAccountKey`.\nExample: run the JavaScript command `console.log(JSON.stringify(JSON.stringify(key file content)))` to get the stringified value. 
This is needed because wrapping JSON in single quotes is not accepted in Swagger. Alternatively,\nrefer to the example in the table above.\n\n### Observations\n\nEverything in the bucket is an object. The GCS console distinguishes folders (object names ending with a slash)\nfrom files, but only for visual representation. In the back end, there is no metadata that identifies\nthis distinction, so the connector considers all objects in the bucket irrespective of object name and size.\n\n### Known Limitations\n\nIf a pipeline restarts during the snapshot, the connector starts scanning the objects again from the beginning of\nthe bucket, which could result in duplicate records.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconduitio-labs%2Fconduit-connector-google-cloudstorage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconduitio-labs%2Fconduit-connector-google-cloudstorage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconduitio-labs%2Fconduit-connector-google-cloudstorage/lists"}