{"id":15936260,"url":"https://github.com/syucream/embulk-input-pubsub","last_synced_at":"2026-06-08T16:05:04.871Z","repository":{"id":56844566,"uuid":"242377825","full_name":"syucream/embulk-input-pubsub","owner":"syucream","description":"Google Cloud Pub/Sub input plugin for Embulk.","archived":false,"fork":false,"pushed_at":"2020-05-05T16:22:23.000Z","size":80,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-04-20T23:09:56.231Z","etag":null,"topics":["embulk-plugin","pubsub"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/syucream.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-22T16:57:14.000Z","updated_at":"2020-08-28T15:34:16.000Z","dependencies_parsed_at":"2022-09-09T04:11:40.780Z","dependency_job_id":null,"html_url":"https://github.com/syucream/embulk-input-pubsub","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/syucream/embulk-input-pubsub","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syucream%2Fembulk-input-pubsub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syucream%2Fembulk-input-pubsub/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syucream%2Fembulk-input-pubsub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syucream%2Fembulk-input-pubsub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/syucream","download_url":"https://codeload.github.com/s
yucream/embulk-input-pubsub/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syucream%2Fembulk-input-pubsub/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34069504,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embulk-plugin","pubsub"],"created_at":"2024-10-07T04:20:48.850Z","updated_at":"2026-06-08T16:05:04.855Z","avatar_url":"https://github.com/syucream.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# embulk-input-pubsub\n\n[![Gem Version](https://badge.fury.io/rb/embulk-input-pubsub.svg)](https://badge.fury.io/rb/embulk-input-pubsub)\n\n[Google Cloud Pub/Sub](https://cloud.google.com/pubsub?hl=en) input plugin for Embulk. 
\n\n## Overview\n\n* **Plugin type**: input\n* **Guess supported**: no\n\n## Configuration\n\n- **project_id**: GCP project ID (string, required)\n- **subscription_id**: Pub/Sub subscription name (string, required)\n- **json_keyfile**: Path to the GCP credential JSON file (string, required)\n- **max_messages**: Maximum number of messages to fetch in a single Pub/Sub pull call (integer, optional)\n- **checkpoint_basedir**: Path to the checkpoint directory (string, optional)\n- **checkpoint**: Path to a checkpoint file (string, optional)\n\n### Checkpoint\n\nGoogle Cloud Pub/Sub removes stored messages when they are acknowledged or when they expire.\nSo `embulk-input-pubsub` uses checkpoints to recover from data loss, an approach also used in Apache Flink / Apache Beam.\nIt 1) pulls messages from Pub/Sub, 2) saves a checkpoint that contains the messages, and 3) sends acks to Pub/Sub.\nIf an Embulk task fails, you can resume the run from the checkpoints. You can also simply run `embulk run` with `checkpoint` set.\n\nIf you want checkpointing, set `checkpoint_basedir` to save checkpoint files on the local filesystem. If it is not set, an in-memory store is used.\nIf you want to recover state from a checkpoint, you need to set `checkpoint`. 
It restores the transaction state from the given checkpoint instead of pulling messages from Pub/Sub.\n\nThe checkpoint is implemented as a Protocol Buffers message.\n\n## Example\n\n- pubsub -\u003e stdout config example\n\n```yaml\nin:\n  type: pubsub\n  project_id: \u003cyour-project-id\u003e\n  subscription_id: \u003cyour-subscription-name\u003e\n  json_keyfile: /path/to/credential.json\n  max_messages: 100\n  checkpoint_basedir: /tmp/embulk-input-pubsub/\n\nout:\n  type: stdout\n```\n\nRun the example and you'll get output like this:\n\n```\n$ embulk run examples/pubsub2stdout.yaml\n2020-05-06 00:44:05.093 +0900: Embulk v0.9.23\n2020-05-06 00:44:06.540 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.\n2020-05-06 00:44:10.743 +0900 [INFO] (main): Gem's home and path are set by default: \"/Users/ryo/.embulk/lib/gems\"\n2020-05-06 00:44:12.551 +0900 [INFO] (main): Started Embulk v0.9.23\n2020-05-06 00:44:12.858 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-pubsub (0.0.1)\n2020-05-06 00:44:18.332 +0900 [INFO] (0001:transaction): Created a new checkpoint! 
: /tmp/embulk-input-pubsub/checkpoint--1576110815\n2020-05-06 00:44:18.336 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 4 = input tasks 1 * 4\n2020-05-06 00:44:18.354 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}\naaa,{}\n2020-05-06 00:44:18.428 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}\n2020-05-06 00:44:18.436 +0900 [INFO] (main): Committed.\n2020-05-06 00:44:18.436 +0900 [INFO] (main): Next config diff: {\"in\":{},\"out\":{}}\n```\n\n## Development\n\n```shell script\n$ ./gradlew gem\n```\n\n## TODO\n\n- Change it to a FileInputPlugin to be applicable for parser plugins\n- Remote filesystem based checkpointing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyucream%2Fembulk-input-pubsub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyucream%2Fembulk-input-pubsub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyucream%2Fembulk-input-pubsub/lists"}