{"id":15410993,"url":"https://github.com/disneystreaming/pg2k4j","last_synced_at":"2025-04-05T23:11:40.703Z","repository":{"id":34589401,"uuid":"151743124","full_name":"disneystreaming/pg2k4j","owner":"disneystreaming","description":"Postgresql To Kinesis For Java","archived":false,"fork":false,"pushed_at":"2024-12-14T07:27:44.000Z","size":943,"stargazers_count":79,"open_issues_count":163,"forks_count":25,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-29T22:09:52.152Z","etag":null,"topics":["cdc","java","kinesis","kinesis-stream","logical-replication","postgresql","replication"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/disneystreaming.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-05T15:40:42.000Z","updated_at":"2025-03-14T10:19:57.000Z","dependencies_parsed_at":"2024-02-21T16:35:42.530Z","dependency_job_id":"3cfa7625-43d0-4adb-ae28-38a02de9af86","html_url":"https://github.com/disneystreaming/pg2k4j","commit_stats":null,"previous_names":[],"tags_count":42,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disneystreaming%2Fpg2k4j","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disneystreaming%2Fpg2k4j/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disneystreaming%2Fpg2k4j/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/disneystreaming%2Fpg2k4j/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/disneystreaming","download_url":"https://codeload.github.com/disneystreaming/pg2k4j/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247411239,"owners_count":20934653,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdc","java","kinesis","kinesis-stream","logical-replication","postgresql","replication"],"created_at":"2024-10-01T16:47:12.717Z","updated_at":"2025-04-05T23:11:40.686Z","avatar_url":"https://github.com/disneystreaming.png","language":"Java","funding_links":[],"categories":["WAL"],"sub_categories":["Logical Replication"],"readme":"\n[![Build Status](https://travis-ci.com/disneystreaming/pg2k4j.svg?branch=master)](https://travis-ci.com/disneystreaming/pg2k4j) [![codecov](https://codecov.io/gh/disneystreaming/pg2k4j/branch/master/graph/badge.svg)](https://codecov.io/gh/disneystreaming/pg2k4j)\n\n# This project is no longer actively maintained by Disney Streaming Services\n\nYou are welcome to use and fork this project, and we may occasionally review and merge pull requests submitted from contributors or dependabot/Snyk. 
However, we no longer use this internally and therefore support will be limited.\n\n## pg2k4j\n\n### Postgresql To Kinesis For Java\n\nA tool for publishing inserts, updates, and deletes made on a [Postgresql](https://www.postgresql.org/) database to an [Amazon Kinesis](https://aws.amazon.com/kinesis/) Stream.\npg2k4j may be run as a stand-alone application from the command line, or used as a Java library where its functionality\ncan be extended and customized.\n\n### Getting Started\n\nFirst, set up your Postgres database to support [logical replication](https://www.postgresql.org/docs/10/static/logical-replication.html) and create an AWS Kinesis Stream.\n \n#### Run pg2k4j as a Stand-alone Application\nDownload [Docker](https://www.docker.com/get-started) and log in with your Docker credentials to gain access to the [pg2k4j docker repository](https://hub.docker.com/r/disneystreaming/pg2k4j/).\nThen run the command below.\n```\ndocker run -v /path/to/.aws/creds/:/aws_creds \\\n  disneystreaming/pg2k4j \\\n  --awsconfiglocation=/aws_creds --awsprofile=default \\\n  --pgdatabase=\u003cyour_postgres_db\u003e --pghost=\u003cyour_postgres_host\u003e --pguser=\u003cyour_postgres_user\u003e --pgpassword=\u003cyour_postgres_pw\u003e \\\n  --streamname=\u003cyour_kinesis_streamname\u003e\n``` \n\nWhen you observe the log line below, pg2k4j is ready to publish changes to Kinesis.\n\n```\n[main] INFO com.disneystreaming.pg2k4j.SlotReaderKinesisWriter - Consuming from slot pg2k4j\n ```\n \n#### Use as a Java Library\n\npg2k4j artifacts are published to [maven central](https://mvnrepository.com/artifact/com.disneystreaming.pg2k4j/pg2k4j)\n\n##### Maven\n\n```\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.disneystreaming.pg2k4j\u003c/groupId\u003e\n    \u003cartifactId\u003epg2k4j\u003c/artifactId\u003e\n    \u003cversion\u003eLATEST\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n##### Gradle\n\n```\ncompile group: 'com.disneystreaming.pg2k4j', name: 'pg2k4j', version: 'LATEST'\n```\n\n##### 
SBT\n\n```\nlibraryDependencies += \"com.disneystreaming.pg2k4j\" % \"pg2k4j\" % \"LATEST\"\n```\n\nTo initialize and begin publishing database changes to Kinesis, create a [SlotReaderKinesisWriter](src/main/java/com/disneystreaming/pg2k4j/SlotReaderKinesisWriter.java) \nand call its [runLoop](src/main/java/com/disneystreaming/pg2k4j/SlotReaderKinesisWriter.java#L84) method.\n\n### Why We Wrote pg2k4j\n\npg2k4j is an implementation of a powerful design pattern called [Change Data Capture](https://en.wikipedia.org/wiki/Change_data_capture).\nBy using pg2k4j, anyone can know the state of your database at any point in time by consuming from the Kinesis Stream.\nAt DSS, we have rapidly changing datasets that many teams need to access and process in their own way. pg2k4j \nalleviates the need to grant database access to each team or to stand up an API on top of the dataset. This keeps the load down\non your database, making it possible to max out its write throughput. \n\n### Benefits Over Existing Solutions\n\nBefore writing pg2k4j, we explored existing solutions. We used [pg2kinesis](https://github.com/handshake/pg2kinesis) but found\nthat this implementation simply couldn't keep up with the write throughput that we required. As a JVM app, pg2k4j can natively integrate with [Amazon's\nKinesis Producer Library](https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html), allowing it to achieve write speeds of over\n1 million records per minute, which is orders of magnitude faster than the performance we observed with its Python\n counterpart.\n\n### How it Works\n\n##### 1. pg2k4j opens a [Logical Replication Slot](https://www.postgresql.org/docs/10/static/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS) on the Postgresql database.\n\nA replication slot will stream changes made on the database to the listener of the replication slot in the format specified\nby the plugin used for that replication slot. 
By default, pg2k4j uses the [wal2json](https://github.com/eulerto/wal2json) plugin\nwhich outputs a JSON representation of a [SlotMessage](src/main/java/com/disneystreaming/pg2k4j/models/SlotMessage.java) to the \nlistening thread. Postgres writes all data changes to the [Write Ahead Log](https://www.postgresql.org/docs/10/static/wal-intro.html), which,\nas well as ensuring data integrity and crash safety, makes it possible to perform logical replication. Each replication slot maintains a pointer to a position in the WAL, indicating the last sequence number this replication\nslot has processed. This pointer allows Postgres to flush all sections of the WAL written before this sequence number. Crucially, if the\napplication maintaining the replication slot does not update this sequence number, the storage space on the database will fill up because\nPostgres won't be able to clear any sections of the WAL. To view this sequence number, you can run the query below on your database.\n\n```\nselect * from pg_replication_slots\n```\n\nDetails of how pg2k4j manages this pointer are outlined later in this section.\n\n##### 2. pg2k4j [deserializes](src/main/java/com/disneystreaming/pg2k4j/SlotReaderKinesisWriter.java#L277) the JSON output sent by the wal2json plugin to a SlotMessage.\n\nThis method should be overridden when using any plugin besides wal2json, as the contents from the WAL would not be JSON\nrepresentations of a SlotMessage.\n\n##### 3. pg2k4j writes these contents to the Kinesis Stream.\n\nFirst, the SlotMessage is turned into a Stream of [UserRecord](https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/UserRecord.java), and then\nthese UserRecords are written to the stream with a [callback attached](src/main/java/com/disneystreaming/pg2k4j/SlotReaderKinesisWriter.java#L245) that will be invoked once the records make it to the \nstream.\n\n##### 4. 
The callback is invoked when the records succeed or fail to make it to the stream.\n\nOn a successful write to the stream, pg2k4j will [advance the replication slot's sequence number](src/main/java/com/disneystreaming/pg2k4j/SlotReaderCallback.java#L83), indicating\nthat any data before this point may be flushed by the database. By advancing the sequence number after receiving confirmation\nthat the record arrived on the stream, pg2k4j guarantees that each data change reaches Kinesis. Even on Postgres restart \nor pg2k4j restart this guarantee is preserved.\n\nThere is one other scenario wherein pg2k4j will advance the sequence number. It's important to note that each Postgres instance\nmay have many databases, but a replication slot is configured against a single database. In the scenario where \nthe replication slot database is idle but the other databases are active, it's important that pg2k4j still advances its pointer into\nthe WAL so that Postgres doesn't hang onto these sections of the WAL. That's why pg2k4j [advances the sequence number after\na certain period of inactivity](src/main/java/com/disneystreaming/pg2k4j/SlotReaderKinesisWriter.java#L204-L206)\nwhich [defaults to 5 minutes](src/main/java/com/disneystreaming/pg2k4j/ReplicationConfiguration.java#L38).\n\n### Configuring Infrastructure\n\nThis section is a walkthrough of how to create a Postgresql instance configured for logical replication as an [RDS](https://aws.amazon.com/rds/) instance.\npg2k4j by no means requires that your Postgresql instance is an RDS instance, but since Kinesis is an AWS product, many users\nwill likely also be running their Postgres instance on AWS. For an example of how to configure a non-RDS instance of Postgres refer\nto the integration tests. 
This section also walks through how to set up a Kinesis Stream, for which pg2k4j requires no special configuration.\n\n#### Start AWS RDS Postgres\n\n##### Create a Parameter Group\n\nIn the AWS console navigate to RDS then parameter groups and create a parameter group for the `postgres10` family.\n\n![Create Parameter Group](exampleImages/parameterGroup.png)\n\nIn this parameter group, set the following values:\n\n```bash\nrds.logical_replication 1\nmax_wal_senders 10\nmax_replication_slots 10\n```\n\n##### Launch Instance\n\nIn the AWS console navigate to RDS-\u003eInstances and select `Launch Instance`. Follow the creation wizard,\nselecting `Postgresql` for the DB engine and `Postgresql 10.3-R1` for the DB engine version.\n\nAs shown below, associate the parameter group you created in the previous step with this instance.\n\n![Associate Parameter Group](exampleImages/associateParameterGroup.png)\n\n#### Create Kinesis Stream\n\nIn the AWS console navigate to Kinesis, and create a stream.\n\n![Create Kinesis Stream](exampleImages/setupKinesisStream.png)\n\n### Contributing\n\nFork the repo and submit a PR with a description detailing what this code does and what bug or feature it addresses. Any methods\ncontaining substantial logic should include Javadocs.\n\nBe sure that both unit and integration tests pass and that any new code introduced has corresponding [tests](src/test/java/com/disney/pg2k4j). Run unit tests with\n\n```bash\n\u003e\u003e mvn clean test\nTests run: 13, Failures: 0, Errors: 0, Skipped: 0\n```\n\nand integration tests with \n\n```bash\nmvn clean verify\nTests run: 2, Failures: 0, Errors: 0, Skipped: 0\n```\n\nContributors are required to fill out a CLA in order for us to be allowed to accept contributions. See [CLA-Individual](CLA-Individual.md) or [CLA-Corporate](CLA-Corporate.md) for details.\n\n### Releasing\nReleasing is automatic and is driven by tags. To release, simply tag the master branch with a semantic version, e.g. 
`1.0.0`.\n\nThis will update the pom with the version, publish to maven, and build and push the Docker image to Dockerhub.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdisneystreaming%2Fpg2k4j","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdisneystreaming%2Fpg2k4j","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdisneystreaming%2Fpg2k4j/lists"}