<h1 align="center">DMS-CDC-Operator</h1>
<div align="center">
 <strong>
   A Rust 🦀 tool for comparing Amazon RDS tables with Parquet files on Amazon S3, useful for change data capture (CDC)
 </strong>
</div>

<br />

<div align="center">
  <!-- Github Actions -->
  <a href="https://github.com/nikoshet/rust-dms-cdc-operator/actions/workflows/ci.yaml?query=branch%3Amain">
    <img src="https://img.shields.io/github/actions/workflow/status/nikoshet/rust-dms-cdc-operator/ci.yaml?branch=main&style=flat-square" alt="actions status" /></a>
  <!-- Version -->
  <a href="https://crates.io/crates/dms-cdc-operator">
    <img src="https://img.shields.io/crates/v/dms-cdc-operator.svg?style=flat-square"
    alt="Crates.io version" /></a>
  <!-- Docs -->
  <a href="https://docs.rs/dms-cdc-operator">
  <img src="https://img.shields.io/badge/docs-latest-blue.svg?style=flat-square" alt="docs.rs docs" /></a>
  <!-- Downloads -->
  <a href="https://crates.io/crates/dms-cdc-operator">
    <img src="https://img.shields.io/crates/d/dms-cdc-operator.svg?style=flat-square" alt="Download" />
  </a>
</div>

## Overview

The `rust-dms-cdc-operator` is a Rust-based tool that compares tables in an Amazon RDS (PostgreSQL) database with data migrated to Amazon S3 using AWS Database Migration Service (DMS). It is particularly useful for ensuring data consistency between the RDS database and the Parquet files in S3 when change data capture (CDC) updates are involved, since DMS data validation does not yet support S3 as a target.

## Features

- Import a snapshot of the CDC Parquet data stored in Amazon S3 (with date-based folder partitioning) into a locally deployed Postgres database
- Specify a time range to replicate the S3 state in a Postgres database
- Restore the RDS state from S3 in case of data loss
- Compare the state of specific tables in an Amazon RDS database with the data stored in Parquet files in the S3 bucket
- Narrow down differences to the row level by adjusting the validation chunk size
- Use it as a library integrated into your own projects, or as a standalone client

## Prerequisites

- Your source database is PostgreSQL
- You have a running AWS DMS task in FULL LOAD (or FULL LOAD + CDC) mode
- The target of the task is Amazon S3 with:
    - Parquet-formatted files
    - date-based folder partitioning
    - the additional `Op` column injected by DMS

## Installation (Client)
To use the tool as a client, build and run it with `cargo`.

The client offers two interfaces, each enabled by a Cargo feature: `with-clap` (command-line arguments, parsed with Clap) and `with-inquire` (interactive prompts, built with Inquire).

### Using Clap
```
Usage: dms-cdc-operator-client validate [OPTIONS] --bucket-name <BUCKET_NAME> --s3-prefix <S3_PREFIX> --source-postgres-url <SOURCE_POSTGRES_URL> --target-postgres-url <TARGET_POSTGRES_URL>

Options:
      --bucket-name <BUCKET_NAME>
          S3 Bucket name where the CDC files are stored
      --s3-prefix <S3_PREFIX>
          S3 Prefix where the files are stored Example: data/landing/rds/mydb
      --source-postgres-url <SOURCE_POSTGRES_URL>
          Url of the database to validate the CDC files Example: postgres://postgres:postgres@localhost:5432/mydb
      --target-postgres-url <TARGET_POSTGRES_URL>
          Url of the target database to import the parquet files Example: postgres://postgres:postgres@localhost:5432/mydb
      --database-schema <DATABASE_SCHEMA>
          Schema of database to validate against S3 files [default: public]
      --included-tables [<INCLUDED_TABLES>...]
          List of tables to include for validatation against S3 files
      --excluded-tables [<EXCLUDED_TABLES>...]
          List of tables to exclude for validatation against S3 files
      --mode <MODE>
          Mode to load Parquet files Example: DateAware Example: AbsolutePath Example: FullLoadOnly [default: date-aware] [possible values: date-aware, absolute-path, full-load-only]
      --start-date <START_DATE>
          Start date to filter the Parquet files Example: 2024-02-14T10:00:00Z
      --stop-date <STOP_DATE>
          Stop date to filter the Parquet files Example: 2024-02-14T10:00:00Z
      --chunk-size <CHUNK_SIZE>
          Datadiff chunk size [default: 1000]
      --max-connections <MAX_CONNECTIONS>
          Maximum connection pool size [default: 100]
      --start-position <START_POSITION>
          Datadiff start position [default: 0]
      --only-datadiff
          Run only the datadiff
      --only-snapshot
          Take only a snapshot from S3 to target DB
      --accept-invalid-certs-first-db
          Accept invalid TLS certificates for the first database
      --accept-invalid-certs-second-db
          Accept invalid TLS certificates for the second database
  -h, --help
          Print help
  -V, --version
          Print version
```

### Using Inquire
```shell
cargo run --features="with-inquire"
```

## Installation (Library)
To use the tool as a library, add it to your project:
```shell
cargo add dms-cdc-operator
```

## Example

- Spin up the local Postgres database through Docker Compose:
```shell
docker-compose up

psql -h localhost -p 5438 -U postgres -d mydb
```

- Build and run the Rust tool:
```shell
cargo fmt --all
cargo clippy --all

cargo build

RUST_LOG=dms_cdc_operator=info,rust_pgdatadiff=info cargo run --features="with-clap" validate --bucket-name my-bucket --s3-prefix prefix/path --source-postgres-url postgres://postgres:postgres@localhost:5432/mydb1 --target-postgres-url postgres://postgres:postgres@localhost:5438/mydb --database-schema public --included-tables mytable --start-date 2024-02-14T10:00:00Z --chunk-size 100
```

For more verbose output, enable debug-level logs by exporting the following:
```
export RUST_LOG=dms_cdc_operator=debug,rust_pgdatadiff=debug
```

## License
This project is licensed under the MIT License.
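
## Sketch: replaying CDC records

Replicating the S3 state in Postgres (the snapshot feature listed under Features) essentially means replaying each table's DMS change records in commit order, driven by the `Op` column that DMS injects. The following is a minimal, in-memory sketch of that idea under stated assumptions; it is not this crate's API or its actual implementation. `ChangeRecord` and `replay` are hypothetical names, each row is reduced to an integer primary key plus one text value, records are assumed to be already sorted in commit order and filtered to the chosen time range, and a missing `Op` is assumed to mark a full-load row.

```rust
use std::collections::BTreeMap;

/// Hypothetical change record, loosely modelled on a DMS CDC Parquet row.
/// Real rows carry every table column; here a row is just `id` + `value`.
#[derive(Debug, Clone)]
struct ChangeRecord {
    /// DMS `Op` column: 'I' (insert), 'U' (update), 'D' (delete).
    /// `None` is assumed to mean a full-load row written without an `Op` value.
    op: Option<char>,
    id: i64,
    value: String,
}

/// Replays records (assumed to be sorted in commit order) into a map keyed
/// by primary key. The returned map is the table state after the last record.
fn replay(records: &[ChangeRecord]) -> Result<BTreeMap<i64, String>, String> {
    let mut state = BTreeMap::new();
    for rec in records {
        match rec.op {
            // Full-load rows, inserts and updates all upsert the latest row image.
            None | Some('I') | Some('U') => {
                state.insert(rec.id, rec.value.clone());
            }
            // Deletes drop the key; deleting an unknown key is a no-op.
            Some('D') => {
                state.remove(&rec.id);
            }
            Some(other) => return Err(format!("unexpected Op value: {other}")),
        }
    }
    Ok(state)
}

fn main() {
    let records = vec![
        ChangeRecord { op: None, id: 1, value: "alice".into() },
        ChangeRecord { op: None, id: 2, value: "bob".into() },
        ChangeRecord { op: Some('U'), id: 1, value: "alice v2".into() },
        ChangeRecord { op: Some('I'), id: 3, value: "carol".into() },
        ChangeRecord { op: Some('D'), id: 2, value: String::new() },
    ];
    let state = replay(&records).expect("all Op values are valid");
    // Prints the surviving rows in key order: 1 => alice v2, 3 => carol
    for (id, value) in &state {
        println!("{id} => {value}");
    }
}
```

Treating `I` and `U` the same way (an upsert of the latest row image) keeps the sketch tolerant of an insert arriving for a key that a full-load file already contained, and the ordered `BTreeMap` makes the replayed state straightforward to compare chunk by chunk against the source table.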