{"id":24087930,"url":"https://github.com/tinybirdco/mongodb-cdc-workshop","last_synced_at":"2026-06-14T15:32:32.355Z","repository":{"id":257501935,"uuid":"857053970","full_name":"tinybirdco/mongodb-cdc-workshop","owner":"tinybirdco","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-17T14:24:30.000Z","size":438,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-11-21T15:03:51.145Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinybirdco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-13T17:56:09.000Z","updated_at":"2024-09-17T14:24:35.000Z","dependencies_parsed_at":"2024-09-17T04:29:23.570Z","dependency_job_id":"33f3972d-d00b-497c-8453-e90532c93b54","html_url":"https://github.com/tinybirdco/mongodb-cdc-workshop","commit_stats":null,"previous_names":["tinybirdco/mongodb-cdc-workshop"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tinybirdco/mongodb-cdc-workshop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fmongodb-cdc-workshop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fmongodb-cdc-workshop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fmongodb-cdc-workshop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fmongodb-cdc-workshop/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinybirdco","download_url":"https://codeload.github.com/tinybirdco/mongodb-cdc-workshop/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fmongodb-cdc-workshop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34326233,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T03:56:12.954Z","updated_at":"2026-06-14T15:32:32.339Z","avatar_url":"https://github.com/tinybirdco.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tinybird Workshop on ingesting MongoDB CDC events\n\nThis repository is a companion piece to the 'MongoDB CDC' workshop. The intended audience of this workshop are folks who have MongoDB data and are interested in streaming that data to Tinybird.\n\nIn this workshop we start with a MongoDB Atlas instance with a `weather-reports` collection in a `weather-data` database. We then configure and deploy an instance of the [Confluent MongoDB Atlas Source Connector](https://docs.confluent.io/cloud/current/connectors/cc-mongo-db-source.html), and then stream that data into Tinybird using its native Confluent Stream connector. 
\n\nHere is a look at what we are building:\n\n![Diagram](images/diagram.png)\n\n## Workshop topics\n\n* Publishing MongoDB data to a Kafka stream:\n  * Tour a live MongoDB collection on Atlas.\n  * Configure the Confluent MongoDB Atlas Source Connector.\n* Consume the Kafka topic in Tinybird.\n* Manage duplicate data.\n* Work with nested and varying JSON documents.\n* Build API endpoints that publish MongoDB data.\n\nThe `tinybird` folder contains:\n* Data Source definitions.\n* Pipe and Node definitions.\n* Example Tinybird Playgrounds.\n\n## Resources\n* [A practical guide to real-time CDC with MongoDB](https://www.tinybird.co/blog-posts/mongodb-cdc)\n* [Lambda CDC processing with Tinybird](https://www.tinybird.co/docs/guides/querying-data/lambda-example-cdc)\n\n## Session JSON objects\n\nThe nested JSON part of the session ingests two kinds of JSON documents:\n\n### Weather report objects\n![JSON](images/report-object.png)\n\n### Alert objects\n![JSON](images/alert-object.png)\n\n## Landing Data Sources \n\n### `mongo_cdc_events` Data Source\n\n#### `insert` events\n\nMongoDB CDC events land in a `mongo_cdc_events` Data Source. If the event has `operationType = insert`, there is a `fullDocument` JSON object that contains the weather report. 
\n\nReport attributes are parsed from the `payload` column with the `JSONExtract` functions:\n\n* `JSONExtractString(payload, 'description') AS description`\n* `JSONExtractFloat(payload, 'temp_f') AS temp_f`\n* `JSONExtractInt(payload, 'humidity') AS humidity`\n\n```bash\nDESCRIPTION \u003e\n    CDC Events coming from a MongoDB instance.\n\nSCHEMA \u003e\n    `_id__data` String `json:$._id._data`,\n    `clusterTime` Int64 `json:$.clusterTime`,\n    `documentKey__id` String `json:$.documentKey._id`,\n    `fullDocument__id` String `json:$.fullDocument._id`,\n    `site_name` String `json:$.fullDocument.site_name`,\n    `timestamp` DateTime `json:$.fullDocument.timestamp`,\n    `payload` String `json:$.fullDocument`,\n    `ns_coll` String `json:$.ns.coll`,\n    `ns_db` String `json:$.ns.db`,\n    `operationType` String `json:$.operationType`,\n    `wallTime` Int64 `json:$.wallTime`\n\nENGINE \"MergeTree\"\nENGINE_PARTITION_KEY \"toYYYYMM(timestamp)\"\nENGINE_SORTING_KEY \"timestamp, site_name\"\n```\n\n#### `delete` events\nIf the event has `operationType = delete`, there is no `fullDocument` JSON object. Since that violates the schema specification, the event is quarantined and written to a `mongo_cdc_events_quarantine` table. That table is used to materialize `delete` events into a `deletes_mv` Data Source. \n\nThe `deletes_mv` Data Source is then used to filter deleted documents out of `mongo_cdc_events` when building the `weather_reports_mv` Data Source:\n\n```sql\nSELECT * \nFROM mongo_cdc_events\nWHERE documentKey__id NOT IN (\n    SELECT documentKey__id \n    FROM deletes_mv\n)\n```\n\n\n\n### `nested_json` Data Source\n\nThe `report` and `alert` objects are ingested with the following Data Source schema. The common `message` JSON attribute, which holds the differing JSON structures, is assigned to a `payload` string. From there, the `JSONExtract` functions are used to parse and access the JSON attributes inside each `message`. 
\n\n```bash\nSCHEMA \u003e\n    `event_type` String `json:$.event_type`,\n    `timestamp` DateTime `json:$.timestamp`,\n    `site_name` String `json:$.site_name`,\n    `payload` String `json:$.message`\n\nENGINE \"MergeTree\"\nENGINE_PARTITION_KEY \"toYear(timestamp)\"\nENGINE_SORTING_KEY \"event_type, timestamp, site_name\"\n```\n\n\n\n\n## Demo components\n\nThis workshop starts with a MongoDB Atlas database with a `weather-reports` collection. \n\n![MongoDB Atlas](images/mongodb-atlas.png)\n\nThe Confluent MongoDB Atlas Source Connector is used to publish CDC events onto a Kafka stream.\n\n![Confluent Connector](images/confluent-connector.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fmongodb-cdc-workshop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinybirdco%2Fmongodb-cdc-workshop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fmongodb-cdc-workshop/lists"}