{"id":28455692,"url":"https://github.com/questdb/questdb_change_tracker","last_synced_at":"2025-10-23T19:24:17.353Z","repository":{"id":250696090,"uuid":"835181228","full_name":"questdb/questdb_change_tracker","owner":"questdb","description":"Change Tracker (pseudo CDC) for QuestDB tables","archived":false,"fork":false,"pushed_at":"2024-08-05T11:07:46.000Z","size":23,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-06T22:11:16.825Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/questdb.png","metadata":{"files":{"readme":"README.md","changelog":"change_tracker.py","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-29T10:20:29.000Z","updated_at":"2024-11-13T10:35:28.000Z","dependencies_parsed_at":"2024-08-05T09:18:27.178Z","dependency_job_id":"6bb8ba44-bf8c-4cc1-b0d8-1f7edce39482","html_url":"https://github.com/questdb/questdb_change_tracker","commit_stats":null,"previous_names":["javier/questdb_change_tracker","questdb/questdb_change_tracker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/questdb/questdb_change_tracker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fquestdb_change_tracker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fquestdb_change_tracker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fquestdb_change_tracker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fquestdb_change_tracker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/questdb","download_url":"https://codeload.github.com/questdb/questdb_change_tracker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fquestdb_change_tracker/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262177856,"owners_count":23270948,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-06T22:10:23.746Z","updated_at":"2025-10-23T19:24:17.306Z","avatar_url":"https://github.com/questdb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# QuestDB Change Tracker\n\nQuestDB Change Tracker is a sample, non-production-ready project that demonstrates different strategies for tracking changes in QuestDB. This repository includes three Python scripts: two of them use beta features for fine-grained tracking of data ingestion, while the third tracks append-only changes with limited tolerance for out-of-order or late data.\n\n## Introduction\n\nTracking changes in a database is useful for applications such as machine learning model updates, real-time data integrations, and continuous table materialization. This project demonstrates three strategies for tracking changes in QuestDB:\n\n1. **Change Tracker Script**: Uses beta features for detailed tracking of data ingestion.\n2. 
**Materialize View Script**: Uses beta features for creating materialized views based on row thresholds.\n3. **Materialize Append-Only Script**: Does not use beta features and tracks append-only changes with limited tolerance for out-of-order or late data.\n\n## Features\n- Monitors specified QuestDB tables for changes.\n- Aggregates data from new transactions based on specified columns.\n- Detects and reports structural changes in the table.\n- Outputs aggregated results to stdout.\n- Inserts data into a materialized view based on a user-defined SQL template.\n- Tracks processing status using a specified QuestDB table to ensure continuity between script runs.\n\n## Installation\n1. Clone the repository:\n    ```sh\n    git clone https://github.com/questdb/questdb_change_tracker.git\n    cd questdb_change_tracker\n    ```\n\n2. Create and activate a virtual environment (optional but recommended):\n    ```sh\n    python -m venv venv\n    source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n    ```\n\n3. 
Install the required dependencies:\n    ```sh\n    pip install psycopg2\n    ```\n\n## Change Tracker Script\n\n### Usage\n```sh\npython change_tracker.py --table_name \u003ctable_name\u003e --columns \u003ccolumns\u003e [--row_threshold \u003crow_threshold\u003e] [--check_interval \u003ccheck_interval\u003e] [--timestamp_column \u003ctimestamp_column\u003e] [--tracking_table \u003ctracking_table\u003e] [--tracking_id \u003ctracking_id\u003e] [--dbname \u003cdbname\u003e] [--user \u003cuser\u003e] [--host \u003chost\u003e] [--port \u003cport\u003e] [--password \u003cpassword\u003e]\n```\n\n### Parameters\n- `--table_name`: The name of the table to monitor (required).\n- `--columns`: Comma-separated list of columns to aggregate (required).\n- `--row_threshold`: The number of rows to trigger aggregation (default: 1000).\n- `--check_interval`: The interval (in seconds) to check for new transactions (default: 30).\n- `--timestamp_column`: The name of the timestamp column (default: 'timestamp').\n- `--tracking_table`: The name of the tracking table (optional).\n- `--tracking_id`: The tracking ID for this run (optional).\n- `--dbname`: The name of the database (default: 'qdb').\n- `--user`: The database user (default: 'admin').\n- `--host`: The database host (default: '127.0.0.1').\n- `--port`: The database port (default: 8812).\n- `--password`: The database password (default: 'quest').\n\n\n### Output\nThe script provides the following output:\n- Initial transaction ID and structure version.\n- Notifications of structure version changes.\n- Aggregated results including transaction IDs, total rows, and specified column statistics (first, last, min, max, avg).\n\n\n### Example Command Line\n```sh\npython change_tracker.py --table_name smart_meters --columns frequency,voltage --row_threshold 100 --check_interval 30 --timestamp_column timestamp --tracking_table materialize_tracker --tracking_id meter_logger\n```\n\n### Example Output\n```\nStarting from transaction ID: 125 
with structure version: 1\nStructure version changed from 1 to 2 on transaction 127\nAggregated results from 2024-07-29 11:03:03.102658 to 2024-07-29 11:03:33.002031:\nIncluded Transactions: 126 to 129\nTotal Rows: 300\nfrequency_first, voltage_first, frequency_last, voltage_last, frequency_min, voltage_min, frequency_max, voltage_max, frequency_avg, voltage_avg\n50, 60, 50, 60, 54.6, 216.50205993652344, 132.5439910888672, 110.09369659423828, 239.7596435546875, 176.34308303833006\n```\n\n## Materialize View Script\n\nThe Materialize View script monitors one or more tables and, when the specified row thresholds are met, executes a user-provided SQL template. Before each execution, the script replaces the `{timestamp_txn_filter}` placeholder in the template with timestamp range filters derived from the new transactions of each monitored table. Note that this script is not production-ready, as its state is reset upon restart.\n\n### Usage\n```sh\npython materialize_view.py --table_names \u003ctable_names\u003e --thresholds \u003cthresholds\u003e --sql_template_path \u003csql_template_path\u003e [--check_interval \u003ccheck_interval\u003e] --timestamp_columns \u003ctimestamp_columns\u003e [--tracking_table \u003ctracking_table\u003e] [--tracking_id \u003ctracking_id\u003e] [--dbname \u003cdbname\u003e] [--user \u003cuser\u003e] [--host \u003chost\u003e] [--port \u003cport\u003e] [--password \u003cpassword\u003e]\n```\n\n### Parameters\n- `--table_names`: Comma-separated list of table names to monitor (required).\n- `--thresholds`: Comma-separated list of row thresholds corresponding to each table (required).\n- `--sql_template_path`: Path to the file containing the SQL template (required).\n- `--check_interval`: The interval (in seconds) to check for new transactions (default: 30).\n- `--timestamp_columns`: Comma-separated list of timestamp columns corresponding to each table (format: `table_name.column_name`) (required).\n- `--tracking_table`: The name of the tracking table (optional).\n- 
`--tracking_id`: The tracking ID for this run (optional).\n- `--dbname`: The name of the database (default: 'qdb').\n- `--user`: The database user (default: 'admin').\n- `--host`: The database host (default: '127.0.0.1').\n- `--port`: The database port (default: 8812).\n- `--password`: The database password (default: 'quest').\n\n### SQL Template Example\n```sql\nINSERT INTO sampled_meters(\n  timestamp, device_id, mark_model, \n  first_status, last_status, frequency, energy_consumption, voltage, current, power_factor,\n  price\n  )\nSELECT smart_meters.timestamp, device_id, mark_model, \n      first(status), last(status), \n      avg(frequency), avg(energy_consumption), avg(voltage), avg(current), \n      avg(power_factor), avg(price)\nFROM smart_meters ASOF JOIN trades \nWHERE {timestamp_txn_filter} \nSAMPLE BY 10m; \n```\n\n### Example Command Line\n```bash\npython materialize_view.py --table_names smart_meters,trades --thresholds 100,50 --sql_template_path materialize.sql --check_interval 5 --timestamp_columns smart_meters.timestamp,trades.timestamp --tracking_table materialize_tracker --tracking_id meters_and_trades\n```\n\n### Example Output\n```\npython materialize_view.py --table_names smart_meters,trades --thresholds 100,50 --sql_template_path materialize.sql --check_interval 5 --timestamp_columns smart_meters.timestamp,trades.timestamp --tracking_table materialize_tracker --tracking_id meters_and_trades\n\nStarting from transaction ID: 308 with structure version: 3 for table smart_meters\nStarting from transaction ID: 3728 with structure version: 0 for table trades\nExecuted query:\nINSERT INTO sampled_meters(\n  timestamp, device_id, mark_model,\n  first_status, last_status, frequency, energy_consumption, voltage, current, power_factor,\n  price\n  )\nSELECT smart_meters.timestamp, device_id, mark_model,\n      first(status), last(status),\n      avg(frequency), avg(energy_consumption), avg(voltage), avg(current),\n      avg(power_factor), 
avg(price)\nFROM smart_meters ASOF JOIN trades\nWHERE smart_meters.timestamp \u003e= '2024-07-29 14:51:34.144738' AND smart_meters.timestamp \u003c= '2024-07-29 14:52:04.044696' AND trades.timestamp \u003e= '2024-07-29 14:51:37.107452' AND trades.timestamp \u003c= '2024-07-29 14:52:06.804897'\nSAMPLE BY 10m;\n\nExecuted query:\nINSERT INTO sampled_meters(\n  timestamp, device_id, mark_model,\n  first_status, last_status, frequency, energy_consumption, voltage, current, power_factor,\n  price\n  )\nSELECT smart_meters.timestamp, device_id, mark_model,\n      first(status), last(status),\n      avg(frequency), avg(energy_consumption), avg(voltage), avg(current),\n      avg(power_factor), avg(price)\nFROM smart_meters ASOF JOIN trades\nWHERE smart_meters.timestamp \u003e= '2024-07-29 14:52:04.142463' AND smart_meters.timestamp \u003c= '2024-07-29 14:52:23.047161' AND trades.timestamp \u003e= '2024-07-29 14:52:06.905866' AND trades.timestamp \u003c= '2024-07-29 14:52:36.885151'\nSAMPLE BY 10m;\n```\n\n## Materialize Append-Only Script\n\nThe Materialize Append-Only script monitors one or more tables for new transactions and triggers a materialize query after a specified number of transactions. This script is designed for append-only tables with limited tolerance for out-of-order or late data. 
The generated filter selects rows at or after the earliest transaction timestamp in the batch minus a lookback period (`--lookback_seconds`), so that slightly late rows are still captured.\n\n### Usage\n```sh\npython materialize_append_only.py --table_names \u003ctable_names\u003e --transaction_threshold \u003ctransaction_threshold\u003e --sql_template_path \u003csql_template_path\u003e [--check_interval \u003ccheck_interval\u003e] --timestamp_columns \u003ctimestamp_columns\u003e [--lookback_seconds \u003clookback_seconds\u003e] [--tracking_table \u003ctracking_table\u003e] [--tracking_id \u003ctracking_id\u003e] [--dbname \u003cdbname\u003e] [--user \u003cuser\u003e] [--host \u003chost\u003e] [--port \u003cport\u003e] [--password \u003cpassword\u003e]\n```\n\n### Parameters\n- `--table_names`: Comma-separated list of table names to monitor (required).\n- `--transaction_threshold`: Number of new transactions that triggers the materialize query (required).\n- `--sql_template_path`: Path to the file containing the SQL template (required).\n- `--check_interval`: The interval (in seconds) to check for new transactions (default: 30).\n- `--timestamp_columns`: Comma-separated list of timestamp columns corresponding to each table (format: `table_name.column_name`) (required).\n- `--lookback_seconds`: Number of seconds to look back from the earliest transaction timestamp in the batch (default: 15).\n- `--tracking_table`: Name of the tracking table used to record processed transactions (optional).\n- `--tracking_id`: Tracking ID for this run (optional).\n- `--dbname`: The name of the database (default: 'qdb').\n- `--user`: The database user (default: 'admin').\n- `--host`: The database host (default: '127.0.0.1').\n- `--port`: The database port (default: 8812).\n- `--password`: The database password (default: 'quest').\n\n### SQL Template Example\n```sql\nINSERT INTO sampled_meters(\n  timestamp, device_id, mark_model, \n  first_status, last_status, frequency, energy_consumption, voltage, current, power_factor,\n  price\n  )\nSELECT 
smart_meters.timestamp, device_id, mark_model, \n      first(status), last(status), \n      avg(frequency), avg(energy_consumption), avg(voltage), avg(current), \n      avg(power_factor), avg(price)\nFROM smart_meters ASOF JOIN trades \nWHERE {timestamp_txn_filter} \nSAMPLE BY 10m; \n```\n\n### Example Command Line\n```sh\npython materialize_append_only.py --table_names smart_meters,trades --transaction_threshold 10 --sql_template_path materialize.sql --check_interval 30 --timestamp_columns smart_meters.timestamp,trades.timestamp --lookback_seconds 15 --tracking_table materialize_tracker --tracking_id meters_and_trades\n```\n\n### Example Output\n```\nStarting from transaction ID: None for table smart_meters\nStarting from transaction ID: None for table trades\nInitialized starting transaction ID: 346 for table smart_meters\nInitialized starting transaction ID: 4827 for table trades\nExecuted query:\nINSERT INTO sampled_meters(\n  timestamp, device_id, mark_model,\n  first_status, last_status, frequency, energy_consumption, voltage, current, power_factor,\n  price\n  )\nSELECT smart_meters.timestamp, device_id, mark_model,\n      first(status), last(status),\n      avg(frequency), avg(energy_consumption), avg(voltage), avg(current),\n      avg(power_factor), avg(price)\nFROM smart_meters ASOF JOIN trades\nWHERE smart_meters.timestamp \u003e= dateadd('s', -15, '2024-07-30 08:57:14.338505') AND trades.timestamp \u003e= dateadd('s', -15, '2024-07-30 08:57:14.338505')\nSAMPLE BY 10m;\n\nExecuted query:\nINSERT INTO sampled_meters(\n  timestamp, device_id, mark_model,\n  first_status, last_status, frequency, energy_consumption, voltage, current, power_factor,\n  price\n  )\nSELECT smart_meters.timestamp, device_id, mark_model,\n      first(status), last(status),\n      avg(frequency), avg(energy_consumption), avg(voltage), avg(current),\n      avg(power_factor), avg(price)\nFROM smart_meters ASOF JOIN trades\nWHERE smart_meters.timestamp \u003e= dateadd('s', -15, 
'2024-07-30 09:04:18.312326') AND trades.timestamp \u003e= dateadd('s', -15, '2024-07-30 09:04:18.312326')\nSAMPLE BY 10m;\n```\n\n## License\nThis project is licensed under the Apache License 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquestdb%2Fquestdb_change_tracker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquestdb%2Fquestdb_change_tracker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquestdb%2Fquestdb_change_tracker/lists"}