{"id":27657115,"url":"https://github.com/sendbird/sb-osc","last_synced_at":"2025-06-27T18:34:47.588Z","repository":{"id":239512229,"uuid":"781761928","full_name":"sendbird/sb-osc","owner":"sendbird","description":"Sendbird's Online Schema Change Tool for Aurora MySQL","archived":false,"fork":false,"pushed_at":"2025-01-21T18:34:44.000Z","size":185,"stargazers_count":43,"open_issues_count":0,"forks_count":3,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-24T06:54:27.500Z","etag":null,"topics":["aurora","aws","mysql","schema-migrations"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sendbird.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-04T01:36:36.000Z","updated_at":"2025-01-21T18:34:33.000Z","dependencies_parsed_at":"2024-05-20T06:23:44.871Z","dependency_job_id":"33830523-0038-4569-b6fa-42042f8d0dd8","html_url":"https://github.com/sendbird/sb-osc","commit_stats":null,"previous_names":["sendbird/sb-osc"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sendbird%2Fsb-osc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sendbird%2Fsb-osc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sendbird%2Fsb-osc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sendbird%2Fsb-osc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sendbird","downl
oad_url":"https://codeload.github.com/sendbird/sb-osc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250580704,"owners_count":21453531,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aurora","aws","mysql","schema-migrations"],"created_at":"2025-04-24T06:54:32.588Z","updated_at":"2025-04-24T06:54:33.028Z","avatar_url":"https://github.com/sendbird.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SB-OSC\n\n**Sendbird's online schema migration for Aurora MySQL**\n\nSB-OSC is an online schema change tool for Aurora MySQL databases, designed to dramatically improve performance on large\ntables by leveraging multithreading in all stages of schema migration process.\n\nIt also provides seamless pausing and resuming of tasks to adeptly handle extended operation times of large table schema\nchanges, along with a built-in monitoring system to dynamically control its heavy DML load based on Aurora's performance\nmetrics.\n\nSB-OSC is designed to overcome the limitations that existing migration tools face with large-scale tables,\nsignificantly reducing the operational overhead associated with managing large tables.\n\nPlease visit our [blog](https://sendbird.com/blog/sb-osc-sendbird-online-schema-change) for more information.\n\n## Takeaways\n\nSB-OSC has its own unique features that differentiate it from existing schema migration tools such as `pt-osc` and `gh-ost`.\n\n### Multithreading\n\nSB-OSC is designed to leverage multithreading in all stages of the schema migration 
process: bulk import (initial table\ncopy), binlog event processing, and DML event application.\n\nFor binlog event processing, SB-OSC processes binlog files in parallel, which enables it to handle large tables with\nheavy write loads.\n\n### Resumable\n\nSB-OSC is resumable at any stage of the schema migration process. It saves the current state of each stage to the database\nand Redis, allowing users to pause and resume the process at any time, as long as binlog retention is sufficient.\n\n### Operation Class\n\nSB-OSC supports operation classes that can override the main queries used in the schema migration process. This feature\nallows users to customize queries for table-specific needs such as data retention, table redesign, and more.\n\nIt also provides an operation class that allows replication across different Aurora clusters, which can be used in various\nscenarios such as cross-region replication, cross-account replication, clone cluster replication, etc.\n\n[Guide for operation class](doc/operation-class.md)\n\n### Data Validation\n\nSB-OSC provides strong data validation features to ensure data consistency between the source and destination tables. It\nvalidates both the bulk import and DML event application stages, and attempts to recover from any inconsistencies.\n\n### Index Creation Strategy\n\nSB-OSC allows users to create indexes after the bulk import stage, which can significantly reduce the time required for\nthe initial table copy. This feature is especially useful for large tables with many indexes.\n\n### Monitoring\n\nSB-OSC has a built-in monitoring system that dynamically controls its heavy DML load based on Aurora's performance\nmetrics. This feature makes SB-OSC more reliable in production environments, since it will automatically adjust its DML\nload when production traffic increases.\n\n## Requirements\n\nSB-OSC is designed to work with Aurora MySQL databases. 
It's a containerized application that can run in both Kubernetes and Docker environments.\n\nSB-OSC requires the `ROW` binlog format. It is recommended to set `binlog-ignore-db` to `sbosc` to prevent SB-OSC from\nprocessing its own binlog events.\n\n- `binlog_format` set to `ROW`\n- `binlog-ignore-db` set to `sbosc` (Recommended)\n\nDetailed requirements and setup instructions can be found in the [deployment guide](deploy/README.md).\n\n## Performance\n\nSB-OSC shows high performance on both binlog event processing and bulk import. The following are the specs of the tables used for\nperformance testing:\n\n| Table Alias | Avg Row Length (Bytes) | Write IOPS (IOPS/m) |\n|:-----------:|-----------------------:|--------------------:|\n|      A      |                     57 |                 149 |\n|      B      |                    912 |                 502 |\n|      C      |                     61 |              3.38 K |\n|      D      |                    647 |              17.9 K |\n|      E      |                   1042 |              24.4 K |\n|      F      |                     86 |               151 K |\n|      G      |                   1211 |              60.7 K |\n\n**Avg Row Length**: `avg_row_length` from `information_schema.TABLES`  \n**Write IOPS**: Average increase of `count_write` from `performance_schema.table_io_waits_summary_by_table` per\nminute.\n\nAll tables were in the same Aurora MySQL v3 cluster.\n\n### Binlog Event Processing\n\nThe following is the read throughput of binlog event processing, measured in bytes read per minute. 
By comparing read throughput to the total\nbinlog creation rate of the cluster, we can see whether SB-OSC can catch up with DML events.\n\n**Total Binlog Creation Rate**: 144 (MB/m)\n\n|      Table Alias       |  A  |  B  |  C  |  D  |  E  |  F  |  G  |\n|:----------------------:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| Read Throughput (MB/m) | 513 | 589 | 591 | 402 | 466 | 361 | 305 |\n\nThe results show that SB-OSC can catch up with DML events on tables with very high write load.\n\n### Bulk Import\n\nTo provide general insight on bulk import performance, the test was conducted on table `A` with no secondary indexes\nand no additional traffic.\n\nActual performance of bulk import can vary depending on the number of secondary indexes, the number of rows, column\ntypes, production traffic, etc.\n\nThe following are the results of bulk import performance based on instance sizes:\n\n| Instance Type | Insert Rate (rows/s) | Network Throughput (Bytes/s) | Storage Throughput (Bytes/s) | CPU Utilization (%) |\n|:-------------:|---------------------:|-----------------------------:|-----------------------------:|--------------------:|\n|  r6g.2xlarge  |               42.3 K |                       27.2 K |                        457 M |                55.0 |\n|  r6g.4xlarge  |               94.0 K |                       45.9 K |                        900 M |                51.9 |\n|  r6g.8xlarge  |                158 K |                       72.2 K |                       1.39 G |                44.6 |\n\nInsert rate, network throughput, and storage throughput are the average values calculated from CloudWatch metrics.\n\n### Comparison with gh-ost\n\nWe compared the total migration time of SB-OSC and gh-ost under the following conditions:\n\n- Table `C` with ~200M rows\n- Aurora MySQL v3 cluster, r6g.8xlarge instance\n- 2 secondary indexes\n- `batch_size` (`chunk-size` for gh-ost): 50000\n- (gh-ost) `--allow-on-master`\n\n**w/o traffic**\n\n|  Tool  | Total Migration Time | CPU 
Utilization (%) |\n|:------:|---------------------:|--------------------:|\n| SB-OSC |                  22m |                60.6 |\n| gh-ost |               1h 52m |                19.7 |\n\n**w/ traffic**\n\nTraffic was generated only to table `C` during the migration. (~1.0K inserts/s, ~0.33K updates/s, ~0.33K deletes/s)\n\n|  Tool  | Total Migration Time | CPU Utilization (%) |\n|:------:|---------------------:|--------------------:|\n| SB-OSC |                  27m |                62.7 |\n| gh-ost |                  1d+ |                27.4 |\n\nFor gh-ost, we interrupted the migration at 50% (~12h) since ETA kept increasing.\n\n## Limitations\n\n- **Necessity of Integer Primary Keys**\n  SB-OSC performs multithreading based on integer primary keys (PKs) during the bulk import phase. This approach,\n  designed around batch processing and other operations utilizing integer PKs, means SB-OSC cannot be used with tables\n  that do not have integer PKs.\n\n\n- **Updates on Primary Key**\n  SB-OSC replicates records from the original table based on the PK for applying DML events. Therefore, if updates occur\n  on the table's PK, it can be challenging to guarantee data integrity.\n\n\n- **Binlog Resolution**\n  SB-OSC is limited by the fact that binlog's resolution is in seconds. While this doesn't significantly impact most\n  scenarios due to SB-OSC's design, it can affect the logic based on timestamps when excessive events occur within a\n  second.\n\n\n- **Reduced Efficiency for Small Tables**\n  For small tables, the initial table creation, chunk creation, and the multi-stage process of SB-OSC can act as\n  overhead, potentially slowing down the overall speed. 
Therefore, applying SB-OSC to small tables may not be as\n  effective.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsendbird%2Fsb-osc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsendbird%2Fsb-osc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsendbird%2Fsb-osc/lists"}