{"id":13647544,"url":"https://github.com/pinterest/mysql_utils","last_synced_at":"2025-04-22T02:32:01.927Z","repository":{"id":66170234,"uuid":"44876935","full_name":"pinterest/mysql_utils","owner":"pinterest","description":"Pinterest MySQL Management Tools","archived":true,"fork":false,"pushed_at":"2019-06-25T19:50:26.000Z","size":319,"stargazers_count":884,"open_issues_count":1,"forks_count":142,"subscribers_count":71,"default_branch":"master","last_synced_at":"2024-11-09T21:36:59.655Z","etag":null,"topics":["mysql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pinterest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-10-24T17:33:19.000Z","updated_at":"2024-11-04T15:35:32.000Z","dependencies_parsed_at":"2023-04-06T08:36:31.820Z","dependency_job_id":null,"html_url":"https://github.com/pinterest/mysql_utils","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fmysql_utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fmysql_utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fmysql_utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pinterest%2Fmysql_utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pinterest","download_url":"https://codeload.github.com/pinterest/mysql_utils/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com",
"kind":"github","repositories_count":250163718,"owners_count":21385296,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mysql"],"created_at":"2024-08-02T01:03:38.075Z","updated_at":"2025-04-22T02:32:01.470Z","avatar_url":"https://github.com/pinterest.png","language":"Python","readme":"**Note:** This project is no longer actively maintained by Pinterest.\n\n---\n\n# Pinterest MySQL Management Tools\n\n## NOTE: THESE TOOLS WILL NOT JUST WORK!!!\nYou will need to add a fair amount of glue in order to make these tools work:\n  - **Service discovery**\n  - **A CMDB**\n  - **A bunch of company-specific information in the environment specifics lib**\n\nIt is hoped that this code will be useful to you as an example of working\nimplementations of DB tools.\n\n## Basics of MySQL at Pinterest\n\nPinterest has historically used MySQL to store some of our most important data:\n  - **Pins**\n  - **Boards**\n  - **Image metadata**\n  - **Usernames and passwords**\n\nRecently, we have added additional use cases:\n  - **Pinlater**\nThanks in part to kernel optimizations, MySQL is now replacing Redis and\nbecoming the only supported backend for our asynchronous job execution engine.\n  - **Zen**\nMySQL has joined HBase as a supported backend for our Graph Storage Engine.\n\nEvery environment in which MySQL runs at Pinterest is identical from an\nadministrative perspective:\n  - **A single master with one or two slaves**\nHistorically MySQL was used with multiple writable instances in a replica set.\nThis topology is error-prone and has been simplified to a single 
master with\none or more slaves.\n  - **Zookeeper provides service discovery**\nThe contract between the administrative tools and the MySQL applications is\nZookeeper.  With few exceptions, Zookeeper provides clients with database\nhostnames, usernames and passwords.\n\n## The Lifecycle of a MySQL Instance in the Cloud\n\nMySQL servers at Pinterest are launched, live, and die with only the rarest of\nconfiguration changes. Upgrading kernels, MySQL versions, and any other changes\nthat would require a restart of the database are never done in-place.  Instead,\nthese actions are always performed through server replacements and\nfailovers/slave promotions as needed. This choice has greatly simplified our\nautomation by removing the need to manage intermediate state.\n\nOne of our most important scripts is launch_replacement_db_host.py. In the\nsimplest case, the only required argument to launch_replacement_db_host.py is\nthe hostname of a failed slave. The existing instance is examined, all required\nparameters for a new server are computed, and then the new server is launched.\nFor other changes, such as MySQL upgrades, hardware upgrades/downgrades, and\ndatacenter migrations, there are optional arguments.\n\nAfter the new server has booted and received its initial base configuration\nfrom our provisioning system, a cron job will notice that the data directory is\nempty and run mysql_restore.py. Based on service discovery, this\nscript will attempt to find a database backup, restore it, set up replication,\nand then add the new MySQL instance to service discovery.  Like\nlaunch_replacement_db_host.py, mysql_restore.py accepts many\noptional arguments for non-standard uses.\n\nIf a MySQL master server requires replacement, the mysql_failover.py script\nmust be run to promote the primary slave to master. 
This script deals with\neither living or dead initial masters, modifies MySQL replication\ntopology, and then updates service discovery.\n\nAfter a server has been removed from service discovery, it will be subject to a\nretirement queue system. This system has several steps that lead to eventual\ntermination of a server:\n  - Based on service discovery, servers that are not in use will have several\nstatus counters reset.\n  - After a day, the servers will be inspected to see if any activity has\ncaused the status counters to increment. If the counters have incremented, the\nretirement process is aborted. If the counters have not incremented, the MySQL\ninstance is sent a shutdown command.\n  - After another day, the server is subject to termination if its database has\nnot been restarted.\n\n## Utilities\n\nA list of included utilities:\n  - **archive_mysql_binlogs.py**\nThis script backs up MySQL replication logs in order to be able to perform\npoint-in-time recoveries in the case where all servers in a replica set are\nlost. All logs up to the current log being written to are uploaded to S3.\n  - **backup_tester.py**\nReplaces some number of replica servers in order to test backups.\n  - **binlog_rotator.py**\nIf the current binlog has been in use longer than a predefined limit, rotates\nit.\n  - **check_mysql_replication.py**\nThis script displays the replication status of an instance in terms of sql/io\nthread status, bytes behind, and a computed seconds behind master based on\npt-heartbeat. 
If a watch argument is supplied, normal output is suppressed and\ninstead only a computed seconds behind master is displayed along with an\nestimate of replication catchup time.\n  - **find_shard_mismatches.py**\nThis script examines production servers and finds any incorrectly located\nshards.\n  - **find_unused_db_servers.py**\nThis script finds unused servers based on service discovery and optionally adds\nthe instances to the retirement queue.\n  - **fix_orphaned_shards.py**\nThis script picks up where find_shard_mismatches.py left off and renames and\nthen eventually drops the unused shards.\n  - **get_recent_checksums.py**\nUses data populated by mysql_checksum.py to display current replication\nconsistency data.\n  - **kill_backups.py**\nKills any running backups. This is run by cron on master servers.\n  - **launch_amazon_mysql_server.py**\nThis script is generally called by launch_replacement_db_host.py and provides\nan easy interface to correctly launch a new server in AWS.\n  - **launch_replacement_db_host.py**\nThis script accepts the hostname of an existing replica, pulls a variety\nof data from our CMDB, and attempts to launch a server to replace the supplied\ninstance with a similar configuration. Optional additional arguments can\noverride the configuration in a variety of ways.\n  - **modify_mysql_zk**\nThis script will modify our service discovery system in a variety of ways.\n  - **mysql_backup_status.py**\nThis script checks that backups have run across all replica sets and optionally\ndisplays the created backup files. This script only checks xtrabackup backups.\n  - **mysql_backup.py**\nThis script is the entry point for MySQL backups. It can perform logical\nand xtrabackup backups.\n  - **mysql_backup_csv.py**\nThis script backs up data to S3 in CSV format in a manner that can be\nqueried by Hive-like systems. 
It is **very** much multiprocess and\nmultithreaded and doubles as a stress testing utility.\n  - **mysql_backup_logical.py**\nThis script is basically shorthand for running a logical backup through\nmysql_backup.py.\n  - **mysql_backup_xtrabackup.py**\nThis script is basically shorthand for running an xtrabackup backup through\nmysql_backup.py.\n  - **mysql_checksum.py**\nThis script is run every day, and runs a pt-checksum against a subset of the\nshards in order to verify that master and slave are not out of sync, and if\nthey are, by how much. The mysql_checksum.py script runs the checksums and\nstores the results.\n  - **mysql_cli.py**\nThis script attempts to remove the need for users to know hostnames,\nusernames and passwords in order to use the mysql cli. The script accepts\na replica set name or in some cases a shard name, determines the\ncurrent hosts in production, and then launches a mysql cli in the shell\nusing a read-only username and password. It can also use a variety of\nprivileges, such as a read-write or an admin connection. It can also\naccept a hostname and just figure out usernames and passwords.\n  - **mysql_cnf_builder.py**\nThis script builds MySQL configuration files based on global defaults, and then\noverrides for workload type, hardware and MySQL version. Several example\nconfiguration files are included.\n  - **mysql_grants.py**\nThis script manages our database users. It is one of our oldest bits of\nautomation and one of our most limited. 
It fulfills our needs for the time\nbeing but sooner or later will need to be significantly expanded. It provides\nan interface to check and correct db user configuration.\n  - **mysql_failover.py**\nThis script attempts to safely run a failover on a MySQL replica set, updating\nreplication topology and service discovery.\n  - **mysql_init_server.py**\nThis script takes a server with mysql binaries installed and sets up an empty\nmysql instance and then imports users.\n  - **mysql_record_table_size.py**\nRecords the size of all InnoDB tables.\n  - **mysql_replica_mappings.py**\nThis script provides administrators a quick view of what is in production in a\nformat that is easy to use for shell scripting. The script can also pull in\nhardware, availability zone, etc...\n  - **restart_daemons.py**\nRestarts pt daemons, if needed.\n  - **mysql_restore.py**\nThis script finds a backup, restores it, sets up replication and\nthen adds the new instance to service discovery based on data recorded by\nlaunch_replacement_db_host.py.\n  - **mysql_shard_status.py**\nThis script displays the status in service discovery of an instance. Primarily\nused for gating cron jobs.\n  - **other_slave_running_etl.py**\nChecks if another slave server is running a CSV backup. Useful for gating cron.\n  - **retirement_queue.py**\nThis script ensures that a server which is no longer in service discovery\nis no longer in use and then terminates the instance.\n  - **safe_uploader.py**\nThis module provides our canonical way to upload data into S3. 
Either the \nprocesses feeding in data all succeed or the upload is not finalized.\n  - **schema_verifier.py**\nThis script ensures that schema is in sync across sharded data sets.\n\n## Some examples\n\nFind the pinlater test servers\n```\n$ ./mysql_replica_mappings.py | grep pinlatertest\npinlatertestdb002               master      pinlatertestdb-2-3:3306\npinlatertestdb002               slave       pinlatertestdb-2-4:3306\n```\n\nPromote the slave to master\n```\n$ ./mysql_failover.py pinlatertestdb-2-3\nI18:15:10 [__main__] Master to demote is pinlatertestdb-2-3:3306\nI18:15:10 [__main__] Replica set is detected as pinlatertestdb002\nI18:15:10 [__main__] Taking promotion lock on replica set\nI18:15:10 [__main__] Promotion lock identifier is 48f8ee80-d97e-47b4-bf2a-75fd5d985b20\nI18:15:10 [__main__] Releasing any expired locks\nI18:15:10 [__main__] UPDATE mysqlops.promotion_locks SET lock_active = NULL WHERE expires \u003c now()\nI18:15:10 [__main__] Checking existing locks\nI18:15:10 [__main__] Taking lock against replica set: pinlatertestdb002\nI18:15:10 [__main__] INSERT INTO mysqlops.promotion_locks SET lock_identifier = '48f8ee80-d97e-47b4-bf2a-75fd5d985b20', lock_active = 'active', created_at = NOW(), expires = NOW() + INTERVAL 12 HOUR, released = NULL, replica_set = 'pinlatertestdb002', promoting_host = 'devops001', promoting_user = 'rwultsch'\nI18:15:10 [__main__] Slave/new master is detected as pinlatertestdb-2-4:3306\nI18:15:10 [__main__] DR slave is detected as None\nI18:15:10 [__main__] Replica pinlatertestdb-2-4:3306 is replicating from expected master server pinlatertestdb-2-3:3306\nI18:15:10 [__main__] Testing to see if Slave/new master is setup to write replication logs\nI18:15:10 [__main__] Slave/new master is setup to write replication logs\nI18:15:10 [__main__] Master is considered alive\nI18:15:10 [__main__] Lag on pinlatertestdb-2-4:3306 is 0 is \u003c= limit of 60\nI18:15:10 [__main__] Preliminary sanity checks complete, starting 
promotion\nI18:15:10 [__main__] Setting read_only on master\nI18:15:10 [lib.mysql_lib] Confirming that long running transactions have gone away\nI18:15:10 [lib.mysql_lib] All long trx are now dead\nI18:15:10 [lib.mysql_lib] SET GLOBAL read_only = 1\nI18:15:10 [__main__] Confirming no writes to old master\nI18:15:10 [lib.mysql_lib] FLUSH TABLE_STATISTICS\nI18:15:10 [lib.mysql_lib] FLUSH USER_STATISTICS\nI18:15:10 [__main__] Waiting 10 seconds to confirm instance is no longer accepting writes\nI18:15:20 [__main__] No writes after sleep, looks like we are good to go\nI18:15:20 [__main__] Waiting for replicas to be caught up\nI18:15:20 [__main__] pinlatertestdb-2-4:3306 is in sync with the master\nI18:15:20 [__main__] Setting up replication from old master (pinlatertestdb-2-3:3306)to new master (pinlatertestdb-2-4:3306)\nI18:15:20 [lib.mysql_lib] Setting pinlatertestdb-2-3:3306 as a replica of new master pinlatertestdb-2-4:3306\nI18:15:20 [lib.mysql_lib] Confirming that long running transactions have gone away\nI18:15:20 [lib.mysql_lib] All long trx are now dead\nI18:15:20 [lib.mysql_lib] SET GLOBAL read_only = 1\nI18:15:20 [lib.mysql_lib] CHANGE MASTER TO MASTER_USER='REDACTED', MASTER_PASSWORD='REDACTED', MASTER_HOST='pinlatertestdb-2-4', MASTER_PORT=3306, MASTER_LOG_FILE='pinlatertestdb-2-4-bin.000665', MASTER_LOG_POS=90715916\nI18:15:20 [lib.mysql_lib] START SLAVE\nI18:15:21 [__main__] Updating zk\nI18:15:21 [modify_mysql_zk] Instance is pinlatertestdb-2-4:3306\nI18:15:21 [modify_mysql_zk] Detected replica_set as pinlatertestdb002\nI18:15:21 [kazoo_utils] Underlying zookeeper connection is healthy.\nI18:15:21 [modify_mysql_zk] Replica set pinlatertestdb002 is held in zk_node /config/services/generaldb/mysql_databases\nI18:15:21 [modify_mysql_zk] Existing config:\nI18:15:21 [modify_mysql_zk] {'master': {'host': 'pinlatertestdb-2-3', 'port': 3306},\n 'passwd': 'REDACTED',\n 'slave': {'host': 'pinlatertestdb-2-4', 'port': 3306},\n 'user': 'REDACTED'}\nI18:15:21 
[modify_mysql_zk] New config:\nI18:15:21 [modify_mysql_zk] {'master': {'host': 'pinlatertestdb-2-4', 'port': 3306},\n 'passwd': 'REDACTED',\n 'slave': {'host': 'pinlatertestdb-2-3', 'port': 3306},\n 'user': 'REDACTED'}\nI18:15:21 [modify_mysql_zk] Pushing new configuration for pinlatertestdb002:\nI18:15:21 [__main__] Removing read_only from new master\nI18:15:21 [lib.mysql_lib] SET GLOBAL read_only = 0\nI18:15:21 [__main__] Removing replication configuration from new master\nI18:15:21 [lib.mysql_lib] ('Previous replication settings:', {'Replicate_Wild_Do_Table': '', 'Retrieved_Gtid_Set': '', 'Master_SSL_CA_Path': '', 'Last_Error': '', 'Until_Log_File': '', 'SQL_Delay': 0L, 'Seconds_Behind_Master': 0L, 'Master_User': 'REDACTED', 'Master_Port': 3306L, 'Master_Retry_Count': 86400L, 'Until_Log_Pos': 0L, 'Master_Log_File': 'pinlatertestdb-2-3-bin.000715', 'Read_Master_Log_Pos': 41406489L, 'Replicate_Do_DB': '', 'Master_SSL_Verify_Server_Cert': 'No', 'Exec_Master_Log_Pos': 41406489L, 'Replicate_Ignore_Server_Ids': '', 'Replicate_Ignore_Table': '', 'Master_Server_Id': 167842827L, 'Relay_Log_Space': 41406892L, 'Last_SQL_Error': '', 'SQL_Remaining_Delay': None, 'Relay_Master_Log_File': 'pinlatertestdb-2-3-bin.000715', 'Master_SSL_Allowed': 'No', 'Master_SSL_CA_File': '', 'Slave_IO_State': 'Waiting for master to send event', 'Last_SQL_Error_Timestamp': '', 'Relay_Log_File': 'mysqld_3306-relay-bin.002138', 'Replicate_Ignore_DB': '', 'Last_IO_Error': '', 'Until_Condition': 'None', 'Slave_SQL_Running_State': 'Slave has read all relay log; waiting for the slave I/O thread to update it', 'Replicate_Do_Table': '', 'Last_Errno': 0L, 'Master_Host': 'pinlatertestdb-2-3', 'Master_Info_File': '/raid0/mysql/3306/data/master.info', 'Master_SSL_Key': '', 'Executed_Gtid_Set': '', 'Master_Bind': '', 'Skip_Counter': 0L, 'Slave_SQL_Running': 'Yes', 'Relay_Log_Pos': 41406661L, 'Master_SSL_Cert': '', 'Last_IO_Errno': 0L, 'Slave_IO_Running': 'Yes', 'Connect_Retry': 60L, 'Last_SQL_Errno': 0L, 
'Last_IO_Error_Timestamp': '', 'Replicate_Wild_Ignore_Table': '', 'Master_UUID': 'f38ce3e5-1609-11e5-9d3c-0e36038ac59d', 'Auto_Position': 0L, 'Master_SSL_Crl': '', 'Master_SSL_Cipher': '', 'Master_SSL_Crlpath': ''})\nI18:15:21 [lib.mysql_lib] STOP SLAVE\nI18:15:21 [lib.mysql_lib] RESET SLAVE ALL\nI18:15:21 [__main__] Releasing promotion lock\nI18:15:21 [__main__] UPDATE mysqlops.promotion_locks SET lock_active = NULL AND released = NOW() WHERE lock_identifier = '48f8ee80-d97e-47b4-bf2a-75fd5d985b20'\nI18:15:21 [__main__] Failover complete\n```\n\nReplace the old master/new slave\n```\n$ ./launch_replacement_db_host.py pinlatertestdb-2-3 --reason kicks_and_giggles\nI18:19:28 [__main__] Trying to launch a replacement for host pinlatertestdb-2-3 which is part of replica set is pinlatertestdb002\nI18:19:28 [__main__] Data from cmdb: {u'config.instance_type': u'i2.2xlarge', u'region': u'us-east-1', u'cloud.aws.vpc_id': u'REDACTED', u'cloud.aws.subnet_id': u'REDACED', u'location': u'us-east-1a', u'id': u'REDACTED', u'security_group_ids': u'REDACTED', u'config.name': u'pinlatertestdb-2-3', u'security_groups': u'REDACTED'}\nI18:19:29 [__main__] Reason for launch: kicks_and_giggles\nI18:19:29 [launch_amazon_mysql_server] Requested hostname = pinlatertestdb-2-7\nI18:19:29 [launch_amazon_mysql_server] Requested instance_type = i2.2xlarge\nI18:19:29 [launch_amazon_mysql_server] Requested vpc_security_group = REDACTED\nI18:19:29 [launch_amazon_mysql_server] Requested classic_security_group = None\nI18:19:29 [launch_amazon_mysql_server] Requested availability_zone = us-east-1a\nI18:19:29 [launch_amazon_mysql_server] Requested mysql_major_version = 5.6\nI18:19:29 [launch_amazon_mysql_server] Requested mysql_minor_version = stable\nI18:19:29 [launch_amazon_mysql_server] Requested dry_run = False\nI18:19:29 [launch_amazon_mysql_server] Requested skip_name_check = True\nI18:19:29 [launch_amazon_mysql_server] Will use subnet \"REDACTED\" in \"REDACTED\" based upon security group 
REDACTED and availibility zone us-east-1a\nI18:19:29 [launch_amazon_mysql_server] Config for new server:\n..Tons of Pinterest stuff redacted here..\nI18:19:30 [launch_amazon_mysql_server] Launched instance i-1231234\n```\n\nCheck replication on the old master/new slave\n```\n$ ./check_mysql_replication.py pinlatertestdb-2-3\nHeartbeat_seconds_behind: 0\nSlave_IO_Running: Yes\nIO_lag_bytes: 2068\nIO_lag_binlogs: 0\nSlave_SQL_Running: Yes\nSQL_lag_bytes: 2068\nSQL_lag_binlogs: 0\n```\n\nCheck grants on the old master/new slave\n```\n$ ./mysql_grants.py -i pinlatertestdb-2-3 -a check\n$ echo $?\n0\n```\n\n## Not a Panacea\nThese tools are tightly integrated into our service discovery mechanism and\nwould likely require moderate modification of the code that reads from and\nwrites to service discovery. There are also some significant legacy limitations\nto these utilities, such as the lack of support for more than two slaves. It is\nour hope that these tools are useful to others who wish to create automation\nfor their MySQL infrastructure.\n","funding_links":[],"categories":["Database Tools","Python","Linux Ecosystem Dev\u0026Ops Tools and Services","Databases"],"sub_categories":["Learning Resources"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpinterest%2Fmysql_utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpinterest%2Fmysql_utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpinterest%2Fmysql_utils/lists"}