{"id":30294448,"url":"https://github.com/linkedin/queryanalyzeragent","last_synced_at":"2025-08-17T01:35:02.721Z","repository":{"id":57572046,"uuid":"249773941","full_name":"linkedin/QueryAnalyzerAgent","owner":"linkedin","description":"Analyze MySQL queries with negligible overhead","archived":false,"fork":false,"pushed_at":"2020-09-04T08:09:06.000Z","size":24,"stargazers_count":35,"open_issues_count":1,"forks_count":9,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-06-20T09:49:50.627Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linkedin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-24T17:30:56.000Z","updated_at":"2024-02-09T15:05:41.000Z","dependencies_parsed_at":"2022-08-23T17:40:36.221Z","dependency_job_id":null,"html_url":"https://github.com/linkedin/QueryAnalyzerAgent","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/linkedin/QueryAnalyzerAgent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FQueryAnalyzerAgent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FQueryAnalyzerAgent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FQueryAnalyzerAgent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FQueryAnalyzerAgent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linkedin","download_url":"https://codeload.github.com/linkedin/Q
ueryAnalyzerAgent/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FQueryAnalyzerAgent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796217,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-17T01:35:01.628Z","updated_at":"2025-08-17T01:35:02.701Z","avatar_url":"https://github.com/linkedin.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Query Analyzer Agent - Capture and analyze the queries without overhead.\n\nQuery Analyzer Agent runs on the database server. It captures all the queries by sniffing the network port, aggregates the queries and sends results to a remote server for further analysis. Refer to [LinkedIn's Engineering Blog](https://engineering.linkedin.com/blog/2017/09/query-analyzer--a-tool-for-analyzing-mysql-queries-without-overh) for more details.\n\n## Getting Started\n### Prerequisites\nQuery Analyzer Agent is written in Go, so before you get started you should [install and setup Go](https://golang.org/doc/install). 
You can also follow the steps here to install and setup Go.\n```\n$ wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz\n$ sudo tar -C /usr/local -xzf go1.14.linux-amd64.tar.gz\n$ mkdir ~/projects\n$ export PATH=$PATH:/usr/local/go/bin\n$ export GOPATH=~/projects\n$ export GOBIN=~/projects/bin\n```\n\nQuery Analyzer Agent requires the following external libraries\n- pcap.h (provided by libpcap-dev package), gcc or build-essential for building this package\n    - RHEL/CentOs/Fedora:\n      ```\n      $ sudo yum install gcc libpcap libpcap-devel git\n      ```\n    - Debian/Ubuntu:\n      ```\n      $ sudo apt-get install build-essential libpcap-dev git\n      ```\n- [Go-MySQL-Driver](https://github.com/go-sql-driver/mysql)\n  ```\n  $ go get github.com/go-sql-driver/mysql\n  ```\n\n### Third Party Libraries\nGo build system automatically downloads the following third party tools from the respective github repository during the compilation of this project.\n```\nGoPacket\nhttps://github.com/google/gopacket\nCopyright (c) 2012 Google, Inc. All rights reserved.\nCopyright (c) 2009-2011 Andreas Krennmair. All rights reserved.\nLicense: BSD 3-Clause \"New\" or \"Revised\" License\n\nPercona Go packages for MySQL\nhttps://github.com/percona/go-mysql\nCopyright (C) 2007 Free Software Foundation, Inc. \u003chttp://fsf.org/\u003e\nLicense: BSD 3-Clause \"New\" or \"Revised\" License\n\nViper\nhttps://github.com/spf13/viper\nCopyright (c) 2014 Steve Francia\nLicense: MIT\n```\n\n### Setting up remote database\nQuery Analyzer Agent either prints the aggregated queries to a local log file or sends to a remote database which can store queries collected from all the agents. 
We have chosen MySQL as the remote database.\n\nExecute the following SQL statements on the remote database server.\n```\nCREATE DATABASE IF NOT EXISTS `query_analyzer`;\n\nCREATE TABLE IF NOT EXISTS `query_analyzer`.`query_info` (\n  `hostname` varchar(64) NOT NULL DEFAULT '',\n  `checksum` char(16) NOT NULL DEFAULT '',\n  `fingerprint` longtext NOT NULL,\n  `sample` longtext CHARACTER SET utf8mb4,\n  `firstseen` datetime NOT NULL,\n  `mintime` float NOT NULL DEFAULT '0',\n  `mintimeat` datetime NOT NULL,\n  `maxtime` float NOT NULL DEFAULT '0',\n  `maxtimeat` datetime NOT NULL,\n  `is_reviewed` enum('0','1','2') NOT NULL DEFAULT '0',\n  `reviewed_by` varchar(20) DEFAULT NULL,\n  `reviewed_on` datetime DEFAULT NULL,\n  `comments` mediumtext,\n  PRIMARY KEY (`hostname`,`checksum`),\n  KEY `checksum` (`checksum`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;\n\nCREATE TABLE IF NOT EXISTS `query_analyzer`.`query_history` (\n  `hostname` varchar(64) NOT NULL DEFAULT '',\n  `checksum` char(16) NOT NULL DEFAULT '',\n  `src` varchar(39) NOT NULL DEFAULT '',\n  `user` varchar(16) DEFAULT NULL,\n  `db` varchar(64) NOT NULL DEFAULT '',\n  `ts` datetime NOT NULL,\n  `count` int unsigned NOT NULL DEFAULT '1',\n  `querytime` float NOT NULL DEFAULT '0',\n  `bytes` int unsigned NOT NULL DEFAULT '0',\n  PRIMARY KEY (`hostname`,`checksum`,`ts`),\n  KEY `checksum` (`checksum`),\n  KEY `user` (`user`),\n  KEY `covering` (`hostname`,`ts`,`querytime`,`count`,`bytes`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8\n/*!50100 PARTITION BY RANGE (TO_DAYS(ts))\n(PARTITION p202004 VALUES LESS THAN (TO_DAYS('2020-05-01')) ENGINE = InnoDB,\n PARTITION p202005 VALUES LESS THAN (TO_DAYS('2020-06-01')) ENGINE = InnoDB,\n PARTITION p202006 VALUES LESS THAN (TO_DAYS('2020-07-01')) ENGINE = InnoDB,\n PARTITION p202007 VALUES LESS THAN (TO_DAYS('2020-08-01')) ENGINE = InnoDB,\n PARTITION p202008 VALUES LESS THAN (TO_DAYS('2020-09-01')) ENGINE = InnoDB,\n PARTITION p202009 
VALUES LESS THAN (TO_DAYS('2020-10-01')) ENGINE = InnoDB,\n PARTITION p202010 VALUES LESS THAN (TO_DAYS('2020-11-01')) ENGINE = InnoDB,\n PARTITION p202011 VALUES LESS THAN (TO_DAYS('2020-12-01')) ENGINE = InnoDB,\n PARTITION pMAX VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;\n/* You can use different partition scheme based on your retention */\n\nCREATE USER /*!50706 IF NOT EXISTS*/ 'qan_rw'@'qan_agent_ip' IDENTIFIED BY 'Complex_P@ssw0rd';\n\nGRANT SELECT, INSERT, UPDATE ON `query_analyzer`.* TO 'qan_rw'@'qan_agent_ip';\n```\nThe above SQLs can be found in remote_database/remote_schema.sql and remote_database/users.sql files.\n\n## Build and Install\n```\n$ git clone https://github.com/linkedin/QueryAnalyzerAgent\n$ cd QueryAnalyzerAgent\n$ go get\n$ go build -o $GOBIN/QueryAnalyzerAgent\n```\n\n## Configuration\nQueryAnalyzerAgent config is in TOML format which is organized into several subheadings. For the basic use, you need to specify the Ethernet Interface, Port and connection details of remote database endpoint in the config file - qan.toml\n\nOnce the remote database is setup, update qan.toml\n```\n[remoteDB]\nEnabled = 1\n\n# remote database hostname to send results to\nHostname = \"remote_database_hostname\"\n\n# remote database port to send results to\nPort = 3306\n\n# remote database username to send results to\nUsername = \"qan_rw\"\n\n# remote database password to send results to\nPassword = \"Complex_P@ssw0rd\"\n\n# remote database name to send results to\nDBName = \"query_analyzer\"\n```\n\nIf user and db details of connection are needed, create a user to connect to the local database and update the localDB section. 
Create user SQL can be found at local_database/users.sql\n\n\n## Running Query Analyzer Agent\n```\nSince the agent sniffs the network interface, it should have net_raw capability.\n$ sudo setcap cap_net_raw+ep $GOBIN/QueryAnalyzerAgent\n$ $GOBIN/QueryAnalyzerAgent --config-file qan.toml (or complete path to qan.toml)\n\nIf you do not set the net_raw capability, you can run the agent as a root user.\n$ sudo $GOBIN/QueryAnalyzerAgent --config-file qan.toml (or complete path to qan.toml)\n```\n\n## Query Analytics\nOnce you understand the schema, you can write queries and build fancy UI to extract the information you want.\nExamples:\n\n* Top 5 queries which have the maximum total run time during a specific interval. If a query takes 1 second and executes 1000 times, the total run time is 1000 seconds.\n  ```\n  SELECT \n      SUM(count),\n      SUM(querytime) \n  INTO \n      @count, @qt \n  FROM \n      query_history history \n  WHERE \n      history.hostname='mysql.database-server-001.linkedin.com' \n      AND ts\u003e='2020-03-11 09:00:00' \n      AND ts\u003c='2020-03-11 10:00:00';\n    \n  SELECT \n      info.checksum,\n      info.firstseen AS first_seen,\n      info.fingerprint,\n      info.sample,\n      SUM(count) as count,\n      ROUND(((SUM(count)/@count)*100),2) AS pct_total_query_count,\n      ROUND((SUM(count)/(TIME_TO_SEC(TIMEDIFF(MAX(history.ts),MIN(history.ts))))),2) AS qps,\n      ROUND((SUM(querytime)/SUM(count)),6) AS avg_query_time,\n      ROUND(SUM(querytime),6) AS sum_query_time,\n      ROUND((SUM(querytime)/@qt)*100,2) AS pct_total_query_time,\n      MIN(info.mintime) AS min_query_time,\n      MAX(info.maxtime) AS max_query_time\n  FROM \n      query_history history \n  JOIN     \n      query_info info \n  ON \n      info.checksum=history.checksum \n      AND info.hostname=history.hostname \n  WHERE \n      info.hostname='mysql.database-server-001.linkedin.com' \n      AND ts\u003e='2020-03-11 09:00:00' \n      AND ts\u003c='2020-03-11 
10:00:00' \n  GROUP BY \n      info.checksum \n  ORDER BY\n      pct_total_query_time DESC \n  LIMIT 5\\G\n  ```\n\n* Trend for a particular query\n  ```\n  SELECT \n      UNIX_TIMESTAMP(ts),\n      ROUND(querytime/count,6) \n  FROM \n      query_history history \n  WHERE \n      history.checksum='D22AB75FA3CC05DC' \n      AND history.hostname='mysql.database-server-001.linkedin.com' \n      AND ts\u003e='2020-03-11 09:00:00' \n      AND ts\u003c='2020-03-11 10:00:00';\n  ```\n* Queries fired from a particular IP\n  ```\n  SELECT\n      info.checksum,\n      info.fingerprint,\n      info.sample\n  FROM \n      query_history history \n  JOIN     \n      query_info info \n  ON \n      info.checksum=history.checksum \n      AND info.hostname=history.hostname \n  WHERE \n      history.src='10.251.225.27'\n  LIMIT 5;\n  ```\n* New queries on a particular day\n  ```\n  SELECT\n      info.firstseen,\n      info.checksum,\n      info.fingerprint,\n      info.sample\n  FROM   \n      query_info info \n  WHERE \n      info.hostname = 'mysql.database-server-001.linkedin.com' \n      AND info.firstseen \u003e= '2020-03-10 00:00:00'\n      AND info.firstseen \u003c '2020-03-11 00:00:00'\n  LIMIT 5;\n  ```\n\n## Limitations\n* As of now, it works only for MySQL.\n\n* Does not account for \n   * SSL\n   * Compressed packets\n   * Replication traffic\n   * Big queries for performance reasons\n\n* The number of unique query fingerprints should be limited (like \u003c100K). 
For example, if a query contains a blob and the tool cannot generate the correct fingerprint, the result is a huge number of fingerprints, which can increase the memory footprint of QueryAnalyzerAgent.\u003cbr /\u003e\u003cbr /\u003e\n  Another example: if you are using GitHub's Orchestrator in pseudo-GTID mode, it generates queries like \n  ```\n  drop view if exists `_pseudo_gtid_`.`_asc:5d8a58c6:0911a85c:865c051f49639e79`\n  ```\n\n  The fingerprint for those queries is unique each time, which increases the number of distinct queries in QueryAnalyzerAgent. The code to ignore those queries is commented out; uncomment it if needed.\n\n* Test the performance of QueryAnalyzerAgent in your staging environment before running it in production. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fqueryanalyzeragent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinkedin%2Fqueryanalyzeragent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fqueryanalyzeragent/lists"}