https://github.com/sjmudd/queryprofiler
Capture query profiles of one or more MySQL servers using performance_schema
- Host: GitHub
- URL: https://github.com/sjmudd/queryprofiler
- Owner: sjmudd
- License: bsd-2-clause
- Created: 2015-12-17T07:19:33.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-12-29T00:59:32.000Z (over 9 years ago)
- Last Synced: 2025-03-24T22:51:21.510Z (about 2 months ago)
- Language: Go
- Size: 46.9 KB
- Stars: 20
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# queryprofiler
Compare query profiles of 2 different servers by querying performance_schema.events_statements_summary_by_digest
# Overview
The intention here is to connect to two servers and collect information from the digest table
in parallel.
This will do the following:
* Generate n collections of query digests for each server.
* From this data, derive n-1 samples, each based on collection x compared against collection x-1.
* Find the top Z queries for server1.
* For each of these queries, attempt to find the same query on server2 and compare the metrics using the available samples.

Note: this is still a work in progress and is not complete.
# Usage
```
queryprofiler [] [ ...]
```

```
DSN1='user:password@tcp(server1.example.com:3306)/performance_schema'
DSN2='user:password@tcp(server2.example.com:3306)/performance_schema'
./queryprofiler "$DSN1" "$DSN2"
```
# Concepts
In theory, using performance_schema (P_S) to profile queries may seem quite simple,
but I think that getting useful values requires a little more attention.
The sections below describe how queryprofiler analyses the queries
on the server.
## Event
Event is the table representation of the P_S digest table.
## Collection
Collection is a slice of Events together with a timestamp of when
data was collected. Collections is a slice of Collection.
## Sample
Sample is a slice of rows obtained by subtracting the values of
rows with _matching_ Keys in two consecutive Collections, together
with the start time and duration of the sample. It contains several
rows, one for each query. Samples is a slice of Sample. Sample
metrics are normalised to metrics per second for consistency.
## Metric
Metric is a slice of float64, which holds the underlying numbers used
by this program. Thus a Sample really contains a named set of
Metric.
## Key
In theory the DIGEST column might be used, but this digest is not
stable between different MySQL versions, so I collect an MD5 digest
of the DIGEST_TEXT instead. That said, the digest alone is not a
unique key; what is unique is the combination of query (digest) and
SCHEMA_NAME. The Key used to identify queries is therefore based on
the MD5 digest and the SCHEMA_NAME, joined by a "." if SCHEMA_NAME
contains a value.
## Issues
* events_statements_summary_by_digest may have empty DIGEST/DIGEST_TEXT. This represents lost values because the maximum number of digest values has been exceeded. You may see this empty query having quite high values because of this.
* Only completed queries are shown. Any long query that is running while queryprofiler is looking for data won't be shown.
* events_statements_summary_by_digest should have only one row per DIGEST_TEXT / SCHEMA_NAME. Unfortunately I've seen that this is not the case and multiple rows may be present. This has been reported. See http://bugs.mysql.com/bug.php?id=79533. In the meantime, if multiple rows are found with the same DIGEST_TEXT / SCHEMA_NAME, the values are merged together.
* events_statements_summary_by_digest has a DIGEST column which represents a unique key (with the SCHEMA_NAME) to identify queries. However, this digest may not be the same for the same query on 2 different servers due to the way the optimiser works. Consequently queryprofiler takes an MD5 checksum of the DIGEST_TEXT and uses that instead. I should really file a feature request asking that the generated query digest be calculated consistently, as that would avoid this extra operation.
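
The Key construction described under Concepts, and the merging of duplicate rows mentioned in the issues above, might look roughly like this. It is a sketch under stated assumptions rather than the program's actual code: the `row` type, the function names, the hex encoding of the checksum, the digest-then-schema ordering, and the use of a single counter are all chosen for illustration.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// row is a hypothetical, cut-down digest table row with a single counter.
type row struct {
	SchemaName string
	DigestText string
	CountStar  uint64
}

// key returns the hex MD5 of the digest text, followed by "." and the
// schema name when the schema name is not empty.
func key(schemaName, digestText string) string {
	sum := md5.Sum([]byte(digestText))
	k := hex.EncodeToString(sum[:])
	if schemaName != "" {
		k += "." + schemaName
	}
	return k
}

// mergeByKey adds up the counters of rows that share the same key, so
// duplicate DIGEST_TEXT / SCHEMA_NAME rows collapse into one entry.
func mergeByKey(rows []row) map[string]uint64 {
	merged := make(map[string]uint64)
	for _, r := range rows {
		merged[key(r.SchemaName, r.DigestText)] += r.CountStar
	}
	return merged
}

func main() {
	rows := []row{
		{SchemaName: "shop", DigestText: "SELECT * FROM t WHERE id = ?", CountStar: 10},
		{SchemaName: "shop", DigestText: "SELECT * FROM t WHERE id = ?", CountStar: 5},
		{SchemaName: "", DigestText: "SELECT ?", CountStar: 1},
	}
	for k, n := range mergeByKey(rows) {
		fmt.Println(k, n)
	}
}
```

Because the key depends only on DIGEST_TEXT and SCHEMA_NAME, the same query produces the same key on both servers, which is what makes the cross-server comparison possible.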
# Sample Output
For sample output look at the file of the same name: sample-output.txt