{"id":19773947,"url":"https://github.com/badoo/pinba2","last_synced_at":"2025-07-14T05:33:49.667Z","repository":{"id":47337841,"uuid":"80511734","full_name":"badoo/pinba2","owner":"badoo","description":"Pinba2: new implementation of https://github.com/tony2001/pinba_engine","archived":false,"fork":false,"pushed_at":"2022-12-13T10:12:19.000Z","size":1483,"stargazers_count":130,"open_issues_count":8,"forks_count":18,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-04-20T17:45:52.203Z","etag":null,"topics":["mariadb","mysql","pinba","stats","timeseries"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/badoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-31T10:44:21.000Z","updated_at":"2024-02-25T19:45:53.000Z","dependencies_parsed_at":"2023-01-28T10:45:50.101Z","dependency_job_id":null,"html_url":"https://github.com/badoo/pinba2","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fpinba2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fpinba2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fpinba2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/badoo%2Fpinba2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/badoo","download_url":"https://codeload.github.com/badoo/pinba2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repo
sitories_count":224219555,"owners_count":17275477,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mariadb","mysql","pinba","stats","timeseries"],"created_at":"2024-11-12T05:11:34.714Z","updated_at":"2024-11-12T05:11:35.376Z","avatar_url":"https://github.com/badoo.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"Pinba2\n======\n\nAn attempt to rethink internal implementation and some features of excellent https://github.com/tony2001/pinba_engine by @tony2001.\n\n\nPinba (PHP Is Not A Bottleneck Anymore) is a statistics server using MySQL as an interface.\n\nIt accumulates and processes data sent over UDP and displays statistics in human-readable form of simple \"reports\" (like what are my slowest scripts or sql queries).\nThis is not limited to PHP, there are clients for multiple languages and nginx module.\n\n\nKey differences from original implementation\n--------------------------------------------\n\n- no raw data tables (i.e. requests, timers) support, yet (can be implemented)\n    - raw data tables have VERY high memory usage requirements and uses are limited\n- simpler, more flexible report configuration\n    - all use cases from original pinba are covered by only 3 kinds of reports (of which you mostly need one: timer)\n    - simple aggregation keys specification, can mix different types, i.e. 
~script,~server,+request_tag,@timer_tag\n        - supports 15 keys max at the moment (never seen anyone using more than 5 anyway)\n        - performance does not degrade when adding more keys to reports\n    - more options can be configured per report now\n        - stats gathering history: i.e. some reports can aggregate over 60sec, while others - over 300sec, as needed\n        - histograms+percentiles: some reports might need very detailed histograms, while others - coarse\n- simpler to maintain\n    - no 'pools' to configure, aka no re-configuration is required when traffic grows\n    - no limits on tag name/value sizes (but keep it reasonable)\n- aggregation performance improved, reduced cpu/memory usage\n    - currently handles ~72k simple packets/sec (~200mbps) with 5 medium-complexity reports (4 keys aggregation) @ ~40% overall cpu usage\n    - handles up to 1.4 million packets/sec (~3 gbps) on internal setups on commodity hardware from 2015\n    - uses significantly less memory (orders of magnitude) for common cases, since we don't store raw requests by default\n    - current goal is to be able to handle 10gbps of incoming traffic with hundreds of reports\n- select performance - might be slower\n  - selects from complex reports never slow down new data aggregation\n  - selects in general will be slower for complex reports with thousands of rows and high precision percentiles\n    - select * from 30k rows report without percentiles takes at least ~200 milliseconds or so\n    - with percentiles (say histogram with 10k entries) - will add ~300ms to that\n- misc\n  - traffic and memory_footprint are measured in bytes (original pinba truncates to kilobytes)\n  - raw histogram data is available as an extra field in existing report (not as a separate table)\n\n\nClient libraries\n----------------\n\nSame client libraries can be used with this pinba implementation\n\nlist from http://pinba.org/\n\n- PHP: https://github.com/tony2001/pinba_extension\n- Nginx: 
https://github.com/tony2001/ngx_http_pinba_module\n- NodeJS: https://github.com/Sannis/node-pinba\n- Ruby: https://github.com/prepor/pinbo\n- Python: https://github.com/IsCoolEntertainment/pynba\n- Java: https://github.com/alex-krash/jpinba\n- Go: https://github.com/mkevac/gopinba\n\n\nMigrating from original Pinba\n-----------------------------\n\nWe've got some scripts to help [in scripts directory](scripts).\nConvert mysqldump of your old tables to new format with [this script](scripts/convert_mysqldump.php).\n\n\n\nMore Info\n--------\n\n- [TODO](TODO.md)\n- [Building](docs/index.md#building) - use docker, really\n- [Installing](docs/index.md#installation) - use docker, really\n- [Configuration](docs/index.md#configuration) - optional, should run fine with default settings\n- [User-defined reports + examples](#user-defined-reports)\n\n\nDocker\n------\n\n### Fedora 25\n\n[`Dockerfile`](Dockerfile)\n\nBasics\n------\n\n**Requests**\n\nWe get these over UDP, each request contains metrics data gathered by your application (like serving pages to users, or performing db queries).\n\n\nData comes in three forms\n\n- **request fields** (these are predefined and hardcoded since the dawn of original pinba)\n    - `host_name`: name of the physical host (like \"subdomain.mycoolserver.localnetwork\")\n    - `script_name`: name of the script\n    - `server_name`: name of the logical host (like \"example.com\")\n    - `schema`: usually \"http\" or \"https\"\n    - `status`: usually http status (this one is 32-bit integer)\n    - `request_time`: wall-clock time it took to execute the whole request\n    - `rusage_user`: rusage in user space for this request\n    - `rusage_system`: rusage in kernel space for this request\n    - `document_size`: size of result doc\n    - `memory_footprint`: amount of memory used\n- **request tags** - this is just a bunch of `key -\u003e value` pairs attached to request as a whole\n    - ex. 
in pseudocode `[ 'application' -\u003e 'my_cool_app', 'environment' -\u003e 'production' ]`\n- **timers** - a bunch of sub-action measurements, for example: time it took to execute some db query, or process some user input.\n    - number of timers is not limited, track all db/memcached queries\n    - each timer can also have tags!\n        - ex. `[ 'group' -\u003e 'db', 'server' -\u003e 'db1.lan', 'op_type' -\u003e 'update' ]`\n        - ex. `[ 'group' -\u003e 'memcache', 'server' -\u003e 'mmc1.lan', 'op_type' -\u003e 'get' ]`\n\n\n**Reports**\n\nA report is a read-only view of incoming data, aggregated within a specified time window.\nOne can think of it as a table of key/value pairs: `Aggregation_key value` -\u003e `Aggregated_data` + `Percentiles`.\n\n- `Aggregation_key` - configured when report is created.\n    - key *names* are set by the user, *values* for those keys are taken from requests for aggregation\n    - key *name* is a combination of\n        - request fields: `~host`, `~script`, `~server`, `~schema`, `~status`\n        - request tags: `+whatever_name_you_want`\n        - timer tags: `@some_timer_tag_name`\n- `Aggregation_key value` - is the set of values, corresponding to key names set in `Aggregation_key`\n    - ex. if `Aggregation_key` is `~host`, there'll be a key/value pair per unique `host` we see in request stream\n    - ex. if `Aggregation_key` is `~host,+req_tag`, there'll be a key/value pair per unique `[host, req_tag_value]` pair\n- `Aggregated_data` is report-specific (i.e. a structure with fields like: req_count, hit_count, total_time, etc.).\n- `Percentiles` is a bunch of fields with specific percentiles, calculated over data from `request_time` or `timer_value`\n- `Histogram` is a field where engine exports raw histogram data (that we calculate percentiles from) in text form\n\nThere are 3 kinds of reports: packet, request, timer. 
The difference between those boils down to\n\n- How `Aggregation_key values`-s are extracted and matched\n- How `Aggregated_data` is populated (i.e. if you aggregate on request tags, there is no need/way to aggregate timer data)\n- What value we use for `Histogram` and `Percentiles`\n\n\n**SQL tables**\n\nReports are exposed to the user as SQL tables.\n\nAll report tables have same simple structure\n\n- `Aggregation_key`, one table field per key part (i.e. ~script,~host,@timer_tag needs 3 fields with appropriate types)\n- `Aggregated_data`, 3 fields per data field (field_value, field_value_per_sec, field_value_percent) (i.e. request report needs 7*3 fields = 21 data fields)\n- `Percentiles`, one field per configured percentile (optional)\n- `Histogram`, one text field for raw histogram data that percentiles are calculated from (optional)\n\nASCII art!\n\n                              ----------------           -------------------------------------------------------------\n                              | key -\u003e value |           | key_part_1 | ... | data_part_1 | ... | percentile_1 | ... |\n    ------------              ----------------           -------------------------------------------------------------\n    | Requests |  aggregate\u003e  |  .........   |  select\u003e  |    ...................................................    |\n    ------------              ----------------           -------------------------------------------------------------\n                              | key -\u003e value |           | key_part_1 | ... | data_part_1 | ... | percentile_1 | ... 
|\n                              ----------------           -------------------------------------------------------------\n\n\n**SQL table comments**\n\nAll pinba tables are created with sql comment to tell the engine about table purpose and structure,\ngeneral syntax for comment is as follows (not all reports use all the fields).\n\n    \u003e COMMENT='v2/\u003creport_type\u003e/\u003caggregation_window\u003e/\u003ckeys\u003e/\u003chistogram+percentiles\u003e/\u003cfilters\u003e';\n\n[Take a look at examples first](#user-defined-reports)\n\n- \u0026lt;aggregation_window\u0026gt;: time window we aggregate data in. values are\n    - 'default_history_time' to use global setting (= 60 seconds)\n    - (number of seconds) - whatever you want \u003e0\n- \u0026lt;keys\u0026gt;: keys we aggregate incoming data on\n    - 'no_keys': key based aggregation not needed / not supported (packet report only)\n    - \u0026lt;key_spec\u0026gt;[,\u0026lt;key_spec\u0026gt;[,...]]\n        - ~field_name: any of 'host', 'script', 'server', 'schema'\n        - +request_tag_name: use this request tag's value as key\n        - @timer_tag_name: use this timer tag's value as key (timer reports only)\n    - example: '~host,~script,+application,@group,@server'\n        - will aggregate on 5 keys\n        - 'host_name', 'script_name' global fields, 'application' request tag, plus 'group' and 'server' timer tag values\n- \u0026lt;histogram+percentiles\u0026gt;: histogram time and percentiles definition\n    - 'no_percentiles': disable\n    - syntax: 'hv=\u0026lt;min_time_ms\u0026gt;:\u0026lt;max_time_ms\u0026gt;:\u0026lt;bucket_count\u0026gt;,\u0026lt;percentiles\u0026gt;'\n        - \u0026lt;percentiles\u0026gt;=p\u0026lt;double\u0026gt;[,p\u0026lt;double\u0026gt;[...]]\n        - (alt syntax) \u0026lt;percentiles\u0026gt;='percentiles='\u0026lt;double\u0026gt;[:\u0026lt;double\u0026gt;[...]]\n    - example: 'hv=0:2000:20000,p99,p99.9,p100'\n        - this uses histogram for time range [0,2000) 
milliseconds, with 20000 buckets, so each bucket is 0.1 ms 'wide'\n        - also adds 3 percentiles to report 99th, 99.9th and 100th, percentile calculation precision is 0.1ms given above\n        - uses 'request_time' (for packet/request reports) or 'timer_value' (for timer reports) from incoming packets for percentiles calculation\n    - example (alt syntax): 'hv=0:2000:20000,percentiles=99:99.9:100'\n        - same effect as above\n- \u0026lt;filters\u0026gt;: accept only packets matching these filters into this report\n    - to disable: put 'no_filters' here, report will accept all packets\n    - any of (separate with commas):\n        - 'min_time=\u0026lt;milliseconds\u0026gt;'\n        - 'max_time=\u0026lt;milliseconds\u0026gt;'\n        - '\u0026lt;tag_spec\u0026gt;=\u0026lt;value\u0026gt;' - check that packet has fields, request or timer tags with given values and accept only those\n    - \u0026lt;tag_spec\u0026gt; is the same as \u0026lt;key_spec\u0026gt; above, i.e. ~request_field,+request_tag,@timer_tag\n    - example: min_time=0,max_time=1000,+browser=chrome\n        - will accept only requests with request_time in range [0, 1000)ms with request tag 'browser' present and value 'chrome'\n        - there is currently no way to filter timers by their timer_value, can't think of a use case really\n\n\nUser-defined reports\n--------------------\n\n**Packet report (like info in tony2001/pinba_engine)**\n\nGeneral information about incoming packets\n\n- just aggregates everything into single item (mostly used to gauge general traffic)\n- Aggregation_key is always empty\n- Aggregated_data is global packet totals: { req_count, timer_count, hit_count, total_time, ru_utime, ru_stime, traffic, memory_footprint }\n- Histogram and Percentiles are calculated from data in request_time field\n\nTable comment syntax\n\n    \u003e 'v2/packet/\u003caggregation_window\u003e/no_keys/\u003chistogram+percentiles\u003e/\u003cfilters\u003e';\n\nExample\n\n```sql\nmysql\u003e 
CREATE TABLE `info` (\n      `req_count` bigint(20) unsigned NOT NULL,\n      `timer_count` bigint(20) unsigned NOT NULL,\n      `time_total` double NOT NULL,\n      `ru_utime_total` double NOT NULL,\n      `ru_stime_total` double NOT NULL,\n      `traffic` bigint(20) unsigned NOT NULL,\n      `memory_footprint` bigint(20) unsigned NOT NULL\n    ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='v2/packet/default_history_time/no_keys/no_percentiles/no_filters'\n\nmysql\u003e select * from info;\n+-----------+-------------+-------------------+------------------+-----------------+-----------+------------------+\n| req_count | timer_count | time_total        | ru_utime_total   | ru_stime_total  | traffic   | memory_footprint |\n+-----------+-------------+-------------------+------------------+-----------------+-----------+------------------+\n|   3940547 |    59017168 | 6982620.849607239 | 128279.101920963 | 18963.268457099 | 141734072 |  317514981871616 |\n+-----------+-------------+-------------------+------------------+-----------------+-----------+------------------+\n1 row in set (0.00 sec)\n```\n\n\n**Request data report**\n\n- aggregates at request level, never touching timers at all\n- Aggregation_key is a combination of request_field (host, script, etc.) 
and request_tags (must NOT have timer_tag keys)\n- Aggregated_data is request-based\n    - req_count, req_time_total, req_ru_utime, req_ru_stime, traffic_kb, mem_usage\n- Histogram and Percentiles are calculated from data in request_time field\n\nTable comment syntax\n\n    \u003e 'v2/request/\u003caggregation_window\u003e/\u003ckey_spec\u003e/\u003chistogram+percentiles\u003e/\u003cfilters\u003e';\n\nexample (report by script name only here)\n\n```sql\nmysql\u003e CREATE TABLE `report_by_script_name` (\n        `script` varchar(64) NOT NULL,\n        `req_count` int(10) unsigned NOT NULL,\n        `req_per_sec` float NOT NULL,\n        `req_percent` float,\n        `req_time_total` float NOT NULL,\n        `req_time_per_sec` float NOT NULL,\n        `req_time_percent` float,\n        `ru_utime_total` float NOT NULL,\n        `ru_utime_per_sec` float NOT NULL,\n        `ru_utime_percent` float,\n        `ru_stime_total` float NOT NULL,\n        `ru_stime_per_sec` float NOT NULL,\n        `ru_stime_percent` float,\n        `traffic_total` bigint(20) unsigned NOT NULL,\n        `traffic_per_sec` float NOT NULL,\n        `traffic_percent` float,\n        `memory_footprint` bigint(20) NOT NULL,\n        `memory_per_sec` float NOT NULL,\n        `memory_percent` float\n        ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='v2/request/60/~script/no_percentiles/no_filters';\n\nmysql\u003e select * from report_by_script_name; -- skipped some fields for brevity\n+----------------+-----------+-------------+----------------+------------------+----------------+------------------+-----------------+------------------+\n| script         | req_count | req_per_sec | req_time_total | req_time_per_sec | ru_utime_total | ru_stime_per_sec | traffic_per_sec | memory_footprint |\n+----------------+-----------+-------------+----------------+------------------+----------------+------------------+-----------------+------------------+\n| script-0.phtml |    200001 |     3333.35 |        
200.001 |          3.33335 |              0 |                0 |               0 |                0 |\n| script-6.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-3.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-5.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-4.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-8.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-9.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-1.phtml |    200001 |     3333.35 |        200.001 |          3.33335 |              0 |                0 |               0 |                0 |\n| script-2.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n| script-7.phtml |    200000 |     3333.33 |            200 |          3.33333 |              0 |                0 |               0 |                0 |\n+----------------+-----------+-------------+----------------+------------------+----------------+------------------+-----------------+------------------+\n10 rows in set (0.00 sec)\n```\n\n\n**Timer data report**\n\nThis is the one you need for 95% uses\n\n- aggregates at request + timer levels\n- Aggregation_key is a combination of request_field (host, script, etc.), request_tags and timer_tags (must have at least one timer_tag key)\n- Aggregated_data is timer-based (aka taken from timer data)\n 
   - req_count, timer_hit_count, timer_time_total, timer_ru_utime, timer_ru_stime\n- Histogram and Percentiles are calculated from data in timer_value\n\nTable comment syntax\n\n    \u003e 'v2/timer/\u003caggregation_window\u003e/\u003ckey_spec\u003e/\u003chistogram+percentiles\u003e/\u003cfilters\u003e';\n\nexample (some complex report)\n\n```sql\nmysql\u003e CREATE TABLE `tag_info_pinger_call_from_wwwbmamlan` (\n      `pinger_dst_cluster` varchar(64) NOT NULL,\n      `pinger_src_host` varchar(64) NOT NULL,\n      `pinger_dst_host` varchar(64) NOT NULL,\n      `req_count` int(11) NOT NULL,\n      `req_per_sec` float NOT NULL,\n      `req_percent` float,\n      `hit_count` int(11) NOT NULL,\n      `hit_per_sec` float NOT NULL,\n      `hit_percent` float,\n      `time_total` float NOT NULL,\n      `time_per_sec` float NOT NULL,\n      `time_percent` float,\n      `ru_utime_total` float NOT NULL,\n      `ru_utime_per_sec` float NOT NULL,\n      `ru_utime_percent` float,\n      `ru_stime_total` float NOT NULL,\n      `ru_stime_per_sec` float NOT NULL,\n      `ru_stime_percent` float,\n      `p50` float NOT NULL,\n      `p75` float NOT NULL,\n      `p95` float NOT NULL,\n      `p99` float NOT NULL,\n      `p100` float NOT NULL,\n      `histogram_data` text NOT NULL\n    ) ENGINE=PINBA DEFAULT CHARSET=latin1\n      COMMENT='v2/timer/60/@pinger_dst_cluster,@pinger_src_host,@pinger_dst_host/hv=0:1000:100000,p50,p75,p95,p99,p100/+pinger_phase=call,+pinger_src_cluster=wwwbma.mlan';\n```\n\nexample (grouped by host_name, script_name, server_name and the value of timer tag \"tag10\")\n\n```sql\nmysql\u003e CREATE TABLE `report_host_script_server_tag10` (\n      `host` varchar(64) NOT NULL,\n      `script` varchar(64) NOT NULL,\n      `server` varchar(64) NOT NULL,\n      `tag10` varchar(64) NOT NULL,\n      `req_count` int(10) unsigned NOT NULL,\n      `req_per_sec` float NOT NULL,\n      `hit_count` int(10) unsigned NOT NULL,\n      `hit_per_sec` float NOT NULL,\n      
`time_total` float NOT NULL,\n      `time_per_sec` float NOT NULL,\n      `ru_utime_total` float NOT NULL,\n      `ru_utime_per_sec` float NOT NULL,\n      `ru_stime_total` float NOT NULL,\n      `ru_stime_per_sec` float NOT NULL\n    ) ENGINE=PINBA DEFAULT CHARSET=latin1\n      COMMENT='v2/timer/60/~host,~script,~server,@tag10/no_percentiles/no_filters';\n\nmysql\u003e select * from report_host_script_server_tag10; -- skipped some fields for brevity\n+-----------+----------------+-------------+-----------+-----------+-----------+------------+----------------+----------------+\n| host      | script         | server      | tag10     | req_count | hit_count | time_total | ru_utime_total | ru_stime_total |\n+-----------+----------------+-------------+-----------+-----------+-----------+------------+----------------+----------------+\n| localhost | script-3.phtml | antoxa-test | select    |       806 |       806 |      5.642 |              0 |              0 |\n| localhost | script-6.phtml | antoxa-test | select    |       805 |       805 |      5.635 |              0 |              0 |\n| localhost | script-0.phtml | antoxa-test | something |       800 |       800 |         12 |              0 |              0 |\n| localhost | script-1.phtml | antoxa-test | select    |       804 |       804 |      5.628 |              0 |              0 |\n| localhost | script-2.phtml | antoxa-test | something |       797 |       797 |     11.955 |              0 |              0 |\n| localhost | script-8.phtml | antoxa-test | select    |       803 |       803 |      5.621 |              0 |              0 |\n| localhost | script-6.phtml | antoxa-test | something |       805 |       805 |     12.075 |              0 |              0 |\n| localhost | script-4.phtml | antoxa-test | select    |       798 |       798 |      5.586 |              0 |              0 |\n| localhost | script-4.phtml | antoxa-test | something |       798 |       798 |      11.97 |              0 |              
0 |\n| localhost | script-3.phtml | antoxa-test | something |       806 |       806 |      12.09 |              0 |              0 |\n| localhost | script-1.phtml | antoxa-test | something |       804 |       804 |      12.06 |              0 |              0 |\n| localhost | script-2.phtml | antoxa-test | select    |       797 |       797 |      5.579 |              0 |              0 |\n| localhost | script-9.phtml | antoxa-test | something |       806 |       806 |      12.09 |              0 |              0 |\n| localhost | script-7.phtml | antoxa-test | select    |       801 |       801 |      5.607 |              0 |              0 |\n| localhost | script-5.phtml | antoxa-test | select    |       802 |       802 |      5.614 |              0 |              0 |\n| localhost | script-5.phtml | antoxa-test | something |       802 |       802 |      12.03 |              0 |              0 |\n| localhost | script-9.phtml | antoxa-test | select    |       806 |       806 |      5.642 |              0 |              0 |\n| localhost | script-0.phtml | antoxa-test | select    |       800 |       800 |        5.6 |              0 |              0 |\n| localhost | script-8.phtml | antoxa-test | something |       803 |       803 |     12.045 |              0 |              0 |\n| localhost | script-7.phtml | antoxa-test | something |       801 |       801 |     12.015 |              0 |              0 |\n+-----------+----------------+-------------+-----------+-----------+-----------+------------+----------------+----------------+\n```\n\n\nSystem Reports\n--------------\n\n**Active reports information table**\n\nThis table lists all reports known to the engine with additional information about them.\n\n| Field  | Description |\n|:------ |:----------- |\n| id | internal id, useful for matching reports with system threads. 
report calls pthread_setname_np(\"rh/[id]\") |\n| table_name | mysql fully qualified table name (including database) |\n| internal_name | the name known to the engine (it never changes with table renames, but you shouldn't really care about that). |\n| kind | internal report kind (one of the kinds described in this doc, like stats, active, etc.) |\n| uptime | time since report creation (seconds) |\n| time_window | time window this report aggregates data for (that you specify when creating a table) |\n| tick_count | number of ticks time_window is split into |\n| approx_row_count | approximate row count |\n| approx_mem_used | approximate memory usage |\n| batches_sent | number of packet batches sent from coordinator to report thread |\n| batches_received | number of packet batches received by report thread (if this differs from batches_sent, you're losing batches and packets) |\n| packets_received | packets received and processed |\n| packets_lost | packets that could not be processed and had to be dropped (aka, report couldn't cope with such packet rate) |\n| packets_aggregated | number of packets that we took useful information from |\n| packets_dropped_by_bloom | number of packets dropped by packet-level bloom filter |\n| packets_dropped_by_filters | number of packets dropped by packet-level filters |\n| packets_dropped_by_rfield | number of packets dropped by request_field aggregation |\n| packets_dropped_by_rtag | number of packets dropped by request_tag aggregation |\n| packets_dropped_by_timertag | number of packets dropped by timer_tag aggregation (i.e. 
no useful timers) |\n| timers_scanned | number of timers scanned |\n| timers_aggregated | number of timers that we took useful information from |\n| timers_skipped_by_bloom | number of timers skipped by timer-level bloom filter |\n| timers_skipped_by_filters | number of timers skipped by timertag filters |\n| timers_skipped_by_tags | number of timers skipped by not having required tags present |\n| ru_utime | rusage: user time |\n| ru_stime | rusage: system time |\n| last_tick_time | time we last merged temporary data to selectable data |\n| last_tick_prepare_duration | time it took to prepare to merge temp data to selectable data |\n| last_snapshot_merge_duration | time it took to prepare last select (not implemented yet) |\n\nTable comment syntax\n\n    \u003e 'v2/active'\n\nexample\n\n```sql\nmysql\u003e CREATE TABLE IF NOT EXISTS `pinba`.`active` (\n      `id` int(10) unsigned NOT NULL,\n      `table_name` varchar(128) NOT NULL,\n      `internal_name` varchar(128) NOT NULL,\n      `kind` varchar(64) NOT NULL,\n      `uptime` double unsigned NOT NULL,\n      `time_window_sec` int(10) unsigned NOT NULL,\n      `tick_count` int(10) NOT NULL,\n      `approx_row_count` int(10) unsigned NOT NULL,\n      `approx_mem_used` bigint(20) unsigned NOT NULL,\n      `batches_sent` bigint(20) unsigned NOT NULL,\n      `batches_received` bigint(20) unsigned NOT NULL,\n      `packets_received` bigint(20) unsigned NOT NULL,\n      `packets_lost` bigint(20) unsigned NOT NULL,\n      `packets_aggregated` bigint(20) unsigned NOT NULL,\n      `packets_dropped_by_bloom` bigint(20) unsigned NOT NULL,\n      `packets_dropped_by_filters` bigint(20) unsigned NOT NULL,\n      `packets_dropped_by_rfield` bigint(20) unsigned NOT NULL,\n      `packets_dropped_by_rtag` bigint(20) unsigned NOT NULL,\n      `packets_dropped_by_timertag` bigint(20) unsigned NOT NULL,\n      `timers_scanned` bigint(20) unsigned NOT NULL,\n      `timers_aggregated` bigint(20) unsigned NOT NULL,\n      
`timers_skipped_by_bloom` bigint(20) unsigned NOT NULL,\n      `timers_skipped_by_filters` bigint(20) unsigned NOT NULL,\n      `timers_skipped_by_tags` bigint(20) unsigned NOT NULL,\n      `ru_utime` double NOT NULL,\n      `ru_stime` double NOT NULL,\n      `last_tick_time` double NOT NULL,\n      `last_tick_prepare_duration` double NOT NULL,\n      `last_snapshot_merge_duration` double NOT NULL\n    ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='v2/active';\n\n\nmysql\u003e select *, packets_received/uptime as packets_per_sec, timers_scanned/uptime as timers_per_sec, ru_utime/uptime utime_per_sec from active\\G\n*************************** 1. row ***************************\n                          id: 1\n                  table_name: ./pinba/tag_report_perf___10us\n               internal_name: ./pinba/tag_report_perf___10us\n                        kind: report_by_timer_data\n                      uptime: 2316.135996475\n             time_window_sec: 60\n                  tick_count: 60\n            approx_row_count: 10830\n             approx_mem_used: 117561688\n                batches_sent: 185186\n            batches_received: 185186\n            packets_received: 38144533\n                packets_lost: 0\n          packets_aggregated: 6634543\n    packets_dropped_by_bloom: 31509990\n  packets_dropped_by_filters: 0\n   packets_dropped_by_rfield: 0\n     packets_dropped_by_rtag: 0\n packets_dropped_by_timertag: 0\n              timers_scanned: 1455859105\n           timers_aggregated: 1097993658\n     timers_skipped_by_bloom: 357865447\n   timers_skipped_by_filters: 0\n      timers_skipped_by_tags: 0\n                    ru_utime: 182.947086\n                    ru_stime: 4.105066\n              last_tick_time: 1525363484.9716723\n  last_tick_prepare_duration: 0.006995009000000001\nlast_snapshot_merge_duration: 0.000000266\n             packets_per_sec: 16469.038544391762      // 16.5k packets/sec\n              timers_per_sec: 628896.7355846566       // 
628k timers/sec, ~38 timers/packet\n               utime_per_sec: 0.0789880586798154      // at ~8% cpu!\n1 row in set (0.01 sec)\n```\n\n\n**Stats (see also: status variables)**\n\nThis table contains internal stats, useful for monitoring/debugging/performance tuning.\n\nTable comment syntax\n\n    \u003e 'v2/stats'\n\nexample\n\n```sql\nmysql\u003e CREATE TABLE IF NOT EXISTS `stats` (\n      `uptime` DOUBLE NOT NULL,\n      `ru_utime` DOUBLE NOT NULL,\n      `ru_stime` DOUBLE NOT NULL,\n      `udp_poll_total` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_recv_total` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_recv_eagain` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_recv_bytes` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_recv_packets` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_packet_decode_err` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_batch_send_total` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_batch_send_err` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_packet_send_total` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_packet_send_err` BIGINT(20) UNSIGNED NOT NULL,\n      `udp_ru_utime` DOUBLE NOT NULL,\n      `udp_ru_stime` DOUBLE NOT NULL,\n      `repacker_poll_total` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_recv_total` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_recv_eagain` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_recv_packets` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_packet_validate_err` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_batch_send_total` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_batch_send_by_timer` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_batch_send_by_size` BIGINT(20) UNSIGNED NOT NULL,\n      `repacker_ru_utime` DOUBLE NOT NULL,\n      `repacker_ru_stime` DOUBLE NOT NULL,\n      `coordinator_batches_received` BIGINT(20) UNSIGNED NOT NULL,\n      `coordinator_batch_send_total` BIGINT(20) UNSIGNED NOT NULL,\n      `coordinator_batch_send_err` BIGINT(20) UNSIGNED NOT NULL,\n      `coordinator_control_requests` BIGINT(20) UNSIGNED 
NOT NULL,\n      `coordinator_ru_utime` DOUBLE NOT NULL,\n      `coordinator_ru_stime` DOUBLE NOT NULL,\n      `dictionary_size` BIGINT(20) UNSIGNED NOT NULL,\n      `dictionary_mem_hash` BIGINT(20) UNSIGNED NOT NULL,\n      `dictionary_mem_list` BIGINT(20) UNSIGNED NOT NULL,\n      `dictionary_mem_strings` BIGINT(20) UNSIGNED NOT NULL,\n      `version_info` text(1024) NOT NULL,\n      `build_string` text(1024) NOT NULL\n    ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='v2/stats';\n```\n\n```sql\nmysql\u003e select *, (repacker_ru_utime/uptime) as repacker_ru_utime_per_sec from stats\\G\n*************************** 1. row ***************************\n                      uptime: 12.482723834\n                    ru_utime: 2.248\n                    ru_stime: 1.12\n              udp_poll_total: 20924\n              udp_recv_total: 49753\n             udp_recv_eagain: 20904\n              udp_recv_bytes: 192375675\n            udp_recv_packets: 870451\n       udp_packet_decode_err: 0\n        udp_batch_send_total: 20915\n          udp_batch_send_err: 0\n       udp_packet_send_total: 870451\n         udp_packet_send_err: 0\n                udp_ru_utime: 0.8680000000000001\n                udp_ru_stime: 0.8240000000000001\n         repacker_poll_total: 20948\n         repacker_recv_total: 41827\n        repacker_recv_eagain: 20912\n       repacker_recv_packets: 870451\nrepacker_packet_validate_err: 0\n   repacker_batch_send_total: 849\nrepacker_batch_send_by_timer: 0\n repacker_batch_send_by_size: 849\n           repacker_ru_utime: 1.1720000000000002\n           repacker_ru_stime: 0.07200000000000001\ncoordinator_batches_received: 849\ncoordinator_batch_send_total: 0\n  coordinator_batch_send_err: 0\ncoordinator_control_requests: 0\n        coordinator_ru_utime: 0.032\n        coordinator_ru_stime: 0\n             dictionary_size: 444\n         dictionary_mem_hash: 6311251\n         dictionary_mem_list: 14208\n      dictionary_mem_strings: 5587\n                
version_info: pinba 2.0.8, git: 1afd7eb872a6ef95e34efbbe730aea3926489798, modified: 1\n                build_string: whatever-string-from-configure\n```\n\n\n**Status Variables**\n\nSame values as in the stats table, but 'built-in' (no need to create the table); the trade-off is that they are uglier to use in selects.\n\nExample (all vars)\n\n```sql\nmysql\u003e show status where Variable_name like 'Pinba%';\n+------------------------------------+-----------+\n| Variable_name                      | Value     |\n+------------------------------------+-----------+\n| Pinba_uptime                       | 30.312758 |\n| Pinba_udp_poll_total               | 99344     |\n| Pinba_udp_recv_total               | 227735    |\n| Pinba_udp_recv_eagain              | 99299     |\n| Pinba_udp_recv_bytes               | 367344280 |\n| Pinba_udp_recv_packets             | 1642299   |\n| Pinba_udp_packet_decode_err        | 0         |\n| Pinba_udp_batch_send_total         | 94382     |\n| Pinba_udp_batch_send_err           | 0         |\n| Pinba_udp_ru_utime                 | 24.052000 |\n| Pinba_udp_ru_stime                 | 32.820000 |\n| Pinba_repacker_poll_total          | 94711     |\n| Pinba_repacker_recv_total          | 188709    |\n| Pinba_repacker_recv_eagain         | 94327     |\n| Pinba_repacker_recv_packets        | 1642299   |\n| Pinba_repacker_packet_validate_err | 0         |\n| Pinba_repacker_batch_send_total    | 1622      |\n| Pinba_repacker_batch_send_by_timer | 189       |\n| Pinba_repacker_batch_send_by_size  | 1433      |\n| Pinba_repacker_ru_utime            | 59.148000 |\n| Pinba_repacker_ru_stime            | 23.564000 |\n| Pinba_coordinator_batches_received | 1622      |\n| Pinba_coordinator_batch_send_total | 1104      |\n| Pinba_coordinator_batch_send_err   | 0         |\n| Pinba_coordinator_control_requests | 9         |\n| Pinba_coordinator_ru_utime         | 0.040000  |\n| Pinba_coordinator_ru_stime         | 0.032000  |\n| Pinba_dictionary_size              | 364       |\n| 
Pinba_dictionary_mem_used          | 6303104   |\n+------------------------------------+-----------+\n29 rows in set (0.00 sec)\n```\n\n\nExample (var combo)\n\n```sql\nmysql\u003e select\n    (select VARIABLE_VALUE from information_schema.global_status where VARIABLE_NAME='PINBA_UDP_RECV_PACKETS')\n    / (select VARIABLE_VALUE from information_schema.global_status where VARIABLE_NAME='PINBA_UPTIME')\n    as packets_per_sec;\n+-------------------+\n| packets_per_sec   |\n+-------------------+\n| 54239.48988125529 |\n+-------------------+\n1 row in set (0.00 sec)\n```\n\nHistograms and Percentiles\n--------------------------\n\nTODO (need help describing details here).\n\nYou don't need to understand this to use the engine.\n\nFor all incoming time data (request_time or timer_value) - we build a histogram representing time values distribution for each 'row' in the report.\nThis allows us to calculate percentiles (with some accuracy, that is given by histogram range and bucket count).\n\nSo each row in every report that has percentiles configured will have a histogram associated with it.\nWhen selecting data from that report, the engine processes the histogram to get percentile values.\n\n**Histogram**\n\nconfig defines the range and bucket count: `hv=\u003cmin_value_ms\u003e:\u003cmax_value_ms\u003e:\u003cbucket_count\u003e`.\nThis defines a histogram with the following structure\n\n```\ngiven\n  \u003chv_range\u003e     = \u003cmax_value_ms\u003e - \u003cmin_value_ms\u003e\n  \u003cbucket_width\u003e = \u003chv_range\u003e / \u003cbucket_count\u003e\n\nhistogram looks like this\n  [negative_infinity bucket] -\u003e number of time values in range (-inf, \u003cmin_value_ms\u003e]\n  [0 bucket]                 -\u003e number of time values in range (\u003cmin_value_ms\u003e, \u003cmin_value_ms\u003e + \u003cbucket_width\u003e]\n  [1 bucket]                 -\u003e number of time values in range (\u003cmin_value_ms\u003e + \u003cbucket_width\u003e, 
\u003cmin_value_ms\u003e + \u003cbucket_width\u003e * 2]\n....\n  [last bucket]              -\u003e number of time values in range (\u003cmin_value_ms\u003e + \u003cbucket_width\u003e * (\u003cbucket_count\u003e - 1), \u003cmin_value_ms\u003e + \u003cbucket_width\u003e * \u003cbucket_count\u003e]\n  [positive_infinity bucket] -\u003e number of time values in range (\u003cmax_value_ms\u003e, +inf)\n```\n\n**Things to know about percentile calculation**\n\n- when percentile calculation needs to take 'partial bucket' (i.e. not all values from the bucket) - it interpolates percentile value, assuming uniform distribution within the bucket\n- percentile 0   - is always equal to min_value_ms\n- percentile 100 - is always equal to max_value_ms\n\n**Raw Histogram output format**\n\nGeneric format description\n\n```\nhv=\u003cmin_value_ms\u003e:\u003cmax_value_ms\u003e:\u003cbucket_count\u003e;values=[min:\u003cnegative_inf_value\u003e,max:\u003cpositive_inf_value\u003e,\u003cbucket_id\u003e:\u003cvalue\u003e, ...]\n```\n\nExample\n\n```\n// histogram configured with\n//  min_value_ms = 0\n//  max_value_ms = 2000  (aka 2 seconds)\n//  bucket_count = 20000 (so histogram resolution is 2000ms/20000 = 100 microseconds)\n//\n// negative_inf bucket contains 3 values\n// positive_inf bucket contains 3 values\n// and bucket with id = 69 contains 3 values, this bucket corresponds to the range (6.9ms, 7ms]\n//    as buckets are numbered from 0, 69*100microseconds = 6900microseconds = 6.9milliseconds\n//    and (69 + 1)*100microseconds = 7000microseconds = 7milliseconds\nhv=0:2000:20000;values=[min:3,max:3,69:3]\n```\n\n**Percentile calculation example**\n\nGiven the histogram above, say we need to calculate percentile 50 (aka median). 
That is, the value that is larger than 50% of the values in the 'value set'.\nOur 'value set' is as follows (each value in bucket 69 is shown as the bucket's upper bound, 7ms)\n\n```\n[ -inf, -inf, -inf, 7ms, 7ms, 7ms, +inf, +inf, +inf ]\n\nor, transforming 'infinities' into min_value_ms and max_value_ms\n\n[ 0ms, 0ms, 0ms, 7ms, 7ms, 7ms, 2000ms, 2000ms, 2000ms ]\n```\n\n- calculate what '50% of all values' means: we have 9 values, so 50% is 4.5\n- round 4.5 up, take the value of the 5th element -\u003e 7ms is the answer\n- but, taking into account a point from above (we interpolate within bucket, assuming uniform distribution)\n  - actually the transformed value set will look like this\n  ```\n  [ 0ms, 0ms, 0ms, 6.93(3)ms, 6.96(6)ms, 7ms, 2000ms, 2000ms, 2000ms ]\n  ```\n  since we assume uniform distribution, virtually splitting the 100-microsecond bucket (6.9ms, 7ms] into N=3 (the number of values in a bucket) sub-buckets\n- so our answer will be 6.96(6) milliseconds\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbadoo%2Fpinba2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbadoo%2Fpinba2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbadoo%2Fpinba2/lists"}