{"id":13587061,"url":"https://github.com/iximiuz/pq","last_synced_at":"2025-04-06T12:11:41.087Z","repository":{"id":46676445,"uuid":"361842359","full_name":"iximiuz/pq","owner":"iximiuz","description":"Parse and Query log files as time series","archived":false,"fork":false,"pushed_at":"2022-08-31T07:48:46.000Z","size":655,"stargazers_count":394,"open_issues_count":8,"forks_count":10,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-30T11:08:33.110Z","etag":null,"topics":["aggregation","log","time-series"],"latest_commit_sha":null,"homepage":"https://iximiuz.com/en/posts/pq/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iximiuz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-26T17:48:42.000Z","updated_at":"2025-03-27T21:54:09.000Z","dependencies_parsed_at":"2023-01-16T16:45:46.958Z","dependency_job_id":null,"html_url":"https://github.com/iximiuz/pq","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iximiuz%2Fpq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iximiuz%2Fpq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iximiuz%2Fpq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iximiuz%2Fpq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iximiuz","download_url":"https://codeload.github.com/iximiuz/pq/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247478324,"owners_c
ount":20945266,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aggregation","log","time-series"],"created_at":"2024-08-01T15:05:59.798Z","updated_at":"2025-04-06T12:11:41.059Z","avatar_url":"https://github.com/iximiuz.png","language":"Rust","readme":"- [pq - Parse and Query log files as time series](#pq---parse-and-query-log-files-as-time-series)\n  * [Why](#why)\n  * [How](#how)\n  * [Usage](#usage)\n  * [Installation](#installation)\n  * [Documentation](#documentation)\n    + [Decoders](#decoders)\n    + [Mappers](#mappers)\n    + [Query language](#query-language)\n    + [Formatters](#formatters)\n    + [Command-line flags and options](#command-line-flags-and-options)\n  * [Interactive Mode Demo](#interactive-mode-demo)\n    + [Secondly HTTP request rate with by (method, status_code) breakdowns](#secondly-http-request-rate-with-by--method--status-code--breakdowns)\n  * [Secondly traffic (in KB/s) aggregated by method](#secondly-traffic--in-kb-s--aggregated-by-method)\n  * [Development](#development)\n  * [Glossary](#glossary)\n\n# pq - Parse and Query log files as time series\n\n[![.github/workflows/rust.yml](https://github.com/iximiuz/pq/actions/workflows/rust.yml/badge.svg?branch=main)](https://github.com/iximiuz/pq/actions/workflows/rust.yml)\n\n**Project is actively being developed!**\n\n\n## Why\n\nI often find myself staring at Nginx or Envoy access logs flooding my screens with real-time data. My only wish at that moment is to be able to aggregate these lines somehow and analyze the output at a slower pace. Ideally, with some familiar and concise query language. 
Something like that would do:\n\n```bash\ntail -f /var/log/nginx/access.log | \\\n  pq 'nginx:combined | select sum(sum_over_time(content_len{status_code=\"2..\"}[1s])) by (method) / 1024'\n```\n\n\n## How\n\nThe idea is pretty straightforward - most of the log files around are essentially time series. If we could **parse** an input stream into a series of structured records, we\nwould be able to **query** the derived stream with PromQL-like expressions.\n\n**pq** reads the input stream line by line, applies some decoding and mapping, and produces such a stream of structured _records_.\n\nSimply put, **pq** turns lines into key-value objects (dictionaries). While keys are always strings, values can be of the following types:\n\n- _metric_ (or _tag_) - entries with lower cardinality\n- _value_ (or _field_) - entries with higher cardinality\n- _timestamp_ - the one that makes the input stream a time series.\n\nHaving a stream of timestamped records, **pq** can query it with its own query language. The query language and the query execution model are highly influenced by Prometheus. The query results can be printed with one of the supported formatters (human-readable, JSON, Prometheus API) or displayed on the screen in an interactive way.\n\n\n## Usage\n\nInteractive:\n\n```bash\ndocker logs -f nginx | pq -i '\n/[^\\[]+\\[([^]]+)].+?\\s+\"([^\\s]+)[^\"]*?\"\\s+(\\d+)\\s+(\\d+).*/\n| map { .0:ts, .1 as method, .2:str as status_code, .3 as content_len }\n| select sum(count_over_time(__line__[1s])) by (method)'\n```\n\nFor further analysis (JSON):\n\n```bash\ndocker logs nginx | pq '\n/[^\\[]+\\[([^]]+)].+?\\s+\"([^\\s]+)[^\"]*?\"\\s+(\\d+)\\s+(\\d+).*/\n| map { .0:ts, .1 as method, .2:str as status_code, .3 as content_len }\n| select count_over_time(__line__[1s])\n| to_json' \u003e result.jsonl\n```\n\nYou can also visualize JSON results using the [simplistic plotting utility](graph.html):\n\n![RPS](images/monthly_pageviews_by_lang-result-2000-opt.png)\n\nA better usage example is under construction... See \u003ca href=\"https://iximiuz.com/en/posts/pq/\"\u003ethis article for some screencasts\u003c/a\u003e.\n\n\n## Installation\n\nFor now, only the following method is supported:\n\n```bash\ncargo install --git https://github.com/iximiuz/pq\n```\n\nIt requires Cargo and Rust and should probably work on all platforms supported by the Rust ecosystem.\n\nEventually, more installation methods will be added (brew, apt, dnf, etc).\n\n\n## Documentation\n\n`pq` accepts _a program_ as its only required argument. A program must\nstart with a _decoder_ clause that can be followed by a _mapper_ clause, and then by a _query_ clause. 
Also, an optional _formatter_ can be applied at the end:\n\n```bash\npq '\u003cdecoder\u003e'\npq '\u003cdecoder\u003e | \u003cformatter\u003e'\npq '\u003cdecoder\u003e | map \u003cmapper\u003e'\npq '\u003cdecoder\u003e | select \u003cquery\u003e'\npq '\u003cdecoder\u003e | map \u003cmapper\u003e | select \u003cquery\u003e'\npq '\u003cdecoder\u003e | map \u003cmapper\u003e | select \u003cquery\u003e | \u003cformatter\u003e'\n```\n\n### Decoders\n\nCurrently supported input decoders:\n\n- regex `/.../` - uses a regex with match groups to split lines into fields\n- JSON `json` - expects a JSONL input stream\n\nDecoders coming soon:\n\n- CSV\n- logfmt (_aka_ scanf)\n- Prometheus\n- InfluxDB\n- Nginx\n- Apache\n- Envoy\n- etc...\n\n\n### Mappers\n\nThe result of decoding is a stream of _raw entries_. Depending on the decoder and the input\nstream, an entry can be a _tuple_ or a _dictionary_. The following syntax is used to map\nan entry to a full-fledged record that can then be used at the query stage.\n\n...for a tuple entry:\n\n```bash\n\u003cdecoder\u003e | map { .0, .1, .3  }          // pick up first, second, and fourth elements of a tuple\n                                         // produces the following object: { f0: \u003cval\u003e, f1: \u003cval\u003e, f3: \u003cval\u003e }\n\n\u003cdecoder\u003e | map { .0 as foo, .1 as bar } // produces object { foo: \u003cval\u003e, bar: \u003cval\u003e }\n```\n\n...for a dictionary entry:\n\n```bash\n\u003cdecoder\u003e | map { .foo, .bar  }  // filters out all other fields\n                                 // produces the following object: { foo: \u003cval\u003e, bar: \u003cval\u003e }\n\n\u003cdecoder\u003e | map { .foo as qux }  // produces object { qux: \u003cfoo's val\u003e }\n```\n\nRecord fields are strongly typed. Entry fields may or may not be typed. 
Appending `:str`, `:num`, or `:ts` to a field name applies dynamic type casting:\n\n```bash\n\u003cdecoder\u003e | map { .foo:str as qux, .bar:num as abc, .baz:ts }\n```\n\nThe timestamp type also supports an optional _format specifier_: `:ts [optional format like \"%Y-%m-%d\"]`. If the format of a timestamp field is not provided, `pq` will try its best to guess the format based on the input samples.\n\n\n### Query language\n\nThe query language is heavily influenced by PromQL. Hopefully, existing PromQL\nskills should be fully transferable.\n\nNormally, a query starts with a metric selector:\n\n- `body_bytes` - matches all records with the `body_bytes` metric.\n- `body_bytes{method=\"GET\"}` - takes only GET requests.\n- `body_bytes{method!=\"GET\", status_code=~\"5..\"}` - takes failed non-GET requests.\n\nA query is executed with a given frequency (by default _1 sec_) and a selector \nreturns the most recent sample from the stream. To get multiple samples, a time \nduration can be added:\n\n- `body_bytes[1s]` - returns secondly buckets of samples\n- `body_bytes{status_code!=\"200\"}[1h30m15s5ms]` - returns all non-200 records for the past `~1h30m`.\n\nAn operator or a function can be applied to a selector.\n\nSupported operators:\n\n- arithmetic `+ - / * ^ %`: `body_bytes{method=\"GET\"} + body_bytes{method=\"POST\"}` or `body_bytes{} / 1024`\n- comparison: `== != \u003c= \u003c \u003e= \u003e`: `body_bytes{} \u003e 1000`\n- aggregation `avg() bottomk() count() group() max() min() sum() topk()`: `min(body_bytes)`\n- coming soon - more aggregations `quantile() stddev() stdvar()`\n- coming soon - logical `and unless or`\n\nSupported functions:\n\n- `avg_over_time(selector[duration])`\n- `count_over_time(selector[duration])`\n- `last_over_time(selector[duration])`\n- `min_over_time(selector[duration])`\n- `max_over_time(selector[duration])`\n- `sum_over_time(selector[duration])`\n- coming soon - other well-known functions...\n\nAnd most of the expressions can be combined. For example:\n\n```SQL\nsum(sum_over_time(content_len[1s])) by (method) / 1024\n```\n\n### Formatters\n\nCurrently supported output formatters:\n\n- human-readable (implicit, used by default)\n- JSON `to_json`\n- Prometheus API-like `to_promapi`\n- interactive via the `-i` flag\n\nFormatters coming soon:\n\n- PromQL\n\n### Command-line flags and options\n\n**pq** also accepts some optional command-line flags and named arguments:\n\n```bash\nFLAGS:\n    -i, --interactive\n    -v, --verbose\n\nOPTIONS:\n    -I, --interval \u003cinterval\u003e  # same meaning as in Prometheus\n    -b, --lookback \u003clookback\u003e  # same meaning as in Prometheus\n    -s, --since \u003csince\u003e\n    -u, --until \u003cuntil\u003e\n```\n\n\n## Interactive Mode Demo\n\nThe stage consists of a web server and a number of concurrent clients generating traffic.\n\n```bash\n# Launch a test web server.\ndocker run -p 55055:80 --rm --name test_server nginx\n\n# In another terminal, start pouring some well-known but diverse traffic.\n# Note: in hey, `-q` sets the per-worker query rate and `-c` the number of concurrent workers.\nhey -n 1000000 -q 80 -c 2 -m GET http://localhost:55055/ \u0026\nhey -n 1000000 -q 60 -c 2 -m GET http://localhost:55055/qux \u0026\nhey -n 1000000 -q 40 -c 2 -m POST http://localhost:55055/ \u0026\nhey -n 1000000 -q 20 -c 2 -m PUT http://localhost:55055/foob \u0026\nhey -n 1000000 -q 10 -c 2 -m PATCH http://localhost:55055/ \u0026\n```\n\nThe access log in the first terminal looks impossible to analyze in real time, right? 
Interactive `pq` mode to the rescue!\n\n### Secondly HTTP request rate with by (method, status_code) breakdowns\n\n```bash\ndocker logs -n 1000 -f test_server | \\\n    pq '/[^\\[]+\\[([^]]+)]\\s+\"([^\\s]+)[^\"]*?\"\\s+(\\d+)\\s+(\\d+).*/\n        | map { .0:ts, .1 as method, .2:str as status_code, .3 as content_len } \n        | select count_over_time(__line__[1s])' \\\n    -i\n```\n\n![RPS](images/rps-2000-opt.png)\n\n\n## Secondly traffic (in KB/s) aggregated by method\n\nA slightly more advanced query - aggregate by HTTP method only:\n\n```bash\ndocker logs -n 1000 -f test_server | \\\n    pq '/[^\\[]+\\[([^]]+)]\\s+\"([^\\s]+)[^\"]*?\"\\s+(\\d+)\\s+(\\d+).*/\n        | map { .0:ts, .1 as method, .2:str as status_code, .3 as content_len } \n        | select sum(sum_over_time(content_len[1s])) by (method) / 1024' \\\n    -i\n```\n\n![BPS](images/bps-2000-opt.png)\n\nFor more use cases, see the [tests/scenarios folder](tests/scenarios).\n\n## Development\n\nContributions are always welcome!\n\n```bash\n# Build it with\nmake\n\n# Test it with\nmake test-all\nmake test-e2e\n\n# Run a certain e2e test\nE2E_CASE=vector_matching_one_to_one_010 make test-e2e\n```\n\n## Glossary\n\n- Time Series - a stream of timestamped values, _aka_ samples, sharing the same metric name and, optionally, the same set of labels (i.e. a unique combination of key-value pairs).\n- Metric name - a human-readable name of a measurement. E.g. `http_requests_total`, `content_length`, etc.\n- Metric type - counter, gauge, histogram, and summary.\n- Label - a dimension of the measurement. E.g. `method`, `url`, etc.\n- Sample - _aka_ data point - a (value, timestamp) tuple. The value is always a float64, and the timestamp always has millisecond precision.\n- Instant vector - a type of expression evaluation - a set of time series (vector) containing a single sample for each time series, all sharing the same timestamp.\n- Range vector - a type of expression evaluation - a set of time series containing a range of data points over time for each time series.\n- Scalar and string - two other expression evaluation results.\n- Vector selector - an expression of the form `\u003cmetric_name\u003e[{label1=value1[, label2=value2, ...]}][[time_duration]]`.\n","funding_links":[],"categories":["Rust"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiximiuz%2Fpq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiximiuz%2Fpq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiximiuz%2Fpq/lists"}