https://github.com/gfleury/gstreamtop

Real time analytical SQL Query on text streams
aggregation analytics interactive real-time regex sql stream-processing text


[![Build Status](https://travis-ci.org/gfleury/gstreamtop.svg?branch=master)](https://travis-ci.org/gfleury/gstreamtop) [![codecov](https://codecov.io/gh/gfleury/gstreamtop/branch/master/graph/badge.svg)](https://codecov.io/gh/gfleury/gstreamtop)

# gstreamtop

Gstreamtop is a text stream SQL query tool (fancy wording). It maps plain-text lines, JSON, or CSV entries to SQL table fields and lets you run SQL queries against them. Its main purpose is aggregation, so queries must have a GROUP BY clause.
Put simply, you can tail a log file and run a SQL query that aggregates over it. Unlike the ELK stack or Kafka + KSQL (tools of a very different scale, so this is not a real comparison), the idea is to have something local that you can run quickly, as a test or probe, without external dependencies.

## Example

[![asciicast](https://asciinema.org/a/Y8qSzmLxPFXFETAMCbCtcYWdB.png?autoplay=1)](https://asciinema.org/a/Y8qSzmLxPFXFETAMCbCtcYWdB?autoplay=1)

## Testing

```bash
# Build the binary from the repository root.
make
# Tail an nginx access log and aggregate it using the "combinedlog" mapping.
tail -f /var/log/nginx/access_log | ./gstreamtop runQuery combinedlog "SELECT URLIFY(url) as url, COUNT(*) as count, SUM(size) as sum, size, MAX(response) FROM log GROUP BY url, size ORDER BY count ASC, size DESC LIMIT 20;"
```
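
To make the aggregation semantics concrete, here is a minimal Python sketch of what a simplified query such as `SELECT url, COUNT(*), SUM(size) FROM log GROUP BY url` computes. It is not gstreamtop's implementation: the `aggregate` and `top` helpers and the sample rows are hypothetical, and it processes a finite list rather than a live stream.

```python
from collections import defaultdict


def aggregate(rows):
    """Group rows by url, keeping a running COUNT(*) and SUM(size) per group."""
    groups = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        group = groups[row["url"]]
        group["count"] += 1
        group["sum"] += row["size"]
    return dict(groups)


def top(groups, limit):
    """Roughly ORDER BY count DESC, url ASC LIMIT limit."""
    ordered = sorted(groups.items(), key=lambda item: (-item[1]["count"], item[0]))
    return [(url, agg["count"], agg["sum"]) for url, agg in ordered[:limit]]


# Hypothetical rows, as if they had already been parsed from log lines.
rows = [
    {"url": "/", "size": 100},
    {"url": "/about", "size": 50},
    {"url": "/", "size": 300},
]
for url, count, total in top(aggregate(rows), limit=20):
    print(url, count, total)
```

On a real stream the same per-group counters would be updated as each new line arrives, which is why the query needs a GROUP BY to define the groups.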

## Mappings

The text-to-field mapping is defined in the mappings.yaml file. The regex in 'FIELDS IDENTIFIED BY' is what creates the mapping.

```yaml
- name: combinedlog
  tables:
  - CREATE TABLE log(ip VARCHAR, col2 VARCHAR, col3 VARCHAR, dt VARCHAR, method VARCHAR,
    url VARCHAR, version VARCHAR, response INTEGER, size INTEGER, referer VARCHAR, useragent
    VARCHAR) WITH FIELDS IDENTIFIED BY '^(?P<ip>\\S+)\\s(?P<col2>\\S+)\\s(?P<col3>\\S+)\\s\\[(?P<dt>[\\w:\\/]+\\s[+\\-]\\d{4})\\]\\s"(?P<method>\\S+)\\s?(?P<url>\\S+)?\\s?(?P<version>\\S+)?"\\s(?P<response>\\d{3}|-)\\s(?P<size>\\d+|-)\\s?"?(?P<referer>[^"]*)"?\\s?"?(?P<useragent>[^"]*)?"?$'
    LINES TERMINATED BY '\n';
```
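
To see how such a regex turns a log line into table fields, the following Python sketch applies an equivalent pattern to one combined-log line. It rests on two assumptions: each named capture group fills the column of the same name, and the doubled backslashes in the mapping are an escaping layer (so `\\S` means `\S`). The `parse_line` helper, the treatment of `-` as `None`, and the sample line are hypothetical and not part of gstreamtop.

```python
import re

# Assumed equivalent of the 'FIELDS IDENTIFIED BY' regex, with single backslashes.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+)\s(?P<col2>\S+)\s(?P<col3>\S+)\s'
    r'\[(?P<dt>[\w:\/]+\s[+\-]\d{4})\]\s'
    r'"(?P<method>\S+)\s?(?P<url>\S+)?\s?(?P<version>\S+)?"\s'
    r'(?P<response>\d{3}|-)\s(?P<size>\d+|-)\s?'
    r'"?(?P<referer>[^"]*)"?\s?"?(?P<useragent>[^"]*)?"?$'
)

# Columns declared as INTEGER in the CREATE TABLE statement.
INTEGER_FIELDS = ("response", "size")


def parse_line(line):
    """Return a dict of column name -> value, or None if the line does not match."""
    match = COMBINED_LOG.match(line.rstrip("\n"))
    if match is None:
        return None
    row = match.groupdict()
    for name in INTEGER_FIELDS:
        # "-" marks an absent value in the log; this sketch maps it to None.
        row[name] = int(row[name]) if row[name] != "-" else None
    return row


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08"'
)
row = parse_line(sample)
print(row["ip"], row["method"], row["url"], row["response"], row["size"])
```

Lines that do not match the pattern yield `None` here; how gstreamtop itself handles non-matching lines is not covered by this sketch.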

## Prometheus Exporter

You can export queries as Prometheus metrics and visualize them in Grafana. (This was not the original idea behind the tool, but it turned out to be a useful addition.)