Real time analytical SQL Query on text streams
https://github.com/gfleury/gstreamtop
- Host: GitHub
- URL: https://github.com/gfleury/gstreamtop
- Owner: gfleury
- License: mit
- Created: 2018-10-28T09:52:03.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-26T12:58:04.000Z (over 1 year ago)
- Last Synced: 2024-06-21T17:50:48.326Z (10 months ago)
- Topics: aggregation, analytics, interactive, real-time, regex, sql, stream-processing, text
- Language: Go
- Size: 896 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[Build Status](https://travis-ci.org/gfleury/gstreamtop) [codecov](https://codecov.io/gh/gfleury/gstreamtop)
# gstreamtop
Gstreamtop is a text stream SQL query tool (fancy wording). It maps plain text lines, JSON, or CSV entries to SQL table fields and lets you run SQL queries over them. The main purpose is aggregation (queries must have a GROUP BY).

Put simply, you can tail a log file and run a SQL query that aggregates something from it. Unlike the ELK stack or Kafka + KSQL (of course there is no real comparison between the tools), the idea is to have something local that you can run quickly (as a test or probe) and without external dependencies.

## Example
[Demo (asciinema)](https://asciinema.org/a/Y8qSzmLxPFXFETAMCbCtcYWdB?autoplay=1)
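For a rough sense of what a query looks like outside the recording, here is a minimal sketch that counts requests per HTTP response code, using the same `runQuery` subcommand and `combinedlog` mapping described in the sections below:

```bash
# Count live nginx requests per response code; the fields come from the combinedlog mapping below.
tail -f /var/log/nginx/access_log \
  | ./gstreamtop runQuery combinedlog \
      "SELECT response, COUNT(*) as count FROM log GROUP BY response ORDER BY count DESC LIMIT 10;"
```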
## Testing
```bash
gstreamtop$ make
gstreamtop$ tail -f /var/log/nginx/access_log | ./gstreamtop runQuery combinedlog "SELECT URLIFY(url) as url, COUNT(*) as count, SUM(size) as sum, size, MAX(response) FROM log GROUP BY url, size ORDER BY count ASC, size DESC LIMIT 20;"
```

## Mappings
The text-to-field mapping is defined in the mappings.yaml file. The regex in 'FIELDS IDENTIFIED BY' is what creates the mapping: each named capture group becomes a table field.
```yaml
- name: combinedlog
  tables:
    - CREATE TABLE log(ip VARCHAR, col2 VARCHAR, col3 VARCHAR, dt VARCHAR, method VARCHAR,
      url VARCHAR, version VARCHAR, response INTEGER, size INTEGER, referer VARCHAR, useragent
      VARCHAR) WITH FIELDS IDENTIFIED BY '^(?P<ip>\\S+)\\s(?P<col2>\\S+)\\s(?P<col3>\\S+)\\s\\[(?P<dt>[^\\]]+)\\]\\s"(?P<method>\\S+)\\s(?P<url>\\S+)\\s(?P<version>[^"]+)"\\s(?P<response>\\d+)\\s(?P<size>\\d+)\\s"(?P<referer>[^"]*)"\\s"(?P<useragent>[^"]*)"$'
      LINES TERMINATED BY '\n';
```
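To check that a mapping's regex captures what you expect, you can pipe a single hand-written log line through the same `runQuery` invocation used above. A minimal sketch, assuming the aggregation is printed when the input stream ends (the sample log line is made up, and that behavior is not detailed in this README):

```bash
# Probe the combinedlog mapping with one fabricated combined-log line.
echo '127.0.0.1 - - [28/Oct/2018:09:52:03 +0000] "GET /index.html HTTP/1.1" 200 1234 "-" "curl/7.58"' \
  | ./gstreamtop runQuery combinedlog \
      "SELECT url, COUNT(*) as count, SUM(size) as sum FROM log GROUP BY url ORDER BY count DESC LIMIT 5;"
```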
## Prometheus Exporter
You can export queries as Prometheus metrics and visualize them in Grafana. (This was not the original idea of the tool, but it turned out to be good to have in the end.)
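This README does not show how the exporter is enabled, so the snippet below is only a hypothetical sketch of checking the scrape endpoint once it is running; the listen address, path, and metric name prefix are assumptions, not documented gstreamtop flags or defaults.

```bash
# Hypothetical check only: the address and metric name prefix are assumptions.
# See the tool's help output for the actual way to enable the Prometheus exporter.
curl -s http://localhost:8080/metrics | grep -i gstreamtop
```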