https://github.com/dispensable/tailsql
Continually run sql query on tumbling/sliding window of log files records
https://github.com/dispensable/tailsql
duckdb log sql sqlite stream sysadmin tail tools
Last synced: about 2 months ago
JSON representation
Continually run sql query on tumbling/sliding window of log files records
- Host: GitHub
- URL: https://github.com/dispensable/tailsql
- Owner: dispensable
- License: other
- Created: 2024-04-05T09:01:37.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-12T15:22:30.000Z (about 2 years ago)
- Last Synced: 2025-06-17T17:50:45.913Z (about 1 year ago)
- Topics: duckdb, log, sql, sqlite, stream, sysadmin, tail, tools
- Language: Go
- Homepage:
- Size: 58.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# tailsql
tailsql can continually run sql query on tumbling/sliding window of log files records.
# why
- when reading logs using `tail -f`, sometimes it flushs too fast to read
- sometime you just wanna do some simple aggregation/join analytics, and awk can't express this with one line
- you wanna aggregate data like stream job, but traditional unix tools works like batch job
- you wanna compare two or more log files on sepecific fields but other records rushed into the small screen
# usage
It's a go-stream pipeline job like:
```
log -> parse -> filter -> throttler -> sql row \
log -> parse -> filter -> throttler -> sql row -> window -> insert as table -> run sql(duckdb/sqlite) -> sink
log -> parse -> filter -> throttler -> sql row /
```
for example log format like this: `2024/04/12 22:47:42.506277 GETM SUCC localhost:7710 605` (ts method result server time_used)
you can run query to analytic your log
```bash
tailsql query \
-f my.log \
# use capture group to parse log to table row
# __ is seperator before is field name after is data type
# now support int/float/date/str
-r '.+ (?PGETM) (?PSUCC) .+7710 (?P[0-9]+) .+' \
# only contain time > 10000 records
# filter syntax is like SQL where
-F 'time > 10000' \
-s stdout \
# sliding windows, size 10s sliding interval 5s, use input time not log time
# you can use your logs time, just change -1 to the index of your ts capture group
-w '10:5:-1' \
# parsed rows filtered then insert to db engine for anylytic
# support duckdb/sqlite/qlbridge membtree
-d duckdb \
# format as table
-o table \
# query
'select count(1) from t0 where time > 12275'
```
will get result like:
```
>>> Press CTRL + C to quit ...
INFO[2024-04-12T22:41:56+08:00] Query stream started, please wait 10s...
INFO[2024-04-12T22:42:06+08:00] >> query result <<
Run sql `select count(1) from t0 where time > 12275`:
+----------+
| COUNT(1) |
+----------+
| 71 |
+----------+
INFO[2024-04-12T22:42:11+08:00] >> query result <<
Run sql `select count(1) from t0 where time > 12275`:
+----------+
| COUNT(1) |
+----------+
| 86 |
+----------+
INFO[2024-04-12T22:42:16+08:00] >> query result <<
Run sql `select count(1) from t0 where time > 12275`:
+----------+
| COUNT(1) |
+----------+
| 81 |
+----------+
INFO[2024-04-12T22:42:21+08:00] >> query result <<
Run sql `select count(1) from t0 where time > 12275`:
+----------+
| COUNT(1) |
+----------+
| 63 |
+----------+
^C>>> User ask to quit ...
```