Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/svenslaggare/sqlgrep
sqlgrep = SQL + grep + tail -f
https://github.com/svenslaggare/sqlgrep
grep log-analysis log-parser logging rust sql
Last synced: 19 days ago
JSON representation
sqlgrep = SQL + grep + tail -f
- Host: GitHub
- URL: https://github.com/svenslaggare/sqlgrep
- Owner: svenslaggare
- License: mit
- Created: 2020-10-31T21:07:01.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-12-05T18:46:38.000Z (about 1 year ago)
- Last Synced: 2023-12-06T19:20:27.893Z (about 1 year ago)
- Topics: grep, log-analysis, log-parser, logging, rust, sql
- Language: Rust
- Homepage:
- Size: 665 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
![sqlgrep](assets/logo_small.png)
# sqlgrep
Combines SQL with regular expressions to provide a new way to filter and process text files.## Build
* Requires cargo (https://rustup.rs/).
* Build with: `cargo build --release`
* Build output in `target/release/sqlgrep`## Example
First, a schema needs to be defined that will transform text lines into structured data:
```
CREATE TABLE connections(
line = 'connection from ([0-9.]+) \\((.+)?\\) at ([a-zA-Z]+) ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+) ([0-9]+)',line[1] => ip TEXT,
line[2] => hostname TEXT,
line[9] => year INT,
line[4] => month TEXT,
line[5] => day INT,
line[6] => hour INT,
line[7] => minute INT,
line[8] => second INT
);
```If we want to know the IP and hostname for all connections which have a hostname in the file `testdata/ftpd_data.txt` with the table definition above in `testdata/ftpd.txt` we can do:
```
sqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -c "SELECT ip, hostname FROM connections WHERE hostname IS NOT NULL"
```We can also do it "live" by tailing following the file (note the `-f` argument):
```
sqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -f -c "SELECT ip, hostname FROM connections WHERE hostname IS NOT NULL"
```If we want to know how many connection attempts we get per hostname (i.e. a group by query):
```
sqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -c "SELECT hostname, COUNT() AS count FROM connections GROUP BY hostname"
```See `testdata` folder and `src/integration_tests.rs` for more examples.
# Documentation
Tries to follow the SQL standard, so you should expect that normal SQL queries work. However, not every feature is supported yet.## Queries
Supported features:
* Where.
* Group by.
* Aggregates.
* Having.
* Inner & outer joins. The joined table is loaded completely in memory.
* Limits.
* Extract(x FROM y) for timestamps.
* Case expressions.Supported aggregates:
* `count(x)`
* `min(x)`
* `max(x)`
* `sum(x)`
* `avg(x)`
* `stddev(x)`
* `variance(x)`
* `percentile(x, p)`: calculates the `p` percentile of x where `p` in interval `[0.0, 1.0]`
* `bool_and(x)`
* `bool_or(x)`
* `array_agg(x)`
* `string_agg(x, delimiter)`Supported functions:
* `least(INT|REAL|INTERVAL, INT|REAL|INTERVAL) => INT|REAL|INTERVAL`
* `greatest(INT|REAL|INTERVAL, INT|REAL|INTERVAL) => INT|REAL|INTERVAL`
* `abs(INT|REAL|INTERVAL) => INT|REAL|INTERVAL`
* `sqrt(REAL) => REAL`
* `pow(REAL, REAL) => REAL`
* `regex_matches(TEXT, TEXT) => BOOLEAN`
* `length(TEXT) => INT`
* `upper(TEXT) => TEXT`
* `lower(TEXT) => TEXT`
* `array_unique(ARRAY) => ARRAY`
* `array_length(ARRAY) => INT`
* `array_cat(ARRAY, ARRAY) => ARRAY`
* `array_append(ARRAY, ANY) => ARRAY`
* `array_prepend(ANY, ARRAY) => ARRAY`
* `now() => TIMESTAMP`
* `make_timestamp(INT, INT, INT, INT, INT, INT, INT) => TIMESTAMP`
* `date_trunc(TEXT, TIMESTAMP) => TIMESTAMP`## Special features
The input file can either be specified using the CLI or as an additional argument to the `FROM` statement as following:
```
SELECT * FROM connections::'file.log';
```## Tables
### Syntax
```
CREATE TABLE (
Separate pattern and column definition. Pattern can be used in multiple column definitions.
= '',
[] => ,
Use regex splits instead of matches.
= split '',Inline regex. Will be bound to the first group
'' =>
Array pattern. Will create array of fixed sized based on the given patterns.
[], [], ... => [],
Timestamp pattern. Will create a timestamp. Year, month, day, hour, minute, second. Each part is optional.
[], [], ... => TIMESTAMP,
Json pattern. Will access the given attribute.
{ .field1.field2 } => ,
{ .field1[] } => ,
);
```
Multiple tables can be defined in the same file.### Supported types
* `TEXT`: String type.
* `INT`: 64-bits integer type.
* `REAL`: 64-bits float type.
* `BOOLEAN`: Boolean type. When extracting data, it means the _existence_ of a group.
* `[]`: Array types such as `real[]`.
* `TIMESTAMP`: Timestamp type.
* `INTERVAL`: Interval type.### Modifiers
Placed after the column type and adds additional constraints/transforms when extracting vale for a column.
* `NOT NULL`: The column cannot be `NULL`. If a not null column gets a null value, the row is skipped.
* `TRIM`: Trim string types for whitespaces.
* `CONVERT`: Tries to convert a string value into the value type.
* `DEFAULT `: Use this as default value instead of NULL.
* `MICROSECONDS`: The decimal second part is in microseconds, not milliseconds.