https://github.com/svenslaggare/sqlgrep

sqlgrep = SQL + grep + tail -f
grep log-analysis log-parser logging rust sql

Host: GitHub
URL: https://github.com/svenslaggare/sqlgrep
Owner: svenslaggare
License: mit
Created: 2020-10-31T21:07:01.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2023-12-05T18:46:38.000Z (about 2 years ago)
Last Synced: 2023-12-06T19:20:27.893Z (about 2 years ago)
Topics: grep, log-analysis, log-parser, logging, rust, sql
Language: Rust
Homepage:
Size: 665 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

![sqlgrep](assets/logo_small.png)

# sqlgrep

Combines SQL with regular expressions to provide a new way to filter and process text files.

## Build

* Requires cargo (https://rustup.rs/).

* Build with: `cargo build --release`

* Build output in `target/release/sqlgrep`

## Example

First, a schema needs to be defined that will transform text lines into structured data:

```
CREATE TABLE connections(
    line = 'connection from ([0-9.]+) \\((.+)?\\) at ([a-zA-Z]+) ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+) ([0-9]+)',
    line[1] => ip TEXT,
    line[2] => hostname TEXT,
    line[9] => year INT,
    line[4] => month TEXT,
    line[5] => day INT,
    line[6] => hour INT,
    line[7] => minute INT,
    line[8] => second INT
);
```
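
To make the mapping concrete, here is a minimal Python sketch of what the schema above does conceptually: one regex is applied to each line, and numbered capture groups become typed columns. It is not sqlgrep's implementation; the sample log line, the `parse_connection` helper, and the use of search (rather than full-line) matching are assumptions for illustration. The schema writes the escaped parentheses as `\\(`, while the Python raw string below needs only `\(`.

```python
import re

# Same regex as the `line` pattern in the schema, written as a Python raw string.
PATTERN = re.compile(
    r"connection from ([0-9.]+) \((.+)?\) at ([a-zA-Z]+) ([a-zA-Z]+) "
    r"([0-9]+) ([0-9]+):([0-9]+):([0-9]+) ([0-9]+)"
)

def parse_connection(line):
    """Return a dict of columns for a matching line, or None if it does not match."""
    m = PATTERN.search(line)  # assumption: the pattern may match anywhere in the line
    if m is None:
        return None
    return {
        "ip": m.group(1),
        "hostname": m.group(2),  # None when the optional group did not match
        "year": int(m.group(9)),
        "month": m.group(4),
        "day": int(m.group(5)),
        "hour": int(m.group(6)),
        "minute": int(m.group(7)),
        "second": int(m.group(8)),
    }

# Hypothetical input line, made up for this example.
sample = "ftpd[1234]: connection from 192.0.2.10 (host.example.org) at Sat Jun 18 02:08:10 2005"
row = parse_connection(sample)
print(row)
```

A line whose parentheses are empty yields `hostname` as `None`, which is what the `hostname IS NOT NULL` filter in the queries below relies on.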

To get the IP and hostname of every connection that has a hostname in the file `testdata/ftpd_data.txt`, with the table definition above saved in `testdata/ftpd.txt`, we can run:

```
sqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -c "SELECT ip, hostname FROM connections WHERE hostname IS NOT NULL"
```

We can also do it "live" by following the file as it grows, like `tail -f` (note the `-f` argument):

```
sqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -f -c "SELECT ip, hostname FROM connections WHERE hostname IS NOT NULL"
```

If we want to know how many connection attempts we get per hostname (i.e. a group by query):

```
sqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -c "SELECT hostname, COUNT() AS count FROM connections GROUP BY hostname"
```
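
Conceptually, the group-by query above buckets the extracted rows by `hostname` and counts each bucket. A rough Python equivalent, using hypothetical rows rather than real output from `testdata/ftpd_data.txt`, could look like this. Putting rows without a hostname into their own group follows standard SQL and is an assumption about sqlgrep.

```python
from collections import Counter

# Hypothetical rows, as they might look after the schema has been applied.
rows = [
    {"ip": "192.0.2.10", "hostname": "host-a.example.org"},
    {"ip": "192.0.2.11", "hostname": "host-b.example.org"},
    {"ip": "192.0.2.10", "hostname": "host-a.example.org"},
    {"ip": "192.0.2.12", "hostname": None},  # connection without a hostname
]

# GROUP BY hostname with a count: one counter entry per distinct hostname.
counts = Counter(row["hostname"] for row in rows)

for hostname, count in counts.items():
    print(hostname, count)
```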

See `testdata` folder and `src/integration_tests.rs` for more examples.

# Documentation

sqlgrep tries to follow the SQL standard, so ordinary SQL queries should generally work. However, not every feature is supported yet.

## Queries

Supported features:

* Where.

* Group by.

* Aggregates.

* Having.

* Inner & outer joins. The joined table is loaded completely in memory.

* Limits.

* Extract(x FROM y) for timestamps.

* Case expressions.

Supported aggregates:

* `count(x)`

* `min(x)`

* `max(x)`

* `sum(x)`

* `avg(x)`

* `stddev(x)`

* `variance(x)`

* `percentile(x, p)`: calculates the `p`-th percentile of `x`, where `p` is in the interval `[0.0, 1.0]`

* `bool_and(x)`

* `bool_or(x)`

* `array_agg(x)`

* `string_agg(x, delimiter)`
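
As an illustration of the `p` argument of `percentile`, the sketch below computes a percentile with the nearest-rank method for `p` in `[0.0, 1.0]`. This is one common definition and an assumption here; sqlgrep may interpolate or round differently, so treat the function as explanatory only.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of `values` for p in [0.0, 1.0]."""
    if not values:
        return None
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0.0, 1.0]")
    ordered = sorted(values)
    # Rank is 1-based in the nearest-rank definition; clamp so p = 0 maps to the minimum.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

latencies = [12, 7, 3, 25, 18, 9, 30, 15, 21, 5]  # hypothetical sample data
print(percentile(latencies, 0.5))
```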

Supported functions:

* `least(INT|REAL|INTERVAL, INT|REAL|INTERVAL) => INT|REAL|INTERVAL`

* `greatest(INT|REAL|INTERVAL, INT|REAL|INTERVAL) => INT|REAL|INTERVAL`

* `abs(INT|REAL|INTERVAL) => INT|REAL|INTERVAL`

* `sqrt(REAL) => REAL`

* `pow(REAL, REAL) => REAL`

* `regex_matches(TEXT, TEXT) => BOOLEAN`

* `length(TEXT) => INT`

* `upper(TEXT) => TEXT`

* `lower(TEXT) => TEXT`

* `array_unique(ARRAY) => ARRAY`

* `array_length(ARRAY) => INT`

* `array_cat(ARRAY, ARRAY) => ARRAY`

* `array_append(ARRAY, ANY) => ARRAY`

* `array_prepend(ANY, ARRAY) => ARRAY`

* `now() => TIMESTAMP`

* `make_timestamp(INT, INT, INT, INT, INT, INT, INT) => TIMESTAMP`

* `date_trunc(TEXT, TIMESTAMP) => TIMESTAMP`

## Special features

The input file can be specified either on the command line or as an additional argument to the `FROM` clause, as follows:

```
SELECT * FROM connections::'file.log';
```

## Tables

### Syntax

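```
CREATE TABLE <table name>(
    Separate pattern and column definition. Pattern can be used in multiple column definitions.
    <pattern name> = '<regex pattern>',
    <pattern name>[<group index>] => <column name> <column type>,

    Use regex splits instead of matches.
    <pattern name> = split '<regex pattern>',

    Inline regex. Will be bound to the first group.
    '<regex pattern>' => <column name> <column type>,

    Array pattern. Will create an array of fixed size based on the given patterns.
    <pattern name>[<group index>], <pattern name>[<group index>], ... => <column name> <element type>[],

    Timestamp pattern. Will create a timestamp. Year, month, day, hour, minute, second. Each part is optional.
    <pattern name>[<group index>], <pattern name>[<group index>], ... => <column name> TIMESTAMP,

    JSON pattern. Will access the given attribute.
    { .field1.field2 } => <column name> <column type>,
    { .field1[<array index>] } => <column name> <column type>,
);
```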

Multiple tables can be defined in the same file.
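
The split and JSON patterns differ from plain regex matching in how a line is broken into parts. A minimal Python sketch of the two ideas follows; the helper names, the sample lines, and the treatment of missing attributes as NULL are assumptions for illustration, not documented sqlgrep behavior.

```python
import json
import re

def split_parts(line, separator_regex):
    """Split pattern: the regex describes the separators, not the values."""
    return re.split(separator_regex, line)

def json_attribute(line, path):
    """JSON pattern: walk keys or array indices, e.g. ['field1', 'field2']."""
    value = json.loads(line)
    for key in path:
        try:
            value = value[key]
        except (KeyError, IndexError, TypeError):
            return None  # missing attribute; treated as NULL in this sketch
    return value

print(split_parts("alice;42;  3.5", r";\s*"))
print(json_attribute('{"field1": {"field2": 7}}', ["field1", "field2"]))
print(json_attribute('{"field1": [10, 20, 30]}', ["field1", 1]))
```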

### Supported types

* `TEXT`: String type.

* `INT`: 64-bit integer type.

* `REAL`: 64-bit float type.

* `BOOLEAN`: Boolean type. When extracting data, it means the _existence_ of a group.

* `<element type>[]`: Array types such as `real[]`.

* `TIMESTAMP`: Timestamp type.

* `INTERVAL`: Interval type.
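
The `BOOLEAN` extraction rule can be pictured as checking whether an optional capture group took part in the match. A small Python sketch, with a hypothetical pattern and hypothetical lines:

```python
import re

# Hypothetical pattern: the " (cached)" marker is an optional second group.
PATTERN = re.compile(r"GET (\S+)( \(cached\))?")

def extract(line):
    """Return the columns for a matching line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "path": m.group(1),
        # BOOLEAN-style column: True if group 2 exists in the match, else False.
        "cached": m.group(2) is not None,
    }

print(extract("GET /index.html (cached)"))
print(extract("GET /index.html"))
```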

### Modifiers

Modifiers are placed after the column type and add constraints or transforms that apply when the value for a column is extracted.

* `NOT NULL`: The column cannot be `NULL`. If a not null column gets a null value, the row is skipped.

* `TRIM`: Trims whitespace from string values.

* `CONVERT`: Tries to convert a string value into the value type.

* `DEFAULT <value>`: Use this as the default value instead of `NULL`.

* `MICROSECONDS`: The decimal second part is in microseconds, not milliseconds.
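
How the modifiers could combine when a column value is extracted is sketched below in Python. The function name, the parameter names, the order in which the steps run, and the handling of a failed conversion are all assumptions made for illustration; `MICROSECONDS` is left out because it only concerns timestamp parsing.

```python
SKIP_ROW = object()  # sentinel: the whole row should be dropped

def apply_modifiers(raw, convert=None, trim=False, default=None, not_null=False):
    """Apply TRIM, CONVERT, DEFAULT and NOT NULL to one raw string value."""
    value = raw
    if value is not None and trim:
        value = value.strip()  # TRIM
    if value is not None and convert is not None:
        try:
            value = convert(value)  # CONVERT, e.g. int or float
        except ValueError:
            value = None  # a failed conversion becomes NULL in this sketch
    if value is None:
        value = default  # DEFAULT
    if value is None and not_null:
        return SKIP_ROW  # NOT NULL: signal that the row is skipped
    return value

print(apply_modifiers("  42 ", convert=int, trim=True))
print(apply_modifiers(None, default=0))
print(apply_modifiers(None, not_null=True) is SKIP_ROW)
```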
