{"id":22302338,"url":"https://github.com/svenslaggare/sqlgrep","last_synced_at":"2025-07-29T03:33:11.003Z","repository":{"id":55328768,"uuid":"308972024","full_name":"svenslaggare/sqlgrep","owner":"svenslaggare","description":"sqlgrep = SQL + grep + tail -f","archived":false,"fork":false,"pushed_at":"2023-12-05T18:46:38.000Z","size":681,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-12-06T19:20:27.893Z","etag":null,"topics":["grep","log-analysis","log-parser","logging","rust","sql"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/svenslaggare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-10-31T21:07:01.000Z","updated_at":"2023-12-06T19:20:27.894Z","dependencies_parsed_at":"2023-11-30T19:43:44.199Z","dependency_job_id":null,"html_url":"https://github.com/svenslaggare/sqlgrep","commit_stats":null,"previous_names":[],"tags_count":16,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/svenslaggare%2Fsqlgrep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/svenslaggare%2Fsqlgrep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/svenslaggare%2Fsqlgrep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/svenslaggare%2Fsqlgrep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/svenslaggare","download_url":"https://codeload.github.com/svenslaggare/sqlgrep/tar.gz/refs/heads/master","host":{"na
me":"GitHub","url":"https://github.com","kind":"github","repositories_count":227977351,"owners_count":17850379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["grep","log-analysis","log-parser","logging","rust","sql"],"created_at":"2024-12-03T18:36:31.196Z","updated_at":"2024-12-03T18:36:31.877Z","avatar_url":"https://github.com/svenslaggare.png","language":"Rust","readme":"![sqlgrep](assets/logo_small.png)\n# sqlgrep\nCombines SQL with regular expressions to provide a new way to filter and process text files.\n\n## Build\n* Requires cargo (https://rustup.rs/).\n* Build with: `cargo build --release`\n* Build output in `target/release/sqlgrep`\n\n## Example\nFirst, a schema needs to be defined that will transform text lines into structured data:\n```\nCREATE TABLE connections(\n    line = 'connection from ([0-9.]+) \\\\((.+)?\\\\) at ([a-zA-Z]+) ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+) ([0-9]+)',\n\n    line[1] =\u003e ip TEXT,\n    line[2] =\u003e hostname TEXT,\n    line[9] =\u003e year INT,\n    line[4] =\u003e month TEXT,\n    line[5] =\u003e day INT,\n    line[6] =\u003e hour INT,\n    line[7] =\u003e minute INT,\n    line[8] =\u003e second INT\n);\n```\n\nIf we want to know the IP and hostname for all connections which have a hostname in the file `testdata/ftpd_data.txt`, with the table definition above in `testdata/ftpd.txt`, we can run:\n\n```\nsqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -c \"SELECT ip, hostname FROM connections WHERE hostname IS NOT NULL\"\n```\n\nWe can also do it \"live\" by following (tailing) the file (note the `-f` 
argument):\n\n```\nsqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -f -c \"SELECT ip, hostname FROM connections WHERE hostname IS NOT NULL\"\n```\n\nIf we want to know how many connection attempts we get per hostname (i.e. a group by query):\n\n```\nsqlgrep -d testdata/ftpd.txt testdata/ftpd_data.txt -c \"SELECT hostname, COUNT() AS count FROM connections GROUP BY hostname\"\n```\n\nSee `testdata` folder and `src/integration_tests.rs` for more examples.\n\n# Documentation\nTries to follow the SQL standard, so you should expect that normal SQL queries work. However, not every feature is supported yet.\n\n## Queries\nSupported features:\n* Where.\n* Group by.\n* Aggregates.\n* Having.\n* Inner \u0026 outer joins. The joined table is loaded completely in memory.\n* Limits.\n* Extract(x FROM y) for timestamps.\n* Case expressions.\n\nSupported aggregates:\n* `count(x)`\n* `min(x)`\n* `max(x)`\n* `sum(x)`\n* `avg(x)`\n* `stddev(x)`\n* `variance(x)`\n* `percentile(x, p)`: calculates the `p` percentile of x where `p` in interval `[0.0, 1.0]`\n* `bool_and(x)`\n* `bool_or(x)`\n* `array_agg(x)`\n* `string_agg(x, delimiter)`\n\nSupported functions:\n* `least(INT|REAL|INTERVAL, INT|REAL|INTERVAL) =\u003e INT|REAL|INTERVAL`\n* `greatest(INT|REAL|INTERVAL, INT|REAL|INTERVAL) =\u003e INT|REAL|INTERVAL`\n* `abs(INT|REAL|INTERVAL) =\u003e INT|REAL|INTERVAL`\n* `sqrt(REAL) =\u003e REAL`\n* `pow(REAL, REAL) =\u003e REAL`\n* `regex_matches(TEXT, TEXT) =\u003e BOOLEAN`\n* `length(TEXT) =\u003e INT`\n* `upper(TEXT) =\u003e TEXT`\n* `lower(TEXT) =\u003e TEXT`\n* `array_unique(ARRAY) =\u003e ARRAY`\n* `array_length(ARRAY) =\u003e INT`\n* `array_cat(ARRAY, ARRAY) =\u003e ARRAY`\n* `array_append(ARRAY, ANY) =\u003e ARRAY`\n* `array_prepend(ANY, ARRAY) =\u003e ARRAY`\n* `now() =\u003e TIMESTAMP`\n* `make_timestamp(INT, INT, INT, INT, INT, INT, INT) =\u003e TIMESTAMP`\n* `date_trunc(TEXT, TIMESTAMP) =\u003e TIMESTAMP`\n\n## Special features\nThe input file can either be specified using 
the CLI or as an additional argument to the `FROM` statement as follows:\n```\nSELECT * FROM connections::'file.log';\n```\n\n## Tables\n### Syntax\n```\nCREATE TABLE \u003cname\u003e(\n    Separate pattern and column definition. Pattern can be used in multiple column definitions.\n    \u003cpattern name\u003e = '\u003cregex pattern\u003e',\n    \u003cpattern name\u003e[\u003cgroup index\u003e] =\u003e \u003ccolumn name\u003e \u003ccolumn type\u003e,\n    \n    Use regex splits instead of matches.\n    \u003cpattern name\u003e = split '\u003cregex pattern\u003e',\n\n    Inline regex. Will be bound to the first group.\n    '\u003cregex pattern\u003e' =\u003e \u003ccolumn name\u003e \u003ccolumn type\u003e\n    \n    Array pattern. Will create an array of fixed size based on the given patterns.\n    \u003cpattern name\u003e[\u003cgroup index\u003e], \u003cpattern name\u003e[\u003cgroup index\u003e], ... =\u003e \u003ccolumn name\u003e \u003celement type\u003e[],\n    \n    Timestamp pattern. Will create a timestamp. Year, month, day, hour, minute, second. Each part is optional.\n    \u003cpattern name\u003e[\u003cgroup index\u003e], \u003cpattern name\u003e[\u003cgroup index\u003e], ... =\u003e \u003ccolumn name\u003e TIMESTAMP,\n    \n    JSON pattern. Will access the given attribute.\n    { .field1.field2 } =\u003e \u003ccolumn name\u003e \u003ccolumn type\u003e,\n    { .field1[\u003carray index\u003e] } =\u003e \u003ccolumn name\u003e \u003ccolumn type\u003e,\n);\n```\nMultiple tables can be defined in the same file.\n\n### Supported types\n* `TEXT`: String type.\n* `INT`: 64-bit integer type.\n* `REAL`: 64-bit float type.\n* `BOOLEAN`: Boolean type. 
When extracting data, it indicates the _existence_ of a group.\n* `\u003celement type\u003e[]`: Array types such as `real[]`.\n* `TIMESTAMP`: Timestamp type.\n* `INTERVAL`: Interval type.\n\n### Modifiers\nModifiers are placed after the column type and add additional constraints/transforms when extracting the value for a column.\n* `NOT NULL`: The column cannot be `NULL`. If a not null column gets a null value, the row is skipped.\n* `TRIM`: Trims whitespace from string values.\n* `CONVERT`: Tries to convert a string value into the value type.\n* `DEFAULT \u003cvalue\u003e`: Uses this as the default value instead of `NULL`.\n* `MICROSECONDS`: The decimal second part is in microseconds, not milliseconds.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsvenslaggare%2Fsqlgrep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsvenslaggare%2Fsqlgrep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsvenslaggare%2Fsqlgrep/lists"}