https://github.com/stastnypremysl/lsql-csv
lsql-csv is a tool for small CSV file data querying from a shell with short queries. It makes it possible to work with small CSV files like with a read-only relational database. The tool implements a new language LSQL, similar to SQL, specifically designed for working with CSV files in a shell.
- Host: GitHub
- URL: https://github.com/stastnypremysl/lsql-csv
- Owner: stastnypremysl
- License: gpl-3.0
- Created: 2020-05-25T17:36:14.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-04-17T07:50:06.000Z (7 months ago)
- Last Synced: 2024-04-18T02:02:31.332Z (7 months ago)
- Topics: csv, data-analysis, data-processing, haskell, language, linux-shell, lsql, lsql-csv, new-language, query-language, relational-database, sql, unix-command, unix-philosophy, unix-shell
- Language: Haskell
- Homepage:
- Size: 2.71 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# `lsql-csv`
`lsql-csv` is a tool for CSV file data querying from a shell with short queries. It makes it possible to work with small CSV files like with a read-only relational database. The tool implements a new language LSQL, similar to SQL, specifically designed for working with CSV files in a shell.

LSQL aims to be a more lapidary language than SQL. It is designed to let its user quickly write simple queries directly in the terminal; its design purpose therefore differs from SQL, where the readability of queries is taken into account more than in LSQL.

## Installation
You need GHC (`>=8 <9.29`) and the Haskell packages Parsec (`>=3.1 <3.2`), Glob (`>=0.10 <0.11`), base (`>=4.9 <4.20`), text (`>=1.2 <2.2`), and containers (`>=0.5 <0.8`) installed. (The package boundaries given are identical to the boundaries in the Cabal package file.) For a build and installation, run:

make
sudo make install
Now `lsql-csv` is installed in `/usr/local/bin`. If you want, you can specify `INSTALL_DIR`:

sudo make INSTALL_DIR=/custom/install-folder install
This will install the package into `INSTALL_DIR`.
If you have installed `cabal`, you can alternatively run:
cabal install
It will also install the Haskell package dependencies for you.

The package is also published at https://hackage.haskell.org/package/lsql-csv in the Hackage public repository. You can therefore also install it directly, without cloning the repository, with:
cabal install lsql-csv
### Running the unit tests
If you want to verify that the package has been compiled correctly, you can test it by running:

make test
This will run all tests for you.
## `lsql-csv` - quick introduction
### Examples
One way to learn a new programming language is by understanding concrete examples of its usage. The following examples are written to teach the reader how to use `lsql-csv` by showing many examples of its usage.

The following examples might not be enough for readers who don't know enough Unix/Linux scripting. If this is the case, please consider learning Unix/Linux scripting first, before LSQL.
It is also advantageous to know SQL.
The following examples will mainly be about parsing `/etc/passwd` and `/etc/group`. To make reading the examples more comfortable, we have added the `/etc/passwd` and `/etc/group` column descriptions from the man pages to the text.
File `/etc/passwd` has the following columns:
* login name;
* optional encrypted password;
* numerical user ID;
* numerical group ID;
* user name or comment field;
* user home directory;
* optional user command interpreter.

File `/etc/group` has the following columns:
* group name;
* password;
* numerical group ID;
* user list separated by commas.

#### Hello World
lsql-csv '-, &1.2 &1.1'
This will print the second (`&1.2`) and the first column (`&1.1`) of a CSV file on the standard input. If you know SQL, you can read it like `SELECT S.second, S.first FROM stdio S;`.
Commands are split by commas into blocks. The first block is always the from block. It contains file names or `-` (the standard input), separated by whitespace. The second block in the example is the select block, also separated by whitespace.

For example:
lsql-csv '-, &1.2 &1.1' <<- EOF
World,Hello
EOF
It returns:
Hello,World

#### Simple filtering
lsql-csv -d: '-, &1.*, if &1.3 >= 1000'

This prints all columns of the rows of the standard input whose third column is `>= 1000`. It can also be written as:
lsql-csv -d: 'p=/etc/passwd, p.*, if p.3 >= 1000'
lsql-csv -d: 'p=/etc/passwd, &1.*, if &1.3 >= 1000'
lsql-csv -d: '/etc/passwd, &1.*, if &1.3 >= 1000'
The `-d:` optional argument means the primary delimiter is `:`. In a few of the examples, we used overnaming, which allows us to give the data source file `/etc/passwd` the name `p`.

If you know SQL, you can read it as `SELECT * FROM /etc/passwd P WHERE P.UID >= 1000;`. As you can see, the LSQL style is more compressed than standard SQL.
The output might be:

nobody:x:65534:65534:nobody:/var/empty:/bin/false
me:x:1000:1000::/home/me:/bin/bash

If you specify the delimiter specifically for `/etc/passwd`, the output will be comma-delimited.
lsql-csv '/etc/passwd -d:, &1.*, if &1.3 >= 1000'

It might return:
nobody,x,65534,65534,nobody,/var/empty,/bin/false
me,x,1000,1000,,/home/me,/bin/bash

This happens because the default global delimiter, which is used for the output generation, is a comma. The global delimiter is changed by the command-line optional argument, but remains unchanged by an attribute inside the from block.

#### Named columns
Let's suppose we have a file `people.csv`:
name,age
Adam,21
Petra,23
Karel,25

Now, let's get all the names of people in `people.csv` using the `-n` named optional argument:
lsql-csv -n 'people.csv, &1.name'
The output will be:
Adam
Petra
Karel

As you can see, we can reference named columns by name. The named optional argument `-n` enables first-line headers. If first-line headers are enabled by the argument, each column has two names under `&X`: the number name `&X.Y` and the actual name `&X.NAME`.

Now, we can select all columns with a wildcard `&1.*`:
lsql-csv -n 'people.csv, &1.*'
As the output, we get:
Adam,21,21,Adam
Petra,23,23,Petra
Karel,25,25,Karel

The output contains each column twice, because the wildcard `&1.*` was evaluated to `&1.1, &1.2, &1.age, &1.name`. How to fix it?

lsql-csv -n 'people.csv, &1.[1-9]*'
The output is now:
Adam,21
Petra,23
Karel,25

The command can also be written as:
lsql-csv -n 'people.csv, &1.{1,2}'
lsql-csv -n 'people.csv, &1.{1..2}'
lsql-csv 'people.csv -n, &1.{1..2}'

The output will still be the same in all cases.
#### Simple join
Let's say I am interested in the default group names of users. We need to join the tables `/etc/passwd` and `/etc/group`. Let's do it.
lsql-csv -d: '/etc/{passwd,group}, &1.1 &2.1, if &1.4 == &2.3'
What does `/etc/{passwd,group}` mean? Basically, there are three types of expressions: the select, the from, and the arithmetic expression. In all select and from expressions, you can use curly expansion and wildcards just like in `bash`.
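The join itself can be sketched as a plain nested loop over both files. This is a minimal Python sketch of the semantics, not the Haskell implementation, and the passwd/group rows are shortened hypothetical data:

```python
# Sketch of the nested-loop join `&1.1 &2.1, if &1.4 == &2.3`
# over ':'-delimited rows, with the data inlined instead of
# reading /etc/passwd and /etc/group.
passwd = [line.split(":") for line in [
    "root:x:0:0:root:/root:/bin/bash",
    "me:x:1000:1000::/home/me:/bin/bash",
]]
group = [line.split(":") for line in [
    "root:x:0:",
    "me:x:1000:",
]]

# Every row of the first file is compared with every row of the
# second file; this is the O(nm) behavior described under Joins.
pairs = [(p[0], g[0]) for p in passwd for g in group if p[3] == g[2]]
print(pairs)  # [('root', 'root'), ('me', 'me')]
```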
Finally, the output can be something like this:

root:root
bin:bin
daemon:daemon
me:me

The first column is the name of a user and the second column is the name of its default group.
#### Basic grouping
Let's say I want to count the users using the same shell.

lsql-csv -d: 'p=/etc/passwd, p.7 count(p.3), by p.7'

And the output?

/bin/bash:7
/bin/false:7
/bin/sh:1
/bin/sync:1
/sbin/halt:1
/sbin/nologin:46
/sbin/shutdown:1
You can see here the first usage of the by block, which is equivalent to `GROUP BY` in SQL.

#### Basic sorting
Let's say you want to sort the users with `UID` greater than or equal to 1000 by `UID` in ascending order.

lsql-csv -d: '/etc/passwd, &1.*, if &1.3 >= 1000, sort &1.3'
The output might look like:
me1:x:1000:1000::/home/me1:/bin/bash
me2:x:1001:1001::/home/me2:/bin/bash
me3:x:1002:1002::/home/me3:/bin/bash
nobody:x:65534:65534:nobody:/var/empty:/bin/false

The sort block is the equivalent of `ORDER BY` in SQL. If we want the output sorted in descending order, we can pipe it to the `tac` command, which prints lines in reverse order:
lsql-csv -d: '/etc/passwd, &1.*, if &1.3 >= 1000, sort &1.3' | tac
#### About nice outputs
There is a trick for concatenating two values in the select expression: write them without a space between them. But how will the interpreter know where a value ends in a command? If the interpreter sees a char that can't be part of the currently parsed value, it tries to parse it as a new value concatenated to the current one. You can use quotes for this: quotes themselves can't be part of most value types, like column names or numerical constants.

As an example, let's try to format our basic grouping example.

lsql-csv -d: 'p=/etc/passwd, "The number of users of "p.7" is "count(p.3)".", by p.7'
The output might be:
The number of users of /bin/bash is 7.
The number of users of /bin/false is 7.
The number of users of /bin/sh is 1.
The number of users of /bin/sync is 1.
The number of users of /sbin/halt is 1.
The number of users of /sbin/nologin is 46.
The number of users of /sbin/shutdown is 1.
As you can see, string formatting is sometimes very simple with LSQL.

#### Arithmetic expression
So far, we have met all kinds of blocks. Only the if block accepts the arithmetic expression; the others accept the select expression. What if we needed to run an arithmetic expression inside the select expression? There is a special syntax `$(...)` for it.

For example:
lsql-csv -d: '/etc/passwd, $(sin(&1.3)^2 + cos(&1.3)^2)'
It returns something like:
1.0
1.0
1.0
0.9999999999999999
...
1.0

If we run:

lsql-csv -d: '/etc/passwd, $(&1.3 >= 1000), sort $(&1.3 >= 1000)'

We get something like:
false
false
...
false
true
true
...
true

#### More complicated join
Let's see more complicated examples.
lsql-csv -d: 'p=/etc/passwd g=/etc/group, p.1 g.1, if p.1 in g.4'
This will print all pairs of users and their groups, excluding the default groups.

If you know SQL, you can read it as `SELECT P.1, G.1 FROM /etc/passwd P, /etc/group G WHERE G.4 LIKE '%' + P.1 + '%';` with the operator `LIKE` case-sensitive and columns named by their column numbers.

How does `in` work? It is a basic string-level "contains". If a string `A` is a substring of `B`, then `A in B` is `true`. Otherwise, it is `false`.
And the output?
root:root
root:wheel
root:floppy
root:tape
lp:lp
halt:root
halt:wheel

The example will work only under the condition that there isn't any username which is an infix of any other username.
#### More complicated...
The previous example doesn't give a very readable output. We can use `group by` (shortened as `by`) to improve it.
lsql-csv -d: 'p=/etc/passwd g=/etc/group, p.1 cat(g.1","), if p.1 in g.4, by p.1'
The output will be something like:
adm:adm,disk,sys,
bin:bin,daemon,sys,
daemon:adm,bin,daemon,
lp:lp,
mythtv:audio,cdrom,tty,video,
news:news,
It groups all non-default groups of a user onto one line and concatenates them delimited by `,`.

How can we add the default groups too?
lsql-csv -d: 'p=/etc/passwd g=/etc/group, p.1 cat(g.1","), if p.1 in g.4, by p.1' |
lsql-csv -d: '- /etc/passwd /etc/group, &1.1 &1.2""&3.1, if &1.1 == &2.1 && &2.4 == &3.3'
This will output something like:

adm:adm,disk,sys,adm
bin:bin,daemon,sys,bin
daemon:adm,bin,daemon,daemon
lp:lp,lp
mythtv:audio,cdrom,tty,video,mythtv
news:news,news

The first part of the command is the same as in the previous example. The second part inner joins the output of the first part with `/etc/passwd` on the username and with `/etc/group` on the default GID number, and prints the output of the first part with the default group name added.

This example will also work only under the condition that there isn't any username which is an infix of any other username.

## Usage
Now, if you understand the examples, it is time to move forward to a more abstract description of the language and tool usage.

### Options
-h
--help

Shows a short command line help and exits before doing anything else.
-n
--named

Enables the first-line naming convention in CSV files. With this option, the first lines of CSV files will be interpreted as a list of column names. This works only on input files. Output is always without first-line column names.
-dCHAR
--delimiter=CHAR

Changes the default primary delimiter. The default value is `,`.
-sCHAR
--secondary-delimiter=CHAR
Changes the default quote char (secondary delimiter). The default value is `"`.

### Datatypes
There are 4 datatypes considered: `Bool`, `Int`, `Double`, and `String`.
`Bool` is either `true`/`false`, `Int` is at least a 30-bit integer, `Double` is a double-precision floating point number, and `String` is an ordinary char string.

During CSV data parsing, the following logic of datatype selection is used:
* `Bool`, if `true` or `false`;
* `Int`, if the POSIX ERE `[0-9]+` fully matches;
* `Double`, if the POSIX ERE `[0-9]+\.[0-9]+(e[0-9]+)?` fully matches;
* `String`, if none of the above matches.

### Joins
A join means that you put multiple input files into the from block. Joins always have the time complexity O(nm). No optimization based on if conditions is made when you put multiple files into the from block.

### Documentation of the language
lsql-csv [OPTIONS] COMMAND
Description of the grammar:
COMMAND -> FROM_BLOCK, REST
REST -> SELECT_BLOCK, REST
REST -> BY_BLOCK, REST
REST -> SORT_BLOCK, REST
REST -> IF_BLOCK, REST
REST -> LAST_BLOCK
LAST_BLOCK -> SELECT_BLOCK
LAST_BLOCK -> BY_BLOCK
LAST_BLOCK -> SORT_BLOCK
LAST_BLOCK -> IF_BLOCK
FROM_BLOCK -> FROM_EXPR

FROM_EXPR -> FROM_SELECTOR FROM_EXPR
FROM_EXPR -> FROM_SELECTOR

// Wildcard and brace expansion
FROM_SELECTOR ~~> FROM ... FROM

// Standard input
FROM -> ASSIGN_NAME=- OPTIONS
FROM -> - OPTIONS
FROM -> ASSIGN_NAME=FILE_PATH OPTIONS
FROM -> FILE_PATH OPTIONS

OPTIONS -> -dCHAR OPTIONS
OPTIONS -> --delimiter=CHAR OPTIONS
OPTIONS -> -sCHAR OPTIONS
OPTIONS -> --secondary-delimiter=CHAR OPTIONS
OPTIONS -> -n OPTIONS
OPTIONS -> --named OPTIONS
OPTIONS -> -N OPTIONS
OPTIONS -> --not-named OPTIONS
OPTIONS ->
SELECT_BLOCK -> SELECT_EXPR
BY_BLOCK -> by SELECT_EXPR
SORT_BLOCK -> sort SELECT_EXPR
IF_BLOCK -> if ARITHMETIC_EXPR
ARITHMETIC_EXPR -> ATOM
ARITHMETIC_EXPR -> ARITHMETIC_EXPR OPERATOR ARITHMETIC_EXPR
ARITHMETIC_EXPR -> (ARITHMETIC_EXPR)

// Logical negation
ARITHMETIC_EXPR -> ! ARITHMETIC_EXPR
// Number negation
ARITHMETIC_EXPR -> - ARITHMETIC_EXPR
SELECT_EXPR -> ATOM_SELECTOR SELECT_EXPR
SELECT_EXPR -> ATOM_SELECTOR
// Wildcard and brace expansion
ATOM_SELECTOR ~~> ATOM ... ATOM
ATOM -> pi
ATOM -> e
ATOM -> true
ATOM -> false

// e.g. 1.0, "text", 'text', 1
ATOM -> CONSTANT
// e.g. &1.1
ATOM -> SYMBOL_NAME

ATOM -> $(ARITHMETIC_EXPR)
ATOM -> AGGREGATE_FUNCTION(SELECT_EXPR)
ATOM -> ONEARG_FUNCTION(ARITHMETIC_EXPR)

// # is not a char:
// Two atoms can be written without whitespace
// and their values will be String appended
// if the right atom begins with a char,
// which can't be a part of the left atom.
//
// E.g. if the left atom is a number constant,
// and the right atom is a String constant
// beginning with a quote char,
// the left atom value will be converted to the String
// and prepended to the right atom value.
//
// This rule doesn't apply inside ARITHMETIC_EXPR
ATOM ~~> ATOM#ATOM
// Converts all values to the String type and appends them.
AGGREGATE_FUNCTION -> cat

// Returns the number of values.
AGGREGATE_FUNCTION -> count

AGGREGATE_FUNCTION -> min
AGGREGATE_FUNCTION -> max
AGGREGATE_FUNCTION -> sum
AGGREGATE_FUNCTION -> avg
// All trigonometric functions are in radians.
ONEARG_FUNCTION -> sin
ONEARG_FUNCTION -> cos
ONEARG_FUNCTION -> tan
ONEARG_FUNCTION -> asin
ONEARG_FUNCTION -> acos
ONEARG_FUNCTION -> atan
ONEARG_FUNCTION -> sinh
ONEARG_FUNCTION -> cosh
ONEARG_FUNCTION -> tanh
ONEARG_FUNCTION -> asinh
ONEARG_FUNCTION -> acosh
ONEARG_FUNCTION -> atanh
ONEARG_FUNCTION -> exp
ONEARG_FUNCTION -> sqrt
// Converts a value to the String type and returns its length.
ONEARG_FUNCTION -> size

ONEARG_FUNCTION -> to_string
ONEARG_FUNCTION -> negate
ONEARG_FUNCTION -> abs
ONEARG_FUNCTION -> signum
ONEARG_FUNCTION -> truncate
ONEARG_FUNCTION -> ceiling
ONEARG_FUNCTION -> floor
ONEARG_FUNCTION -> even
ONEARG_FUNCTION -> odd
// A in B means A is a substring of B.
OPERATOR -> in
OPERATOR -> *
OPERATOR -> /

// General power
OPERATOR -> **
// Natural power
OPERATOR -> ^
// Integer division truncated towards minus infinity
// (x div y)*y + (x mod y) == x
OPERATOR -> div
OPERATOR -> mod
// Integer division truncated towards 0
// (x quot y)*y + (x rem y) == x
OPERATOR -> quot
OPERATOR -> rem

// Greatest common divisor
OPERATOR -> gcd
// Least common multiple
OPERATOR -> lcm
// String append
OPERATOR -> ++
OPERATOR -> +
OPERATOR -> -
OPERATOR -> <=
OPERATOR -> >=
OPERATOR -> <
OPERATOR -> >
OPERATOR -> !=
OPERATOR -> ==
OPERATOR -> ||
OPERATOR -> &&

Each command is made from blocks separated by a comma. There are these types of blocks:
* From block
* Select block
* If block
* By block
* Sort block

The first block is always the from block. If the block after the first block is without a specifier (`if`, `by`, or `sort`), then it is the select block. Otherwise, it is a block specified by the specifier.
The from block accepts a specific grammar (as specified in the grammar description), the select, the by, and the sort block accept the select expression (`SELECT_EXPR` in the grammar),
and the if block accepts the arithmetic expression (`ARITHMETIC_EXPR` in the grammar).

Every source data file has a reference number based on its position in the from block and may have multiple names: the assign name, given to the source data file by the `ASSIGN_NAME=FILE_PATH` syntax in the from block, and the default name, which is given by the path to the file, or by `-` in the case of the standard input.

Each column of a source data file has a reference number based on its position in it and may have a name (if the named option is enabled for the given source file).
If a source data file with the reference number `M` (numbering input files from 1) has a name `XXX`, its columns can be addressed by `&M.N` or `XXX.N`, where `N` is the reference number of a column (numbering columns from 1).
If the named option is enabled for the input file and a column has the name `NAME`, it can also be addressed by `&M.NAME` or `XXX.NAME`. We call the address a symbol name - `SYMBOL_NAME` in the grammar description.
If there is a collision in naming (some symbol name addresses more than one column), then the behavior is undefined.
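Under these rules, a named input file yields several equivalent addresses for each column. A minimal Python sketch of the resulting symbol-name table (the assign name `p` and the `name`/`age` columns are hypothetical, echoing the earlier `people.csv` example):

```python
# Sketch: symbol names generated for one named input file.
# File 1 has the assign name "p" and two header columns.
columns = ["name", "age"]
names_of_file = ["&1", "p"]  # reference number and assign name

symbols = {}
for file_name in names_of_file:
    for i, col in enumerate(columns, start=1):
        symbols[f"{file_name}.{i}"] = col    # address by column number
        symbols[f"{file_name}.{col}"] = col  # address by column name

# All four addressing forms resolve to the same column:
assert symbols["&1.1"] == symbols["p.1"] == symbols["&1.name"] == symbols["p.name"]
```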
#### Exotic chars
Some chars cannot appear in unquoted symbol names - exotic chars. For simplicity, we can suppose they are all non-alphanumerical chars excluding `-`, `.`, `&`, and `_`. Also, the first char of a symbol name must be non-numerical and must not be `-` or `.`, or it is considered an exotic char.

It is possible to use a symbol name with exotic chars using the \` quote - like \`EXOTIC SYMBOL NAME\`.
#### Quote chars
There are 3 quote chars (\`, " and ') used in LSQL. " and ' always quote a `String`. The \` quote char is used for quoting symbol names.

These chars can be used for `String` appending. If two atoms inside `SELECT_EXPR` are written consecutively without whitespace, and the left atom ends with a quote char or the right begins with a quote char,
they will be converted to the `String` and will be `String` appended.
For example, `&1.1"abc"` means: convert the value of `&1.1` to the `String` and append it to the `String` constant `abc`.

#### Constants
Constants appear in the grammar description as `CONSTANT`. In the following section, we speak only about these constants, not about built-in constant values like `pi` or `true`.

There are 3 datatypes of constants: `String`, `Double`, and `Int`.
Every string quoted in " chars or ' chars in an LSQL command is always tokenized as a `String` constant.
Numbers fully matching the POSIX ERE `[0-9]+` are considered `Int` constants, and numbers fully matching the POSIX ERE `[0-9]+\.[0-9]+` are considered `Double` constants.

#### Operator associativity and precedence
All operators are right-to-left associative.

The following list outlines the precedence of the `lsql-csv` infix operators. The lower the precedence number, the higher the priority.
* 1: `in`, `**`, `^`
* 2: `*`, `/`, `div`, `quot`, `rem`, `mod`, `gcd`, `lcm`
* 3: `++`, `+`, `-`
* 4: `<=`, `>=`, `<`, `>`, `!=`, `==`
* 5: `||`, `&&`

#### Select expression
Select expressions are in the grammar description as `SELECT_EXPR`.
They are similar to `bash` expressions. They are made of atom selector expressions (`ATOM_SELECTOR`) separated by whitespace. These expressions are wildcard and brace expanded to atoms (`ATOM`), which are further processed as if they were separated by whitespace.
(In `bash`, brace expansion is a mechanism by which arbitrary strings are generated. For example, `a{b,c,d}e` is expanded to `abe ace ade`; see [the `bash` reference manual](https://www.gnu.org/software/bash/manual/bash.html) for details.)

Wildcards and brace expansion expressions are only evaluated and expanded in unquoted parts of the atom selector expression which aren't part of an inner arithmetic expression.

For example, if we have an LSQL command with symbol names `&1.1` and `&1.2`, then
* the atom selector expression `&1.{1,2}` will be expanded to `&1.1` and `&1.2`;
* the atom selector expression `&1.*` will be expanded to `&1.1` and `&1.2`;
* the atom selector expression `` `&1.*` `` will be expanded to `` `&1.*` ``;
* the atom selector expression `"&1.*"` will be expanded to `"&1.*"`;
* the atom selector expression `$(&?.1)` will be expanded to `$(&?.1)`;
* the atom selector expression `&*$(&1.1)` will be expanded to `&1.1$(&1.1)` and `&1.2$(&1.1)`.

Every atom selector expression can consist of:
* A wildcard (Each wildcard is expanded against the symbol name list. If no symbol name matching the wildcard is found, the wildcard is expanded to itself.);
* A `bash` brace expansion expression (e.g. `{22..25}` -> `22 23 24 25`);
* An arithmetic expression in `$(expr)` format;
* A call of an aggregate function `AGGREGATE_FUNCTION(SELECT_EXPR)` - there cannot be any space after the function name;
* A call of a one-argument function `ONEARG_FUNCTION(ARITHMETIC_EXPR)` - there cannot be any space after the function name;
* A constant;
* A symbol name;
* A built-in constant value;
* A reference to a column name.

If you want to concatenate strings without the `++` operator, you can write: `a.1","a.2`. Please keep in mind that operators must be put inside arithmetic expressions, or they will be matched as a column name or aggregate function.
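The expansion step can be sketched in Python. This is a simplified sketch: it handles one level of brace groups and `fnmatch`-style wildcards only, with no quoting or `$()` handling, and the symbol-name list is hypothetical:

```python
import re
from fnmatch import fnmatchcase

# Hypothetical symbol names of one named input file.
symbol_names = ["&1.1", "&1.2", "&1.name", "&1.age"]

def expand_braces(selector):
    # Expand one {a,b,...} group, e.g. "&1.{1,2}" -> ["&1.1", "&1.2"].
    m = re.search(r"\{([^{}]*)\}", selector)
    if not m:
        return [selector]
    head, tail = selector[:m.start()], selector[m.end():]
    return [x for alt in m.group(1).split(",")
            for x in expand_braces(head + alt + tail)]

def expand(selector):
    atoms = []
    for s in expand_braces(selector):
        # A wildcard is matched against the symbol name list;
        # if nothing matches, it expands to itself.
        matches = [n for n in symbol_names if fnmatchcase(n, s)]
        atoms += matches or [s]
    return atoms

print(expand("&1.{1,2}"))  # ['&1.1', '&1.2']
print(expand("&1.*"))      # ['&1.1', '&1.2', '&1.name', '&1.age']
print(expand("&2.*"))      # ['&2.*']  (no match -> itself)
```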
#### Arithmetic expression
Arithmetic expressions appear in the grammar description as `ARITHMETIC_EXPR`. The expressions mainly use the classical `awk` style of expressions. You can use the `OPERATOR` keywords here: `>`, `<`, `<=`, `>=`, `==`, `||`, `&&`, `+`, `-`, `*`, `/`...

Wildcards and brace expansion expressions are not evaluated inside the arithmetic expression.
#### Select blocks
Select blocks are referred to in the grammar description as `SELECT_BLOCK`. These blocks determine the output. They accept the select expression.

There must be at least one select block in an LSQL command which refers to at least one symbol name, or the behavior is undefined.
Examples of select blocks:
&1.[3-6]
This will print the 3rd, 4th, 5th, and 6th columns from the first file if the first file has at least 6 columns.
ax*.{6..4}
This will print the 6th, the 5th, and the 4th columns from all files whose names begin with `ax`, if the files have at least 6 columns.
#### From blocks
These blocks are in the grammar description as `FROM_BLOCK`.
There must be exactly one from block, at the beginning of an LSQL command.

The from block contains input file paths (or `-` in the case of the standard input), optionally with their assign name `ASSIGN_NAME`. You can use wildcards and curly bracket expansion, as you would in `bash`, to refer to input files. If there is a wildcard with an assign name `NAME` matching more than one input file, the input files will be given the assign names `NAME`, `NAME1`, `NAME2`... If there is a wildcard that matches no file, it is expanded to itself.

If `FILE_PATH` is put inside \` quotes, no wildcard or expansion logic applies to it.
You can also add custom attributes to input files in the format `FILE_PATH -aX --attribute=X -b`. The attributes will be applied to all files matched against `FILE_PATH`. The custom attributes are referred to as `OPTIONS` in the grammar description.

Examples:
/etc/{passwd,group}
This will select `/etc/passwd` and `/etc/group` files. They can be addressed either as `&1` or `/etc/passwd`, and `&2` or `/etc/group`.
passwd=/etc/passwd
This will select `/etc/passwd` and set its assign name to `passwd`. It can be addressed as `&1`, `passwd`, or `/etc/passwd`.
##### Possible attributes
-n
--named

Enables the first-line naming convention for an input CSV file. With this option, the first line of a CSV file will be interpreted as a list of column names.

-N
--not-named

Sets the exact opposite for an input file. This can be useful if you have changed the default behavior.
-dCHAR
--delimiter=CHAR

This changes the primary delimiter of an input file.

-sCHAR
--secondary-delimiter=CHAR

This changes the secondary delimiter of an input file.

Example:
/etc/passwd -d:
This will select `/etc/passwd` and set its delimiter to `:`.
Currently, a comma, or a `CHAR` that is also a quote char in LSQL, is not supported as a delimiter or a secondary delimiter in `FILE_PATH` custom attributes.
#### If blocks
These blocks are in the grammar description as `IF_BLOCK`.
They always begin with the `if` keyword.
They accept arithmetic expressions whose result should be convertible to `Bool`: either a `String` (`false`/`true`), an `Int` (`0` is `false`, anything else is `true`), or a `Bool`.

Rows with the arithmetic expression converted to `Bool` `true` are printed or aggregated, and rows with the arithmetic expression converted to `Bool` `false` are skipped. Filtering is done before the aggregation.
You can imagine the if block as the `WHERE` clause in SQL.
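The conversion can be sketched in Python. This is a minimal sketch of the rules just described; cases the description does not cover are left out, and the rows are hypothetical:

```python
def to_bool(value):
    # Sketch of the if-block conversion to Bool:
    # Bool stays as is; the strings "true"/"false" map to Bool;
    # Int 0 is false, any other Int is true.
    if isinstance(value, bool):
        return value
    if value == "true":
        return True
    if value == "false":
        return False
    if isinstance(value, int):
        return value != 0
    raise ValueError("conversion not described for this value")

# Filtering happens before any aggregation, like `if &1.2`:
rows = [("root", 0), ("me", 1000)]
kept = [r for r in rows if to_bool(r[1])]
print(kept)  # [('me', 1000)]
```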
#### By blocks
By blocks are referred to in the grammar description as `BY_BLOCK`. These blocks always begin with the `by` keyword. They accept the select expression.

There can be only one by block in a whole LSQL command.
The by block is used to group the resulting set by the given atoms for the evaluation by an aggregate function.
The by block is similar to the `GROUP BY` clause in SQL.

There must be at least one aggregate function in the select block if the by block is present. Otherwise, the behavior is undefined.
If there is an aggregate function present without the by block present in an LSQL command, the aggregate function runs over all rows at once.
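The grouping semantics of the earlier `p.7 count(p.3), by p.7` example can be sketched in Python (the rows are shortened hypothetical data standing in for the shell and UID columns):

```python
from collections import defaultdict

# (shell, uid) pairs, standing in for p.7 and p.3.
rows = [("/bin/bash", 1000), ("/bin/bash", 1001), ("/sbin/nologin", 2)]

# `by p.7` groups the rows; count(p.3) then runs per group.
groups = defaultdict(list)
for shell, uid in rows:
    groups[shell].append(uid)

result = {shell: len(uids) for shell, uids in groups.items()}
print(result)  # {'/bin/bash': 2, '/sbin/nologin': 1}

# Without a by block, the aggregate runs over all rows at once:
print(len(rows))  # 3
```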
#### Sort blocks
These blocks are in the grammar description as `SORT_BLOCK`. They begin with the `sort` keyword. They accept the select expression.

The sort block determines the order of the final output - the given atoms are sorted in ascending order. If there is more than one atom in the sort block (`A`, `B`, `C`...), the data is first sorted by `A`, and in the case of ties, the atoms (`B`, `C`...) are used to further refine the order of the final output.

You can imagine the sort block as the `ORDER BY` clause in SQL.
There can be only one sort block in the whole command.
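The multi-atom ordering can be sketched with a stable sort in Python (a sketch; the tuples are hypothetical rows, with the first and second elements standing for the sort atoms `A` and `B`):

```python
# Sketch of `sort A B`: sort ascending by A, ties broken by B.
rows = [("b", 2), ("a", 2), ("a", 1)]
ordered = sorted(rows, key=lambda r: (r[0], r[1]))
print(ordered)  # [('a', 1), ('a', 2), ('b', 2)]
```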