https://github.com/banyc/dfsql
SQL REPL/lib for Data Frames
- Host: GitHub
- URL: https://github.com/banyc/dfsql
- Owner: Banyc
- License: mit
- Created: 2023-12-02T11:36:52.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2024-05-22T14:59:51.000Z (6 months ago)
- Last Synced: 2024-05-22T16:13:09.547Z (6 months ago)
- Topics: cli, csv, data-analysis, jsonl, ndjson, repl, sql
- Language: Rust
- Homepage: https://crates.io/crates/dfsql
- Size: 533 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# `dfsql`
![](img/terminal.png)
- Revision: the standalone `count` command is replaced with `len`, so make sure to replace `(count)` and `col "count"` with `len` and `col "len"` respectively.
- The unary `count` command is unaffected.

## Install
```bash
cargo install dfsql
```

## How to run
```bash
dfsql --input your.csv --output a-new.csv
# ...or
dfsql -i your.csv -o a-new.csv
```

## REPL
- `exit`/`quit`: exit the REPL loop.
```bash
exit
```
- `undo`: undo the previous successful operation.
```bash
undo
```
- `reset`: reset all the changes and go back to the original data frame.
```bash
reset
```
- `schema`: show column names and types of the data frame.
```bash
schema
```
- `save`: save the current data frame to a file.
```bash
save a-new.csv
```

## Statements
- `select`
```py
select *
```
```sql
select last_name first_name
```
- Select columns "last_name" and "first_name" and collect them into a data frame.
- Group by
```py
group ( | )* agg *
```
```sql
group first_name agg len
```
- Group the data frame by column "first_name" and then aggregate each group with the count of the members.
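To make the grouping semantics concrete, here is a minimal plain-Python sketch of what "group by a column, then count the members of each group" computes. It is not how `dfsql` is implemented; the sample rows and the output column name `members` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical rows standing in for a data frame with a "first_name" column.
rows = [
    {"first_name": "John", "last_name": "Smith"},
    {"first_name": "Jane", "last_name": "Doe"},
    {"first_name": "John", "last_name": "Brown"},
]


def group_count(rows, key):
    """Count how many rows share each value of `key`, in first-seen order."""
    counts = Counter(row[key] for row in rows)
    # "members" is an illustrative column name, not one defined by dfsql.
    return [{key: value, "members": n} for value, n in counts.items()]


grouped = group_count(rows, "first_name")
print(grouped)
```

Each distinct value of the grouping column yields exactly one output row, which is why the result has fewer rows than the input whenever values repeat.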
- `filter`
```py
filter
```
```sql
filter first_name = "John"
```
- `limit`
```py
limit
```
```sql
limit 5
```
- `reverse`
```sql
reverse
```
- `sort`
```py
sort ((asc | desc | ()) )*
```
```sql
sort icpsr_id
```
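The grammar above allows several columns, each with an optional direction. A minimal plain-Python sketch of such a multi-column sort follows. It assumes that an omitted direction means ascending and that earlier columns take precedence over later ones; neither is stated in this README. The `state` column and the sample rows are hypothetical.

```python
def sort_rows(rows, keys):
    """Sort rows by several (column, direction) pairs.

    `direction` is "asc" or "desc"; earlier pairs take precedence.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for column, direction in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=(direction == "desc"))
    return result


# Hypothetical data: "state" is an invented column used only for this sketch.
rows = [
    {"state": "NY", "icpsr_id": 3},
    {"state": "CA", "icpsr_id": 1},
    {"state": "NY", "icpsr_id": 2},
]

ordered = sort_rows(rows, [("state", "asc"), ("icpsr_id", "desc")])
print([row["icpsr_id"] for row in ordered])
```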
- `use`
```py
use
```
```py
use other
```
- Switch to the data frame called `other`.
- join
```py
(left | right | inner | full) join on ?
```
```py
left join other on id ID
```
- Left-join the data frame called `other`, matching this data frame's column `id` against its column `ID`.

## Expressions
- `col`: reference to a column.
```py
col : ( | ) ->
```
```sql
select col first_name
```
- `exclude`: remove columns from the data frame.
```py
exclude : * ->
```
```sql
select exclude last_name first_name
```
- literal: literal values like `42`, `"John"`, `1.0`, and `null`.
- binary operations
```sql
select a * b
```
- Calculate the product of columns "a" and "b" and collect the result.
- unary operations
```sql
select -a
```
```sql
select sum a
```
- Sum all values in column "a" and collect the scalar result.
- `alias`: assign a name to a column.
```py
alias : ( | ) ->
```
```sql
select alias product a * b
```
- Assign the name "product" to the product and collect the new column.
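As a rough illustration of what the aliased product above evaluates to, the following plain-Python sketch multiplies two columns element by element and stores the result under a new name. The column values are hypothetical, and the helper `alias` only mimics the idea of naming a column; it is not part of any `dfsql` API.

```python
# Hypothetical column data standing in for columns "a" and "b".
columns = {"a": [1, 2, 3], "b": [4, 5, 6]}


def alias(name, values):
    """Return a one-column frame that holds `values` under `name`."""
    return {name: list(values)}


# Element-wise product of "a" and "b", collected under the name "product".
product = alias("product", (x * y for x, y in zip(columns["a"], columns["b"])))
print(product)
```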
- conditional
```py
: if then (if then )* otherwise ->
```
```sql
select if class = 0 then "A" if class = 1 then "B" otherwise null
```
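The conditional chain can be read as "take the value of the first branch whose condition holds, else the fallback". The plain-Python sketch below models that reading. It assumes branches are tested in order and that the first match wins, which this README does not state explicitly; the sample `class` values are hypothetical.

```python
def conditional(value, branches, fallback):
    """Return the result of the first branch whose predicate accepts `value`."""
    for predicate, result in branches:
        if predicate(value):
            return result
    return fallback


# Hypothetical values of a "class" column.
classes = [0, 1, 2]

# Mirrors: class = 0 gives "A", class = 1 gives "B", anything else gives null.
branches = [
    (lambda v: v == 0, "A"),
    (lambda v: v == 1, "B"),
]
labels = [conditional(c, branches, None) for c in classes]
print(labels)
```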
- `cast`: cast a column to either type `str`, `int`, or `float`.
```py
cast : ->
```
```sql
select cast str id
```
- Cast the column "id" to type `str` and collect the result.
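For intuition, a cast applies one type conversion to every value in a column. The plain-Python sketch below covers the three listed target types. Passing `null` (here `None`) through unchanged is an assumption of this sketch, not documented behavior, and the sample values are hypothetical.

```python
# Map the type names listed above to Python conversion functions.
CASTS = {"str": str, "int": int, "float": float}


def cast(type_name, values):
    """Convert every non-null value in `values` to the named type."""
    convert = CASTS[type_name]
    # Assumption: nulls stay null instead of being converted.
    return [None if v is None else convert(v) for v in values]


ids = [1, 2, None]  # hypothetical "id" column
ids_as_str = cast("str", ids)
print(ids_as_str)
```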