https://github.com/banyc/dfsql

SQL REPL/lib for Data Frames
cli csv data-analysis jsonl ndjson repl sql

Host: GitHub
URL: https://github.com/banyc/dfsql
Owner: Banyc
License: mit
Created: 2023-12-02T11:36:52.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2024-05-22T14:59:51.000Z (9 months ago)
Last Synced: 2024-05-22T16:13:09.547Z (9 months ago)
Topics: cli, csv, data-analysis, jsonl, ndjson, repl, sql
Language: Rust
Homepage: https://crates.io/crates/dfsql
Size: 533 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# `dfsql`

![](img/terminal.png)

- Revision: the standalone `count` command has been replaced with `len`. Replace `(count)` with `len` and `col "count"` with `col "len"`.
- The unary `count` command is unaffected.

## Install

```bash
cargo install dfsql
```

## How to run

```bash
dfsql --input your.csv --output a-new.csv
# ...or
dfsql -i your.csv -o a-new.csv
```

## REPL

- `exit`/`quit`: exit the REPL.
```bash
exit
```
- `undo`: undo the previous successful operation.
```bash
undo
```
- `reset`: reset all the changes and go back to the original data frame.
```bash
reset
```
- `schema`: show column names and types of the data frame.
```bash
schema
```
- `save`: save the current data frame to a file.
```bash
save a-new.csv
```
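One way to picture `undo` and `reset` is as operations on a history of data frame snapshots. The Python sketch below is an illustrative model only, not dfsql's actual implementation: the `Session` class and its method names are hypothetical, and a list of row dictionaries stands in for a data frame.

```python
class Session:
    """Hypothetical model of REPL state: a stack of data frame snapshots."""

    def __init__(self, original):
        # The first entry is the original data frame and is never popped.
        self.history = [original]

    @property
    def current(self):
        return self.history[-1]

    def apply(self, operation):
        # A successful operation pushes a new snapshot; a failing one
        # raises before anything is pushed, so the history is unchanged.
        self.history.append(operation(self.current))

    def undo(self):
        # Drop the most recent snapshot, but always keep the original.
        if len(self.history) > 1:
            self.history.pop()

    def reset(self):
        # Discard every change and return to the original data frame.
        del self.history[1:]


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
session = Session(rows)
session.apply(lambda df: df[:2])     # comparable to `limit 2`
session.apply(lambda df: df[::-1])   # comparable to `reverse`
session.undo()                       # back to the `limit 2` result
after_undo = session.current
session.reset()                      # back to all three rows
after_reset = session.current
```

In this model only successful operations create a snapshot, which matches the description of `undo` as reverting the previous successful operation.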

## Statements

- `select`
```py
select *
```
```sql
select last_name first_name
```
- Select columns "last_name" and "first_name" and collect them into a data frame.
- Group by
```py
group ( | )* agg *
```
```sql
group first_name agg (count)
```
- Group the data frame by column "first_name" and then aggregate each group with the count of the members.
- `filter`
```py
filter
```
```sql
filter first_name = "John"
```
- `limit`
```py
limit
```
```sql
limit 5
```
- `reverse`
```sql
reverse
```
- `sort`
```py
sort ((asc | desc | ()) )*
```
```sql
sort icpsr_id
```
- `use`
```py
use
```
```py
use other
```
- Switch to the data frame called `other`.
- join
```py
(left | right | inner | full) join on ?
```
```py
left join other on id ID
```
- Left join the data frame called `other` on my column `id` and its column `ID`.
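To make the semantics of grouping and joining concrete, here is a plain-Python sketch of what a group-and-count and a left join compute. It does not call dfsql. The helpers `group_count` and `left_join`, the sample data, and the output column name `n` are all hypothetical, and whether dfsql keeps or drops the right-hand key column after a join is an assumption (this sketch drops it).

```python
from collections import Counter

people = [
    {"id": 1, "first_name": "John"},
    {"id": 2, "first_name": "Jane"},
    {"id": 3, "first_name": "John"},
]
other = [
    {"ID": 1, "city": "Oslo"},
    {"ID": 3, "city": "Lima"},
]


def group_count(rows, key):
    """Group rows by `key` and count members: one output row per distinct value."""
    counts = Counter(row[key] for row in rows)
    return [{key: value, "n": n} for value, n in counts.items()]


def left_join(left, right, left_on, right_on):
    """Keep every left row; right-hand columns become None when nothing matches."""
    right_columns = [c for c in right[0] if c != right_on] if right else []
    joined = []
    for l_row in left:
        matches = [r for r in right if r[right_on] == l_row[left_on]]
        if not matches:
            joined.append({**l_row, **{c: None for c in right_columns}})
        for r in matches:
            joined.append({**l_row, **{c: r[c] for c in right_columns}})
    return joined


grouped = group_count(people, "first_name")
joined = left_join(people, other, "id", "ID")
```

Here `grouped` holds one row per first name with its member count, and `joined` keeps all three people, with `city` set to `None` for the person who has no match in `other`.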

## Expressions

- `col`: reference to a column.
```py
col : ( | ) ->
```
```sql
select col first_name
```
- `exclude`: remove columns from the data frame.
```py
exclude : * ->
```
```sql
select exclude last_name first_name
```
- literal: literal values like `42`, `"John"`, `1.0`, and `null`.
- binary operations
```sql
select a * b
```
- Calculate the product of columns "a" and "b" and collect the result.
- unary operations
```sql
select -a
```
```sql
select sum a
```
- Sum all values in column "a" and collect the scalar result.
- `alias`: assign a name to a column.
```py
alias : ( | ) ->
```
```sql
select alias product a * b
```
- Assign the name "product" to the product and collect the new column.
- conditional
```py
: if then (if then )* otherwise ->
```
```sql
select if class = 0 then "A" if class = 1 then "B" else null
```
- `cast`: cast a column to either type `str`, `int`, or `float`.
```py
cast : ->
```
```sql
select cast str id
```
- Cast the column "id" to type `str` and collect the result.
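The expressions above can be read as per-row or per-column computations. The following plain-Python sketch mirrors four of the examples (conditional, `cast`, `alias`, and `sum`) on made-up data. It does not use dfsql, and the `grade` helper and sample rows are hypothetical.

```python
rows = [
    {"id": 7, "class": 0, "a": 2, "b": 3},
    {"id": 8, "class": 1, "a": 4, "b": 5},
    {"id": 9, "class": 2, "a": 6, "b": 7},
]


def grade(row):
    # Conditional: "A" when class is 0, "B" when class is 1, null otherwise.
    if row["class"] == 0:
        return "A"
    if row["class"] == 1:
        return "B"
    return None


# Conditional expression evaluated once per row.
grades = [grade(row) for row in rows]

# Comparable to `cast str id`: every id becomes a string.
ids_as_str = [str(row["id"]) for row in rows]

# Comparable to `alias product a * b`: the computed column gets the name "product".
product = {"product": [row["a"] * row["b"] for row in rows]}

# Comparable to `sum a`: the whole column reduces to a single scalar.
total_a = sum(row["a"] for row in rows)
```

The binary operation and the conditional each produce one value per row, while `sum` collapses the column into a scalar.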