Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ywelsch/duckdb-psql
A piped SQL for DuckDB
https://github.com/ywelsch/duckdb-psql
Last synced: 2 months ago
JSON representation
A piped SQL for DuckDB
- Host: GitHub
- URL: https://github.com/ywelsch/duckdb-psql
- Owner: ywelsch
- License: mit
- Created: 2023-03-31T20:01:01.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-08T18:58:58.000Z (9 months ago)
- Last Synced: 2024-02-09T17:38:54.803Z (9 months ago)
- Language: C++
- Homepage:
- Size: 54.7 KB
- Stars: 33
- Watchers: 4
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[![Linux](https://github.com/ywelsch/duckdb-psql/actions/workflows/Linux.yml/badge.svg)](https://github.com/ywelsch/duckdb-psql/actions/workflows/Linux.yml) [![MacOS](https://github.com/ywelsch/duckdb-psql/actions/workflows/MacOS.yml/badge.svg)](https://github.com/ywelsch/duckdb-psql/actions/workflows/MacOS.yml) [![Windows](https://github.com/ywelsch/duckdb-psql/actions/workflows/Windows.yml/badge.svg)](https://github.com/ywelsch/duckdb-psql/actions/workflows/Windows.yml)
# PSQL: a piped SQL for DuckDB
PSQL extends [DuckDB](https://duckdb.org)'s SQL with a pipe syntax to provide simple composable queries. It's a lightweight variant of piped languages such as [PRQL](https://prql-lang.org) and [Kusto](https://docs.microsoft.com/azure/data-explorer/kusto/query/samples?pivots=azuredataexplorer), yet leveraging the full power of DuckDB's SQL.
Pipes allow you to compose your SQL queries in a very natural way (example inspired by PRQL):
```sql
from 'https://raw.githubusercontent.com/ywelsch/duckdb-psql/main/example/invoices.csv' |
where invoice_date >= date '1970-01-16' |
select
*,
0.8 as transaction_fees,
total - transaction_fees as income |
where income > 1 |
select
customer_id,
avg(total),
sum(income) as sum_income,
count() as ct
group by customer_id |
order by sum_income desc |
limit 10 |
as invoices
join 'https://raw.githubusercontent.com/ywelsch/duckdb-psql/main/example/customers.csv'
as customers
on invoices.customer_id = customers.customer_id |
select
customer_id,
last_name || ', ' || first_name as name,
sum_income,
version() as db_version;
```which returns:
```
┌─────────────┬─────────────────────┬────────────┬────────────┐
│ customer_id │ name │ sum_income │ db_version │
│ int64 │ varchar │ double │ varchar │
├─────────────┼─────────────────────┼────────────┼────────────┤
│ 6 │ Holý, Helena │ 43.83 │ v0.7.1 │
│ 7 │ Gruber, Astrid │ 36.83 │ v0.7.1 │
│ 24 │ Ralston, Frank │ 37.83 │ v0.7.1 │
│ 25 │ Stevens, Victor │ 36.83 │ v0.7.1 │
│ 26 │ Cunningham, Richard │ 41.83 │ v0.7.1 │
│ 28 │ Barnett, Julia │ 37.83 │ v0.7.1 │
│ 37 │ Zimmermann, Fynn │ 37.83 │ v0.7.1 │
│ 45 │ Kovács, Ladislav │ 39.83 │ v0.7.1 │
│ 46 │ O'Reilly, Hugh │ 39.83 │ v0.7.1 │
│ 57 │ Rojas, Luis │ 40.83 │ v0.7.1 │
├─────────────┴─────────────────────┴────────────┴────────────┤
│ 10 rows 4 columns │
└─────────────────────────────────────────────────────────────┘
```Pipes can also be used in sub-expression, by using a special syntax to delimit start and end of pipelines:
```sql
create view invoices as (|
from 'https://raw.githubusercontent.com/ywelsch/duckdb-psql/main/example/invoices.csv' |
where invoice_date >= date '1970-01-16' |
select
0.8 as transaction_fees,
total - transaction_fees as income
|);
```## How does it work?
The underlying engine just does a simple syntactic transformation of the query, rewriting pipes
```sql
A | B | C | D
```
to
```sql
WITH _tmp1 AS (A),
_tmp2 AS (FROM _tmp1 B)
_tmp3 AS (FROM _tmp2 C)
FROM _tmp3 D
```## Limitations
This is mainly an experiment at simplifying SQL and nowhere as feature-complete as some of the piped language alternatives. Its main advantage is that is has all the power and expressivity of DuckDB's SQL, while gaining some of the benefits of piped languages. As it is just implemented as a simple pre-processing step using quick and dirty regex subsitutions, it is unaware of the scoping rules of SQL. It has a special syntax for piped sub-expressions (surrounded by `(|` and `|)`) and does not allow arbitrary nesting of piped sub-expressions.
## Installing the extension
To install the PSQL extension, DuckDB needs to be launched with the `allow_unsigned_extensions` option set to true.
Depending on the DuckDB usage, this can be configured as follows:CLI:
```shell
duckdb -unsigned
```Python:
```python
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'})
```A custom extension repository then needs to be defined as follows:
```sql
SET custom_extension_repository='http://welsch.lu/duckdb/psql/latest';
```
Note that the `/latest` path will provide the latest extension version available for the current version of DuckDB.
A given extension version can be selected by using that version as last path element instead.After running these steps, the extension can then be installed and loaded using the regular INSTALL/LOAD commands in DuckDB:
```sql
FORCE INSTALL psql; # To override current installation with latest
LOAD psql;
```## Build from source
To build the extension:
```sh
make
```
The main binaries that will be built are:
```sh
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/psql/psql.duckdb_extension
```
- `duckdb` is the binary for the duckdb shell with the extension code automatically loaded.
- `unittest` is the test runner of duckdb. Again, the extension is already linked into the binary.
- `psql.duckdb_extension` is the loadable binary as it would be distributed.To run the extension code, simply start the shell with `./build/release/duckdb`.