Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/manzt/quak

a scalable data profiler
https://github.com/manzt/quak

database dataframe jupyter python visualization

Last synced: about 1 month ago
JSON representation

a scalable data profiler

Awesome Lists containing this project

README

        



quak logo

quak /kwĂŚk/


an anywidget for data that talks like a duck

**quak** is a scalable data profiler for quickly scanning large tables,
capturing interactions as executable SQL queries.

- **interactive** 🖱️ mouse over column summaries, cross-filter, sort, and slice rows.
- **fast** ⚡ built with [Mosaic](https://github.com/uwdata/mosaic); views are expressed as SQL queries lazily executed by [DuckDB](https://duckdb.org/).
- **flexible** 🔄 supports many data types and formats via [Apache Arrow](https://arrow.apache.org/docs/index.html), the [dataframe interchange protocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html), and the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
- **reproducible** 📓 a UI for building complex SQL queries; materialize views in the kernel for further analysis.

## install

```sh
pip install quak
```

## usage

The easiest way to get started with **quak** is using the IPython
[cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html).

```python
%load_ext quak
```

```python
import polars as pl

df = pl.read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")
df
```

olympic athletes table

**quak** hooks into Jupyter's display mechanism to automatically render any
dataframe-like object (implementing the [Python dataframe interchange
protocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html) or [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html))
using `quak.Widget` instead of the default display.

Alternatively, you can use `quak.Widget` directly:

```python
import polars as pl
import quak

df = pl.read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")
widget = quak.Widget(df)
widget
```

### interacting with the data

**quak** captures all user interactions as _queries_.

At any point, table state can be accessed as SQL,

```python
widget.sql # SELECT * FROM df WHERE ...
```

which for convenience can be executed in the kernel to materialize the view for further analysis:

```python
widget.data() # returns duckdb.DuckDBPyRelation object
```

By representing UI state as SQL, **quak** makes it easy to generate complex
queries via interactions that would be challenging to write manually, while
keeping them reproducible.

### using quak in marimo

**quak** can also be used in [**marimo** notebooks](https://github.com/marimo-team/marimo),
which provide out-of-the-box support for anywidget:

```python
import marimo as mo
import polars as pl
import quak

df = pl.read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")
widget = mo.ui.anywidget(quak.Widget(df))
widget
```

## contributing

Contributors welcome! Check the [Contributors Guide](./CONTRIBUTING.md) to get
started. Note: I'm wrapping up my PhD, so I might be slow to respond. Please
open an issue before contributing a new feature.

## references

**quak** pieces together many important ideas from the web and Python data science ecosystems.
It serves as an example of what you can achieve by embracing these platforms for their strengths.

- [Observable's data table](https://observablehq.com/documentation/cells/data-table): Inspiration for the UI design and user interactions.
- [Mosaic](https://github.com/uwdata/mosaic): The foundation for linking databases and interactive table views.
- [Apache Arrow](https://arrow.apache.org/): Support for various data types and efficient data interchange between JS/Python.
- [DuckDB](https://duckdb.org/): An amazingly engineered piece of software that makes SQL go vroom.