Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/manzt/quak
a scalable data profiler
database dataframe jupyter python visualization
- Host: GitHub
- URL: https://github.com/manzt/quak
- Owner: manzt
- License: mit
- Created: 2024-06-19T02:33:05.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-13T17:55:40.000Z (2 months ago)
- Last Synced: 2024-09-28T23:21:53.555Z (about 2 months ago)
- Topics: database, dataframe, jupyter, python, visualization
- Language: TypeScript
- Homepage: https://manzt.github.io/quak/
- Size: 2.43 MB
- Stars: 209
- Watchers: 9
- Forks: 9
- Open Issues: 11
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# quak /kwæk/
an anywidget for data that talks like a duck
**quak** is a scalable data profiler for quickly scanning large tables,
capturing interactions as executable SQL queries.

- **interactive**: mouse over column summaries, cross-filter, sort, and slice rows.
- **fast**: built with [Mosaic](https://github.com/uwdata/mosaic); views are expressed as SQL queries lazily executed by [DuckDB](https://duckdb.org/).
- **flexible**: supports many data types and formats via [Apache Arrow](https://arrow.apache.org/docs/index.html), the [dataframe interchange protocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html), and the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).
- **reproducible**: a UI for building complex SQL queries; materialize views in the kernel for further analysis.

## install

```sh
pip install quak
```

## usage

The easiest way to get started with **quak** is using the IPython
[cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html).

```python
%load_ext quak
```

```python
import polars as pl

df = pl.read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")
df
```

**quak** hooks into Jupyter's display mechanism to automatically render any
dataframe-like object (implementing the [Python dataframe interchange
protocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html) or [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html))
using `quak.Widget` instead of the default display.

Alternatively, you can use `quak.Widget` directly:

```python
import polars as pl
import quak

df = pl.read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")
widget = quak.Widget(df)
widget
```

### interacting with the data

**quak** captures all user interactions as _queries_.
At any point, table state can be accessed as SQL,
```python
widget.sql # SELECT * FROM df WHERE ...
```

which for convenience can be executed in the kernel to materialize the view for further analysis:

```python
widget.data() # returns duckdb.DuckDBPyRelation object
```

By representing UI state as SQL, **quak** makes it easy to generate complex
queries via interactions that would be challenging to write manually, while
keeping them reproducible.

### using quak in marimo

**quak** can also be used in [**marimo** notebooks](https://github.com/marimo-team/marimo),
which provide out-of-the-box support for anywidget:

```python
import marimo as mo
import polars as pl
import quak

df = pl.read_parquet("https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet")
widget = mo.ui.anywidget(quak.Widget(df))
widget
```

## contributing

Contributors welcome! Check the [Contributors Guide](./CONTRIBUTING.md) to get
started. Note: I'm wrapping up my PhD, so I might be slow to respond. Please
open an issue before contributing a new feature.

## references

**quak** pieces together many important ideas from the web and Python data science ecosystems.
It serves as an example of what you can achieve by embracing these platforms for their strengths.

- [Observable's data table](https://observablehq.com/documentation/cells/data-table): Inspiration for the UI design and user interactions.
- [Mosaic](https://github.com/uwdata/mosaic): The foundation for linking databases and interactive table views.
- [Apache Arrow](https://arrow.apache.org/): Support for various data types and efficient data interchange between JS/Python.
- [DuckDB](https://duckdb.org/): An amazingly engineered piece of software that makes SQL go vroom.
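
## sketch: UI state as SQL

The idea described under "interacting with the data", that every interaction can be captured as a query, can be illustrated with a small standalone sketch. This is not how **quak** is implemented (it builds on Mosaic); `TableState`, `filter_range`, `sort`, and `quote_ident` are hypothetical names invented for this example, and the `height` and `weight` columns are assumed for illustration only. It uses just the Python standard library and shows how a filter and a sort might compose into one reproducible SQL string.

```python
from dataclasses import dataclass, field


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass
class TableState:
    """Hypothetical stand-in for the state a table UI might track."""

    table: str
    filters: list = field(default_factory=list)   # SQL predicate strings
    order_by: list = field(default_factory=list)  # (column, descending) pairs

    def filter_range(self, column: str, lo: float, hi: float) -> None:
        """Record a range selection on a numeric column as a BETWEEN predicate."""
        self.filters.append(f"{quote_ident(column)} BETWEEN {lo} AND {hi}")

    def sort(self, column: str, descending: bool = False) -> None:
        """Record a click on a column header as an ORDER BY term."""
        self.order_by.append((column, descending))

    def sql(self) -> str:
        """Render the accumulated interactions as a single SELECT statement."""
        query = f"SELECT * FROM {quote_ident(self.table)}"
        if self.filters:
            query += " WHERE " + " AND ".join(self.filters)
        if self.order_by:
            terms = [
                quote_ident(col) + (" DESC" if desc else " ASC")
                for col, desc in self.order_by
            ]
            query += " ORDER BY " + ", ".join(terms)
        return query


# Assumed column names, for illustration only.
state = TableState("df")
state.filter_range("height", 1.8, 2.0)
state.sort("weight", descending=True)
print(state.sql())
```

Because the state is just a list of predicates and sort terms, the same query can be regenerated later or handed to any SQL engine, which is the reproducibility property the README describes.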