{"id":14977104,"url":"https://github.com/manzt/quak","last_synced_at":"2025-05-16T09:06:39.500Z","repository":{"id":250249996,"uuid":"817094783","full_name":"manzt/quak","owner":"manzt","description":"a scalable data profiler","archived":false,"fork":false,"pushed_at":"2025-02-05T14:07:04.000Z","size":2577,"stargazers_count":353,"open_issues_count":13,"forks_count":13,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-05-12T05:09:00.007Z","etag":null,"topics":["database","dataframe","jupyter","python","visualization"],"latest_commit_sha":null,"homepage":"https://manzt.github.io/quak/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/manzt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-19T02:33:05.000Z","updated_at":"2025-05-06T05:08:04.000Z","dependencies_parsed_at":"2024-08-26T22:34:37.939Z","dependency_job_id":"0f0ec896-9541-4a79-96e0-18c07b5f95e5","html_url":"https://github.com/manzt/quak","commit_stats":{"total_commits":113,"total_committers":7,"mean_commits":"16.142857142857142","dds":0.06194690265486724,"last_synced_commit":"d85b8fc674ba24902ec46ed003b46d19d2864ccc"},"previous_names":["manzt/quak"],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manzt%2Fquak","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manzt%2Fquak/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manzt%2Fquak/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manzt%2Fquak/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/manzt","download_url":"https://codeload.github.com/manzt/quak/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254501558,"owners_count":22081528,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","dataframe","jupyter","python","visualization"],"created_at":"2024-09-24T13:55:03.587Z","updated_at":"2025-05-16T09:06:34.483Z","avatar_url":"https://github.com/manzt.png","language":"TypeScript","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"readme":"\u003ch1\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/logo-color.svg\" alt=\"quak logo\" width=\"90\"\u003e\n  \u003cbr\u003equak /kwæk/\n\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003cspan\u003ean \u003ca href=\"https://github.com/manzt/anywidget\"\u003eanywidget\u003c/a\u003e for data that talks like a duck\u003c/span\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\n**quak** is a scalable data profiler for quickly scanning large tables,\ncapturing interactions as executable SQL queries.\n\n- **interactive** 🖱️ mouse over column summaries, cross-filter, sort, and slice rows.\n- **fast** ⚡ built with [Mosaic](https://github.com/uwdata/mosaic); views are expressed as SQL queries lazily executed by [DuckDB](https://duckdb.org/).\n- **flexible** 🔄 supports many data types and formats via [Apache Arrow](https://arrow.apache.org/docs/index.html), the [dataframe interchange 
protocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html), and the [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html).\n- **reproducible** 📓 a UI for building complex SQL queries; materialize views in the kernel for further analysis.\n\n## install\n\n```sh\npip install quak\n```\n\n## usage\n\nThe easiest way to get started with **quak** is to load it as an IPython extension with the\n[`%load_ext` line magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html).\n\n```python\n%load_ext quak\n```\n\n```python\nimport polars as pl\n\ndf = pl.read_parquet(\"https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet\")\ndf\n```\n\n\u003cimg alt=\"olympic athletes table\" src=\"https://github.com/user-attachments/assets/83858282-8876-4b12-aeea-44eb82d3bed3\"\u003e\n\nOnce the extension is loaded, **quak** hooks into Jupyter's display mechanism to automatically render any\ndataframe-like object (implementing the [Python dataframe interchange\nprotocol](https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html) or [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html))\nusing `quak.Widget` instead of the default display.\n\nAlternatively, you can use `quak.Widget` directly:\n\n```python\nimport polars as pl\nimport quak\n\ndf = pl.read_parquet(\"https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet\")\nwidget = quak.Widget(df)\nwidget\n```\n\n### interacting with the data\n\n**quak** captures all user interactions as _queries_.\n\nAt any point, the current table state can be accessed as a SQL query,\n\n```python\nwidget.sql  # SELECT * FROM df WHERE ...\n```\n\nwhich for convenience can be executed in the kernel to materialize the view for further analysis:\n\n```python\nwidget.data()  # returns a duckdb.DuckDBPyRelation\n```\n\nBy representing UI state as SQL, **quak** makes it easy to generate complex\nqueries via interactions that would be challenging to write manually, while\nkeeping 
them reproducible.\n\n### using quak in marimo\n\n**quak** can also be used in [**marimo** notebooks](https://github.com/marimo-team/marimo),\nwhich provide out-of-the-box support for anywidget:\n\n```python\nimport marimo as mo\nimport polars as pl\nimport quak\n\ndf = pl.read_parquet(\"https://github.com/uwdata/mosaic/raw/main/data/athletes.parquet\")\nwidget = mo.ui.anywidget(quak.Widget(df))\nwidget\n```\n\n## contributing\n\nContributors welcome! Check the [Contributors Guide](./CONTRIBUTING.md) to get\nstarted. Note: I'm wrapping up my PhD, so I might be slow to respond. Please\nopen an issue before contributing a new feature.\n\n## references\n\n**quak** pieces together many important ideas from the web and Python data science ecosystems. \nIt serves as an example of what you can achieve by embracing these platforms for their strengths.\n\n- [Observable's data table](https://observablehq.com/documentation/cells/data-table): Inspiration for the UI design and user interactions.\n- [Mosaic](https://github.com/uwdata/mosaic): The foundation for linking databases and interactive table views. \n- [Apache Arrow](https://arrow.apache.org/): Support for various data types and efficient data interchange between JS/Python.\n- [DuckDB](https://duckdb.org/): An amazingly engineered piece of software that makes SQL go vroom.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanzt%2Fquak","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanzt%2Fquak","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanzt%2Fquak/lists"}