{"id":16802976,"url":"https://github.com/coady/graphique","last_synced_at":"2025-05-09T01:45:15.152Z","repository":{"id":37960534,"uuid":"243892677","full_name":"coady/graphique","owner":"coady","description":"GraphQL service for arrow tables and parquet data sets.","archived":false,"fork":false,"pushed_at":"2025-01-17T18:44:13.000Z","size":4929,"stargazers_count":88,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-09T01:45:07.208Z","etag":null,"topics":["arrow","graphql","parquet"],"latest_commit_sha":null,"homepage":"https://coady.github.io/graphique/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coady.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-29T02:42:32.000Z","updated_at":"2025-02-10T15:20:07.000Z","dependencies_parsed_at":"2023-02-16T01:01:22.285Z","dependency_job_id":"7ee4c380-621f-4adc-9ed3-3d8ca783a581","html_url":"https://github.com/coady/graphique","commit_stats":{"total_commits":460,"total_committers":3,"mean_commits":"153.33333333333334","dds":"0.017391304347826098","last_synced_commit":"94a438e403d38381756d1df5777fb85a19844bce"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coady%2Fgraphique","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coady%2Fgraphique/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coady%2Fgraphique/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coady%2Fgraphique/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coady","download_url":"https://codeload.github.com/coady/graphique/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253176443,"owners_count":21866142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","graphql","parquet"],"created_at":"2024-10-13T09:41:12.928Z","updated_at":"2025-05-09T01:45:15.126Z","avatar_url":"https://github.com/coady.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![image](https://img.shields.io/pypi/v/graphique.svg)](https://pypi.org/project/graphique/)\n![image](https://img.shields.io/pypi/pyversions/graphique.svg)\n[![image](https://pepy.tech/badge/graphique)](https://pepy.tech/project/graphique)\n![image](https://img.shields.io/pypi/status/graphique.svg)\n[![build](https://github.com/coady/graphique/actions/workflows/build.yml/badge.svg)](https://github.com/coady/graphique/actions/workflows/build.yml)\n[![image](https://codecov.io/gh/coady/graphique/branch/main/graph/badge.svg)](https://codecov.io/gh/coady/graphique/)\n[![CodeQL](https://github.com/coady/graphique/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/coady/graphique/actions/workflows/github-code-scanning/codeql)\n[![CodSpeed 
Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/coady/graphique)\n[![image](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![image](https://mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)\n\n[GraphQL](https://graphql.org) service for [arrow](https://arrow.apache.org) tables and [parquet](https://parquet.apache.org) data sets. The schema for a query API is derived automatically.\n\n## Usage\n```console\n% env PARQUET_PATH=... uvicorn graphique.service:app\n```\n\nOpen http://localhost:8000/ to try out the API in [GraphiQL](https://github.com/graphql/graphiql/tree/main/packages/graphiql#readme). There is a test fixture at `./tests/fixtures/zipcodes.parquet`.\n\n```console\n% env PARQUET_PATH=... strawberry export-schema graphique.service:app.schema\n```\noutputs the graphql schema for a parquet data set.\n\n### Configuration\nGraphique uses [Starlette's config](https://www.starlette.io/config/): in environment variables or a `.env` file. Config variables are used as input to a [parquet dataset](https://arrow.apache.org/docs/python/dataset.html).\n\n* PARQUET_PATH: path to the parquet directory or file\n* FEDERATED = '': field name to extend type `Query` with a federated `Table` \n* DEBUG = False: run service in debug mode, which includes metrics\n* COLUMNS = None: list of names, or mapping of aliases, of columns to select\n* FILTERS = None: json `filter` query for which rows to read at startup\n\nFor more options create a custom [ASGI](https://asgi.readthedocs.io/en/latest/index.html) app. Call graphique's `GraphQL` on an arrow [Dataset](https://arrow.apache.org/docs/python/api/dataset.html), [Scanner](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html), or [Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html). 
The GraphQL `Table` type will be the root Query type.\n\nSupply a mapping of names to datasets for multiple roots, and to enable federation.\n\n```python\nimport pyarrow.dataset as ds\nfrom graphique import GraphQL\n\nsource = ds.dataset(...)\napp = GraphQL(source)  # Table is root query type\napp = GraphQL.federated({\u003cname\u003e: source, ...}, keys={\u003cname\u003e: [], ...})  # Tables on federated fields\n```\n\nStart like any ASGI app.\n\n```console\nuvicorn \u003cmodule\u003e:app\n```\n\nConfiguration options exist to provide a convenient no-code solution, but are subject to change in the future. Using a custom app is recommended for production usage.\n\n### API\n#### types\n* `Dataset`: interface for an arrow dataset, scanner, or table.\n* `Table`: implements the `Dataset` interface. Adds typed `row`, `columns`, and `filter` fields from introspecting the schema.\n* `Column`: interface for an arrow column (a.k.a. ChunkedArray). Each arrow data type has a corresponding column implementation: Boolean, Int, Long, Float, Decimal, Date, Datetime, Time, Duration, Base64, String, List, Struct. All columns have a `values` field for their list of scalars. Additional fields vary by type.\n* `Row`: scalar fields. Arrow tables are column-oriented, and graphique encourages that usage for performance. A single `row` field is provided for convenience, but a field for a list of rows is not. 
Requesting parallel columns is far more efficient.\n\n#### selection\n* `slice`: contiguous selection of rows\n* `filter`: select rows with simple predicates\n* `scan`: select rows and project columns with expressions\n\n#### projection\n* `columns`: provides a field for every `Column` in the schema\n* `column`: access a column of any type by name\n* `row`: provides a field for each scalar of a single row\n* `apply`: transform columns by applying a function\n* `join`: join tables by key columns\n\n#### aggregation\n* `group`: group by given columns, and aggregate the others\n* `runs`: partition on adjacent values in given columns, transforming the others into list columns\n* `tables`: return a list of tables by splitting on the scalars in list columns\n* `flatten`: flatten list columns with repeated scalars\n\n#### ordering\n* `sort`: sort table by given columns\n* `rank`: select rows with smallest or largest values\n\n### Performance\nGraphique relies on native [PyArrow](https://arrow.apache.org/docs/python/index.html) routines wherever possible. Otherwise it falls back to using [NumPy](https://numpy.org/doc/stable/) or custom optimizations.\n\nBy default, datasets are read on demand, with only the necessary rows and columns scanned. Although graphique is a running service, [parquet is performant](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html) at reading a subset of data. Optionally specify `FILTERS` in the JSON `filter` format to read a subset of rows at startup, trading off memory for latency. An empty filter (`{}`) will read the whole table.\n\nSpecifying `COLUMNS` will limit memory usage when reading at startup (`FILTERS`). There is little speed difference as unused columns are inherently ignored. 
Optional aliasing can also be used for camel casing.\n\nIf index columns are detected in the schema metadata, then an initial `filter` will also attempt a binary search on tables.\n\n## Installation\n```console\n% pip install graphique[server]\n```\n\n## Dependencies\n* pyarrow\n* strawberry-graphql[asgi,cli]\n* numpy\n* isodate\n* uvicorn (or other [ASGI server](https://asgi.readthedocs.io/en/latest/implementations.html))\n\n## Tests\n100% branch coverage.\n\n```console\n% pytest [--cov]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoady%2Fgraphique","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoady%2Fgraphique","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoady%2Fgraphique/lists"}