# datafusion-materialized-views

An implementation of incremental view maintenance & query rewriting for materialized views in DataFusion.

A **materialized view** is a view whose query has been pre-computed and saved for later use. This can drastically speed up workloads by pre-computing at least a large fragment of a user-provided query. Furthermore, by implementing a _view matching_ algorithm, we can build an optimizer that rewrites queries to use materialized views automatically where possible and beneficial, a concept known as _query rewriting_.

Efficiently keeping a materialized view up to date as its source data changes is a problem known as _incremental view maintenance_. It is a hard problem in general, but we make some simplifying assumptions:

* Data is stored as Hive-partitioned files in object storage.
* The smallest unit of data that can be updated is a single file.

This is a typical pattern with DataFusion, as files in object storage are usually immutable (especially Parquet files) and can only be replaced, not appended to or modified. However, it does mean that our implementation of incremental view maintenance only works for Hive-partitioned materialized views in object storage. (Future work may generalize this to other storage sources, but the requirement of logically partitioned tables remains.) In contrast, the view matching problem does not depend on the underlying physical representation of the tables.

## Example

Here we walk through a hypothetical example of setting up a materialized view to illustrate what this library offers. The core of the incremental view maintenance implementation is a UDTF (User-Defined Table Function) called `mv_dependencies`, which outputs the build graph of a materialized view: for each partition of the view, the source files it depends on and when each was last modified. This gives users the information they need to determine when partitions of the materialized view need to be recomputed.

```sql
-- Create a base table
CREATE EXTERNAL TABLE t1 (column0 TEXT, date DATE)
STORED AS PARQUET
PARTITIONED BY (date)
LOCATION 's3://t1/';

INSERT INTO t1 VALUES
('a', '2021-01-01'),
('b', '2022-02-02'),
('c', '2022-02-03'), -- Two values in the year 2022
('d', '2023-03-03');

-- Pretend we can create materialized views in SQL
-- The TableProvider implementation will need to implement the Materialized trait.
CREATE MATERIALIZED VIEW m1 AS SELECT
    COUNT(*) AS count,
    date_part('YEAR', date) AS year
FROM t1
GROUP BY date_part('YEAR', date)
PARTITIONED BY (year)
LOCATION 's3://m1/';

-- Show the dependency graph for m1 using the mv_dependencies UDTF
SELECT * FROM mv_dependencies('m1');

+--------------------+----------------------+---------------------+-------------------+--------------------------------------+----------------------+
| target             | source_table_catalog | source_table_schema | source_table_name | source_uri                           | source_last_modified |
+--------------------+----------------------+---------------------+-------------------+--------------------------------------+----------------------+
| s3://m1/year=2021/ | datafusion           | public              | t1                | s3://t1/date=2021-01-01/data.parquet | 2023-07-11T16:29:26  |
| s3://m1/year=2022/ | datafusion           | public              | t1                | s3://t1/date=2022-02-02/data.parquet | 2023-07-11T16:45:22  |
| s3://m1/year=2022/ | datafusion           | public              | t1                | s3://t1/date=2022-02-03/data.parquet | 2023-07-11T16:45:44  |
| s3://m1/year=2023/ | datafusion           | public              | t1                | s3://t1/date=2023-03-03/data.parquet | 2023-07-11T16:45:44  |
+--------------------+----------------------+---------------------+-------------------+--------------------------------------+----------------------+
```

## More detailed example (with code)

For now, implementing materialized views takes some work, because the library focuses on providing a minimal kernel of functionality that can be shared across multiple implementations of materialized views. Broadly, the process includes these steps:

* Define a custom `MaterializedListingTable` type that implements `Materialized`
* Register the type using the global `register_materialized` function
* Initialize the `FileMetadata` component
* Initialize the `RowMetadataRegistry`
* Register the `mv_dependencies` and `stale_files` UDTFs in your DataFusion `SessionContext`
* Periodically regenerate directories marked as stale by `stale_files`

A full walkthrough of this process, including an implementation, can be found in the integration test [`tests/materialized_listing_table.rs`](tests/materialized_listing_table.rs).
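To make the role of the `mv_dependencies` output concrete, here is a minimal, self-contained Rust sketch of the kind of staleness check it enables. It does not use this library's API: `DependencyRow`, `stale_targets`, and the `last_built` map of per-partition build times are hypothetical names, and the rule assumed here (a target partition is stale if it has never been built, or if any of its source files was modified after it was last built) is one plausible policy, not necessarily how `stale_files` works internally.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical type mirroring two columns of the `mv_dependencies` output.
struct DependencyRow<'a> {
    target: &'a str,
    source_last_modified: &'a str,
}

/// Returns the target partitions that should be rebuilt (assumed policy).
///
/// `last_built` is hypothetical bookkeeping: when each target partition was
/// last written. Timestamps use the fixed-width `YYYY-MM-DDTHH:MM:SS` format,
/// so comparing them as strings matches chronological order.
fn stale_targets<'a>(
    deps: &[DependencyRow<'a>],
    last_built: &BTreeMap<&str, &str>,
) -> BTreeSet<&'a str> {
    let mut stale = BTreeSet::new();
    for row in deps {
        let is_stale = match last_built.get(row.target) {
            // Never built: the partition must be computed from scratch.
            None => true,
            // Built before one of its source files changed: recompute it.
            Some(built_at) => row.source_last_modified > *built_at,
        };
        if is_stale {
            stale.insert(row.target);
        }
    }
    stale
}

fn main() {
    // The four rows from the example output (only the columns used here).
    let deps = [
        DependencyRow { target: "s3://m1/year=2021/", source_last_modified: "2023-07-11T16:29:26" },
        DependencyRow { target: "s3://m1/year=2022/", source_last_modified: "2023-07-11T16:45:22" },
        DependencyRow { target: "s3://m1/year=2022/", source_last_modified: "2023-07-11T16:45:44" },
        DependencyRow { target: "s3://m1/year=2023/", source_last_modified: "2023-07-11T16:45:44" },
    ];
    // Hypothetical build times: 2021 is up to date, 2022 was built before its
    // second source file changed, and 2023 has never been built.
    let last_built = BTreeMap::from([
        ("s3://m1/year=2021/", "2023-07-11T17:00:00"),
        ("s3://m1/year=2022/", "2023-07-11T16:45:30"),
    ]);
    let stale = stale_targets(&deps, &last_built);
    println!("{:?}", stale);
}
```

Run as written, this prints `{"s3://m1/year=2022/", "s3://m1/year=2023/"}`. Because staleness is tracked per target rather than per source file, a single changed file invalidates its whole partition directory, which fits the assumption that partitions are regenerated as a unit.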