{"id":24647273,"url":"https://github.com/jankaul/iceberg-rust","last_synced_at":"2025-05-15T17:09:20.204Z","repository":{"id":65290744,"uuid":"580539035","full_name":"JanKaul/iceberg-rust","owner":"JanKaul","description":"Rust implementation of Apache Iceberg with integration for Datafusion","archived":false,"fork":false,"pushed_at":"2025-05-13T14:10:15.000Z","size":4928,"stargazers_count":177,"open_issues_count":18,"forks_count":26,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-05-13T15:33:32.538Z","etag":null,"topics":["arrow","datafusion","iceberg","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JanKaul.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-12-20T20:15:41.000Z","updated_at":"2025-05-13T14:10:19.000Z","dependencies_parsed_at":"2024-11-30T01:41:12.412Z","dependency_job_id":"18fdf603-63d5-4131-811a-c0db89db8afb","html_url":"https://github.com/JanKaul/iceberg-rust","commit_stats":{"total_commits":54,"total_committers":1,"mean_commits":54.0,"dds":0.0,"last_synced_commit":"e459c75c437352109b511dde991ec2ca6e5c1d08"},"previous_names":["jankaul/iceberg-rust","jankaul/iceberg-rust_archive"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKaul%2Ficeberg-rust","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKaul%2Ficeberg-rust/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/JanKaul%2Ficeberg-rust/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JanKaul%2Ficeberg-rust/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JanKaul","download_url":"https://codeload.github.com/JanKaul/iceberg-rust/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253989162,"owners_count":21995763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","datafusion","iceberg","rust"],"created_at":"2025-01-25T15:17:16.793Z","updated_at":"2025-05-15T17:09:15.196Z","avatar_url":"https://github.com/JanKaul.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rust implementation of [Apache Iceberg](https://iceberg.apache.org)\n\nApache Iceberg is an open table format that brings ACID guarantees to large analytic datasets. 
\nThis repository contains a Rust implementation of Apache Iceberg that focuses on interoperability with the Arrow ecosystem.\nIt provides an Iceberg integration for the [DataFusion](https://arrow.apache.org/datafusion/) query engine.\n\n[![Crates.io][crates-badge]][crates-url]\n[![Apache V2.0 licensed][apache-badge]][apache-url]\n[![Build Status][actions-badge]][actions-url]\n\n[crates-badge]: https://img.shields.io/crates/v/iceberg-rust\n[crates-url]: https://crates.io/crates/iceberg-rust\n[apache-badge]: https://img.shields.io/badge/License-Apache_2.0-blue.svg\n[apache-url]: https://github.com/JanKaul/iceberg-rust/blob/main/LICENSE\n[actions-badge]: https://github.com/JanKaul/iceberg-rust/actions/workflows/rust.yml/badge.svg?branch=main\n[actions-url]: https://github.com/JanKaul/iceberg-rust/actions/workflows/rust.yml\n\n## Features\n\n### Iceberg Tables\n\n| Feature | Status |\n| --- | --- |\n| Read | :white_check_mark: |\n| Read partitioned | :white_check_mark: |\n| Insert | :white_check_mark: |\n| Insert partitioned | :white_check_mark: |\n| Equality deletes | :white_check_mark: |\n| Positional deletes | |\n\n### Iceberg Views\n\n| Feature | Status |\n| --- | --- |\n| Read | :white_check_mark: |\n\n### Iceberg Materialized Views\n\n| Feature | Status |\n| --- | --- |\n| Read | :white_check_mark: |\n| Full refresh | :white_check_mark: |\n| Incremental refresh | :white_check_mark: |\n\n### Catalogs\n\n- REST\n- S3Tables\n- Filesystem\n- Glue\n- RDBMS (Postgres, MySQL)\n\n### File formats\n\n- Parquet\n\n### Integrations\n\n- [DataFusion](https://arrow.apache.org/datafusion/)\n\n## Example\n\nCheck out the [DataFusion examples](datafusion_iceberg/examples).\n\n```rust\nuse datafusion::{arrow::array::Int64Array, prelude::SessionContext};\nuse datafusion_iceberg::DataFusionTable;\nuse iceberg_rust::{\n    catalog::Catalog,\n    spec::{\n        partition::{PartitionField, PartitionSpec, Transform},\n        schema::Schema,\n        types::{PrimitiveType, 
StructField, StructType, Type},\n    },\n    table::Table,\n};\nuse iceberg_sql_catalog::SqlCatalog;\nuse object_store::memory::InMemory;\nuse object_store::ObjectStore;\n\nuse std::sync::Arc;\n\n#[tokio::main]\npub(crate) async fn main() {\n    let object_store: Arc\u003cdyn ObjectStore\u003e = Arc::new(InMemory::new());\n\n    let catalog: Arc\u003cdyn Catalog\u003e = Arc::new(\n        SqlCatalog::new(\"sqlite://\", \"test\", object_store.clone())\n            .await\n            .unwrap(),\n    );\n\n    let schema = Schema::builder()\n        .with_fields(\n            StructType::builder()\n                .with_struct_field(StructField {\n                    id: 1,\n                    name: \"id\".to_string(),\n                    required: true,\n                    field_type: Type::Primitive(PrimitiveType::Long),\n                    doc: None,\n                })\n                .with_struct_field(StructField {\n                    id: 2,\n                    name: \"customer_id\".to_string(),\n                    required: true,\n                    field_type: Type::Primitive(PrimitiveType::Long),\n                    doc: None,\n                })\n                .with_struct_field(StructField {\n                    id: 3,\n                    name: \"product_id\".to_string(),\n                    required: true,\n                    field_type: Type::Primitive(PrimitiveType::Long),\n                    doc: None,\n                })\n                .with_struct_field(StructField {\n                    id: 4,\n                    name: \"date\".to_string(),\n                    required: true,\n                    field_type: Type::Primitive(PrimitiveType::Date),\n                    doc: None,\n                })\n                .with_struct_field(StructField {\n                    id: 5,\n                    name: \"amount\".to_string(),\n                    required: true,\n                    field_type: Type::Primitive(PrimitiveType::Int),\n 
                   doc: None,\n                })\n                .build()\n                .unwrap(),\n        )\n        .build()\n        .unwrap();\n\n    let partition_spec = PartitionSpec::builder()\n        .with_partition_field(PartitionField::new(4, 1000, \"day\", Transform::Day))\n        .build()\n        .expect(\"Failed to create partition spec\");\n\n    let table = Table::builder()\n        .with_name(\"orders\")\n        .with_location(\"/test/orders\")\n        .with_schema(schema)\n        .with_partition_spec(partition_spec)\n        .build(\u0026[\"test\".to_owned()], catalog)\n        .await\n        .expect(\"Failed to create table\");\n\n    let table = Arc::new(DataFusionTable::from(table));\n\n    let ctx = SessionContext::new();\n\n    ctx.register_table(\"orders\", table).unwrap();\n\n    ctx.sql(\n        \"INSERT INTO orders (id, customer_id, product_id, date, amount) VALUES \n        (1, 1, 1, '2020-01-01', 1),\n        (2, 2, 1, '2020-01-01', 1),\n        (3, 3, 1, '2020-01-01', 3),\n        (4, 1, 2, '2020-02-02', 1),\n        (5, 1, 1, '2020-02-02', 2),\n        (6, 3, 3, '2020-02-02', 3);\",\n    )\n    .await\n    .expect(\"Failed to create query plan for insert\")\n    .collect()\n    .await\n    .expect(\"Failed to insert values into table\");\n\n    let batches = ctx\n        .sql(\"select product_id, sum(amount) from orders group by product_id;\")\n        .await\n        .expect(\"Failed to create plan for select\")\n        .collect()\n        .await\n        .expect(\"Failed to execute select query\");\n\n    for batch in batches {\n        if batch.num_rows() != 0 {\n            let (product_ids, amounts) = (\n                batch\n                    .column(0)\n                    .as_any()\n                    .downcast_ref::\u003cInt64Array\u003e()\n                    .unwrap(),\n                batch\n                    .column(1)\n                    .as_any()\n                    
.downcast_ref::\u003cInt64Array\u003e()\n                    .unwrap(),\n            );\n            for (product_id, amount) in product_ids.iter().zip(amounts) {\n                if product_id.unwrap() == 1 {\n                    assert_eq!(amount.unwrap(), 7)\n                } else if product_id.unwrap() == 2 {\n                    assert_eq!(amount.unwrap(), 1)\n                } else if product_id.unwrap() == 3 {\n                    assert_eq!(amount.unwrap(), 3)\n                } else {\n                    panic!(\"Unexpected product id\")\n                }\n            }\n        }\n    }\n\n    ctx.sql(\n        \"INSERT INTO orders (id, customer_id, product_id, date, amount) VALUES \n        (7, 1, 3, '2020-01-03', 1),\n        (8, 2, 1, '2020-01-03', 2),\n        (9, 2, 2, '2020-01-03', 1);\",\n    )\n    .await\n    .expect(\"Failed to create query plan for insert\")\n    .collect()\n    .await\n    .expect(\"Failed to insert values into table\");\n\n    let batches = ctx\n        .sql(\"select product_id, sum(amount) from orders group by product_id;\")\n        .await\n        .expect(\"Failed to create plan for select\")\n        .collect()\n        .await\n        .expect(\"Failed to execute select query\");\n\n    for batch in batches {\n        if batch.num_rows() != 0 {\n            let (product_ids, amounts) = (\n                batch\n                    .column(0)\n                    .as_any()\n                    .downcast_ref::\u003cInt64Array\u003e()\n                    .unwrap(),\n                batch\n                    .column(1)\n                    .as_any()\n                    .downcast_ref::\u003cInt64Array\u003e()\n                    .unwrap(),\n            );\n            for (product_id, amount) in product_ids.iter().zip(amounts) {\n                if product_id.unwrap() == 1 {\n                    assert_eq!(amount.unwrap(), 9)\n                } else if product_id.unwrap() == 2 {\n                    
assert_eq!(amount.unwrap(), 2)\n                } else if product_id.unwrap() == 3 {\n                    assert_eq!(amount.unwrap(), 4)\n                } else {\n                    panic!(\"Unexpected product id\")\n                }\n            }\n        }\n    }\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjankaul%2Ficeberg-rust","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjankaul%2Ficeberg-rust","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjankaul%2Ficeberg-rust/lists"}