{"id":14987741,"url":"https://github.com/apache/hudi-rs","last_synced_at":"2025-05-15T08:09:15.197Z","repository":{"id":238020493,"uuid":"795242557","full_name":"apache/hudi-rs","owner":"apache","description":"The native Rust implementation for Apache Hudi, with Python API bindings.","archived":false,"fork":false,"pushed_at":"2025-05-05T19:33:21.000Z","size":853,"stargazers_count":217,"open_issues_count":41,"forks_count":42,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-05-10T17:16:31.124Z","etag":null,"topics":["apache","hudi","python","rust"],"latest_commit_sha":null,"homepage":"https://hudi.apache.org/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-02T21:40:33.000Z","updated_at":"2025-05-09T09:21:38.000Z","dependencies_parsed_at":"2024-06-13T02:24:19.389Z","dependency_job_id":"079ff952-4466-4743-adcf-04a2c7fdeef2","html_url":"https://github.com/apache/hudi-rs","commit_stats":{"total_commits":62,"total_committers":12,"mean_commits":5.166666666666667,"dds":"0.19354838709677424","last_synced_commit":"e23e6ed3190e412a9d196fd3959b54c27ca75e2b"},"previous_names":["apache/hudi-rs"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fhudi-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fhudi-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/apache%2Fhudi-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fhudi-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/hudi-rs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253850240,"owners_count":21973661,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","hudi","python","rust"],"created_at":"2024-09-24T14:15:18.869Z","updated_at":"2025-05-15T08:09:15.169Z","avatar_url":"https://github.com/apache.png","language":"Rust","funding_links":[],"categories":["Data Processing \u0026 DataFrames"],"sub_categories":[],"readme":"\u003c!--\n  ~ Licensed to the Apache Software Foundation (ASF) under one\n  ~ or more contributor license agreements.  See the NOTICE file\n  ~ distributed with this work for additional information\n  ~ regarding copyright ownership.  The ASF licenses this file\n  ~ to you under the Apache License, Version 2.0 (the\n  ~ \"License\"); you may not use this file except in compliance\n  ~ with the License.  You may obtain a copy of the License at\n  ~\n  ~   http://www.apache.org/licenses/LICENSE-2.0\n  ~\n  ~ Unless required by applicable law or agreed to in writing,\n  ~ software distributed under the License is distributed on an\n  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n  ~ KIND, either express or implied.  
See the License for the\n  ~ specific language governing permissions and limitations\n  ~ under the License.\n--\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://hudi.apache.org/\"\u003e\n    \u003cimg src=\"https://hudi.apache.org/assets/images/hudi_logo_transparent_1400x600.png\" alt=\"Hudi logo\" height=\"120px\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  The native Rust implementation for Apache Hudi, with Python API bindings.\n  \u003cbr\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://github.com/apache/hudi-rs/actions/workflows/ci.yml\"\u003e\n    \u003cimg alt=\"hudi-rs ci\" src=\"https://github.com/apache/hudi-rs/actions/workflows/ci.yml/badge.svg\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/github/apache/hudi-rs\"\u003e\n    \u003cimg alt=\"hudi-rs codecov\" src=\"https://codecov.io/github/apache/hudi-rs/graph/badge.svg\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://join.slack.com/t/apache-hudi/shared_invite/zt-2ggm1fub8-_yt4Reu9djwqqVRFC7X49g\"\u003e\n    \u003cimg alt=\"join hudi slack\" src=\"https://img.shields.io/badge/slack-%23hudi-72eff8?logo=slack\u0026color=48c628\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://x.com/apachehudi\"\u003e\n    \u003cimg alt=\"follow hudi x/twitter\" src=\"https://img.shields.io/twitter/follow/apachehudi?label=apachehudi\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.linkedin.com/company/apache-hudi\"\u003e\n    \u003cimg alt=\"follow hudi linkedin\" src=\"https://img.shields.io/badge/apache%E2%80%93hudi-0077B5?logo=linkedin\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nThe `hudi-rs` project aims to broaden the use of [Apache Hudi](https://github.com/apache/hudi) for a diverse range of\nusers and projects.\n\n| Source                  | Downloads                   | Installation Command |\n|-------------------------|-----------------------------|----------------------|\n| [**PyPi.org**][pypi]    | [![][pypi-badge]][pypi]     | `pip 
install hudi`   |\n| [**Crates.io**][crates] | [![][crates-badge]][crates] | `cargo add hudi`     |\n\n[pypi]: https://pypi.org/project/hudi/\n[pypi-badge]: https://img.shields.io/pypi/dm/hudi?style=flat-square\u0026color=51AEF3\n[crates]: https://crates.io/crates/hudi\n[crates-badge]: https://img.shields.io/crates/d/hudi?style=flat-square\u0026color=163669\n\n## Usage Examples\n\n\u003e [!NOTE]\n\u003e These examples expect a Hudi table exists at `/tmp/trips_table`, created using\n\u003e the [quick start guide](https://hudi.apache.org/docs/quick-start-guide).\n\n### Snapshot Query\n\nSnapshot query reads the latest version of the data from the table. The table API also accepts partition filters.\n\n#### Python\n\n```python\nfrom hudi import HudiTableBuilder\nimport pyarrow as pa\n\nhudi_table = HudiTableBuilder.from_base_uri(\"/tmp/trips_table\").build()\nbatches = hudi_table.read_snapshot(filters=[(\"city\", \"=\", \"san_francisco\")])\n\n# convert to PyArrow table\narrow_table = pa.Table.from_batches(batches)\nresult = arrow_table.select([\"rider\", \"city\", \"ts\", \"fare\"])\nprint(result)\n```\n\n#### Rust\n\n```rust\nuse hudi::error::Result;\nuse hudi::table::builder::TableBuilder as HudiTableBuilder;\nuse arrow::compute::concat_batches;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c()\u003e {\n    let hudi_table = HudiTableBuilder::from_base_uri(\"/tmp/trips_table\").build().await?;\n    let batches = hudi_table.read_snapshot(\u0026[(\"city\", \"=\", \"san_francisco\")]).await?;\n    let batch = concat_batches(\u0026batches[0].schema(), \u0026batches)?;\n    let columns = vec![\"rider\", \"city\", \"ts\", \"fare\"];\n    for col_name in columns {\n        let idx = batch.schema().index_of(col_name).unwrap();\n        println!(\"{}: {}\", col_name, batch.column(idx));\n    }\n    Ok(())\n}\n```\n\nTo run read-optimized (RO) query on Merge-on-Read (MOR) tables, set `hoodie.read.use.read_optimized.mode` when creating the table.\n\n#### 
Python\n\n```python\nhudi_table = (\n    HudiTableBuilder\n    .from_base_uri(\"/tmp/trips_table\")\n    .with_option(\"hoodie.read.use.read_optimized.mode\", \"true\")\n    .build()\n)\n```\n\n#### Rust\n\n```rust\nlet hudi_table = \n    HudiTableBuilder::from_base_uri(\"/tmp/trips_table\")\n    .with_option(\"hoodie.read.use.read_optimized.mode\", \"true\")\n    .build().await?;\n```\n\n\u003e [!NOTE]\n\u003e Currently reading MOR tables is limited to tables with Parquet data blocks.\n\n### Time-Travel Query\n\nTime-travel query reads the data at a specific timestamp from the table. The table API also accepts partition filters.\n\n#### Python\n\n```python\nbatches = (\n    hudi_table\n    .read_snapshot_as_of(\"20241231123456789\", filters=[(\"city\", \"=\", \"san_francisco\")])\n)\n```\n\n#### Rust\n\n```rust\nlet batches = \n    hudi_table\n    .read_snapshot_as_of(\"20241231123456789\", \u0026[(\"city\", \"=\", \"san_francisco\")]).await?;\n```\n\n### Incremental Query\n\nIncremental query reads the changed data from the table for a given time range.\n\n#### Python\n\n```python\n# read the records between t1 (exclusive) and t2 (inclusive)\nbatches = hudi_table.read_incremental_records(t1, t2)\n\n# read the records after t1\nbatches = hudi_table.read_incremental_records(t1)\n```\n\n#### Rust\n\n```rust\n// read the records between t1 (exclusive) and t2 (inclusive)\nlet batches = hudi_table.read_incremental_records(t1, Some(t2)).await?;\n\n// read the records after t1\nlet batches = hudi_table.read_incremental_records(t1, None).await?;\n```\n\n\u003e [!NOTE]\n\u003e Currently the only supported format for the timestamp arguments is Hudi Timeline format: `yyyyMMddHHmmssSSS` or `yyyyMMddHHmmss`.\n\n## Query Engine Integration\n\nHudi-rs provides APIs to support integration with query engines. 
The sections below highlight some commonly used APIs.\n\n### Table API\n\nCreate a Hudi table instance using its constructor or the `TableBuilder` API.\n\n| Stage           | API                                       | Description                                                                    |\n|-----------------|-------------------------------------------|--------------------------------------------------------------------------------|\n| Query planning  | `get_file_slices()`                       | For snapshot query, get a list of file slices.                                 |\n|                 | `get_file_slices_splits()`                | For snapshot query, get a list of file slices in splits.                       |\n|                 | `get_file_slices_as_of()`                 | For time-travel query, get a list of file slices at a given time.              |\n|                 | `get_file_slices_splits_as_of()`          | For time-travel query, get a list of file slices in splits at a given time.    |\n|                 | `get_file_slices_between()`               | For incremental query, get a list of changed file slices between a time range. |\n| Query execution | `create_file_group_reader_with_options()` | Create a file group reader instance with the table instance's configs.         
|\n\n### File Group API\n\nCreate a Hudi file group reader instance using its constructor or the Hudi table API `create_file_group_reader_with_options()`.\n\n| Stage           | API                                   | Description                                                                                                                                                                        |\n|-----------------|---------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| Query execution | `read_file_slice()`                   | Read records from a given file slice; based on the configs, read records from only base file, or from base file and log files, and merge records based on the configured strategy. |\n\n\n### Apache DataFusion\n\nEnabling the `hudi` crate with `datafusion` feature will provide a [DataFusion](https://datafusion.apache.org/) \nextension to query Hudi tables.\n\n\u003cdetails\u003e\n\u003csummary\u003eAdd crate hudi with datafusion feature to your application to query a Hudi table.\u003c/summary\u003e\n\n```shell\ncargo new my_project --bin \u0026\u0026 cd my_project\ncargo add tokio@1 datafusion@43\ncargo add hudi --features datafusion\n```\n\nUpdate `src/main.rs` with the code snippet below then `cargo run`.\n\n\u003c/details\u003e\n\n```rust\nuse std::sync::Arc;\n\nuse datafusion::error::Result;\nuse datafusion::prelude::{DataFrame, SessionContext};\nuse hudi::HudiDataSource;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c()\u003e {\n    let ctx = SessionContext::new();\n    let hudi = HudiDataSource::new_with_options(\n        \"/tmp/trips_table\",\n        [(\"hoodie.read.input.partitions\", \"5\")]).await?;\n    ctx.register_table(\"trips_table\", Arc::new(hudi))?;\n    let df: DataFrame = ctx.sql(\"SELECT * from trips_table where city = 'san_francisco'\").await?;\n   
 df.show().await?;\n    Ok(())\n}\n```\n\n### Other Integrations\n\nHudi is also integrated with:\n\n- [Daft](https://www.getdaft.io/projects/docs/en/stable/integrations/hudi/)\n- [Ray](https://docs.ray.io/en/latest/data/api/doc/ray.data.read_hudi.html#ray.data.read_hudi)\n\n### Work with cloud storage\n\nEnsure cloud storage credentials are set properly as environment variables, e.g., `AWS_*`, `AZURE_*`, or `GOOGLE_*`.\nRelevant storage environment variables will then be picked up. The target table's base URI with schemes such\nas `s3://`, `az://`, or `gs://` will be processed accordingly.\n\nAlternatively, you can pass the storage configuration as options via Table APIs.\n\n#### Python\n\n```python\nfrom hudi import HudiTableBuilder\n\nhudi_table = (\n    HudiTableBuilder\n    .from_base_uri(\"s3://bucket/trips_table\")\n    .with_option(\"aws_region\", \"us-west-2\")\n    .build()\n)\n```\n\n#### Rust\n\n```rust\nuse hudi::error::Result;\nuse hudi::table::builder::TableBuilder as HudiTableBuilder;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c()\u003e {\n    let hudi_table = \n        HudiTableBuilder::from_base_uri(\"s3://bucket/trips_table\")\n        .with_option(\"aws_region\", \"us-west-2\")\n        .build().await?;\n    Ok(())\n}\n```\n\n## Contributing\n\nCheck out the [contributing guide](./CONTRIBUTING.md) for all the details about making contributions to the project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fhudi-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fhudi-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fhudi-rs/lists"}