{"id":16556668,"url":"https://github.com/arrow-udf/arrow-udf","last_synced_at":"2025-04-06T00:07:17.937Z","repository":{"id":212281419,"uuid":"731115289","full_name":"arrow-udf/arrow-udf","owner":"arrow-udf","description":"A User-Defined Function Framework for Apache Arrow.","archived":false,"fork":false,"pushed_at":"2025-03-22T16:35:09.000Z","size":930,"stargazers_count":88,"open_issues_count":11,"forks_count":17,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-03T04:59:57.712Z","etag":null,"topics":["arrow","python","rust","udf","wasm"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arrow-udf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-13T11:45:30.000Z","updated_at":"2025-03-22T16:25:06.000Z","dependencies_parsed_at":"2024-01-24T05:24:17.536Z","dependency_job_id":"e1e9159e-7f2e-44e4-8ce1-e234b1ab142e","html_url":"https://github.com/arrow-udf/arrow-udf","commit_stats":{"total_commits":184,"total_committers":11,"mean_commits":"16.727272727272727","dds":0.1684782608695652,"last_synced_commit":"80b09d67ee0c7b796bf7a492a71842ac64622406"},"previous_names":["risingwavelabs/arrow-udf-wasm","arrow-udf/arrow-udf","risingwavelabs/arrow-udf"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrow-udf%2Farrow-udf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrow-udf%2Farrow-udf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/arrow-udf%2Farrow-udf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrow-udf%2Farrow-udf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arrow-udf","download_url":"https://codeload.github.com/arrow-udf/arrow-udf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247415967,"owners_count":20935388,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","python","rust","udf","wasm"],"created_at":"2024-10-11T20:05:22.955Z","updated_at":"2025-04-06T00:07:17.907Z","avatar_url":"https://github.com/arrow-udf.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# Arrow User-Defined Functions Framework\n\nEasily create and run user-defined functions (UDF) on Apache Arrow.\nYou can define functions in Rust, Python, Java or JavaScript.\nThe functions can be executed natively, or in WebAssembly, or in a [remote server].\n\n| Language   | Native                             | WebAssembly              | Remote                    |\n| ---------- |------------------------------------|--------------------------|---------------------------|\n| Rust       | [arrow-udf]                        | [arrow-udf-runtime/wasm] |                           |\n| Python     | [arrow-udf-runtime/python]         |                          | [arrow-udf-remote/python] |\n| JavaScript | [arrow-udf-runtime/javascript]     |                          |                           |\n| Java       |                           
         |                          | [arrow-udf-remote/java]   |\n\n[remote server]: ./arrow-udf-runtime/src/remote\n[arrow-udf]: ./arrow-udf\n[arrow-udf-runtime/python]: ./arrow-udf-runtime/src/python\n[arrow-udf-runtime/javascript]: ./arrow-udf-runtime/src/javascript\n[arrow-udf-runtime/wasm]: ./arrow-udf-runtime/src/wasm\n[arrow-udf-remote/python]: ./arrow-udf-remote/python\n[arrow-udf-remote/java]: ./arrow-udf-remote/java\n\n\u003e [!NOTE]\n\u003e [arrow-udf] generates `RecordBatch` Rust functions from scalar functions, and can be used in more general contexts\n\u003e whenever you need to work with Arrow Data in Rust, not specifically user-provided code.\n\u003e\n\u003e Other crates are more focused on providing runtimes or protocols for running user-provided code.\n\n- `arrow-udf`: You call `fn(\u0026RecordBatch)-\u003eRecordBatch` directly, as if you wrote it by hand.\n- `arrow-udf-runtime/python`/`arrow-udf-runtime/javascript`: You first `add_function` to a `Runtime`, and then call it with the `Runtime`.\n- `arrow-udf-runtime/wasm`: You first create a `Runtime` with compiled WASM binary, and then `find_function` and call it.\n- `arrow-udf-runtime/remote`: You start a `Client` to call the function running in a remote `Server` process.\n\nYou can also use this library to add custom functions to DuckDB, see [arrow-udf-duckdb-example].\n\n[arrow-udf-duckdb-example]: ./arrow-udf-duckdb-example\n\n## Extension Types\n\nIn addition to the standard types defined by Arrow, these crates also support the following data types through Arrow's [extension type](https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types). 
When using extension types, you need to add the `ARROW:extension:name` key to the field's metadata.\n\n| Extension Type | Physical Type             | `ARROW:extension:name`   |\n| -------------- | ------------------------- | ------------------------ |\n| JSON           | Utf8, Binary, LargeBinary | `arrowudf.json`          |\n| Decimal        | Utf8                      | `arrowudf.decimal`       |\n\nAlternatively, you can configure the extension metadata key and values to look for when converting between Arrow and extension types:\n\n```rust\nlet mut js_runtime = arrow_udf_runtime::javascript::Runtime::new().unwrap();\nlet converter = js_runtime.converter_mut();\nconverter.set_arrow_extension_key(\"Extension\");\nconverter.set_json_extension_name(\"Variant\");\nconverter.set_decimal_extension_name(\"Decimal\");\n```\n\n### JSON Type\n\nThe JSON type is stored in a string array in text form.\n\n```rust\nlet json_field = Field::new(name, DataType::Utf8, true)\n    .with_metadata([(\"ARROW:extension:name\".into(), \"arrowudf.json\".into())].into());\nlet json_array = StringArray::from(vec![r#\"{\"key\": \"value\"}\"#]);\n```\n\n### Decimal Type\n\nUnlike the fixed-point decimal type built into Arrow, this decimal type represents floating-point numbers with arbitrary precision or scale, that is, the [unconstrained numeric](https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL) in Postgres. 
The decimal type is stored in a string array in text form.\n\n```rust\nlet decimal_field = Field::new(name, DataType::Utf8, true)\n    .with_metadata([(\"ARROW:extension:name\".into(), \"arrowudf.decimal\".into())].into());\nlet decimal_array = StringArray::from(vec![\"0.0001\", \"-1.23\", \"0\"]);\n```\n\n## Benchmarks\n\nWe have benchmarked the performance of function calls in different environments.\nYou can run the benchmarks with the following command:\n\n```sh\ncargo bench --bench bench\n```\n\nPerformance comparison of calling `gcd` on a chunk of 1024 rows:\n\n```\ngcd/native          1.5237 µs   x1\ngcd/wasm            15.547 µs   x10\ngcd/js(quickjs)     85.007 µs   x55\ngcd/python          175.29 µs   x115\n```\n\n## Who is using this library?\n\n- [RisingWave]: A Distributed SQL Database for Stream Processing.\n- [Databend]: An open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake.\n\n[RisingWave]: https://github.com/risingwavelabs/risingwave\n[Databend]: https://github.com/datafuselabs/databend\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farrow-udf%2Farrow-udf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farrow-udf%2Farrow-udf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farrow-udf%2Farrow-udf/lists"}