{"id":46211081,"url":"https://github.com/buoyant-data/oxbow","last_synced_at":"2026-04-07T21:01:59.881Z","repository":{"id":162637879,"uuid":"637127098","full_name":"buoyant-data/oxbow","owner":"buoyant-data","description":"Collection of AWS Lambdas for creating and managing Delta tables","archived":false,"fork":false,"pushed_at":"2026-04-01T13:16:01.000Z","size":591,"stargazers_count":57,"open_issues_count":5,"forks_count":11,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-01T15:23:38.193Z","etag":null,"topics":["datalake","deltalake","lambda","parquet","rust"],"latest_commit_sha":null,"homepage":"https://www.buoyantdata.com","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/buoyant-data.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-05-06T15:35:50.000Z","updated_at":"2026-04-01T13:13:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"cb2e90df-cae3-431d-830c-52171eb3c387","html_url":"https://github.com/buoyant-data/oxbow","commit_stats":null,"previous_names":[],"tags_count":90,"template":false,"template_full_name":null,"purl":"pkg:github/buoyant-data/oxbow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buoyant-data%2Foxbow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buoyant-data%2Foxbow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buoyant-data%2Foxbow/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buoyant-data%2Foxbow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/buoyant-data","download_url":"https://codeload.github.com/buoyant-data/oxbow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buoyant-data%2Foxbow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31528751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datalake","deltalake","lambda","parquet","rust"],"created_at":"2026-03-03T09:09:02.922Z","updated_at":"2026-04-07T21:01:59.856Z","avatar_url":"https://github.com/buoyant-data.png","language":"Rust","readme":"ifdef::env-github[]\n:tip-caption: :bulb:\n:note-caption: :information_source:\n:important-caption: :heavy_exclamation_mark:\n:caution-caption: :fire:\n:warning-caption: :warning:\nendif::[]\n:toc: macro\n\n= Oxbow\n\nOxbow is a project to convert an existing storage location which contains\nlink:https://parquet.apache.org[Apache Parquet] files into a\nlink:https://delta.io[Delta Lake table]. 
It is intended to run either as an AWS\nLambda or as a command line application.\n\nThe project is named after link:https://en.wikipedia.org/wiki/Oxbow_lake[Oxbow\nlakes] to keep with the lake theme.\n\ntoc::[]\n\n== Using\n\n=== Command Line\n\nExecuting `cargo build --release` from a clone of this repository will build\nthe command line binary `oxbow` which can be used directly to convert a\ndirectory full of `.parquet` files into a Delta table.\n\nThis is an _in place_ operation and will convert the specified table location\ninto a Delta table!\n\n.Simple local files\n[source,bash]\n----\n% oxbow --table ./path/to/my/parquet-files\n----\n\n.Files on AWS\n[source,bash]\n----\n% export AWS_REGION=us-west-2\n% export AWS_SECRET_ACCESS_KEY=xxxx\n# Set other AWS environment variables\n% oxbow --table s3://my-bucket/prefix/to/parquet\n----\n\n=== Lambda\n\nThe `deployment/` directory contains the necessary Terraform to provision the\nfunction, a DynamoDB table for locking, an S3 bucket, and IAM permissions.\n\nAfter configuring the necessary authentication for Terraform, the following\nsteps can be used to provision:\n\n[source,bash]\n----\ncargo lambda build --release --output-format zip --bin oxbow-lambda\nterraform init\nterraform plan\nterraform apply\n----\n\n[NOTE]\n====\nTerraform configures the Lambda to run with the smallest amount of memory\nallowed. 
For bucket locations with massive `.parquet` files, this may need to\nbe tuned.\n====\n\n==== Advanced\n\nTo help ameliorate\nlink:https://www.buoyantdata.com/blog/2023-11-27-concurrency-limitations-with-deltalake-on-aws.html[concurrency\nchallenges for Delta Lake on AWS] with the DynamoDB lock, the `deployment/`\ndirectory also contains an \"advanced\" pattern which uses the `group-events`\nLambda to help serialize S3 Bucket Notifications into an AWS SQS FIFO with\nMessage Group IDs.\n\nTo build all the necessary code locally for the Advanced pattern, please run\n`make build-release`.\n\n\n== Development\n\nBuilding and testing can be done with cargo: `cargo test`.\n\nIn order to deploy this in AWS Lambda, it must first be built with the `cargo\nlambda` command line tool, e.g.:\n\n[source,bash]\n----\ncargo lambda build --release --output-format zip\n----\n\nThis will produce the file: `target/lambda/oxbow-lambda/bootstrap.zip` which can be\nuploaded directly in the web console, or referenced in the Terraform (see\n`deployment.tf`).\n\n=== Design\n\n==== Command Line\n\nWhen running `oxbow` via command line it is a _one time operation_. It will\ntake an existing directory or location full of `.parquet` files and create a\nDelta table out of it.\n\n\n==== Lambda\n\nWhen running `oxbow` inside of an AWS Lambda function it should be configured\nwith an S3 Event Trigger and create new commits to a Delta Lake table any time\na `.parquet` file is added to the bucket/prefix.\n\n== Licensing\n\nThis repository is licensed under the link:https://www.gnu.org/licenses/agpl-3.0.en.html[AGPL 3.0]. 
If your organization is interested in re-licensing this function for re-use, contact me via email for commercial licensing terms: `rtyler@buoyantdata.com`\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbuoyant-data%2Foxbow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbuoyant-data%2Foxbow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbuoyant-data%2Foxbow/lists"}