{"id":28532134,"url":"https://github.com/databrickslabs/arcuate","last_synced_at":"2025-06-28T00:04:20.286Z","repository":{"id":37954776,"uuid":"498700977","full_name":"databrickslabs/arcuate","owner":"databrickslabs","description":"Delta Sharing + MLflow for ML model \u0026 experiment exchange (arcuate delta - a fan shaped river delta)","archived":false,"fork":false,"pushed_at":"2023-12-27T15:16:04.000Z","size":12458,"stargazers_count":22,"open_issues_count":9,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-09T15:43:40.237Z","etag":null,"topics":["big-data","data-sharing","delta-sharing","mlflow","spark"],"latest_commit_sha":null,"homepage":"https://databrickslabs.github.io/arcuate/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databrickslabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-01T11:04:12.000Z","updated_at":"2024-08-21T18:36:40.000Z","dependencies_parsed_at":"2023-02-13T01:45:50.720Z","dependency_job_id":null,"html_url":"https://github.com/databrickslabs/arcuate","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/databrickslabs/arcuate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Farcuate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Farcuate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Farcuate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Farcuate/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databrickslabs","download_url":"https://codeload.github.com/databrickslabs/arcuate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databrickslabs%2Farcuate/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262352616,"owners_count":23297688,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-sharing","delta-sharing","mlflow","spark"],"created_at":"2025-06-09T15:31:05.006Z","updated_at":"2025-06-28T00:04:20.264Z","avatar_url":"https://github.com/databrickslabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Arcuate\n\n*Deltas with a triangular or fan shape are called* **arcuate** *(arc-like) deltas. The Nile River forms an arcuate delta as it empties into the Mediterranean Sea.*\n___\n\n[![DBR](https://img.shields.io/badge/DBR-10.4_ML-green)]()\n[![PyTest](https://github.com/databrickslabs/arcuate/actions/workflows/pytest.yml/badge.svg?branch=main)](https://github.com/databrickslabs/arcuate/actions/workflows/pytest.yml)\n[![Build arcuate project](https://github.com/databrickslabs/arcuate/actions/workflows/build.yml/badge.svg?branch=main)](https://github.com/databrickslabs/arcuate/actions/workflows/build.yml)\n\n## Model exchange via Delta Sharing\n\nOne of the main drivers for data sharing is the knowledge contained within the data. 
An alternative to sharing data in highly regulated environments is to share models trained on the sensitive data.\n\nCurrent options for exchanging models are not fit for purpose.\n\nArcuate exports MLflow experiments \u0026 models to Delta tables and uses Delta Sharing to enable model exchange.\n\nUsing Delta Sharing also allows sharing of relevant metadata such as training parameters, model accuracy, artifacts, etc.\n\nThe project name takes inspiration from the arcuate delta, the wide fan-shaped river delta. We believe that enabling model exchange will have a wide impact on many digitally connected industries.\n\n![How it works](images/model_exchange.png)\n\n## How to use:\n\n- Install the library\n  ```shell\n  pip install arcuate\n  ```\n\n- Train a model in Databricks (or elsewhere) and store it in MLflow\n- Export MLflow experiments \u0026 models to a Delta table and add it to a share, using Python APIs\n  ```python\n  from arcuate import *\n  from mlflow.tracking import MlflowClient\n  from pyspark.sql import SparkSession\n\n  client = MlflowClient()\n  spark = SparkSession.builder.getOrCreate()\n\n  # export the experiment experiment_name to table_name, and add it to share_name\n  provider.export_experiments(client, spark, experiment_name, table_name, share_name)\n\n  # export the model model_name to table_name, and add it to share_name\n  provider.export_models(client, spark, model_name, table_name, share_name)\n  ```\n\n- The recipient of this shared table can load it into MLflow seamlessly:\n  ```python\n  from arcuate import *\n  import delta_sharing\n  from mlflow.tracking import MlflowClient\n  from pyspark.sql import SparkSession\n\n  client = MlflowClient()\n  spark = SparkSession.builder.getOrCreate()\n  # delta_sharing_coordinate has the form \u003cprofile-file\u003e#\u003cshare\u003e.\u003cschema\u003e.\u003ctable\u003e\n  df = delta_sharing.load_as_pandas(delta_sharing_coordinate)\n\n  # import the shared table as experiment_name\n  recipient.import_experiments(client, df, experiment_name)\n  # or import the model_name\n  recipient.import_models(client, df, model_name)\n  ```\n\n## Project support\nPlease note that all projects in the /databrickslabs GitHub account are provided for your exploration only, and are not formally 
supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.\n\nAny issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.\n\n## Authors:\n- Vuong Nguyen, Solutions Architect, \u003cvuong.nguyen@databricks.com\u003e\n- Milos Colic, Sr. Solutions Architect, \u003cmilos.colic@databricks.com\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabrickslabs%2Farcuate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabrickslabs%2Farcuate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabrickslabs%2Farcuate/lists"}