{"id":34114943,"url":"https://github.com/datarobot/airflow-provider-datarobot","last_synced_at":"2026-04-05T01:31:27.335Z","repository":{"id":39876977,"uuid":"465637205","full_name":"datarobot/airflow-provider-datarobot","owner":"datarobot","description":"DataRobot provider for Apache Airflow","archived":false,"fork":false,"pushed_at":"2025-08-22T20:26:35.000Z","size":722,"stargazers_count":31,"open_issues_count":0,"forks_count":5,"subscribers_count":75,"default_branch":"main","last_synced_at":"2025-12-17T04:23:18.894Z","etag":null,"topics":["airflow","apache-airflow","datarobot","dr-engineering","integration","mlops","python"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/airflow-provider-datarobot/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datarobot.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-03T08:45:43.000Z","updated_at":"2025-08-22T20:26:37.000Z","dependencies_parsed_at":"2024-04-08T18:58:07.544Z","dependency_job_id":"42be8050-6796-4387-81dd-c8978eac698d","html_url":"https://github.com/datarobot/airflow-provider-datarobot","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/datarobot/airflow-provider-datarobot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fairflow-provider-datarobot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fairflow-provider-datarobot/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fairflow-provider-datarobot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fairflow-provider-datarobot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datarobot","download_url":"https://codeload.github.com/datarobot/airflow-provider-datarobot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fairflow-provider-datarobot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31421869,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T00:25:07.052Z","status":"ssl_error","status_checked_at":"2026-04-05T00:25:05.923Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","apache-airflow","datarobot","dr-engineering","integration","mlops","python"],"created_at":"2025-12-14T19:48:05.995Z","updated_at":"2026-04-05T01:31:27.324Z","avatar_url":"https://github.com/datarobot.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataRobot Provider for Apache Airflow\n[![Documentation](https://img.shields.io/badge/docs-readthedocs-forestgreen)](https://datarobot-datarobot-airflow-provider.readthedocs-hosted.com/en/latest/)\n[![PyPI 
version](https://img.shields.io/pypi/v/airflow-provider-datarobot)](https://pypi.org/project/airflow-provider-datarobot/)\n![Python versions](https://img.shields.io/pypi/pyversions/airflow-provider-datarobot)\n![License](https://img.shields.io/pypi/l/airflow-provider-datarobot)\n\nThis package provides operators, sensors, and a hook to integrate [DataRobot](https://www.datarobot.com) into Apache Airflow.\nUsing these components, you should be able to build the essential DataRobot pipeline - create a project, train models, deploy a model,\nand score predictions against the model deployment.\n\n## Install the Airflow provider\n\n**To run Airflow within DataRobot SaaS environment, please reach out to [DataRobot Support](https://www.datarobot.com/contact-us/).**\n\nFor a local installation, the DataRobot provider for Apache Airflow requires an environment with the following dependencies installed:\n\n* [Apache Airflow](https://pypi.org/project/apache-airflow/) \u003e= 2.3\n\n* [DataRobot Python API Client](https://pypi.org/project/datarobot/) \u003e= 3.2.0\n\nTo install the DataRobot provider, run the following command:\n\n``` sh\npip install airflow-provider-datarobot\n```\n\n## Create a connection from Airflow to DataRobot\n\nThe next step is to create a connection from Airflow to DataRobot:\n\n1. In the Airflow user interface, click **Admin \u003e Connections** to\n   [add an Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#creating-a-connection-with-the-ui).\n\n2. On the **List Connection** page, click **+ Add a new record**.\n\n3. 
In the **Add Connection** dialog box, configure the following fields:\n\n    | Field          | Description |\n    |----------------|-------------|\n    |Connection Id   | `datarobot_default` (this name is used by default in all operators) |\n    |Connection Type | DataRobot |\n    |API Key         | A DataRobot API key, created in the [DataRobot Developer Tools](https://app.datarobot.com/account/developer-tools), from the [*API Keys* section](https://app.datarobot.com/docs/api/api-quickstart/api-qs.html#create-a-datarobot-api-token). |\n    |DataRobot endpoint URL | `https://app.datarobot.com/api/v2` by default |\n\n4. Click **Test** to establish a test connection between Airflow and DataRobot.\n\n5. When the connection test is successful, click **Save**.\n\n## JSON configuration for the DAG run\n\nOperators and sensors use parameters from the [config](https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html?highlight=config#Named%20Arguments_repeat21) JSON submitted when triggering the DAG; for example:\n\n\n``` json\n{\n    \"training_data\": \"s3-presigned-url-or-local-path-to-training-data\",\n    \"project_name\": \"Project created from Airflow\",\n    \"autopilot_settings\": {\n        \"target\": \"readmitted\"\n    },\n    \"deployment_label\": \"Deployment created from Airflow\",\n    \"score_settings\": {\n        \"intake_settings\": {\n            \"type\": \"s3\",\n            \"url\": \"s3://path/to/scoring-data/Diabetes10k.csv\",\n            \"credential_id\": \"\u003ccredential_id\u003e\"\n        },\n        \"output_settings\": {\n            \"type\": \"s3\",\n            \"url\": \"s3://path/to/results-dir/Diabetes10k_predictions.csv\",\n            \"credential_id\": \"\u003ccredential_id\u003e\"\n        }\n    }\n}\n```\n\n\nThese config values are accessible in the `execute()` method of any operator in the DAG\nthrough the `context[\"params\"]` variable; for example, to get training data, you could use the 
following:\n\n``` py\ndef execute(self, context: Context) -\u003e str:\n    ...\n    training_data = context[\"params\"][\"training_data\"]\n    ...\n```\n\n## Development\n### Pre-requisites\n- [Docker Desktop 1.13.1 or later](https://docs.docker.com/desktop/)\n- [Astronomer CLI 1.30.0 or later](https://github.com/astronomer/astro-cli?tab=readme-ov-file#install-the-astro-cli)\n- [Pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#installation) or [virtualenv](https://virtualenv.pypa.io/en/latest/)\n\n### Environment Setup\nIt is useful to have a simple airflow testing environment and a local development environment for the\noperators and DAGs. The following steps will construct the two environments needed for development.\n1. Clone the `airflow-provider-datarobot` repository\n    ```bash\n        cd ~/workspace\n        git clone git@github.com:datarobot/airflow-provider-datarobot.git\n        cd airflow-provider-datarobot\n    ```\n2. Create a virtual environment and install the dependencies\n    ```bash\n        pyenv virtualenv 3.12 airflow-provider-datarobot\n        pyenv local airflow-provider-datarobot\n        make req-dev\n        pre-commit install\n    ```\n\n### Astro Setup\n1. (OPTIONAL) Install astro with the following command or manually from the links above:\n    ```bash\n        make install-astro\n    ```\n2. Build an astro development environment with the following command:\n    ```bash\n        make create-astro-dev\n    ```\n3. A new `./astro-dev` folder will be constructed for you to use as a development and test environment.\n4. 
Compile and run airflow on the development package with:\n    ```bash\n        make build-astro-dev\n    ```\n\n_Note: All credentials and logins will be printed in the terminal after running\nthe `build-astro-dev` command._\n\n### Updating Operators in the Dev Environment\n- Test, compile, and run new or updated operators on the development package with:\n    ```bash\n        make build-astro-dev\n    ```\n- Manually start the airflow dev environment without rebuilding the package with:\n    ```bash\n        make start-astro-dev\n    ```\n- Manually stop the airflow dev environment without rebuilding the package with:\n    ```bash\n        make stop-astro-dev\n    ```\n- If there are problems with the airflow environment you can reset it to a clean state with:\n    ```bash\n        make clean-astro-dev\n    ```\n\n\n### Release Process\nFor `mainline` releases, the following steps should be followed:\n- Determine the next version of the package (example: 1.0.2). Version should not include a `v` prefix.\n- Determine the SHA hash of the commit that will be the release.\n  - See: https://github.com/datarobot/airflow-provider-datarobot/commits/main/\n- Connect to `harness`.\n- Run the `create-release-pr` pipeline with the SHA hash and version as parameters.\n- Review and approve the release PR on GitHub.\n  - Changes or comments can be added to the PR.\n  - The PR will automatically request review once checks pass.\n- Merge the PR and use the resulting SHA hash from merge to main in the next step (different SHA from previous step)\n- Run the `create-release-tag` pipeline with the SHA hash and version as parameters.\n- Run the `release-pypi` pipeline with the input set as `Git Tag` and the `Tag Name` as the version (tags are generated with a `v` prefix, example v1.0.2).\n\nFor `early-access` releases, run the `release-early-access-pypi` pipeline. There are no PRs or tags for early-access releases. 
The early access version is also automatically released each Tuesday.\n\n\n## Issues\n\nPlease submit [issues](https://github.com/datarobot/airflow-provider-datarobot/issues) and [pull requests](https://github.com/datarobot/airflow-provider-datarobot/pulls) in our official repo:\n[https://github.com/datarobot/airflow-provider-datarobot](https://github.com/datarobot/airflow-provider-datarobot)\n\nWe are happy to hear from you. Please email any feedback to the authors at [support@datarobot.com](mailto:support@datarobot.com).\n\n\n# Copyright Notice\n\nCopyright 2023 DataRobot, Inc. and its affiliates.\n\nAll rights reserved.\n\nThis is proprietary source code of DataRobot, Inc. and its affiliates.\n\nReleased under the terms of DataRobot Tool and Utility Agreement.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatarobot%2Fairflow-provider-datarobot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatarobot%2Fairflow-provider-datarobot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatarobot%2Fairflow-provider-datarobot/lists"}