{"id":19491726,"url":"https://github.com/rudderlabs/rudder-airflow-provider","last_synced_at":"2026-02-17T17:31:04.288Z","repository":{"id":39583617,"uuid":"430668842","full_name":"rudderlabs/rudder-airflow-provider","owner":"rudderlabs","description":"Rudderstack provider for Apache Airflow","archived":false,"fork":false,"pushed_at":"2025-12-08T12:59:10.000Z","size":137,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-01-29T21:35:25.163Z","etag":null,"topics":["airflow","dag","rudderstack","scheduler"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rudderlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-11-22T10:53:48.000Z","updated_at":"2025-12-08T08:44:20.000Z","dependencies_parsed_at":"2025-10-07T02:55:22.991Z","dependency_job_id":"cc99f935-c0ca-477a-b895-b846f1cf17df","html_url":"https://github.com/rudderlabs/rudder-airflow-provider","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":"rudderlabs/rudder-repo-template","purl":"pkg:github/rudderlabs/rudder-airflow-provider","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rudderlabs%2Frudder-airflow-provider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rudderlabs%2Frudder-airflow-provider/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rudderlabs%2Frudder-airflow-provider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rudderlabs%2Frudder-airflow-provider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rudderlabs","download_url":"https://codeload.github.com/rudderlabs/rudder-airflow-provider/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rudderlabs%2Frudder-airflow-provider/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29551257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T14:33:00.708Z","status":"ssl_error","status_checked_at":"2026-02-17T14:32:58.657Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","dag","rudderstack","scheduler"],"created_at":"2024-11-10T21:17:58.211Z","updated_at":"2026-02-17T17:31:04.243Z","avatar_url":"https://github.com/rudderlabs.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://rudderstack.com/\"\u003e\n    \u003cimg src=\"https://user-images.githubusercontent.com/59817155/121357083-1c571300-c94f-11eb-8cc7-ce6df13855c9.png\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cb\u003eThe Customer Data 
Platform for Developers\u003c/b\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003e\n    \u003ca href=\"https://rudderstack.com\"\u003eWebsite\u003c/a\u003e\n    ·\n    \u003ca href=\"https://rudderstack.com/join-rudderstack-slack-community\"\u003eSlack Community\u003c/a\u003e\n  \u003c/b\u003e\n\u003c/p\u003e\n\n---\n\n# RudderStack Airflow Provider\n\nThe [RudderStack](https://rudderstack.com) Airflow Provider lets you programmatically schedule and trigger your [Reverse ETL](https://www.rudderstack.com/docs/reverse-etl) syncs and [Profiles](https://www.rudderstack.com/docs/profiles/overview/) runs outside RudderStack and integrate them with your existing Airflow workflows.\nRefer to the [orchestration docs](https://www.rudderstack.com/docs/data-pipelines/orchestration/airflow/) for details.\n\n\n## Installation\n\n```bash\npip install rudderstack-airflow-provider\n```\n\n## Usage\n\n### RudderstackRETLOperator\n\n\u003e [!NOTE]  \n\u003e Use [RudderstackRETLOperator](#rudderstackretloperator) for reverse ETL connections\n\nA simple DAG for triggering syncs for a RudderStack Reverse ETL source:\n\n```python\nwith DAG(\n    \"rudderstack-retl-sample\",\n    default_args=default_args,\n    description=\"A simple tutorial DAG for reverse etl\",\n    schedule_interval=timedelta(days=1),\n    start_date=datetime(2021, 1, 1),\n    catchup=False,\n    tags=[\"rs-retl\"],\n) as dag:\n    # retl_connection_id, sync_type are template fields\n    rs_operator = RudderstackRETLOperator(\n        retl_connection_id=\"connection_id\",\n        task_id=\"\u003ca unique, meaningful id for the airflow task\u003e\",\n        connection_id=\"\u003crudderstack api airflow connection id\u003e\"\n    )\n```\n\nFor the complete code, refer to this [example](https://github.com/rudderlabs/rudder-airflow-provider/tree/main/examples).\n\nMandatory parameters for RudderstackRETLOperator:\n* retl_connection_id: This is the [connection 
id](https://www.rudderstack.com/docs/data-pipelines/orchestration/airflow/#where-can-i-find-the-connection-id-for-my-reverse-etl-connection) for the sync job.\n* connection_id: The Airflow connection to use for connecting to the RudderStack API. Default value is `rudderstack_default`.\n\n\nRudderstackRETLOperator also exposes the following optional parameters. In most cases, the default values are recommended.\n\n* request_max_retries: The maximum number of times requests to the RudderStack API should be retried before failing.\n* request_retry_delay: Time (in seconds) to wait between each request retry.\n* request_timeout: Time (in seconds) after which the requests to RudderStack are declared timed out.\n* poll_interval: Time (in seconds) between polls of the triggered job's status.\n* poll_timeout: Time (in seconds) after which the polling for a triggered job is declared timed out.\n* wait_for_completion: Boolean indicating whether the task should poll and wait until the sync completes. Default value is True.\n* sync_type: Type of sync to trigger: `incremental` or `full`. Default is `None`, in which case RudderStack determines the sync type.\n\n\n### RudderstackProfilesOperator\n\nRudderstackProfilesOperator can be used to trigger Profiles runs. 
A simple DAG for triggering runs of a Profiles project:\n\n```python\nwith DAG(\n    \"rudderstack-profiles-sample\",\n    default_args=default_args,\n    description=\"A simple tutorial DAG for profiles run.\",\n    schedule_interval=timedelta(days=1),\n    start_date=datetime(2021, 1, 1),\n    catchup=False,\n    tags=[\"rs-profiles\"],\n) as dag:\n    # profile_id is a template field\n    rs_operator = RudderstackProfilesOperator(\n        profile_id=\"\u003cprofile_id\u003e\",\n        task_id=\"\u003ca unique, meaningful id for the airflow task\u003e\",\n        connection_id=\"\u003crudderstack api connection id\u003e\",\n    )\n```\n\nMandatory parameters for RudderstackProfilesOperator:\n* profile_id: This is the [Profiles project ID](https://www.rudderstack.com/docs/data-pipelines/orchestration/airflow/#where-can-i-find-my-profiles-project-id) of the Profiles project to run.\n* connection_id: The Airflow connection to use for connecting to the RudderStack API. Default value is `rudderstack_default`.\n\nRudderstackProfilesOperator also exposes the following optional parameters. In most cases, the default values are recommended.\n\n* request_max_retries: The maximum number of times requests to the RudderStack API should be retried before failing.\n* request_retry_delay: Time (in seconds) to wait between each request retry.\n* request_timeout: Time (in seconds) after which the requests to RudderStack are declared timed out.\n* poll_interval: Time (in seconds) between polls of the triggered job's status.\n* poll_timeout: Time (in seconds) after which the polling for a triggered job is declared timed out.\n* wait_for_completion: Boolean indicating whether the task should poll and wait until the run completes. Default value is True.\n* parameters: Additional parameters to pass to the profiles run command, as supported by the API endpoint. Default value is `None`.\n\n\n### RudderstackETLOperator\n\nRudderstackETLOperator can be used to trigger ETL sync runs. 
A simple DAG for triggering an ETL sync:\n\n```python\nwith DAG(\n    \"rudderstack-etl-sample\",\n    default_args=default_args,\n    description=\"A simple tutorial DAG for etl sync.\",\n    schedule_interval=timedelta(days=1),\n    start_date=datetime(2021, 1, 1),\n    catchup=False,\n    tags=[\"rs-etl\"],\n) as dag:\n    # etl_source_id is a template field\n    rs_operator = RudderstackETLOperator(\n        etl_source_id=\"\u003cetl_source_id\u003e\",\n        task_id=\"\u003ca unique, meaningful id for the airflow task\u003e\",\n        connection_id=\"\u003crudderstack api connection id\u003e\",\n    )\n```\n\nMandatory parameters for RudderstackETLOperator:\n* etl_source_id: This is the [source id](TBD) for the ETL source.\n* connection_id: The Airflow connection to use for connecting to the RudderStack API. Default value is `rudderstack_default`.\n\nRudderstackETLOperator also exposes the following optional parameters. In most cases, the default values are recommended.\n\n* request_max_retries: The maximum number of times requests to the RudderStack API should be retried before failing.\n* request_retry_delay: Time (in seconds) to wait between each request retry.\n* request_timeout: Time (in seconds) after which the requests to RudderStack are declared timed out.\n* poll_interval: Time (in seconds) between polls of the triggered job's status.\n* poll_timeout: Time (in seconds) after which the polling for a triggered job is declared timed out.\n* wait_for_completion: Boolean indicating whether the task should poll and wait until the sync completes. Default value is True.\n\n\n## Contribute\n\nWe would love to see you contribute to this project. 
Get more information on how to contribute [here](CONTRIBUTING.md).\n\n## License\n\nThe RudderStack Airflow Provider is released under the [MIT License](LICENSE).\n\n## Contact Us\n\nFor more information or queries on this feature, you can [contact us](mailto:docs@rudderstack.com) or start a conversation in our [Slack](https://rudderstack.com/join-rudderstack-slack-community) community.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frudderlabs%2Frudder-airflow-provider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frudderlabs%2Frudder-airflow-provider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frudderlabs%2Frudder-airflow-provider/lists"}