https://github.com/rudderlabs/rudder-airflow-provider
Rudderstack provider for Apache Airflow
- Host: GitHub
- URL: https://github.com/rudderlabs/rudder-airflow-provider
- Owner: rudderlabs
- License: mit
- Created: 2021-11-22T10:53:48.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-12-08T12:59:10.000Z (5 months ago)
- Last Synced: 2026-01-29T21:35:25.163Z (3 months ago)
- Topics: airflow, dag, rudderstack, scheduler
- Language: Python
- Homepage:
- Size: 134 KB
- Stars: 1
- Watchers: 8
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: CODEOWNERS
README
The Customer Data Platform for Developers
---
# RudderStack Airflow Provider
The [RudderStack](https://rudderstack.com) Airflow Provider lets you programmatically schedule and trigger your [Reverse ETL](https://www.rudderstack.com/docs/reverse-etl) syncs and [Profiles](https://www.rudderstack.com/docs/profiles/overview/) runs outside RudderStack and integrate them with your existing Airflow workflows.
For details, refer to the [orchestration docs](https://www.rudderstack.com/docs/data-pipelines/orchestration/airflow/).
## Installation
```bash
pip install rudderstack-airflow-provider
```
## Usage
### RudderstackRETLOperator
> [!NOTE]
> Use [RudderstackRETLOperator](#rudderstackretloperator) for reverse ETL connections
A simple DAG for triggering syncs for a RudderStack Reverse ETL source:
```python
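from datetime import datetime, timedelta

from airflow import DAG
# Import path as used in the repository's examples; verify it against your installed version.
from rudder_airflow_provider.operators.rudderstack import RudderstackRETLOperator

# Minimal default_args for illustration only; adjust to your environment.
default_args = {"owner": "airflow", "retries": 1}

with DAG(
    "rudderstack-retl-sample",
    default_args=default_args,
    description="A simple tutorial DAG for reverse etl",
    # On Airflow 2.4+, `schedule` is the preferred name for this argument.
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=["rs-retl"],
) as dag:
    # retl_connection_id, sync_type are template fields
    rs_operator = RudderstackRETLOperator(
        retl_connection_id="connection_id",
        task_id="",  # placeholder: set a unique task ID
        connection_id="",  # placeholder: set the Airflow connection ID
    )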
```
For the complete code, refer to this [example](https://github.com/rudderlabs/rudder-airflow-provider/tree/main/examples).
Mandatory parameters for RudderstackRETLOperator:
* retl_connection_id: The [connection ID](https://www.rudderstack.com/docs/data-pipelines/orchestration/airflow/#where-can-i-find-the-connection-id-for-my-reverse-etl-connection) for the sync job.
* connection_id: The Airflow connection to use for connecting to the RudderStack API. Defaults to `rudderstack_default`.
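One way to define the Airflow connection that `connection_id` points to is an environment variable. The sketch below builds such a variable with the standard library only. The `AIRFLOW_CONN_<CONN_ID>` naming and the JSON format (Airflow 2.3+) are standard Airflow behavior. The connection type, the host `https://api.rudderstack.com`, and the use of the password field for the access token are assumptions; confirm them against the orchestration docs. The helper name `airflow_conn_env` is hypothetical.

```python
import json


def airflow_conn_env(conn_id: str, host: str, access_token: str) -> tuple[str, str]:
    """Return the (name, value) pair of an Airflow connection environment variable."""
    # Airflow resolves connections from variables named AIRFLOW_CONN_<CONN_ID in upper case>.
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    # JSON-serialized connections are accepted by Airflow 2.3 and later.
    value = json.dumps(
        {
            "conn_type": "http",       # assumption: the provider uses an HTTP connection
            "host": host,              # assumption: RudderStack API base URL
            "password": access_token,  # assumption: the token is read from the password field
        }
    )
    return name, value


name, value = airflow_conn_env(
    "rudderstack_default",
    "https://api.rudderstack.com",
    "<your-access-token>",  # placeholder, never commit a real token
)
print(f"export {name}='{value}'")
```

Keeping the default connection ID `rudderstack_default` means the operator should not need an explicit `connection_id` argument.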
RudderstackRETLOperator also exposes the following optional parameters. In most cases, the default values are recommended.
* request_max_retries: The maximum number of times a request to the RudderStack API is retried before failing.
* request_retry_delay: Time (in seconds) to wait between request retries.
* request_timeout: Time (in seconds) after which a request to RudderStack is considered timed out.
* poll_interval: Time (in seconds) between status checks for a triggered job.
* poll_timeout: Time (in seconds) after which polling for a triggered job is considered timed out.
* wait_for_completion: Whether the task should poll and wait until the sync completes. Defaults to `True`.
* sync_type: Type of sync to trigger: `incremental` or `full`. Defaults to `None`, in which case RudderStack determines the sync type.
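To illustrate how the retry and polling parameters above typically interact, here is a minimal standalone sketch. It is not the provider's actual implementation. The function names, the status strings `"finished"` and `"failed"`, and the reading of `request_max_retries` as retries in addition to the first attempt are all assumptions made for the example.

```python
import time


def call_with_retries(request, max_retries, retry_delay, sleep=time.sleep):
    """Call request(); on an exception, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(retry_delay)  # plays the role of request_retry_delay


def poll_until_done(get_status, poll_interval, poll_timeout,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status or until poll_timeout elapses."""
    deadline = clock() + poll_timeout
    while True:
        status = get_status()
        if status in ("finished", "failed"):  # hypothetical terminal statuses
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {poll_timeout}s")
        sleep(poll_interval)  # plays the role of poll_interval


# Simulated API call that fails twice, then succeeds.
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient error")
    return "ok"

response = call_with_retries(flaky_request, max_retries=3, retry_delay=0)

# Simulated job that reports "running" twice before finishing.
statuses = iter(["running", "running", "finished"])
result = poll_until_done(lambda: next(statuses), poll_interval=0, poll_timeout=5)
print(response, result)
```

With `wait_for_completion` set to `False`, a task would presumably skip the polling step and return once the sync has been triggered.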
### RudderstackProfilesOperator
RudderstackProfilesOperator triggers Profiles runs. A simple DAG for triggering runs of a Profiles project:
```python
with DAG(
    "rudderstack-profiles-sample",
    default_args=default_args,
    description="A simple tutorial DAG for profiles run.",
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=["rs-profiles"],
) as dag:
    # profile_id is template field
    rs_operator = RudderstackProfilesOperator(
        profile_id="",
        task_id="