An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with apache-airflow

A curated list of projects in awesome lists tagged with apache-airflow.

https://github.com/teamclairvoyant/airflow-maintenance-dags

A series of DAGs/Workflows to help maintain the operation of Airflow

airflow airflow-maintenance-dags apache-airflow cleanup dag maintenance workflow

Last synced: 15 May 2025

https://github.com/astronomer/dag-factory

Construct Apache Airflow DAGs Declaratively via YAML configuration files

airflow apache-airflow dags python

Last synced: 13 May 2025

https://github.com/astronomer/astronomer-cosmos

Run your dbt Core or dbt Fusion projects as Apache Airflow DAGs and Task Groups with a few lines of code

airflow airflow-operators apache-airflow dbt python workflow

Last synced: 29 Jan 2026

https://github.com/couler-proj/couler

Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

apache-airflow argo-workflows cloud-native distributed-computing kubeflow kubernetes machine-learning python scheduler tekton-pipelines unified-api unified-interface workflow-automation workflow-engine workflow-management

Last synced: 14 Jan 2026

https://github.com/tuanavu/airflow-tutorial

Apache Airflow tutorial

apache-airflow python

Last synced: 14 Apr 2025

https://github.com/astronomer/astronomer

Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes

apache-airflow astronomer-platform astronomer-software docker kubernetes

Last synced: 02 Feb 2026

https://github.com/andreax79/airflow-code-editor

A plugin for Apache Airflow that allows you to edit DAGs in browser

airflow airflow-plugin apache-airflow python

Last synced: 03 Apr 2026

https://github.com/astronomer/astro-cli

CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer

apache-airflow astro-private-cloud astronomer-platform astronomer-software kubernetes

Last synced: 08 May 2026

https://github.com/blockchain-etl/ethereum-etl-airflow

Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. See "How to get any Ethereum smart contract into BigQuery": https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

apache-airflow blockchain-analytics crypto cryptocurrency data-analytics data-engineering ethereum etl gcp google-cloud google-cloud-platform on-chain-analysis web3

Last synced: 24 Jun 2025

https://github.com/apache/airflow-client-python

Apache Airflow - OpenAPI Client for Python

airflow apache apache-airflow apache-airflow-client python

Last synced: 14 May 2025

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Apr 2025

https://github.com/teamclairvoyant/airflow-rest-api-plugin

A plugin for Apache Airflow that exposes REST endpoints for the command-line interface

airflow airflow-plugin airflow-webserver apache-airflow plugin rest-api

Last synced: 13 Jul 2025

https://github.com/astronomer/airflow-chart

A Helm chart to install Apache Airflow on Kubernetes

airflow apache-airflow astro-private-cloud astronomer-software helm-chart kubernetes

Last synced: 21 Jan 2026

https://github.com/airscholar/e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

apache-airflow apache-kafka apache-spark apache-zookeeper big-data cassandra containerization data-engineering data-pipeline data-processing data-storage docker etl-pipeline postgresql real-time-analytics

Last synced: 16 May 2025

https://github.com/astronomer/agents

AI agent tooling for data engineering workflows.

agents ai airflow apache-airflow claude cursor data-engineering mcp skills

Last synced: 27 Feb 2026

https://github.com/kaxil/airflowctl

A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects

airflow apache-airflow cli dags python

Last synced: 08 Apr 2025

https://github.com/apache/airflow-client-go

Apache Airflow - OpenAPI Client for Go

airflow apache apache-airflow apache-airflow-client go

Last synced: 06 Apr 2025

https://github.com/rolanddb/airflow-on-kubernetes

A guide to running Airflow on Kubernetes

apache-airflow kubernetes

Last synced: 27 Mar 2025

https://github.com/astronomer/astronomer-providers

Airflow Providers containing Deferrable Operators & Sensors from Astronomer

airflow airflow-operators airflow-providers apache-airflow python workflow

Last synced: 04 Apr 2025

https://github.com/airscholar/redditdataengineering

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

apache-airflow aws celery data-pipeline end-to-end-data-engineering reddit

Last synced: 03 Sep 2025

https://github.com/ninja-van/airflow-boilerplate

A complete development environment setup for working with Airflow

airflow apache-airflow boilerplate pycharm python

Last synced: 12 Jul 2025

https://github.com/datacamp/viewflow

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

airflow apache-airflow data-engineering data-science packages python workflow

Last synced: 20 Aug 2025

https://github.com/wittline/uber-expenses-tracking

The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.

airflow-docker apache-airflow aws aws-redshift data-engineering data-modeling etl-pipeline expenses-dashboard expenses-tracker power-bi python uber uber-data uber-eats

Last synced: 13 Apr 2025

https://github.com/powerdatahub/terraform-aws-airflow

Terraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker with CeleryExecutor

airflow apache-airflow aws celery hacktoberfest terraform terraform-module terraform-modules

Last synced: 23 Jul 2025

https://github.com/idealista/airflow-role

Ansible role to install Apache Airflow

airflow airflow-role ansible ansible-role apache-airflow debian

Last synced: 25 Oct 2025

https://github.com/wordpress/openverse-catalog

Identifies and collects data on CC-licensed content across web crawl data and public APIs.

airflow apache-airflow creative-commons hacktoberfest openverse pytest python search-engine spark

Last synced: 29 Sep 2025

https://github.com/aws-ia/terraform-aws-mwaa

Terraform module for Amazon MWAA (Apache Airflow)

airflow apache-airflow aws aws-mwaa

Last synced: 14 Apr 2025

https://github.com/kadnan/airflow-scraping

Using Apache Airflow to schedule web scrapers

airflow apache-airflow python scheduled-tasks scrapers

Last synced: 15 May 2025

https://github.com/airscholar/sparkingflow

This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala, and Java as examples.

apache-airflow dataengineering docker java pyspark scala spark

Last synced: 10 Apr 2025

https://github.com/mr-xn/cve-2022-40127

Apache Airflow < 2.4.0 DAG example_bash_operator RCE POC

apache-airflow cve poc rce

Last synced: 22 Mar 2025

https://github.com/garystafford/aws-airflow-demo

Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.

airflow amazon-emr amazon-mwaa apache-airflow aws pyspark-applications

Last synced: 27 Oct 2025

https://github.com/michaelosthege/fairflow

Functional Airflow DAG definitions.

airflow apache-airflow

Last synced: 11 Apr 2025

https://github.com/kadnan/airflow-tutorial

Basic tutorial of using Apache Airflow

airflow apache-airflow python python-3

Last synced: 15 May 2025

https://github.com/sergio11/lyric_wave_architecture

🎵 LyricWave – AI Music Composer (Proof of Concept) 🎶 A personal project exploring automatic generation of unique MP4 songs. LyricWave blends lyrics with AI-generated melodies and synthetic vocals to experiment with new forms of musical expression. A creative testbed to push your ideas into sound. 🚀🎧

airflow airflow-dags airflow-docker airflow-operators apache-airflow audiocraft celery docker docker-compose flask flask-api huggingface huggingface-transformers mongodb music-generation music-processing musicgen pytorch stable-diffusion suno-ai

Last synced: 13 Aug 2025

https://github.com/elyra-ai/pipeline-editor

Common pipeline-editor components used in different clients (e.g. Elyra application, Web browser extensions, etc)

ai airflow apache-airflow kubeflow-pipelines machine-learning pipeline pipeline-editor

Last synced: 10 Oct 2025

https://github.com/doitintl/doit-composer-airflow-training

Getting started with Apache Airflow on Cloud Composer

apache-airflow book cloud-composer google-cloud-platform mini-book

Last synced: 30 Apr 2025

https://github.com/abhishekbhakat/airflow-mcp-server

MCP Server for Apache Airflow

airflow apache-airflow api llm mcp-server

Last synced: 05 Mar 2026

https://github.com/pbwebmedia/airflow-prometheus-exporter

Export Airflow metrics (from MySQL) in Prometheus format

airflow apache apache-airflow exporter metrics mysql prometheus

Last synced: 21 Aug 2025

https://github.com/unruly/terraform-aws-airflow

Terraform module for a PostgreSQL-backed Apache Airflow instance

airflow apache-airflow terraform terraform-modules

Last synced: 03 Sep 2025

https://github.com/moritzkoerber/covid-19-data-engineering-pipeline

A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

apache-airflow apache-spark api aws aws-cdk aws-cloudformation aws-ecr aws-glue aws-lambda aws-redshift aws-s3 docker great-expectations pyspark spark

Last synced: 28 Apr 2025

https://github.com/astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

airflow airflow-operator airflow-provider apache-airflow dag data-orchestration etl python workflow

Last synced: 06 Apr 2025

https://github.com/airscholar/footballdataengineering

An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.

apache-airflow azure-data-factory azure-data-lake-gen2 azure-databricks azure-synapse-analytics data-engineering dataengineering

Last synced: 10 Apr 2025

https://github.com/dain55788/elt-data-pipeline

ELT data pipeline implementation in a data warehousing environment

apache-airflow data-engineering dbt great-expectations postgresql powerbi

Last synced: 17 Jan 2026

https://github.com/alvarocavalcante/airflow-parse-bench

Stop creating bad DAGs! Use this tool to measure and compare the parse time of your DAGs, identify bottlenecks, and optimize your Airflow environment for better performance.

airflow apache-airflow dags data-engineering python python3

Last synced: 11 Sep 2025

https://github.com/behnamyazdan/ecommerce_realtime_data_pipeline

Ecommerce Realtime Data Pipeline (Data Modeling, Workflow Orchestration, Change Data Capture, Analytical Database and Dashboarding)

apache-airflow apache-kafka change-data-capture clickhouse data-pipeline database-modeling debezium docker-compose ecommerce grafana postgresql python realtime-dashboard realtime-streaming

Last synced: 16 Jan 2026

https://github.com/zkan/data-pipelines-with-airflow

Skooldio: Data Pipelines with Airflow

apache-airflow data-engineering data-pipeline

Last synced: 19 Aug 2025

https://github.com/airscholar/kubernetes-for-dataengineering

This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering environment using Kubernetes and Apache Airflow

apache-airflow data-engineering kubernetes

Last synced: 17 Jul 2025

https://github.com/astronomer/astro-provider-ray

This provider contains operators, decorators, and triggers to send a Ray job from an Airflow task

apache-airflow dags mlops-workflow mlpipelines workflow

Last synced: 06 Jul 2025

https://github.com/andreax79/airflow-gitlab-webhook

Commit on GitLab and run an Apache Airflow DAG

airflow apache-airflow gitlab

Last synced: 13 Apr 2025

https://github.com/archie-cm/ibm-data-engineering-capstone-project

Business challenge that requires building a data platform for retailer data analytics.

apache-airflow apache-spark cognos-analytics db2-warehouse etl-pipeline mongodb mysql postgresql

Last synced: 23 Apr 2025

https://github.com/akarce/e2e-structured-streaming

End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API, sends the data to Kafka, and processes it with Spark before writing to Cassandra. The pipeline, built with Python and Apache Zookeeper, is containerized with Docker for easy deployment and scalability.

airflow apache-airflow apache-kafka apache-spark big-data cassandra docker docker-compose kafka postgresql python spark zookeeper

Last synced: 29 Oct 2025

https://github.com/badal-io/gcp-airflow-foundations

Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse

airflow apache-airflow bigquery dags data-engineering data-pipeline etl-pipeline

Last synced: 24 Mar 2025

https://github.com/airflow-laminar/airflow-supervisor

Airflow utilities for running long-running or always-on jobs with supervisord

airflow apache-airflow process-manager process-monitor python scheduler supervisor supervisord

Last synced: 17 Mar 2026

https://github.com/aymane-maghouti/big-data-project

This project aims to predict smartphone prices using a combination of batch and stream processing techniques in a Big Data environment. The architecture follows the Lambda Architecture pattern, providing both real-time and batch processing capabilities to users.

apache-airflow apache-kafka apache-spark batch-processing big-data-projects hbase hdfs ingestion java lambda-architecture machine-learning postgresql-database powerbi pyspark python spring-boot streaming

Last synced: 29 Oct 2025

https://github.com/pacuna/airflow-docker

Run Apache Airflow using Docker containers

airflow airflow-docker apache-airflow containers docker docker-compose

Last synced: 16 Apr 2025

https://github.com/nathadriele/redshift-to-s3-unload-dag

This Airflow DAG automates the process of extracting data from an Amazon Redshift database and unloading it to Amazon S3 in Parquet format. It runs daily, exporting data from the previous day based on a specified query.

amazon-redshift amazon-s3 apache-airflow dag-scheduling data-export data-migration data-pipeline parquet-format unload-queey unload-query workflow-automation

Last synced: 28 Oct 2025

https://github.com/duyet/airflow-docker-compose

Example of how to run Airflow in Docker Compose

airflow apache-airflow docker docker-compose duyetdev

Last synced: 14 Apr 2025

https://github.com/zbrookle/avionix_airflow

Apache Airflow hosted on a Kubernetes cluster, ready out of the box, with monitoring stack included (Grafana, Elasticsearch, Filebeat)

airflow apache-airflow avionix aws aws-eks chart-builder cluster elasticsearch filebeat grafana helm helm-chart kubernetes kubernetes-executor

Last synced: 31 Oct 2025

https://github.com/nathadriele/dock-financial-data-pipelines

Automated pipeline for generating and processing Dock balance reports using Apache Airflow, SFTP, AWS S3, and Lambda.

airflow-dags apache-airflow automation data-engineering etl-pipeline lambda reports s3-aws sftp

Last synced: 10 Apr 2025

https://github.com/mdzaheerjk/advanced_mlops_project6_australia_weather_rain_predection

🌦️ Australia Weather Rain Prediction with Advanced MLOps 🤖 End-to-end ML pipeline for rain forecasting using Australian weather data 🐳 Dockerized and Kubernetes-ready for scalable deployment 🌐 Flask web app for real-time weather prediction with modular, reproducible code

apache-airflow circleci docker jenkins kubernetes

Last synced: 09 Oct 2025

https://github.com/zbrookle/goflow

A Kubernetes native task manager that functions similarly to Airflow

apache-airflow go golang kubernetes scheduler

Last synced: 09 Sep 2025

https://github.com/narius2030/find-similar-vietnamese-texts

This project builds a classification model for news topics. The goal is to automatically assign a suitable topic (class) to an arbitrary article. Two architectures are implemented: LSTM and hybrid models.

apache-airflow data-pipeline nlp-deep-learning tensorflow text-classification text-clustering word-embedding

Last synced: 12 Mar 2026

https://github.com/sergio11/talk_tracer_ai_architecture

TalkTracerAI is an NLP-based meeting analysis tool that transcribes, analyzes, and summarizes conversations, delivering valuable insights and enhancing productivity. 🗣️📊✨

apache-airflow flask googletrans minio nlp python3 transcription

Last synced: 17 Apr 2025

https://github.com/Narius2030/Find-Similar-Vietnamese-Texts

This project builds a classification model for news topics. The goal is to automatically assign a suitable topic (class) to an arbitrary article. Two architectures are implemented: LSTM and hybrid models.

apache-airflow data-pipeline nlp-deep-learning tensorflow text-classification text-clustering word-embedding

Last synced: 22 Oct 2025

https://github.com/andreax79/airflow-provider-xlsx

Airflow operators for converting XLSX files from/to Parquet/CSV/JSON

airflow apache-airflow excel parquet

Last synced: 22 Aug 2025

https://github.com/viktorsvertoka/goit-de-hw-07

Home task for a Data Engineering course 💻

apache-airflow goit goit-de-hw-07 python

Last synced: 09 Apr 2025

https://github.com/korniichuk/workflow

Workflow management platforms comparison

airflow apache-airflow aws aws-step-functions dataops luigi step-functions

Last synced: 15 Apr 2025

https://github.com/michaelosthege/apache-airflow-flowitems

This package helps to reduce the amount of boilerplate code when creating Airflow DAGs from Python callables.

airflow-dags apache-airflow

Last synced: 11 Apr 2025

https://github.com/ren294/smarttraffic_lakehouse_for_hcmc

A Smart Traffic Management System for Ho Chi Minh City, Vietnam leveraging batch and real-time data processing, intuitive dashboards, and monitoring tools to optimize traffic flow, enhance safety, and support sustainable urban mobility through advanced analytics and user-friendly applications.

apache-airflow apache-flink apache-hive apache-hudi apache-kafka apache-nifi apache-spark apache-superset apache-zookeeper big-data debezium grafana lakefs metabase minio promotheus redis seatunnel streamlit trino

Last synced: 11 Apr 2025

https://github.com/sergio11/voice_passport_architecture

VoicePassport 🎤 is an innovative authentication system leveraging voice recognition technology, blockchain ⛓️ security, and vector databases 📊 for robust and seamless user verification.

apache-airflow apache-airflow-etl-pipeline blockchain blockchain-technology docker-compose hproxy mongodb qdrant qdrant-client qdrant-vector-database solidity solidity-contracts web3 web3py

Last synced: 13 Aug 2025

https://github.com/casassg/corrent

Corrent: Experimental Airflow functional DAG API

airflow airflow-dags apache-airflow

Last synced: 04 Apr 2025

https://github.com/marcusrehm/airflow-dev-env

A simple container to run Apache Airflow on Windows machines.

airflow apache-airflow docker windows

Last synced: 12 Apr 2025

https://github.com/f-kuzey-edes-huyal/steam-sale-optimizer

An MLOps pipeline for optimizing game discount strategies using Steam reviews, tags, and competitor pricing. Designed for data-driven revenue maximization in the gaming industry.

apache-airflow azure ci-cd evidently game-pricing grafana-dashboard mlflow mlops postgresql steam terraform web-scraping

Last synced: 24 Apr 2026

https://github.com/xnuinside/airflow-helper

Airflow Helper is a tool that currently allows setting up Airflow Variables, Connections, and Pools from a YAML configuration file. It supports YAML inheritance and can obtain all settings from an existing Airflow server.

airflow airflow-toolkit airflow-tools apache-airflow cli command-line command-line-tool python

Last synced: 06 Oct 2025

https://github.com/sidiahmedhabib/e2e-data-engineering

This project is an end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using a variety of powerful tools including Apache Airflow, Apache Kafka, Apache Spark and Cassandra. All components are containerized with Docker for easy deployment and scalability.

apache-airflow apache-kafka apache-spark big-data cassandra data-engineering data-streaming

Last synced: 20 Jul 2025

https://github.com/zkan/introduction-to-data-pipelines-and-apache-airflow

Introduction to Data Pipelines and Apache Airflow

apache-airflow data-pipelines

Last synced: 21 Sep 2025