Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 01 Jul 2024

https://github.com/ploomber/soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.

airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow

Last synced: 29 Jun 2024

https://github.com/Redactics/http-nas

File streaming service designed for Kubernetes to provide ReadWriteMany storage support

airflow aws azure google-cloud kubernetes

Last synced: 21 Jun 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 17 Jun 2024

https://github.com/camposvinicius/aws-etl

An ETL application on AWS that uses open sales and customer data, available at https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. The data is a zipped file containing several CSV files, to which transformations are applied.

airflow argocd athena aws catalog data data-engineer database emr emr-cluster etl glue kubernetes pipeline postgres pyspark rds spark

Last synced: 16 Jun 2024

https://github.com/hankehly/deploy-airflow-on-ecs-fargate

An example of how to deploy Apache Airflow on Amazon ECS Fargate

airflow aws deploy docker ecs fargate python terraform

Last synced: 15 Jun 2024

https://github.com/pipeline-tools/gusty

Making DAG construction easier

airflow data-etl data-pipeline

Last synced: 15 Jun 2024

https://github.com/qubole/afctl

afctl helps to manage and deploy Apache Airflow projects faster and smoother.

airflow cli deployment docker management

Last synced: 15 Jun 2024

https://github.com/godatadriven/whirl

Fast iterative local development and testing of Apache Airflow workflows

airflow docker local-development testing

Last synced: 15 Jun 2024

https://github.com/andreax79/airflow-code-editor

A plugin for Apache Airflow that allows you to edit DAGs in browser

airflow airflow-plugin apache-airflow python

Last synced: 15 Jun 2024

https://github.com/ms32035/airflow-dag-dependencies

Visualize dependencies between Airflow DAGs

airflow

Last synced: 15 Jun 2024

https://github.com/ajbosco/dag-factory

Dynamically generate Apache Airflow DAGs from YAML configuration files

airflow apache-airflow dags python

Last synced: 15 Jun 2024

https://github.com/ryanchao2012/airfly

Auto Generate Airflow's dag.py On The Fly

airflow airfly ast automation codegen dag-automation gutt python

Last synced: 15 Jun 2024

https://github.com/teamclairvoyant/airflow-maintenance-dags

A series of DAGs/Workflows to help maintain the operation of Airflow

airflow airflow-maintenance-dags apache-airflow cleanup dag maintenance workflow

Last synced: 15 Jun 2024

https://github.com/michaelosthege/fairflow

Functional Airflow DAG definitions.

airflow apache-airflow

Last synced: 15 Jun 2024

https://github.com/Tauffer-Consulting/domino

User friendly and open source platform for workflow creation and monitoring

ai airflow containers data gui kubernetes open-source python workflows

Last synced: 15 Jun 2024

https://github.com/xnuinside/airflow-helper

Airflow Helper is a tool that currently allows setting up Airflow Variables, Connections, and Pools from a YAML configuration file. It supports YAML inheritance and can obtain all settings from an existing Airflow server.

airflow airflow-toolkit airflow-tools apache-airflow cli command-line command-line-tool python

Last synced: 15 Jun 2024

https://github.com/stwind/airflow-on-kubernetes

Bare minimal Airflow on Kubernetes (Local, EKS, AKS)

airflow aks aws azure eks kubernetes

Last synced: 15 Jun 2024

https://github.com/noelmcloughlin/airflow-component

Lightweight IaC Installer of Federated Apache-Airflow

airflow celery-workers federated rabbitmq-cluster salt

Last synced: 15 Jun 2024

https://github.com/dsaidgovsg/airflow-pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

airflow docker hadoop spark

Last synced: 15 Jun 2024

https://github.com/villasv/aws-airflow-stack

Turbine: the bare metals that get you Airflow

airflow airflow-cluster airflow-cookbook aws aws-cloudformation

Last synced: 15 Jun 2024

https://github.com/bahchis/airflow-cookbook

Airflow workflow management platform chef cookbook.

airflow airflow-cookbook chef-cookbook

Last synced: 15 Jun 2024

https://github.com/garystafford/tickit-data-lake-demo

Resources for video demonstrations and blog posts related to DataOps on AWS

airflow aws data-lake dataops devops redshift

Last synced: 13 Jun 2024

https://github.com/Speccy-Rom/SpeccyTV

💻👨‍💻⌨️ Streaming service with ETL on steroids

airflow clickhouse django docker docker-compose elasticsearch fastapi ffmpeg flask kafka microservice mongodb nginx postgresql pytest python uvicorn

Last synced: 13 Jun 2024

https://github.com/andresionek91/airflow-autoscaling-ecs

Airflow Deployment on AWS ECS Fargate Using Cloudformation

airflow airflow-autoscaling-ecs airflow-deployment airflow-ecs data-engineering

Last synced: 12 Jun 2024

https://github.com/jghoman/awesome-apache-airflow

Curated list of resources about Apache Airflow

airflow apache-airflow awesome awesome-list workflow-management

Last synced: 12 Jun 2024

https://github.com/blockchain-etl/polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon

Last synced: 11 Jun 2024

https://github.com/cubefs/compass

Compass is a task diagnosis platform for bigdata

airflow bigdata diagnose dolphinscheduler flink hadoop mapreduce scheduler spark sql

Last synced: 07 Jun 2024

https://github.com/airflow-helm/charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.

airflow chart charts helm helm-chart helm-charts k8s kubernetes

Last synced: 06 Jun 2024

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 01 Jun 2024

https://github.com/pytorch/torchx

TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.

airflow aws-batch components deep-learning distributed-training kubernetes machine-learning pipelines python pytorch ray slurm

Last synced: 01 Jun 2024

https://github.com/gocardless/airflow-dbt

Apache Airflow integration for dbt

airflow airflow-dbt dbt

Last synced: 27 May 2024

https://github.com/AuFeld/Data_Engineering_Projects

A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs

airflow aws cassandra data-engineering data-lake data-warehouse docker emr etl-pipeline infrastructure-as-code infrastructure-setup postgresql python redshift s3 spark

Last synced: 27 May 2024

https://github.com/angelotc/MacroDAG

A Dockerized Airflow ETL pipeline that processes macroeconomic indicators from the Federal Reserve.

airflow docker spark

Last synced: 26 May 2024

https://github.com/WeBankFinTech/DataSphereStudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

airflow atlas azkaban dataworks davinci dolphinscheduler flink governance griffin hadoop hive hue kettle linkis spark supperset tableau visualis workflow zeppelin

Last synced: 16 May 2024

https://github.com/apache/dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform for agile, low-code creation of high-performance workflows.

airflow azkaban cloud-native data-pipelines job-scheduler orchestration powerful-data-pipelines task-scheduler workflow workflow-orchestration workflow-schedule

Last synced: 14 May 2024

https://github.com/teamclairvoyant/airflow-rest-api-plugin

A plugin for Apache Airflow that exposes REST endpoints for the command-line interface

airflow airflow-plugin airflow-webserver apache-airflow plugin rest-api

Last synced: 13 May 2024

https://github.com/domenp/aircal

Visualize Airflow's schedule by exporting future DAG runs as events to Google Calendar.

airflow calendar dag google-calendar schedule

Last synced: 13 May 2024

https://github.com/Anant/Cassandra.Lunch

Resources from weekly Zoom lunches revolving around Apache Cassandra and Apache Cassandra-related topics. Hosted by Anant Corporation.

airflow akka astra cassandra datastax elk kafka nosql scylladb spark

Last synced: 30 Apr 2024

https://github.com/fieldryand/goflow

Simple but powerful DAG scheduler and dashboard

airflow dashboard directed-acyclic-graph go job-scheduler schedule workflow-engine

Last synced: 29 Apr 2024

https://github.com/slve/dbt-github-workflow

dbt-github-workflow is a boilerplate that contains all the configuration needed to set up a simple CI/CD pipeline for your data modelling stack, saving you a few hours of work, whether in or out of office hours.

airflow composer dbt github-actions

Last synced: 23 Apr 2024

https://github.com/ankurchavda/streamify

A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

airflow data-engineering dbt gcp kafka python spark

Last synced: 22 Apr 2024

https://github.com/huseinzol05/Gather-Deployment

Gathers Python deployment, infrastructure and practices.

airflow docker docker-compose kafka pyflink pyspark python tensorflow

Last synced: 15 Apr 2024

https://github.com/iusztinpaul/energy-forecasting

🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials

3-pipeline-design airflow batch-processing cicd data-versioning docker fastapi feature-store gcp github-actions great-expectations hopsworks ml-monitoring mlops model-registry poetry python sktime streamlit weights-and-biases

Last synced: 14 Apr 2024

https://github.com/bryzgaloff/airflow-clickhouse-plugin

The most popular ClickHouse plugin for Airflow. 🔝 Top-1% downloads on PyPI: https://pypi.org/project/airflow-clickhouse-plugin! Based on mymarilyn/clickhouse-driver.

airflow clickhouse python python3

Last synced: 13 Apr 2024

https://github.com/google/starthinker

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 03 Apr 2024

https://github.com/scribd/objinsync

Continuously synchronize directories from remote object store to local filesystem

airflow cplat s3

Last synced: 02 Apr 2024

https://github.com/chandulal/airflow-testing

Airflow Unit Tests and Integration Tests

airflow airflow-dags airflow-testing testing

Last synced: 31 Mar 2024

https://github.com/difference-engine/docker-airflow-conda-ml

A docker setup for ML pipelines

airflow conda docker ml

Last synced: 30 Mar 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 24 Mar 2024

https://github.com/ucbrise/flor

🌻 FlorFlow: Flor, now with Dataflow

airflow build dag deep-learning flor hindsight logger logging machine-learning ml pytorch tensorboard vldb

Last synced: 23 Mar 2024

https://github.com/gdoumenc/coworks

CoWorks is a unified compositional serverless microservices framework over AWS, Flask and Airflow technologies.

airflow aws-lambda flask microservice python3 serverless serverless-framework

Last synced: 19 Mar 2024

https://github.com/mpolatcan/airflow-docker

Scalable Airflow Docker image that works with Docker and Kubernetes

airflow apache celery containers cron docker docker-image kubernetes scalable scheduler workflow

Last synced: 18 Mar 2024

https://github.com/bhavaniravi/airflow-kube-setup

How to deploy Airflow on Kubernetes

airflow docker kubernetes

Last synced: 18 Mar 2024

https://github.com/ris-tlp/audiophile-e2e-pipeline

Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.

airflow aws data-engineering metabase python terraform

Last synced: 18 Mar 2024

https://github.com/jgoerner/data-science-stack-cookiecutter

๐Ÿณ๐Ÿ“Š๐Ÿค“Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyster, Superset, Postgres, Minio, AirFlow & API Star)

airflow apistar cookiecutter data-science docker docker-image jupyter minio postgres python superset

Last synced: 14 Mar 2024