Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jgoerner/beyond-jupyter
All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset
Last synced: 01 Jul 2024
https://github.com/ploomber/soopervisor
Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow
Last synced: 29 Jun 2024
https://github.com/BasicAirData/AirDataComputer
Air Data Computer
aircraft airflow airspeed-velocity datalogger temperature-sensor
Last synced: 27 Jun 2024
https://argoproj.github.io/argo-workflows/
Workflow Engine for Kubernetes
airflow argo argo-workflows batch-processing cloud-native cncf dag data-engineering gitops hacktoberfest k8s knative kubernetes machine-learning mlops pipelines workflow workflow-engine
Last synced: 22 Jun 2024
https://github.com/Redactics/http-nas
File streaming service designed for Kubernetes to provide ReadWriteMany storage support
airflow aws azure google-cloud kubernetes
Last synced: 21 Jun 2024
https://github.com/dataplane-app/dataplane
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows
Last synced: 17 Jun 2024
https://github.com/camposvinicius/aws-etl
This is an ETL application on AWS with general open sales and customer data that you can find here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. It's a zipped file with some .csv files inside, to which we will apply transformations.
airflow argocd athena aws catalog data data-engineer database emr emr-cluster etl glue kubernetes pipeline postgres pyspark rds spark
Last synced: 16 Jun 2024
https://github.com/GoogleCloudPlatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
airflow bigquery cloud-composer cloud-native cloud-storage data-architecture data-engineering data-pipelines datasets google-cloud open-data
Last synced: 15 Jun 2024
https://github.com/pipeline-tools/gusty
Making DAG construction easier
airflow data-etl data-pipeline
Last synced: 15 Jun 2024
https://github.com/qubole/afctl
afctl helps to manage and deploy Apache Airflow projects faster and smoother.
airflow cli deployment docker management
Last synced: 15 Jun 2024
https://github.com/godatadriven/whirl
Fast iterative local development and testing of Apache Airflow workflows
airflow docker local-development testing
Last synced: 15 Jun 2024
https://github.com/andreax79/airflow-code-editor
A plugin for Apache Airflow that allows you to edit DAGs in browser
airflow airflow-plugin apache-airflow python
Last synced: 15 Jun 2024
https://github.com/ms32035/airflow-dag-dependencies
Visualize dependencies between Airflow DAGs
Last synced: 15 Jun 2024
https://github.com/ajbosco/dag-factory
Dynamically generate Apache Airflow DAGs from YAML configuration files
airflow apache-airflow dags python
Last synced: 15 Jun 2024
https://github.com/ryanchao2012/airfly
Auto Generate Airflow's dag.py On The Fly
airflow airfly ast automation codegen dag-automation gutt python
Last synced: 15 Jun 2024
https://github.com/teamclairvoyant/airflow-maintenance-dags
A series of DAGs/Workflows to help maintain the operation of Airflow
airflow airflow-maintenance-dags apache-airflow cleanup dag maintenance workflow
Last synced: 15 Jun 2024
https://github.com/michaelosthege/fairflow
Functional Airflow DAG definitions.
Last synced: 15 Jun 2024
https://github.com/Tauffer-Consulting/domino
User friendly and open source platform for workflow creation and monitoring
ai airflow containers data gui kubernetes open-source python workflows
Last synced: 15 Jun 2024
https://github.com/xnuinside/airflow-helper
Airflow Helper is a tool that currently allows setting up Airflow Variables, Connections, and Pools from a YAML configuration file. It supports YAML inheritance and can obtain all settings from an existing Airflow server!
airflow airflow-toolkit airflow-tools apache-airflow cli command-line command-line-tool python
Last synced: 15 Jun 2024
https://github.com/angadsingh/airflow-ditto
An airflow DAG transformation framework
airflow airflow-dag aws azure dataflow emr extensible framework graph-algorithms graph-manipulation hdinsight isomorphism livy networkx spark yarn
Last synced: 15 Jun 2024
https://github.com/stwind/airflow-on-kubernetes
Bare minimal Airflow on Kubernetes (Local, EKS, AKS)
airflow aks aws azure eks kubernetes
Last synced: 15 Jun 2024
https://github.com/noelmcloughlin/airflow-component
Lightweight IaC Installer of Federated Apache-Airflow
airflow celery-workers federated rabbitmq-cluster salt
Last synced: 15 Jun 2024
https://github.com/dsaidgovsg/airflow-pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Last synced: 15 Jun 2024
https://github.com/villasv/aws-airflow-stack
Turbine: the bare metals that gets you Airflow
airflow airflow-cluster airflow-cookbook aws aws-cloudformation
Last synced: 15 Jun 2024
https://github.com/bahchis/airflow-cookbook
Airflow workflow management platform chef cookbook.
airflow airflow-cookbook chef-cookbook
Last synced: 15 Jun 2024
https://github.com/Speccy-Rom/SpeccyTV
Streaming service with ETL on steroids
airflow clickhouse django docker docker-compose elasticsearch fastapi ffmpeg flask kafka microservice mongodb nginx postgresql pytest python uvicorn
Last synced: 13 Jun 2024
https://github.com/san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
airflow airflow-dag apache-airflow apache-spark data-engineering data-engineering-pipeline data-lake data-migration emr-cluster etl-framework etl-job etl-pipeline goodreads-data-pipeline livy python redshift s3 scheduler spark warehouse
Last synced: 13 Jun 2024
https://github.com/alanchn31/Data-Engineering-Projects
Personal Data Engineering Projects
airflow aws-redshift cassandra data-engineering data-engineering-nanodegree data-lake data-modeling data-warehouse ingest-data mongodb postgres scrapy spark star-schema
Last synced: 13 Jun 2024
https://github.com/andresionek91/airflow-autoscaling-ecs
Airflow Deployment on AWS ECS Fargate Using Cloudformation
airflow airflow-autoscaling-ecs airflow-deployment airflow-ecs data-engineering
Last synced: 12 Jun 2024
https://github.com/puckel/docker-airflow
Docker Apache Airflow
airflow docker docker-airflow management scheduler task workflow
Last synced: 12 Jun 2024
https://github.com/jghoman/awesome-apache-airflow
Curated list of resources about Apache Airflow
airflow apache-airflow awesome awesome-list workflow-management
Last synced: 12 Jun 2024
https://github.com/san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
airflow airflow-operators aws aws-ec2 aws-s3 aws-sdk cassandra cassandra-database cloudformation cluster data data-engineering data-engineering-pipeline data-lake data-modeling data-warehouse etl-pipeline infrastructure postgres postgresql-database
Last synced: 12 Jun 2024
https://github.com/abhishek-ch/around-dataengineering
A Data Engineering & Machine Learning Knowledge Hub
airflow data-engineering datascience devops infrastructure machine-learning mlops spark
Last synced: 11 Jun 2024
https://github.com/blockchain-etl/polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon
Last synced: 11 Jun 2024
https://github.com/airflow-helm/charts
The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
airflow chart charts helm helm-chart helm-charts k8s kubernetes
Last synced: 06 Jun 2024
https://github.com/raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows
Last synced: 01 Jun 2024
https://github.com/pytorch/torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
airflow aws-batch components deep-learning distributed-training kubernetes machine-learning pipelines python pytorch ray slurm
Last synced: 01 Jun 2024
https://github.com/alanchn31/Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
airflow analytics aws-redshift aws-s3 data-engineer-nanodegree data-engineering data-engineering-pipeline data-modelling data-warehouse-cloud docker movie-database movie-recommendation movie-reviews pyspark python3 redshift spark sql udacity
Last synced: 27 May 2024
https://github.com/gocardless/airflow-dbt
Apache Airflow integration for dbt
Last synced: 27 May 2024
https://github.com/AuFeld/Data_Engineering_Projects
A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs
airflow aws cassandra data-engineering data-lake data-warehouse docker emr etl-pipeline infrastructure-as-code infrastructure-setup postgresql python redshift s3 spark
Last synced: 27 May 2024
https://github.com/angelotc/MacroDAG
A Dockerized Airflow ETL pipeline that processes macroeconomic indicators from the Federal Reserve.
Last synced: 26 May 2024
https://github.com/timkpaine/paperboy
A web frontend for scheduling Jupyter notebook reports
airflow apache-airflow celery dask docker jupyter jupyter-notebook jupyter-notebooks jupyterlab kubernetes luigi notebook nteract papermill phosphorjs scheduling-notebooks
Last synced: 26 May 2024
https://github.com/GoogleCloudPlatform/airflow-operator
Kubernetes custom controller and CRDs for managing Airflow
airflow airflow-operator apache-airflow crd kubernetes kubernetes-controller kubernetes-operator workflow-engine
Last synced: 22 May 2024
https://github.com/orchest/orchest
Build data pipelines, the easy way
airflow cloud dag data-pipelines data-science deployment docker etl etl-pipeline ide jupyter jupyterlab kubernetes machine-learning notebooks orchest pipelines python self-hosted
Last synced: 16 May 2024
https://github.com/WeBankFinTech/DataSphereStudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
airflow atlas azkaban dataworks davinci dolphinscheduler flink governance griffin hadoop hive hue kettle linkis spark supperset tableau visualis workflow zeppelin
Last synced: 16 May 2024
https://github.com/PipelineAI/pipeline
PipelineAI
airflow artificial-intelligence cassandra docker gpu kafka keras kubeflow kubernetes machine-learning neural-network pipelineai pytorch redis scikit-learn spark tensorflow tfx
Last synced: 14 May 2024
https://github.com/mikeroyal/Apache-Airflow-Guide
Apache Airflow Guide
airflow airflow-dags airflow-docker airflow-operators airflow-plugin awesome awesome-list awesome-resources big-data business-analytics business-intelligence data-engineering distributed python
Last synced: 14 May 2024
https://github.com/apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Agile creation of high-performance workflows with low code
airflow azkaban cloud-native data-pipelines job-scheduler orchestration powerful-data-pipelines task-scheduler workflow workflow-orchestration workflow-schedule
Last synced: 14 May 2024
https://github.com/teamclairvoyant/airflow-rest-api-plugin
A plugin for Apache Airflow that exposes rest end points for the Command Line Interfaces
airflow airflow-plugin airflow-webserver apache-airflow plugin rest-api
Last synced: 13 May 2024
https://github.com/domenp/aircal
Visualize Airflow's schedule by exporting future DAG runs as events to Google Calendar.
airflow calendar dag google-calendar schedule
Last synced: 13 May 2024
https://github.com/Sardhendu/PropertyClassification
Classifying the type of property given Real Estate, satellite and Street view Images
aerial-imagery airflow autoencoder computer-vision deep-learning deep-neural-networks deeplearning georeferencing python-3 resnet-18 resnet-50 satellite-imagery tensorflow
Last synced: 13 May 2024
https://github.com/elyra-ai/elyra
Elyra extends JupyterLab with an AI centric approach.
ai airflow anaconda apache-airflow binder docker elyra hacktoberfest jupyterlab jupyterlab-extension jupyterlab-extensions jupyterlab-notebooks kubeflow kubeflow-pipelines machine-learning notebook-jupyter notebooks pipelines pypi python
Last synced: 06 May 2024
https://github.com/fieldryand/goflow
Simple but powerful DAG scheduler and dashboard
airflow dashboard directed-acyclic-graph go job-scheduler schedule workflow-engine
Last synced: 29 Apr 2024
https://github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 26 Apr 2024
https://github.com/argoproj/argo-workflows
Workflow Engine for Kubernetes
airflow argo argo-workflows batch-processing cloud-native cncf dag data-engineering gitops hacktoberfest k8s knative kubernetes machine-learning mlops pipelines workflow workflow-engine
Last synced: 26 Apr 2024
https://github.com/slve/dbt-github-workflow
dbt-github-workflow is a boilerplate that contains all the necessary configurations to set up a simple CI/CD pipeline for your data modelling stack, making your life simpler by adding back a few extra working hours / out of hours
airflow composer dbt github-actions
Last synced: 23 Apr 2024
https://github.com/ankurchavda/streamify
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
airflow data-engineering dbt gcp kafka python spark
Last synced: 22 Apr 2024
https://github.com/rjurney/Agile_Data_Code_2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
agile-data agile-data-science airflow amazon-ec2 amazon-web-services analytics apache-kafka apache-spark data data-science data-syndrome kafka machine-learning machine-learning-algorithms predictive-analytics python python-3 python3 spark vagrant
Last synced: 17 Apr 2024
https://github.com/huseinzol05/Gather-Deployment
Gathers Python deployment, infrastructure and practices.
airflow docker docker-compose kafka pyflink pyspark python tensorflow
Last synced: 15 Apr 2024
https://github.com/iusztinpaul/energy-forecasting
The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials
3-pipeline-design airflow batch-processing cicd data-versioning docker fastapi feature-store gcp github-actions great-expectations hopsworks ml-monitoring mlops model-registry poetry python sktime streamlit weights-and-biases
Last synced: 14 Apr 2024
https://github.com/bryzgaloff/airflow-clickhouse-plugin
The most popular ClickHouse plugin for Airflow. Top-1% downloads on PyPI: https://pypi.org/project/airflow-clickhouse-plugin! Based on mymarilyn/clickhouse-driver.
airflow clickhouse python python3
Last synced: 13 Apr 2024
https://github.com/saucam/airflow-runner
airflow automation automation-testing data-engineering workflow
Last synced: 05 Apr 2024
https://github.com/google/starthinker
Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline.. "The framework for professionals with deadlines."
airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows
Last synced: 03 Apr 2024
https://github.com/scribd/objinsync
Continuously synchronize directories from remote object store to local filesystem
Last synced: 02 Apr 2024
https://github.com/google/grizzly
End-to-end DataOps platform deployed by Terraform.
airflow bigquery cloud-sql cloud-storage composer data-catalog data-lineage data-loss-prevention dataflow dataops dataops-platform gcp git google-cloud google-cloud-platform pubsub spanner terraform
Last synced: 01 Apr 2024
https://github.com/chandulal/airflow-testing
Airflow Unit Tests and Integration Tests
airflow airflow-dags airflow-testing testing
Last synced: 31 Mar 2024
https://github.com/difference-engine/docker-airflow-conda-ml
A docker setup for ML pipelines
Last synced: 30 Mar 2024
https://github.com/astronomer/astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows
Last synced: 24 Mar 2024
https://github.com/ucbrise/flor
FlorFlow: Flor, now with Dataflow
airflow build dag deep-learning flor hindsight logger logging machine-learning ml pytorch tensorboard vldb
Last synced: 23 Mar 2024
https://github.com/gdoumenc/coworks
CoWorks is a unified compositional serverless microservices framework over AWS, Flask and Airflow technologies.
airflow aws-lambda flask microservice python3 serverless serverless-framework
Last synced: 19 Mar 2024
https://github.com/mpolatcan/airflow-docker
Scalable Airflow Docker image that works with Docker and Kubernetes
airflow apache celery containers cron docker docker-image kubernetes scalable scheduler workflow
Last synced: 18 Mar 2024
https://github.com/bhavaniravi/airflow-kube-setup
How to deploy airflow on Kubernetes
Last synced: 18 Mar 2024
https://github.com/ris-tlp/audiophile-e2e-pipeline
Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.
airflow aws data-engineering metabase python terraform
Last synced: 18 Mar 2024
https://github.com/jgoerner/data-science-stack-cookiecutter
Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyter, Superset, Postgres, Minio, AirFlow & API Star)
airflow apistar cookiecutter data-science docker docker-image jupyter minio postgres python superset
Last synced: 14 Mar 2024