Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/datacoon/awesome-dataops

Awesome list of dataops products, open source and resources

List: awesome-dataops

cloud data data-engineering dataops etl workflow-engine



# Awesome DataOps [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

Awesome list of DataOps open source software, online services, courses and use cases.

### Table of contents

* [Open source](#open-source)
* [Commercial products and services](#commercial-products-and-services)

## Open source
### Data Pipeline Orchestration
* [Apache Airflow](https://airflow.apache.org/) - Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
* [Apache Oozie](http://oozie.apache.org/) - Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
* [Dagster](https://github.com/dagster-io/dagster) - A Python library for building data applications: ETL, ML, Data Pipelines, and more.
* [dbt command-line tool](https://github.com/fishtown-analytics/dbt) - the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
* [Reflow](https://github.com/grailbio/reflow) - A language and runtime for distributed, incremental data processing in the cloud

### ETL tools
* [Apache Kafka](https://kafka.apache.org/) - a distributed streaming platform.
* [Apache NiFi](https://nifi.apache.org/) - an easy-to-use, powerful, and reliable system to process and distribute data.
* [Squirrel](https://github.com/merantix-momentum/squirrel-core) - a Python library for large-scale data loading, transforming and sharing.

## Commercial products and services
### Platforms
* [Astronomer](https://www.astronomer.io/) - spin up and scale Apache Airflow clusters
* [Databand](https://databand.ai/) - Databand tracks your pipeline execution metadata, so you can evaluate changes in runtimes, code, data, and critical business KPIs.
* [DataKitchen](https://www.datakitchen.io/) - an end-to-end DataOps platform that automates and coordinates all the people, tools, and environments in your entire data analytics organization – everything from orchestration, testing, and monitoring to development and deployment.
* [Prefect](https://www.prefect.io/) - a new workflow management system, designed for modern infrastructure and powered by open-source software.
* [Saagie](https://www.saagie.com) - Saagie DataOps Orchestrator integrates commercial and open source data technologies to accelerate project delivery
* [Unravel](https://unraveldata.com/platform/) - helps ops engineers, app developers, and enterprise architects reduce the complexity of delivering reliable application performance – providing unified visibility and operational intelligence to optimize your entire ecosystem

### Cloud ETL
* [AWS Glue](https://docs.aws.amazon.com/glue/index.html) - a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
* [Azure Data Factory](https://azure.microsoft.com/ru-ru/services/data-factory/) - a hybrid data integration service that simplifies ETL operations
* [Google Cloud Dataflow](https://cloud.google.com/dataflow/) - unified stream and batch data processing that's serverless, fast, and cost-effective.
* [ETLWorks](https://etlworks.com/) - a cloud-first, any-to-any data integration platform

### Data catalogs
* [Alation Data Catalog](https://www.alation.com) - a data catalog designed for human collaboration
* [Collibra Data Catalog](https://www.collibra.com/data-catalog) - empowers business users to quickly discover and understand data that matters
* [SQL Data Catalog](https://www.red-gate.com/products/dba/sql-data-catalog/) - a tool to discover and classify sensitive data in Microsoft SQL Server

### Testing and monitoring
* [RightData](https://www.getrightdata.com/) - a data testing, reconciliation, and validation suite that helps stakeholders identify issues related to data consistency, quality, completeness, and gaps.