Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/datacoon/awesome-dataops
Awesome list of dataops products, open source and resources
List: awesome-dataops
cloud data data-engineering dataops etl workflow-engine
- Host: GitHub
- URL: https://github.com/datacoon/awesome-dataops
- Owner: datacoon
- License: cc0-1.0
- Created: 2020-06-12T07:00:58.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-05-02T16:59:32.000Z (over 2 years ago)
- Last Synced: 2024-04-10T13:54:45.944Z (9 months ago)
- Topics: cloud, data, data-engineering, dataops, etl, workflow-engine
- Size: 7.81 KB
- Stars: 22
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-starred - datacoon/awesome-dataops - Awesome list of dataops products, open source and resources (data)
README
# Awesome DataOps [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
Awesome list of DataOps open source software, online services, courses and use cases
### Table of contents
* [Opensource](#opensource)
* [Commercial products and services](#commercial-products-and-services)
## Opensource
### Data Pipeline Orchestration
* [Apache Airflow](https://airflow.apache.org/) - Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
* [Apache Oozie](http://oozie.apache.org/) - Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
* [Dagster](https://github.com/dagster-io/dagster) - A Python library for building data applications: ETL, ML, Data Pipelines, and more.
* [DBT Cmd tool](https://github.com/fishtown-analytics/dbt) - the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
* [Reflow](https://github.com/grailbio/reflow) - A language and runtime for distributed, incremental data processing in the cloud.
### ETL tools
* [Apache Kafka](https://kafka.apache.org/) - a distributed streaming platform.
* [Apache Nifi](https://nifi.apache.org/) - an easy to use, powerful, and reliable system to process and distribute data.
* [Squirrel](https://github.com/merantix-momentum/squirrel-core) - a Python library for large-scale data loading, transforming and sharing.
## Commercial products and services
### Platforms
* [Astronomer](https://www.astronomer.io/) - spin up and scale Apache Airflow clusters
* [Databand](https://databand.ai/) - Databand tracks your pipeline execution metadata, so you can evaluate changes in runtimes, code, data, and critical business KPIs.
* [DataKitchen](https://www.datakitchen.io/) - end-to-end DataOps platform automates and coordinates all the people, tools, and environments in your entire data analytics organization – everything from orchestration, testing, and monitoring to development and deployment.
* [Prefect](https://www.prefect.io/) - a new workflow management system, designed for modern infrastructure and powered by open-source software.
* [Saagie](https://www.saagie.com) - Saagie DataOps Orchestrator integrates commercial and open source data technologies to accelerate project delivery.
* [Unravel](https://unraveldata.com/platform/) - helps ops engineers, app developers, and enterprise architects reduce the complexity of delivering reliable application performance by providing unified visibility and operational intelligence to optimize your entire ecosystem.
## Cloud ETL
* [AWS Glue](https://docs.aws.amazon.com/glue/index.html) - a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
* [Azure Data Factory](https://azure.microsoft.com/ru-ru/services/data-factory/) - a hybrid data integration service that simplifies ETL operations.
* [Google Cloud Dataflow](https://cloud.google.com/dataflow/) - unified stream and batch data processing that's serverless, fast, and cost-effective.
* [ETLWorks](https://etlworks.com/) - a cloud-first, any-to-any data integration platform.
## Data catalogs
* [Alation Data Catalog](https://www.alation.com) - a data catalog designed for human collaboration
* [Collibra Data Catalog](https://www.collibra.com/data-catalog) - empowers business users to quickly discover and understand data that matters
* [SQL Data catalog](https://www.red-gate.com/products/dba/sql-data-catalog/) - tool to discover and classify sensitive data for MS SQL Server
### Testing and monitoring
* [RightData](https://www.getrightdata.com/) - a data testing, reconciliation, and validation suite that helps stakeholders identify issues related to data consistency, quality, completeness, and gaps.