
# Awesome Data Discovery and Observability [![Awesome](https://awesome.re/badge-flat.svg)](https://awesome.re)

This repository contains a curated list of awesome data catalogs and observability platforms that help you discover, manage, and observe data in your organization.


## Contents: Existing Data Discovery and Observability Solutions

| [OSS Data Catalogs](#opensource) | [Proprietary Monocloud DCs](#monocloud) | [Proprietary Observability Tools](#observability) | [Other Proprietary DCs](#proprietary) |
|--------------------------|--------------------------------|---------------------------------|--------------------------------|
| [📙 Amundsen](#amundsen) | [📒 Google DC](#google) | [🔍 Monte Carlo](#montecarlo) | [📕 Alation](#alation) |
| [📙 DataHub](#datahub) | [📒 Azure DC](#azure) | [🔍 Databand](#databand) | [📕 Atlan](#atlan) |
| [📙 Marquez](#marquez) | | [🔍 Datafold](#datafold) | [📕 Collibra](#collibra) |
| [📙 Atlas](#atlas) | | [🔍 Ataccama](#ataccama) | [📕 DataGalaxy](#datagalaxy) |
| [📙 CKAN](#ckan) | | [🔍 DataKitchen Open Source Data Observability](#datakitchen) | [📕 Informatica](#informatica) |
| [📙 Magda](#magda) | | | [📕 Stemma](#stemma) |
| [📙 OpenDataDiscovery](#opendatadiscovery) | | | [📕 Talend](#talend) |
| [📙 OpenMetadata](#openmetadata) | | | [📕 Select Star](#selectstar) |
| [📙 Meta\#Grid](#metagrid) | | | |
| [📙 Grai](#grai) | | | |


## High-Level Feature Comparison

| Tool | Specification-based | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:-------------:|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:--:|:--:|
| [Alation](#alation) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| [Amundsen](#amundsen) | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| [Ataccama](#ataccama) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| [Atlan](#atlan) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
| [Atlas](#atlas) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| [Azure DC](#azure) | ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
| [CKAN](#ckan) | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| [Collibra](#collibra) | ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
| [DataGalaxy](#datagalaxy) | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ? | ? |
| [Databand](#databand) | ❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ | ❌ | ❌ |
| [Datafold](#datafold) | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
| [DataHub](#datahub) | ✔️ [details](https://datahubproject.io/docs/metadata-modeling/metadata-model/) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ❌ |
| [Google DC](#google) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
| [Informatica](#informatica) | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ |
| [Magda](#magda) | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| [Marquez](#marquez) | [OpenLineage](https://github.com/OpenLineage/OpenLineage) | ✔️ | ❌ | ✔️ | ? | ❌ | ❌ | ❌ | ❌ | ✔️ | ❌ |
| [Monte Carlo](#montecarlo) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
| [Select Star](#selectstar) | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |
| [OpenDataDiscovery](#opendatadiscovery) | [ODD Specification](https://github.com/opendatadiscovery/opendatadiscovery-specification) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |
| [OpenMetadata](#openmetadata) | [JSON Schema](https://github.com/json-schema-org/json-schema-spec) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
| [Stemma](#stemma) | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ? | ✔️ | ❌ | ❌ | ❌ |
| [Talend](#talend) | ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| [Meta\#Grid](#metagrid) | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | not yet | ❌ | ❌ | ❌ | ✔️ |
| [Grai](#grai) | [Grai Schemas](https://github.com/grai-io/grai-core/tree/master/grai-schemas) | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |

Definitions:



- Specification-based: uses an open standard for collecting metadata, enabling efficient time-to-discovery and federation of data catalogs
- Search-based: lets users search for data assets
- Network-based: provides rich context about data asset ownership
- Lineage-based: provides lineage for all entities the solution operates on
- Federation: the ability to map multiple data catalogs into a single UI to avoid repeated data collection
- ML 1st citizen: handles ML entities at a high level, so you can use them like any other data asset
- Data Quality: includes mature data quality assurance tools
- End-to-end lineage: data lineage that covers all data assets used in the organization, across all its data catalogs and ML tools
- Column-level lineage: data lineage with column-level granularity
- Data collaboration: makes it possible to bring together data from various internal and external sources to unlock combined data insights
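The following minimal Python sketch makes the lineage definitions concrete. It assumes lineage is stored as a plain list of column-to-column edges. The column names and the helpers `build_graph`, `downstream_impact`, and `table_of` are hypothetical and do not belong to any catalog's API. A breadth-first walk answers the impact-analysis question "what is affected downstream if this column changes?". Because the chain in the example ends in an ML feature, the same walk also shows what end-to-end lineage across ML assets means. Collapsing columns to their tables turns column-level lineage into table-level lineage.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (upstream column, downstream column).
# Names follow a made-up "dataset.table.column" convention for illustration only.
EDGES = [
    ("raw.orders.amount", "staging.orders.amount_usd"),
    ("raw.fx_rates.rate", "staging.orders.amount_usd"),
    ("staging.orders.amount_usd", "mart.revenue.daily_total"),
    ("mart.revenue.daily_total", "ml.churn_model.feature_revenue"),
    ("raw.customers.email", "mart.customers.email_domain"),
]


def build_graph(edges):
    """Map each column to the set of columns derived directly from it."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        graph[upstream].add(downstream)
    return graph


def downstream_impact(graph, start):
    """Return every column reachable from `start` via breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def table_of(column):
    """Collapse 'dataset.table.column' to 'dataset.table' for table-level lineage."""
    return column.rsplit(".", 1)[0]


if __name__ == "__main__":
    graph = build_graph(EDGES)
    impacted = downstream_impact(graph, "raw.fx_rates.rate")
    print(sorted(impacted))
    print(sorted({table_of(c) for c in impacted}))
```

Real catalogs store these edges in a graph or search backend rather than a Python list. The traversal idea stays the same.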




## 📙 Open-Source Data Catalogs


### Amundsen
[Website](https://www.amundsen.io/) | [GitHub](https://github.com/amundsen-io/amundsen)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/amundsen-io/amundsen/graphs/commit-activity)
![](https://img.shields.io/github/stars/amundsen-io/amundsen.svg?style=social)

A popular open-source data catalog for metadata management and data discovery, originally developed at Lyft. Created by the Amundsen maintainers, [Stemma](https://www.stemma.ai/) provides a managed enterprise data catalog inspired by Amundsen.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: Yes
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Top data users: Yes
- Change notifications: No
- Change feed: No
- Deployment:
- Supported data sources: Hive, Redshift, Druid, RDBMS, Presto, Snowflake



### DataHub

[Website](https://datahubproject.io/) | [GitHub](https://github.com/datahub-project/datahub)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/datahub-project/datahub/graphs/commit-activity)
![](https://img.shields.io/github/stars/datahub-project/datahub.svg?style=social)

DataHub is an open-source data catalog, originally developed at LinkedIn, that enables data discovery, data observability, and federated governance. Acryl Data offers it commercially as a cloud-hosted SaaS product.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ✔️ [details](https://datahubproject.io/docs/metadata-modeling/metadata-model/) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ❌ |

More features




- Strategy: Push, Pull
- Customizable metadata model: Yes. The metadata model can be declared in the open-source Pegasus language and is interoperable with JSON Schema and Avro
- Rich data profiling: Yes
- Recommendations: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Top data users: Yes
- Lineage impact analysis: Yes
- Change notifications: Yes
- Change feed: No
- Automation: Yes
- UX personalization: No
- Deployment: docker-compose / Kubernetes with Helm, or Acryl Data's SaaS offering
- Supported data sources:
  - Snowflake
  - BigQuery
  - Redshift
  - Hive
  - Athena
  - Postgres
  - MySQL
  - SQL Server
  - Trino
  - Delta Lake
  - S3
  - Looker
  - PowerBI
  - Tableau
  - Mode
  - Metabase
  - Redash
  - Superset
  - Airflow
  - Great Expectations
  - dbt
  - Feast
  - SageMaker
  - Glue
  - Kafka
  - NiFi
  - Okta
  - LDAP
  - Slack
  - 50+ integrations in total; see the docs for the latest list.





### Marquez

[Website](https://marquezproject.github.io/marquez/) | [GitHub](https://github.com/MarquezProject/marquez)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/MarquezProject/marquez/graphs/commit-activity)
![](https://img.shields.io/github/stars/MarquezProject/marquez.svg?style=social)

Marquez is an open-source data catalog, originally developed at WeWork, for collecting, aggregating, and visualizing a data ecosystem's metadata.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:-----------:|:--:|:--:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|
| [OpenLineage](https://github.com/OpenLineage/OpenLineage) | ✔️ | ❌ | ✔️ | ? | ❌ | ❌ | ❌ | ❌ | ✔️ | ❌ |
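The table lists OpenLineage as the open standard behind Marquez. Under that standard, a pipeline reports each run as a JSON event that names the job, the run, and the run's input and output datasets. The Python sketch below builds a minimal event of that shape using only the standard library. The producer URI, namespaces, job name, and dataset names are hypothetical placeholders. The spec version in `SCHEMA_URL` is an assumption, so check it against the OpenLineage docs. As far as we know, Marquez accepts these events through an HTTP POST to its `/api/v1/lineage` endpoint; confirm this in the Marquez API reference before relying on it.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI: a real integration identifies itself here.
PRODUCER = "https://example.com/my-pipeline"
# Assumed spec version; check the OpenLineage docs for the current one.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def dataset(namespace, name):
    """Minimal dataset reference: OpenLineage identifies datasets by namespace + name."""
    return {"namespace": namespace, "name": name, "facets": {}}


def run_event(event_type, run_id, job_namespace, job_name, inputs, outputs):
    """Build a minimal OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id, "facets": {}},
        "job": {"namespace": job_namespace, "name": job_name, "facets": {}},
        "inputs": inputs,
        "outputs": outputs,
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


if __name__ == "__main__":
    run_id = str(uuid.uuid4())  # the same runId ties START and COMPLETE together
    inputs = [dataset("postgres://db.example.com:5432", "public.orders")]
    outputs = [dataset("postgres://db.example.com:5432", "public.daily_revenue")]
    for event_type in ("START", "COMPLETE"):
        event = run_event(event_type, run_id, "example_namespace",
                          "daily_revenue_job", inputs, outputs)
        print(json.dumps(event, indent=2))
```

Lineage in this model comes from the input and output lists: two jobs that share a dataset name within the same namespace are connected through it.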

More features




- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: Yes
- Column statistics: No
- Data owner: Yes
- Top data users: ?
- Change notifications: No
- Change feed: No
- Deployment:
- Supported data sources: S3, Kafka



### Atlas

[Website](https://atlas.apache.org/#/) | [GitHub](https://github.com/apache/atlas)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/apache/atlas/graphs/commit-activity)
![](https://img.shields.io/github/stars/apache/atlas.svg?style=social)

Apache Atlas is an open-source data catalog for metadata collection, governance, and data democratization.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: No
- Column statistics: No
- Data owner: No
- Top data users: ?
- Change notifications: Yes
- Change feed: No
- Deployment:
- Supported data sources: HBase, Hive, Sqoop, Kafka, Storm



### CKAN

[Website](https://ckan.org/) | [GitHub](https://github.com/ckan/ckan)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/ckan/ckan/graphs/commit-activity)
![](https://img.shields.io/github/stars/ckan/ckan.svg?style=social)

CKAN is an open-source data catalog for data management that powers data portals for governments and enterprises.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: ?
- Schemas, Description: ?
- Complex schemas: ?
- Data preview: ?
- Column statistics: ?
- Data owner: ?
- Top data users: ?
- Change notifications: ?
- Change feed: ?
- Deployment:
- Supported data sources:



### Magda

[Website](https://magda.io/) | [GitHub](https://github.com/magda-io/magda)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/magda-io/magda/graphs/commit-activity)
![](https://img.shields.io/github/stars/magda-io/magda.svg?style=social)

Magda is an open-source data catalog that features data discovery, metadata enrichment, and federation, focused on geodata.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push via UI
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: Yes
- Column statistics: No
- Data owner: Yes
- Top data users: ?
- Change notifications: No
- Change feed: No
- Deployment:
- Supported data sources: Mostly geodata
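Magda and CKAN are both marked for federation, which the definitions describe as mapping multiple catalogs into one UI without repeating data collection. The Python sketch below shows the query side under one assumption: each catalog exposes a search function. Here those functions are in-memory stand-ins. The `make_catalog` and `federated_search` helpers and the `uri`/`name` record fields are hypothetical and do not come from Magda's or CKAN's APIs. Results are fetched from each source at query time, tagged with their origin, and de-duplicated by asset URI, so no metadata is copied into a central store.

```python
from typing import Callable, Dict, List

# A "catalog" here is any callable that takes a query and returns matching
# asset records; real catalogs would be remote APIs (hypothetical stand-ins).
Catalog = Callable[[str], List[dict]]


def make_catalog(assets: List[dict]) -> Catalog:
    """Wrap an in-memory asset list as a searchable catalog (illustration only)."""
    def search(query: str) -> List[dict]:
        q = query.lower()
        return [a for a in assets if q in a["name"].lower()]
    return search


def federated_search(catalogs: Dict[str, Catalog], query: str) -> List[dict]:
    """Query each catalog at request time and merge the results.

    Metadata stays in its source catalog; results are tagged with the
    catalogs they came from and de-duplicated by asset URI (first record wins).
    """
    merged: Dict[str, dict] = {}
    for source, search in catalogs.items():
        for asset in search(query):
            entry = merged.setdefault(asset["uri"], {**asset, "sources": []})
            entry["sources"].append(source)
    return sorted(merged.values(), key=lambda a: a["uri"])


if __name__ == "__main__":
    city = make_catalog([
        {"uri": "s3://bucket/roads.geojson", "name": "Roads"},
        {"uri": "pg://db/public.parcels", "name": "Parcels"},
    ])
    state = make_catalog([
        {"uri": "s3://bucket/roads.geojson", "name": "Roads (mirror)"},
        {"uri": "https://data.example.org/rail", "name": "Rail roads"},
    ])
    for hit in federated_search({"city": city, "state": state}, "roads"):
        print(hit["uri"], hit["sources"])
```

De-duplicating by URI is the simplest merge rule. Real federated catalogs may also need to reconcile conflicting metadata for the same asset.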



### OpenDataDiscovery

[Website](https://opendatadiscovery.org/) | [GitHub](https://github.com/opendatadiscovery/odd-platform)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/opendatadiscovery/odd-platform/commits/main)
![](https://img.shields.io/github/stars/opendatadiscovery/odd-platform.svg?style=social)

The first open-source data discovery and observability platform. The ODD Platform is based on the ODD Specification.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |

More features




- Strategy: Push/Pull
- UX personalization: No
- Rich data profiling: Yes
- Data collaboration: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Change notifications: Yes
- Change feed: Yes
- Metadata versioning: Yes
- SaaS: Yes
- Third-party integrations: dbt, Great Expectations, and Prefect
- Supported data sources: Airflow, Athena, AzureSQL, BigQuery, ClickHouse, Databricks, DeltaLake, Druid, DynamoDB, Fivetran, Glue, Hive, Kafka, Looker, MariaDB, MLflow, MSSQL, MySQL, Oracle, Postgres, Presto, Redash, Redpanda, Redshift, Snowflake, Tableau, and Vertica



### OpenMetadata

[Website](https://open-metadata.org/) | [GitHub](https://github.com/open-metadata/OpenMetadata)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/open-metadata/OpenMetadata/commits/main)
![](https://img.shields.io/github/stars/open-metadata/OpenMetadata.svg?style=social)

OpenMetadata is the all-in-one platform for data collaboration, discovery, governance, lineage, and quality that lets you focus on building and analyzing.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |

More features




- Strategy: Push/Pull
- UX personalization: No
- Rich data profiling: Yes
- Data collaboration: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Change notifications: Yes
- Change feed: Yes
- Metadata versioning: Yes
- SaaS: Yes
- Third-party integrations: dbt, Great Expectations, and Prefect
- Supported data sources: Airbyte, Airflow, Athena, AzureSQL, BigQuery, ClickHouse, Dagster, Databricks, DB2, DeltaLake, Druid, DynamoDB, Fivetran, Glue, Hive, Kafka, Looker, MariaDB, Metabase, MLflow, Mode, MSSQL, MySQL, NiFi, Oracle, Postgres, PowerBI, Presto, Redash, Redpanda, Redshift, Salesforce, SingleStore, Snowflake, Superset, Tableau, Trino, and Vertica



### Meta\#Grid
[Website](https://meta-grid.com/) | [GitHub](https://github.com/patschwork/meta_grid) | [Docs](https://docs.meta-grid.com)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/patschwork/meta_grid/graphs/commit-activity)
![](https://img.shields.io/github/stars/patschwork/meta_grid.svg?style=social)

Meta\#Grid is an open-source data catalog for metadata management. It is designed to help small and large organizations build an inventory of their data silos and connect different technologies. With its multi-client system and granular permissions, Meta\#Grid suits consulting companies (with diverse clients and projects) as well as data mesh organizations, and it scales as requirements grow.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | not yet | ❌ | ❌ | ❌ | ✔️ |

More features




- Strategy: Push, Pull
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: No
- Column statistics: No
- Data owner: Yes
- Top data users: No
- Change notifications: Yes
- Change feed: Yes
- Deployment:
- Supported data sources: Hive, Redshift, Druid, RDBMS, Presto, Snowflake



### Grai
[Website](https://grai.io/) | [GitHub](https://github.com/grai-io/grai-core) | [Docs](https://docs.grai.io)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/grai-io/grai-core/graphs/commit-activity)
![](https://img.shields.io/github/stars/grai-io/grai-core.svg?style=social)

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:----:|:---:|:---:|:--:|:---:|:--:|:---:|:--:|:---:|:--:|
| [Grai Schemas](https://github.com/grai-io/grai-core/tree/master/grai-schemas) | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |

More features




- Strategy: Push, Pull
- Customizable metadata model: Yes. The metadata model can be flexibly extended or modified as needed
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: No
- Column statistics: No
- Data owner: Yes
- Top data users: No
- CI Integration: Yes
- Lineage impact analysis: Yes
- Change notifications: Yes
- Change feed: Yes
- Automation: Yes
- UX personalization: Yes
- Deployment: docker-compose / Kubernetes with Helm, or the Grai SaaS offering
- Supported data sources:
  - Snowflake
  - BigQuery
  - Redshift
  - Postgres
  - MySQL
  - dbt
  - Slack
  - Many others; see the docs for a full list.





## 📕 Proprietary Data Catalogs


### Collibra

[Website](https://www.collibra.com) | [GitHub](https://github.com/collibra)

Collibra is an enterprise data catalog that helps you discover and understand the data that matters and derive impactful insights from it.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:--:|:---:|:--:|:---:|:-:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: Yes
- AI autowiring: ?
- Network-based: No
- Rich data profiling: ?
- Supported data sources:



### Informatica

[Website](https://www.informatica.com/) | [GitHub](https://github.com/InformaticaCloudApplicationIntegration)

Informatica is an enterprise data catalog that provides an AI-powered data discovery engine to scan and catalog data assets.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:--:|:--:|:---:|:--:|:---:|:--:|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ |

More features




- Strategy: Push
- UX personalization: ?
- AI autowiring: ?
- Network-based: Yes
- Rich data profiling: Yes
- Supported data sources:



### Alation

[Website](https://www.alation.com/) | [GitHub](https://github.com/Alation)

Alation is a collaborative data catalog that helps companies drive value and business impact from their data.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:--:|:--:|:---:|:---:|:---:|:---:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: Yes
- AI autowiring: No
- Network-based: No
- Rich data profiling: No
- Supported data sources:



### Atlan

[Website](https://atlan.com/) | [GitHub](https://github.com/atlanhq)

Atlan is a modern data catalog offering data discovery, data profiling, data quality, data lineage and data governance.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:---:|:---:|:---:|:---:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |

More features




- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: No
- Rich data profiling: ?
- Supported data sources: Presto, Deequ, Atlas, Airflow, Hudi



### DataGalaxy

[Website](https://www.datagalaxy.com/en-gb/home/) | [GitHub](https://github.com/datagalaxy-lab)

DataGalaxy is a modern data catalog offering data discovery, data profiling, data quality, data lineage and data governance.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:---:|:---:|:---:|:---:|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ? | ? |

More features




- Strategy: Pull & Push
- UX personalization: Yes
- AI autowiring: Yes
- Network-based: Yes
- Rich data profiling: Yes
- Supported data sources: [Available connectors](https://www.datagalaxy.com/fr/integrations-connecteurs/)



### Stemma

[Website](https://www.stemma.ai/)

Stemma is a fully managed data catalog powered by the open-source data catalog Amundsen that helps data teams have total trust in their data.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ? | ✔️ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Network-based: No
- Rich data profiling: No
- Supported data sources:



### Talend

[Website](https://www.talend.com/) | [GitHub](https://github.com/Talend)

Talend is a data catalog that helps enterprises power critical business decisions with trusted data.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Push
- UX personalization: Yes
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: Yes
- Supported data sources:



### Select Star

[Website](https://www.selectstar.com/)

Select Star is an intelligent data discovery platform that automatically analyzes and documents your data. It provides an easy-to-use data portal where anyone can find and understand data.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |

More features




- Strategy: Pull
- AI autowiring: Yes
- Network-based: Yes
- Rich data profiling: No
- ER Diagram generation: Yes
- Role & Policy based access control: Yes
- Popularity & usage: Yes
- Description & Tag propagation: Yes
- Data preview: Yes
- Data owners: Yes
- Top data users: Yes
- UX personalization: No
- Supported data sources:
  - Snowflake
  - BigQuery
  - Redshift
  - Postgres
  - Looker
  - PowerBI
  - Tableau
  - Mode
  - Sigma
  - Sisense
  - Metabase
  - Looker Studio
  - dbt Cloud & Core
  - Slack





## 📒 Monocloud Data Catalogs


### Google Cloud Data Catalog

[Website](https://cloud.google.com/data-catalog) | [GitHub](https://github.com/GoogleCloudPlatform)

Google Cloud Data Catalog is a fully managed, scalable metadata management service in Google Cloud's Data Analytics family of products.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: No
- Rich data profiling: No
- Supported data sources:



### Azure Data Catalog

[Website](https://azure.microsoft.com/en-us/services/data-catalog/)

Azure Data Catalog is a fully managed, enterprise-wide metadata catalog that makes data asset discovery straightforward.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources:



## 🔍 Data Observability Platforms


### DataKitchen Open Source Data Observability

[Website](https://docs.datakitchen.io/articles/#!open-source-data-observability/data-observability-overview)

DataKitchen's Open Source Data Observability products are full-featured and released under the Apache 2.0 license. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to fix problems, with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |

More features




- Strategy: Full-featured, with a UI for a single user; an Enterprise version is available for teams
- UX personalization: No
- AI autowiring: DataOps TestGen, a data quality verification tool that performs five main tasks: (1) data profiling, (2) new dataset screening and hygiene review, (3) AI/algorithmic generation of data quality validation tests, (4) ongoing production testing of new data refreshes, and (5) continuous periodic monitoring of datasets for anomalies
- Network-based: Data Journey based
- Rich data profiling: 51 characteristics, with UI
- Supported data sources: Snowflake, Redshift, Tableau, Synapse, Postgres, PowerBI, Airflow, Fivetran, Databricks, dbt, Azure Data Factory, SSIS, Synapse Pipelines, ADF-managed Airflow, Google Composer, AWS S3, Qlik Sense, Amazon Managed Workflows for Apache Airflow, Talend Cloud, Azure Functions (via Event Hub), Azure ADLS/Blob Storage (via Event Hub)
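The TestGen workflow described for DataKitchen runs in three steps: profile a dataset, generate validation tests from the profile, then test each new data refresh. The Python sketch below illustrates that general pattern. It is not TestGen's API, and it does not reproduce TestGen's 51 profiling characteristics or its test-generation logic. The three statistics, the check names, and the `null_tolerance` parameter are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    null_fraction: float
    minimum: float
    maximum: float
    distinct_count: int


def profile(values):
    """Compute basic statistics for a numeric column; None marks a null."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("column has no non-null values to profile")
    return ColumnProfile(
        null_fraction=(len(values) - len(non_null)) / len(values),
        minimum=min(non_null),
        maximum=max(non_null),
        distinct_count=len(set(non_null)),
    )


def generate_tests(baseline, null_tolerance=0.05):
    """Turn a baseline profile into named checks (simple, hypothetical heuristics)."""
    return {
        "max_null_fraction": lambda p: p.null_fraction <= baseline.null_fraction + null_tolerance,
        "min_in_range": lambda p: p.minimum >= baseline.minimum,
        "max_in_range": lambda p: p.maximum <= baseline.maximum,
    }


def run_tests(tests, values):
    """Profile a new data refresh and return the sorted names of failing checks."""
    current = profile(values)
    return sorted(name for name, check in tests.items() if not check(current))


if __name__ == "__main__":
    baseline = profile([10, 12, None, 15, 11])
    tests = generate_tests(baseline)
    print(run_tests(tests, [None, None, 9, 14]))  # ['max_null_fraction', 'min_in_range']
```

Fixed min/max bounds are deliberately strict here. A practical generator would widen them or learn them from several refreshes so that normal variation does not raise alerts.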


### Monte Carlo

[Website](https://www.montecarlodata.com/)

Monte Carlo is a data observability tool that helps increase trust in data by eliminating or preventing data downtime.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |

More features




- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources: Snowflake, Hive, Kafka, Looker, Redshift, Tableau, BigQuery, Airflow, Fivetran, Presto, Mode, Periscope, Databricks, Glue, dbt, Chartio, Spark, AWS, S3, data.world, Google Cloud Platform



### Databand

[Website](https://databand.ai/) | [GitHub](https://github.com/databand-ai/)

Databand is an observability platform that helps data engineers identify and troubleshoot pipeline issues and data quality problems.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ | ? | ? |

More features




- Strategy: Push
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources:



### Datafold

[Website](https://www.datafold.com/) | [GitHub](https://github.com/datafold)

Datafold is a data monitoring and observability platform that gives you confidence in your data quality through diffs, profiling, and anomaly detection.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:--:|:--:|:--:|:--:|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ? | ? |

More features




- Strategy: Push
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources:
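Datafold's description mentions diffs. The Python sketch below is a rough illustration of what a data diff computes; it is not Datafold's implementation. It matches rows from two versions of a table by primary key and reports added rows, removed rows, and per-column before/after values for changed rows. The `diff_tables` name and the list-of-dicts row format are hypothetical. Production tools run such comparisons against warehouse tables at far larger scale.

```python
def diff_tables(before, after, key):
    """Compare two versions of a table, each a list of row dicts, matched on `key`.

    Returns the keys of added and removed rows, plus a mapping from each
    changed row's key to {column: (old_value, new_value)}.
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        # Compare every column seen in either version; a missing column reads as None.
        columns = old[k].keys() | new[k].keys()
        deltas = {
            col: (old[k].get(col), new[k].get(col))
            for col in sorted(columns)
            if old[k].get(col) != new[k].get(col)
        }
        if deltas:
            changed[k] = deltas
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    before = [
        {"id": 1, "status": "open", "amount": 10},
        {"id": 2, "status": "open", "amount": 5},
        {"id": 3, "status": "closed", "amount": 7},
    ]
    after = [
        {"id": 1, "status": "open", "amount": 10},
        {"id": 2, "status": "closed", "amount": 5},
        {"id": 4, "status": "open", "amount": 3},
    ]
    print(diff_tables(before, after, key="id"))
    # {'added': [4], 'removed': [3], 'changed': {2: {'status': ('open', 'closed')}}}
```

Matching on a primary key is what separates a data diff from a line-based text diff: reordered rows do not show up as changes.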



### Ataccama

[Website](https://www.ataccama.com/) | [GitHub](https://github.com/ataccama)

Ataccama is an enterprise data catalog and observability tool featuring data profiling and data quality management, designed for data professionals.

|Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
|:--:|:---:|:---:|:---:|:--:|:---:|:--:|:---:|:---:|:---:|:---:|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |

More features




- Strategy: Pull
- UX personalization: Yes
- AI autowiring: No
- Network-based: No
- Rich data profiling: Yes
- Supported data sources:

