An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-catalog

A curated list of projects in awesome lists tagged with data-catalog.

https://github.com/datahub-project/datahub

The Metadata Platform for your Data and AI Stack

data-catalog data-discovery data-governance datahub metadata

Last synced: 30 Apr 2025

https://github.com/linkedin/WhereHows

The Metadata Platform for your Data Stack

data-catalog data-discovery datahub hacktoberfest linkedin metadata

Last synced: 16 Feb 2025

https://github.com/linkedin/datahub

The Metadata Platform for your Data Stack

data-catalog data-discovery datahub hacktoberfest linkedin metadata

Last synced: 21 Jan 2025

https://github.com/open-metadata/openmetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake

Last synced: 30 Apr 2025

https://github.com/open-metadata/OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake

Last synced: 15 Mar 2025

https://github.com/amundsen-io/amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

amundsen data-catalog data-discovery linuxfoundation metadata

Last synced: 22 Apr 2025

https://github.com/apache/gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog opendatacatalog skycomputing stratosphere

Last synced: 28 Apr 2025

https://github.com/apache/Gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog opendatacatalog skycomputing stratosphere

Last synced: 22 Jan 2025

https://github.com/intake/intake

Intake is a lightweight package for finding, investigating, loading and disseminating data.

data-access data-catalog python

Last synced: 01 Apr 2025

https://github.com/hyperqueryhq/whale

🐳 The stupidly simple CLI workspace for your data warehouse.

data-catalog data-discovery data-documentation

Last synced: 10 Feb 2025

https://github.com/rsyi/whale

🐳 The stupidly simple CLI workspace for your data warehouse.

data-catalog data-discovery data-documentation

Last synced: 22 Nov 2024

https://github.com/gabledata/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 27 Apr 2025

https://github.com/recap-build/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 13 Dec 2024

https://github.com/tokern/piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub.

aws-athena aws-glue aws-redshift catalog data data-catalog database phi pii python snowflake

Last synced: 12 Apr 2025

https://github.com/raystack/meteor

Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.

bigdata collector data-catalog data-management dataops extractors metadata scraper sinks

Last synced: 09 Apr 2025

https://github.com/googlecloudplatform/bigquery-data-lineage

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

bigdata bigquery data-catalog data-governance data-lineage data-management dataflow zetasql

Last synced: 05 Feb 2025

https://github.com/intake/intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.

cesm-lens climate-datasets cmip6 data-access data-catalog earth-system-model hacktoberfest intake pangeo

Last synced: 27 Nov 2024

https://github.com/Bayer-Group/COLID-Documentation

The documentation repository is part of the Corporate Linked Data Catalog - short: COLID - application.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 18 Jan 2025

https://github.com/bayer-group/colid-documentation

The documentation repository is part of the Corporate Linked Data Catalog - short: COLID - application.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 10 Apr 2025

https://github.com/googlecloudplatform/datacatalog-connectors-bi

Sample code with integration between Data Catalog and BI data sources.

data-catalog datacatalog gcp looker looker-sdk metadata metadata-management python qlik qlik-sense tableau

Last synced: 22 Jan 2025

https://github.com/tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

bigquery data-catalog data-discovery data-engineering data-governance data-lineage data-management data-ops dbt gcs lineage redash s3 sql

Last synced: 19 Nov 2024

https://github.com/carte-data/carte

A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.

carte data-catalog data-discovery data-documentation lightweight-data-catalogs python-library

Last synced: 04 Dec 2024

https://github.com/aaronspring/remote_climate_data

A collection of remote climate data accessed via intake and cached to disk.

accessibility climate-data climate-science data-catalog netcdf observations opendap remote shapefiles thredds-catalogs

Last synced: 24 Apr 2025

https://github.com/darenasc/aeda

Build a data catalog by running a single line of code

data-catalog data-exploration database eda metadata metadata-extraction

Last synced: 16 Mar 2025

https://github.com/bayer-group/colid-setup

The setup repository is part of the Corporate Linked Data Catalog - short: COLID - application. It helps set up a local environment based on Docker Compose.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/googlecloudplatform/datacatalog-tag-history

Historical metadata of your data warehouse is a treasure trove to discover not just insights about changing data patterns, but also quality and user behaviour. This solution creates Data Catalog Tags history in BigQuery since Data Catalog keeps only the latest version of metadata for fast searchability.

analytics bigquery data-catalog data-governance metadata-management

Last synced: 22 Jan 2025

https://github.com/slaclab/datacat

A system for managing files and file replicas across many diverse sites

data-catalog data-discovery datacat dataset dataset-catalog metadata metadata-store

Last synced: 14 Apr 2025

https://github.com/bayer-group/colid-data-marketplace-frontend

The Data Marketplace frontend repository is part of the Corporate Linked Data Catalog - short: COLID - application. Users can search for registered resources in COLID. It provides a search bar, aggregation filters, and a search result display with term highlighting.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/zillow/intake-nested-yaml-catalog

Supports a single YAML file hierarchical catalog to organize datasets and avoid a data swamp.

data-access data-catalog intake python

Last synced: 23 Apr 2025

https://github.com/bayer-group/colid-appdata-service

The appdata service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It maintains the user data and application settings.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/sahays/serverless-analytics

AWS Serverless Analytics using Amazon S3, Athena, Glue, and QuickSight

athena aws-cli data-catalog dataset glue quicksight transform-data visualization

Last synced: 04 Dec 2024

https://github.com/bayer-group/colid-reporting-service

The reporting service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It offers an API for statistics of registered resources.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/bayer-group/colid-editor-frontend

The editor frontend repository is part of the Corporate Linked Data Catalog - short: COLID - application. It offers users a metadata-based user interface to register resources in COLID.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/bayer-group/colid-search-service

The search service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It makes the data findable and provides indexing and search functionalities based on Elasticsearch.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/bayer-group/colid-indexing-crawler-service

The Indexing Crawler Service (ICS) repository is part of the Corporate Linked Data Catalog - short: COLID - application. It is responsible for extracting data from an RDF storage system, transforming and enriching the data, and finally sending it via a message queue to the DMP Webservice for indexing.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/bayer-group/colid-registration-service

The registration service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It is the central microservice to register resources in the triplestore.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/bayer-group/colid-scheduler-service

The scheduler service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It sets up recurring jobs for user notifications and analytics.

cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore

Last synced: 12 Apr 2025

https://github.com/blw-ofag-ufag/data-catalog

An MVP data catalog for the Federal Office for Agriculture (FOAG)

agriculture data-catalog dcat json-schema

Last synced: 23 Mar 2025

https://github.com/open-metadata/openmetadata-sdk

OpenMetadata client SDK. Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

data-catalog datacatalog metadata

Last synced: 11 Mar 2025

https://github.com/csiro-enviro-informatics/hbee

An Approximate Deep Spatial Catalog and Search

data-catalog h3 postgis spatial-database spatial-index

Last synced: 13 Mar 2025

https://github.com/lasyakonduru/superstore-sales-data-analysis

Analysis of sales performance and operational efficiency in a superstore using AWS Athena and QuickSight

amazon-web-services athena aws-glue aws-quicksight aws-s3 data-catalog iam-user

Last synced: 22 Mar 2025

https://github.com/aminekaabachi/lexy

📙 Lexy enables you to easily build and share data dictionaries to explain and document your data terminology using code.

data-catalog data-dictionaries data-dictionary documentation pandas pyspark

Last synced: 27 Mar 2025

https://github.com/himanshub16/lekhpal

Monitor and catalog Twitter feed matching your desired keywords

analytics data data-catalog data-filtering mongodb twitter twitter-streaming-api

Last synced: 21 Feb 2025

https://github.com/ev2900/datazone_demo

Prebuilt demo of Amazon DataZone using fake data for Pharmaceutical drug discovery

aws bussiness-data-catalog data-catalog datazone

Last synced: 10 Apr 2025

https://github.com/hackolade/glue

Hackolade (https://hackolade.com) plugin for AWS Glue Data Catalog

aws-glue data-catalog data-modeling data-models entity-relationship-diagram er-diagram glue hive nosql schema-design

Last synced: 30 Apr 2025

https://github.com/ncar/esmcol-validator

A utility for validating esm-collection json files against the esm-collection-spec: https://github.com/NCAR/esm-collection-spec

data-catalog intake-esm xdev

Last synced: 08 Mar 2025