Projects in Awesome Lists tagged with data-catalog
A curated list of projects in awesome lists tagged with data-catalog.
https://github.com/datahub-project/datahub
The Metadata Platform for your Data and AI Stack
data-catalog data-discovery data-governance datahub metadata
Last synced: 30 Apr 2025
https://github.com/linkedin/WhereHows
The Metadata Platform for your Data Stack
data-catalog data-discovery datahub hacktoberfest linkedin metadata
Last synced: 16 Feb 2025
https://github.com/linkedin/datahub
The Metadata Platform for your Data Stack
data-catalog data-discovery datahub hacktoberfest linkedin metadata
Last synced: 21 Jan 2025
https://github.com/open-metadata/openmetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake
Last synced: 30 Apr 2025
https://github.com/open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake
Last synced: 15 Mar 2025
https://github.com/amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
amundsen data-catalog data-discovery linuxfoundation metadata
Last synced: 22 Apr 2025
https://github.com/apache/gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog opendatacatalog skycomputing stratosphere
Last synced: 28 Apr 2025
https://github.com/opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
alerting bigdata data-catalog data-discovery data-engineering data-exploration data-governance data-lineage data-observability data-pipelines data-platform data-profiling data-quality data-science datacatalog lineage metadata metadata-management observability oss
Last synced: 12 Apr 2025
https://github.com/apache/Gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog opendatacatalog skycomputing stratosphere
Last synced: 22 Jan 2025
https://github.com/intake/intake
Intake is a lightweight package for finding, investigating, loading and disseminating data.
data-access data-catalog python
Last synced: 01 Apr 2025
https://github.com/hyperqueryhq/whale
🐳 The stupidly simple CLI workspace for your data warehouse.
data-catalog data-discovery data-documentation
Last synced: 10 Feb 2025
https://github.com/rsyi/whale
🐳 The stupidly simple CLI workspace for your data warehouse.
data-catalog data-discovery data-documentation
Last synced: 22 Nov 2024
https://github.com/gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 27 Apr 2025
https://github.com/recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 13 Dec 2024
https://github.com/tokern/piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub.
aws-athena aws-glue aws-redshift catalog data data-catalog database phi pii python snowflake
Last synced: 12 Apr 2025
https://github.com/raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
bigdata collector data-catalog data-management dataops extractors metadata scraper sinks
Last synced: 09 Apr 2025
https://github.com/googlecloudplatform/bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
bigdata bigquery data-catalog data-governance data-lineage data-management dataflow zetasql
Last synced: 05 Feb 2025
https://github.com/intake/intake-esm
An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
cesm-lens climate-datasets cmip6 data-access data-catalog earth-system-model hacktoberfest intake pangeo
Last synced: 27 Nov 2024
https://github.com/aws-samples/aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
amazon-emr data-analytics data-catalog data-lake data-transformation emr-cluster glue hive-metastore ingest-data
Last synced: 13 Nov 2024
https://github.com/googlecloudplatform/datacatalog-connectors-rdbms
Sample code with integration between Data Catalog and RDBMS data sources.
data-catalog database-management datacatalog datacatalog-connectors-rdbms gcp greenplum metadata-extraction metadata-management mysql oracle postgresql python rdbms redshift sqlserver teradata vertica
Last synced: 22 Jan 2025
https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms
Sample code with integration between Data Catalog and RDBMS data sources.
data-catalog database-management datacatalog datacatalog-connectors-rdbms gcp greenplum metadata-extraction metadata-management mysql oracle postgresql python rdbms redshift sqlserver teradata vertica
Last synced: 04 Dec 2024
https://github.com/google/grizzly
End-to-end DataOps platform deployed by Terraform.
airflow bigquery cloud-sql cloud-storage composer data-catalog data-lineage data-loss-prevention dataflow dataops dataops-platform gcp git google-cloud google-cloud-platform pubsub spanner terraform
Last synced: 27 Apr 2025
https://github.com/Bayer-Group/COLID-Documentation
The documentation repository is part of the Corporate Linked Data Catalog - short: COLID - application.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 18 Jan 2025
https://github.com/bayer-group/colid-documentation
The documentation repository is part of the Corporate Linked Data Catalog - short: COLID - application.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.
bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake
Last synced: 10 Apr 2025
https://github.com/googlecloudplatform/datacatalog-connectors-bi
Sample code with integration between Data Catalog and BI data sources.
data-catalog datacatalog gcp looker looker-sdk metadata metadata-management python qlik qlik-sense tableau
Last synced: 22 Jan 2025
https://github.com/tosh2230/stairlight
A data lineage tool that detects table dependencies from rendered SQL statements.
bigquery data-catalog data-discovery data-engineering data-governance data-lineage data-management data-ops dbt gcs lineage redash s3 sql
Last synced: 19 Nov 2024
https://github.com/carte-data/carte
A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.
carte data-catalog data-discovery data-documentation lightweight-data-catalogs python-library
Last synced: 04 Dec 2024
https://github.com/odpi/egeria-docs
Documentation repository for the Egeria project.
data-catalog data-catalogue data-engineering documentation egeria metadata metadata-api metadata-editor metadata-extraction metadata-management metadata-parser open-metadata open-source
Last synced: 23 Feb 2025
https://github.com/aaronspring/remote_climate_data
A collection of remote climate data accessed via intake and cached to disk.
accessibility climate-data climate-science data-catalog netcdf observations opendap remote shapefiles thredds-catalogs
Last synced: 24 Apr 2025
https://github.com/related-sciences/articat
articat: data artifact catalog
data-catalog data-discovery data-management data-platform
Last synced: 15 Nov 2024
https://github.com/darenasc/aeda
Build a data catalog by running a single line of code
data-catalog data-exploration database eda metadata metadata-extraction
Last synced: 16 Mar 2025
https://github.com/bayer-group/colid-setup
The setup repository is part of the Corporate Linked Data Catalog - short: COLID - application. It helps set up a local environment based on Docker Compose.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/open-metadata/openmetadata-site
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
automation bigdata bigdataanalytics data-catalog data-discovery data-observability data-profiling data-quality-monitoring data-science datadiscovery dataengineering dataquality datascience dbt governance hacktoberfest hacktoberfest2022 metadata metadata-api metadata-management
Last synced: 14 Apr 2025
https://github.com/unytics/catalog_builder
Data Catalogs Made Easy
bigquery data-catalog data-discovery databricks dbt redshift snowflake
Last synced: 12 Apr 2025
https://github.com/googlecloudplatform/datacatalog-tag-history
Historical metadata of your data warehouse is a treasure trove to discover not just insights about changing data patterns, but also quality and user behaviour. This solution creates Data Catalog Tags history in BigQuery since Data Catalog keeps only the latest version of metadata for fast searchability.
analytics bigquery data-catalog data-governance metadata-management
Last synced: 22 Jan 2025
https://github.com/cdcgov/cdh-lava-react
CDC Data Hub Lifecycle, Analysis & Visualization Accelerator (LAVA) REACT Components based on machine readable requirements.
agile-development azure data-analysis data-catalog data-governance data-quality data-science data-visualization databricks datavisualization devops excel-export metadata operations powerautomate powerbi pyspark security sql test-automation
Last synced: 22 Apr 2025
https://github.com/apache/airavata-data-catalog
Apache Airavata Data Catalog
airavata apache data-catalog metadata schema search
Last synced: 28 Feb 2025
https://github.com/slaclab/datacat
A system for managing files and file replicas across many diverse sites
data-catalog data-discovery datacat dataset dataset-catalog metadata metadata-store
Last synced: 14 Apr 2025
https://github.com/opendatadiscovery/odd-collectors
data-catalog data-governance data-observability
Last synced: 12 Apr 2025
https://github.com/bayer-group/colid-data-marketplace-frontend
The Data Marketplace frontend repository is part of the Corporate Linked Data Catalog - short: COLID - application. Users can search for registered resources in COLID. It provides a search bar, aggregation filters and a search result display with term highlighting.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/zillow/intake-nested-yaml-catalog
Supports a single YAML file hierarchical catalog to organize datasets and avoid a data swamp.
data-access data-catalog intake python
Last synced: 23 Apr 2025
https://github.com/bayer-group/colid-appdata-service
The appdata service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It maintains the user data and application settings.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/sahays/serverless-analytics
AWS Serverless Analytics using Amazon S3, Athena, Glue, and QuickSight
athena aws-cli data-catalog dataset glue quicksight transform-data visualization
Last synced: 04 Dec 2024
https://github.com/bayer-group/colid-reporting-service
The reporting service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It offers an API for statistics of registered resources.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/bayer-group/colid-editor-frontend
The editor frontend repository is part of the Corporate Linked Data Catalog - short: COLID - application. It offers users a metadata-based user interface to register resources in COLID.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/bayer-group/colid-search-service
The search service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It makes the data findable and provides indexing and search functionalities based on Elasticsearch.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/bayer-group/colid-indexing-crawler-service
The Indexing Crawler Service (ICS) repository is part of the Corporate Linked Data Catalog - short: COLID - application. It is responsible for extracting data from an RDF storage system, transforming and enriching the data, and finally sending it via a message queue to the DMP Webservice for indexing.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/bayer-group/colid-registration-service
The registration service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It is the central microservice to register resources in the triplestore.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/bayer-group/colid-scheduler-service
The scheduler service repository is part of the Corporate Linked Data Catalog - short: COLID - application. It sets up recurring jobs for user notifications and analytics.
cloud-native colid data-catalog data-catalogue elasticsearch fair fair-data findable linked-data rdf shacl triplestore
Last synced: 12 Apr 2025
https://github.com/blw-ofag-ufag/data-catalog
An MVP data catalog for the Federal Office for Agriculture (FOAG)
agriculture data-catalog dcat json-schema
Last synced: 23 Mar 2025
https://github.com/open-metadata/openmetadata-sdk
OpenMetadata client SDK. Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
data-catalog datacatalog metadata
Last synced: 11 Mar 2025
https://github.com/csiro-enviro-informatics/hbee
An Approximate Deep Spatial Catalog and Search
data-catalog h3 postgis spatial-database spatial-index
Last synced: 13 Mar 2025
https://github.com/lasyakonduru/superstore-sales-data-analysis
Analysis of sales performance and operational efficiency in a superstore using AWS Athena and QuickSight
amazon-web-services athena aws-glue aws-quicksight aws-s3 data-catalog iam-user
Last synced: 22 Mar 2025
https://github.com/alankrantas/loc-documentation
LOC Documentation (archived) for FST Network's Logic Operating Centre (LOC)
c-sharp cloud data-catalog data-integration data-mesh data-pipeline documentation docusaurus fst-network javascript reactjs rollupjs saas serverless startup typescript
Last synced: 13 Mar 2025
https://github.com/aminekaabachi/lexy
📙 Lexy enables you to easily build and share data dictionaries to explain and document your data terminology using code.
data-catalog data-dictionaries data-dictionary documentation pandas pyspark
Last synced: 27 Mar 2025
https://github.com/himanshub16/lekhpal
Monitor and catalog Twitter feed matching your desired keywords
analytics data data-catalog data-filtering mongodb twitter twitter-streaming-api
Last synced: 21 Feb 2025
https://github.com/ev2900/datazone_demo
Prebuilt demo of Amazon DataZone using fake data for Pharmaceutical drug discovery
aws bussiness-data-catalog data-catalog datazone
Last synced: 10 Apr 2025
https://github.com/hackolade/glue
Hackolade (https://hackolade.com) plugin for AWS Glue Data Catalog
aws-glue data-catalog data-modeling data-models entity-relationship-diagram er-diagram glue hive nosql schema-design
Last synced: 30 Apr 2025
https://github.com/ncar/esmcol-validator
A utility for validating esm-collection json files against the esm-collection-spec: https://github.com/NCAR/esm-collection-spec
Last synced: 08 Mar 2025