Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
bigdata collector data-catalog data-management dataops extractors metadata scraper sinks
Last synced: 01 Jul 2024
https://github.com/intake/intake
Intake is a lightweight package for finding, investigating, loading and disseminating data.
data-access data-catalog python
Last synced: 29 Jun 2024
https://github.com/datahub-project/datahub
The Metadata Platform for your Data Stack
data-catalog data-discovery datahub hacktoberfest linkedin metadata
Last synced: 28 Jun 2024
https://github.com/opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
awesome awesome-list big-data data-catalog data-discovery data-engineering data-quality datacatalog datadiscovery dataops metadata metadata-management ml observability open-source opendata opensource oss
Last synced: 16 Jun 2024
https://github.com/datastrato/gravitino
World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog skycomputing stratosphere
Last synced: 07 Jun 2024
https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms
Sample code with integration between Data Catalog and RDBMS data sources.
data-catalog database-management datacatalog datacatalog-connectors-rdbms gcp greenplum metadata-extraction metadata-management mysql oracle postgresql python rdbms redshift sqlserver teradata vertica
Last synced: 03 Jun 2024
https://github.com/opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
alerting bigdata data-catalog data-discovery data-engineering data-exploration data-governance data-lineage data-observability data-pipelines data-platform data-profiling data-quality data-science datacatalog lineage metadata metadata-management observability oss
Last synced: 02 Jun 2024
https://github.com/sahays/serverless-analytics
AWS Serverless Analytics using Amazon S3, Athena, Glue, and QuickSight
athena aws-cli data-catalog dataset glue quicksight transform-data visualization
Last synced: 27 May 2024
https://github.com/carte-data/carte
A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.
carte data-catalog data-discovery data-documentation lightweight-data-catalogs python-library
Last synced: 27 May 2024
https://github.com/intake/intake-esm
An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
cesm-lens climate-datasets cmip6 data-access data-catalog earth-system-model hacktoberfest intake pangeo
Last synced: 09 May 2024
https://github.com/open-metadata/OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datacatalog datadiscovery dataengineering dataquality dbt metadata metadata-management snowflake
Last synced: 22 Apr 2024
https://github.com/related-sciences/articat
articat: data artifact catalog
data-catalog data-discovery data-management data-platform
Last synced: 16 Apr 2024
https://github.com/getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery, with definitions imported from Collibra, Datahub, ODD and the like.
bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake
Last synced: 11 Apr 2024
https://github.com/google/grizzly
End-to-end DataOps platform deployed by Terraform.
airflow bigquery cloud-sql cloud-storage composer data-catalog data-lineage data-loss-prevention dataflow dataops dataops-platform gcp git google-cloud google-cloud-platform pubsub spanner terraform
Last synced: 01 Apr 2024
https://github.com/recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 01 Apr 2024
https://github.com/rsyi/whale
🐳 The stupidly simple CLI workspace for your data warehouse.
data-catalog data-discovery data-documentation
Last synced: 31 Mar 2024
https://github.com/amundsen-io/amundsen
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
amundsen data-catalog data-discovery linuxfoundation metadata
Last synced: 23 Mar 2024
https://github.com/aws-samples/aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
amazon-emr data-analytics data-catalog data-lake data-transformation emr-cluster glue hive-metastore ingest-data
Last synced: 21 Mar 2024
https://github.com/tokern/piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub.
aws-athena aws-glue aws-redshift catalog data data-catalog database phi pii python snowflake
Last synced: 20 Mar 2024