An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-discovery

A curated list of projects in awesome lists tagged with data-discovery.

https://github.com/datahub-project/datahub

The Metadata Platform for your Data and AI Stack

data-catalog data-discovery data-governance datahub metadata

Last synced: 08 May 2026

https://github.com/open-metadata/openmetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake

Last synced: 22 Feb 2026

https://github.com/open-metadata/OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake

Last synced: 15 Mar 2025

https://github.com/amundsen-io/amundsen

Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.

amundsen data-catalog data-discovery linuxfoundation metadata

Last synced: 13 May 2025

https://github.com/reata/sqllineage

SQL Lineage Analysis Tool powered by Python

data-discovery data-governance data-lineage lineage metadata sql

Last synced: 14 May 2025

https://github.com/nasa/earthdata-search

Earthdata Search is a web application developed by NASA EOSDIS to enable data discovery, search, comparison, visualization, and access across EOSDIS' Earth Science data holdings.

data-discovery earthdata-search eosdis hacktoberfest

Last synced: 26 Jan 2026

https://github.com/rsyi/whale

🐳 The stupidly simple CLI workspace for your data warehouse.

data-catalog data-discovery data-documentation

Last synced: 18 Feb 2026

https://github.com/marmotdata/marmot

Marmot helps teams discover, understand, and leverage their data with powerful search and lineage visualisation tools. It's designed to make data accessible for everyone.

bigdata data-catalog data-collaboration data-discovery data-exploration data-governance data-lineage data-observability datacatalog datadiscovery dataengineering lineage mcp mcp-server metadata

Last synced: 09 Apr 2026

https://github.com/swhl/ai-competition-collections

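A collection of AI competition experience posts and training and testing tips (gathering and organizing experience write-ups from a wide range of AI competitions)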

competition cv data-discovery graph-neural-networks knowledge-graph nlp recommender-system speech

Last synced: 16 May 2025

https://github.com/gabledata/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 11 Mar 2026

https://github.com/mpostol/opc-ua-ooi

Object Oriented Internet - C# deliverables supporting a new Machine To Machine (M2M) communication architecture

c-sharp communication data-discovery data-oriented-architecture ebook iiot internet iot m2m metadata networking ooi opc-ua opcua publish-subscribe semantic-data

Last synced: 23 Mar 2025

https://github.com/decisionbox-io/decisionbox-platform

DecisionBox connects to your data warehouse, runs autonomous AI agents that write and execute SQL, and surfaces validated insights and actionable recommendations — without you asking a single question.

ai ai-agents analytics aws bigquery data-discovery data-warehouse gcp golang kubernetes llm nextjs open-source redshift

Last synced: 15 Apr 2026

https://github.com/commondataio/dataportals-registry

Registry of data portals, catalogs, and data repositories, including dataset and catalog description standards

data-catalog data-discovery data-portal data-repository dataset datasets open-data opendata registry

Last synced: 03 Mar 2026

https://github.com/ondata/ckan-mcp-server

MCP server for querying CKAN open data portals (package search, DataStore SQL, organizations, groups, tags)

ai-tools api-client civic-tech ckan ckan-api claude cloudflare-workers data-discovery data-portal datastore government-data mcp model-context-protocol nodejs open-data public-data solr typescript

Last synced: 14 Apr 2026

https://github.com/tsegall/fta

Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support.

data-discovery data-profiler data-profiling date java metadata semantic-type-detection semantic-typechecking semantic-types

Last synced: 11 May 2026

https://github.com/carte-data/carte

A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.

carte data-catalog data-discovery data-documentation lightweight-data-catalogs python-library

Last synced: 29 Jul 2025

https://github.com/tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

bigquery data-catalog data-discovery data-engineering data-governance data-lineage data-management data-ops dbt gcs lineage redash s3 sql

Last synced: 16 May 2025

https://github.com/karrlab/datanator

Toolkit for discovering and aggregating data for whole-cell modeling

cells data-aggregation data-discovery data-integration mathematical-modeling systems-biology

Last synced: 02 Sep 2025

https://github.com/worldbank/wb-nlp-apps

This repository contains the NLP modeling components and web application implementations of a project for knowledge and data discovery funded by the Knowledge for Change Program (KCP) and the Joint Data Center on Forced Displacement (JDC).

data-discovery lda machine-learning nlp python topic-modeling word2vec

Last synced: 02 Sep 2025

https://github.com/slaclab/datacat

A system for managing files and file replicas across many diverse sites

data-catalog data-discovery datacat dataset dataset-catalog metadata metadata-store

Last synced: 14 Apr 2025

https://github.com/michalporeba/odis

Search in decentralised systems. Search federation, result moderation, aggregation and feedback, with hypermedia in a ReSTful API to round it all off.

data data-discovery discoverability federated information-discovery mesh-networks search

Last synced: 18 Jan 2026

https://github.com/raoumer/dwx

Deep Web Extractor (DWX): a system that uses statistical machine learning models for crawling and data discovery from the Deep Web (i.e., the massive, high-quality portion of the World Wide Web) to build knowledge-based databases.

data-discovery data-science data-visualization machine-learning python

Last synced: 09 Mar 2026

https://github.com/andykee/aurora

A lightweight tool for indexing, cataloging, and browsing data.

catalog data data-catalog data-discovery indexing metadata metadata-extraction search-and-discovery

Last synced: 17 Jan 2026

https://github.com/kornev/kornev-gin-app

A JSON-RPC 2.0 operation-based API for Hive Metastore that uses PostgreSQL, Hive JDBC, and HDFS.

big-data clojure data-catalog data-discovery hdfs hive metadata

Last synced: 12 Jan 2026

https://github.com/hugozanini/open-metadata-cursor-extension

AI-powered data discovery extension for Cursor & VS Code with natural language search and interactive lineage visualization

ai cursor cursor-extension data-discovery data-lineage gemini open-metadata vscode-extension

Last synced: 18 May 2026

https://github.com/ywatanabe1989/scitex-dataset

Multi-domain scientific dataset fetcher — neuroscience, biology, pharmacology, medical. Part of SciTeX.

ai-research bids dandi data-discovery datasets eeg mcp mcp-server metadata mri neuroimaging neuroscience nwb openneuro physionet python research-automation scientific-data scitex zenodo

Last synced: 30 Apr 2026

https://github.com/tjas/postgrad-ai-ddv-plotly

Jupyter Notebook that analyzes the salaries of Federal District government public servants, using Python, Pandas, and Plotly Express, to solve the exercise proposed in the "Data Discovery and Visualization" course.

analysis analytics data data-analytics data-discovery data-science data-visualization graph graphs jupyter-notebook jupyter-notebooks pandas plotly plotly-express python

Last synced: 07 May 2026