Projects in Awesome Lists tagged with data-discovery
A curated list of projects in awesome lists tagged with data-discovery.
https://github.com/eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
applied-data-science applied-machine-learning computer-vision data-discovery data-engineering data-quality data-science deep-learning machine-learning natural-language-processing production recsys reinforcement-learning search
Last synced: 17 Mar 2025
https://github.com/datahub-project/datahub
The Metadata Platform for your Data and AI Stack
data-catalog data-discovery data-governance datahub metadata
Last synced: 08 May 2026
https://github.com/open-metadata/openmetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake
Last synced: 22 Feb 2026
https://github.com/open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake
Last synced: 15 Mar 2025
https://github.com/amundsen-io/amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
amundsen data-catalog data-discovery linuxfoundation metadata
Last synced: 13 May 2025
https://github.com/marquezproject/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
data-dictionary data-discovery data-ecosystem-metadata data-governance data-lineage data-ops data-provenance marquez metadata metadata-service
Last synced: 13 May 2025
https://marquezproject.github.io/marquez/
Collect, aggregate, and visualize a data ecosystem's metadata
data-dictionary data-discovery data-ecosystem-metadata data-governance data-lineage data-ops data-provenance marquez metadata metadata-service
Last synced: 05 May 2025
https://github.com/MarquezProject/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
data-dictionary data-discovery data-ecosystem-metadata data-governance data-lineage data-ops data-provenance marquez metadata metadata-service
Last synced: 27 Mar 2025
https://github.com/reata/sqllineage
SQL Lineage Analysis Tool powered by Python
data-discovery data-governance data-lineage lineage metadata sql
Last synced: 14 May 2025
https://github.com/opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
alerting bigdata data-catalog data-discovery data-engineering data-exploration data-governance data-lineage data-observability data-pipelines data-platform data-profiling data-quality data-science datacatalog lineage metadata metadata-management observability oss
Last synced: 02 Apr 2026
https://github.com/nasa/earthdata-search
Earthdata Search is a web application developed by NASA EOSDIS to enable data discovery, search, comparison, visualization, and access across EOSDIS' Earth Science data holdings.
data-discovery earthdata-search eosdis hacktoberfest
Last synced: 26 Jan 2026
https://github.com/rsyi/whale
🐳 The stupidly simple CLI workspace for your data warehouse.
data-catalog data-discovery data-documentation
Last synced: 18 Feb 2026
https://github.com/marmotdata/marmot
Marmot helps teams discover, understand, and leverage their data with powerful search and lineage visualisation tools. It's designed to make data accessible for everyone.
bigdata data-catalog data-collaboration data-discovery data-exploration data-governance data-lineage data-observability datacatalog datadiscovery dataengineering lineage mcp mcp-server metadata
Last synced: 09 Apr 2026
https://github.com/swhl/ai-competition-collections
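A collection of AI competition experience posts and training and testing tips (gathering and organizing experience posts from various AI competitions)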
competition cv data-discovery graph-neural-networks knowledge-graph nlp recommender-system speech
Last synced: 16 May 2025
https://github.com/gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 11 Mar 2026
https://github.com/mpostol/opc-ua-ooi
Object Oriented Internet - C# deliverables supporting a new Machine To Machine (M2M) communication architecture
c-sharp communication data-discovery data-oriented-architecture ebook iiot internet iot m2m metadata networking ooi opc-ua opcua publish-subscribe semantic-data
Last synced: 23 Mar 2025
https://github.com/opendatadiscovery/opendatadiscovery-specification
ODD Specification is a universal open standard for collecting metadata.
api big-data big-data-platform data-discovery data-engineering data-governance data-mesh data-platform metadata metadata-management metadata-parser open-source opensource spec specification
Last synced: 25 Jan 2026
https://github.com/decisionbox-io/decisionbox-platform
DecisionBox connects to your data warehouse, runs autonomous AI agents that write and execute SQL, and surfaces validated insights and actionable recommendations — without you asking a single question.
ai ai-agents analytics aws bigquery data-discovery data-warehouse gcp golang kubernetes llm nextjs open-source redshift
Last synced: 15 Apr 2026
https://github.com/commondataio/dataportals-registry
Registry of data portals, catalogs, and data repositories, including a data catalogs dataset and a catalog description standard
data-catalog data-discovery data-portal data-repository dataset datasets open-data opendata registry
Last synced: 03 Mar 2026
https://github.com/ondata/ckan-mcp-server
MCP server for querying CKAN open data portals (package search, DataStore SQL, organizations, groups, tags)
ai-tools api-client civic-tech ckan ckan-api claude cloudflare-workers data-discovery data-portal datastore government-data mcp model-context-protocol nodejs open-data public-data solr typescript
Last synced: 14 Apr 2026
https://github.com/tsegall/fta
Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support.
data-discovery data-profiler data-profiling date java metadata semantic-type-detection semantic-typechecking semantic-types
Last synced: 11 May 2026
https://github.com/carte-data/carte
A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable front end that's just HTML.
carte data-catalog data-discovery data-documentation lightweight-data-catalogs python-library
Last synced: 29 Jul 2025
https://github.com/tosh2230/stairlight
A data lineage tool that detects table dependencies from rendered SQL statements.
bigquery data-catalog data-discovery data-engineering data-governance data-lineage data-management data-ops dbt gcs lineage redash s3 sql
Last synced: 16 May 2025
https://github.com/related-sciences/articat
articat: data artifact catalog
data-catalog data-discovery data-management data-platform
Last synced: 08 May 2025
https://github.com/unytics/catalog_builder
Data Catalogs Made Easy
bigquery data-catalog data-discovery databricks dbt redshift snowflake
Last synced: 12 Apr 2025
https://github.com/open-metadata/openmetadata-site
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
automation bigdata bigdataanalytics data-catalog data-discovery data-observability data-profiling data-quality-monitoring data-science datadiscovery dataengineering dataquality datascience dbt governance hacktoberfest hacktoberfest2022 metadata metadata-api metadata-management
Last synced: 14 Apr 2025
https://github.com/karrlab/datanator
Toolkit for discovering and aggregating data for whole-cell modeling
cells data-aggregation data-discovery data-integration mathematical-modeling systems-biology
Last synced: 02 Sep 2025
https://github.com/guinsoolab/darkseal
A Single place to Discover, Collaborate, and Get your data right
catalog data-alerting data-catalog data-collaboration data-compliance data-discovery data-documentation data-glossaries data-governance data-lineage data-notification data-platform data-security data-structures data-trust guinsoolab metadata metadata-management metadata-standard
Last synced: 27 Jan 2026
https://github.com/worldbank/wb-nlp-apps
This repository contains the NLP modeling components and web application implementations of a project for knowledge and data discovery funded by the Knowledge for Change Program (KCP) and the Joint Data Center on Forced Displacement (JDC).
data-discovery lda machine-learning nlp python topic-modeling word2vec
Last synced: 02 Sep 2025
https://github.com/slaclab/datacat
A system for managing files and file replicas across many diverse sites
data-catalog data-discovery datacat dataset dataset-catalog metadata metadata-store
Last synced: 14 Apr 2025
https://github.com/matttriano/analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
airflow data-catalog data-discovery data-engineering data-pipelines data-platform data-warehousing dbt docker elt mkdocs-material open-source python superset
Last synced: 18 Jan 2026
https://github.com/michalporeba/odis
Search in decentralised systems. Search federation, result moderation, aggregation and feedback with hypermedia in a ReSTful API to round it all off.
data data-discovery discoverability federated information-discovery mesh-networks search
Last synced: 18 Jan 2026
https://github.com/raoumer/dwx
Deep Web Extractor (DWX): a system that uses statistical machine learning models for crawling and data discovery on the Deep Web (i.e., a massive, high-quality portion of the World Wide Web) to build knowledge-based databases.
data-discovery data-science data-visualization machine-learning python
Last synced: 09 Mar 2026
https://github.com/andykee/aurora
A lightweight tool for indexing, cataloging, and browsing data.
catalog data data-catalog data-discovery indexing metadata metadata-extraction search-and-discovery
Last synced: 17 Jan 2026
https://github.com/kornev/kornev-gin-app
An operation-based JSON-RPC 2.0 API for Hive Metastore that uses PostgreSQL, Hive JDBC, and HDFS.
big-data clojure data-catalog data-discovery hdfs hive metadata
Last synced: 12 Jan 2026
https://github.com/hugozanini/open-metadata-cursor-extension
AI-powered data discovery extension for Cursor & VS Code with natural language search and interactive lineage visualization
ai cursor cursor-extension data-discovery data-lineage gemini open-metadata vscode-extension
Last synced: 18 May 2026
https://github.com/mjanez/portaljs-starter-marmot
Docker-based PortalJS framework deployment template for Marmot
data-catalog data-discovery data-distribution-platform data-governance data-lineage data-observability data-quality docker docker-compose marmot metadata nextjs portaljs starter-kit
Last synced: 28 Feb 2026
https://github.com/ywatanabe1989/scitex-dataset
Multi-domain scientific dataset fetcher — neuroscience, biology, pharmacology, medical. Part of SciTeX.
ai-research bids dandi data-discovery datasets eeg mcp mcp-server metadata mri neuroimaging neuroscience nwb openneuro physionet python research-automation scientific-data scitex zenodo
Last synced: 30 Apr 2026
https://github.com/tjas/postgrad-ai-ddv-plotly
Jupyter Notebook that analyzes the salaries of Federal District government public servants, using Python, Pandas and Plotly Express, to solve the exercise proposed in the "Data Discovery and Visualization" course.
analysis analytics data data-analytics data-discovery data-science data-visualization graph graphs jupyter-notebook jupyter-notebooks pandas plotly plotly-express python
Last synced: 07 May 2026