Projects in Awesome Lists tagged with data-platform
A curated list of projects in awesome lists tagged with data-platform.
https://github.com/memodb-io/acontext
Agent Skills as a Memory Layer
agent agent-development-kit agent-observability ai-agent anthropic context-data-platform context-engineering data-platform llm llm-observability llmops memory openai self-evolving self-learning
Last synced: 01 Apr 2026
https://github.com/opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
alerting bigdata data-catalog data-discovery data-engineering data-exploration data-governance data-lineage data-observability data-pipelines data-platform data-profiling data-quality data-science datacatalog lineage metadata metadata-management observability oss
Last synced: 02 Apr 2026
https://github.com/bruin-data/bruin
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
analytics bigquery data-analysis data-ingestion data-modeling data-pipelines data-platform data-transformation python snowflake sql
Last synced: 02 Apr 2026
https://github.com/stitchfix/hamilton
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix
Last synced: 29 Sep 2025
https://github.com/flowerfine/scaleph
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel on Flink engine, Flink Kubernetes Operator, and Doris Operator.
dag data-platform dataops doris doris-manager doris-operator flink flink-kubernetes flink-kubernetes-operator flink-sql flink-sql-gateway seatunnel
Last synced: 19 Oct 2025
https://github.com/silverton-io/buz
Serverless multi-protocol + multi-destination event collection system.
analytics analytics-tracking cloudevents cloudevents-schema contracts data data-collection data-platform eventbridge jsonschema product-analytics redpanda redpanda-console schema-registry schema-validation snowplow-analytics streaming-analytics streaming-data webhook-receiver webhook-server
Last synced: 12 Apr 2025
https://github.com/src-d/sourced-ce
source{d} Community Edition (CE)
charts dashboards data-analysis data-mining data-platform data-visualization git github metrics sql
Last synced: 11 Jan 2026
https://github.com/azure/data-management-zone
Template to deploy the Data Management Zone of Cloud Scale Analytics (former Enterprise-Scale Analytics). The Data Management Zone provides data governance and management capabilities for the data platform of an organization.
architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 09 Sep 2025
https://github.com/Azure/data-management-zone
Template to deploy the Data Management Zone of Cloud Scale Analytics (former Enterprise-Scale Analytics). The Data Management Zone provides data governance and management capabilities for the data platform of an organization.
architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 05 May 2025
https://github.com/azure/data-landing-zone
Template to deploy a single Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Landing Zone is a logical construct and a unit of scale in the architecture that enables data retention and execution of data workloads for generating insights and value with data.
architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 09 Apr 2025
https://github.com/Azure/data-landing-zone
Template to deploy a single Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Landing Zone is a logical construct and a unit of scale in the architecture that enables data retention and execution of data workloads for generating insights and value with data.
architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 26 Apr 2025
https://github.com/opendatadiscovery/opendatadiscovery-specification
ODD Specification is a universal open standard for collecting metadata.
api big-data big-data-platform data-discovery data-engineering data-governance data-mesh data-platform metadata metadata-management metadata-parser open-source opensource spec specification
Last synced: 25 Jan 2026
https://github.com/opensource-observer/oso
Open source AI-driven data platform
ai data-platform data-visualization impact-analysis open-source public-goods
Last synced: 10 Oct 2025
https://github.com/anna-geller/prefect-dataplatform
Example repository showing how to build a data platform with Prefect, dbt and Snowflake
analytics analytics-engineering automation data-engineering data-platform data-warehousing dataflow dataflow-ops dbt orchestration prefect python snowflake sql
Last synced: 25 Oct 2025
https://github.com/taogeyt/pyetl
python ETL framework
csv data-analytics data-pipeline data-platform db es etl etl-process excel export hive mysql oracle python sql sqlserver
Last synced: 30 Oct 2025
https://github.com/blueapron/kafka-connect-protobuf-converter
Protobuf converter plugin for Kafka Connect
data data-platform jar kafka kafka-connect protobuf protobuf-converter protocol-buffers
Last synced: 03 May 2025
https://github.com/ssimunic/Temp-Monitor
Internet of Things data platform for temperature and humidity sensors with maps
data-platform humidity internet-of-things iot iot-platform temperature
Last synced: 13 Jul 2025
https://github.com/Leading-AI-IO/palantir-ontology-strategy
A comprehensive guide to Palantir Foundry's Ontology strategy. / An open-source book project that unpacks the strategy and implementation of "Ontology", the core concept behind Palantir, the world's most powerful data platform.
book data-integration data-platform data-strategy enterprise-ai foundry governance ontology open-source palantir palantir-foundry
Last synced: 08 Mar 2026
https://github.com/keboola/mcp-server
Model Context Protocol (MCP) Server for the Keboola Platform
data-platform etl-pipeline mcp mcp-server model-context-protocol
Last synced: 03 Mar 2026
https://github.com/josephmachado/online_store
End to end data engineering project
dagster data-analysis data-engineering data-pipeline data-platform data-processing datawarehouse postgresql python3
Last synced: 05 Jul 2025
https://github.com/Azure/data-product-batch
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 05 May 2025
https://github.com/azure/data-product-batch
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 23 Jul 2025
https://github.com/evoluteur/kaggle-look-alike
Kaggle Data Explorer UI look-alike built in React.
data data-analysis data-engineering data-exploration data-mining data-platform data-science datascience exploratory-data-analysis explorer front-end frontend kaggle react spa
Last synced: 09 Apr 2025
https://github.com/davidgasquez/filecoin-data-portal
🧮 Open, serverless, and local friendly Data Platform for the Filecoin Ecosystem
data-analysis data-platform filecoin
Last synced: 02 Apr 2026
https://github.com/rpj/rpi
RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with python2.7 and redis.
data-pipeline data-platform data-processing data-stream garden-bots python raspberry-pi redis rpi sensor sensors
Last synced: 12 Apr 2025
https://github.com/related-sciences/articat
articat: data artifact catalog
data-catalog data-discovery data-management data-platform
Last synced: 08 May 2025
https://github.com/profcomff/dwh-pipelines
Data workflow graphs in Airflow
Last synced: 15 Apr 2025
https://github.com/guinsoolab/darkseal
A Single place to Discover, Collaborate, and Get your data right
catalog data-alerting data-catalog data-collaboration data-compliance data-discovery data-documentation data-glossaries data-governance data-lineage data-notification data-platform data-security data-structures data-trust guinsoolab metadata metadata-management metadata-standard
Last synced: 27 Jan 2026
https://github.com/feluelle/kind-data-platform
A kind data platform on your local machine. 🤗
data-platform docker helm kind terraform
Last synced: 28 Oct 2025
https://github.com/finbourne/lusid-sdk-python
Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.
bi-temporal data-platform finbourne fintech lusid openapi python
Last synced: 09 Apr 2025
https://github.com/canonical/postgresql-operator
A Charmed Operator for running PostgreSQL on machines
charm data-platform postgresql python
Last synced: 16 Jan 2026
https://github.com/perfectthymetech/cloudscaleanalytics-v2-terraform
Cloud Scale Analytics (v2) to create a scalable data platform on Azure using a Data Management Zone, Data Landing Zones and Data Applications to build Data Products.
architecture azure cloud-scale-analytics cloudscaleanalytics data-platform datamesh enterprise-architecture enterprise-scale enterprise-scale-analytics terraform
Last synced: 14 Apr 2025
https://github.com/matttriano/analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
airflow data-catalog data-discovery data-engineering data-pipelines data-platform data-warehousing dbt docker elt mkdocs-material open-source python superset
Last synced: 18 Jan 2026
https://github.com/canonical/data-platform-workflows
Reusable GitHub Actions workflows used by the Data Platform team
Last synced: 16 Feb 2026
https://github.com/aabouzaid/modern-data-platform-poc
My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and the Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture, which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).
big-data cloud-agnostic cloud-native data-engineering data-lakehouse data-platform dataops edinburgh-napier kubernetes msc msc-project
Last synced: 27 Jun 2025
https://github.com/afranzi/mini-data-platform
Mini Data Platform
airflow data-platform kubernetes minikube terraform
Last synced: 13 Apr 2025
https://github.com/finbourne/lusid-sdk-java
Java SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.
bi-temporal data-platform finbourne fintech java lusid openapi
Last synced: 02 Aug 2025
https://github.com/perfectthymetech/terraform-azurerm-data-landing-zone
Cloud Scale Analytics - Data Landing Zone Terraform Module
architecture azure cloud-scale-analytics cloudscaleanalytics data-management data-platform datamesh enterprise-architecture enterprise-scale enterprise-scale-analytics terraform terraform-module
Last synced: 14 Apr 2025
https://github.com/profcomff/dwh-definitions
Data structures and migrations library
Last synced: 24 Dec 2025
https://github.com/huwngnosleep/complete_lakehouse_techstack
This project implements an end-to-end tech stack for a data platform, for local development.
bigdata data-lakehouse data-platform data-warehouse etl hadoop kafka lambda-architecture spark
Last synced: 24 Jan 2026
https://github.com/canonical/postgresql-ldap-sync
Package to sync LDAP users with PG
Last synced: 14 Jan 2026
https://github.com/canonical/pgbouncer-operator
A charmed operator for running PgBouncer on virtual machines.
Last synced: 22 Apr 2025
https://github.com/zncdatadev/kubedoop
The modular open source data platform using Kubernetes and the cloud-native ecosystem, which is the base for DataOps/MLOps (LLMOps)
bigdata cloud-native data-platform dataops hadoop kubernetes llmops mlops
Last synced: 06 Jul 2025
https://github.com/socialfinancedigitallabs/liia-tools
Tools to be used for 903, annex_a, and CIN census
Last synced: 24 Aug 2025
https://github.com/perfectthymetech/terraform-azurerm-data-management-zone
Cloud Scale Analytics - Data Management Zone Terraform Module
architecture azure cloud-scale-analytics cloudscaleanalytics data-management data-platform datamesh enterprise-architecture enterprise-scale enterprise-scale-analytics terraform terraform-module
Last synced: 18 Aug 2025
https://github.com/kimtth/bicep-azure-data-platform-lac
🗄️ 👨🏾💻🏭Azure Data platform Infrastructure as Code (Datafactory, Databricks, Synapse Analytics, Purview)
bicep data-platform infrastructure-as-code
Last synced: 22 Jan 2026
https://github.com/canonical/postgresql-single-kernel-library
Library containing shared code for PostgreSQL operators (PostgreSQL, PgBouncer, VM and K8s)
charm data-platform postgresql
Last synced: 17 Nov 2025
https://github.com/ilssaf/data-platform-deployer
CLI tool for automatic data platform deployment
cdc clickhouse data-engineering data-infrastructure data-platform devops etl infrastructure-as-code kafka kafka-connect postgresql s3
Last synced: 06 Apr 2025
https://github.com/frocode/realtime_streaming_unstructured-data
Real-time streaming and processing of unstructured data (spark, airflow)
airflow cicd data-devops data-engineering data-platform iac-terraform iot-platform jenkins platform-deployment spark-streaming streaming-data unstructured-data
Last synced: 26 Jul 2025
https://github.com/zncdatadev/kubedoop-catalog
OLM catalog of the Kubedoop
bigdata data-platform k8s kubernetes olm
Last synced: 14 May 2025
https://github.com/canonical/mysql-shell-client
Package to interact with MySQL Shell
Last synced: 13 Jan 2026
https://github.com/vnvo/deltaforge
A modular Change Data Capture (CDC) micro-framework built in Rust. Stream database changes to Kafka, Redis, and more.
cdc change-data-capture data-engineering data-platform etl event-sourcing kafka mysql postgresql redis schema-registry turso-db
Last synced: 12 Mar 2026
https://github.com/flowsynx/plugin-csv
FlowSynx plugin that reads and writes CSV files, enabling easy batch data import/export operations and integration with spreadsheet-based data workflows.
comma-separated-values csv data data-platform flowsynx
Last synced: 10 Mar 2026
https://github.com/flowsynx/plugin-json
FlowSynx plugin that loads and parses local JSON files. Supports transformation, extraction, and mapping of hierarchical data structures in workflows.
data data-platform flowsynx json
Last synced: 10 Mar 2026
https://github.com/finbourne/lusid-sdk-js
JavaScript SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.
bi-temporal data-platform finbourne fintech javascript lusid openapi
Last synced: 15 Jul 2025
https://github.com/yandex-cloud-examples/yc-data-platform-solutions
Catalog of Data Platform solutions in Yandex Cloud.
data-platform solutions yandex-cloud yandexcloud
Last synced: 05 Feb 2026
https://github.com/yandex-cloud-examples/yc-courses-ru-corpplatform
Materials for the course "Building a Corporate Analytics Platform".
clickhouse course data-platform datalens debezium kafka kafka-connector yandex-cloud yandex-practicum yandex-praktikum yandexcloud
Last synced: 18 Jan 2026
https://github.com/scribd/terraform-oxbow
This repository contains oxbow terraform module
data-platform managed-by-terraform terraform-oxbow
Last synced: 31 Jan 2026
https://github.com/flowsynx/plugin-base64
FlowSynx plugin that provides encoding and decoding of Base64 strings, allowing workflows to handle Base64 content transformations efficiently.
base64 base64-decoding base64-encoding data data-platform decoding encoding flowsynx flowsynx-plugins
Last synced: 10 Mar 2026
https://github.com/bablukumarjha/startup-funding-revenue-analysis-by-sql-and-pandas
SQL project analyzing startup funding, revenue, and founder data to extract business insights using Python and MySQL.
data data-analysis data-platform data-science dataanalysisusingpython dataanalytics pandas-dataframe pandas-library python sql sql-server sqlalchemy sqldatabase
Last synced: 02 Sep 2025
https://github.com/epappas/dataflix
A decentralized and transparent data sharing ecosystem
airflow data-platform data-science data-sharing hardhat protocol python solidity typescript web3
Last synced: 07 Apr 2025
https://github.com/dxtaner/datafromimdb_python
Data From IMDB
data-platform imdb-api python3
Last synced: 24 Jul 2025
https://github.com/pureinsights/discovery-sandbox
Discovery Sandbox SDK
ai ai-search ai-tools amazon-bedrock apache-solr data-platform elasticsearch huggingface hybrid-search llm llm-integration mongodb-atlas openai opensearch rag retrieval-augmented-generation search semantic-search vector-search
Last synced: 23 Jan 2026
https://github.com/cloudformations/training
Cloud Formations live training session content, available in person or online from industry leading experts on the latest Microsoft technologies.
data-analytics data-engineering data-platform microsoft-azure microsoft-fabric training
Last synced: 05 Apr 2025
https://github.com/caprogs/paris-events-analyzer
A project to analyze events in Paris using open source data provided by the city.
data data-analysis data-platform dbt docker ingestion python streamlit transformation vizualisation
Last synced: 25 Jun 2025
https://github.com/irwandifo/gcp-batch-infra
GCP Infrastructure for Batch Processing
data-lakehouse data-platform gcp terraform
Last synced: 30 Oct 2025
https://github.com/ministryofjustice/data-platform-github-actions
Data Platform GitHub Actions
data-platform ministryofjustice
Last synced: 29 Jan 2026
https://github.com/pavedroad-io/eventbridge
Ingest data from all major cloud platforms via events, API, or polling interfaces. Then filter, transform, and process it to generate workflows or trigger actions on other clouds and frameworks.
argo-events argo-workflows data-platform data-processing event-emitter event-management event-sourcing go golang kubernetes
Last synced: 17 Jan 2026
https://github.com/tomblancdev/ratatouille
🐀 Self-hostable data platform - Iceberg lakehouse + ClickHouse + MinIO. Anyone can data!
clickhouse dagster data-engineering data-platform docker iceberg lakehouse minio python self-hosted
Last synced: 12 Feb 2026
https://github.com/arverma/config-manager
Config Manager is a powerful, open source platform for managing and versioning configuration data at scale. It features a Postgres-backed registry with immutable versioning and a modern web UI for browsing and editing configs with audit history.
config config-manager contribute contributions-welcome data-engineer data-platform dataengineering platform
Last synced: 14 Feb 2026
https://github.com/irwandifo/gcp-batch-pipeline
GCP Batch Data Pipeline
batch-processing data-pipeline data-platform kestra
Last synced: 28 Mar 2025
https://github.com/ingenii-solutions/azure-data-platform-databricks-runtime
Python package and custom runtime to use in Azure Databricks as part of Ingenii's Data Platform
Last synced: 21 Jan 2026