An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-platform

A curated list of projects in awesome lists tagged with data-platform.

https://github.com/bruin-data/bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

analytics bigquery data-analysis data-ingestion data-modeling data-pipelines data-platform data-transformation python snowflake sql

Last synced: 02 Apr 2026

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 29 Sep 2025

https://github.com/meltwater/served

A C++11 RESTful web server library

data-platform datasift

Last synced: 15 Mar 2025

https://github.com/flowerfine/scaleph

Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel-on-Flink engine, the Flink Kubernetes Operator, and the Doris operator.

dag data-platform dataops doris doris-manager doris-operator flink flink-kubernetes flink-kubernetes-operator flink-sql flink-sql-gateway seatunnel

Last synced: 19 Oct 2025

https://github.com/azure/data-management-zone

Template to deploy the Data Management Zone of Cloud Scale Analytics (formerly Enterprise-Scale Analytics). The Data Management Zone provides data governance and management capabilities for the data platform of an organization.

architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 09 Sep 2025

https://github.com/Azure/data-management-zone

Template to deploy the Data Management Zone of Cloud Scale Analytics (formerly Enterprise-Scale Analytics). The Data Management Zone provides data governance and management capabilities for the data platform of an organization.

architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 05 May 2025

https://github.com/azure/data-landing-zone

Template to deploy a single Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Landing Zone is a logical construct and a unit of scale in the architecture that enables data retention and execution of data workloads for generating insights and value with data.

architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 09 Apr 2025

https://github.com/Azure/data-landing-zone

Template to deploy a single Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Landing Zone is a logical construct and a unit of scale in the architecture that enables data retention and execution of data workloads for generating insights and value with data.

architecture arm azure bicep data-fabric data-mesh data-platform datamesh enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 26 Apr 2025

https://github.com/ssimunic/Temp-Monitor

Internet of Things data platform for temperature and humidity sensors with maps

data-platform humidity internet-of-things iot iot-platform temperature

Last synced: 13 Jul 2025

https://github.com/Leading-AI-IO/palantir-ontology-strategy

A comprehensive guide to Palantir Foundry's Ontology strategy. / An open-source book project that unpacks the strategy and implementation of the "Ontology", the core concept of Palantir, the world's most powerful data platform.

book data-integration data-platform data-strategy enterprise-ai foundry governance ontology open-source palantir palantir-foundry

Last synced: 08 Mar 2026

https://github.com/keboola/mcp-server

Model Context Protocol (MCP) Server for the Keboola Platform

data-platform etl-pipeline mcp mcp-server model-context-protocol

Last synced: 03 Mar 2026

https://github.com/Azure/data-product-batch

Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.

architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 05 May 2025

https://github.com/azure/data-product-batch

Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.

architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 23 Jul 2025

https://github.com/davidgasquez/filecoin-data-portal

🧮 Open, serverless, and local friendly Data Platform for the Filecoin Ecosystem

data-analysis data-platform filecoin

Last synced: 02 Apr 2026

https://github.com/rpj/rpi

RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with Python 2.7 and Redis.

data-pipeline data-platform data-processing data-stream garden-bots python raspberry-pi redis rpi sensor sensors

Last synced: 12 Apr 2025

https://github.com/profcomff/dwh-pipelines

Data processing graphs in Airflow

airflow data-platform dwh

Last synced: 15 Apr 2025

https://github.com/feluelle/kind-data-platform

A kind data platform on your local machine. 🤗

data-platform docker helm kind terraform

Last synced: 28 Oct 2025

https://github.com/finbourne/lusid-sdk-python

Python SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

bi-temporal data-platform finbourne fintech lusid openapi python

Last synced: 09 Apr 2025

https://github.com/canonical/postgresql-operator

A Charmed Operator for running PostgreSQL on machines

charm data-platform postgresql python

Last synced: 16 Jan 2026

https://github.com/perfectthymetech/cloudscaleanalytics-v2-terraform

Cloud Scale Analytics (v2) to create a scalable data platform on Azure using a Data Management Zone, Data Landing Zones and Data Applications to build Data Products.

architecture azure cloud-scale-analytics cloudscaleanalytics data-platform datamesh enterprise-architecture enterprise-scale enterprise-scale-analytics terraform

Last synced: 14 Apr 2025

https://github.com/canonical/data-platform-workflows

Reusable GitHub Actions workflows used by the Data Platform team

data-platform

Last synced: 16 Feb 2026

https://github.com/aabouzaid/modern-data-platform-poc

My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

big-data cloud-agnostic cloud-native data-engineering data-lakehouse data-platform dataops edinburgh-napier kubernetes msc msc-project

Last synced: 27 Jun 2025

https://github.com/finbourne/lusid-sdk-java

Java SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

bi-temporal data-platform finbourne fintech java lusid openapi

Last synced: 02 Aug 2025

https://github.com/profcomff/dwh-definitions

Data structures and migrations library

airflow data-platform dwh

Last synced: 24 Dec 2025

https://github.com/profcomff/dwh-airflow

Airflow build and deploy

airflow data-platform dwh

Last synced: 15 Apr 2025

https://github.com/huwngnosleep/complete_lakehouse_techstack

This project implements an end-to-end tech stack for a data platform, intended for local development.

bigdata data-lakehouse data-platform data-warehouse etl hadoop kafka lambda-architecture spark

Last synced: 24 Jan 2026

https://github.com/canonical/postgresql-ldap-sync

Package to sync LDAP users with PG

data-platform postgresql

Last synced: 14 Jan 2026

https://github.com/canonical/pgbouncer-operator

A charmed operator for running PgBouncer on virtual machines.

data-platform pgbouncer

Last synced: 22 Apr 2025

https://github.com/zncdatadev/kubedoop

The modular open source data platform built on Kubernetes and the cloud-native ecosystem, serving as the base for DataOps/MLOps (LLMOps)

bigdata cloud-native data-platform dataops hadoop kubernetes llmops mlops

Last synced: 06 Jul 2025

https://github.com/socialfinancedigitallabs/liia-tools

Tools to be used for 903, annex_a, and CIN census

data-platform

Last synced: 24 Aug 2025

https://github.com/kimtth/bicep-azure-data-platform-lac

🗄️ 👨🏾‍💻🏭 Azure Data Platform Infrastructure as Code (Data Factory, Databricks, Synapse Analytics, Purview)

bicep data-platform infrastructure-as-code

Last synced: 22 Jan 2026

https://github.com/canonical/postgresql-single-kernel-library

Library containing shared code for PostgreSQL operators (PostgreSQL, PgBouncer, VM and K8s)

charm data-platform postgresql

Last synced: 17 Nov 2025

https://github.com/zncdatadev/kubedoop-catalog

OLM catalog for Kubedoop

bigdata data-platform k8s kubernetes olm

Last synced: 14 May 2025

https://github.com/canonical/mysql-shell-client

Package to interact with MySQL Shell

data-platform library mysql

Last synced: 13 Jan 2026

https://github.com/vnvo/deltaforge

A modular Change Data Capture (CDC) micro-framework built in Rust. Stream database changes to Kafka, Redis, etc.

cdc change-data-capture data-engineering data-platform etl event-sourcing kafka mysql postgresql redis schema-registry turso-db

Last synced: 12 Mar 2026

https://github.com/flowsynx/plugin-csv

FlowSynx plugin that reads and writes CSV files, enabling easy batch data import/export operations and integration with spreadsheet-based data workflows.

comma-separated-values csv data data-platform flowsynx

Last synced: 10 Mar 2026

https://github.com/flowsynx/plugin-json

FlowSynx plugin that loads and parses local JSON files. Supports transformation, extraction, and mapping of hierarchical data structures in workflows.

data data-platform flowsynx json

Last synced: 10 Mar 2026

https://github.com/finbourne/lusid-sdk-js

JavaScript SDK for LUSID by FINBOURNE, a bi-temporal investment management data platform with portfolio accounting capabilities.

bi-temporal data-platform finbourne fintech javascript lusid openapi

Last synced: 15 Jul 2025

https://github.com/yandex-cloud-examples/yc-data-platform-solutions

Catalog of Data Platform solutions in Yandex Cloud.

data-platform solutions yandex-cloud yandexcloud

Last synced: 05 Feb 2026

https://github.com/yandex-cloud-examples/yc-courses-ru-corpplatform

Materials for the course "Building a Corporate Analytics Platform".

clickhouse course data-platform datalens debezium kafka kafka-connector yandex-cloud yandex-practicum yandex-praktikum yandexcloud

Last synced: 18 Jan 2026

https://github.com/scribd/terraform-oxbow

This repository contains the oxbow Terraform module

data-platform managed-by-terraform terraform-oxbow

Last synced: 31 Jan 2026

https://github.com/flowsynx/plugin-base64

FlowSynx plugin that provides encoding and decoding of Base64 strings, allowing workflows to handle Base64 content transformations efficiently.

base64 base64-decoding base64-encoding data data-platform decoding encoding flowsynx flowsynx-plugins

Last synced: 10 Mar 2026

https://github.com/bablukumarjha/startup-funding-revenue-analysis-by-sql-and-pandas

SQL project analyzing startup funding, revenue, and founder data to extract business insights using Python and MySQL.

data data-analysis data-platform data-science dataanalysisusingpython dataanalytics pandas-dataframe pandas-library python sql sql-server sqlalchemy sqldatabase

Last synced: 02 Sep 2025

https://github.com/epappas/dataflix

A decentralized and transparent data sharing ecosystem

airflow data-platform data-science data-sharing hardhat protocol python solidity typescript web3

Last synced: 07 Apr 2025

https://github.com/cloudformations/training

Cloud Formations live training session content, available in person or online from industry leading experts on the latest Microsoft technologies.

data-analytics data-engineering data-platform microsoft-azure microsoft-fabric training

Last synced: 05 Apr 2025

https://github.com/caprogs/paris-events-analyzer

A project to analyze events in Paris using open source data provided by the city.

data data-analysis data-platform dbt docker ingestion python streamlit transformation vizualisation

Last synced: 25 Jun 2025

https://github.com/irwandifo/gcp-batch-infra

GCP Infrastructure for Batch Processing

data-lakehouse data-platform gcp terraform

Last synced: 30 Oct 2025

https://github.com/pavedroad-io/eventbridge

Ingest data from all major cloud platforms via events, API, or polling interfaces. Then filter, transform, and process it, generating workflows or triggering actions on other clouds and frameworks.

argo-events argo-workflows data-platform data-processing event-emitter event-management event-sourcing go golang kubernetes

Last synced: 17 Jan 2026

https://github.com/tomblancdev/ratatouille

🐀 Self-hostable data platform - Iceberg lakehouse + ClickHouse + MinIO. Anyone can data!

clickhouse dagster data-engineering data-platform docker iceberg lakehouse minio python self-hosted

Last synced: 12 Feb 2026

https://github.com/arverma/config-manager

Config Manager is a powerful, open source platform for managing and versioning configuration data at scale. It features a Postgres-backed registry with immutable versioning and a modern web UI for browsing and editing configs with audit history.

config config-manager contribute contributions-welcome data-engineer data-platform dataengineering platform

Last synced: 14 Feb 2026

https://github.com/ingenii-solutions/azure-data-platform-databricks-runtime

Python package and custom runtime to use in Azure Databricks as part of Ingenii's Data Platform

azure data-platform

Last synced: 21 Jan 2026