An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with datawarehouse

A curated list of projects in awesome lists tagged with datawarehouse.

https://github.com/hydradatabase/columnar

Postgres-native columnar storage extension

data-warehouse datawarehouse postgres postgresql postgresql-extension

Last synced: 14 May 2025

https://github.com/DataLinkDC/dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

datalake datawarehouse flink flinkcdc flinksql olap real-time-computing-platform sql

Last synced: 27 Mar 2025

https://github.com/getdozer/dozer

Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.

api apis clickhouse data datawarehouse debe etl low-code postgres realtime rust snowflake sql streaming

Last synced: 11 Apr 2025

https://github.com/simbafl/datawarehouse

From data warehouse to user profiles, from data construction to data applications

bigdata datawarehouse olap presto sql userprofile

Last synced: 23 Apr 2025

https://github.com/simbafl/DataWarehouse

From data warehouse to user profiles, from data construction to data applications

bigdata datawarehouse olap presto sql userprofile

Last synced: 27 Mar 2025

https://github.com/Datavault-UK/automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql

Last synced: 13 May 2025

https://github.com/shunfei/indexr

An open-source columnar data format designed for fast, real-time analytics on big data.

columnar-storage datawarehouse indexr olap realtime

Last synced: 15 Mar 2025

https://github.com/rdagumampan/yuniql

Free and open source schema versioning and database migration made natively with .NET/6. NEW THIS MAY 2022! v1.3.15 released!

amazon-rds azure-sql-database data-engineering database-migrations datawarehouse dotnet-core dotnet-tool mariadb mysql oracle postgresql redshift snowflake sql sqlserver yuniql

Last synced: 15 May 2025

https://github.com/zhongyu09/openchatbi

OpenChatBI is an intelligent chat-based BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. It uses LangGraph and LangChain to build chat agents and workflows that support natural-language-to-SQL conversion and data analysis.

agent ai analytics bi database datawarehouse gpt langchain langgraph llm nlp text2sql

Last synced: 21 Nov 2025

https://github.com/cuebook/CueObserve

Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

Last synced: 07 May 2025

https://github.com/cuebook/cueobserve

Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

Last synced: 30 Jun 2025

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 27 Dec 2025

https://github.com/jitsucom/bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

data-engineering datawarehouse etl etl-pipeline ingestion pipeline

Last synced: 07 Mar 2025

https://github.com/dermatologist/pyomop

Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries, MCP server and FHIR import.

cdm clinical-trials datawarehouse hacktoberfest health-data-analysis health-informatics llm ohdsi python text-to-sql

Last synced: 24 Dec 2025

https://github.com/data-solution-automation-engine/data-solution-framework

A library for data warehouse and data integration pattern and architecture documentation.

architecture data-warehouses datawarehouse design etl etl-control etl-processes patterns solution

Last synced: 20 Jul 2025

https://github.com/umer7/Data-Warehouse-Concepts-Design-and-Data-Integration

Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (Coursera): notes, assignments, quizzes, and research papers

data-integration data-warehouse datawarehouse oracle pentaho

Last synced: 20 Jul 2025

https://github.com/kevchant/azuredevops-fabricdwdbproject

Template to perform CI/CD for Microsoft Fabric Data Warehouses

datawarehouse microsoft-fabric

Last synced: 19 Jun 2025

https://github.com/chenqingspring/rules-based-modeling-engine

A rules-based visual model-building engine. Supports metric definition, rule definition, access to multiple data sources, and RESTful API queries

big-data data-modeling datawarehouse restful-api rule-engine sql-generator

Last synced: 25 Feb 2025

https://github.com/imsanjoykb/etl-project

The goal of this project is to illustrate Extract Transform Load (ETL) using Python and SQL. ETL is a process commonly done in computing, which takes raw data, cleans it and stores it for later use. The extraction phase targets and retrieves the data. Transform manipulates and cleans the data. Then load stores the data, typically in a data warehouse.

data-engineering database datalake datawarehouse etl etl-automation etl-pipeline etl-solutions

Last synced: 18 Aug 2025

https://github.com/marco-roy/DDO

A DBT package to perform DataOps & administrative CI/CD on your data warehouse.

data dataops datawarehouse datawarehouseautomation dbt snowflake

Last synced: 05 May 2025

https://github.com/dazheng/SparkETL

Implement a complete data warehouse ETL using Spark SQL

datawarehouse etl spark sparksql

Last synced: 06 May 2025

https://github.com/dermatologist/hephaestus

Hephaestus - ETL and ML tools for OHDSI - OMOP CDM

datawarehouse emr etl health-data health-informatics

Last synced: 09 Sep 2025

https://github.com/LabmemNo004/AmazonMoviesDataWarehouse

Data warehouse: stores and analyzes Amazon movie data over the years

amazon datawarehouse movie

Last synced: 23 Apr 2025

https://github.com/stonezhong/DataManager

Better organize data in a data lake and build ETL pipelines with a web UI tool.

datalake datawarehouse etl spark sparksql

Last synced: 20 Jul 2025

https://github.com/dynonguyen/data-warehouse-ukaccident

Information system for business project - building and mining data warehouse

datawarehouse mining olap sqlserver ssas-multidimensional ssis-packages visualize-data

Last synced: 12 Apr 2025

https://github.com/dina-hosny/etl-data-pipeline-using-airflow

An ETL data pipeline project that uses Airflow DAGs to extract employee data from PostgreSQL schemas, load it into an AWS data lake, transform it with a Python script, and finally load it into a Snowflake data warehouse using SCD Type 2.

airflow airflow-dags aws-s3 datawarehouse etl pandas python snowflake

Last synced: 18 Jun 2025

https://github.com/DataDrivenGit/Music-Streaming-App-using-AWS-ETL

Implemented a data warehouse and data lake on AWS and data modeling with Postgres and Apache Cassandra; also used Apache Airflow to create a data pipeline

airflow-operators cassandra data-lake data-pipelines datawarehouse postgres python3 sql

Last synced: 20 Jul 2025

https://github.com/birg81/teachingcoderepo

Hi Guys. I'm Biagio, teacher of Computer Science. This repository is where I share code co-developed during our lessons, providing interesting solutions to programming problems. Share your favorite one(s) with friends and colleagues, and if you have any suggestions or edits, I'll be happy to consider them.

css database datawarehouse development hashing html5 java javascript jwt-token php programming python rdbms rest-api sql web webapi webapp webapplication

Last synced: 07 May 2025

https://github.com/kevchant/github-fabricdwdbproject

Template to perform CI/CD for Microsoft Fabric Data Warehouses using GitHub Actions

datawarehouse microsoft-fabric

Last synced: 20 Feb 2025

https://github.com/blakedrumm/scom-dw-grooming-tool

This is the official location of the System Center Operations Manager Data Warehouse Grooming Tool, which is compatible with all versions of Operations Manager.

datawarehouse dw-tool gui opsmgr powershell scom scom-scripts scom-tool system-center-operations-manager

Last synced: 25 Dec 2025

https://github.com/dain55788/ibm-data-engineer-lecture-note

Lecture Notes and Practice Materials of IBM Data Engineering Course

data-analysis database dataengineering datawarehouse ibm

Last synced: 03 Apr 2025

https://github.com/agentmode/server

All-in-1 MCP server for developers

ai api database datawarehouse llm mcp python

Last synced: 29 Jun 2025

https://github.com/khaouitiabdelhakim/etl-slowly-changing-dimensions

Dive deep into Slowly Changing Dimensions (SCD) ETL in action with this comprehensive tutorial! In this video, we'll explore how to effectively manage changing data using SQL Server Management Studio (SSMS) and SQL Server Integration Services (SSIS).

datawarehouse etl slowly-changing-dimensions ssis ssms

Last synced: 05 Apr 2025

https://github.com/aventius-software/datawarehouse

An open source and free to use generic (basic) Microsoft SQL Server data warehouse

datawarehouse sql sqlserver

Last synced: 07 Oct 2025

https://github.com/hailiang-wang/parse-server-stack

Quick start with Parse Server and dashboard.

datawarehouse nodejs parse-server

Last synced: 24 Feb 2025

https://github.com/taysir17/elt-movies-data-warehouse---analysis-project

This project focuses on 10,850 movies, providing a comprehensive overview of the movie industry across various dimensions like genres, cast, directors, ratings, and more.

big-data datawarehouse etl powerbi talend

Last synced: 06 Apr 2025

https://github.com/pymarcus/data_warehouse

Creation of a data model, design, and star schema with SQL Power Architect

architect datawarehouse sql

Last synced: 15 Sep 2025

https://github.com/stefen-taime/myubereats_datapipeline

Building a Modern Uber Eats Data Pipeline

airflow api data datawarehouse mongodb pipeline powerbi snowflake

Last synced: 06 Mar 2025

https://github.com/medjb10/etl-movies-data-warehouse--analysis-project

This project focuses on 400K movies, providing a comprehensive overview of the movie industry across various dimensions like genres, cast, directors, ratings, and more.

bigdata datawarehouse etl imdb-dataset powerbi talend

Last synced: 21 Feb 2025

https://github.com/patilni3/etl-notes

Theoretical notes on Extract, Transform, Load (ETL)

datawarehouse etl etl-pipeline extract load schema transform

Last synced: 03 Apr 2025

https://github.com/worst001/note_bigdata

A collection of big data-related materials, notes, and manuals

bigdata cdh datawarehouse development flink flume guide hadoop hbase hive learning markdown mkdocs note notebook spark

Last synced: 31 Dec 2025

https://github.com/fabioba/udacity-dwh-etl

This project is an example of populating a star schema on AWS Redshift by ingesting data from AWS S3.

aws datawarehouse etl

Last synced: 23 Mar 2025

https://github.com/essien1990/etl_pipeline_airflow

Creating pipelines using Python 3 and Apache Airflow to load tables into a Google BigQuery data warehouse

airflow airflow-dags airflow-operators bash bigquery bq datawarehouse etl-pipeline python3

Last synced: 03 Jan 2026

https://github.com/toskpl/sql

Advanced analytical functions in SQL

datawarehouse oracle oracle-database sql

Last synced: 03 Mar 2025

https://github.com/abki12c/data-warehouse-and-bi-project

A project for creating a Data Warehouse, designing the ETL process, creating visualizations on Power BI and creating data mining models

bussiness-intelligence datamining datawarehouse etl powerbi

Last synced: 27 Oct 2025

https://github.com/thunchanokbow/audiblebook-revenue

Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand

apache-airflow bigquery cloudcomposer data-visualization datalake datawarehouse googlecloudstorage lookerstudio pandas python3

Last synced: 19 Jul 2025

https://github.com/windi-wulandari/pbi_kimia-farma-x-rakamin

A data-driven analytics project for Kimia Farma to evaluate business performance from 2020-2023 using BigQuery. Focused on transaction data, inventory, branch operations, and product insights. Results were visualized through an interactive dashboard to support strategic decisions and optimizations.

big-data-analytics bigquery datawarehouse googlelooker sql

Last synced: 03 Jan 2026

https://github.com/shiningrush/dolphin

A lightweight data warehouse solution for complex queries in microservices, such as data statistics and queries across multiple microservices.

background datawarehouse

Last synced: 15 Dec 2025

https://github.com/alefrp/sql_dwacoes

Created based on my experience with financial data and SQL, for stock analysis.

database datawarehouse modeling sql

Last synced: 30 Nov 2025

https://github.com/tim-hub/adventureworks

This is an SSIS project for building a data warehouse on AdventureWorks.

adventureworksdw datawarehouse ssis

Last synced: 17 Mar 2025

https://github.com/kstrassheim/datawarehouse-crawler

This is a content and schema crawler tool to receive, update, and import various kinds of data into an on-premises or cloud-based SQL Server or Azure Synapse Analytics (Azure Data Warehouse SQL Server). As sources it supports SQL Server tables, OData endpoints, CSV files, and Excel files. For multiple sources it can run in parallel mode, where it creates a thread for each connection. The speciality of this crawler is that it creates the target tables by itself using the additional info from source.json. In the case of Azure Synapse Analytics it estimates the distribution type and keys. The syncing works completely without SQL transactions by using a consistency correction algorithm for very frequent fact tables. There are 5 syncing algorithms (see Manual/Insert) which can be selected, as well as one update algorithm.

azure-data-warehouse azure-synapse-analytics business-intelligence crawler csv data-import data-science datawarehouse datawarehousing docker dotnet-core-2 excel integration-testing odata parallel-computing sql

Last synced: 03 Apr 2025

https://github.com/rohit196/sql-learning-hub

A comprehensive collection of SQL resources, projects, tutorials, and interview preparation materials

bigquery datawarehouse learning-resources nosql sql sql-projects

Last synced: 17 Jun 2025

https://github.com/i-am-uchenna/sql-data-warehouse-project

The Data Warehouse and Analytics Project is a comprehensive initiative designed to demonstrate the end-to-end process of building a modern data warehouse and deriving actionable insights through SQL-based analytics.

architecture business-intelligence crm data data-analysis database database-management datawarehouse erp etl etl-pipeline model sql sqlserver

Last synced: 29 Oct 2025

https://github.com/emmanuelkdev/sql-data-warehouse-project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

data-analytics dataengineering-a datawarehouse etl etl-pipeline python sql

Last synced: 23 Jun 2025

https://github.com/gitgut01/washed-up-lineage

A half-decent, LLM-powered cartographer that maps your data warehouse mess

datawarehouse lineage llm neo4j

Last synced: 24 Jun 2025

https://github.com/niranjanrao07/data-226-assignments

This repository includes assignments for DATA 226, focused on designing databases, implementing SQL for analytics, performing ETL operations, building data pipelines, and conducting OLAP.

airflow datalake datawarehouse dbt pipeline python snowflake sql

Last synced: 05 Sep 2025

https://github.com/akin-mustapha/university_data_warehouse

Implementation of a data warehouse using fictional datasets

database datawarehouse msql

Last synced: 11 May 2025

https://github.com/zack0061/end-to-end-data-pipeline

📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.

airflow cassandra data-analysis data-engineering-pipeline data-science dataengineering datawarehouse etl etl-framework etl-job python redshift scheduler terraform

Last synced: 07 Jul 2025

https://github.com/ayushman0511/data-warehouse-project1

A comprehensive guide to building a data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

data data-ana data-anal data-cleaning data-enginee data-lakehou datalake datasci dataware datawarehouse datawarehousi etl etl-job etl-pipeline medallion sql sql-quer sql-query sql-server sqlserver

Last synced: 26 Jun 2025

https://github.com/bbqqvv/datawarehouse

DataWarehouse project analyzing flight data from 2019 to 2021

datawarehouse powerbi sqlserver ssas ssis ssrs

Last synced: 24 Feb 2025

https://github.com/thuanvu301103/co4031_data_warehouse_and_data_support_system_assignment_weather_warehouse

This is a simple Data Warehouse structure for DW and DSS Assignment Semester 242

apachenifi datawarehouse postgresql

Last synced: 22 Mar 2025

https://github.com/merrill007/sql-data-warehouse-project

The Data Warehouse and Analytics Project is a comprehensive initiative designed to demonstrate the end-to-end process of building a modern data warehouse and deriving actionable insights through SQL-based analytics.

architecture business-intelligence crm data data-analysis database database-management datawarehouse erp etl etl-pipeline model sql sqlserver

Last synced: 22 Mar 2025

https://github.com/epomatti/az-synapse

Creating and loading data to Azure Synapse

azure azure-synapse-analytics datawarehouse pluralsight terraform

Last synced: 25 Dec 2025

https://github.com/mohammad-malik/metro-meshjoin-data-warehouse

This repository contains a simulated real-time data warehouse for the METRO Shopping Store. The project uses simulated streamed transactional data and a master database to build a warehouse using MESHJOIN, then performs advanced OLAP analyses on it. It was developed as part of the course Data Warehousing & Business Intelligence (DS3003).

datawarehouse eclipse-ide java meshjoin mysql-database

Last synced: 25 Jul 2025

https://github.com/madmax55555/real_estate_sales_de_project

This project aims to extract, transform, and load (ETL) real estate sales data into a data warehouse (`re_sales_dw`).

data-engineering datawarehouse etl jupyter-notebook pandas

Last synced: 01 Aug 2025

https://github.com/anderakooken/dw-molap-datamart

Data mart scripts - MOLAP (Data Science)

cube datamart datascience datawarehouse sql sqlserver

Last synced: 01 Aug 2025

https://github.com/pradip-data/world-merchandise-trade

This project analyzes global merchandise trade trends from 1948 to 2023 using Google BigQuery and Python. It includes country-wise and product-wise trade performance, covering exports, imports, total trade, and trade deficit. The analysis features SQL queries for BigQuery, data visualizations, and detailed reports to uncover long-term trade pattern

datawarehouse google-biquery google-cloud-platform merchandise python python-visualization sql trade trade-data-1948-2023

Last synced: 10 Aug 2025

https://github.com/mesmacosta/hive-table-metadata-generator

This script generates random metadata for the Hive metastore.

apache-hive bigdata datawarehouse metadata

Last synced: 04 Oct 2025

https://github.com/marcosach/tp-infra

This repository contains all the files that make up the final practical assignment for the course Infraestructura para la Ciencia de Datos (Infrastructure for Data Science) in the Bachelor's degree in Data Science (UNSAM).

bigquery buckets datamart datawarehouse etl gcp gcs pipelines python sql

Last synced: 13 Aug 2025

https://github.com/dougdss89/wideworldadventure

This repository includes all files that compose the design and unification of the databases AdventureWorks and WideWorldAdventure project.

bigdata databricks datalake datawarehouse dbt deltalake duckdb elt etl etl-pipeline spark

Last synced: 05 Oct 2025

https://github.com/tuanai-vireox/dataform-utils

Bigquery Dataform Javascript Utils Package - Support Ads, Query Common, ...

bigquery dataform datawarehouse

Last synced: 22 Aug 2025

https://github.com/trannhatnguyen2/bi_cloud_kientap

Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration

azure datalake datawarehouse powerbi

Last synced: 29 Aug 2025

https://github.com/dakshsammi/arkaid

Arkaid is a game performance analytics platform developed for the Information Integration Architecture Course - CSE656 (IIIT Delhi). It uses a data warehouse approach to analyze gaming data from multiple sources and provides insights via an AI-driven interface.

ai airflow aws database-management datawarehouse docker etl flask information-integration numpy openai postgresql python togetherai

Last synced: 30 Dec 2025

https://github.com/moshora99/sql-data-warehouse-project

Build a modern data warehouse with MySQL, including ETL processes, data modeling, and analytics

data-analysis data-engineering data-science database datawarehouse datawarehousing etl scheme sql sql-query sql-server

Last synced: 29 Jun 2025

https://github.com/haidendr/data-warehouse-design-for-an-airline-company

Simple Data Warehouse Design for an Airline Company using PostgreSQL

datawarehouse erdiagram postgresql schema starmodel

Last synced: 15 Mar 2025

https://github.com/tuancamtbtx/dataform-utils

Bigquery Dataform Javascript Utils Package - Support Ads, Query Common, ...

bigquery dataform datawarehouse

Last synced: 15 May 2025