An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-warehousing

A curated list of projects in awesome lists tagged with data-warehousing.

https://github.com/cynkra/dm

Working with relational data models in R

data-model data-warehousing datawarehousing dbi dbplyr r relational-databases

Last synced: 14 May 2025

https://github.com/apache/doris-flink-connector

Flink Connector for Apache Doris

apache connector data-warehousing dbms doris flink mpp olap

Last synced: 14 May 2025

https://github.com/totalhack/zillion

Make sense of it all. Semantic data modeling and analytics with a sprinkle of AI. https://totalhack.github.io/zillion/

ai analytics data-analysis data-warehousing datasources openai python query-builder reporting semantic-data-model semantic-layer sql text-to-sql warehouse

Last synced: 07 Jan 2026

https://github.com/iam-mhaseeb/skytrax-data-warehouse

A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.

airflow data-analysis data-analytics data-cleaning data-engineering data-orchestration data-processing data-visualization data-warehouse data-warehousing database docker metabase python python3 redshift s3 s3-bucket sql

Last synced: 12 Aug 2025

https://github.com/apache/doris-spark-connector

Spark Connector for Apache Doris

apache connector data-warehousing dbms doris mpp olap spark

Last synced: 16 May 2025

https://github.com/cedstandards/ceds-data-warehouse

Modeled for longitudinal storage and reporting of P-20W data, the Common Education Data Standards (CEDS) Data Warehouse implements star schema data warehouse normalization techniques for improved query performance.

ceds data-warehouse data-warehouses data-warehousing education-data education-data-standards education-database sql-server

Last synced: 01 May 2026

https://github.com/snowflake-labs/emerging-solutions-toolbox

The Emerging Solutions Toolbox is a collection of solutions created by Snowflake's Solution Innovation Team (SIT) that consists of demos, helpers, and frameworks to help you get the most out of Snowflake.

ai data-engineering data-science data-warehousing machine-learning native-apps notebooks optimization python snowflake streamlit

Last synced: 22 Jun 2025

https://github.com/exasol/dbt-exasol

Data Build Tool adapter for Exasol

data-builder data-warehousing exasol-integration

Last synced: 11 May 2026

https://github.com/cfpb/aurora

An open source enterprise data warehousing and analysis platform.

ansible data-science data-warehousing

Last synced: 09 Apr 2025

https://github.com/alexisrolland/listof

📜 Simple and flexible application to manage configuration data aka lists of values.

data-warehouse data-warehousing dimensions feature-store list-of-values lists mapping mapping-tools

Last synced: 08 Feb 2026

https://github.com/gavindsouza/instagram-to-sqlite

Save data from Instagram takeout to a SQLite database

data-warehousing dogsheep hacktoberfest instagram sqlite takeout

Last synced: 12 Apr 2025

https://github.com/amey-thakur/data-warehousing-and-mining-and-data-warehousing-and-mining-lab

CSC603: Data Warehousing and Mining [DWM] & CSL603: Data Warehousing and Mining Lab [DWM Lab] | TE Semester VI | Computer Engineering

amey ameythakur computer-engineering computer-science data-mining data-warehouse data-warehouse-architecture data-warehousing engineering megasatish textbooks

Last synced: 02 Mar 2026

https://github.com/dataforgeopenaihub/steam-sales-analysis

This repository features an ETL pipeline for retrieving, processing, validating, and ingesting game metadata and sales data from SteamSpy and Steam APIs. Data is stored in a MySQL database on Aiven Cloud and visualized using Tableau dashboards for insightful analysis of gaming trends and sales performance.

cloud-computing data-analysis data-engineering data-pipepline data-warehousing games mysql-database python steam-api tableau typer-cli

Last synced: 06 Feb 2026

https://github.com/trieb-work/schemabase

The schemabase integration framework for building custom data movers between different cloud services, using BullMQ, Webhooks, Prisma Database and more.

data-warehousing ecommerce ecommerce-platform

Last synced: 15 Apr 2025

https://github.com/aymane-maghouti/youtube-data-pipeline

The project aims to automate the extraction of data from a YouTube channel, transform the data into a suitable format, and make it available for analysis through a Power BI dashboard. By following a structured ETL process, this project streamlines data retrieval, preparation, and visualization.

csv data-engineering data-pipeline data-visualization data-warehousing etl pandas powerbi python snowflake web-scraping

Last synced: 29 Oct 2025

https://github.com/Narius2030/MOLISA-Data-Warehouse

Extracts data from many databases of the Labor, Invalids and Social Affairs sectors, converts it to an appropriate structure and format, then uploads it to a shared data warehouse and data mart. As a result, staff at state agencies can easily retrieve and analyze data from the compiled data warehouse.

apache-airflow apache-spark api-rest data-pipeline data-warehousing medallion-architecture postgresql

Last synced: 29 Oct 2025

https://github.com/al-ghaly/airline-company-data-warehouse

Data Warehouse modeling, design, implementation, and analysis for an Airline Company.

data-analysis data-warehousing database-modeling sql-server

Last synced: 14 Apr 2025

https://github.com/aventius-software/microsoftfabricnhsdemo

This is a full example of E2E data processing using Microsoft Fabric with some publicly available NHS data. Feel free to explore/fork/reuse etc.

data-engineering data-warehousing lakehouse microsoft-fabric nhs pyspark

Last synced: 12 Feb 2026

https://github.com/madhurimarawat/data-warehousing

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

data-aggregation data-cleaning data-cleaning-and-preprocessing data-warehousing detailed-documentation etl etl-pipeline mysql normalization olap-cube olap-data olap-database query-optimization snowflake-schema star-schema

Last synced: 28 Apr 2026

https://github.com/sagarpednekar/sparkify-data-warehouse

The project involves developing an ETL pipeline on Amazon Redshift to load data from S3, create staging tables, and transform the data into a star schema optimized for song play analysis. The goal is to enable Sparkify's analytical team to uncover insights about user behavior and music preferences.

aws data-warehousing etl python redshift s3 sql udacity

Last synced: 19 May 2026

https://github.com/aniketraut16/vi-sem-labs

This repository is a collection of all the codes and screenshots from my 6th-semester lab sessions. It serves as a personal archive of my work across various subjects, capturing both the learning process and the results. Dive in to explore and learn!

data-mining data-warehousing deep-learning image-processing natural-language-processing video-processing

Last synced: 02 Sep 2025

https://github.com/tedoaba/kaim-w7

Building a Data Warehouse to Store Data on Ethiopian Medical Business Data Scraped from Telegram Channels

data-scraping data-warehousing dbt kaim medical-business-data postgresql sqlalchemy store-data telegram telegram-api telethon yolov5

Last synced: 20 May 2026

https://github.com/willie-conway/relational-database-administration-capstone-project

🧱 Relational Database Administration Capstone Project focused on designing, securing, optimizing, and automating OLTP & Data Warehouse systems using MySQL, PostgreSQL, Apache Airflow, and shell scripting. 💾📊⚙️

airflow backup data-pipelines data-warehousing database-admin database-security encryption etl mysql oltp optimization phpmyadmin phppgadmin postgresql restore shell-scripting sql

Last synced: 16 Apr 2026

https://github.com/chaseofthejungle/data-repositories-overview

A quick guide to several different kinds of data repositories and their use cases.

data-architecture data-repository data-warehousing

Last synced: 30 Jan 2026

https://github.com/sebastian-huynh/data-technology-information

Research paper where I cover numerous topics ranging from big data to data warehousing.

big-data data-management data-technologies data-warehousing database internet-of-things

Last synced: 12 Nov 2025

https://github.com/jorgermduarte/real-time-data-architecture-kafka-flink-dw-k8s

Real-time data processing architecture using Apache Kafka, Flink, and Kubernetes. This project demonstrates how to build a scalable and resilient pipeline for streaming data, performing ETL with Flink, and storing the processed data in a Data Warehouse for analysis.

apache big-data bussiness-intelligence data-dictionary data-pipeline data-warehouse data-warehousing distributed-systems etl flink grafana kafka kubernetes node prometheus real-time streaming

Last synced: 16 Feb 2026

https://github.com/azizy22/duckdb-cho

🦆 Explore DuckDB with enhanced features and performance for efficient data analytics and querying in your applications.

analytics big-data cross-platform data-analytics data-management data-science data-warehousing database duckdb in-memory-database open-source performance sql sql-query visualization

Last synced: 12 Apr 2026

https://github.com/vaxdata22/cities-weather-s3-snowflake-slack-notif-etl-by-airflow-on-ec2

This is my second industry-level ETL project. This data pipeline orchestration uses Apache Airflow on AWS EC2. It demonstrates how to build an ETL data pipeline that extracts data (JSON) from the OpenWeatherMap API, transforms it, dumps it as CSV in an S3 bucket, then copies it to destination tables in Snowflake DW and sends a Slack notification.

apache-airflow aws-ec2 business-intelligence dags data-warehousing etl-pipeline openweathermap-api orchestration python3 slack-webhook snowflake sql

Last synced: 26 May 2026

https://github.com/hase3b/end-to-end-dwh-pipeline

This repository contains the end-to-end pipeline for building a data warehouse for a real estate management company. The pipeline includes data generation, ETL process, creation of star schema dimensions and fact table, visualization using Power BI, and automation with Pabbly Connect.

api automation dashboard data-engineering data-generation data-pipeline data-warehousing database-schema dimensional-modeling eda erd etl-pipeline mockaroo pabbly powerbi relational-database star-schema workflow

Last synced: 04 Apr 2025

https://github.com/thaitechtales/snowflake

This repository houses projects that demonstrate proficiency in Snowflake, focusing on cloud-based data warehousing, analytics, and integrations.

analytics big-data cloud-database data-management data-storage data-warehousing scalable-solutions snowflake

Last synced: 22 Jul 2025

https://github.com/gloryodeyemi/sql-data-warehouse

A comprehensive SQL Data Warehouse built from scratch using Azure Data Studio and SQL Server Express. It simulates an enterprise data pipeline using the Medallion Architecture and reflects industry best practices in Data Engineering, ETL design, and SQL-based data modeling.

data-transformation data-warehousing etl-pipeline medallion-architecture sql-server tsql

Last synced: 26 Jun 2025

https://github.com/nickklos10/sql-data-warehouse

A complete SQL-based data warehouse implementation featuring a medallion architecture (Bronze, Silver, Gold layers) for processing and analyzing customer and sales data from multiple source systems.

business-intelligence data-engineering data-warehousing etl postgresql sql

Last synced: 23 Feb 2026

https://github.com/praveendecode/data_pipeline

Implemented ETL projects with interactive Streamlit UI for user-friendly data extraction, transformation, and loading tasks

data-harvesting data-warehousing database database-management extract-transform-load mysql postgresql python

Last synced: 14 Apr 2026

https://github.com/vaxdata22/city-weather-and-s3file-rds-s3-bigquery-etl-by-airflow-on-ec2

This is my third AWS Cloud ETL project. This data pipeline orchestration uses Apache Airflow on AWS EC2. It demonstrates how to build an ETL data pipeline that would perform data extraction to a database in parallel to a loading process into the same database, join the tables, copy joined data to S3 and finally copy the S3 file to BigQuery DW.

apache-airflow aws-ec2 aws-rds-postgres aws-s3 bigquery business-intelligence dags data-warehousing etl-pipeline openweathermap-api orchestration python3 sql

Last synced: 21 May 2026

https://github.com/emediongfrancis/unified-data-lake-implementation-gcp-kafka-airflow-snowflake

This project demonstrates the integration of data from multiple sources into a unified data lake. The project showcases the use of Apache Airflow for ETL tasks, Google Cloud Storage as a data lake, Apache Kafka for data movement automation, Snowflake for data warehousing, and Google BigQuery for analysis.

airflow data-analysis data-warehousing etl etl-pipeline gcp-storage kafka snowflake value variety

Last synced: 07 Feb 2026

https://github.com/adudko/publications

My publications

data-lakes data-warehousing

Last synced: 01 Feb 2026

https://github.com/maxinexiong/cloud-data-warehousing-with-aws-redshift

This project builds a cloud-based ETL pipeline for Sparkify to move data to a cloud data warehouse. It extracts song and user activity data from AWS S3, stages it in Redshift, and transforms it into a star-schema data model with fact and dimension tables, enabling efficient querying to answer business questions.

aws-boto3 aws-redshift aws-s3 cloud-data-warehouse data-engineering data-warehouse data-warehousing dimensional-model dimensional-modeling etl etl-pipeline extract-transform-load infrastructure-as-code postgresql postgresql-database redshift-cluster

Last synced: 27 Feb 2026

https://github.com/gaaniruddha/fit5195-a1

This repository contains assignment #1 that was completed as a part of "FIT5195 Business Intelligence and Data Warehousing", taught at Monash Uni in S1 2020.

data-warehousing sql star-schema

Last synced: 01 Mar 2026

https://github.com/vaxdata22/city-weather-and-s3file-rds-s3-bigquery-by-airflow-on-ec2

This is my third industry-level ETL project. This data pipeline orchestration uses Apache Airflow on AWS EC2. It demonstrates how to build an ETL data pipeline that would perform data extraction to a database in parallel to a loading process into the same database, join the tables, copy joined data to S3 and finally copy the S3 file to BigQuery DW.

apache-airflow aws-ec2 aws-rds-postgres aws-s3 bigquery business-intelligence dags data-warehousing etl-pipeline openweathermap-api orchestration python3 sql

Last synced: 18 Mar 2025

https://github.com/mariann95/sql_data_warehouse_and_analytics_project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

data-analysis data-analytics data-cleaning data-engineering data-lakehouse data-science data-science-portfolio data-warehouse data-warehousing datalake datawarehouse datawarehousing etl etl-job etl-pipeline medallion-architecture sql sql-query sql-server sqlserver

Last synced: 06 Jun 2026

https://github.com/dannykyungh/data-analytics-portfolio

This is a repository that I have created to showcase skills, share projects and track my progress in Data Analytics / Data Science related topics.

advanced-excel data-cleaning data-modeling data-visualization data-warehousing google-sheets looker-studio python r sql tableau

Last synced: 12 May 2026

https://github.com/dina-hosny/sparkify---aws-redshift-data-warehousing

Sparkify - AWS Redshift Data Warehousing - Udacity Data Engineering Expert Track.

analytics aws data-engineering data-pipeline data-warehousing etl fwd redshift sql udacity

Last synced: 12 May 2026

https://github.com/lb-lewisham/lewisham-commonplaces

Pooling together of various Lewisham consultations into one SQLite database

data-warehousing datasette sqlite

Last synced: 06 May 2026

https://github.com/dina-hosny/sparkify---data-modeling-with-cassandra

Sparkify - Data Modeling with Cassandra - Udacity Data Engineering Expert Track.

cassandra cql data-analysis data-engineering data-modeling data-warehousing etl python

Last synced: 11 Apr 2026

https://github.com/tknishh/data-modeling-and-analyzing

Analyzed the dvdrental data and converted it into a star schema.

data-analytics data-engineering data-warehousing dvdrental pgadmin4 postgresql-database star-schema-tables

Last synced: 16 Aug 2025

https://github.com/bhargav-joshi/datawarehousing-mining

DWM Practical Implementations

data-mining data-warehousing dwm ml

Last synced: 09 Apr 2025

https://github.com/saravanansuriya/youtube-data-harvesting-and-warehousing-using-sql-mongodb-and-streamlit

This project aims to develop a user-friendly Streamlit application that uses the Google API to extract information on a YouTube channel, stores it in a MongoDB database, migrates it to a SQL data warehouse, and enables users to search for channel details and join tables to view data in the Streamlit app.

data-warehousing mongodb mysql-database python streamlit-webapp youtube-api

Last synced: 06 Feb 2026

https://github.com/diem0n/100daysofdatascience

This repository is a collection of things I do each day as a data scientist hired at a fictional company called keko corp.

data-analysis data-engineering data-science data-science-from-scratch data-warehousing machine-learning python

Last synced: 09 Apr 2026

https://github.com/revolutionarybukhari/datawarehouse_meshjoin_superstore

A data warehouse generated for the streaming data of a superstore using extended mesh join, by Syed Husnain Haider Bukhari

data data-science data-warehousing meshjoin

Last synced: 23 May 2026

https://github.com/mohammedowaiskh/retail-data-warehouse

An end-to-end Azure Data Engineering pipeline for Retail Analytics, built using ADF, Databricks, Synapse, and Power BI.

azure data-engineering data-warehousing etl-pipeline python

Last synced: 17 May 2026