{"id":15003311,"url":"https://github.com/abroniewski/tpc-di-ms-sql-benchmark","last_synced_at":"2026-02-09T04:03:45.802Z","repository":{"id":67119384,"uuid":"440672442","full_name":"abroniewski/TPC-DI-MS-SQL-Benchmark","owner":"abroniewski","description":"Using TPC-DI to benchmark MS SQL server using SQL script for extract, transform and load (ETL). ","archived":false,"fork":false,"pushed_at":"2022-07-12T18:03:57.000Z","size":4014,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-29T18:14:08.699Z","etag":null,"topics":["bdma","benchmark","data-engineering","database-management","dataops","ms-sql-server","mssql","sql","tpc-di","tpc-ds","tpc-ds-benchmark"],"latest_commit_sha":null,"homepage":"","language":"TSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abroniewski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-21T23:00:36.000Z","updated_at":"2025-01-02T15:48:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"cb47087d-47e5-4d87-999e-09b56f5364e6","html_url":"https://github.com/abroniewski/TPC-DI-MS-SQL-Benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abroniewski/TPC-DI-MS-SQL-Benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abroniewski%2FTPC-DI-MS-SQL-Benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abroniewski%2FTPC-DI-MS-SQL-Benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/abroniewski%2FTPC-DI-MS-SQL-Benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abroniewski%2FTPC-DI-MS-SQL-Benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abroniewski","download_url":"https://codeload.github.com/abroniewski/TPC-DI-MS-SQL-Benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abroniewski%2FTPC-DI-MS-SQL-Benchmark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267616695,"owners_count":24116160,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-28T02:00:09.689Z","response_time":68,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bdma","benchmark","data-engineering","database-management","dataops","ms-sql-server","mssql","sql","tpc-di","tpc-ds","tpc-ds-benchmark"],"created_at":"2024-09-24T18:57:54.268Z","updated_at":"2026-02-09T04:03:45.642Z","avatar_url":"https://github.com/abroniewski.png","language":"TSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TPC-DI: MS SQL Server Benchmark\n\nThe project was completed in 2021 as part of the Big Data Management and Analytics (BDMA) program for the Database Warehouse course at Université Libre de Bruxelles (ULB) in Brussels.\n\n**Are you a current BDMA student?** Don't be shy! 
[Reach out](mailto:abroniewski@gmail.com?subject=[GitHub]%20DBW%20TPCDI%20Benchmark) for insights and tips!\n\n\n## Project Team\nThe work in this repo was completed by:  \n- Diogo Repas\n- Nicole Kovacs\n- Andres Espinal\n- Adam Broniewski\n\n## The files you need...\nThe report in the `/Deliverables` path provides an overview of how the benchmark was performed.  \nThe `/Helpers` path has the scripts used to run the benchmark. \n\n***Note:*** *The DimSecurity table has unresolved issues in this script that result in no data being returned when it is used. As a result, FactMarketHistory, FactWatches, and part of DimTrade were left out of the benchmark.*\n\n## Instructions for Replication\n1.\tGenerate the source files following the TPC-DI instructions.\n2.\tUnpack the FINWIRE files with a Python script. This was done using `Helpers/Scripts/ConvertFinwireFilesToCSV.py`.\n3.\tLoad the files into the MS SQL Server database using SSIS.\n4.\tMove the raw files into a schema named “source” in SQL table format. The schema was created using `Helpers/Scripts/CreateDBTableSchema.sql`.\n5.\tTransform and load all tables from “source” to “dbo” using the main SQL script located at `Helpers/Scripts/historical_load.sql`.\n\nEven if you are not using SQL for the transformation, reading through the SQL script will give you an overview of the transformations needed in whatever integration service you are using.\n\nHave questions? Drop me a line at abroniewski@gmail.com\n\n# Benchmarking Methodology\n\nThe following tools were installed to complete the benchmark: \n- SQL Server 2019 Express\n- SQL Server Data Tools 2017 (Standalone along with Visual Studio)\n- Materials and programs provided by TPC-DI\n\nThe benchmark queries and logging were implemented using Microsoft SQL Server Integration Services (SSIS). \nThe timing results were plotted in a live Tableau dashboard that collects the logging results automatically from the database. 
\nData was generated using the TPC-DI data generator at 4 scale factors (SF):\n- SF 3\n- SF 10\n- SF 20\n- SF 30\n\nTwo research papers served as general references for the TPC-DI ETL process and helped identify data quality issues:\n- Data Quality Problems in TPC-DI Based Data Integration Processes\n- TPC-DI: The First Industry Benchmark for Data Integration\n\nA Git repository was also used as a reference for creating the data warehouse tables:\n- https://github.com/detobel36/tpc-di (reviewed and checked against the current version of the TPC-DI specification)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabroniewski%2Ftpc-di-ms-sql-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabroniewski%2Ftpc-di-ms-sql-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabroniewski%2Ftpc-di-ms-sql-benchmark/lists"}