{"id":15693235,"url":"https://github.com/ggeop/board-game-analysis","last_synced_at":"2025-09-07T02:04:22.013Z","repository":{"id":102403973,"uuid":"133557405","full_name":"ggeop/Board-Game-Analysis","owner":"ggeop","description":"Board games analysis \u0026 Tableau visualizations.  :game_die:","archived":false,"fork":false,"pushed_at":"2018-05-20T08:32:40.000Z","size":985,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-09-07T02:03:29.366Z","etag":null,"topics":["boardgamegeek","olap","olap-cube","r","sql-server","tableau"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ggeop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-15T18:31:55.000Z","updated_at":"2025-04-03T15:31:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"6de5a0a8-232d-447e-aa7f-850addcca0b8","html_url":"https://github.com/ggeop/Board-Game-Analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ggeop/Board-Game-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FBoard-Game-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FBoard-Game-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FBoard-Game-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FBoard-Game-Analysis/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ggeop","download_url":"https://codeload.github.com/ggeop/Board-Game-Analysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FBoard-Game-Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273986629,"owners_count":25202708,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["boardgamegeek","olap","olap-cube","r","sql-server","tableau"],"created_at":"2024-10-03T18:42:21.487Z","updated_at":"2025-09-07T02:04:21.976Z","avatar_url":"https://github.com/ggeop.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BoardGameAnalysis\n\n## Introduction\n\nIt is an undeniable fact that board games have been making a comeback lately, and deeper, more strategic board games like Scythe or Catan have become hugely popular all over the world. In Greece there are many associations and “fan clubs” for popular board games, which organize tournaments with many participants and a rich variety of awards. \n\nMost game designers have full-time jobs, and creating games is merely their hobby (yes, even for popular games!). 
They typically only make enough profit to break even or, at best, squeeze out a couple of expansions.\nThat is, until Kickstarter. Kickstarter is a global crowdfunding platform that helps bring creative projects to life. It has been revolutionary for the board game market, as it gives avid gamers a chance to put their ideas in front of other like-minded people. It gave the tabletop community a way to bring silent ideas to life. \n\nFor these reasons, this assignment is an excellent opportunity to study and analyze a large dataset of board games and to investigate which attributes and user statistics define a successful and popular board game.\nWe use SQL Server to define our schema and to design and develop our data warehouse. We then use SQL Server Analysis Services to define a multi-dimensional model over that schema. Finally, we connect the cube to Tableau to build OLAP reports and visualize the most interesting results. \n\nLastly, we create a linear regression model in IBM SPSS Statistics to make predictions for newly created board games and to understand which BGG community stats and game attributes affect a game's rating the most.\nOur main approach in exploring the BoardGameGeek (BGG) dataset is to try to answer the question “Wouldn’t it be nice to know if a board game is good before you buy it?” This is a very broad question and cannot be answered without a deep knowledge of the data itself.\n\n\n## Application\n\nWe are both board game enthusiasts, so we chose to do our project with a dataset from BoardGameGeek (BGG).\n\nBoardGameGeek is an online forum for board gaming hobbyists and a game database that holds reviews, images and videos for over 90,000 different tabletop games, including European-style board games, wargames, and card games. 
In addition to the game database, the site allows users to rate games on a 1–10 scale and publishes a ranked list of board games. Since 2005, BoardGameGeek has hosted an annual board game convention, BGG.CON, that focuses on playing games. New games are showcased and convention staff are on hand to teach rules.\n\n## Dataset\n\nOur dataset contains the attributes and ratings for around 94,000 board games and expansions from BoardGameGeek. A few details about our initial dataset:\n\n•\tThe initial size was around 150MB\n•\tAround 94,000 rows \u0026 80 columns\n\nEach row represents a single board game and has descriptive statistics about the board game, as well as review information. Some interesting columns for analysis are:\n\n•\tgame.type – Board games are divided into two categories, “BoardGames” \u0026 “BoardGames Expansions”\n•\tdetails.maxplayers – Suggested max number of players (given by the manufacturer)\n•\tdetails.maxplaytime – Maximum playing time (given by the manufacturer)\n•\tdetails.minage – Minimum recommended age to play\n•\tdetails.minplayers – Suggested min number of players (given by the manufacturer)\n•\tdetails.minplaytime – Minimum playing time (given by the manufacturer)\n•\tdetails.name – Name of the board game\n•\tdetails.yearpublished – Published year\n•\tattributes.boardgamecategory – Board games categories\n•\tattributes.boardgamemechanic – Type of mechanics (e.g. Hand Management, Set Collection, Trading)\n•\tattributes.total – Number of attributes (e.g. 
dice, cards, pawns...)\n•\tstats.average – Average rating given to the game by users (0-10)\n•\tstats.averageweight – Average of all the subjective weights (0-5)\n•\tstats.numcomments – Number of comments on each game (as reported by BoardGameGeek)\n•\tstats.numweights – Number of votes on weight/difficulty\n•\tstats.owned – Number of players who own a game\n•\tstats.trading – Number of players who trade a game\n•\tstats.usersrated – Number of users who rated a game\n•\tstats.wanting – Number of players who want a game\n•\tstats.wishing – Number of players who added a game to their wish list\n\nWe drop bayes_average_rating since it is almost analogous to our target. For regression analysis we have chosen the columns:\n\n\n## ETL - Extract Transform Load\n\nETL is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. We did the whole ETL procedure in RStudio.\n\n### Extract\nWe extracted the data from a GitHub repository. \n\n```\n#Install package\ninstall.packages(\"devtools\")\ndevtools::install_github(\"9thcirclegames/bgg-analysis\")\n\n```\n\nAfterwards, we created a .csv file to load into RStudio and start the cleaning procedure:\n\n```\n# Create our initial csv\ndata(BoardGames)\nwrite.csv(BoardGames, 'BoardGames.csv', row.names = FALSE)\n\n```\n\n### Transform \n\nWe transformed the data by doing the following tasks: \n1.\tWe filtered the columns and kept only those with potential value to our analysis.\n\n2.\tWe categorized the columns into Dimensions and Measures according to our analysis needs.\n\n3.\tApply rules according to the BoardGameGeek platform\no\tPlayers age limits (0-100)\no\tRating score range (1-10)\no\tDifficulty score range (1-5)\no\tPublish year range (1980-2016)\no\tNumber of players limits (1-100)\n\n4.\tCleaning\no\tRemoving records with corrupted titles (e.g. 
non-English titles)\no\tRemoving records with no titles or only special character titles\no\tDeleting duplicates\no\tEnforcing ranges on measures (e.g. average rating between 1 and 10)\n\n5.\tSplitting columns: we split columns with multiple values into separate rows. Specifically, we created two bridge tables (categories and mechanic)\n\n6.\tWe created one data frame for each dimension and one for the fact table. \n7.\tWe converted the table to a star-flake schema by replacing the dimension columns with their ids.\n\nWe wrote and ran the following code* to clean the dataset:\n*We have omitted a couple of lines for cleaning the columns\n\n```\nBoardGames\u003c-BoardGames[BoardGames$game.type==\"boardgame\",]\nBoardGames\u003c-BoardGames[,-c(1)]\n\n#NOTE: THERE ARE 2 TYPES OF GAMES: THE BOARDGAMES AND THEIR EXPANSIONS. \n#THE EXPANSIONS ARE THE SAME GAMES WITH A FEW EXTRA FEATURES\n\n##### Delete duplicates in board games\nBoardGames\u003c-BoardGames[!duplicated(BoardGames$details.name),]\n\n##### Publish year limits\nBoardGames\u003c-BoardGames[2017\u003eBoardGames$details.yearpublished \u0026 BoardGames$details.yearpublished\u003e1980,]\n\n##### Minimum age limits\nBoardGames\u003c-BoardGames[100\u003eBoardGames$details.minage \u0026 BoardGames$details.minage\u003e0,]\n\n##### Rating limits according to the site\nBoardGames\u003c-BoardGames[BoardGames$stats.average\u003e0 \u0026 BoardGames$stats.average\u003c=10,]\n\n##### Difficulty limits according to the site\nBoardGames\u003c-BoardGames[BoardGames$stats.averageweight\u003e0 \u0026 BoardGames$stats.averageweight\u003c=5,]\n\n##### Number of Players limits\nBoardGames\u003c-BoardGames[100\u003eBoardGames$details.maxplayers \u0026 BoardGames$details.maxplayers\u003e0,]\nBoardGames\u003c-BoardGames[!is.na(BoardGames$details.maxplayers),] #Remove NA 
values\n\nBoardGames\u003c-BoardGames[100\u003eBoardGames$details.minplayers \u0026 BoardGames$details.minplayers\u003e0,]\nBoardGames\u003c-BoardGames[!is.na(BoardGames$details.minplayers),] #Remove NA values\n\n##### Filter the attribute counts\nBoardGames\u003c-BoardGames[BoardGames$attributes.total\u003e0,]\n\n##### Remove titles containing angle-bracket characters\nBoardGames\u003c-BoardGames[!grepl('\u003c',BoardGames$details.name),]\nBoardGames\u003c-BoardGames[!grepl('\u003e',BoardGames$details.name),]\n```\n\nNext, we created the dimension tables:\n\n```\nlibrary(tidyr) # provides separate_rows()\n\nnameDim\u003c-as.data.frame(unique(BoardGames$details.name))\nnames(nameDim)\u003c-paste(\"NameLabel\")\nnameDim$id \u003c- seq.int(nrow(nameDim))\nwrite.csv(nameDim,'nameDim.csv',row.names = FALSE)\n\n#### Replace the details.name with id\nBoardGames$details.name\u003c- nameDim$id[match(BoardGames$details.name,nameDim$NameLabel)]\n\n#### Rename the details.name column to id\ncolnames(BoardGames)[6]\u003c-\"id\"\n\n##### Separate the multivalues in different rows (by categories)\nbridge_categories\u003c-subset(BoardGames,select=c(id,attributes.boardgamecategory))\nbridge_categories\u003c-separate_rows(bridge_categories,attributes.boardgamecategory,convert = TRUE, sep = \",\")\n\n##### Separate the multivalues in different rows (by mechanic)\nbridge_mechanic\u003c-subset(BoardGames,select=c(id,attributes.boardgamemechanic))\nbridge_mechanic\u003c-separate_rows(bridge_mechanic,attributes.boardgamemechanic,convert = TRUE, sep = \",\")\n\n##########################################################################\n###################### Create Category Dimension 
#########################\n##########################################################################\n\ncategoryDim\u003c-as.data.frame(unique(bridge_categories$attributes.boardgamecategory))\nnames(categoryDim)\u003c-paste(\"CategoryLabel\")\ncategoryDim$CategoryID \u003c- seq.int(nrow(categoryDim))\nwrite.csv(categoryDim,'categoryDim.csv',row.names = FALSE)\n\n##########################################################################\n###################### Create mechanic Dimension #########################\n##########################################################################\n\nmechanicDim\u003c-as.data.frame(unique(bridge_mechanic$attributes.boardgamemechanic))\nnames(mechanicDim)\u003c-paste(\"MechanicLabel\")\nmechanicDim$MechanicID \u003c- seq.int(nrow(mechanicDim))\nwrite.csv(mechanicDim,'mechanicDim.csv',row.names = FALSE)\n\n##########################################################################\n###################### Create yearpublished Dimension ####################\n##########################################################################\n\nyearDim\u003c-as.data.frame(unique(BoardGames$details.yearpublished))\nnames(yearDim)\u003c-paste(\"YearLabel\")\nyearDim$id \u003c- seq.int(nrow(yearDim))\nwrite.csv(yearDim,'yearDim.csv',row.names = FALSE)\n\n##########################################################################\n###################### Create maxplayers Dimension #######################\n##########################################################################\n\nmaxplayersDim\u003c-as.data.frame(unique(BoardGames$details.maxplayers))\nnames(maxplayersDim)\u003c-paste(\"maxplayersLabel\")\nmaxplayersDim$id \u003c- seq.int(nrow(maxplayersDim))\nwrite.csv(maxplayersDim,'maxplayersDim.csv',row.names = 
FALSE)\n\n##########################################################################\n###################### Create minage Dimension ###########################\n##########################################################################\n\nminageDim\u003c-as.data.frame(unique(BoardGames$details.minage))\nnames(minageDim)\u003c-paste(\"minageLabel\")\nminageDim$id\u003c- seq.int(nrow(minageDim))\nwrite.csv(minageDim,'minageDim.csv',row.names = FALSE)\n\n##########################################################################\n###################### Create minplayers Dimension #######################\n##########################################################################\n\nminplayersDim\u003c-as.data.frame(unique(BoardGames$details.minplayers))\nnames(minplayersDim)\u003c-paste(\"minplayersLabel\")\nminplayersDim$id \u003c- seq.int(nrow(minplayersDim))\nwrite.csv(minplayersDim,'minplayersDim.csv',row.names = FALSE)\n```\n\nCreating the bridge tables for categories and mechanic:\n```\nbridge_categories$attributes.boardgamecategory\u003c- categoryDim$CategoryID[match(bridge_categories$attributes.boardgamecategory,categoryDim$CategoryLabel)]\ncolnames(bridge_categories)[2]\u003c-\"categoryID\"\nwrite.csv(bridge_categories,'bridge_categories.csv',row.names = FALSE)\n\nbridge_mechanic$attributes.boardgamemechanic \u003c- mechanicDim$MechanicID[match(bridge_mechanic$attributes.boardgamemechanic,mechanicDim$MechanicLabel)]\ncolnames(bridge_mechanic)[2]\u003c-\"mechanicID\"\nwrite.csv(bridge_mechanic,'bridge_mechanic.csv',row.names = FALSE)\n```\n\nFinally, creating the fact table:\n\n```\n# Replace each dimension value in the fact table with its dimension id\nBoardGames$details.maxplayers\u003c- maxplayersDim$id[match(BoardGames$details.maxplayers,maxplayersDim$maxplayersLabel)]\nBoardGames$details.yearpublished\u003c- yearDim$id[match(BoardGames$details.yearpublished,yearDim$YearLabel)]\nBoardGames$details.minage\u003c- 
minageDim$id[match(BoardGames$details.minage,minageDim$minageLabel)]\nBoardGames$details.minplayers\u003c- minplayersDim$id[match(BoardGames$details.minplayers,minplayersDim$minplayersLabel)]\n\nwrite.csv(BoardGames,'fact_table.csv',row.names = FALSE) \n```\n\nAfter the cleaning procedure we have a dataset with:\n\n•\t10 CSVs\no\tfact_table.csv\no\tbridge_categories.csv\no\tbridge_mechanic.csv\no\tmechanicDim.csv\no\tcategoryDim.csv\no\tmaxplayersDim.csv\no\tminplayersDim.csv\no\tminageDim.csv \no\tnameDim.csv\no\tyearDim.csv\n•\t21,371 rows \u0026 18 columns\n•\tTotal size 2.6 MB\n\n### Load Dataset to SQL Server\nIn this step we first create the database DATADB2 in Microsoft SQL Server.\n\nWe then run the following code:\n\n```\nlibrary(RODBC) # provides odbcDriverConnect() and sqlSave()\n\n#DB connection            \ndbhandle \u003c- odbcDriverConnect('driver={SQL Server};server=.;database=dmbiDB;trusted_connection=true')\n\n\n#Bulk insert Fact table\nsqlSave(dbhandle, BoardGames, tablename = \"fact\")\n\n#Insert Dimensions\nsqlSave(dbhandle, nameDim, tablename = \"nameDim\")\nsqlSave(dbhandle, maxplayersDim, tablename = \"maxplayersDim\")\nsqlSave(dbhandle, yearDim, tablename = \"yearDim\")\nsqlSave(dbhandle, minageDim, tablename = \"minageDim\")\nsqlSave(dbhandle, minplayersDim, tablename = \"minplayersDim\")\n\n#Insert category and mechanic tables\nsqlSave(dbhandle, categoryDim, tablename = \"categoryDim\")\nsqlSave(dbhandle, mechanicDim, tablename = \"mechanicDim\")\n\n#Insert bridge tables\nsqlSave(dbhandle, bridge_categories, tablename = \"bridge_categories\")\nsqlSave(dbhandle, bridge_mechanic, tablename = \"bridge_mechanic\")  \n\n```\n### Star Schema\nIn this step we have to create table relationships. 
We have attached the database schema in the repository directory.\n\n## Statistical Analysis\n\n### SPSS\n\nIn SPSS Statistics, we used fifteen (15) variables: 1) “stats.average”, the average score for every game, 2) “stats.wishing”, the users who wish to get the game, 3) “details.minplayers”, the minimum players to play the game, 4) “details.maxplaytime”, the maximum playing time, 5) “details.minage”, the minimum age to play the game, 6) “details.maxplayers”, the maximum players to play the game, 7) “attributes.total”, the number of attributes describing the game, 8) “stats.averageweight”, the average difficulty of the game, 9) “stats.trading”, the users who want to trade the game, 10) “details.minplaytime”, the minimum playing time, 11) “stats.numweights”, the number of users who gave a difficulty value, 12) “stats.owned”, the number of users who own the game, 13) “stats.wanting”, the number of users who want the game, 14) “stats.numcomments”, the number of users who commented on the game, 15) “stats.usersrated”, the number of users who rated the game.\n\n### Correlations\n\nCorrelations tell you which columns are closely related to the column you are interested in. The closer the correlation is to 0, the weaker the connection. The closer to 1, the stronger the positive correlation, and the closer to -1, the stronger the negative correlation.\nAs we see above, a couple of columns show higher correlation with our average_rating column. The average_weight column seems to be correlated with our average_rating column, implying that the more \"weight\" a game has, the more highly it tends to be rated. Weight is a subjective measure defined by BoardGameGeek. It describes how \"deep\" or involved a game is.\nWe can also note that games for older players, where minage is high, tend to have a higher average rating. 
The yearpublished correlation values tell us that newer games tend to have a higher rating.\n\n\n## Visualizations in Tableau\n\n### General Trend\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture.PNG)\n\n### Difficulty Trend\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture2.PNG)\n\n### Playing Time Trend\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture4.PNG)\n\n### Playing Time Trend For Each Category\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture3.PNG)\n\n### Game Categories \u0026 Mechanic Type Distribution\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture6.PNG)\n\n### Ratings Distribution \u0026 Ratings Trend\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture7.PNG)\n\n### Difficulty among Categories \u0026 Mechanics Type\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture8.PNG)\n\n### Top 10 Categories\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture9.PNG)\n\n### Top 10 Games\n![alt text](https://github.com/ggeop/Board-Game-Analysis/blob/master/Visualizations/Capture5.PNG)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fggeop%2Fboard-game-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fggeop%2Fboard-game-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fggeop%2Fboard-game-analysis/lists"}