https://github.com/pkrystian/datasetgames_project
A machine learning project that uses Kaggle's game and sales dataset to forecast the potential sales of a specific example game from a handful of crucial factors.
- Host: GitHub
- URL: https://github.com/pkrystian/datasetgames_project
- Owner: PKrystian
- Created: 2022-12-06T11:25:03.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-01-27T19:30:08.000Z (over 2 years ago)
- Last Synced: 2025-01-14T02:23:49.088Z (5 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 1010 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Machine Learning Game Sales Project
## Games Dataset - [kaggle link](https://www.kaggle.com/datasets/gregorut/videogamesales?resource=download)
## Description:
Using Kaggle's dataset of games and game sales, the program predicts how many copies a given example game could sell, based on a few important features.
## Operation scheme
First, after importing the data from the CSV file, we start analyzing our dataset.
Here we see the five best-selling games from our dataset:
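As a rough illustration of the import and the top-five lookup, here is a minimal sketch with pandas. The rows are made-up stand-ins, the column names assume the layout of the Kaggle file, and `top_sellers` is a hypothetical helper, not the notebook's own code.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle CSV (column names are an assumption).
CSV_TEXT = """Rank,Name,Platform,Year,Genre,Publisher,Global_Sales
1,Game A,Wii,2006,Sports,Pub X,82.7
2,Game B,NES,1985,Platform,Pub X,40.2
3,Game C,Wii,2008,Racing,Pub X,35.8
4,Game D,GB,1996,Role-Playing,Pub Y,31.4
5,Game E,DS,2006,Platform,Pub X,30.0
6,Game F,PS2,2004,Action,Pub Z,20.8
7,Game G,X360,2010,Shooter,Pub W,14.6
"""


def top_sellers(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n rows with the highest Global_Sales, best first."""
    columns = ["Name", "Platform", "Year", "Global_Sales"]
    return df.nlargest(n, "Global_Sales")[columns]


# With the real dataset, pass the path of the downloaded file instead.
games = pd.read_csv(io.StringIO(CSV_TEXT))
top5 = top_sellers(games)
print(top5.to_string(index=False))
```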
Next, we check for irregularities with a box plot:
Before 1995 there are only isolated game records; we remove them later, once the analysis is done.
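A box plot usually marks points beyond 1.5 times the interquartile range as outliers. The sketch below applies that rule directly to a made-up list of release years. `boxplot_outliers` is a hypothetical helper, and the 1.5 factor is the common default rather than something the README states.

```python
import pandas as pd


def boxplot_outliers(values: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Return the values that fall outside the box plot whiskers."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return values[(values < lower) | (values > upper)]


# Made-up release years: two early records and a cluster in the 2000s.
years = pd.Series(
    [1983, 1988, 2004, 2005, 2005, 2006, 2006, 2007, 2007, 2008, 2008, 2009, 2010]
)
outliers = boxplot_outliers(years)
print(outliers.tolist())

# With matplotlib installed, years.plot.box() would draw the matching plot.
```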
Here is the correlation heatmap of the columns. It shows that they are only weakly correlated with one another.
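The numbers behind such a heatmap are a pairwise correlation matrix. This sketch computes one with pandas on made-up values, so the correlations it prints say nothing about the real dataset. The column names are assumptions.

```python
import pandas as pd

# Made-up numeric columns in the style of the sales dataset (assumption).
games = pd.DataFrame(
    {
        "Year": [1996, 2001, 2006, 2008, 2010, 2013],
        "NA_Sales": [1.2, 2.0, 4.1, 3.3, 0.4, 0.9],
        "EU_Sales": [0.8, 1.1, 2.9, 2.5, 0.3, 0.7],
        "Global_Sales": [2.5, 3.6, 8.2, 6.9, 0.9, 1.9],
    }
)

# Pearson correlation between every pair of numeric columns.
corr = games.select_dtypes(include="number").corr()

# Rank the other columns by how strongly they correlate with global sales.
by_sales = corr["Global_Sales"].drop("Global_Sales").sort_values(ascending=False)
print(by_sales)

# One common way to draw the matrix is seaborn.heatmap(corr, annot=True).
```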
After analyzing the dataset, we start removing unnecessary information. First we rename the columns for readability, then we drop the unnecessary ones, e.g. Name and Rank. Finally, we delete individual records, including games released before 1995.
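The cleaning steps above could look roughly like this in pandas. The rename mapping is hypothetical, since the README does not list the new column names, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical rename mapping; the actual new names are not given in the README.
RENAMES = {"NA_Sales": "NA Sales", "Global_Sales": "Global Sales"}


def clean_games(df: pd.DataFrame, min_year: int = 1995) -> pd.DataFrame:
    """Rename columns, drop identifier columns, and keep games from min_year on."""
    cleaned = df.rename(columns=RENAMES)
    cleaned = cleaned.drop(columns=["Rank", "Name"])
    # Rows with a missing Year fail this comparison and are dropped as well.
    cleaned = cleaned[cleaned["Year"] >= min_year]
    return cleaned.reset_index(drop=True)


# Made-up rows, including one pre-1995 game and one with no release year.
raw = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4, 5],
        "Name": ["Game A", "Game B", "Game C", "Game D", "Game E"],
        "Year": [1985, 1996, None, 2006, 1994],
        "Genre": ["Platform", "Racing", "Sports", "Sports", "Puzzle"],
        "NA_Sales": [29.1, 3.2, 1.1, 41.5, 0.9],
        "Global_Sales": [40.2, 5.6, 2.0, 82.7, 1.7],
    }
)
cleaned = clean_games(raw)
print(cleaned)
```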
Now we check how the box plot looks after cleaning:
Now we encode the categorical columns with a one-hot encoder function.
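One-hot encoding turns each category into its own 0/1 column. The README does not say which encoder function the notebook uses; this sketch assumes `pandas.get_dummies` on made-up rows, and scikit-learn's `OneHotEncoder` would be an alternative.

```python
import pandas as pd

# Made-up rows; which columns are categorical is an assumption.
games = pd.DataFrame(
    {
        "Platform": ["Wii", "DS", "Wii", "PS2"],
        "Genre": ["Sports", "Platform", "Racing", "Sports"],
        "Global Sales": [3.1, 1.4, 2.2, 0.8],
    }
)

# Each distinct Platform and Genre value becomes its own indicator column.
encoded = pd.get_dummies(games, columns=["Platform", "Genre"], dtype=int)
print(encoded.columns.tolist())
```

Every row ends up with exactly one `1` among the `Platform_*` columns and one among the `Genre_*` columns, so the number of columns grows with the number of distinct categories.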
Here is how our dataset looks now.
After cleaning the data, we can start training the model. We chose PCA, which reduces the data to its most important components.
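For orientation, a minimal PCA sketch with scikit-learn follows. It runs on synthetic data, standardizes the features first, and keeps two components. All three choices are assumptions made for illustration, not details taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix: two strongly related columns plus two noise columns.
rng = np.random.default_rng(0)
n_samples = 200
base = rng.normal(size=n_samples)
X = np.column_stack(
    [
        base,
        2 * base + rng.normal(scale=0.1, size=n_samples),
        rng.normal(size=n_samples),
        rng.normal(size=n_samples),
    ]
)

# PCA is sensitive to scale, so the features are standardized first (assumption).
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Share of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)
# Weight of each original feature in each component.
print(pca.components_)
```

`explained_variance_ratio_` shows how much information survives the reduction, while `components_` holds the per-feature weights of each component.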
Here is how our weight graph looks.
The weights are low, which means that our features are not correlated strongly enough with global sales.
After training the model, we obtained a score of **0.0422979797979798** (about 4.2%).
## Conclusion:
Based on the results of the trained models, we can conclude that our dataset contains too little information that correlates with sales. Possible improvements include widening the range of the data or adding new columns that correlate more strongly with sales (e.g. game budget).