https://github.com/mepland/steam_ana
Analysis of Steam Data
- Host: GitHub
- URL: https://github.com/mepland/steam_ana
- Owner: mepland
- License: mit
- Created: 2019-05-04T01:32:43.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2023-05-18T03:08:12.000Z (about 2 years ago)
- Last Synced: 2025-02-08T11:15:34.677Z (3 months ago)
- Language: Jupyter Notebook
- Size: 1.52 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Steam Analysis
Matthew Epland
[phy.duke.edu/~mbe9](http://www.phy.duke.edu/~mbe9)
## Cloning the Repository
ssh
```bash
git clone git@github.com:mepland/steam_ana.git
```
https
```bash
git clone https://github.com/mepland/steam_ana.git
```
## Installing Dependencies
Working in a `virtualenv` is recommended to avoid clashes with other installed software. A useful extension for this purpose is [`virtualenvwrapper`](https://virtualenvwrapper.readthedocs.io/en/latest/). Follow the instructions in its documentation to install and initialize the wrapper before continuing.
```bash
mkvirtualenv newenv
pip install -r requirements.txt
jupyter nbextension enable --py widgetsnbextension
```
## Getting the Data
### Download and Extract
Download `steam.sql.gz` from [steam.internet.byu.edu](http://steam.internet.byu.edu/), then extract it with `gunzip`. Note that the download is 18 GB and the extracted file is 161 GB, so you will probably need a workstation with plenty of free disk space.
```bash
wget http://steam.phoenixteam.net/steam.sql.gz
gunzip steam.sql.gz
```
### Setup MySQL and Load the SQL Dump
```bash
mysql -u root -p
```
```sql
/* setup user and database */
GRANT ALL PRIVILEGES ON *.* TO 'user'@'localhost' IDENTIFIED BY 'pw';
```
```bash
mysql -u user --password=pw
```
```sql
CREATE DATABASE steamdb;
USE steamdb;

/* load the SQL dump */
SET autocommit=0;
source steam.sql;
COMMIT;
```
### Extract the Needed Data to CSV
```bash
# First look
mysql -u user --password=pw --database=steamdb --execute='SELECT steamid, appid, playtime_forever FROM Games_1 WHERE playtime_forever > 120 LIMIT 50;' -q -n -B -r > test_out.tsv
```
Now save the values needed to build the graph: the [top 5 games per user](https://www.databasejournal.com/features/mysql/selecting-the-top-n-results-by-group-in-mysql.html), counting only games with more than 120 minutes of play time.
Raw command:
```sql
SELECT steamid, appid, playtime_forever
FROM
(
  SELECT steamid, appid, playtime_forever,
    /* rank restarts at 1 whenever steamid changes */
    @steamid_rank := IF(@current_steamid = steamid,
                        @steamid_rank + 1,
                        1
                     ) AS steamid_rank,
    @current_steamid := steamid
  FROM Games_1
  WHERE playtime_forever > 120
  ORDER BY steamid, playtime_forever DESC
) ranked
WHERE steamid_rank <= 5;
```
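The ranking logic of the query above can be checked on a small sample without a database. The following is a minimal sketch in plain Python, assuming rows arrive as `(steamid, appid, playtime_forever)` tuples. The function name `top_n_per_user` and the sample data are hypothetical and not part of the repository.

```python
from collections import defaultdict
import heapq


def top_n_per_user(rows, n=5, min_playtime=120):
    """Return each user's n most-played games with playtime > min_playtime.

    rows: iterable of (steamid, appid, playtime_forever) tuples.
    Mirrors the WHERE / ORDER BY / rank <= n steps of the SQL query.
    """
    by_user = defaultdict(list)
    for steamid, appid, playtime in rows:
        if playtime > min_playtime:  # same filter as WHERE playtime_forever > 120
            by_user[steamid].append((appid, playtime))

    result = []
    for steamid in sorted(by_user):
        # keep the n largest playtimes for this user, in descending order
        best = heapq.nlargest(n, by_user[steamid], key=lambda game: game[1])
        result.extend((steamid, appid, playtime) for appid, playtime in best)
    return result


# Hypothetical sample data for illustration only
sample = [
    (1, 10, 500), (1, 20, 90), (1, 30, 300),
    (2, 10, 121), (2, 40, 120),
]
print(top_n_per_user(sample, n=1))
```

Comparing the output of such a helper against a few rows of `games_1.tsv` is one way to confirm that the user-variable ranking behaved as intended on your MySQL version.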
```bash
# piped to tsv (avoids file system permission errors)
mysql -u user --password=pw --database=steamdb --execute='SELECT steamid, appid, playtime_forever FROM ( SELECT steamid, appid, playtime_forever, @steamid_rank := IF(@current_steamid = steamid, @steamid_rank + 1, 1) AS steamid_rank, @current_steamid := steamid FROM Games_1 WHERE playtime_forever > 120 ORDER BY steamid, playtime_forever DESC) ranked WHERE steamid_rank <= 5' -q -n -B -r > games_1.tsv

# save out all player libraries to test predictions
mysql -u user --password=pw --database=steamdb --execute='SELECT steamid, appid, playtime_forever FROM Games_1 ORDER BY steamid;' -q -n -B -r > all_players.csv

# convert tsv to csv
mv games_1.tsv games_1.csv
sed -i '/\t/ s//,/g' games_1.csv

# get titles
mysql -u user --password=pw --database=steamdb --execute='SELECT appid, Title FROM App_ID_Info WHERE Type = "game";' -q -n -B -r > app_title.csv && sed -i '/\t/ s//,/g' app_title.csv

# get genres
mysql -u user --password=pw --database=steamdb --execute='SELECT appid, Genre FROM Games_Genres;' -q -n -B -r > app_genres.csv && sed -i '/\t/ s//,/g' app_genres.csv
```
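Replacing every tab with a comma, as the `sed` commands do, produces ambiguous rows if a field itself contains a comma, which a game title could. As a hedged alternative, the sketch below converts tab-separated text to CSV with Python's standard `csv` module, which quotes such fields. The helper name `tsv_to_csv` and the example rows are hypothetical, and the sketch assumes no field contains a tab or a newline.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated text to CSV text, quoting fields when needed."""
    # QUOTE_NONE: treat double quotes in the mysql output as ordinary characters
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical rows in the shape of the app_title query output
example = "appid\tTitle\n1\tSome Game, Deluxe Edition\n2\tOther Game\n"
print(tsv_to_csv(example), end="")
```

For the multi-gigabyte outputs, the same reader and writer can be pointed at file objects opened with `newline=""` so that rows are streamed rather than held in memory.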