https://github.com/wkwan/perch

Mining Game Analytics from Steam and Social Media

data-mining datamining game-analytics gameanalytics python social-media social-media-analytics steam steam-api tiktok tiktok-api twitch twitch-api youtube youtube-api youtube-api-v3

# Perch
> 📈 Mining Game Analytics from Steam and Social Media

Perch aggregates game data from Steam, Twitch, YouTube, and TikTok to help you do market research on video games.

This repo contains all the backend code to retrieve the data (by querying APIs and web scraping), organize it with algorithms and AI, and save it to a PostgreSQL database.

### If you just want to see the data, you can use the website: [Perch.gg](https://perch.gg)

## Why Perch?

Perch has 2 big advantages over other game market research tools:

#### 1. Not platform-specific

Other tools like SteamDB, VG Insights, TwitchTracker, etc. have more in-depth metrics for specific platforms, while Perch's goal is to show a simple, holistic overview of game performance across every platform. Aggregating the data in a useful way is hard, especially because most social media APIs besides Twitch lack metadata for what game a piece of content is about.

#### 2. You can read the code to understand what the data actually means

Steam and social media recommendation algorithms are mysterious and constantly changing. Many important metrics reported by closed-source market research tools, such as game sales and social media revenue, can only be approximated very roughly. The point of market research is to help you understand the market, so it helps to understand how the numbers are calculated.

## Installation

Tested with Python 3.12.8. Don't use Python 3.13, because _psycopg2_binary_ currently doesn't support it.

1. Clone this repo
```shell
git clone https://github.com/wkwan/perch.git
cd perch
```

2. Create a new Python virtual environment named _venv_

```shell
python -m venv venv
```

3. Activate the environment

#### Windows PowerShell
```shell
.\venv\Scripts\activate
```

#### macOS/Linux
```shell
source venv/bin/activate
```

4. Install the Python packages

On Windows, you might get an error saying you need _Microsoft C++ Build Tools_, which you can get here: https://visualstudio.microsoft.com/visual-cpp-build-tools/
```shell
pip install -r requirements.txt
```

[requirements.txt](requirements.txt) is generated with pipreqs. If you add Python packages, you can regenerate it:

```shell
pip install pipreqs
pipreqs . --force --ignore venv
```

## Authentication Credentials and Configuration

Required credentials and other configuration variables are defined as environment variables in [.env](.env). Replace the placeholders in [.env](.env) or set the environment variables on your machine yourself ([.env](.env) won't override your local environment variables).
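The "won't override" behavior can be sketched with the standard library alone. This is a minimal illustration, assuming a plain `KEY=VALUE` file format. The helper name `load_env_file` is hypothetical, and the repo may well load [.env](.env) with a library instead.

```python
import os


def load_env_file(path=".env", environ=os.environ):
    """Set variables from a KEY=VALUE file without overriding existing ones.

    Hypothetical helper for illustration only, not taken from the Perch code.
    Returns a dict of the variables that were actually set.
    """
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip("'\"")
            # A variable already present in the environment always wins.
            if key not in environ:
                environ[key] = value
                loaded[key] = value
    return loaded
```

Calling `load_env_file()` once at startup would fill in only the variables that your machine hasn't already defined.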

You can set up a free local PostgreSQL database using the official documentation at https://www.postgresql.org/

The unofficial TikTok API from RapidAPI is a paid service: https://rapidapi.com/Lundehund/api/tiktok-api23

Download ChromeDriver here: https://developer.chrome.com/docs/chromedriver/downloads

## Mining Data

With your venv activated:
```shell
python scheduler.py
```

This fetches the data and saves it to your database. It fetches immediately and then waits until the start of the next hour (1pm, 2pm, 3pm, etc.) before fetching again.
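To make the "start of the next hour" timing concrete, here is a small sketch of how the wait could be computed. The function name `seconds_until_next_hour` is an assumption for illustration, and [scheduler.py](scheduler.py) may calculate this differently.

```python
from datetime import datetime, timedelta


def seconds_until_next_hour(now):
    """Return the number of seconds from `now` until the next full hour."""
    # Truncate to the current hour, then step forward by one hour.
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()
```

A loop could then call `time.sleep(seconds_until_next_hour(datetime.now()))` after each fetch, so a run started at 1:45pm would sleep 15 minutes and fetch again at 2pm.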

For faster debugging with fewer requests:
```shell
python scheduler.py --fast
```

Steam and Twitch data will be fetched every hour. YouTube and TikTok data will be fetched every 25 hours because of rate limits.
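One way to picture the two schedules is a table of per-source intervals checked on every hourly tick. The sketch below uses the intervals stated above, but the names `FETCH_INTERVALS` and `sources_due` are hypothetical and not taken from the Perch code.

```python
from datetime import datetime, timedelta

# Intervals come from the text above; the dictionary keys are illustrative.
FETCH_INTERVALS = {
    "steam": timedelta(hours=1),
    "twitch": timedelta(hours=1),
    "youtube": timedelta(hours=25),
    "tiktok": timedelta(hours=25),
}


def sources_due(now, last_fetched):
    """Return the sources whose interval has elapsed since their last fetch.

    `last_fetched` maps a source name to the datetime of its last fetch.
    A source missing from the mapping has never been fetched, so it is due.
    """
    due = []
    for source, interval in FETCH_INTERVALS.items():
        last = last_fetched.get(source)
        if last is None or now - last >= interval:
            due.append(source)
    return due
```

With this shape, the hourly tick stays the only timer, and the slower sources are skipped until their 25 hours have passed.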

[scheduler.py](scheduler.py) uses a lock file to prevent multiple processes from running it simultaneously. This means you can schedule a cron job to make sure it restarts when it fails, without ending up with multiple processes mining the same data redundantly.

For example, to set up a cron job on a Linux server that tries to start [scheduler.py](scheduler.py) every minute, open your crontab file with:

```shell
crontab -e
```

And add this:
```
* * * * * //venv/bin/python //scheduler.py
```
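For readers curious why starting the script every minute is safe, here is a minimal sketch of the general lock-file technique. It assumes an atomic "create only if missing" file open. The function names are hypothetical, and how [scheduler.py](scheduler.py) actually implements its lock may differ.

```python
import os


def acquire_lock(path):
    """Try to create the lock file atomically; return True on success."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists,
        # so only one process can create it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Record the owner's PID to help with debugging.
        f.write(str(os.getpid()))
    return True


def release_lock(path):
    """Remove the lock file; ignore the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A second process that fails to acquire the lock would simply exit. Note that this simple version leaves a stale lock file behind if the owning process crashes, so a real implementation needs some way to detect that, for example by checking whether the recorded PID is still alive.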

## Contributing

So far I've written all the code for Perch myself. AMA about Perch when I'm working on it live at: https://twitch.tv/willkwan

Schedule: Sunday and Monday, 11am-9pm PT

### 3 ways to contribute:

1. Buy a subscription to [Perch.gg](https://perch.gg)
2. Pull requests
3. Twitch subscriptions (you can subscribe to 1 Twitch channel for free every month if you have Amazon Prime, and I get up to $2.25 for each sub depending on what country you're in)

## Master Plan

### Phase 1
Do whatever it takes to maximize active Perch.gg paid subscriptions and community code contributions.

### Phase 2
Expand to other big data problems in games.

## Licensing

This project is licensed under the MIT License. See [LICENSE](LICENSE).