https://github.com/archie-cm/mobile_game_analysis_real-time_pipeline_with_pubsub_and_dataflow
This project demonstrates how to build a real-time analytics pipeline for mobile game data using Google Cloud Pub/Sub and Apache Beam (Dataflow).
- Host: GitHub
- URL: https://github.com/archie-cm/mobile_game_analysis_real-time_pipeline_with_pubsub_and_dataflow
- Owner: archie-cm
- Created: 2025-01-08T10:12:35.000Z (14 days ago)
- Default Branch: main
- Last Pushed: 2025-01-08T15:07:03.000Z (13 days ago)
- Last Synced: 2025-01-20T05:46:07.316Z (2 days ago)
- Topics: apache-beam, dataflow, streaming-data
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Mobile Game Analysis Real-Time Pipeline with Pub/Sub and Dataflow
This project demonstrates how to build a real-time analytics pipeline for mobile game data using Google Cloud Pub/Sub and Apache Beam (Dataflow). The pipeline processes game events, analyzes player and team scores, and calculates weapon performance in battles, all in real time.
## Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Pipeline Process](#pipeline-process)
  - [1. Upload Source File](#1-upload-source-file)
  - [2. Set Up Pub/Sub](#2-set-up-pubsub)
  - [3. Run `publish.py`](#3-run-publishpy)
  - [4. Set Up Dataflow](#4-set-up-dataflow)
  - [5. Analyze Player and Team Scores](#5-analyze-player-and-team-scores)
  - [6. Analyze Weapon Performance](#6-analyze-weapon-performance)
- [Windowing, Trigger, and Watermark Explained](#windowing-trigger-and-watermark-explained)
  - [Windowing](#windowing)
    - [Tumbling Window](#tumbling-window)
    - [Sliding Window](#sliding-window)
    - [Session Window](#session-window)
    - [Global Window](#global-window)
  - [Trigger](#trigger)
  - [Watermark](#watermark)
- [License](#license)

---
## Overview
This pipeline enables real-time analysis of mobile game data, using Pub/Sub for data ingestion and Dataflow for stream processing. Key features include:
- Analyzing player and team scores.
- Calculating weapon performance metrics.
- Writing results to Pub/Sub topics for further use.

## Prerequisites
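- A Google Cloud account.
- Google Cloud SDK installed.
- `gcloud` CLI configured (for example, with `gcloud init`).
- Python 3.x installed locally, with the Apache Beam SDK (`pip install 'apache-beam[gcp]'`).

## Pipeline Process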
### 1. Upload Source File
Upload the source file `mobile_game.txt` to a Google Cloud Storage bucket:
```bash
gsutil cp mobile_game.txt gs:///
```

### 2. Set Up Pub/Sub
1. Enable the Pub/Sub API:
```bash
gcloud services enable pubsub.googleapis.com
```
2. Create a topic:
```bash
gcloud pubsub topics create mobile-game-events
```
3. Create a subscription:
```bash
gcloud pubsub subscriptions create mobile-game-subscription --topic=mobile-game-events
```

### 3. Run `publish.py`
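The README does not reproduce `publish.py`, so the following is only a hedged sketch of what such a publisher might do, assuming each non-empty line of `mobile_game.txt` becomes one Pub/Sub message (the file's format is not documented here). The helper names `publish_lines` and `make_pubsub_publish`, the optional `delay_seconds` pacing, and the project ID are hypothetical; the client calls (`PublisherClient`, `topic_path`, `publish`) come from the `google-cloud-pubsub` library. Passing the publish function in as a parameter keeps the line-handling logic testable without credentials.

```python
import time
from typing import Callable, Iterable


def publish_lines(lines: Iterable[str],
                  publish: Callable[[bytes], object],
                  delay_seconds: float = 0.0) -> int:
    """Publish each non-empty line as one message and return how many were sent."""
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        publish(line.encode("utf-8"))  # Pub/Sub message data must be bytes
        sent += 1
        if delay_seconds:
            time.sleep(delay_seconds)  # optional pacing to mimic a live event stream
    return sent


def make_pubsub_publish(project_id: str, topic_id: str) -> Callable[[bytes], object]:
    """Return a callable that publishes to a real Pub/Sub topic.

    Requires the google-cloud-pubsub package and application default credentials.
    """
    from google.cloud import pubsub_v1  # imported lazily so the sketch runs without it

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    def publish(data: bytes):
        # publish() is asynchronous; result() blocks until Pub/Sub returns a message ID.
        return publisher.publish(topic_path, data).result()

    return publish


# Example usage (hypothetical project ID; topic name from step 2):
# publish = make_pubsub_publish("your-project-id", "mobile-game-events")
# with open("mobile_game.txt", encoding="utf-8") as f:
#     print(publish_lines(f, publish, delay_seconds=0.1))
```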
Run the `publish.py` script to publish events from the `mobile_game.txt` file to the Pub/Sub topic.

### 4. Set Up Dataflow
Enable the Dataflow API:
```bash
gcloud services enable dataflow.googleapis.com
```

### 5. Analyze Player and Team Scores
Run the `score.py` script to process player and team scores:
```bash
python score.py
```
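The README does not say which trigger `score.py` configures. As one illustrative possibility, the stdlib model below mimics a per-key sum in the global window with a repeated element-count trigger, roughly what `beam.WindowInto(window.GlobalWindows(), trigger=trigger.Repeatedly(trigger.AfterCount(n)), accumulation_mode=trigger.AccumulationMode.ACCUMULATING)` followed by `beam.CombinePerKey(sum)` expresses in the Beam Python SDK (with `from apache_beam.transforms import trigger, window`). The `(team, score)` event shape and the name `repeated_count_panes` are hypothetical, and the model ignores the parallelism and bundling a real runner adds.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def repeated_count_panes(events: Iterable[Tuple[str, int]],
                         count: int,
                         accumulating: bool = True) -> List[Tuple[str, int]]:
    """Model Repeatedly(AfterCount(count)) on a per-key sum in the global window.

    Beam evaluates triggers per key, so each key keeps its own element counter.
    Returns (key, pane_value) pairs in the order the panes would fire.
    """
    totals = defaultdict(int)   # current pane value per key
    pending = defaultdict(int)  # elements seen per key since that key's last firing
    fired = []
    for key, score in events:
        totals[key] += score
        pending[key] += 1
        if pending[key] == count:          # AfterCount condition met for this key
            fired.append((key, totals[key]))
            pending[key] = 0               # Repeatedly: re-arm the trigger
            if not accumulating:
                totals[key] = 0            # DISCARDING: the next pane starts empty
    # Elements still pending stay buffered: the global window never closes.
    return fired
```

With `count=2`, the events `[("red", 5), ("blue", 3), ("red", 2), ("blue", 4), ("red", 1), ("red", 6)]` fire `("red", 7)`, `("blue", 7)`, then `("red", 14)` in accumulating mode; in discarding mode the last pane is `("red", 7)` instead, because each pane only carries the elements since the previous firing.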
The script reads events from Pub/Sub, calculates scores using global windows with custom triggers, and writes results back to a Pub/Sub topic.

### 6. Analyze Weapon Performance
Run the `weapon.py` script to calculate weapon performance:
```bash
python weapon.py
```
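The README does not show the code of `weapon.py` either. The stdlib model below illustrates the technique it names: per-weapon session windows (each event opens an interval `[t, t + gap)`, and overlapping intervals for the same weapon merge) followed by an average of battle points per session. The `(weapon, event_time_seconds, battle_points)` layout and the name `session_averages` are hypothetical; in the Beam Python SDK the windowing step would be `beam.WindowInto(window.Sessions(gap_size))` on keyed data, with an averaging combiner such as `beam.combiners.Mean.PerKey()`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event layout: (weapon, event_time_seconds, battle_points)
Event = Tuple[str, float, float]


def session_averages(events: List[Event], gap: float) -> List[Tuple[str, float, float, float]]:
    """Return (weapon, session_start, session_end, avg_points) for every session."""
    by_weapon: Dict[str, List[Tuple[float, float]]] = {}
    for weapon, ts, points in events:
        by_weapon.setdefault(weapon, []).append((ts, points))

    results = []
    for weapon, items in sorted(by_weapon.items()):
        items.sort()  # merge intervals in event-time order
        start: Optional[float] = None
        end = 0.0
        total, n = 0.0, 0
        for ts, points in items:
            if start is not None and ts >= end:
                # No overlap with the open session: close it and emit its average.
                results.append((weapon, start, end, total / n))
                start, total, n = None, 0.0, 0
            if start is None:
                start = ts
            end = ts + gap  # this event's interval [ts, ts + gap) extends the session
            total += points
            n += 1
        if start is not None:
            results.append((weapon, start, end, total / n))
    return results
```

Treating the intervals as half-open means two events exactly `gap` apart fall into separate sessions, while events closer together keep extending the same one.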
The script reads battle data from Pub/Sub, calculates average battle points using session windows, and writes results back to a Pub/Sub topic.

## Windowing, Trigger, and Watermark Explained
### Windowing
Windowing divides an unbounded stream into finite groups based on each element's event timestamp, so aggregations can be computed per group. The commonly used window types are:

#### Tumbling Window
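A tumbling window assigns each event to the one window whose fixed-length interval contains its timestamp. The small stdlib sketch below (not Beam's API) shows that rule, assuming windows aligned to time zero with no offset and event times in whole seconds; the function names are hypothetical. In the Beam Python SDK the corresponding transform is `beam.WindowInto(window.FixedWindows(size_seconds))`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Window = Tuple[int, int]  # [start, end) in seconds


def tumbling_window(ts: int, size: int) -> Window:
    """Return the single [start, end) window of length `size` that contains ts."""
    start = ts - ts % size  # round down to the nearest multiple of size
    return (start, start + size)


def group_tumbling(events: Iterable[Tuple[int, str]], size: int) -> Dict[Window, List[str]]:
    """Group (event_time, value) pairs by their tumbling window."""
    windows: Dict[Window, List[str]] = defaultdict(list)
    for ts, value in events:
        windows[tumbling_window(ts, size)].append(value)
    return dict(windows)
```

With `size=60`, an event at second 45 lands in `(0, 60)` and one at second 61 in `(60, 120)`; every event maps to exactly one window.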
- Fixed-size, non-overlapping windows.
- Each event belongs to exactly one window.
- Example: A 1-minute tumbling window groups events from `00:00` to `00:01`, then from `00:01` to `00:02`, and so on.

#### Sliding Window
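A sliding window has a fixed size and a period (the slide); every window whose interval contains an event's timestamp receives that event. The stdlib sketch below (not Beam's API, with hypothetical names, whole-second times, and no offset) lists those windows. In the Beam Python SDK the corresponding transform is `beam.WindowInto(window.SlidingWindows(size, period))`.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # [start, end) in seconds


def sliding_windows(ts: int, size: int, period: int) -> List[Window]:
    """Return every [start, start + size) window, starting on a multiple of `period`, that contains ts."""
    start = ts - ts % period  # latest window start at or before ts
    windows = []
    while start > ts - size:  # [start, start + size) contains ts only while start > ts - size
        windows.append((start, start + size))
        start -= period
    return sorted(windows)
```

For a 1-minute window with a 30-second period, an event at second 45 belongs to both `(0, 60)` and `(30, 90)`; in general each event falls into `size / period` windows when the size is a multiple of the period.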
- Fixed-size windows that overlap.
- Each event can belong to multiple windows.
- Example: A 1-minute window with a 30-second slide produces overlapping intervals (`00:00:00-00:01:00`, `00:00:30-00:01:30`, etc.), so each event falls into two windows.

#### Session Window
- Groups events whose timestamps fall within a defined gap duration of one another.
- New events extend the session if they arrive within the gap.
- Example: A session window with a 5-minute gap groups events until there is a 5-minute period of inactivity.

#### Global Window
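- Places all data in a single window that never closes; this is Beam's default windowing.
- Useful for continuous aggregations, but on an unbounded stream it needs a custom trigger: the default trigger waits for the watermark to pass the end of the window, which never happens for the global window.

### Trigger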
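Triggers define when results are emitted for a window. Common triggers include:
1. **Event-time triggers (default):** Fire as the watermark advances through event time. Beam's default trigger is of this kind: it fires when the watermark passes the end of the window, and then again for each late element accepted within the allowed lateness.
2. **Processing-time triggers:** Fire based on the system clock, for example a fixed delay after the first element of a pane arrives.
3. **AfterWatermark:** The explicit event-time trigger; fires when the watermark passes the end of the window and can add early (speculative) and late firings.
4. **AfterCount:** Fires once at least a specified number of elements have arrived in the pane.
5. **Repeatedly:** Wraps another trigger and fires each time that trigger's condition is met again, rather than only once.

### Watermark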
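The interplay of the watermark, the default trigger, and allowed lateness can be modeled for a single event-time window with the stdlib sketch below. It is a simplification under stated assumptions, not Beam's implementation: records arrive in processing order as either `("event", event_time, value)` or `("watermark", time)`, the window is `[window_start, window_end)`, panes accumulate, and all names are hypothetical. The window fires once when the watermark reaches its end, fires again for each late event accepted within the allowed lateness, and drops events that arrive after the watermark has passed the end of the window plus the allowed lateness.

```python
from typing import List, Tuple


def run_window(records: List[tuple], window_start: int, window_end: int,
               allowed_lateness: int) -> Tuple[List[Tuple[str, int]], int]:
    """Return (firings, dropped_count) for one event-time window with accumulating panes."""
    watermark = float("-inf")
    total = 0
    on_time_fired = False
    firings: List[Tuple[str, int]] = []
    dropped = 0
    for rec in records:
        if rec[0] == "watermark":
            watermark = max(watermark, rec[1])  # a watermark only moves forward
            if not on_time_fired and watermark >= window_end:
                firings.append(("ON_TIME", total))  # watermark passed the end of the window
                on_time_fired = True
            continue
        _, event_time, value = rec
        if not window_start <= event_time < window_end:
            continue  # belongs to some other window; ignored in this model
        if watermark >= window_end + allowed_lateness:
            dropped += 1  # window already expired: too late to accept
            continue
        total += value
        if on_time_fired:
            firings.append(("LATE", total))  # late data updates the earlier result
    return firings, dropped
```

For a window `[0, 60)` with 60 seconds of allowed lateness, events worth 5 and 3 before the watermark reaches 60 produce an on-time result of 8; an event worth 2 arriving while the watermark is at 60 produces a late result of 10; and an event arriving after the watermark reaches 130 is dropped.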
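A watermark tracks the progress of event time in a pipeline: it is the system's estimate of the point in event time before which all data is expected to have arrived. Data that arrives after the watermark has passed the end of its window is considered late. Late events can be handled using:
- **Allowed lateness:** Specifies how long after the watermark passes the end of a window late events are still accepted into that window.
- **Dropping or updating results:** Events that arrive after the allowed lateness are dropped; events within it can trigger late firings that update the window's earlier result.

## License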
This project is licensed under the MIT License. See the LICENSE file for details.