https://github.com/cagandemirmr/game_chi_and_a-b_testing-sql
In this repo, i use hypotetical mobile game company data set is used to make chi square test and A/B testing
https://github.com/cagandemirmr/game_chi_and_a-b_testing-sql
ab-testing chisquare-test gameanalysis gameanalytics postgresql sql
Last synced: 14 days ago
JSON representation
In this repo, i use hypotetical mobile game company data set is used to make chi square test and A/B testing
- Host: GitHub
- URL: https://github.com/cagandemirmr/game_chi_and_a-b_testing-sql
- Owner: cagandemirmr
- Created: 2024-11-15T08:19:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-15T18:18:15.000Z (over 1 year ago)
- Last Synced: 2025-03-13T23:12:52.144Z (over 1 year ago)
- Topics: ab-testing, chisquare-test, gameanalysis, gameanalytics, postgresql, sql
- Homepage:
- Size: 1.13 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ๐ฎ Game Analysis with A/B Testing and Chi-Square Test
## ๐ Project Overview
This project focuses on analyzing game-related user data to evaluate the effectiveness of different onboarding variants using **A/B testing** and the **Chi-Square test**. The goal is to determine whether variant changes influence user engagement and purchases. For this analysis, I utilized **SQL** to gather insights and perform statistical tests.
---
## ๐ Data Import and Setup
- Imported CSV files into SQL tables using the `CREATE TABLE` statement and loaded data using the file paths.
- The dataset consists of artificially generated data using ChatGPT, simulating user behaviors and actions in a game environment.

---
## ๐ About the Dataset
### 1. Experiment Assignments (`exp_analysis`)
- **Rows**: 100,172
- **Columns**: 4 (`exp_name`, `user_id`, `exp_date`, `variant`)
- **Details**:
- `exp_name`: Contains a single experiment labeled "Onboarding".
- `user_id`: Contains **100,172 distinct values**.
- `exp_date`: Ranges from **2020-01-01** to **2020-02-19**, with 50 unique dates.
- `variant`: Two groups - **Variant 1** and **Control Group**.
### 2. Game Actions (`game_actions`)
- **Rows**: 124,185
- **Columns**: 3 (`user_id`, `action`, `action_date`)
- **Details**:
- `action`: Includes "onboarding complete" and "email_optin".
- `user_id`: **87,124 distinct values**.
- `action_date`: Ranges from **2020-01-01** to **2020-02-19**, with 50 unique dates.
### 3. Game Purchases (`game_purchases`)
- **Rows**: 12,897
- **Columns**: 3 (`user_id`, `purch_date`, `amount`)
- **Details**:
- `user_id`: **9,969 distinct values**.
- `purch_date`: Ranges from **2020-01-01** to **2020-03-11**, with 70 unique dates.
- `amount`: Contains 6 distinct purchase values: **2.99, 10, 25, 50, 100, 999**.

---
## ๐งช A/B Testing & Chi-Square Analysis
### 1. Initial Chi-Square Test
- Conducted a **Chi-Square test** online to compare the success rates between **Variant 1** and the **Control Group**.
- **Result**: Variant 1 was significantly more successful.

### 2. Calculating Statistical Metrics
- Used SQL functions (`AVG`, `STDDEV`) to compute **mean** and **standard deviation** for further statistical analysis.

### 3. Updated Chi-Square Test
- Recalculated the Chi-Square test with SQL-derived metrics.
- **Result**: No significant difference between **Variant 1** and **Control** after recalculating.
---
## ๐ฏ Hypothesis Testing
### 1. Onboarding Completion Analysis
- Joined the `game_actions` table to assess the total number of users who completed onboarding.
- **Result**: The null hypothesis was rejected, indicating a significant difference between the two variants.

### 2. Success Metric: Purchase Behavior
- Redefined the success metric as **user purchases** and reran the analysis.
- **Result**: Significant difference between the two variants based on purchase behavior.

### 3. 7-Day Interval Analysis
- Performed an analysis to evaluate differences over a 7-day interval.
- **Result**: No significant difference between the variants in a week-by-week comparison.

### 4. Time Period Analysis
- Analyzed differences between two distinct time periods.
- **Result**: Significant differences observed, but reliability may be affected by external factors.

### 5. Country-Based Analysis (Canada vs. USA)
- Compared user behavior between Canada and the USA.
- **Result**: Differences were noted, but conclusions may be unreliable due to varying population sizes and demographic factors.

---
## ๐ Insights and Conclusions
1. **Variant 1** initially appeared to be more effective than the Control group based on onboarding actions.
2. However, when shifting the metric to actual **purchases**, the difference between the variants became less clear.
3. The time-based analysis revealed fluctuations that could be due to external factors, making it necessary to control for variables like seasonality or promotional events.
4. Country-specific analyses suggest that demographic factors may influence game engagement and purchases.
---
## ๐ ๏ธ Tools and Technologies
- **SQL**: Data extraction, transformation, and analysis.