https://github.com/emredemirbas/google-playstore-eda
Exploratory Data Analysis (EDA) of the Google Play Store dataset — examining trends in app ratings, categories, pricing, and user engagement using Python and LaTeX.
exploratory-data-analysis matplotlib numpy pandas python seaborn
- Host: GitHub
- URL: https://github.com/emredemirbas/google-playstore-eda
- Owner: emredemirbas
- License: mit
- Created: 2025-10-19T20:16:52.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-10-27T07:23:41.000Z (3 months ago)
- Last Synced: 2025-10-27T09:21:40.630Z (3 months ago)
- Topics: exploratory-data-analysis, matplotlib, numpy, pandas, python, seaborn
- Language: Jupyter Notebook
- Size: 6.81 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Exploratory Data Analysis on Google Play Store Dataset
> A comprehensive analysis of Google Play Store apps to uncover insights on user ratings, categories, pricing, and popularity trends.
## Overview
This repository presents an Exploratory Data Analysis (EDA) of the **Google Play Store dataset**.
The objective is to explore relationships between key app attributes — such as category, rating, price, content rating, and popularity — and to derive meaningful insights about user engagement and market trends.
The project includes:
- A clean and well-documented Jupyter Notebook (`google_play_store_eda.ipynb`)
- A compiled LaTeX report with figures and commentary (`google-playstore-eda-report.pdf`)
- Visualizations generated during analysis (in the `figures/` directory)
---
## Repository Structure
```
├── google_play_store_eda.ipynb # Final EDA notebook
├── google-playstore-eda-report.pdf # Final LaTeX report (compiled)
├── figures/ # Folder containing all visualizations
│ ├── fig1_ratings_distribution.png
│ ├── fig2_reviews_distribution.png
│ ├── ...
│ └── fig17_installs_by_android_version.png
├── googleplaystore.csv # Dataset used for analysis
└── LICENSE # MIT License
```
---
## Key Analytical Topics
1. **Data Preprocessing and Cleaning**
   - Handled missing values and duplicates
   - Converted non-numeric data (e.g., `"10,000+"`, `"19M"`) into numeric form
   - Normalized column types and standardized categories
2. **Descriptive Analysis**
   - Distribution of ratings, reviews, app size, and prices
   - Analysis of free vs. paid apps
   - Identification of dominant app categories
3. **Research Queries and Insights**
   - Relationship between app type and user ratings
   - Content rating vs. satisfaction levels
   - Total installs by Android version
   - Word frequency in app titles
   - Popularity vs. satisfaction correlations
4. **Visualization**
   - 17 detailed plots (bar plots, box plots, heatmaps, scatter plots)
   - Figures integrated into the final LaTeX report
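
The conversion step listed under preprocessing (turning strings such as `"10,000+"` and `"19M"` into numbers) can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper names `parse_installs` and `parse_size_mb` are hypothetical, and the handling of a `k` suffix and of non-numeric placeholders is an assumption about how the raw columns are formatted.

```python
from typing import Optional


def parse_installs(raw: str) -> Optional[int]:
    """Turn an install bucket such as "10,000+" into the integer 10000."""
    digits = raw.strip().rstrip("+").replace(",", "")
    # Anything that is not purely numeric is treated as missing.
    return int(digits) if digits.isdigit() else None


def parse_size_mb(raw: str) -> Optional[float]:
    """Turn a size such as "19M" or "512k" into megabytes."""
    text = raw.strip()
    # Assumed unit suffixes: "M" for megabytes, "k" for kilobytes.
    factors = {"M": 1.0, "k": 1.0 / 1024}
    if not text or text[-1] not in factors:
        return None  # non-numeric placeholder or unknown unit
    try:
        return float(text[:-1].replace(",", "")) * factors[text[-1]]
    except ValueError:
        return None


print(parse_installs("10,000+"), parse_size_mb("19M"))
```

With pandas, helpers like these could be applied column-wise, for example `df["Installs"].map(parse_installs)`, assuming the columns are named `Installs` and `Size`. Returning `None` rather than raising keeps malformed rows visible as missing values so they can be dropped or inspected later.
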
---
## Example Findings
- Most apps receive ratings between **4.0 and 4.5**, indicating strong user satisfaction.
- **Family**, **Game**, and **Tools** categories dominate the store, making up over **60%** of all apps.
- Paid apps show **slightly higher and more stable ratings** compared to free apps.
- **App popularity and rating** are weakly correlated — lesser-known apps can still achieve high satisfaction.
- Most apps maintain compatibility with **older Android versions (4.x)** for wider user reach.
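
Two of the findings above, the category share and the popularity/rating correlation, reduce to short calculations. The sketch below shows one possible way to compute them with only the standard library. The function names `category_share` and `pearson_r` are hypothetical, the sample values are made up for illustration and are not taken from the dataset, and the notebook may well use a different method (for example a rank correlation or pandas built-ins).

```python
from collections import Counter
from math import sqrt
from typing import Iterable, Sequence


def category_share(categories: Iterable[str], top: Sequence[str]) -> float:
    """Return the fraction of apps whose category is listed in `top`."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no categories given")
    return sum(counts[name] for name in top) / total


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up sample values, for illustration only (not taken from the dataset).
sample_categories = ["FAMILY", "GAME", "TOOLS", "FAMILY", "MEDICAL"]
sample_installs = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
sample_ratings = [4.4, 4.1, 4.5, 4.2, 4.3]

share = category_share(sample_categories, ["FAMILY", "GAME", "TOOLS"])
r = pearson_r(sample_installs, sample_ratings)
print(f"top-3 share: {share:.0%}, installs vs. rating r = {r:+.2f}")
```

A coefficient near zero would match the "weakly correlated" reading. Because install counts span several orders of magnitude, correlating ratings with log-transformed installs or using a rank-based measure may be more informative than the raw Pearson value.
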
---
## Report
The full written report is available here:
[View the LaTeX Report (google-playstore-eda-report.pdf)](google-playstore-eda-report.pdf)
It provides:
- A structured overview of the methods and results
- Captions and detailed interpretations for each figure
- Academic-level formatting suitable for reports and portfolios
---
## License
This project is licensed under the [MIT License](LICENSE).
You are free to use, modify, and distribute the material, provided the copyright notice and license text are kept in all copies.
---
## Acknowledgments
Dataset source: [Google Play Store Apps Dataset (Kaggle)](https://www.kaggle.com/datasets/lava18/google-play-store-apps)