https://github.com/hgabrali/masterschool-python-data-analysis-starter
A standardized, best-practice, and bilingual curriculum template for Data Analysis projects. Focuses on mastering core Python libraries (Pandas, NumPy) and the **CRISP-DM** methodology, covering essential steps from Data Assessment to advanced Data Cleaning and Integration. **Content is structured for both Turkish and English learners.**
data-analysis-python data-cleaning data-science data-wrangling datascience english masterschool multilingual multilingual-translations pandas pandas-dataframe python starter-template turkce-kaynak turkish
- Host: GitHub
- URL: https://github.com/hgabrali/masterschool-python-data-analysis-starter
- Owner: hgabrali
- License: mit
- Created: 2025-10-04T12:43:25.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-10-06T23:00:09.000Z (5 months ago)
- Last Synced: 2025-10-09T07:03:07.612Z (5 months ago)
- Topics: data-analysis-python, data-cleaning, data-science, data-wrangling, datascience, english, masterschool, multilingual, multilingual-translations, pandas, pandas-dataframe, python, starter-template, turkce-kaynak, turkish
- Language: Jupyter Notebook
- Homepage: https://de.masterschool.com/
- Size: 975 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Masterschool - Python Data Analysis Starter
This repository serves as a foundational and **bilingual** curriculum template for mastering core **Data Wrangling** and **Data Analysis** techniques using the **Pandas** library in Python. It provides organized documentation and hands-on Google Colab exercises, following the structured **CRISP-DM** data mining methodology.
**Unique Feature:** To support deeper technical understanding, the repository includes **Turkish-English technical study notes** created during the Masterschool curriculum, designed to clarify complex concepts and terminology in both languages.
---
## Project Navigation (Table of Contents)
The core curriculum documentation is organized into sequentially numbered Markdown (`.md`) files. These files correspond to the main phases of a data project, from initial setup to final cleaning.
| File Path | Description |
| :--- | :--- |
| [01. Pandas Foundations.md](01_Pandas_Foundations.md) | **Introduction to Pandas:** Fundamental concepts, including the **Series** and **DataFrame** structures, essential indexing (`.loc[]`, `.iloc[]`), and basic aggregation. |
| [02. Data Wrangling.md](02_Data_Wrangling.md) | **Data Wrangling Overview:** Defines the process, its importance within the **CRISP-DM** framework, and the crucial steps of **Assessment** and **Cleaning**. |
| [03. Data Integration.md](03_Data_Integration.md) | **Combining Datasets:** Techniques for joining and merging data, including **Concatenation** (`pd.concat()`) for stacking, and **SQL-style Joins** (`pd.merge()`: Inner, Left, Right, Outer). |
| [04. Data Assessment.md](04_Data_Assessment.md) | **Identifying Data Quality Issues:** Methods for checking data types, reviewing dimensionality (`.shape`), and detecting early signs of errors (nulls, duplicates, inconsistencies). |
| [05. Data Cleaning.md](05_Data_Cleaning.md) | **Data Transformation and Correction:** Comprehensive techniques for handling duplicates, managing missing values (**Imputation** and **Deletion**), and performing complex **String Manipulation** (e.g., `.split()`, `.replace()`). |
| [06. Aggregating information and applying.md](06_Aggregating_information_and_applying.md) | **Data Summarization:** Methods for calculating statistics across the dataset, focusing on **aggregation functions** (`.sum()`, `.mean()`) and preparing for **grouping** (`.groupby()`). |
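
The steps listed in the table can be sketched end to end on a toy dataset. This is a minimal illustration, not code taken from the curriculum files: the `orders` and `customers` frames, their column names, and the choice of median imputation are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; none of these names come from the repository.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "customer": [" alice", "Bob ", "Bob ", "carol", "dave"],
        "amount": [20.0, 15.0, 15.0, None, 12.0],
    }
)
customers = pd.DataFrame(
    {
        "customer": ["Alice", "Bob", "Carol"],
        "country": ["DE", "TR", "DE"],
    }
)

# Assessment: dimensions, nulls per column, fully duplicated rows.
n_rows, n_cols = orders.shape
null_counts = orders.isna().sum()
n_duplicates = int(orders.duplicated().sum())

# Cleaning: drop duplicate rows, normalise the name strings,
# and impute the missing amount with the column median (an assumed strategy).
clean = orders.drop_duplicates().copy()
clean["customer"] = clean["customer"].str.strip().str.title()
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

# Integration: a left join keeps every order, even when no customer matches.
merged = pd.merge(clean, customers, on="customer", how="left")

# Concatenation stacks frames vertically; ignore_index rebuilds the row labels.
stacked = pd.concat([clean, clean], ignore_index=True)

# Indexing: label-based (.loc) versus position-based (.iloc) selection.
first_amount_by_label = merged.loc[0, "amount"]
first_amount_by_position = merged.iloc[0, 2]

# Aggregation: total amount per country.
# groupby drops rows whose key is missing by default, so "Dave" is excluded.
totals = merged.groupby("country")["amount"].sum()
print(totals)
```

A left join is used here so that orders without a matching customer stay visible as rows with a missing `country`; an inner join would silently discard them, which is worth checking during assessment.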
---
### Colab Links & Exercises
* [Introduction to Pandas Series](https://colab.research.google.com/drive/1vI65qFNIcqAGb11k5JJAedJF_wBjLi3n#scrollTo=Gir2rJtsd0aT)
* [Understanding DataFrames](https://colab.research.google.com/drive/1oXqNn54G8WrNfZlXQ08qzVX8xfJchdmy)
* [Pandas Foundations](https://colab.research.google.com/drive/1JPlLGtMkMhvTbJ_DIz8I2NEOCCXT1kqv#scrollTo=sy8miZEoTKhe)
* [Data Wrangling & Integration](https://colab.research.google.com/drive/1kVIzB9atUmTqN_1W7I377K--ww5YMHX8#scrollTo=xRMW7sXq6BXc)
* [Data Cleaning](https://colab.research.google.com/drive/1uxzTS-o8fwGFyKKvQnwm6nC2wjXcmeD7)
* [Aggregating Information and Applying](https://colab.research.google.com/drive/1cEVaitv3D4TzSCqCbs8Af9oWoxU5bMS_#scrollTo=5pwkr2zNzVxA)
* [Exploratory Data Analysis (EDA)](https://colab.research.google.com/drive/1IdQkw2xNS7aCbGAlmyvwotfnpXm_IY1G#scrollTo=WsFInjJI0Axv)
---
### Prerequisites
To get the most out of this material, you should have:
* A basic understanding of Python syntax.
* Access to a Google account for using the Colab notebooks.