https://github.com/hgabrali/masterschool-python-data-analysis-starter

A standardized, bilingual curriculum template for data analysis projects that follows best practices. It focuses on core Python libraries (Pandas, NumPy) and the **CRISP-DM** methodology, covering the essential steps from data assessment to advanced data cleaning and integration. **Content is structured for both Turkish and English learners.**


Host: GitHub
URL: https://github.com/hgabrali/masterschool-python-data-analysis-starter
Owner: hgabrali
License: mit
Created: 2025-10-04T12:43:25.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-10-06T23:00:09.000Z (5 months ago)
Last Synced: 2025-10-09T07:03:07.612Z (5 months ago)
Topics: data-analysis-python, data-cleaning, data-science, data-wrangling, datascience, english, masterschool, multilingual, multilingual-translations, pandas, pandas-dataframe, python, starter-template, turkce-kaynak, turkish
Language: Jupyter Notebook
Homepage: https://de.masterschool.com/
Size: 975 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 🐍 Masterschool - Python Data Analysis Starter

This repository is a foundational, **bilingual** curriculum template for mastering core **Data Wrangling** and **Data Analysis** techniques with the **Pandas** library in Python. It provides organized documentation and hands-on Google Colab exercises that follow the **CRISP-DM** data mining methodology.

🌟 **Unique Feature:** To support deeper technical understanding, the repository includes **Turkish-English technical study notes** created during the Masterschool curriculum, designed to clarify complex concepts and terminology in both languages.

---

## 🧭 Project Navigation (Table of Contents)

The core curriculum documentation is organized into sequentially numbered Markdown (`.md`) files. Each file corresponds to a main phase of a data project, from initial setup to final cleaning and aggregation.

| File Path | Description |
| :--- | :--- |
| 🐼 [01. Pandas Foundations.md](01_Pandas_Foundations.md) | **Introduction to Pandas:** Fundamental concepts, including the **Series** and **DataFrame** structures, essential indexing (`.loc[]`, `.iloc[]`), and basic aggregation. |
| 🔗 [02. Data Wrangling.md](02_Data_Wrangling.md) | **Data Wrangling Overview:** Defines the process, its importance within the **CRISP-DM** framework, and the crucial steps of **Assessment** and **Cleaning**. |
| 🤝 [03. Data Integration.md](03_Data_Integration.md) | **Combining Datasets:** Techniques for joining and merging data, including **Concatenation** (`pd.concat()`) for stacking, and **SQL-style Joins** (`pd.merge()`: Inner, Left, Right, Outer). |
| 🔎 [04. Data Assessment.md](04_Data_Assessment.md) | **Identifying Data Quality Issues:** Methods for checking data types, reviewing dimensionality (`.shape`), and detecting early signs of errors (nulls, duplicates, inconsistencies). |
| 🧼 [05. Data Cleaning.md](05_Data_Cleaning.md) | **Data Transformation and Correction:** Comprehensive techniques for handling duplicates, managing missing values (**Imputation** and **Deletion**), and performing complex **String Manipulation** (e.g., `.split()`, `.replace()`). |
| 🔢 [06. Aggregating information and applying.md](06_Aggregating_information_and_applying.md) | **Data Summarization:** Methods for calculating statistics across the dataset, focusing on **aggregation functions** (`.sum()`, `.mean()`) and preparing for **grouping** (`.groupby()`). |

---
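The topics listed in the table above can be walked through end to end on a toy dataset. The sketch below is only an illustration: the `orders`, `new_orders`, and `regions` frames and all of their column names are hypothetical and are not taken from the repository's notebooks. Median imputation and a left join are example choices made here; the course material may well use different ones.

```python
import pandas as pd

# Hypothetical toy data (assumption: not taken from the course notebooks).
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer": ["ann lee", "bob ray", "bob ray", "cy wu", "ann lee"],
    "amount": [10.0, 20.0, 20.0, None, 40.0],
})
new_orders = pd.DataFrame(
    {"order_id": [5], "customer": ["bob ray"], "amount": [5.0]}
)
regions = pd.DataFrame(
    {"customer": ["ann lee", "bob ray"], "region": ["north", "south"]}
)

# 01 Foundations: label-based (.loc) vs. position-based (.iloc) indexing.
first_amount = orders.loc[0, "amount"]    # row label 0, column "amount"
first_customer = orders.iloc[0, 1]        # row position 0, column position 1

# 03 Integration: stack rows with pd.concat, then a SQL-style left join.
all_orders = pd.concat([orders, new_orders], ignore_index=True)
merged = all_orders.merge(regions, on="customer", how="left")

# 04 Assessment: dimensions, missing values, fully duplicated rows.
n_rows, n_cols = merged.shape
missing_amounts = int(merged["amount"].isna().sum())
duplicate_rows = int(merged.duplicated().sum())

# 05 Cleaning: drop duplicates, impute, and tidy strings.
clean = merged.drop_duplicates().copy()
clean["amount"] = clean["amount"].fillna(clean["amount"].median())
clean["region"] = clean["region"].fillna("unknown")
clean["first_name"] = clean["customer"].str.split(" ").str[0].str.title()

# 06 Aggregation: total order amount per region.
total_by_region = clean.groupby("region")["amount"].sum()
print(total_by_region)
```

The left join keeps every order even when no region is known, which is why the missing region is filled with a placeholder instead of dropping the row; an inner join would silently discard that customer.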


### 🚀 Colab Links & Exercises

* 🐼 [Introduction to Pandas Series](https://colab.research.google.com/drive/1vI65qFNIcqAGb11k5JJAedJF_wBjLi3n#scrollTo=Gir2rJtsd0aT)

* 📊 [Understanding DataFrames](https://colab.research.google.com/drive/1oXqNn54G8WrNfZlXQ08qzVX8xfJchdmy)

* 🏗️ [Pandas Foundations](https://colab.research.google.com/drive/1JPlLGtMkMhvTbJ_DIz8I2NEOCCXT1kqv#scrollTo=sy8miZEoTKhe)

* 🔗 [Data Wrangling & Integration](https://colab.research.google.com/drive/1kVIzB9atUmTqN_1W7I377K--ww5YMHX8#scrollTo=xRMW7sXq6BXc)

* 🧼 [Data Cleaning](https://colab.research.google.com/drive/1uxzTS-o8fwGFyKKvQnwm6nC2wjXcmeD7)

* 🔢 [Aggregating Information and Applying](https://colab.research.google.com/drive/1cEVaitv3D4TzSCqCbs8Af9oWoxU5bMS_#scrollTo=5pwkr2zNzVxA)

* 🔍 [Exploratory Data Analysis (EDA)](https://colab.research.google.com/drive/1IdQkw2xNS7aCbGAlmyvwotfnpXm_IY1G#scrollTo=WsFInjJI0Axv)

---


### Prerequisites

To get the most out of this material, you should have:

* A basic understanding of Python syntax.

* Access to a Google account for using the Colab notebooks.
