https://github.com/neha-dev-dot/pyspark-tutorial

This repository is part of my journey to learn **PySpark**, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.

actions data-partitioning dataframes pyspark-basics pyspark-sql rdds sparkbasics sparkcontext sparksession transformations udfs window-functions



# 🔥 PySpark Essentials

This project is a hands-on collection of notebooks, code snippets, and exercises focused on learning **Apache Spark with Python (PySpark)**. It includes my notes and experiments while exploring **core Spark concepts, transformations, actions, DataFrame API, and more**.

---

## 🚀 What is PySpark?

**PySpark** is the Python API for **Apache Spark**, an open-source distributed computing engine for large-scale data processing and analytics. With PySpark, you write Python code that Spark runs in parallel across the cores of one machine or the nodes of a cluster.

---

## 📘 Topics Covered

- ✅ Introduction to Spark & PySpark
- ✅ SparkContext & SparkSession
- ✅ RDDs (Resilient Distributed Datasets)
- ✅ DataFrames & Datasets
- ✅ Transformations vs Actions
- ✅ Reading/Writing: JSON, CSV, Parquet
- ✅ PySpark SQL & Queries
- ✅ GroupBy, Aggregations, Joins
- ✅ Handling Nulls & Missing Data
- ✅ User-Defined Functions (UDFs)
- ✅ Window Functions
- ✅ Data Partitioning & Performance Optimization
- ✅ Intro to MLlib (Optional)

---

## ✍️ How I Learn
I follow a "Learn by Doing" approach.
Each notebook contains:

- ✅ Detailed explanations
- 🧪 Hands-on code examples
- 📌 Real-world case studies