https://github.com/tatevkaren/pyspark_tutorial
PySpark Tutorial
- Host: GitHub
- URL: https://github.com/tatevkaren/pyspark_tutorial
- Owner: TatevKaren
- Created: 2021-04-03T15:35:41.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-04-18T12:39:00.000Z (about 4 years ago)
- Last Synced: 2025-02-02T10:42:44.340Z (3 months ago)
- Language: Python
- Size: 11.7 KB
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PySpark Cheat Sheet For Big Data Analytics
For this article, we used the Stroke Prediction Dataset, which is publicly available on Kaggle.
The following topics are covered in this tutorial:
- Loading Data
- Viewing Data
- Selecting Data
- Counting Data
- Unique Values
- Filtering Data
- Ordering Data
- Creating New Variables
- Deleting Data
- Changing Data Types
- Conditions
- Data Aggregation
A detailed explanation and sample outputs can be found in the Medium article PySpark Cheat Sheet For Big Data Analytics.