https://github.com/sarthak-1408/pyspark-tutorial
In this repo, I create a PySpark tutorial to help you better understand how to read and manage big data.
machine-learning pyspark pyspark-mllib pyspark-python pyspark-tutorial python3
- Host: GitHub
- URL: https://github.com/sarthak-1408/pyspark-tutorial
- Owner: Sarthak-1408
- Created: 2021-10-16T17:48:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-19T06:10:41.000Z (over 3 years ago)
- Last Synced: 2025-03-27T21:04:11.309Z (about 1 month ago)
- Topics: machine-learning, pyspark, pyspark-mllib, pyspark-python, pyspark-tutorial, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 46.9 KB
- Stars: 6
- Watchers: 1
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PySpark Tutorial
## Overview
- PySpark is an interface for Apache Spark in Python. It not only lets you write Spark applications using Python APIs but also provides the PySpark shell for interactively analyzing your data in a distributed environment.
- In this repository, I explain PySpark step by step, including how to read data and handle missing values with PySpark.
## Installation
```sh
pip install pyspark
```
- For installation instructions on Windows, Mac, and Linux, see: https://www.datacamp.com/community/tutorials/installation-of-pyspark
### Credits
- Krish Naik (https://www.youtube.com/user/krishnaik06)