https://github.com/patilni3/pyspark_practice_file
Apache Spark - Pyspark with the help of Pycharm IDE
- Host: GitHub
- URL: https://github.com/patilni3/pyspark_practice_file
- Owner: PatilNi3
- Created: 2022-11-08T16:30:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-11-08T16:47:38.000Z (over 2 years ago)
- Last Synced: 2025-02-08T23:27:10.380Z (3 months ago)
- Language: Python
- Size: 9.77 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PySpark_Practice_File
Apache Spark - PySpark with the help of PyCharm IDE

## About PySpark
PySpark is the Python API for Apache Spark, an open-source distributed computing framework and set of libraries for real-time, large-scale data processing.

## Why PySpark
• It is a general-purpose engine for big data analysis, processing, and computation.
• It provides fast computation over big data.
## Contents
1) Reading a dataset using PySpark.
2) Checking the schema, data types, and type of the DataFrame.
3) Adding column
4) Dropping column
5) Handling null values
6) Filter operations
7) Aggregate functions
8) Linear regression problem
8.1) importing vector assembler
8.2) importing linear regression
8.3) coefficient
8.4) intercept
8.5) prediction
# Thank You.☻