https://github.com/patilni3/pyspark_practice_file


# PySpark_Practice_File
Apache Spark: PySpark practice using the PyCharm IDE

## About PySpark
PySpark is the Python API for Apache Spark, an open-source distributed computing framework and set of libraries for real-time, large-scale data processing.

## Why PySpark
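• It is a general-purpose engine for big data analysis, processing, and computation.

• It provides fast computation over large datasets by distributing the work across a cluster and keeping intermediate data in memory where possible.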

## Contents
1) Reading a dataset using PySpark
2) Checking the schema, column data types, and the type of the DataFrame
3) Adding a column
4) Dropping a column
5) Handling null values
6) Filter operations
7) Aggregate functions
8) Linear regression problem

8.1) Importing VectorAssembler

8.2) Importing LinearRegression

8.3) Coefficients

8.4) Intercept

8.5) Prediction

# Thank You.☻