
https://github.com/mahmoudparsian/pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

algorithms big-data data data-abstractions data-science dataframe distributed-computing graphframes mapreduce monoid nosql partitioning pyspark pyspark-algorithms python rdd spark transformations



# Source Code for PySpark Algorithms Book

## Unlock the Power of Big Data with the PySpark Algorithms Book

## [Buy PySpark Algorithms Book → PDF Version (.pdf)](https://www.amazon.com/PySpark-Algorithms-Mahmoud-Parsian-ebook/dp/B07WQHTVCJ/)

## [Buy PySpark Algorithms Book → Kindle Version (.kpf)](https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2)

---

## PySpark Algorithms Book:
#### Author: Mahmoud Parsian
#### [Purchase PySpark Algorithms Book from amazon.com](https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2)
#### Publication date: August 2019

---

## About PySpark Algorithms Book
* This book is about PySpark, the Python API for Spark
* An introductory book on how to solve data problems using PySpark
* Learn how to use mappers, filters, and reducers
* Learn how to partition data for fast queries
* Learn how to use the `mapPartitions()` transformation
* Learn how to use the `reduceByKey()`, `groupByKey()`, and `combineByKey()` transformations
* Learn how to use Spark's transformations and actions to solve real problems
* Learn how to use RDDs and DataFrames
* Learn how to read and write data from many data sources
* Learn how to use logistic regression
* Learn how to use Spark's reduction transformations
* Learn how to use GraphFrames
* Learn how to use motifs in GraphFrames
* Learn how to use monoids in MapReduce algorithms
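
Several of the topics above (`reduceByKey()`, `combineByKey()`, and monoids in MapReduce algorithms) meet in the classic per-key average. The sketch below is a hypothetical illustration, not code from the book: it simulates `combineByKey()` in plain Python over two hand-made "partitions" so it runs without a Spark cluster. The names `partitions`, `create_combiner`, `merge_value`, `merge_combiners`, and `combine_by_key` are illustrative assumptions. The idea it shows is that `(sum, count)` pairs under element-wise addition form a monoid (identity `(0, 0)`), so partial results from different partitions can be merged in any grouping, whereas averaging per-partition averages gives the wrong answer.

```python
# Hypothetical input: (key, value) pairs already split into "partitions",
# mimicking how Spark distributes an RDD across executors.
partitions = [
    [("a", 2), ("b", 10), ("a", 4)],
    [("a", 6), ("b", 20)],
]

# The three functions combineByKey() expects, built on the (sum, count)
# monoid: the identity is (0, 0) and the operation is element-wise addition.
def create_combiner(v):
    # First value seen for a key inside a partition.
    return (v, 1)

def merge_value(acc, v):
    # Fold another value into a partition-local (sum, count).
    return (acc[0] + v, acc[1] + 1)

def merge_combiners(a, b):
    # Merge two (sum, count) pairs coming from different partitions.
    return (a[0] + b[0], a[1] + b[1])

def combine_by_key(parts):
    """Local simulation of combineByKey: combine inside each partition,
    then merge the per-partition combiners key by key."""
    per_partition = []
    for part in parts:
        local = {}
        for k, v in part:
            local[k] = merge_value(local[k], v) if k in local else create_combiner(v)
        per_partition.append(local)
    merged = {}
    for local in per_partition:
        for k, acc in local.items():
            merged[k] = merge_combiners(merged[k], acc) if k in merged else acc
    return merged

sum_count = combine_by_key(partitions)
averages = {k: s / c for k, (s, c) in sum_count.items()}
print(averages)  # {'a': 4.0, 'b': 15.0}

# Contrast: a mean is not a monoid. Averaging the per-partition means of "a"
# (3.0 and 6.0) yields 4.5 instead of the correct 4.0.
naive_a = (3.0 + 6.0) / 2
```

On a real RDD of `(key, value)` pairs, the same three functions would be passed as `rdd.combineByKey(create_combiner, merge_value, merge_combiners)`; an equivalent approach is to map each value to `(value, 1)` and call `reduceByKey(merge_combiners)`.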

---

[![PySpark Algorithms Book](./images/pyspark_algorithms0.jpg)](https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2)

---

## Software

* [Spark 2.4.3](http://spark.apache.org)
* [Python 3.7.2](https://www.python.org/ftp/python/3.7.4/python-3.7.4-macosx10.9.pkg)
* [Plan for dropping Python 2 support](http://spark.apache.org/news/plan-for-dropping-python-2-support.html)
* [Java 8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

---

## Table of Contents

* chap01: Introduction to PySpark
* chap02: Hello World
* chap03: Data Abstractions
* [chap04: Getting Started -- Sample Chapter](./sample_chapters/)
* chap05: Transformations in Spark
* chap06: Reductions in Spark
* chap07: DataFrames and SQL
* chap08: Spark DataSources
* chap09: Logistic Regression
* chap10: Movie Recommendations
* chap11: Graph Algorithms
* chap12: Design Patterns and Monoids

* Appendix A: How to Install Spark
* Appendix B: How to Use Lambda Expressions
* [Appendix C: Questions and Answers (50+ QA)](./sample_chapters/)

---

## Future chapters:

* chap13: FP-Growth
* chap14: LDA (Latent Dirichlet Allocation)
* chap15: Linear Regression

[//]: # (metadata:)
[//]: # (Spark, PySpark, Python, GraphFrames, Distributed Computing)
[//]: # (MapReduce, Distributed Algorithms, map, mappers, filters, reduce, reducers, reductions)
[//]: # (partitioners, partitioning data, data partitioner, Parquet, NoSQL)
[//]: # (big data, Transformations, Actions, RDDs, DataFrames, SQL, Graph Algorithms)
[//]: # (Data Abstractions, Reductions in Spark, Design Patterns and Monoids)
[//]: # (Machine Learning, Logistic Regression, Spark Data Sources)
[//]: # (Resilient Distributed Datasets, Partitioning, Data Partitioning)
