https://github.com/mahmoudparsian/pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
algorithms big-data data data-abstractions data-science dataframe distributed-computing graphframes mapreduce monoid nosql partitioning pyspark pyspark-algorithms python rdd spark transformations
- Host: GitHub
- URL: https://github.com/mahmoudparsian/pyspark-algorithms
- Owner: mahmoudparsian
- License: other
- Created: 2017-12-26T17:49:46.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2020-01-03T02:41:05.000Z (almost 6 years ago)
- Last Synced: 2025-03-22T19:43:37.922Z (7 months ago)
- Topics: algorithms, big-data, data, data-abstractions, data-science, dataframe, distributed-computing, graphframes, mapreduce, monoid, nosql, partitioning, pyspark, pyspark-algorithms, python, rdd, spark, transformations
- Language: Python
- Homepage:
- Size: 40.5 MB
- Stars: 84
- Watchers: 5
- Forks: 44
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Source Code for PySpark Algorithms Book
## Unlock the Power of Big Data with the PySpark Algorithms Book
## [Buy PySpark Algorithms Book → PDF Version (.pdf)](https://www.amazon.com/PySpark-Algorithms-Mahmoud-Parsian-ebook/dp/B07WQHTVCJ/)
## [Buy PySpark Algorithms Book → Kindle Version (.kpf)](https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2)
---
## PySpark Algorithms Book:
#### Author: Mahmoud Parsian
#### [Purchase PySpark Algorithms Book from amazon.com](https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2)
#### Publication date: August 2019

---
## About PySpark Algorithms Book
* This book is about PySpark (Python API for Spark)
* Introductory book on how to solve data problems using PySpark
* Learn how to use mappers, filters, and reducers
* Learn how to partition data for fast queries
* Learn how to use the `mapPartitions()` transformation
* Learn how to use `reduceByKey()`, `groupByKey()`, and `combineByKey()` transformations
* Learn how to use Spark's transformations and actions for solving real problems
* Learn how to use RDDs and DataFrames
* Learn how to read/write data from many data sources
* Learn how to use logistic regression
* Learn how to use Spark's reduction transformations
* Learn how to use GraphFrames
* Learn how to use Motifs in GraphFrames
* Learn how to use monoids in MapReduce algorithms
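
As a rough illustration of the monoid idea in the list above, here is a minimal pure-Python sketch (no Spark required) that computes a per-key mean by combining `(sum, count)` pairs. The sample data, the two simulated "partitions", and all function names are hypothetical and are not taken from the book. Averaging averages directly is not associative, whereas adding `(sum, count)` pairs is, so partial results can be merged in any order. In PySpark, a function like `combine` could be passed to `reduceByKey()` after mapping each value to such a pair.

```python
from functools import reduce

# Hypothetical sample data: (key, value) records split across two "partitions".
partitions = [
    [("a", 2), ("b", 4), ("a", 6)],
    [("b", 8), ("a", 10)],
]


def to_monoid(value):
    """Lift a raw value into a (sum, count) pair."""
    return (value, 1)


def combine(x, y):
    """Associative merge of two (sum, count) pairs; the identity is (0, 0)."""
    return (x[0] + y[0], x[1] + y[1])


def local_combine(partition):
    """Combine values per key inside one partition, as a local combiner would."""
    acc = {}
    for key, value in partition:
        acc[key] = combine(acc.get(key, (0, 0)), to_monoid(value))
    return acc


def merge_partitions(partials):
    """Merge the per-partition dictionaries into one result."""
    def merge(left, right):
        merged = dict(left)
        for key, pair in right.items():
            merged[key] = combine(merged.get(key, (0, 0)), pair)
        return merged
    return reduce(merge, partials, {})


totals = merge_partitions(local_combine(p) for p in partitions)
means = {key: s / c for key, (s, c) in totals.items()}
print(means)  # {'a': 6.0, 'b': 6.0}
```

Only the final step divides sum by count; every earlier step works on pairs, which is what keeps the reduction safe to run partition by partition.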
---
## Software
* [Spark 2.4.3](http://spark.apache.org)
* [Python 3.7.2](https://www.python.org/ftp/python/3.7.4/python-3.7.4-macosx10.9.pkg)
* [Plan for dropping Python 2 support](http://spark.apache.org/news/plan-for-dropping-python-2-support.html)
* [Java 8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

---
## Table of Contents
* chap01: Introduction to PySpark
* chap02: Hello World
* chap03: Data Abstractions
* [chap04: Getting Started -- Sample Chapter](./sample_chapters/)
* chap05: Transformations in Spark
* chap06: Reductions in Spark
* chap07: DataFrames and SQL
* chap08: Spark DataSources
* chap09: Logistic Regression
* chap10: Movie Recommendations
* chap11: Graph Algorithms
* chap12: Design Patterns and Monoids
* Appendix A: How To Install Spark
* Appendix B: How to Use Lambda Expressions
* [Appendix C: Questions And Answers (50+ QA)](./sample_chapters/)

---
## Future chapters:
* chap13: FP-Growth
* chap14: LDA
* chap15: Linear Regression

[//]: # (metadata:)
[//]: # (Spark, PySpark, Python, GraphFrames, Distributed Computing)
[//]: # (MapReduce, Distributed Algorithms, map, mappers, filters, reduce, reducers, reductions)
[//]: # (partitioners, partitioning data, data partitioner, Parquet, NoSQL)
[//]: # (big data, Transformations, Actions, RDDs, DataFrames, SQL, Graph Algorithms)
[//]: # (Data Abstractions, Reductions in Spark, Design Patterns and Monoids)
[//]: # (Machine Learning, Logistic Regression, Spark Data Sources)
[//]: # (Resilient Distributed Datasets, Partitioning, Data Partitioning)