https://github.com/dimitrov-s-dev/pyspark
PySpark
- Host: GitHub
- URL: https://github.com/dimitrov-s-dev/pyspark
- Owner: Dimitrov-S-Dev
- Created: 2023-10-14T07:18:58.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-10-14T12:33:26.000Z (over 1 year ago)
- Last Synced: 2024-11-16T01:20:33.934Z (3 months ago)
- Topics: pyspark, python3, spark, spark-sql
- Language: Jupyter Notebook
- Homepage: https://www.udemy.com/
- Size: 62.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
![PySpark](https://github.com/Dimitrov-S-Dev/PySpark/blob/master/pyspark.jpg)
# Big Data Practices with PySpark & Spark Tuning
Analysis of semi-structured (JSON), structured, and unstructured data with Spark and Python, plus Spark performance tuning.
## Acquired skills
- Apache Spark's framework, execution model, and programming model.
- Lazy evaluation, narrow vs. wide transformations, and Spark's internal workings.
- PySpark practice on structured, unstructured, and semi-structured data using RDDs, DataFrames, and SQL.
- Building simple to advanced Big Data applications for different types of data (volume, variety, veracity) through real case studies.
- Applying Adaptive Query Execution (AQE) to optimize Spark SQL query execution at runtime.
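
The lazy-evaluation and narrow vs. wide ideas in the list above can be sketched without a Spark installation. The block below is a toy model in plain Python, not the PySpark API and not code from this repository: `LazyDataset`, `reduce_by_key`, `num_stages`, and `to_pair` are hypothetical names chosen for illustration. It assumes only that transformations are recorded rather than executed, that an action such as `collect` triggers execution, and that each wide step (one that must group rows by key, which in Spark implies a shuffle) starts a new stage.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (NOT the PySpark API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = tuple(plan)  # recorded (kind, name, function) steps

    # Narrow transformations: each output row depends on a single input row.
    def map(self, fn):
        return LazyDataset(self._rows, self._plan + (("narrow", "map", fn),))

    def filter(self, pred):
        return LazyDataset(self._rows, self._plan + (("narrow", "filter", pred),))

    # Wide transformation: all rows sharing a key must be brought together.
    def reduce_by_key(self, fn):
        return LazyDataset(self._rows, self._plan + (("wide", "reduce_by_key", fn),))

    def num_stages(self):
        """Count stages, assuming every wide step starts a new one."""
        return 1 + sum(1 for kind, _, _ in self._plan if kind == "wide")

    # Action: only here is the recorded plan actually executed.
    def collect(self):
        data = iter(self._rows)
        for _, name, fn in self._plan:
            if name == "map":
                data = map(fn, data)
            elif name == "filter":
                data = filter(fn, data)
            else:  # reduce_by_key
                data = iter(_reduce_by_key(fn, data))
        return list(data)


def _reduce_by_key(fn, pairs):
    """Merge the values of equal keys with fn; return pairs sorted by key."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return sorted(merged.items())


calls = []  # records every row that to_pair actually processes


def to_pair(word):
    calls.append(word)
    return (word, 1)


words = LazyDataset(["spark", "python", "spark", "sql"])
counts = words.map(to_pair).reduce_by_key(lambda a, b: a + b)

print(calls)                # [] : no transformation has run yet
print(counts.num_stages())  # 2 : one wide step splits the plan in two
result = counts.collect()   # the action triggers execution
print(result)               # [('python', 1), ('spark', 2), ('sql', 1)]
```

In this sketch, building `counts` costs nothing until `collect` is called, which is the property that lets a real engine inspect the whole plan before running it.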