https://github.com/mxagar/spark_big_data_guide
This repository contains my personal guide on Spark and topics related to Big Data.
- Host: GitHub
- URL: https://github.com/mxagar/spark_big_data_guide
- Owner: mxagar
- Created: 2023-04-12T13:44:59.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-30T07:44:32.000Z (10 months ago)
- Last Synced: 2024-11-05T20:22:43.123Z (3 months ago)
- Topics: big-data, hadoop, machine-learning, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 5.1 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Spark and Big Data Guide
This repository contains my personal guide on Spark and topics related to Big Data. The contents originate from projects, tutorials and courses, such as:
- [Udacity: Intro to Hadoop and MapReduce](https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617)
- [Udacity: Deploying a Hadoop Cluster](https://www.udacity.com/course/deploying-a-hadoop-cluster--ud1000)
- [Udacity: Spark](https://www.udacity.com/course/learn-spark-at-udacity--ud2002)
- [Datacamp: Big Data with PySpark Track](https://app.datacamp.com/learn/skill-tracks/big-data-with-pyspark)

The repository is structured in folders/modules, which contain a Markdown file with the associated topic guide and the related code/exercises:
- [`00_Intro_Big_Data`](./00_Intro_Big_Data): general introduction material, without code.
- [`01_Intro_Hadoop`](./01_Intro_Hadoop): introduction material on Hadoop; small code examples are shown, but not implemented.
- **[`02_Spark`](./02_Spark): full Spark course with exercises.**

To use this guide, open the desired topic folder and read the main Markdown file in there; if you're new to the topic, follow the folders in sequential order. To run the exercises and examples, you'll need to set up an environment, either locally or remotely; all instructions for that should be provided in each folder/module.
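Before working through a module locally, a quick sanity check of the environment can save some time. The snippet below is only a minimal sketch and not part of the repository: the function name `check_spark_environment` is hypothetical, and it assumes a local setup in which PySpark is installed as a Python package and a Java runtime is reachable on the `PATH` (PySpark needs one to start). The setup instructions in each folder/module remain the reference.

```python
import importlib.util
import shutil


def check_spark_environment():
    """Report whether PySpark and a Java runtime appear to be available.

    Hypothetical helper for illustration; it only inspects the local
    machine and does not start a Spark session.
    """
    return {
        # True if the `pyspark` package can be found by the import system
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
        # True if a `java` executable is found on the PATH
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_spark_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If either check reports `no`, the environment is probably not ready for the exercises yet; for a remote setup this check does not apply as is.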
Finally, if you're looking for an **example project where the theory and examples are applied**, consider checking this project implemented with Spark: **[sparkify_customer_churn](https://github.com/mxagar/sparkify_customer_churn)**.
Mikel Sagardia, 2023.
No guarantees.