https://github.com/mxagar/spark_big_data_guide

This repository contains my personal guide on Spark and topics related to Big Data.

big-data hadoop machine-learning spark

Host: GitHub
URL: https://github.com/mxagar/spark_big_data_guide
Owner: mxagar
Created: 2023-04-12T13:44:59.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-03-30T07:44:32.000Z (10 months ago)
Last Synced: 2024-11-05T20:22:43.123Z (3 months ago)
Topics: big-data, hadoop, machine-learning, spark
Language: Jupyter Notebook
Homepage:
Size: 5.1 MB
Stars: 1
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Spark and Big Data Guide

This repository contains my personal guide on Spark and topics related to Big Data. The contents originate from projects, tutorials and courses, such as:

- [Udacity: Intro to Hadoop and MapReduce](https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617)
- [Udacity: Deploying a Hadoop Cluster](https://www.udacity.com/course/deploying-a-hadoop-cluster--ud1000)
- [Udacity: Spark](https://www.udacity.com/course/learn-spark-at-udacity--ud2002)
- [Datacamp: Big Data with PySpark Track](https://app.datacamp.com/learn/skill-tracks/big-data-with-pyspark)

The repository is structured in folders/modules, each of which contains a Markdown file with the topic guide, together with the related code and exercises:

- [`00_Intro_Big_Data`](./00_Intro_Big_Data): general introduction material, without code.
- [`01_Intro_Hadoop`](./01_Intro_Hadoop): introduction material on Hadoop; small code examples are shown, but not implemented.
- **[`02_Spark`](./02_Spark): full Spark course with exercises.**

To use this guide, open the desired topic folder and read its main Markdown file; if you're new to the topic, follow the folders in sequential order. To run the exercises and examples, you'll need to set up an environment, either locally or remotely; the setup instructions should be provided in each folder/module.

Finally, if you're looking for an **example project where the theory and examples are applied**, consider checking out this project implemented with Spark: **[sparkify_customer_churn](https://github.com/mxagar/sparkify_customer_churn)**.

Mikel Sagardia, 2023.
No guarantees.