https://github.com/spark-examples/spark-scala-examples

This project provides Apache Spark SQL, RDD, DataFrame, and Dataset examples in the Scala language.

Host: GitHub
URL: https://github.com/spark-examples/spark-scala-examples
Owner: spark-examples
Created: 2019-11-26T10:30:07.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-03-20T05:49:29.000Z (over 2 years ago)
Last Synced: 2025-04-02T08:12:55.672Z (about 1 year ago)
Language: Scala
Homepage: https://sparkbyexamples.com
Size: 3.29 MB
Stars: 562
Watchers: 29
Forks: 553
Open Issues: 9
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-list - spark-scala-examples - Provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language. (Programming Language Tutorials / Scala)

README

Explanations of all the Spark SQL, RDD, DataFrame, and Dataset examples in this project are available at https://sparkbyexamples.com/. All examples are written in Scala and tested in our development environment.

# Table of Contents (Spark Examples in Scala)

## Spark RDD Examples
- Create a Spark RDD using Parallelize
- Spark – Read multiple text files into single RDD?
- Spark load CSV file into RDD
- Different ways to create Spark RDD
- Spark – How to create an empty RDD?
- Spark RDD Transformations with examples
- Spark RDD Actions with examples
- Spark Pair RDD Functions
- Spark Repartition() vs Coalesce()
- Spark Shuffle Partitions
- Spark Persistence Storage Levels
- Spark RDD Cache and Persist with Example
- Spark Broadcast Variables
- Spark Accumulators Explained
- Convert Spark RDD to DataFrame | Dataset

## Spark SQL Tutorial
- Spark Create DataFrame with Examples
- Spark DataFrame withColumn
- Ways to Rename column on Spark DataFrame
- Spark – How to Drop a DataFrame/Dataset column
- Working with Spark DataFrame Where Filter
- Spark SQL “case when” and “when otherwise”
- Collect() – Retrieve data from Spark RDD/DataFrame
- Spark – How to remove duplicate rows
- How to Pivot and Unpivot a Spark DataFrame
- Spark SQL Data Types with Examples
- Spark SQL StructType & StructField with examples
- Spark schema – explained with examples
- Spark Groupby Example with DataFrame
- Spark – How to Sort DataFrame column explained
- Spark SQL Join Types with examples
- Spark DataFrame Union and UnionAll
- Spark map vs mapPartitions transformation
- Spark foreachPartition vs foreach | what to use?
- Spark DataFrame Cache and Persist Explained
- Spark SQL UDF (User Defined Functions)
- Spark SQL DataFrame Array (ArrayType) Column
- Working with Spark DataFrame Map (MapType) column
- Spark SQL – Flatten Nested Struct column
- Spark – Flatten nested array to single array column
- Spark explode array and map columns to rows

## Spark SQL Functions
- Spark SQL String Functions Explained
- Spark SQL Date and Time Functions
- Spark SQL Array functions complete list
- Spark SQL Map functions – complete list
- Spark SQL Sort functions – complete list
- Spark SQL Aggregate Functions
- Spark Window Functions with Examples

## Spark Data Source API
- Spark Read CSV file into DataFrame
- Spark Read and Write JSON file into DataFrame
- Spark Read and Write Apache Parquet
- Spark Read XML file using Databricks API
- Read & Write Avro files using Spark DataFrame
- Using Avro Data Files From Spark SQL 2.3.x or earlier
- Spark Read from & Write to HBase table | Example
- Create Spark DataFrame from HBase using Hortonworks
- Spark Read ORC file into DataFrame
- Spark 3.0 Read Binary File into DataFrame

## Spark Streaming & Kafka
- Spark Streaming – Different Output modes explained
- Spark Streaming files from a directory
- Spark Streaming – Reading data from TCP Socket
- Spark Streaming with Kafka Example
- Spark Streaming – Kafka messages in Avro format
- Spark SQL Batch Processing – Produce and Consume Apache Kafka Topic
