https://github.com/spark-examples/spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame, and Dataset examples in the Scala language.
- Host: GitHub
- URL: https://github.com/spark-examples/spark-scala-examples
- Owner: spark-examples
- Created: 2019-11-26T10:30:07.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-03-20T05:49:29.000Z (10 months ago)
- Last Synced: 2024-12-23T00:03:09.675Z (8 days ago)
- Language: Scala
- Homepage: https://sparkbyexamples.com
- Size: 3.29 MB
- Stars: 561
- Watchers: 31
- Forks: 546
- Open Issues: 9
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-list - spark-scala-examples - Provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language. (Programming Language Tutorials / Scala)
README
Explanations of all the Spark SQL, RDD, DataFrame, and Dataset examples in this project are available at https://sparkbyexamples.com/. All of the examples are written in Scala and tested in our development environment.
# Table of Contents (Spark Examples in Scala)
## Spark RDD Examples
- Create a Spark RDD using Parallelize
- Spark – Read multiple text files into single RDD?
- Spark load CSV file into RDD
- Different ways to create Spark RDD
- Spark – How to create an empty RDD?
- Spark RDD Transformations with examples
- Spark RDD Actions with examples
- Spark Pair RDD Functions
- Spark Repartition() vs Coalesce()
- Spark Shuffle Partitions
- Spark Persistence Storage Levels
- Spark RDD Cache and Persist with Example
- Spark Broadcast Variables
- Spark Accumulators Explained
- Convert Spark RDD to DataFrame | Dataset
## Spark SQL Tutorial
- Spark Create DataFrame with Examples
- Spark DataFrame withColumn
- Ways to Rename column on Spark DataFrame
- Spark – How to Drop a DataFrame/Dataset column
- Working with Spark DataFrame Where Filter
- Spark SQL “case when” and “when otherwise”
- Collect() – Retrieve data from Spark RDD/DataFrame
- Spark – How to remove duplicate rows
- How to Pivot and Unpivot a Spark DataFrame
- Spark SQL Data Types with Examples
- Spark SQL StructType & StructField with examples
- Spark schema – explained with examples
- Spark Groupby Example with DataFrame
- Spark – How to Sort DataFrame column explained
- Spark SQL Join Types with examples
- Spark DataFrame Union and UnionAll
- Spark map vs mapPartitions transformation
- Spark foreachPartition vs foreach | what to use?
- Spark DataFrame Cache and Persist Explained
- Spark SQL UDF (User Defined Functions)
- Spark SQL DataFrame Array (ArrayType) Column
- Working with Spark DataFrame Map (MapType) column
- Spark SQL – Flatten Nested Struct column
- Spark – Flatten nested array to single array column
- Spark explode array and map columns to rows
## Spark SQL Functions
- Spark SQL String Functions Explained
- Spark SQL Date and Time Functions
- Spark SQL Array functions complete list
- Spark SQL Map functions – complete list
- Spark SQL Sort functions – complete list
- Spark SQL Aggregate Functions
- Spark Window Functions with Examples
## Spark Data Source API
- Spark Read CSV file into DataFrame
- Spark Read and Write JSON file into DataFrame
- Spark Read and Write Apache Parquet
- Spark Read XML file using Databricks API
- Read & Write Avro files using Spark DataFrame
- Using Avro Data Files From Spark SQL 2.3.x or earlier
- Spark Read from & Write to HBase table | Example
- Create Spark DataFrame from HBase using Hortonworks
- Spark Read ORC file into DataFrame
- Spark 3.0 Read Binary File into DataFrame
## Spark Streaming & Kafka
- Spark Streaming – Different Output modes explained
- Spark Streaming files from a directory
- Spark Streaming – Reading data from TCP Socket
- Spark Streaming with Kafka Example
- Spark Streaming – Kafka messages in Avro format
- Spark SQL Batch Processing – Produce and Consume Apache Kafka Topic