https://github.com/akaliutau/spark-recipes
Contains a collection of data processing solutions built on the top of Spark
- Host: GitHub
- URL: https://github.com/akaliutau/spark-recipes
- Owner: akaliutau
- License: apache-2.0
- Created: 2020-11-27T20:01:59.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-10-30T11:02:46.000Z (over 2 years ago)
- Last Synced: 2024-11-12T04:27:17.448Z (3 months ago)
- Topics: java, spark
- Language: Java
- Homepage:
- Size: 3.82 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## About
This repository contains recipes for the Apache Spark data processing framework.

## Building and Running the code
Prerequisites. You will need:
* `git`
* Apache Spark 3.0.0+

1. Clone this project
```
git clone https://github.com/akaliutau/spark-recipes
```

All code lives in the `net.ddp` namespace (`ddp` stands for distributed data processing).
## Installation notes
On Windows, set the following environment variables:
* `HADOOP_HOME` -> `C:\spark`: this directory must contain the `winutils.exe` binary (Hadoop looks for it under `bin\`).
* `SPARK_HOME` -> `C:\spark`: the Spark installation directory.
* `SPARK_LOCAL_DIRS` -> `C:\spark\tmp`: the temporary directory where Spark keeps the jar files it runs.

Spark does not clean up the `SPARK_LOCAL_DIRS` directory on Windows; see the bug description at
https://stackoverflow.com/questions/41825871/exception-while-deleting-spark-temp-dir-in-windows-7-64-bit
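
Since the temporary directory is left behind, it can be cleared by hand between runs. The following is a minimal sketch of such a cleanup, not part of this repository: the class name `SparkTempCleaner` is hypothetical, and the `spark-` name prefix is an assumption about how Spark names its scratch directories, so inspect the actual contents of `SPARK_LOCAL_DIRS` before deleting anything. It also assumes `SPARK_LOCAL_DIRS` holds a single directory, as in the setup above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of spark-recipes): removes leftover Spark temp directories. */
final class SparkTempCleaner {

    /**
     * Deletes every direct child directory of localDir whose name starts with prefix.
     * Returns the names of the removed directories, sorted alphabetically.
     */
    static List<String> clean(Path localDir, String prefix) {
        List<String> removed = new ArrayList<>();
        if (!Files.isDirectory(localDir)) {
            return removed; // nothing to clean
        }
        try {
            // Collect the candidates first, so nothing is deleted while the listing is open.
            List<Path> stale;
            try (Stream<Path> children = Files.list(localDir)) {
                stale = children
                        .filter(Files::isDirectory)
                        .filter(p -> p.getFileName().toString().startsWith(prefix))
                        .sorted()
                        .collect(Collectors.toList());
            }
            for (Path dir : stale) {
                deleteRecursively(dir);
                removed.add(dir.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return removed;
    }

    /** Deletes root and everything below it. */
    private static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) {
        String dir = System.getenv("SPARK_LOCAL_DIRS");
        if (dir == null) {
            System.err.println("SPARK_LOCAL_DIRS is not set");
            return;
        }
        // "spark-" is an assumed prefix; adjust it to what the directory actually contains.
        System.out.println("Removed: " + clean(Paths.get(dir), "spark-"));
    }
}
```

Run it only while no Spark application is active, because Windows refuses to delete files that are still open.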
Because Spark fails to delete this temporary directory on Windows, it logs errors at shutdown. To suppress them, add the following lines to `/conf/log4j.properties`:
log4j.logger.org.apache.hadoop.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
log4j.logger.org.apache.spark.SparkEnv=ERROR
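
The Windows settings in this section can be sanity-checked before launching a job. The sketch below is one possible way to do that and is not part of this repository: the class name `SparkEnvCheck` is hypothetical, and it accepts `winutils.exe` either directly in `HADOOP_HOME` or in its `bin` subdirectory, which is an assumption meant to cover both common layouts.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of spark-recipes): validates the Windows environment setup. */
final class SparkEnvCheck {

    /** Returns human-readable problems; an empty list means every check passed. */
    static List<String> check(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        Path hadoopHome = directory(env, "HADOOP_HOME", problems);
        directory(env, "SPARK_HOME", problems);
        directory(env, "SPARK_LOCAL_DIRS", problems);
        if (hadoopHome != null) {
            // Assumption: accept winutils.exe in HADOOP_HOME itself or in HADOOP_HOME\bin.
            boolean found = Files.isRegularFile(hadoopHome.resolve("winutils.exe"))
                    || Files.isRegularFile(hadoopHome.resolve("bin").resolve("winutils.exe"));
            if (!found) {
                problems.add("winutils.exe not found under HADOOP_HOME: " + hadoopHome);
            }
        }
        return problems;
    }

    /** Resolves a variable to an existing directory, or records a problem and returns null. */
    private static Path directory(Map<String, String> env, String name, List<String> problems) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            problems.add(name + " is not set");
            return null;
        }
        Path path = Paths.get(value.trim());
        if (!Files.isDirectory(path)) {
            problems.add(name + " does not point to a directory: " + value);
            return null;
        }
        return path;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("Environment looks consistent");
        } else {
            problems.forEach(System.err::println);
        }
    }
}
```

Taking the environment as a `Map` rather than reading `System.getenv()` inside `check` keeps the logic easy to exercise with made-up values.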