https://github.com/newfront/odsc-west-2019-realtime-analytics

Workshop Material for Near RealTime Predictive Analytics with Apache Spark Structured Streaming Workshop at the Open Data Science Conference WEST 2019
https://github.com/newfront/odsc-west-2019-realtime-analytics

apache-spark odsc-west-2019 odsc2019 realtime-predictive-analytics workshop-material

Last synced: about 2 months ago
JSON representation

Workshop Material for Near RealTime Predictive Analytics with Apache Spark Structured Streaming Workshop at the Open Data Science Conference WEST 2019

Host: GitHub
URL: https://github.com/newfront/odsc-west-2019-realtime-analytics
Owner: newfront
License: gpl-3.0
Created: 2019-09-19T19:55:40.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-10-30T07:24:34.000Z (over 6 years ago)
Last Synced: 2025-10-06T08:57:43.142Z (8 months ago)
Topics: apache-spark, odsc-west-2019, odsc2019, realtime-predictive-analytics, workshop-material
Language: Shell
Size: 23.4 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

## Workshop Material: for Near RealTime Predictive Analytics with Apache Spark Structured Streaming Workshop
Open Data Science Conference WEST 2019

[Session Information @ ODSC](http://bit.ly/odsc-west-2019-realish)

### About the Speaker
Find me on Twitter: [@newfront](https://twitter.com/newfront)
Find me on Medium [@newfrontcreative](https://medium.com/@newfrontcreative)
About Twilio: [Twilio](https://twilio.com)

## Runtime Requirments
1. Docker (at least 2 CPU cores and 8gb RAM)
2. System Terminal (iTerm, Terminal, etc)
3. Working Web Browser (Chrome or Firefox)

### Technologies Used
1. [Apache Zeppelin](https://zeppelin.apache.org/docs/latest/interpreter/spark.html)
2. [Apache Spark](http://spark.apache.org/)
3. [Redis](https://redis.io/)

### Docker
Install Docker Desktop (https://www.docker.com/products/docker-desktop)

Additional Docker Resources:
* https://docs.docker.com/get-started/
* https://hub.docker.com/

#### Docker Runtime Recommendations
1. 2 or more cpu cores.
2. 8gb/ram or higher.

## Installation
1. Install Docker (See Docker above)
2. Once Docker is installed. Open up your terminal application and `cd /path/to/odsc-west-2019-realtime-analytics/docker`
3. `./run.sh install`
4. `./run.sh start`

### Notes
The initial download can take some time depending on your WiFi connection. Expect this to take around 5-10 minutes and fingers crossed it goes faster!

#### Initialization Process
The `./run.sh init` process will 1.) download Apache Spark and untar it into `docker/spark-2.4.4` and 2.) `unzip` the wine reviews data set from `docker/data`.

#### Runtime Process
The `./run.sh start` will 1.) download the official `Apache Zeppelin` docker image, and 2.) download the official `Redis` docker image. It will then run `docker compose` on redis followed by zeppelin. Zeppelin will use the spark version (`2.4.4`) that you downloaded in the `init` phase so we are running on the latest and greatest Spark.

## Checking Zeppelin and Updating Zeppelin
1. The **Main Application** should now be running at http://localhost:8080/

### Update the Zeppelin Spark Interpreter Runtime
1. Go to http://localhost:8080/#/interpreter on your Web Browser
2. Search for `spark` in the `Search Interpreters` input field.
3. Click the `edit` button to initiate editing mode.

#### Update the Properties (under the properties section)
Add the following key/values.
1. **spark.redis.host** redis5
2. **spark.redis.port** 6379

Updated the following key/values
1. **spark.cores.max** 2
2. **spark.executor.memory** 8g

#### Update the Dependencies (under the dependencies section)
1. Add `com.redislabs:spark-redis:2.4.0`
2. Click `Save` and these settings will be applied to the Zeppelin Runtime.

#### Sending User Book Likes via Redis Streams
~~~
docker exec -it redis5 redis-cli
~~~
~~~
xadd books-liked * userId 1 bookId 3
~~~

These events will now be preocessed in spark-2.4.4 `foreachBatch`

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/newfront/odsc-west-2019-realtime-analytics

Awesome Lists containing this project

README