{"id":22178359,"url":"https://github.com/newfront/odsc-west-2019-realtime-analytics","last_synced_at":"2026-04-20T13:36:40.365Z","repository":{"id":138838411,"uuid":"209637807","full_name":"newfront/odsc-west-2019-realtime-analytics","owner":"newfront","description":"Workshop Material for Near RealTime Predictive Analytics with Apache Spark Structured Streaming Workshop at the Open Data Science Conference WEST 2019","archived":false,"fork":false,"pushed_at":"2019-10-30T07:24:34.000Z","size":24540,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-06T08:57:43.142Z","etag":null,"topics":["apache-spark","odsc-west-2019","odsc2019","realtime-predictive-analytics","workshop-material"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/newfront.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-19T19:55:40.000Z","updated_at":"2021-04-24T00:53:44.000Z","dependencies_parsed_at":"2023-03-25T14:18:14.604Z","dependency_job_id":null,"html_url":"https://github.com/newfront/odsc-west-2019-realtime-analytics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/newfront/odsc-west-2019-realtime-analytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/newfront%2Fodsc-west-2019-realtime-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/newfront%2Fodsc-west-2019-realtime-analytics
/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/newfront%2Fodsc-west-2019-realtime-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/newfront%2Fodsc-west-2019-realtime-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/newfront","download_url":"https://codeload.github.com/newfront/odsc-west-2019-realtime-analytics/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/newfront%2Fodsc-west-2019-realtime-analytics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32049105,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T11:35:06.609Z","status":"ssl_error","status_checked_at":"2026-04-20T11:34:48.899Z","response_time":94,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","odsc-west-2019","odsc2019","realtime-predictive-analytics","workshop-material"],"created_at":"2024-12-02T08:46:16.491Z","updated_at":"2026-04-20T13:36:40.345Z","avatar_url":"https://github.com/newfront.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Workshop Material: for Near RealTime Predictive Analytics with Apache Spark Structured Streaming Workshop\nOpen Data Science Conference WEST 
2019\n\n[Session Information @ ODSC](http://bit.ly/odsc-west-2019-realish)\n\n### About the Speaker\nFind me on Twitter: [@newfront](https://twitter.com/newfront)\nFind me on Medium: [@newfrontcreative](https://medium.com/@newfrontcreative)\nAbout Twilio: [Twilio](https://twilio.com)\n\n## Runtime Requirements\n1. Docker (at least 2 CPU cores and 8 GB RAM)\n2. System Terminal (iTerm, Terminal, etc.)\n3. Working Web Browser (Chrome or Firefox)\n\n### Technologies Used\n1. [Apache Zeppelin](https://zeppelin.apache.org/docs/latest/interpreter/spark.html)\n2. [Apache Spark](http://spark.apache.org/)\n3. [Redis](https://redis.io/)\n\n### Docker\nInstall Docker Desktop (https://www.docker.com/products/docker-desktop)\n\nAdditional Docker Resources:\n* https://docs.docker.com/get-started/\n* https://hub.docker.com/\n\n#### Docker Runtime Recommendations\n1. 2 or more CPU cores.\n2. 8 GB of RAM or higher.\n\n## Installation\n1. Install Docker (See Docker above)\n2. Once Docker is installed, open your terminal application and `cd /path/to/odsc-west-2019-realtime-analytics/docker`\n3. `./run.sh install`\n4. `./run.sh start`\n\n### Notes\nThe initial download can take some time depending on your WiFi connection. Expect this to take around 5-10 minutes, and fingers crossed it goes faster!\n\n#### Initialization Process\nThe `./run.sh init` process will 1.) download Apache Spark and untar it into `docker/spark-2.4.4` and 2.) `unzip` the wine reviews data set from `docker/data`.\n\n#### Runtime Process\nThe `./run.sh start` process will 1.) download the official `Apache Zeppelin` docker image, and 2.) download the official `Redis` docker image. It will then run `docker compose` on Redis followed by Zeppelin. Zeppelin will use the Spark version (`2.4.4`) that you downloaded in the `init` phase so we are running on the latest and greatest Spark.\n\n## Checking Zeppelin and Updating Zeppelin\n1. 
The **Main Application** should now be running at http://localhost:8080/\n\n### Update the Zeppelin Spark Interpreter Runtime\n1. Go to http://localhost:8080/#/interpreter in your web browser\n2. Search for `spark` in the `Search Interpreters` input field.\n3. Click the `edit` button to initiate editing mode.\n\n#### Update the Properties (under the properties section)\nAdd the following key/values.\n1. **spark.redis.host** redis5\n2. **spark.redis.port** 6379\n\nUpdate the following key/values.\n1. **spark.cores.max** 2\n2. **spark.executor.memory** 8g\n\n#### Update the Dependencies (under the dependencies section)\n1. Add `com.redislabs:spark-redis:2.4.0`\n2. Click `Save` and these settings will be applied to the Zeppelin Runtime.\n\n#### Sending User Book Likes via Redis Streams\n~~~\ndocker exec -it redis5 redis-cli\n~~~\n~~~\nxadd books-liked * userId 1 bookId 3\n~~~\n\nThese events will now be processed in spark-2.4.4 `foreachBatch`","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnewfront%2Fodsc-west-2019-realtime-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnewfront%2Fodsc-west-2019-realtime-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnewfront%2Fodsc-west-2019-realtime-analytics/lists"}