{"id":13569292,"url":"https://github.com/jaceklaskowski/spark-workshop","last_synced_at":"2025-04-05T13:09:50.643Z","repository":{"id":37663113,"uuid":"53619047","full_name":"jaceklaskowski/spark-workshop","owner":"jaceklaskowski","description":"Apache Spark™ and Scala Workshops","archived":false,"fork":false,"pushed_at":"2024-07-29T02:38:25.000Z","size":59763,"stargazers_count":263,"open_issues_count":8,"forks_count":148,"subscribers_count":29,"default_branch":"gh-pages","last_synced_at":"2025-03-29T12:09:35.313Z","etag":null,"topics":["apache-spark","spark","spark-mllib","spark-sql","spark-structured-streaming","spark-workshops","workshop"],"latest_commit_sha":null,"homepage":"https://jaceklaskowski.github.io/spark-workshop/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaceklaskowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-03-10T21:40:50.000Z","updated_at":"2025-03-07T05:28:24.000Z","dependencies_parsed_at":"2024-09-29T06:15:28.934Z","dependency_job_id":null,"html_url":"https://github.com/jaceklaskowski/spark-workshop","commit_stats":{"total_commits":316,"total_committers":3,"mean_commits":"105.33333333333333","dds":0.006329113924050667,"last_synced_commit":"a6fd3ee00f1a413c5efa49a8f04e72947f411234"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaceklaskowski%2Fspark-workshop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaceklaskowski%2Fspark-workshop/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaceklaskowski%2Fspark-workshop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaceklaskowski%2Fspark-workshop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaceklaskowski","download_url":"https://codeload.github.com/jaceklaskowski/spark-workshop/tar.gz/refs/heads/gh-pages","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247339158,"owners_count":20923014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","spark","spark-mllib","spark-sql","spark-structured-streaming","spark-workshops","workshop"],"created_at":"2024-08-01T14:00:38.198Z","updated_at":"2025-04-05T13:09:50.627Z","avatar_url":"https://github.com/jaceklaskowski.png","language":"HTML","readme":"# Apache Spark™ and Scala Workshops\n\nThis repository contains the materials (i.e. 
[agendas](slides/#agendas), [slides](slides/#unit-1-spark-sql-for-large-scale-structured-data-processing), [demo](demo), [exercises](exercises)) for [Apache Spark™](http://spark.apache.org/) and [Scala](https://www.scala-lang.org/) workshops led by [Jacek Laskowski](https://twitter.com/jaceklaskowski).\n\n- Have you ever thought about learning Apache Spark™ or Scala?\n- Would you like to gain expertise in the tools used for Big Data and Predictive Analytics but don't know where to start?\n- Do you know the basics of Apache Spark™ and wonder how to reach a higher level of expertise?\n- Are you considering an Apache Spark™ Developer Certification from companies like Databricks, Cloudera, Hortonworks or MapR?\n\nIf you answered **YES** to any of the questions above, I have good news for you! Join one of the following Apache Spark™ workshops and become an Apache Spark™ pro.\n\n1. [Advanced Apache Spark for Developers Workshop (5 days)](agendas/advanced-apache-spark-for-developers.md)\n2. [Spark Structured Streaming Workshop (Apache Spark 2.3)](spark-structured-streaming-workshop.md)\n3. [Spark and Scala (Application Development) Workshop](AGENDA.md)\n4. [Spark Administration and Monitoring Workshop](AGENDA-admin.md)\n5. [Spark and Scala Workshop for Developers (1 Day)](AGENDA-ONE-DAY.md)\n\nYou can find the slides for the above workshops and others on the [Apache Spark Workshops and Webinars](slides/README.md#toc) page.\n\nNo prior experience with Apache Spark or Scala is required.\n\n**CAUTION**: The workshops are very hands-on and practical, and certainly not for the faint-hearted. 
_Seriously!_ After 5 days your mind, eyes, and hands will all be trained to recognize where and how to use Spark and Scala in your Big Data projects.\n\n---\n\n## Apache Spark™ Workshop Setup\n\n`git clone` the project first, then execute `sbt test` in the cloned project's directory.\n\n```\n$ sbt test\n...\n[info] All tests passed.\n[success] Total time: 3 s, completed Mar 10, 2016 10:37:26 PM\n```\n\nOnce you see `[info] All tests passed.`, you are ready for the workshop.\n\n## Docker Image\n\nExecute the following command to start a container from the complete Docker image for the workshop.\n\nNOTE: It was tested on macOS only. I assume that the `-v` options in the command will not work on Windows as written and need to be changed to match your environment.\n\n```bash\ndocker run -ti -p 4040:4040 -p 8080:8080 -v \"$PWD:/home/spark/workspace\" -v \"$HOME/.ivy2\":/home/spark/.ivy2 -h spark --name=spark jaceklaskowski/docker-spark\n```\n\n## Contact The Author\n\n- Read [Mastering Apache Spark](https://bit.ly/mastering-apache-spark)\n- Read [Mastering Spark SQL](https://bit.ly/mastering-spark-sql)\n- Read [Mastering Spark Structured Streaming](https://bit.ly/spark-structured-streaming)\n- Follow [@jaceklaskowski](https://twitter.com/jaceklaskowski) on Twitter\n- Upvote [Jacek Laskowski's questions and answers on Stack Overflow](http://stackoverflow.com/users/1305344/jacek-laskowski)\n- Use [Jacek's code on GitHub](https://github.com/jaceklaskowski)\n- Read [blog posts on Medium](https://medium.com/@jaceklaskowski)\n- Upvote [Jacek's answers on Quora](https://www.quora.com/profile/Jacek-Laskowski)\n- Connect [on LinkedIn](https://www.linkedin.com/in/jaceklaskowski/)\n- Visit [Jacek Laskowski's 
blog](https://blog.jaceklaskowski.pl)\n","funding_links":[],"categories":["HTML"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaceklaskowski%2Fspark-workshop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaceklaskowski%2Fspark-workshop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaceklaskowski%2Fspark-workshop/lists"}