{"id":13467808,"url":"https://github.com/P7h/docker-spark","last_synced_at":"2025-03-26T03:31:09.673Z","repository":{"id":151889782,"uuid":"67425855","full_name":"P7h/docker-spark","owner":"P7h","description":":ship: Docker image for Apache Spark","archived":false,"fork":false,"pushed_at":"2019-11-08T06:34:20.000Z","size":15,"stargazers_count":76,"open_issues_count":9,"forks_count":46,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-29T21:59:04.230Z","etag":null,"topics":["docker","hadoop","java","scala","spark"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/u/p7hb/","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/P7h.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-09-05T14:03:56.000Z","updated_at":"2023-08-03T06:25:07.000Z","dependencies_parsed_at":"2023-05-14T08:30:22.303Z","dependency_job_id":null,"html_url":"https://github.com/P7h/docker-spark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/P7h%2Fdocker-spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/P7h%2Fdocker-spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/P7h%2Fdocker-spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/P7h%2Fdocker-spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/P7h","download_url":"https://codeload.github.com/P7h/docker-spark/tar.gz/refs/heads/master","host":{"name"
:"GitHub","url":"https://github.com","kind":"github","repositories_count":245584696,"owners_count":20639607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","hadoop","java","scala","spark"],"created_at":"2024-07-31T15:01:00.818Z","updated_at":"2025-03-26T03:31:09.336Z","avatar_url":"https://github.com/P7h.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# docker-spark\n[![](https://images.microbadger.com/badges/version/p7hb/docker-spark.svg)](http://microbadger.com/images/p7hb/docker-spark) ![](https://img.shields.io/docker/automated/p7hb/docker-spark.svg) [![Docker Pulls](https://img.shields.io/docker/pulls/p7hb/docker-spark.svg)](https://hub.docker.com/r/p7hb/docker-spark/) [![Size](https://images.microbadger.com/badges/image/p7hb/docker-spark.svg)](https://microbadger.com/images/p7hb/docker-spark)\n\nDockerfiles for ***Apache Spark***.\u003cbr\u003e\nThe Apache Spark Docker image is available directly from [Docker Hub](https://hub.docker.com/u/p7hb/ \"» Docker Hub\").\n\nThis image contains the following software:\n\n* OpenJDK 64-Bit v1.8.0_131\n* Scala v2.12.2\n* SBT v0.13.15\n* Apache Spark v2.2.0\n\n\n## Various versions of Spark Images\nDepending on the version of the Spark image you want, run the corresponding command.\u003cbr\u003e\nThe latest image always contains the most recent version of Apache Spark available. As of 11 July 2017, that is v2.2.0.\n\n### Apache Spark latest [i.e. 
v2.2.0]\n[Dockerfile for Apache Spark v2.2.0](https://github.com/P7h/docker-spark)\n\n    docker pull p7hb/docker-spark\n\n### Apache Spark v2.2.0\n[Dockerfile for Apache Spark v2.2.0](https://github.com/P7h/docker-spark/tree/2.2.0)\n\n    docker pull p7hb/docker-spark:2.2.0\n\n### Apache Spark v2.1.1\n[Dockerfile for Apache Spark v2.1.1](https://github.com/P7h/docker-spark/tree/2.1.1)\n\n    docker pull p7hb/docker-spark:2.1.1\n\n### Apache Spark v2.1.0\n[Dockerfile for Apache Spark v2.1.0](https://github.com/P7h/docker-spark/tree/2.1.0)\n\n    docker pull p7hb/docker-spark:2.1.0\n\n### Apache Spark v2.0.2\n[Dockerfile for Apache Spark v2.0.2](https://github.com/P7h/docker-spark/tree/2.0.2)\n\n    docker pull p7hb/docker-spark:2.0.2\n\n### Apache Spark v2.0.1\n[Dockerfile for Apache Spark v2.0.1](https://github.com/P7h/docker-spark/tree/2.0.1)\n\n    docker pull p7hb/docker-spark:2.0.1\n\n### Apache Spark v2.0.0\n[Dockerfile for Apache Spark v2.0.0](https://github.com/P7h/docker-spark/tree/2.0.0)\n\n    docker pull p7hb/docker-spark:2.0.0\n\n### Apache Spark v1.6.3\n[Dockerfile for Apache Spark v1.6.3](https://github.com/P7h/docker-spark/tree/1.6.3)\n\n    docker pull p7hb/docker-spark:1.6.3\n\n### Apache Spark v1.6.2\n[Dockerfile for Apache Spark v1.6.2](https://github.com/P7h/docker-spark/tree/1.6.2)\n\n    docker pull p7hb/docker-spark:1.6.2\n\n\n## Get the latest image\nThere are two ways to get this image:\n\n1. Build the image from the [`Dockerfile`](Dockerfile), or\n2. Pull the image directly from Docker Hub.\n\n### Build the latest image\nCopy the [`Dockerfile`](Dockerfile) to a folder on your local machine and run the following command from that folder.\n\n    docker build -t p7hb/docker-spark .\n\n### Pull the latest image\n\n    docker pull p7hb/docker-spark\n\n\n## Run Spark image\n### Run the latest image, i.e. Apache Spark `2.2.0`\nAs of 11 July 2017, the latest Spark version is `2.2.0`.  
So the `latest` and `2.2.0` tags refer to the same image.\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark\n\n### Run images of previous versions\nImages for other Spark versions in this repository can be run by appending the Spark version to the image name as a tag. The tag can be one of `2.2.0`, `2.1.1`, `2.1.0`, `2.0.2`, `2.0.1`, `2.0.0`, `1.6.3` and `1.6.2`.\n\n#### Apache Spark latest [i.e. v2.2.0]\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.2.0\n\n#### Apache Spark v2.1.1\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.1.1\n\n#### Apache Spark v2.1.0\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.1.0\n\n#### Apache Spark v2.0.2\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.0.2\n\n#### Apache Spark v2.0.1\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.0.1\n\n#### Apache Spark v2.0.0\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.0.0\n\n#### Apache Spark v1.6.3\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:1.6.3\n\n#### Apache Spark v1.6.2\n\n    docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:1.6.2\n\nEach of the commands above launches and runs the image with:\n\n* `root` as the logged-in user.\n* `spark` as the container name.\n* `spark` as the host name of the container.\n    * This is very important, because Spark slaves are started with this host name as the master (`spark://spark:7077`).\n* Ports 4040, 8080 and 8081 published for the Spark Web UI consoles.\n\n## Check software and versions\n\n### Host name\n\n    root@spark:~# hostname\n    spark\n\n### Java\n\n    root@spark:~# java -version\n    openjdk version 
\"1.8.0_131\"\n    OpenJDK Runtime Environment (build 1.8.0_111-8u131-b11-2~bpo8+1-b11)\n    OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)\n\n### Scala\n\n    root@spark:~# scala -version\n    Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.\n\n### SBT\n\nRunning `sbt about` downloads and sets up SBT in the container.\n\n### Spark\n\n```\nroot@spark:~# spark-shell\nSpark context Web UI available at http://172.17.0.2:4040\nSpark context available as 'sc' (master = local[*], app id = local-1483032227786).\nSpark session available as 'spark'.\nWelcome to\n      ____              __\n     / __/__  ___ _____/ /__\n    _\\ \\/ _ \\/ _ `/ __/  '_/\n   /___/ .__/\\_,_/_/ /_/\\_\\   version 2.1.1\n      /_/\n\nUsing Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)\nType in expressions to have them evaluated.\nType :help for more information.\n\nscala\u003e\n```\n\n## Spark commands\nAll the required binaries have been added to the `PATH`.\n\n### Start Spark Master\n\n    start-master.sh\n\n### Start Spark Slave\n\n    start-slave.sh spark://spark:7077\n\n### Run a Spark job to calculate the value of `Pi`\n\n    spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark:7077 $SPARK_HOME/examples/jars/spark-examples*.jar 100\n    .......\n    .......\n    Pi is roughly 3.140495114049511\n\n\nOr, even simpler:\n\n    $SPARK_HOME/bin/run-example SparkPi 100\n    .......\n    .......\n    Pi is roughly 3.1413855141385514\n\nNote that the first command expects the Spark master and slave to be running; the job can then also be checked in the Spark Web UI. 
This is not possible with the second command.\n\n### Start Spark Shell\n\n    spark-shell --master spark://spark:7077\n\n### View Spark Master WebUI console\nThe URLs in this and the following sections assume the Docker machine IP address is `192.168.99.100`; replace it with the address of your own Docker machine.\n\n[`http://192.168.99.100:8080/`](http://192.168.99.100:8080/)\n\n### View Spark Worker WebUI console\n\n[`http://192.168.99.100:8081/`](http://192.168.99.100:8081/)\n\n### View Spark WebUI console\nAvailable only while a Spark application is running.\n\n[`http://192.168.99.100:4040/`](http://192.168.99.100:4040/)\n\n## Misc Docker commands\n\n### Find the IP address of the Docker machine\nUse this IP address to reach all the ports exposed by the Docker container.\n\n    docker-machine ip default\n\n### Find all the running containers\n\n    docker ps\n\n### Find all the running and stopped containers\n\n    docker ps -a\n\n### Show resource usage of containers\n`docker stats --all` shows a live stream of resource usage statistics for all containers, including stopped ones.\n\n    docker stats --all\n\n### Find the IP address of a specific container\n\n    docker inspect \u003c\u003cContainer_Name\u003e\u003e | grep IPAddress\n\n### Open new terminal to a Docker container\nOpen a new shell in a running container with either of the following commands.\n\n    docker exec -it \u003c\u003cContainer_ID\u003e\u003e /bin/bash # by container ID\n\nOr\n\n    docker exec -it \u003c\u003cContainer_Name\u003e\u003e /bin/bash # by container name\n\n\n## Problems? Questions? Contributions? 
[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](http://p7h.org/contact/)\nIf you find any issues or would like to discuss further, please ping me on my Twitter handle [@P7h](http://twitter.com/P7h \"» @P7h\") or drop me an [email](http://p7h.org/contact/ \"» Contact me\").\n\n\n## License [![License](http://img.shields.io/:license-apache-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)\nCopyright \u0026copy; 2016 Prashanth Babu.\u003cbr\u003e\nLicensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FP7h%2Fdocker-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FP7h%2Fdocker-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FP7h%2Fdocker-spark/lists"}