{"id":14988126,"url":"https://github.com/apache/spark-docker","last_synced_at":"2025-04-04T20:10:13.456Z","repository":{"id":61174565,"uuid":"548730031","full_name":"apache/spark-docker","owner":"apache","description":"Official Dockerfile for Apache Spark","archived":false,"fork":false,"pushed_at":"2025-02-28T04:45:46.000Z","size":130,"stargazers_count":128,"open_issues_count":1,"forks_count":38,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-03-28T19:06:40.666Z","etag":null,"topics":["big-data","java","jdbc","python","r","scala","spark","sql"],"latest_commit_sha":null,"homepage":"https://spark.apache.org/","language":"Dockerfile","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-10T05:02:05.000Z","updated_at":"2025-03-16T09:23:53.000Z","dependencies_parsed_at":"2023-12-02T03:21:16.561Z","dependency_job_id":"30d47b82-8204-4b63-9c44-4c176489c782","html_url":"https://github.com/apache/spark-docker","commit_stats":{"total_commits":64,"total_committers":8,"mean_commits":8.0,"dds":0.296875,"last_synced_commit":"cf333e1f7403fa68c7b359cff77b7949ec0990b3"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fspark-docker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fspark-docker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fspark-docker/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fspark-docker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/spark-docker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246983672,"owners_count":20864294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","java","jdbc","python","r","scala","spark","sql"],"created_at":"2024-09-24T14:16:07.513Z","updated_at":"2025-04-04T20:10:13.437Z","avatar_url":"https://github.com/apache.png","language":"Dockerfile","readme":"# Apache Spark Official Dockerfiles\n\n## What is Apache Spark?\n\nSpark is a unified analytics engine for large-scale data processing. It provides\nhigh-level APIs in Scala, Java, Python, and R, and an optimized engine that\nsupports general computation graphs for data analysis. 
It also supports a\nrich set of higher-level tools including Spark SQL for SQL and DataFrames,\npandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing,\nand Structured Streaming for stream processing.\n\nhttps://spark.apache.org/\n\n## Create a new version\n\n### Step 1. Add Dockerfiles for a new version.\n\nYou can use the [3.4.0 PR](https://github.com/apache/spark-docker/pull/33) as a reference.\n\n- 1.1 Add the GPG key to [tools/template.py](https://github.com/apache/spark-docker/blob/master/tools/template.py#L24)\n\n    This GPG key will be used by the Dockerfiles (such as [3.4.0](https://github.com/apache/spark-docker/blob/04e85239a8fcc9b3dcfe146bc144ee2b981f8f42/3.4.0/scala2.12-java11-ubuntu/Dockerfile#L41)) to verify the signature of the Apache Spark tarball.\n\n- 1.2 Add an image build workflow (such as the [3.4.0 yaml](https://github.com/apache/spark-docker/blob/04e85239a8fcc9b3dcfe146bc144ee2b981f8f42/.github/workflows/build_3.4.0.yaml))\n\n    This file will be used by GitHub Actions to build the Docker image when you submit the PR, to make sure the Dockerfiles are correct and pass all tests (build/standalone/kubernetes).\n\n- 1.3 Run `./add-dockerfiles.sh [version]` to add the Dockerfiles.\n\n    You will get a new directory containing the Dockerfiles for the specified version.\n\n- 1.4 Add version and tag info to versions.json, publish.yml and test.yml.\n\n    This version file will be used by the image build workflows (see this [3.4.0 commit](https://github.com/apache/spark-docker/commit/47c357a52625f482b8b0cb831ccb8c9df523affd) as a reference) and by the Docker Official Image.\n\n### Step 2.
Publish apache/spark Images.\n\nClick [Publish (Java 21 only)](https://github.com/apache/spark-docker/actions/workflows/publish-java21.yaml) or [Publish (Java 17 only)](https://github.com/apache/spark-docker/actions/workflows/publish-java17.yaml) (for 4.x releases), or [Publish](https://github.com/apache/spark-docker/actions/workflows/publish.yml) (for 3.x releases) to publish images.\n\nAfter this, the [apache/spark](https://hub.docker.com/r/apache/spark) Docker images will be published.\n\n\n### Step 3. Publish spark Docker Official Images.\n\nSubmit a PR to [docker-library/official-images](https://github.com/docker-library/official-images/); see [this PR](https://github.com/docker-library/official-images/pull/15363) as a reference.\n\nRun `tools/manifest.py manifest` to generate the content.\n\nAfter this, the [spark](https://hub.docker.com/_/spark) Docker images will be published.\n\n## About images\n\n|               | Apache Spark Image                                     | Spark Docker Official Image                            |\n|---------------|--------------------------------------------------------|--------------------------------------------------------|\n| Name          | apache/spark                                           | spark                                                  |\n| Maintenance   | Reviewed and published by the Apache Spark community   | Reviewed, published and maintained by the Docker community |\n| Update policy | Built and pushed once when a specific version is released | Actively rebuilt for updates and security fixes     |\n| Link          | https://hub.docker.com/r/apache/spark                  | https://hub.docker.com/_/spark                         |\n| Source        | [apache/spark-docker](https://github.com/apache/spark-docker)                                           | [apache/spark-docker](https://github.com/apache/spark-docker) and
[docker-library/official-images](https://github.com/docker-library/official-images/blob/master/library/spark)     |\n\nWe recommend using the [Spark Docker Official Image](https://hub.docker.com/_/spark); the [Apache Spark Image](https://hub.docker.com/r/apache/spark) is provided in case of delays in the Docker community's review process.\n\n## About this repository\n\nThis repository contains the Dockerfiles used to build the Apache Spark Docker images.\n\nSee more in [SPARK-40513: SPIP: Support Docker Official Image for Spark](https://issues.apache.org/jira/browse/SPARK-40513).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fspark-docker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fspark-docker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fspark-docker/lists"}