{"id":18368684,"url":"https://github.com/radanalyticsio/openshift-spark","last_synced_at":"2025-04-06T17:31:52.616Z","repository":{"id":46050135,"uuid":"66651212","full_name":"radanalyticsio/openshift-spark","owner":"radanalyticsio","description":null,"archived":false,"fork":false,"pushed_at":"2021-11-17T18:26:07.000Z","size":2222,"stargazers_count":72,"open_issues_count":14,"forks_count":83,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-03-22T04:02:07.292Z","etag":null,"topics":["docker","openshift","spark"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/radanalyticsio.png","metadata":{"files":{"readme":"README.md","changelog":"change-yaml.sh","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-08-26T13:49:17.000Z","updated_at":"2023-11-20T14:33:39.000Z","dependencies_parsed_at":"2022-08-30T21:20:33.019Z","dependency_job_id":null,"html_url":"https://github.com/radanalyticsio/openshift-spark","commit_stats":null,"previous_names":[],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanalyticsio%2Fopenshift-spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanalyticsio%2Fopenshift-spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanalyticsio%2Fopenshift-spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanalyticsio%2Fopenshift-spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/radanalyticsio","download_url":"https://codeload.github.com/radanalyticsio/openshift-spark/tar.gz/refs/
heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247522404,"owners_count":20952542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","openshift","spark"],"created_at":"2024-11-05T23:26:58.325Z","updated_at":"2025-04-06T17:31:48.429Z","avatar_url":"https://github.com/radanalyticsio.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build status](https://travis-ci.org/radanalyticsio/openshift-spark.svg?branch=master)](https://travis-ci.org/radanalyticsio/openshift-spark)\n[![Docker build](https://img.shields.io/docker/automated/radanalyticsio/openshift-spark.svg)](https://hub.docker.com/r/radanalyticsio/openshift-spark)\n[![Layers info](https://images.microbadger.com/badges/image/radanalyticsio/openshift-spark.svg)](https://microbadger.com/images/radanalyticsio/openshift-spark)\n\n# Apache Spark images for OpenShift\n\nThis repository contains several files for building\n[Apache Spark](https://spark.apache.org) focused container images, targeted\nfor usage on [OpenShift Origin](https://openshift.org).\n\nBy default, it will build the following images into your local Docker\nregistry:\n\n* `openshift-spark`, Apache Spark, Python 3.6\n\nFor Spark versions, please see the `image.yaml` file.\n\n# Instructions\n\n## Build\n\n### Prerequisites\n\n* `cekit` version 3.7.0 from the [cekit project](https://github.com/cekit/cekit)\n\n### Procedure\n\nCreate all images and save them in the local Docker registry.\n\n    make\n\n## Push\n\nTag and push the images to the designated 
reference.\n\n    make push SPARK_IMAGE=[REGISTRY_HOST[:REGISTRY_PORT]/]NAME[:TAG]\n\n## Customization\n\nThere are several ways to customize the construction and build process. This\nproject uses the [GNU Make tool](https://www.gnu.org/software/make/) for\nthe build workflow; see the `Makefile` for more information. For container\nspecification and construction, the\n[Container Evolution Kit `cekit`](https://github.com/cekit/cekit) is\nthe primary tool; see the `image.yaml` file for\nmore information.\n\n# Partial images without an Apache Spark distribution installed\n\nThis repository also supports building 'incomplete' versions of\nthe images which contain tooling for OpenShift but lack an actual\nSpark distribution. An s2i workflow can be used with these partial\nimages to install a Spark distribution of a user's choosing.\nThis gives users an alternative to checking out the repository\nand modifying build files if they want to run a custom\nSpark distribution. By default, the partial images built will be:\n\n* `openshift-spark-inc`, Apache Spark, Python 3.6\n\n## Build\n\nTo build the partial images, use `make` with `Makefile.inc`:\n\n    make -f Makefile.inc\n\n## Push\n\nTag and push the images to the designated reference.\n\n    make -f Makefile.inc push SPARK_IMAGE=[REGISTRY_HOST[:REGISTRY_PORT]/]NAME[:TAG]\n\n## Image Completion\n\nTo produce a final image, a source-to-image build must be performed which takes\na Spark distribution as input. This can be done in OpenShift or locally using\nthe [s2i tool](https://github.com/openshift/source-to-image) if it's installed.\nThe final images created can be used just like the `openshift-spark` image\ndescribed above.\n\n### Build inputs\n\nThe OpenShift method can take either local files or a URL as build input.\nFor the s2i method, local files are required. 
Here is an example which\ndownloads an Apache Spark distribution to a local 'build-input' directory\n(including the sha512 file is optional).\n\n    $ mkdir build-input\n    $ wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz -O build-input/spark-3.0.0-bin-hadoop3.2.tgz\n    $ wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz.sha512 -O build-input/spark-3.0.0-bin-hadoop3.2.tgz.sha512\n\nOptionally, your `build-input` directory may contain a `modify-spark` directory. The structure of this directory should mirror the structure\nof the top-level directory in the Spark distribution tarball. During the installation, the contents of this directory will be copied to the Spark\ninstallation using `rsync`, allowing you to add or overwrite files. To add `my.jar` to Spark, for example, put it at `build-input/modify-spark/jars/my.jar`.\n\n### Running the image completion\n\nTo complete the image using the [s2i tool](https://github.com/openshift/source-to-image):\n\n    $ s2i build build-input radanalyticsio/openshift-spark-inc openshift-spark\n\nTo complete the image using OpenShift, for example:\n\n    $ oc new-build --name=openshift-spark --docker-image=radanalyticsio/openshift-spark-inc --binary\n    $ oc start-build openshift-spark --from-file=https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz\n\nNote that the value of `--from-file` could also be the `build-input` directory from the s2i example above.\n\nThis will write the completed image to an imagestream called `openshift-spark` in the current project.\n\n# A 'usage' command for all images\n\nNote that all of the images described here will respond to a 'usage' command for reference. 
For example:\n\n    $ docker run --rm openshift-spark:latest usage\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fradanalyticsio%2Fopenshift-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fradanalyticsio%2Fopenshift-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fradanalyticsio%2Fopenshift-spark/lists"}