{"id":18746064,"url":"https://github.com/fscm/packer-aws-spark","last_synced_at":"2025-06-27T15:06:58.857Z","repository":{"id":202050306,"uuid":"79719519","full_name":"fscm/packer-aws-spark","owner":"fscm","description":"Packer Template to build a AWS Apache Spark AMI","archived":false,"fork":false,"pushed_at":"2022-01-03T16:05:53.000Z","size":36,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-27T15:04:27.176Z","etag":null,"topics":["ami","aws","packer","spark"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fscm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-22T13:31:17.000Z","updated_at":"2022-01-03T16:05:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"78f69d97-b3ea-48a2-a941-fe499a1ea9a6","html_url":"https://github.com/fscm/packer-aws-spark","commit_stats":null,"previous_names":["fscm/packer-aws-spark"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/fscm/packer-aws-spark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fscm%2Fpacker-aws-spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fscm%2Fpacker-aws-spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fscm%2Fpacker-aws-spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fscm%2Fpacker-aws-spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fscm","download_url":"https://codeload.github.com/fscm/packer-aws-spark/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fscm%2Fpacker-aws-spark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262279099,"owners_count":23286547,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ami","aws","packer","spark"],"created_at":"2024-11-07T16:20:43.265Z","updated_at":"2025-06-27T15:06:58.831Z","avatar_url":"https://github.com/fscm.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Apache Spark AMI\n\nAMI that should be used to create virtual machines with Apache Spark installed.\n\n## Synopsis\n\nThis script will create an AMI with Apache Spark installed and with all of the\nrequired initialization scripts.\n\nThe AMI resulting from this script should be the one used to instantiate a\nSpark server (master, worker or history).\n\n## Getting Started\n\nThere are a couple of things needed for the script to work.\n\n### Prerequisites\n\nPacker and the AWS Command Line Interface tools need to be installed on your\nlocal computer.\nTo build a base image you need to know the ID of the latest Debian AMI\nfor the region where you wish to build the AMI.\n\n#### Packer\n\nPacker installation instructions can be found\n[here](https://www.packer.io/docs/installation.html).\n\n#### AWS Command Line Interface\n\nAWS Command Line Interface installation instructions can be found\n[here](http://docs.aws.amazon.com/cli/latest/userguide/installing.html).\n\n#### Debian AMIs\n\nThis AMI will be based on an official Debian AMI. The latest version of that\nAMI will be used.\n\nA list of all the Debian AMI IDs can be found on the official Debian page:\n[Debian official Amazon EC2 Images](https://wiki.debian.org/Cloud/AmazonEC2Image/)\n\n### Usage\n\nTo create the AMI using this Packer template, you need to provide a\nfew options.\n\n```\nUsage:\n  packer build \\\n    -var 'aws_access_key=\u003cAWS_ACCESS_KEY\u003e' \\\n    -var 'aws_secret_key=\u003cAWS_SECRET_KEY\u003e' \\\n    -var 'aws_region=\u003cAWS_REGION\u003e' \\\n    -var 'spark_version=\u003cSPARK_VERSION\u003e' \\\n    -var 'spark_hadoop_version=\u003cHADOOP_VERSION\u003e' \\\n    [-var 'option=value'] \\\n    spark.json\n```\n\n#### Script Options\n\n- `aws_access_key` - *[required]* The AWS access key.\n- `aws_ami_name` - The AMI name (default value: \"spark\").\n- `aws_ami_name_prefix` - Prefix for the AMI name (default value: \"\").\n- `aws_instance_type` - The instance type to use for the build (default value: \"t2.micro\").\n- `aws_region` - *[required]* The region where the build will be performed.\n- `aws_secret_key` - *[required]* The AWS secret key.\n- `java_build_number` - Java build number (default value: \"11\").\n- `java_major_version` - Java major version (default value: \"8\").\n- `java_token` - Java link token (default value: \"d54c1d3a095b4ff2b6607d096fa80163\").\n- `java_update_version` - Java update version (default value: \"131\").\n- `scala_short_version` - Scala short version (default value: \"2.11\"). Setting this option also requires setting the `scala_version` option.\n- `scala_version` - Scala version (default value: \"2.11.8\"). Setting this option may also require setting the `scala_short_version` option.\n- `spark_hadoop_version` - *[required]* Hadoop version of the Spark package.\n- `spark_version` - *[required]* Spark version.\n- `system_locale` - Locale for the system (default value: \"en_US\").\n\n### Instantiate a Cluster\n\nTo end up with a functional Spark cluster, some configuration has to\nbe performed after instantiating the servers.\n\nTo help with that configuration, a small script is included in the AMI.\nThe script is called **spark_config**.\n\n#### Configuration Script\n\nThe script can and should be used to set some of the Spark options as well as\nto enable the Spark service to start at boot.\n\n```\nUsage: spark_config [options] \u003cinstance_type\u003e\n```\n\n##### Instance Type\n\nThe script can only configure one instance at a time. Setting an instance type\nis **required** by the script.\n\n- `master` - Applies the configuration options for a Spark Master instance.\n- `worker` - Applies the configuration options for a Spark Worker instance.\n- `history` - Applies the configuration options for a Spark History instance.\n\n##### Options\n\n* `-c \u003cCORES\u003e` - *[worker]* Sets the number of cores that the Spark Executor will use (default value is the number of CPU cores/threads).\n* `-D` - *[master,worker,history]* Prevents the respective Spark service from starting at boot time.\n* `-E` - *[master,worker,history]* Enables the respective Spark service to start at boot time.\n* `-h \u003cAGE\u003e` - *[master,worker,history]* Sets how old the job history files have to be before being deleted from the server (default value is '15d').\n* `-i \u003cNUMBER\u003e` - *[worker]* Sets the number of Spark Executor instances that will be started (default value is '1').\n* `-k \u003cSIZE\u003e` - *[master,worker,history]* Sets the size of the Kryo Serializer buffer (default value is '16m'). Values should follow the same notation used for Java heap sizes.\n* `-m \u003cMEMORY\u003e` - *[worker]* Sets the Spark Executor maximum heap size (default value is 80% of the server memory). Values should follow the same notation used for Java heap sizes.\n* `-p \u003cADDRESS\u003e` - *[master,worker,history]* Sets the public DNS name of the Spark instance (default value is the server FQDN). This is the value that the instance will report as the server address in all URLs (including the ones in the Spark UI).\n* `-r \u003cNUMBER\u003e` - *[worker]* Sets the maximum number of log files kept by the Executor log rotator (default value is '15').\n* `-s \u003cADDRESS\u003e` - *[worker]* Sets the address of the Spark Master that the Spark Worker will connect to (default value is 'localhost').\n* `-S` - *[master,worker,history]* Starts the respective Spark service after applying the given configuration options (if any).\n* `-W \u003cSECONDS\u003e` - *[master,worker,history]* Waits the specified number of seconds before starting the respective Spark service (default value is '0').\n\n#### Configuring the Spark Master Instance\n\nTo prepare an instance to act as a Spark Master, the following steps need to\nbe performed.\n\nRun the configuration tool (*spark_config*) to configure the instance as a\nSpark Master server.\n\n```\nspark_config -E -S master\n```\n\nAfter these steps, a Spark Master service should be running and configured to\nstart on server boot.\n\nMore options can be used to configure the instance; see the\n[Configuration Script](#configuration-script) section for more details.\n\n#### Configuring a Spark Worker Instance\n\nTo prepare an instance to act as a Spark Worker, the following steps need to\nbe performed.\n\nRun the configuration tool (*spark_config*) to configure the instance as a\nSpark Worker server.\n\n```\nspark_config -E -S -s spark-master.my-domain.tld worker\n```\n\nAfter these steps, a Spark Worker instance should be running, connected to the\nspecified Spark Master address and configured to start on server boot.\n\nMore options can be used to configure the instance; see the\n[Configuration Script](#configuration-script) section for more details.\n\n#### Configuring the Spark History Instance\n\nTo prepare an instance to act as a Spark History server, the following steps need to\nbe performed.\n\nRun the configuration tool (*spark_config*) to configure the instance as a\nSpark History server.\n\n```\nspark_config -E -S history\n```\n\nTo use the Spark History service properly, every Spark instance needs\nto write its job logs to a shared folder. The shared folder should be mounted\nat the following location on every instance (including the History\ninstance), and *write* permission needs to be given to the *spark* user\n(uid=2000).\n\n```\n/var/log/spark\n```\n\nAfter these steps, the Spark History service should be running and configured to\nstart on server boot.\n\nMore options can be used to configure the instance; see the\n[Configuration Script](#configuration-script) section for more details.\n\n## Services\n\nThis AMI will have the SSH service running as well as the Spark (Master, Worker\nand/or History) services. The following ports need to be allowed in the\nSecurity Groups.\n\n| Service           | Port   | Protocol |\n|:------------------|:------:|:--------:|\n| SSH               | 22     |    TCP   |\n| Spark Application | 4040   |    TCP   |\n| Spark REST Server | 6066   |    TCP   |\n| Spark Master      | 7077   |    TCP   |\n| Spark Master UI   | 8080   |    TCP   |\n| Spark Worker UI   | 8081   |    TCP   |\n| Spark History     | 18080  |    TCP   |\n\n## Contributing\n\n1. Fork it!\n2. Create your feature branch: `git checkout -b my-new-feature`\n3. Commit your changes: `git commit -am 'Add some feature'`\n4. Push to the branch: `git push origin my-new-feature`\n5. Submit a pull request\n\nPlease read the [CONTRIBUTING.md](CONTRIBUTING.md) file for more details on how\nto contribute to this project.\n\n## Versioning\n\nThis project uses [SemVer](http://semver.org/) for versioning. For the versions\navailable, see the [tags on this repository](https://github.com/fscm/packer-aws-spark/tags).\n\n## Authors\n\n* **Frederico Martins** - [fscm](https://github.com/fscm)\n\nSee also the list of [contributors](https://github.com/fscm/packer-aws-spark/contributors)\nwho participated in this project.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE)\nfile for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffscm%2Fpacker-aws-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffscm%2Fpacker-aws-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffscm%2Fpacker-aws-spark/lists"}