{"id":18802316,"url":"https://github.com/oracle-quickstart/oci-arch-spark","last_synced_at":"2025-04-13T18:31:33.428Z","repository":{"id":106379823,"uuid":"331309927","full_name":"oracle-quickstart/oci-arch-spark","owner":"oracle-quickstart","description":null,"archived":true,"fork":false,"pushed_at":"2021-06-04T10:54:29.000Z","size":1221,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-19T21:12:50.447Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"upl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oracle-quickstart.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-20T13:07:33.000Z","updated_at":"2024-05-13T14:55:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"fa4f15d1-cd94-4ffd-bd8c-a96fa8a3fe83","html_url":"https://github.com/oracle-quickstart/oci-arch-spark","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":"oracle-quickstart/oci-quickstart-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-arch-spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-arch-spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-arch-spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-arch-spark/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/oracle-quickstart","download_url":"https://codeload.github.com/oracle-quickstart/oci-arch-spark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248760367,"owners_count":21157345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T22:27:27.046Z","updated_at":"2025-04-13T18:31:33.419Z","avatar_url":"https://github.com/oracle-quickstart.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# oci-arch-spark\n\nApache Spark is an open source, cluster-computing framework for data analytics. It was built outside of Hadoop’s two-stage MapReduce paradigm, but it runs on Hadoop Distributed File System (HDFS). It uses an in-memory data-processing engine to increase speed. Oracle Cloud Infrastructure combines open source technologies, such as Apache Spark and Apache Hadoop, to deliver a platform for running and managing Big Data applications.\n\nThis architecture deploys an Apache Spark cluster on Oracle Cloud Infrastructure using the manager/worker model. 
It deploys a single manager node and three worker nodes on Compute instances.\n\nFor details of the architecture, see [_Deploy an Apache Spark cluster in manager/worker mode_](https://docs.oracle.com/en/solutions/spark-master-worker-mode/index.html).\n\n## Prerequisites\n\n- Permission to `manage` the following types of resources in your Oracle Cloud Infrastructure tenancy: `vcns`, `internet-gateways`, `route-tables`, `security-lists`, `subnets`, and `instances`.\n\n- Quota to create the following resources: 1 VCN, 2 subnets, 1 Internet Gateway, 1 NAT Gateway, 2 route rules, and 4 compute instances (Manager host + 3 Worker nodes).\n\nIf you don't have the required permissions and quota, contact your tenancy administrator. See [Policy Reference](https://docs.cloud.oracle.com/en-us/iaas/Content/Identity/Reference/policyreference.htm), [Service Limits](https://docs.cloud.oracle.com/en-us/iaas/Content/General/Concepts/servicelimits.htm), and [Compartment Quotas](https://docs.cloud.oracle.com/iaas/Content/General/Concepts/resourcequotas.htm).\n\n## Deploy Using Oracle Resource Manager\n\n1. Click [![Deploy to Oracle Cloud](https://oci-resourcemanager-plugin.plugins.oci.oraclecloud.com/latest/deploy-to-oracle-cloud.svg)](https://cloud.oracle.com/resourcemanager/stacks/create?region=home\u0026zipUrl=https://github.com/oracle-quickstart/oci-arch-spark/releases/latest/download/oci-arch-spark-stack-latest.zip)\n\n    If you aren't already signed in, when prompted, enter the tenancy and user credentials.\n\n2. Review and accept the terms and conditions.\n\n3. Select the region where you want to deploy the stack.\n\n4. Follow the on-screen prompts and instructions to create the stack.\n\n5. After creating the stack, click **Terraform Actions**, and select **Plan**.\n\n6. Wait for the job to be completed, and review the plan.\n\n    To make any changes, return to the Stack Details page, click **Edit Stack**, and make the required changes. Then, run the **Plan** action again.\n\n7. 
If no further changes are necessary, return to the Stack Details page, click **Terraform Actions**, and select **Apply**.\n\n## Deploy Using the Terraform CLI\n\n### Clone the Module\n\nCreate a local copy of this repo with the following commands:\n\n```\n    git clone https://github.com/oracle-quickstart/oci-arch-spark.git\n    cd oci-arch-spark\n    ls\n```\n\n### Prerequisites\nFirst, complete the pre-deployment setup described in [oci-prerequisites](https://github.com/cloud-partners/oci-prerequisites).\n\nCreate a `terraform.tfvars` file, and specify the following variables:\n\n```\n# Authentication\ntenancy_ocid         = \"\u003ctenancy_ocid\u003e\"\nuser_ocid            = \"\u003cuser_ocid\u003e\"\nfingerprint          = \"\u003cfinger_print\u003e\"\nprivate_key_path     = \"\u003cpem_private_key_path\u003e\"\n\n# Region\nregion = \"\u003coci_region\u003e\"\n\n# Availability Domain\navailablity_domain_name = \"\u003cavailablity_domain_name\u003e\"\n\n# Compartment\ncompartment_ocid = \"\u003ccompartment_ocid\u003e\"\n\n```\n\n### Create the Resources\nRun the following commands:\n\n    terraform init\n    terraform plan\n    terraform apply\n\n\n### Testing your Deployment\n\nThis deployment compiles from source, so it takes some time (15-20 minutes) after deployment before the Spark UI is available. You can monitor progress on the Spark manager node by watching the log file `/var/log/spark-OCI-initialize.log`:\n\n\tsudo tail -n 500 -f /var/log/spark-OCI-initialize.log\n\nWhen initialization is done, you should be able to access the manager with your web browser. Pick up the value of `spark_manager_url` from the output:\n\n```\nspark_manager_url = http://129.213.112.177:8080\n```\n\nThen paste it into your web browser. 
Here is an example of a successful outcome:\n\n![](./images/outcome.png)\n\n### Destroy the Deployment\nWhen you no longer need the deployment, you can run this command to destroy the resources:\n\n    terraform destroy\n\n## Architecture Diagram\n\n![](./images/spark-oci.png)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle-quickstart%2Foci-arch-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foracle-quickstart%2Foci-arch-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle-quickstart%2Foci-arch-spark/lists"}