{"id":18802299,"url":"https://github.com/oracle-quickstart/oci-cloudera","last_synced_at":"2025-04-13T18:31:30.553Z","repository":{"id":106379971,"uuid":"146038740","full_name":"oracle-quickstart/oci-cloudera","owner":"oracle-quickstart","description":"Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)","archived":true,"fork":false,"pushed_at":"2021-10-20T23:10:50.000Z","size":1750,"stargazers_count":20,"open_issues_count":2,"forks_count":6,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-21T06:43:56.222Z","etag":null,"topics":["cdh","cdp","cloud","cloudera","dsw","edh","hadoop","oci","oracle","partner-led","spark","terraform"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oracle-quickstart.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-24T20:49:19.000Z","updated_at":"2024-04-18T10:39:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"2892cc0b-5d38-4f25-b31c-a76d623aa059","html_url":"https://github.com/oracle-quickstart/oci-cloudera","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-cloudera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-cloudera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-cloudera/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/oracle-quickstart%2Foci-cloudera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oracle-quickstart","download_url":"https://codeload.github.com/oracle-quickstart/oci-cloudera/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248760355,"owners_count":21157345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdh","cdp","cloud","cloudera","dsw","edh","hadoop","oci","oracle","partner-led","spark","terraform"],"created_at":"2024-11-07T22:27:23.394Z","updated_at":"2025-04-13T18:31:30.547Z","avatar_url":"https://github.com/oracle-quickstart.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp float=\"left\"\u003e\n  \u003cimg align=\"left\" width=\"120\" src=\"./images/cloudera_logo.png\"\u003e\n  \u003cbr/\u003e\n  \u003ch1\u003eCloudera on Oracle Cloud Infrastructure\u003c/h1\u003e\n  \u003cbr/\u003e\n\u003c/p\u003e\n\n![cloudera-stack](https://github.com/oracle-quickstart/oci-cloudera/workflows/cloudera-stack/badge.svg)\n\nThis is a Terraform module that deploys [Cloudera Data Platform (CDP) Data Center](https://www.cloudera.com/products/cloudera-data-platform.html) on [Oracle Cloud Infrastructure (OCI)](https://cloud.oracle.com/en_US/cloud-infrastructure).  
It is developed jointly by Oracle and Cloudera.\n\n## Deployment Information\nThe following table shows the recommended and minimum supported OCI shapes for each cluster role:\n\n|             | Worker Nodes   | Bastion Instance | Utility and Master Instances |\n|-------------|----------------|------------------|------------------------------|\n| Recommended | BM.DenseIO2.52 | VM.Standard2.4   | VM.Standard2.16              |\n| Minimum     | VM.Standard2.8 | VM.Standard2.1   | VM.Standard2.8               |\n\n## Resource Manager Deployment\nThis Quick Start uses [OCI Resource Manager](https://docs.cloud.oracle.com/iaas/Content/ResourceManager/Concepts/resourcemanager.htm) to simplify deployment.  \n\nClick this button to deploy to OCI.\n\n[![Deploy to Oracle Cloud](https://oci-resourcemanager-plugin.plugins.oci.oraclecloud.com/latest/deploy-to-oracle-cloud.svg)](https://console.us-ashburn-1.oraclecloud.com/resourcemanager/stacks/create?region=home\u0026zipUrl=https://github.com/oracle-quickstart/oci-cloudera/archive/v3.3.7.zip)\n\nThis template uses Terraform v0.12 and supports targeting existing VCN/Subnets for cluster deployment. To use this functionality, use the Schema menu system to select an existing VCN target, then select appropriate Subnets for each cluster host type.\n\nIf you deploy Cloudera Manager to a private subnet, you will need a VPN or SSH tunnel through an edge node to access cluster management.\n\nOnce the deployment is complete, you can access Cloudera Manager at `http://\u003csome IP address\u003e:7180/cmf/login`.  \n\nCluster provisioning is executed on the Utility host using CloudInit. That activity is logged in `/var/log/cloudera-OCI-initialize.log`. Use this log file to triage cluster setup issues.\n\n![](images/01%20-%20manager.png)\n\nThe default username is `cm_admin` and the default password is `changeme`.  
You should see a cluster up and running like this:\n\n![](images/02%20-%20home.png)\n\nIf you are presented with a licensing prompt upon login, do not interact with it. Wait and allow additional time for the automated cluster provisioning process to complete. Refresh the page after a few minutes to check on deployment.\n\n## Python Deployment using cm_client\nThe deployment script `deploy_on_oci.py` uses cm_client against Cloudera Manager API v31.  This script can be customized before execution.  See the header section in the script for details. The following parameters are passed at deployment time, either manually or via ORM schema:\n\n\t\tadmin_user_name\n\t\tadmin_password\n\nWhen using ORM schema, these values are put into Utility instance metadata. You are strongly encouraged to change the admin password in Cloudera Manager after deployment is complete.\n\nIn addition, advanced customization of the cluster deployment can be done by modifying the following functions:\n\n\t\tsetup_mgmt_rcg\n\t\tupdate_cluster_rcg_configuration\n\nThis requires some knowledge of Python and Cloudera configuration. Modify at your own risk.  These functions contain Cloudera-specific tuning parameters as well as host mapping for roles.\n\n## Kerberos Secure Cluster Option\n\nThis automation supports using a local KDC deployed on the Cloudera Manager instance for secure cluster operation.  Please read the scripts [README](scripts/README.md) for information on how to set these parameters prior to deployment if desired.  This can be toggled during ORM stack setup using the schema.\n\nAlso, for cluster management using Kerberos, you will need to manually create, at a minimum, the HDFS Superuser Principal as [detailed here](https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_using_cm_sec_config.html#create-hdfs-superuser) after deployment.\n\n## High Availability\n\nHigh Availability for HDFS services is also offered as part of the deployment process.  
This can be toggled during ORM stack setup using the Schema.\n\n## Metadata and MySQL\n\nYou can customize the default root password for MySQL by editing the source script [cms_mysql.sh](scripts/cms_mysql.sh#L188).  For the various Cloudera databases, random passwords are generated and used.  These are stored in a flat file on the Utility host for use at deployment time.  Remove this file after you note or change the pre-generated passwords. It is located here on the Utility node:  `/etc/mysql/mysql.pw`\n\n## Object Storage Integration\n\nObject Storage can also be leveraged by setting S3 compatibility parameters in the Python deployment script. Details can be found in the [header section](https://github.com/oracle-quickstart/oci-cloudera/blob/8af97b91fb50cd77262c97580454137c2955dd4e/scripts/deploy_on_oci.py#L79-L86).  You will need to set up the appropriate S3 compatibility prerequisites as [detailed here](https://docs.cloud.oracle.com/iaas/Content/Identity/Tasks/managingcredentials.htm#Working2) for this to work.\n\n## Architecture Diagram\nHere is a diagram showing what is typically deployed using this template. Note that resources are automatically distributed among Fault Domains in an Availability Domain to ensure fault tolerance. Additional workers are striped across the 3 Fault Domains in sequence, starting with Fault Domain 1.\n\n![Deployment Architecture Diagram](images/deployment_architecture.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle-quickstart%2Foci-cloudera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foracle-quickstart%2Foci-cloudera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle-quickstart%2Foci-cloudera/lists"}