{"id":20763676,"url":"https://github.com/infosys/high-availability-hadoop","last_synced_at":"2026-04-25T02:37:36.439Z","repository":{"id":70448029,"uuid":"46915340","full_name":"Infosys/High-Availability-Hadoop","owner":"Infosys","description":"Automate high availability setup for Hadoop using Python scripts","archived":false,"fork":false,"pushed_at":"2015-11-26T10:43:27.000Z","size":15,"stargazers_count":3,"open_issues_count":2,"forks_count":1,"subscribers_count":8,"default_branch":"master","last_synced_at":"2026-04-20T22:55:15.626Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Infosys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-11-26T09:00:04.000Z","updated_at":"2025-10-15T20:16:39.000Z","dependencies_parsed_at":"2023-02-28T04:00:48.410Z","dependency_job_id":null,"html_url":"https://github.com/Infosys/High-Availability-Hadoop","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Infosys/High-Availability-Hadoop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infosys%2FHigh-Availability-Hadoop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infosys%2FHigh-Availability-Hadoop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infosys%2FHigh-Availability-Hadoop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infosys%2FHigh-Availability-Hadoop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Infosys","download_url":
"https://codeload.github.com/Infosys/High-Availability-Hadoop/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Infosys%2FHigh-Availability-Hadoop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32248264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"online","status_checked_at":"2026-04-25T02:00:06.260Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-17T10:45:27.679Z","updated_at":"2026-04-25T02:37:36.416Z","avatar_url":"https://github.com/Infosys.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Hadoop HA Automation\n====================\n\nHadoop HA Automation is a utility to automate the HA setup for\n   - Namenode\n   - Resource Manager\n   - Hive Metastore\n   - Hiveserver2\n\nUsage\n=====\n\nPlace the source files in a directory on the Hadoop master node. Ensure that the same directory path is available \non each of the slave nodes with read-write access. For example, if you place the source files in the /home/hadoop/HA folder \non the master node, also ensure that every slave node has /home/hadoop/HA with read-write access for the user id.\n\nBefore execution, edit the constant values in servers.py to customize the utility for your Hadoop installation.\n\n'ha_setup.sh' is the entry point for the utility.  
The utility works by starting agent processes on the \ncluster nodes. 'ha_setup.sh' first starts the slave (agent) processes and then invokes the HA setup program ('ha_setup.py'). \nFinally, as cleanup, the agent processes are killed.\n\nThe agent processes communicate with the driver program ('ha_setup.py') via RPC on port 8888.  To\nchange the port, edit SLAVE_DAEMON_PORT in cluster_constants.py.\n\nThe file cluster_constants.py contains the values of various parameters. Default values are provided wherever possible\nfor each configuration parameter. If your Hadoop installation uses any non-standard configuration, \nyou may want to change the corresponding parameter in cluster_constants.py.\n\n\nDependencies\n============\n\n1) The utility depends on valid values for the environment variables below. Please verify them before execution.\n   - HOME\n   - HADOOP_HOME\n   - HIVE_HOME\n   - ZOOKEEPER_HOME \n\n2) Set up passwordless SSH from the master node to each of the slave nodes. It is a simple procedure, and the link below explains \nhow to do it:\n\nhttp://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/\n\n3) Create a directory on each slave node identical to the master node directory containing the source code. For example, if \nyou use the 'hadoop' user id and the source files are in /home/hadoop/HA, create the folder /home/hadoop/HA on each slave node \nwith read-write access for the 'hadoop' user id.\n\n4) This utility requires Python 2.7.5 or above on all the Hadoop cluster nodes.\n\nThis code is shared under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfosys%2Fhigh-availability-hadoop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfosys%2Fhigh-availability-hadoop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfosys%2Fhigh-availability-hadoop/lists"}