Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/infosys/high-availability-hadoop
Automate high availability setup for Hadoop using Python scripts
- Host: GitHub
- URL: https://github.com/infosys/high-availability-hadoop
- Owner: Infosys
- License: mit
- Created: 2015-11-26T09:00:04.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2015-11-26T10:43:27.000Z (almost 9 years ago)
- Last Synced: 2023-07-31T17:01:56.004Z (over 1 year ago)
- Language: Python
- Size: 14.6 KB
- Stars: 2
- Watchers: 10
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
Hadoop HA Automation
====================

Hadoop HA Automation is a utility to automate the HA setup for:
- Namenode
- Resource Manager
- Hive Metastore
- Hiveserver2

Usage
=====

Place the source files in a directory path on the Hadoop master node. Ensure that the same directory path is available
on each of the slave nodes with read-write access. For example, if you place the source files in the /home/hadoop/HA
folder on the master node, also ensure that every slave node has /home/hadoop/HA with read-write access for the same
user id.

Before execution, edit the constant values in servers.py to customize the utility for your Hadoop installation.
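As a rough illustration, servers.py might define the cluster hosts along these lines; the constant names and host names
below are hypothetical, so edit the actual file shipped with the utility:

```python
# servers.py -- illustrative sketch only; the real constant names in the
# shipped servers.py may differ. Host names are placeholders.
ACTIVE_NAMENODE = "master1.example.com"    # assumed: current active namenode
STANDBY_NAMENODE = "master2.example.com"   # assumed: standby namenode for HA
SLAVE_NODES = [
    "slave1.example.com",
    "slave2.example.com",
    "slave3.example.com",
]
```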
'ha_setup.sh' is the entry point for the utility. The utility works by starting agent processes on the
cluster nodes: 'ha_setup.sh' first starts the slave (agent) processes and then invokes the HA setup program
('ha_setup.py'). Lastly, as cleanup, the agent processes are killed.

The agent processes communicate with the driver program ('ha_setup.py') by way of RPC on port 8888. If you wish to
modify the port, change SLAVE_DAEMON_PORT in cluster_constants.py.
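The README does not spell out the wire protocol, but the driver/agent split can be pictured with the XML-RPC support
built into Python 2.7; the run_command method below is hypothetical, not the utility's actual interface:

```python
# Agent-side sketch (hypothetical protocol): listen on SLAVE_DAEMON_PORT and
# execute shell commands sent by the driver ('ha_setup.py').
from SimpleXMLRPCServer import SimpleXMLRPCServer  # Python 2 stdlib module
import subprocess

SLAVE_DAEMON_PORT = 8888  # mirrors SLAVE_DAEMON_PORT in cluster_constants.py

def run_command(cmd):
    """Run a shell command on this node and return its exit status."""
    return subprocess.call(cmd, shell=True)

server = SimpleXMLRPCServer(("0.0.0.0", SLAVE_DAEMON_PORT), allow_none=True)
server.register_function(run_command)
server.serve_forever()
```

The driver side would then reach each agent with something like
xmlrpclib.ServerProxy("http://slave1:8888").run_command(...).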
The file cluster_constants.py contains the values of various configuration parameters. Default values are provided
as far as possible. If your Hadoop installation uses any non-standard configuration, change the corresponding
parameter in cluster_constants.py.
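For illustration, cluster_constants.py might look like the sketch below. Apart from SLAVE_DAEMON_PORT, which this
README names, the parameter names are assumptions:

```python
# cluster_constants.py -- illustrative sketch; only SLAVE_DAEMON_PORT is
# documented in this README, the other names are assumptions.
SLAVE_DAEMON_PORT = 8888                         # RPC port for slave agents
NAMESERVICE_ID = "mycluster"                     # assumed: HDFS nameservice id
ZOOKEEPER_QUORUM = "zk1:2181,zk2:2181,zk3:2181"  # assumed: ZooKeeper ensemble
```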
Dependencies
============

1) The utility depends on valid values for the environment variables below. Please verify them before execution; a
quick check is sketched just after this list of variables.
- HOME
- HADOOP_HOME
- HIVE_HOME
- ZOOKEEPER_HOME
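A quick way to verify these variables before execution (a standalone sketch, not part of the utility):

```python
# Check that each required environment variable is set and non-empty.
import os

for var in ("HOME", "HADOOP_HOME", "HIVE_HOME", "ZOOKEEPER_HOME"):
    if not os.environ.get(var):
        raise SystemExit("Environment variable %s is not set" % var)
print("All required environment variables are set")
```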
2) Set up passwordless SSH from the master node to each of the slave nodes. It is a simple procedure, and the link
below explains how to go about it:
http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/
3) Create directories on each slave node identical to the master node directory containing the source code. For
example, if you use the 'hadoop' user id and the source files are in /home/hadoop/HA, create the folder /home/hadoop/HA
on each slave node with read-write access for the 'hadoop' user id. One way to script this is sketched after this list.

4) This utility requires Python 2.7.5 or above on all the Hadoop cluster nodes.
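One way to script step 3 from the master node, assuming the passwordless SSH from step 2 is already in place (host
names are placeholders):

```python
# Create the source directory on every slave node over SSH (sketch only).
import subprocess

SRC_DIR = "/home/hadoop/HA"                   # same path as on the master node
SLAVE_NODES = ["slave1", "slave2", "slave3"]  # placeholder host names

for host in SLAVE_NODES:
    # 'mkdir -p' is idempotent, so re-running this is safe.
    subprocess.check_call(["ssh", host, "mkdir -p %s" % SRC_DIR])
```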
This code is shared under the MIT License.