{"id":19872055,"url":"https://github.com/saltstack-formulas/hadoop-formula","last_synced_at":"2025-08-03T00:32:57.185Z","repository":{"id":138671590,"uuid":"10653765","full_name":"saltstack-formulas/hadoop-formula","owner":"saltstack-formulas","description":null,"archived":false,"fork":false,"pushed_at":"2018-07-31T13:48:50.000Z","size":240,"stargazers_count":37,"open_issues_count":6,"forks_count":55,"subscribers_count":36,"default_branch":"master","last_synced_at":"2025-05-02T09:48:50.704Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://docs.saltstack.com/en/latest/topics/development/conventions/formulas.html","language":"SaltStack","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saltstack-formulas.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2013-06-12T22:26:22.000Z","updated_at":"2020-12-06T02:16:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"70338a64-4da5-4f30-b9a7-71caf13fb286","html_url":"https://github.com/saltstack-formulas/hadoop-formula","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/saltstack-formulas/hadoop-formula","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltstack-formulas%2Fhadoop-formula","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltstack-formulas%2Fhadoop-formula/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltstack-formulas%2Fhadoop-formula/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/saltstack-formulas%2Fhadoop-formula/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saltstack-formulas","download_url":"https://codeload.github.com/saltstack-formulas/hadoop-formula/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saltstack-formulas%2Fhadoop-formula/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259642248,"owners_count":22888982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T16:14:23.397Z","updated_at":"2025-06-13T12:04:23.470Z","avatar_url":"https://github.com/saltstack-formulas.png","language":"SaltStack","funding_links":[],"categories":[],"sub_categories":[],"readme":"======\nhadoop\n======\n\nFormula to set up and configure hadoop components\n\n.. note::\n\n    See the full `Salt Formulas installation and usage instructions\n    \u003chttp://docs.saltstack.com/en/latest/topics/development/conventions/formulas.html\u003e`_.\n\nAvailable states\n================\n\n.. 
contents::\n    :local:\n\n``hadoop``\n----------\n\nDownloads the Hadoop tarball from ``hadoop:source_url``, installs the package, and creates the hadoop group that all other components share.\n\n``hadoop.hdfs``\n---------------\n\nInstalls the hdfs service configuration and starts the hdfs services.\nWhich services end up running on a given host depends on the roles assigned via Salt grains:\n\n- hadoop_master will run the hadoop-namenode and hadoop-secondarynamenode services\n- hadoop_slave will run the hadoop-datanode service\n\n::\n\n    roles:\n      - hadoop_slave\n\n``hadoop.mapred``\n-----------------\n\nInstalls the mapreduce service scripts and configuration, and adds directories.\nWhich services end up running on a given host will again depend on the role(s) assigned via grains:\n\n- hadoop_master will run the hadoop-jobtracker service\n- hadoop_slave will run the hadoop-tasktracker service\n\n``hadoop.snappy``\n-----------------\n\nInstalls the snappy and snappy-devel system packages, adds a jar and a shared library compiled from https://code.google.com/p/hadoop-snappy, and symlinks the snappy libraries into place, making snappy compression available to the Hadoop ecosystem.\n\n``hadoop.yarn``\n---------------\n\nInstalls the yarn daemon scripts and configuration (if a Hadoop 2.2+ version was installed), and adds directories.\nWhich services end up running on a given host will again depend on the role(s) assigned via grains:\n\n- hadoop_master will run the hadoop-resourcemanager service\n- hadoop_slave will run the hadoop-nodemanager service\n\n``hadoop.hdfs.uninstall``\n-------------------------\n\nStops the hdfs services and uninstalls the hdfs service configuration. Removes hdfs data from local disks.\n\n``hadoop.mapred.uninstall``\n---------------------------\n\nUninstalls the mapreduce service scripts and configuration. Removes mapred data from local disks.\n\n``hadoop.yarn.uninstall``\n-------------------------\n\nUninstalls the yarn daemon scripts and configuration. 
Removes yarn data from local disks.\n\n``hadoop.uninstall``\n--------------------\n\nUninstalls all Hadoop services and configurations.\n\nFormula Dependencies\n====================\n\n* ``hostsfile``\n* ``sun-java``\n\nSalt Minion Configuration\n=========================\n\nAs mentioned above, all installation and configuration is assigned via roles.\nMounted disks (or just directories) can be configured for use with hdfs, mapreduce and yarn via grains.\n\nExample ``/etc/salt/grains`` for a datanode:\n::\n\n    hdfs_data_disks:\n      - /data1\n      - /data2\n      - /data3\n      - /data4\n\n    mapred_data_disks:\n      - /data1\n      - /data2\n      - /data3\n      - /data4\n\n    yarn_data_disks:\n      - /data1\n      - /data2\n      - /data3\n      - /data4\n\n    roles:\n      - hadoop_slave\n\nFor the namenode address to be configured dynamically, the Salt Mine must be set up, for example in ``/etc/salt/minion.d/mine_functions.conf``:\n\n::\n\n    mine_functions:\n      network.interfaces: []\n      network.ip_addrs: []\n      grains.items: []\n\nKeep in mind that the implementation currently relies on the minion_id of every node matching its FQDN (which is the default) and on working name resolution. 
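\n\nBecause each state picks its services from the ``roles`` grain, one way to apply the formula is a top file that targets every Hadoop node. The following is a minimal sketch, assuming a ``top.sls`` in the ``base`` environment and using Salt's compound matcher on the grains shown above. The file location and the selection of states are assumptions; adjust the list to the components you need (``hadoop.yarn`` only applies to Hadoop 2.2+):\n\n::\n\n    # Hypothetical top.sls: location and state selection are assumptions.\n    base:\n      'G@roles:hadoop_master or G@roles:hadoop_slave':\n        - match: compound\n        - hadoop\n        - hadoop.hdfs\n        - hadoop.yarn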
\n\nHadoop configuration\n====================\n\nThe hadoop formula exposes the general (cluster-independent) part of the main configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml)\nas pillar keys.\n\nExample:\n::\n\n    hadoop:\n      config:\n        tmp_dir: /var/lib/hadoop/tmp\n        directory: /etc/hadoop/conf\n        core-site:\n          io.native.lib.available:\n            value: true\n          io.file.buffer.size:\n            value: 65536\n          fs.trash.interval:\n            value: 60\n\nThe core-site part then appears in core-site.xml as:\n::\n\n    \u003cproperty\u003e\n        \u003cname\u003eio.native.lib.available\u003c/name\u003e\n        \u003cvalue\u003eTrue\u003c/value\u003e\n    \u003c/property\u003e\n\n    \u003cproperty\u003e\n        \u003cname\u003efs.trash.interval\u003c/name\u003e\n        \u003cvalue\u003e60\u003c/value\u003e\n    \u003c/property\u003e\n\n    \u003cproperty\u003e\n        \u003cname\u003eio.file.buffer.size\u003c/name\u003e\n        \u003cvalue\u003e65536\u003c/value\u003e\n    \u003c/property\u003e\n\nNote that host- and cluster-specific values (think: fs.default.name) are not exposed; the formula controls these itself.\n\nCustom Hadoop distribution settings\n===================================\n\nThe formula includes the data needed to reference a specific distribution release simply by setting the version key:\n\nExample:\n::\n\n    hadoop:\n      version: hdp-2.6.0\n\nThis example makes the formula use the latest maintained version of HDP-2.2 (which ships Hadoop 2.6.0).\nAt the time of writing this is 2.6.0.2.2.6.0-2800, an update release that will change as new updates ship\nbut is likely to be what you need.\n\nIf that is not the case (for example, because you need to provision HDP 2.6.0.2.2.4.2-2), you need to\nprovide the full data structure in the versions hash that is normally part of the 
formula.\n\nExample:\n::\n\n    hadoop:\n      version: hdp-2.6.0-update2242\n      versions:\n        hdp-2.6.0-update2242:\n          version: 2.6.0.2.2.4.2-2\n          version_name: hadoop-2.6.0.2.2.4.2-2\n          source_url: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.2/tars/hadoop-2.6.0.2.2.4.2-2.tar.gz\n          major_version: '2'\n        hdp-2.6.0-GA:\n          version: 2.6.0.2.2.0.0-2041\n          version_name: hadoop-2.6.0.2.2.0.0-2041\n          source_url: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.2.0.0/tars/hadoop-2.6.0.2.2.0.0-2041.tar.gz\n          major_version: '2'\n\nThis provisions the earlier update release and also lets you install the GA version simply by changing the ``hadoop.version`` attribute to ``hdp-2.6.0-GA``.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaltstack-formulas%2Fhadoop-formula","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaltstack-formulas%2Fhadoop-formula","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaltstack-formulas%2Fhadoop-formula/lists"}