# Deploy Hadoop Guide
A guide to deploying Apache Hadoop on three virtual machines.

![image](hadoop.png)

## Overview

- VM: Ubuntu 22.04.4
- Platform: JDK 8
- System: Hadoop 3.3.6

### Architecture

- **10.0.0.1 hadoop01**: NameNode, DataNode
- **10.0.0.2 hadoop02**: SecondaryNameNode, DataNode
- **10.0.0.3 hadoop03**: DataNode
- **10.0.0.4**: DNS (optional)

Storage is mounted at **/mnt/hadoop** on each node.

## IP and Host

### Set the IP

```bash
sudo vim /etc/netplan/00-installer-config.yaml
```
See [00-installer-config.yaml](./00-installer-config.yaml) for reference.

### Set the Hostname and Hosts

```bash
sudo vim /etc/hostname
```
Change it to hadoop01.

```bash
sudo vim /etc/hosts
```
Add the following:
```
10.0.0.1 hadoop01
10.0.0.2 hadoop02
10.0.0.3 hadoop03
```

Reboot the VM once the settings are in place:
```bash
sudo reboot
```

## Hadoop Admin

Create a dedicated group and user, and give the user sudo rights:
```bash
sudo addgroup hadoop_group
sudo adduser --ingroup hadoop_group hadoop_admin
sudo usermod -aG sudo hadoop_admin
```

Switch to hadoop_admin:
```bash
su hadoop_admin
cd ~
```

## SSH Key

Generate a key without a passphrase and authorize it, so that the start scripts can SSH into every node without a password. Because the VM is cloned later, all three nodes end up sharing this key pair.
```bash
ssh-keygen -t rsa -P ""
# Append rather than overwrite, and keep the file private (sshd rejects loose permissions)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

## Install Java and Hadoop

```bash
sudo apt-get update
```

### Java
```bash
sudo apt-get install openjdk-8-jdk
```

### Install Hadoop
Use `/usr/local/hadoop` as HADOOP_HOME.

```bash
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
# If the mirror no longer carries 3.3.6, older releases are kept at
# https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar zxvf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6/ /usr/local/hadoop
```

### Environment Variables
```bash
vim ~/.bashrc
```
Add the following:
```
export HADOOP_HOME=/usr/local/hadoop

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
```

Apply the variables to the current shell:
```bash
source ~/.bashrc
```

Check the variable (it should print `/usr/local/hadoop`):
```bash
echo $HADOOP_HOME
```

## HDFS Configuration

### CORE
```bash
vim /usr/local/hadoop/etc/hadoop/core-site.xml
```
See [core-site.xml](./core-site.xml) for reference.

Settings:
- `fs.defaultFS` points to hadoop01.
- `hadoop.tmp.dir` is set to `/mnt/hadoop`. If errors occur, you can delete this directory and restart; this wipes all HDFS data, so the NameNode must then be formatted again.

### HDFS

```bash
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```
See [hdfs-site.xml](./hdfs-site.xml) for reference.

Settings:
- The NameNode points to hadoop01.
- The NameNode and DataNode directories (tmp.dir) are set to `/mnt/hadoop`. If errors occur, you can delete this directory and restart.
- The SecondaryNameNode points to hadoop02.

### Workers (DataNode)

The workers file lists every host that runs a DataNode.
```bash
sudo vim /usr/local/hadoop/etc/hadoop/workers
```
Replace its contents (the default is `localhost`) with:
```
hadoop01
hadoop02
hadoop03
```

### Environment

```bash
sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```
Add the following:
```
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HDFS_NAMENODE_USER="hadoop_admin"
export HDFS_DATANODE_USER="hadoop_admin"
export HDFS_SECONDARYNAMENODE_USER="hadoop_admin"
export YARN_RESOURCEMANAGER_USER="hadoop_admin"
export YARN_NODEMANAGER_USER="hadoop_admin"
```

### Mount Disk
Create the directory and hand it to hadoop_admin, the user every daemon runs as:
```bash
sudo mkdir -p /mnt/hadoop
# Owning the directory is enough; a world-writable (777) data directory is not needed
sudo chown -R hadoop_admin:hadoop_group /mnt/hadoop
```

## Startup

Clone the VM into three copies and change the IP and hostname on each; **no other configuration changes are needed**.

### Format the NameNode (run on hadoop01)
Do this only once: re-formatting assigns a new cluster ID, and DataNodes still holding data from the old one will refuse to join.
```bash
cd $HADOOP_HOME
bin/hdfs namenode -format
```

### Start All Services (run on hadoop01)
From `$HADOOP_HOME`:
```bash
sbin/start-all.sh
```

## Status Check

### JPS (run on each VM)
```bash
jps
```
Each VM should list the daemons assigned to it in the architecture above. On hadoop01, for example:
```
2132 NameNode
2265 DataNode
7546 NodeManager
9295 Jps
```

### HDFS (run on hadoop01)
```bash
hdfs dfsadmin -report
```
Expect three live DataNodes, as shown below:
```
Configured Capacity: 12983532773376 (11.81 TB)
Present Capacity: 12323766214656 (11.21 TB)
DFS Remaining: 12323766124544 (11.21 TB)
DFS Used: 90112 (88 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 10.0.0.1:9866 (hadoop01)
Hostname: hadoop01
Decommission Status : Normal
Configured Capacity: 4327844257792 (3.94 TB)
DFS Used: 339968 (332 KB)
Non DFS Used: 2154496 (2.05 MB)
DFS Remaining: 4107922767872 (3.74 TB)
DFS Used%: 0.00%
DFS Remaining%: 94.92%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Mon Jul 15 09:28:38 UTC 2024
Last Block Report: Mon Jul 15 07:58:02 UTC 2024
Num of Blocks: 34

...
```
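The two status checks can also be scripted instead of read by eye. The sketch below assumes the output formats shown in this guide: a `Live datanodes (N):` line in the `hdfs dfsadmin -report` output, and `<pid> <name>` lines from `jps`. The function names `count_live_datanodes` and `missing_daemons` are hypothetical helpers, not part of Hadoop.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the status checks (not part of Hadoop).
# Assumes the report and jps formats shown in this guide.

# Reads `hdfs dfsadmin -report` output on stdin and prints N from the
# "Live datanodes (N):" line; prints nothing if that line is absent.
count_live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p'
}

# Reads `jps` output ("<pid> <name>" per line) on stdin and prints every
# expected daemon name, passed as arguments, that does not appear in it.
missing_daemons() {
  local running d
  running=$(awk '{print $2}')   # keep only the process names
  for d in "$@"; do
    # -x: whole-line match, so "NameNode" is not satisfied by "SecondaryNameNode"
    # -F: treat the name as a fixed string, not a regex
    if ! grep -qxF "$d" <<< "$running"; then
      echo "$d"
    fi
  done
}
```

After sourcing these functions into the shell, `hdfs dfsadmin -report | count_live_datanodes` on hadoop01 should print `3` for this cluster, and `jps | missing_daemons NameNode DataNode` should print nothing there; on hadoop02 the expected names would be `SecondaryNameNode DataNode`, and on hadoop03 just `DataNode`. Matching whole lines is the deliberate design choice here, since a plain substring search would wrongly treat `SecondaryNameNode` as a running `NameNode`.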