{"id":14655009,"url":"https://github.com/cubxxw/big_data","last_synced_at":"2025-10-27T16:31:14.781Z","repository":{"id":135947634,"uuid":"475898815","full_name":"cubxxw/big_data","owner":"cubxxw","description":"Big data, hadoop installation and deployment   ","archived":false,"fork":false,"pushed_at":"2023-02-20T01:43:21.000Z","size":8206,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T06:51:15.674Z","etag":null,"topics":["big-data","cluster","database","git","hadoop","linux","mysql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cubxxw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-30T13:46:58.000Z","updated_at":"2024-03-26T12:02:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"bf88808f-727e-4b12-8022-8a4e1bb98b6d","html_url":"https://github.com/cubxxw/big_data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cubxxw%2Fbig_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cubxxw%2Fbig_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cubxxw%2Fbig_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cubxxw%2Fbig_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cubxxw","download_url":"https://codeload.github.com/cubxxw/b
ig_data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238526720,"owners_count":19487085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","cluster","database","git","hadoop","linux","mysql"],"created_at":"2024-09-11T11:01:04.289Z","updated_at":"2025-10-27T16:31:13.936Z","avatar_url":"https://github.com/cubxxw.png","language":null,"funding_links":[],"categories":["🗒️CS系列"],"sub_categories":[],"readme":"# Big Data Notes and Configuration\r\n\r\n### [Download the detailed Hadoop documentation](hadooop详解.docx)\r\n\r\n\r\n\r\n## TOC\r\n\r\n#### 1. [Hadoop Tutorial](markdown/1.md)\r\n\r\n#### 2. [Hadoop Runtime Environment](markdown/2.md)\r\n\r\n#### 3. [Hadoop Concepts](markdown/3.md)\r\n\r\n#### 4. [HDFS Configuration and Usage](markdown/4.md)\r\n\r\n#### 5. [HDFS Cluster](markdown/5.md)\r\n\r\n#### 6. [Using MapReduce](markdown/6.md)\r\n\r\n#### 7. [MapReduce Programming](markdown/7.md)\r\n\r\n#### 8. [Basic Environment](markdown/8.md)\r\n\r\n#### 9. [Hadoop Run Modes](markdown/9.md)\r\n\r\n#### 10. [Configuring SSH](markdown/10.md)\r\n\r\n#### 11. [Configuring IP](markdown/11.md)\r\n\r\n#### 12. [Configuring the Three Virtual Machines](markdown/12.md)\r\n\r\n#### 13. [Cluster Configuration](markdown/13.md)\r\n\r\n#### 14. [Configuring IP](markdown/14.md)\r\n\r\n#### 15. 
[Configuring the History Server](markdown/15.md)\r\n\r\n## Supplement: Deploying Hadoop on Kubernetes\r\n\r\n**Deploy the Kubernetes cluster:**\r\n```bash\r\nsudo /home/sealer/sealer2/_output/bin/sealer/linux_amd64/sealer run docker.io/sealerio/kubernetes:v1.22.15 --masters 10.0.0.245 --nodes 10.0.0.246,10.0.0.247  --user sealer --passwd ********\r\n```\r\n\r\n\r\n**Pull the Hadoop image from Docker Hub:**\r\n```bash\r\ndocker pull kubeguide/hadoop:latest\r\n```\r\n\r\n**Edit the hadoop.yaml file:**\r\n```yaml\r\napiVersion: v1\r\nkind: ConfigMap\r\nmetadata:\r\n  name: kube-hadoop-conf\r\n  namespace: default\r\ndata:\r\n  HDFS_MASTER_SERVICE: hadoop-hdfs-master\r\n  HDOOP_YARN_MASTER: hadoop-yarn-master\r\n---\r\napiVersion: v1\r\nkind: Service\r\nmetadata:\r\n  name: hadoop-hdfs-master\r\nspec:\r\n  type: NodePort\r\n  selector:\r\n    app: hdfs-master\r\n  ports:\r\n    - name: rpc\r\n      port: 9000\r\n      targetPort: 9000\r\n    - name: http\r\n      port: 50070\r\n      targetPort: 50070\r\n      nodePort: 32007\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: hdfs-master\r\n  labels:\r\n    app: hdfs-master\r\nspec:\r\n  containers:\r\n    - name: hdfs-master\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 9000\r\n        - containerPort: 50070    \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: namenode\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER\r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n    name: hadoop-datanode-1\r\n    labels:\r\n      app: hadoop-datanode-1\r\nspec:\r\n  containers:\r\n    - name: hadoop-datanode-1\r\n      image: 
192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 9000\r\n        - containerPort: 50070    \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: datanode\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER        \r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n    name: hadoop-datanode-2\r\n    labels:\r\n      app: hadoop-datanode-2\r\nspec:\r\n  containers:\r\n    - name: hadoop-datanode-2\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 9000\r\n        - containerPort: 50070    \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: datanode\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER        \r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n    name: hadoop-datanode-3\r\n    labels:\r\n      app: hadoop-datanode-3\r\nspec:\r\n  containers:\r\n    - name: hadoop-datanode-3\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 9000\r\n        - containerPort: 50070    \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: datanode\r\n        - name: HDFS_MASTER_SERVICE\r\n          
valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER        \r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Service\r\nmetadata:\r\n  name: hadoop-yarn-master\r\nspec:\r\n  type: NodePort\r\n  selector:\r\n    app: yarn-master\r\n  ports:\r\n     - name: \"8030\"       \r\n       port: 8030\r\n     - name: \"8031\"     \r\n       port: 8031\r\n     - name: \"8032\"\r\n       port: 8032     \r\n     - name: http\r\n       port: 8088\r\n       targetPort: 8088\r\n       nodePort: 32088\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: yarn-master\r\n  labels:\r\n    app: yarn-master\r\nspec:\r\n  containers:\r\n    - name: yarn-master\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 9000\r\n        - containerPort: 50070    \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: resourceman\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER          \r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Service\r\nmetadata:\r\n  name: yarn-node-1\r\nspec:\r\n  clusterIP: None\r\n  selector:\r\n    app: yarn-node-1\r\n  ports:\r\n     - port: 8040\r\n---\r\napiVersion: v1\r\nkind: Service\r\nmetadata:\r\n  name: yarn-node-2\r\nspec:\r\n  clusterIP: None\r\n  selector:\r\n    app: yarn-node-2\r\n  ports:\r\n     - port: 8040\r\n---\r\napiVersion: v1\r\nkind: Service\r\nmetadata:\r\n  name: 
yarn-node-3\r\nspec:\r\n  clusterIP: None\r\n  selector:\r\n    app: yarn-node-3\r\n  ports:\r\n     - port: 8040\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: yarn-node-1\r\n  labels:\r\n    app: yarn-node-1\r\nspec:\r\n  containers:\r\n    - name: yarn-node-1\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 8040\r\n        - containerPort: 8041   \r\n        - containerPort: 8042        \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: yarnnode\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER          \r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: yarn-node-2\r\n  labels:\r\n    app: yarn-node-2\r\nspec:\r\n  containers:\r\n    - name: yarn-node-2\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 8040\r\n        - containerPort: 8041   \r\n        - containerPort: 8042        \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: yarnnode\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER          \r\n  restartPolicy: Always\r\n---\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: yarn-node-3\r\n  labels:\r\n    app: yarn-node-3\r\nspec:\r\n  containers:\r\n    - name: 
yarn-node-3\r\n      image: 192.168.242.132/library/kubernetes-hadoop:latest\r\n      imagePullPolicy: IfNotPresent\r\n      ports:\r\n        - containerPort: 8040\r\n        - containerPort: 8041   \r\n        - containerPort: 8042        \r\n      env:\r\n        - name: HADOOP_NODE_TYPE\r\n          value: yarnnode\r\n        - name: HDFS_MASTER_SERVICE\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDFS_MASTER_SERVICE\r\n        - name: HDOOP_YARN_MASTER\r\n          valueFrom:\r\n            configMapKeyRef:\r\n              name: kube-hadoop-conf\r\n              key: HDOOP_YARN_MASTER          \r\n  restartPolicy: Always\r\n```\r\n\u003e This YAML file contains one ConfigMap, 5 Services, and 8 Pods. Note that HDFS_MASTER_SERVICE and HDOOP_YARN_MASTER in the ConfigMap must not be set to IP addresses; use the Service names instead. Otherwise the datanodes cannot connect to the namenode and fail with the error `ipc.Client: Retrying connect to server: xxx:9000.`\r\n\r\n**Create the resources:**\r\n```bash\r\nkubectl create -f hadoop.yaml\r\n```\r\n\r\n## else\r\n\r\n\u003e June 27, 2022, 18:49:58\r\n\r\n```bash\r\n\r\n    Directory: C:\\Users\\smile\\Desktop\\git\\big-date\r\n\r\n\r\nMode                 LastWriteTime         Length Name                                                                 \r\n----                 -------------         ------ ----\r\nd-----         2022/6/27     18:48                markdowm                                                             \r\n-a----         2022/6/27     18:45        4957395 hadooop详解.docx                                                       \r\n-a----         2022/6/27     18:45        3870121 hadoop.zip                                                           \r\n-a----          2022/4/8     15:58           3275 HADOOP全分布式安装.md                                                      \r\n-a----         2022/6/27     18:49             56 README.md                                                            \r\n-a----          2022/4/4     21:42          70002 solution.md                    
                                      \r\n-a----         2022/6/27     18:45              0 TOC.md                                                               \r\n-a----         2022/6/27     18:45           2279 配置文件.md                                                              \r\n-a----         2022/6/27     18:45          43376 集群.md                                                                \r\n\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcubxxw%2Fbig_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcubxxw%2Fbig_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcubxxw%2Fbig_data/lists"}