{"id":13680981,"url":"https://github.com/iamabug/BigDataParty","last_synced_at":"2025-04-30T02:33:44.644Z","repository":{"id":42351339,"uuid":"225847534","full_name":"iamabug/BigDataParty","owner":"iamabug","description":"大数据组件 All-in-One 的 Dockerfile","archived":false,"fork":false,"pushed_at":"2024-11-19T03:27:10.000Z","size":62,"stargazers_count":88,"open_issues_count":1,"forks_count":28,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-19T04:26:23.063Z","etag":null,"topics":["big-data","dockerfile","hadoop","kafka","spark"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iamabug.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-04T11:08:49.000Z","updated_at":"2024-11-19T03:27:14.000Z","dependencies_parsed_at":"2022-08-28T05:52:48.668Z","dependency_job_id":null,"html_url":"https://github.com/iamabug/BigDataParty","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamabug%2FBigDataParty","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamabug%2FBigDataParty/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamabug%2FBigDataParty/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamabug%2FBigDataParty/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iamabug","download_url":"https://codeload.github.com/iamabug/BigDataParty/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":
"github","repositories_count":251629573,"owners_count":21618201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","dockerfile","hadoop","kafka","spark"],"created_at":"2024-08-02T13:01:24.553Z","updated_at":"2025-04-30T02:33:44.363Z","avatar_url":"https://github.com/iamabug.png","language":"Shell","funding_links":[],"categories":["Shell"],"sub_categories":[],"readme":"# BigDataParty\n\nAn all-in-one Dockerfile for big data components.\n\n## 1. Basic information\n\nThe component versions are listed below (the MySQL root password is root):\n\n|   Component   |   Version    |\n| :-----------: | :----------: |\n|  Base image   | ubuntu:18.04 |\n|    Hadoop     |    3.1.4     |\n|     Spark     |    2.4.4     |\n| Hive (on Tez) |    3.1.2     |\n|      Tez      |    0.9.2     |\n|      Hue      |    4.5.0     |\n|     Flink     |    1.9.1     |\n|   Zookeeper   |    3.5.6     |\n|     Kafka     |    2.3.1     |\n|     MySQL     |     5.7      |\n\n## 2. 
Startup instructions\n\nThe image has been pushed to Docker Hub, so running the following command should start pulling the image:\n\n```\ndocker run -it -p 8088:8088 -p 8888:8888 -h bigdata iamabug1128/bdp bash\n```\n\nAlternatively, clone this project and run the `run-bdp.sh` script.\n\n\u003e 8088 is the port of the YARN Web UI, and 8888 is the Hue port.\n\u003e\n\u003e The hostname must be set to bigdata.\n\nOnce inside the container, start all components with:\n\n```\n/run/entrypoint.sh\n```\n\nOr start only Kafka:\n\n```bash\n/run/start_kafka.sh\n```\n\nList the processes to confirm that they have all started:\n\n```bash\nroot@bigdata:/# jps\n1796 ResourceManager\n1316 DataNode\n2661 RunJar\n1205 NameNode\n2662 RunJar\n3719 Jps\n1914 NodeManager\n1530 SecondaryNameNode\n523 QuorumPeerMain\n543 Kafka\n```\n\nHue is installed in `/usr/share/hue` and MySQL in the system paths; all other components are installed under `/usr/local/`:\n\n```bash\nroot@bigdata:/# ls /usr/local/      \nbin  etc  flink  games  hadoop  hive  include  kafka  lib  man  sbin  share  spark  src  tez  zookeeper\n```\n\n## 3. Usage examples\n\n### 3.1 Uploading a file to HDFS with Hue\n\nOpen `localhost:8888`, enter `admin, admin` to log in to Hue, and click the `Files` navigation button on the left. The file browser page appears:\n\n![](https://tva1.sinaimg.cn/large/006tNbRwly1g9rj4l1p6jj32l60jqafp.jpg)\n\nClick the `Upload` button in the upper-right corner and choose a file to upload. After the upload, the page looks like this:\n\n![](https://tva1.sinaimg.cn/large/006tNbRwly1g9rj9tqq8qj31fc0b83zp.jpg)\n\nBack in the container's command line, list the `/user/admin` directory:\n\n![](https://tva1.sinaimg.cn/large/006tNbRwly1g9rjbu571mj31dg08mn2d.jpg)\n\nThis confirms that the upload succeeded.\n\n### 3.2 Running the Flink on YARN WordCount example\n\nIn the command line, change to the `/usr/local/flink` directory and run `./bin/flink run -m yarn-cluster -p 4 -yjm 1024m -ytm 4096m ./examples/batch/WordCount.jar`:\n\n![](https://tva1.sinaimg.cn/large/006tNbRwly1g9rjqylh15j32ia0qunpd.jpg)\n\nOpen `http://localhost:8088` in a browser to see the running Flink job:\n\n![](https://tva1.sinaimg.cn/large/006tNbRwly1g9rjp3xjjoj327e0e2jv8.jpg)\n\nThe job completes successfully:\n\n![](https://tva1.sinaimg.cn/large/006tNbRwly1g9rjsiqu1qj318a0hagxq.jpg)\n\n## 4. 
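Build instructions\n\nThe directory structure is as follows:\n\n```bash\nBigDataParty $ tree               \n.\n├── Dockerfile\n├── README.md\n├── build.sh\n├── conf\n├── packages\n├── run-bdp.sh\n└── scripts\n```\n\nApart from the README and the Dockerfile, the files and directories are:\n\n* build.sh: downloads the archive of each component and runs `docker build`\n* run-bdp.sh: runs the built image and exposes the Hue and YARN web ports\n* conf: holds the configuration files of each component, which are copied into each component's directory when the image is built\n* packages: holds the archives of each component, which are extracted to `/usr/local` when the image is built\n* scripts: holds the initialization and startup scripts of each component, which are copied to `/run` when the image is built\n\n## 5. To be continued\n\nI built this image for my own day-to-day use (learning, testing, verification, and so on) and will keep improving it. If you are interested, you are welcome to join in.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamabug%2FBigDataParty","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiamabug%2FBigDataParty","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamabug%2FBigDataParty/lists"}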