Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Apache Hadoop docker image
https://github.com/big-data-europe/docker-hadoop
Topics: docker, docker-hadoop, hadoop, hadoop-cluster, hadoop-docker
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/big-data-europe/docker-hadoop
- Owner: big-data-europe
- Created: 2016-03-09T13:13:52.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-02-01T07:07:08.000Z (12 months ago)
- Last Synced: 2025-01-10T11:11:12.248Z (11 days ago)
- Topics: docker, docker-hadoop, hadoop, hadoop-cluster, hadoop-docker
- Language: Shell
- Homepage:
- Size: 109 KB
- Stars: 2,222
- Watchers: 81
- Forks: 1,317
- Open Issues: 85
Metadata Files:
- Readme: README.md
- Changelog: historyserver/Dockerfile
Awesome Lists containing this project
README
[![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/big-data-europe/Lobby)
# Changes
Version 2.0.0 introduces the use of the wait_for_it script for cluster startup.
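In practice this means each container blocks until the services it depends on accept connections before starting its own daemon. A minimal sketch of the pattern (the script name and flags below follow the common wait-for-it.sh convention and are an assumption, not necessarily this repo's exact invocation):

```
# wait up to 30s for the namenode RPC port, then start the datanode
# (hypothetical invocation following the wait-for-it.sh convention)
/wait_for_it.sh namenode:8020 --timeout=30 --strict -- hdfs datanode
```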
# Hadoop Docker
## Supported Hadoop Versions
See repository branches for supported Hadoop versions.

## Quick Start
To deploy an example HDFS cluster, run:
```
docker-compose up
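# brings up the services defined in docker-compose.yml
# (namenode, datanode, resourcemanager, nodemanager, historyserver)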
```

Run the example wordcount job:
```
make wordcount
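# runs the example wordcount MapReduce job against the running cluster;
# see the repo's Makefile for the exact steps it performs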
```

Or deploy in swarm:
```
docker stack deploy -c docker-compose-v3.yml hadoop
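# after deploying, check that all services are running:
# docker stack services hadoop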
```

`docker-compose` creates a docker network that can be found by running `docker network list`, e.g. `dockerhadoop_default`.
Run `docker network inspect` on the network (e.g. `dockerhadoop_default`) to find the IP the hadoop interfaces are published on. Access these interfaces with the following URLs:
* Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
* History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
* Datanode: http://<dockerhadoop_IP_address>:9864/
* Nodemanager: http://<dockerhadoop_IP_address>:8042/node
* Resource manager: http://<dockerhadoop_IP_address>:8088/
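For example, to list the IPv4 address of every container attached to the network (assuming the default network name `dockerhadoop_default`):

```
docker network inspect -f '{{range .Containers}}{{.Name}}: {{.IPv4Address}}{{println}}{{end}}' dockerhadoop_default
```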
## Configure Environment Variables

Configuration parameters can be specified in the hadoop.env file or as environment variables for specific services (e.g. namenode, datanode, etc.):
```
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
```

`CORE_CONF` corresponds to core-site.xml. `fs_defaultFS=hdfs://namenode:8020` will be transformed into:
```
<property><name>fs.defaultFS</name><value>hdfs://namenode:8020</value></property>
```
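A single hadoop.env file can carry settings for several configuration files at once. The entries below are illustrative values, not necessarily the repo's exact file:

```
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
HDFS_CONF_dfs_replication=1
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
```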
To define a dash inside a configuration parameter, use a triple underscore, e.g. `YARN_CONF_yarn_log___aggregation___enable=true` (yarn-site.xml):
```
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
```

The available configurations are:

* /etc/hadoop/core-site.xml CORE_CONF
* /etc/hadoop/hdfs-site.xml HDFS_CONF
* /etc/hadoop/yarn-site.xml YARN_CONF
* /etc/hadoop/httpfs-site.xml HTTPFS_CONF
* /etc/hadoop/kms-site.xml KMS_CONF
* /etc/hadoop/mapred-site.xml MAPRED_CONF

If you need to extend some other configuration file, refer to the base/entrypoint.sh bash script.
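For reference, a minimal bash sketch of the env-to-XML transformation described above (illustrative only; the actual logic lives in base/entrypoint.sh):

```
# Walk all CORE_CONF_* variables and emit Hadoop XML properties.
# Naming convention: ___ -> -, __ -> _ (via a placeholder), remaining _ -> .
for var in $(env | grep '^CORE_CONF_' | cut -d= -f1); do
  name=$(echo "${var#CORE_CONF_}" | sed 's/___/-/g; s/__/@/g; s/_/./g; s/@/_/g')
  value="${!var}"   # bash indirect expansion to read the variable's value
  echo "<property><name>${name}</name><value>${value}</value></property>"
done
```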