Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dimajix/docker-hadoop
Repository for building Docker containers for Hadoop
- Host: GitHub
- URL: https://github.com/dimajix/docker-hadoop
- Owner: dimajix
- License: apache-2.0
- Created: 2017-04-18T07:51:40.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-04-11T13:59:17.000Z (almost 6 years ago)
- Last Synced: 2024-11-09T17:38:42.531Z (2 months ago)
- Topics: docker, hadoop
- Language: Shell
- Size: 21.5 KB
- Stars: 1
- Watchers: 3
- Forks: 2
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# Docker Hadoop
This repository provides Docker images for Hadoop, including support for Alluxio.
## Building
This Hadoop build also includes support for Alluxio. But since, as of today, no Alluxio build is available for the Hadoop version of this image (Hadoop 2.8.0), a custom-built Alluxio client is installed. It can be built from the Alluxio 1.4 sources via:

```bash
mvn clean install -Dhadoop.version=2.8.0 -DskipTests=true
```
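After the Alluxio client is built, the Docker image itself can be built from this repository. A minimal sketch, assuming a Dockerfile at the repository root; the image tag `dimajix/hadoop` is an illustrative assumption, not defined by the repository:

```bash
# Build the Hadoop image from the repository root
docker build -t dimajix/hadoop .
```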
## Configuration

The Hadoop image provides lots of configuration options. The most important ones are:
```
HDFS_NAMENODE_PORT=8020
HDFS_NAMENODE_HOSTNAME=hadoop-namenode
HDFS_DEFAULT_FS=hdfs://hadoop-namenode:8020
HDFS_DATANODE_DIRS=/mnt/dataNode1,/mnt/dataNode2
HDFS_NAMENODE_DIRS=/mnt/nameNode1,/mnt/nameNode2
YARN_RESOURCEMANAGER_HOSTNAME=hadoop-resourcemanager
YARN_NODEMANAGER_CORES=32
YARN_NODEMANAGER_MEMORY=65536
YARN_NODEMANAGER_LOCALDIRS=
MAPRED_HISTORYSERVER_HOSTNAME=hadoop-historyserver
```
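Individual settings can be overridden when a container is started, for example with `docker run -e`. A minimal sketch, reusing the image tag assumed above (the values are illustrative):

```bash
# Override HDFS settings for a single container at start time
docker run -d \
  -e HDFS_NAMENODE_HOSTNAME=hadoop-namenode \
  -e HDFS_DEFAULT_FS=hdfs://hadoop-namenode:8020 \
  -e HDFS_DATANODE_DIRS=/mnt/dataNode1,/mnt/dataNode2 \
  dimajix/hadoop slavenode
```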
# Deployment
## Minimum Configuration
You need to specify at least the following environment variables:
```
YARN_RESOURCEMANAGER_HOSTNAME=hadoop-resourcemanager
MAPRED_HISTORYSERVER_HOSTNAME=hadoop-historyserver
YARN_NODEMANAGER_CORES=32
YARN_NODEMANAGER_MEMORY=65536
HDFS_NAMENODE_HOSTNAME=hadoop-namenode
```
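For the deployment examples below, these settings can be collected in a small env file. This is a sketch; the file name `hadoop.env` is illustrative and not part of the repository:

```bash
# Write a minimal hadoop.env with the settings listed above
cat > hadoop.env <<'EOF'
YARN_RESOURCEMANAGER_HOSTNAME=hadoop-resourcemanager
MAPRED_HISTORYSERVER_HOSTNAME=hadoop-historyserver
YARN_NODEMANAGER_CORES=32
YARN_NODEMANAGER_MEMORY=65536
HDFS_NAMENODE_HOSTNAME=hadoop-namenode
EOF
```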
Then you can set up a complete Hadoop cluster using the following commands:

* namenode - Runs Hadoop Namenode
* resourcemanager - Runs Hadoop Resource Manager
* historyserver - Runs Hadoop History Server
* slavenode - Runs a Hadoop slave node containing a node manager and a data node

You require exactly one namenode and one resourcemanager, and at least one slavenode. You can scale up by running multiple slave nodes, as sketched below.
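A minimal bring-up could look like the following. A user-defined Docker network lets the containers resolve each other by name; the network name, the image tag, and the `hadoop.env` file are assumptions carried over from the sketches above:

```bash
# Create a network so containers can reach each other by hostname
docker network create hadoop

# Exactly one namenode and one resourcemanager, plus the history server
docker run -d --network hadoop --name hadoop-namenode \
  --env-file hadoop.env dimajix/hadoop namenode
docker run -d --network hadoop --name hadoop-resourcemanager \
  --env-file hadoop.env dimajix/hadoop resourcemanager
docker run -d --network hadoop --name hadoop-historyserver \
  --env-file hadoop.env dimajix/hadoop historyserver

# At least one slave node; repeat this command to scale out
docker run -d --network hadoop --env-file hadoop.env dimajix/hadoop slavenode
```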
## S3 properties
Since many users want to access data stored on AWS S3, it is also possible to specify AWS credentials and general settings:

```
S3_PROXY_HOST=
S3_PROXY_PORT=-1
S3_PROXY_USE_HTTPS=false
S3_ENDPOINT=s3.amazonaws.com
S3_ENDPOINT_HTTP_PORT=80
S3_ENDPOINT_HTTPS_PORT=443
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
```
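These can again be passed at container start. A sketch reusing the assumed image tag, network, and env file, reading the keys from the host environment rather than hard-coding them:

```bash
# Pass S3 credentials from the host environment; avoid baking keys into images
docker run -d --network hadoop --env-file hadoop.env \
  -e S3_ENDPOINT=s3.amazonaws.com \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  dimajix/hadoop slavenode
```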