Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alibaba/mpich2-yarn
Running MPICH2 on Yarn
- Host: GitHub
- URL: https://github.com/alibaba/mpich2-yarn
- Owner: alibaba
- Created: 2012-08-23T03:57:56.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2017-10-08T16:14:12.000Z (over 6 years ago)
- Last Synced: 2024-02-23T12:32:10.117Z (4 months ago)
- Language: Java
- Size: 17.2 MB
- Stars: 114
- Watchers: 34
- Forks: 62
- Open Issues: 19
Metadata Files:
- Readme: README.md
Lists
- awesome-hadoop - mpich2-yarn - Running MPICH2 on Yarn (YARN)
README
mpich-yarn
===========
# Introduction

MPICH-yarn is an application running on Hadoop YARN that enables
MPI programs to run on Hadoop YARN clusters.

## Prerequisite
Before you begin, make sure that:

1. Hadoop YARN and HDFS are deployed on the cluster.
2. Each node in the cluster has mpich-3.1.2 installed, with its ./bin
   folder added to PATH.

This version of mpich-yarn uses MPICH-3.1.2 as its MPI implementation
and ssh as the communication daemon.

## Recommended Configuration
1. Ubuntu 12.04 LTS
2. hadoop 2.4.1
3. gcc 4.6.3
4. jdk 1.7.0_25
5. Apache Maven 3.2.3

# Compile
To compile MPICH-yarn, first make sure Maven is installed. Then run the
following command in the source folder:

    mvn clean package -Dmaven.test.skip=true

Make sure you are connected to the Internet, because Maven needs to
download plugins from the Maven repository; this may take several minutes.

After this command completes, you will find mpich2-yarn-1.0-SNAPSHOT.jar in
the ./target folder. This is the application that runs on YARN to execute
MPI programs.

# Configuration
There are many tutorials on the Internet about configuring Hadoop. However,
configuring YARN to work well with mpich2-yarn can be troublesome. To save
you time, here is a sample configuration that has run successfully on our
cluster, for your reference.

yarn-site.xml
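
    <configuration>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${YOUR_HOST_IP_OR_NAME}:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>${YOUR_HOST_IP_OR_NAME}:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>${YOUR_HOST_IP_OR_NAME}</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${YOUR_HOST_IP_OR_NAME}:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${YOUR_HOST_IP_OR_NAME}:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${YOUR_HOST_IP_OR_NAME}:8088</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>16</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.application.classpath</name>
        <value>
          /home/hadoop/hadoop-2.4.1/etc/hadoop,
          /home/hadoop/hadoop-2.4.1/share/hadoop/common/*,
          /home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*,
          /home/hadoop/hadoop-2.4.1/share/hadoop/yarn/*,
          /home/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*,
          /home/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*,
          /home/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*
        </value>
      </property>
    </configuration>
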
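
Every ResourceManager entry in the sample above uses the same `${YOUR_HOST_IP_OR_NAME}` placeholder, so a single substitution pass is enough to adapt it to a cluster. A minimal sketch, assuming the sample has been saved as a template file; the function name `fill_rm_host` and the file names in the usage line are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of mpich2-yarn): print a copy of a
# yarn-site.xml template with the sample's host placeholder filled in.
# Assumes the host contains no characters special to sed ("/", "&", "\").
fill_rm_host() {
    template="$1"   # file containing ${YOUR_HOST_IP_OR_NAME}
    host="$2"       # ResourceManager host name or IP address
    # \$ keeps the shell from expanding the placeholder; in a basic regular
    # expression the "$", "{" and "}" here match literally.
    sed "s/\${YOUR_HOST_IP_OR_NAME}/${host}/g" "$template"
}
```

For example, `fill_rm_host yarn-site.xml.template rm.example.org > yarn-site.xml` would write the filled-in file, leaving the template untouched.
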
mpi-site.conf

- yarn.mpi.scratch.dir: the HDFS address that stores temporary files, for example
  hdfs://sandking04:9000/home/hadoop/mpi-tmp
- yarn.mpi.ssh.authorizedkeys.path: /home/hadoop/.ssh/authorized_keys

MPICH-YARN will create a temporary RSA key pair for password-less login
and configure it automatically. All of your hosts should enable
public-key login.

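
Because MPICH-YARN sets up the temporary key pair by itself, the main thing to verify on each host is that the file named by `yarn.mpi.ssh.authorizedkeys.path` is in place. The following is a rough sketch of such a check; the function name `check_authorized_keys` is hypothetical, and the writability test rests on the assumption that MPICH-YARN adds its temporary public key to this file.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of mpich2-yarn).
# Assumption: MPICH-YARN adds its temporary public key to this file,
# so the user running the job must be able to write to it.
check_authorized_keys() {
    keys="${1:-$HOME/.ssh/authorized_keys}"
    if [ ! -f "$keys" ]; then
        echo "missing: $keys"
        return 1
    fi
    if [ ! -w "$keys" ]; then
        echo "not writable: $keys"
        return 1
    fi
    echo "ok: $keys"
}
```

Running `check_authorized_keys /home/hadoop/.ssh/authorized_keys` on every node before submitting a job gives an early hint if a host is not ready.
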
# Submit Jobs

## CPI

On the client nodes:

    mpicc -o cpi cpi.c
    hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a cpi -M 1024 -m 1024 -n 2

## Hello world

    hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a hellow -M 1024 -m 1024 -n 2

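
The CPI and hello-world submissions differ only in the executable passed to `-a`. As an illustration, that repetition could be wrapped in a small function; `submit_mpi` and the `DRY_RUN` variable are hypothetical names, and the remaining options are copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of mpich2-yarn) around the submit command
# used above. With DRY_RUN=1 it prints the command instead of running it.
submit_mpi() {
    exe="$1"        # MPI executable, passed to -a
    n="${2:-2}"     # value for -n; defaults to 2 as in the examples
    set -- hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar \
        -a "$exe" -M 1024 -m 1024 -n "$n"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `DRY_RUN=1 submit_mpi hellow` prints the hello-world command without contacting the cluster, which is a cheap way to review the options before a real submission.
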
## PLDA

    svn checkout http://plda.googlecode.com/svn/trunk/ plda  # prepare the source code
    cd plda
    make  # calls mpicc to compile
    cd ..

Put the input data into HDFS (test data is included in the PLDA source
directory):

    hadoop fs -mkdir /group/dc/zhuoluo.yzl/plda_input
    hadoop fs -put plda/testdata/test_data.txt /group/dc/zhuoluo.yzl/plda_input/
    hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a plda/mpi_lda -M 1024 -m 1024 -n 2 \
        -o "--num_topics 2 --alpha 0.1 --beta 0.01 --training_data_file MPIFILE1 --model_file MPIOUTFILE1 --total_iterations 150" \
        -DMPIFILE1=/group/dc/zhuoluo.yzl/plda_input -SMPIFILE1=true -OMPIOUTFILE1=/group/dc/zhuoluo.yzl/lda_model_output.txt -ppc 2