https://github.com/nexr/RHive
RHive is an R extension facilitating distributed computing via Apache Hive.
- Host: GitHub
- URL: https://github.com/nexr/RHive
- Owner: nexr
- Created: 2011-10-27T04:35:36.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2017-07-19T01:04:45.000Z (almost 7 years ago)
- Last Synced: 2024-04-16T01:38:01.341Z (2 months ago)
- Language: R
- Homepage: http://nexr.github.io/RHive
- Size: 3.64 MB
- Stars: 122
- Watchers: 52
- Forks: 63
- Open Issues: 55
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog
Lists
- awesome-R - RHive - R extension facilitating distributed computing via Apache Hive. (Database Management)
- awesome_R - RHive - R extension facilitating distributed computing via Apache Hive. (Database Management)
- fucking-awesome-R - RHive - R extension facilitating distributed computing via Apache Hive. (Database Management)
README
NexR RHive 2.0
================

RHive is an R extension facilitating distributed computing via Hive queries.
RHive allows easy usage of HQL (Hive SQL) in R, and allows easy usage of R objects and R functions in Hive.

> Before installing RHive, you must have Hadoop and Hive installed.
## Install Hadoop
1. Single Node
- [Single node installation](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html)
2. Cluster Node
- [Cluster node installation](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html)
3. Set **HADOOP_HOME** on the local machine on which R runs.

## Install Hive
1. Install Hive on the local machine and on the remote machine on which the NameNode or the Hive Server runs.
2. Installation Guide
- [Hive installation guide](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration)
3. Set **HIVE_HOME** on the local machine on which R runs.
4. Launch the Hive Server on the remote machine with the following command. It should run as a background process.
- `$HIVE_HOME/bin/hive --service hiveserver`
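
Step 4 asks for the server to run as a background process. One way to do that, sketched here under assumptions, is to wrap the command in `nohup` and send its output to a log file. The function name `start_hive_server` and the `HIVE_LOG` variable are hypothetical and not part of Hive or RHive.

```shell
#!/bin/sh
# Sketch: start the Hive server in the background so it survives logout.
# start_hive_server and HIVE_LOG are hypothetical names introduced for illustration.

start_hive_server() {
    # Refuse to run when HIVE_HOME is unset or the launcher is missing.
    if [ -z "${HIVE_HOME:-}" ] || [ ! -x "$HIVE_HOME/bin/hive" ]; then
        echo "HIVE_HOME is not set or \$HIVE_HOME/bin/hive is not executable" >&2
        return 1
    fi
    log="${HIVE_LOG:-/tmp/hiveserver.log}"
    # nohup ignores the hangup signal; the trailing & puts the job in the background.
    nohup "$HIVE_HOME/bin/hive" --service hiveserver >"$log" 2>&1 &
    # $! holds the PID of the background job that was just started.
    echo "$!"
}

# Usage (on the remote machine):
#   pid=$(start_hive_server) && echo "Hive server started with PID $pid"
```

Keeping the PID makes it possible to stop the server later with `kill`, and the log file is the first place to look if RHive cannot connect.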
## Install R and Packages
1. Install R
- R must be installed on all tasktracker nodes.
2. Install rJava
- Install rJava only on the local machine.
3. Install Rserve
- Rserve must be installed on all tasktracker nodes.
- Create the configuration file (/etc/Rserv.conf) on all tasktracker nodes and add 'remote enable' to it to allow remote connections.
- Launch Rserve on all tasktracker nodes.
- e.g. `R CMD Rserve`
4. Configure the tasktracker nodes
- Add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh
- e.g. `export R_HOME=/usr/lib/R`
5. Install RUnit

## Install RHive
1. Requirements
- ant (in order to build java files)
2. Installing RHive
   1. Download the source code: `git clone https://github.com/nexr/RHive.git`
   2. Change your working directory: `cd RHive`
   3. Set the environment variables HIVE_HOME and HADOOP_HOME:
      - `export HIVE_HOME=/path/to/your/hive/directory`
      - `export HADOOP_HOME=/path/to/your/hadoop/directory`
   4. Build the Java files using ant: `ant build`
   5. Build RHive: `R CMD build RHive`
   6. Install RHive: `R CMD INSTALL RHive_VERSION.tar.gz` (replace VERSION with the version in the tarball name produced by the previous step)
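
Before running the build steps above, a preflight check along these lines can catch a missing prerequisite early. This is only a sketch: `require_dir`, `require_cmd` and `preflight` are hypothetical helper names, and the list of commands (git, ant, R) is inferred from the steps rather than stated as a complete requirement.

```shell
#!/bin/sh
# Preflight sketch for the RHive build.
# All function names here are hypothetical helpers, not part of RHive.

# Succeeds when the environment variable named by $1 points at an existing directory.
require_dir() {
    eval "value=\${$1:-}"
    if [ -z "$value" ] || [ ! -d "$value" ]; then
        echo "missing: $1 must point to an existing directory" >&2
        return 1
    fi
}

# Succeeds when the command named by $1 is found on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing: command '$1' not found on PATH" >&2
        return 1
    fi
}

# Runs every check and reports all problems instead of stopping at the first one.
preflight() {
    status=0
    for var in HIVE_HOME HADOOP_HOME; do
        require_dir "$var" || status=1
    done
    for cmd in git ant R; do
        require_cmd "$cmd" || status=1
    done
    return "$status"
}

# Usage:
#   preflight && ant build
```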
## Loading RHive and connecting to Hive
1. Set the environment variables HIVE_HOME, HADOOP_HOME and HADOOP_CONF_DIR:
- Export them in the shell:
export HIVE_HOME=/path/to/your/hive/directory
export HADOOP_HOME=/path/to/your/hadoop/directory
export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
- Or add them to Renviron:
HIVE_HOME=/path/to/your/hive/directory
HADOOP_HOME=/path/to/your/hadoop/directory
HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
2. Launch R, load RHive and connect to Hive:
library(RHive)
rhive.connect(host, port, hiveServer2)
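
For the Renviron option in step 1, the entries can be written by hand or with a small helper such as the one sketched below. It assumes the user-level file `~/.Renviron`, which R reads at startup as plain `NAME=value` lines. `set_renviron_var` is a hypothetical helper, not part of R or RHive.

```shell
#!/bin/sh
# Sketch: record NAME=VALUE in an Renviron-style file without duplicating entries.
# set_renviron_var is a hypothetical helper introduced for illustration.

set_renviron_var() {
    file=$1
    name=$2
    value=$3
    touch "$file"
    # Keep every line that does not already assign this variable.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v "^${name}=" "$file" >"$file.tmp" || true
    # Append the new assignment, then replace the original file.
    printf '%s=%s\n' "$name" "$value" >>"$file.tmp"
    mv "$file.tmp" "$file"
}

# Usage:
#   set_renviron_var "$HOME/.Renviron" HIVE_HOME /path/to/your/hive/directory
#   set_renviron_var "$HOME/.Renviron" HADOOP_HOME /path/to/your/hadoop/directory
#   set_renviron_var "$HOME/.Renviron" HADOOP_CONF_DIR /path/to/your/hadoop/conf/directory
```

Rewriting the file rather than appending blindly means the helper can be run again after a path changes without leaving a stale entry behind.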
## Tutorials
- [RHive user guide](https://github.com/nexr/RHive/wiki/User-Guide)

## Requirements
- Java 1.6
- R 2.13.0
- Rserve 0.6-0
- rJava 0.9-0
- Hadoop 0.20.x (x >= 1)
- Hive 0.8.x (x >= 0)
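
The list above does not say whether these versions are minimums or exact matches. Assuming they are minimums, a comparison like the following sketch can check an installed version against one of them. `version_ge` is a hypothetical helper, and it relies on `sort -V`, which GNU coreutils provides but POSIX does not require.

```shell
#!/bin/sh
# Sketch: compare an installed version against a minimum from the list above.
# version_ge is a hypothetical helper; it depends on `sort -V` (version sort).
# Both versions should use the same number of dotted components.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage: R reports its version as a plain dotted string.
#   r_version=$(Rscript -e 'cat(as.character(getRversion()))')
#   version_ge "$r_version" 2.13.0 || echo "R is older than 2.13.0" >&2
```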