https://github.com/bikash/r2time
R connector for OpenTSDB: Analyzing large time-series data in R environment using data-intensive capabilities.
- Host: GitHub
- URL: https://github.com/bikash/r2time
- Owner: bikash
- Created: 2014-02-19T15:56:24.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2016-05-03T09:04:32.000Z (over 9 years ago)
- Last Synced: 2025-04-02T17:53:08.732Z (6 months ago)
- Topics: hbase, mapreduce, opentsdb, timeseries
- Language: Java
- Homepage:
- Size: 60.4 MB
- Stars: 6
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
R2Time: an R connector to HBase for time-series data stored by OpenTSDB.

Analyzing time-series datasets at massive scale is one of the biggest challenges data scientists face. This tool is used for analyzing large time-series data stored by OpenTSDB, an open-source, distributed, and scalable time-series database. Tools currently available for time-series analysis are time- and memory-consuming, and no single tool specializes in providing an efficient implementation for analyzing time-series data at massive scale through the MapReduce programming model. For these reasons, we have designed an efficient, distributed computing framework: R2Time.
### Implementation of R2Time

```
Master Thesis:
http://brage.bibsys.no/xmlui/handle/11250/181819

Published Paper, CloudCom 2014:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7037792
```

## Prerequisite Installation for R2Time
1. RHIPE

   Installation steps are given at the following link:
   https://www.datadr.org/install.html

2. OpenTSDB

   Installation steps are given at the following link:
   http://opentsdb.net/docs/build/html/installation.html#id1

Check that OpenTSDB is running successfully:
```
http://127.0.0.1:4242
```

## Installation of R2Time
```
$ git clone https://github.com/bikash/R2Time.git
$ cd R2Time
$ R CMD INSTALL r2time_1.0.tar.gz
```

To run R2Time, r2time.jar, hbase.jar, zookeeper.jar, and asynchbase.jar must be copied to your HDFS location. Using the rhput command from RHIPE, we can copy them to HDFS:
```
$ R
> library(Rhipe)
> rhinit()
> rhput("src_location_hbase_jar", "hdfs_location")
> rhput("src_location_zookeeper_jar", "hdfs_location")
> rhput("src_location_r2time_jar", "hdfs_location")
> rhput("src_location_asynchbase_jar", "hdfs_location")
```

r2time.jar can be downloaded from GitHub.
### Example

Now let's run a simple count example in R2Time.

```
# Load all the necessary libraries
library(r2time)
library(Rhipe)
rhinit()        ## Initialize the RHIPE framework.
library(rJava)
r2t.init()      ## Initialize the R2Time framework.
library(bitops) ## Bit operations, used for conversion between float and integer numbers.
library(gtools)

tagk = c("host") ## Tag keys. Can be a list.
tagv = c("*") ## Tag values. Can be a list; multiple values can be separated by a pipe.
metric = 'r2time.load.test1' ## Metric name; multiple metrics can be assigned.
startdate = '2000/01/01-00:00:00' ## Start datetime of the time series.
enddate = '2003/01/31-10:00:00'   ## End datetime of the time series.
outputdir = "/home/bikash/tmp/mean/ex1.1" ## Output file location; must be in the HDFS file system.
jobname = "Calculation for number of DP for 75 million Data points with 2 node" ## Assign a relevant job description.
mapred <- list(mapred.reduce.tasks=0) ## MapReduce configuration: the number of mappers and reducers for the task. Here it is 0, as no reducer is required.
# Location of the jar files in the HDFS file system. Replace "/home/ekstern/haisen/bikash/tmp/r2time.jar" with your hdfs_location of the jar files.
jars = c("/home/ekstern/haisen/bikash/tmp/r2time.jar", "/home/ekstern/haisen/bikash/tmp/zookeeper.jar", "/home/ekstern/haisen/bikash/tmp/hbase.jar")
# These jars need to be in the HDFS file system. You can copy jars to HDFS using the RHIPE rhput command.
## ZooKeeper configuration. For HBase to read data, the ZooKeeper quorum must be defined.
zooinfo = list(zookeeper.znode.parent='/hbase', hbase.zookeeper.quorum='haisen24.ux.uis.no')
## Map function: counts the number of data points in each map value.
map <- expression({
  library(bitops)
  library(r2time)
  library(gtools)
  m <- lapply(seq_along(map.values), function(r) {
    attr <- names(map.values[[r]])
    length(attr)
  })
  rhcollect(1, sum(unlist(m)))
})

# Reduce function
reduce <- expression(
  pre    = { len <- 0 },
  reduce = { len <- len + sum(sapply(reduce.values, function(x) sum(x))) },
  post   = { rhcollect(reduce.key, len) }
)
## Run job.
r2t.job(table='tsdb', sdate=startdate, edate=enddate, metrics=metric, tagk=tagk, tagv=tagv, jars=jars, zooinfo=zooinfo, output=outputdir, jobname=jobname, mapred=mapred, map=map, reduce=reduce, setup=NULL)
t = rhread(outputdir)
```

### LIST OF FUNCTIONS
r2t.job:
Submit a job to MapReduce.
Input parameters:
```
table   = name of the HBase table, by default 'tsdb'
sdate   = start date of the metric
edate   = end date of the metric
metrics = name of the metric(s)
tagk    = list of tag keys
tagv    = list of tag values
jars    = list of jar files in the HDFS location
output  = path in HDFS where the output result is stored
zooinfo = ZooKeeper information
jobname = name of the job, by default "MapReduce Job"
map     = map function
reduce  = reduce function; if no reducer is needed, assign 0
setup   = initialization function run before the map function
```
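The parameter list above notes that `reduce` can be set to 0 when no reducer is needed. A minimal map-only job might then look like the sketch below; the dates, metric name, jar paths, quorum host, and output path are all placeholder assumptions for illustration, not values shipped with R2Time:

```r
library(r2time)
library(Rhipe)
rhinit()    ## Initialize the RHIPE framework.
r2t.init()  ## Initialize the R2Time framework.

## Map function: emit the number of records seen by this mapper under one key.
map <- expression({
  rhcollect(1, length(map.values))
})

r2t.job(table   = 'tsdb',
        sdate   = '2014/01/01-00:00:00',  ## placeholder start datetime
        edate   = '2014/01/02-00:00:00',  ## placeholder end datetime
        metrics = 'sys.cpu.user',         ## placeholder metric name
        tagk    = c('host'),
        tagv    = c('*'),
        jars    = c('/tmp/r2time.jar', '/tmp/hbase.jar',
                    '/tmp/zookeeper.jar'),          ## placeholder HDFS paths
        zooinfo = list(zookeeper.znode.parent = '/hbase',
                       hbase.zookeeper.quorum = 'localhost'),  ## placeholder quorum
        output  = '/tmp/r2time-out',      ## placeholder HDFS output path
        jobname = 'count datapoints',
        mapred  = list(mapred.reduce.tasks = 0),  ## map-only: no reducer
        map     = map,
        reduce  = 0,                       ## no reduce step, per the list above
        setup   = NULL)

res <- rhread('/tmp/r2time-out')  ## Read the map output back from HDFS.
```

With `mapred.reduce.tasks = 0` the map output is written to HDFS directly, so the counts emitted by each mapper appear as separate key/value pairs in the result.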