https://github.com/bikash/r2time
R connector for OpenTSDB: Analyzing large time-series data in R environment using data-intensive capabilities.
- Host: GitHub
- URL: https://github.com/bikash/r2time
- Owner: bikash
- Created: 2014-02-19T15:56:24.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2016-05-03T09:04:32.000Z (over 9 years ago)
- Last Synced: 2025-04-02T17:53:08.732Z (6 months ago)
- Topics: hbase, mapreduce, opentsdb, timeseries
- Language: Java
- Homepage:
- Size: 60.4 MB
- Stars: 6
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
R2Time: an R connector to HBase for time-series data stored by OpenTSDB.

Analyzing time-series datasets at massive scale is one of the biggest challenges data scientists face. This tool is used for analyzing large time-series data stored by OpenTSDB, an open-source, distributed, and scalable time-series database. Tools currently available for time-series analysis are time- and memory-consuming, and no single tool specializes in providing an efficient implementation for analyzing time-series data at massive scale through the MapReduce programming model. For these reasons, we have designed an efficient, distributed computing framework: R2Time.
### Implementation of R2Time

```
Master Thesis:
http://brage.bibsys.no/xmlui/handle/11250/181819

Published Paper, CloudCom 2014:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7037792
```

## Prerequisite Installation for R2Time
1. RHIPE

   Installation steps are given at the following link:
   https://www.datadr.org/install.html

2. OpenTSDB

   Installation steps are given at the following link:
   http://opentsdb.net/docs/build/html/installation.html#id1

Check that OpenTSDB is running successfully:
```
http://127.0.0.1:4242
```

## Installation of R2Time
```
$ git clone https://github.com/bikash/R2Time.git
$ cd R2Time
$ R CMD INSTALL r2time_1.0.tar.gz
```

To run R2Time, r2time.jar, hbase.jar, zookeeper.jar, and asynchbase.jar must be copied to your HDFS location. Using the rhput command from RHIPE, we can copy them to HDFS:
```
$ R
> library(Rhipe)
> rhinit()
> rhput("src_location_hbase_jar", "hdfs_location")
> rhput("src_location_zookeeper_jar", "hdfs_location")
> rhput("src_location_r2time_jar", "hdfs_location")
> rhput("src_location_asynchbase_jar", "hdfs_location")
```

r2time.jar can be downloaded from GitHub.
### Example

Now let's run a simple count example in R2Time.

```
# Load all the necessary libraries
library(r2time)
library(Rhipe)
rhinit()        ## Initialize the RHIPE framework.
library(rJava)
r2t.init()      ## Initialize the R2Time framework.
library(bitops) ## Bit operations, used for conversion between float and integer numbers.
library(gtools)

tagk = c("host") ## Tag keys. Can be a list.
tagv = c("*") ## Tag values. Can be a list; multiple values can be separated by a pipe.
metric = 'r2time.load.test1' ## Metric name; multiple metrics can be assigned.
startdate = '2000/01/01-00:00:00' ## Start datetime of the time series.
enddate = '2003/01/31-10:00:00'   ## End datetime of the time series.
outputdir = "/home/bikash/tmp/mean/ex1.1" ## Output file location; must be in the HDFS file system.
jobname = "Calculation for number of DP for 75 million Data points with 2 node" ## Assign a relevant job description.
mapred <- list(mapred.reduce.tasks=0) ## MapReduce configuration: the number of mappers and reducers for the task. Here it is 0, as no reducer is required.
# Location of the jar files in the HDFS file system. Replace "/home/ekstern/haisen/bikash/tmp/r2time.jar" with your hdfs_location of the jar files.
jars = c("/home/ekstern/haisen/bikash/tmp/r2time.jar", "/home/ekstern/haisen/bikash/tmp/zookeeper.jar", "/home/ekstern/haisen/bikash/tmp/hbase.jar")
# These jars need to be in the HDFS file system. You can copy jars to HDFS using the RHIPE rhput command.
## ZooKeeper configuration. For HBase to read data, the ZooKeeper quorum must be defined.
zooinfo = list(zookeeper.znode.parent='/hbase', hbase.zookeeper.quorum='haisen24.ux.uis.no')
## Map function: counts the number of data points in each map value.
map <- expression({
  library(bitops)
  library(r2time)
  library(gtools)
  m <- lapply(seq_along(map.values), function(r) {
    attr <- names(map.values[[r]])
    length(attr)
  })
  rhcollect(1, sum(unlist(m)))
})

# Reduce function
reduce <- expression(
  pre    = { len <- 0 },
  reduce = { len <- len + sum(sapply(reduce.values, function(x) sum(x))) },
  post   = { rhcollect(reduce.key, len) }
)
## Run job.
r2t.job(table='tsdb', sdate=startdate, edate=enddate, metrics=metric, tagk=tagk, tagv=tagv, jars=jars, zooinfo=zooinfo, output=outputdir, jobname=jobname, mapred=mapred, map=map, reduce=reduce, setup=NULL)
t = rhread(outputdir)
```

### LIST OF FUNCTIONS
r2t.job:
Submit a job to MapReduce.
Input parameters:
```
table   = name of the HBase table, by default 'tsdb'
sdate   = start date of the metric
edate   = end date of the metric
metrics = name of the metric(s)
tagk    = list of tag keys
tagv    = list of tag values
jars    = list of jar files in the HDFS location
output  = path in HDFS where the output result is stored
zooinfo = ZooKeeper information
jobname = name of the job, by default "MapReduce Job"
map     = map function
reduce  = reduce function; if no reducer is needed, assign 0
setup   = initialization function run before the map function
```
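The parameter list above notes that `reduce` can be set to 0 when no reducer is needed. A minimal map-only job might then look like the sketch below; the dates, metric name, jar paths, quorum host, and output path are all placeholder assumptions for illustration, not values shipped with R2Time:

```r
library(r2time)
library(Rhipe)
rhinit()    ## Initialize the RHIPE framework.
r2t.init()  ## Initialize the R2Time framework.

## Map function: emit the number of records seen by this mapper under one key.
map <- expression({
  rhcollect(1, length(map.values))
})

r2t.job(table   = 'tsdb',
        sdate   = '2014/01/01-00:00:00',  ## placeholder start datetime
        edate   = '2014/01/02-00:00:00',  ## placeholder end datetime
        metrics = 'sys.cpu.user',         ## placeholder metric name
        tagk    = c('host'),
        tagv    = c('*'),
        jars    = c('/tmp/r2time.jar', '/tmp/hbase.jar',
                    '/tmp/zookeeper.jar'),          ## placeholder HDFS paths
        zooinfo = list(zookeeper.znode.parent = '/hbase',
                       hbase.zookeeper.quorum = 'localhost'),  ## placeholder quorum
        output  = '/tmp/r2time-out',      ## placeholder HDFS output path
        jobname = 'count datapoints',
        mapred  = list(mapred.reduce.tasks = 0),  ## map-only: no reducer
        map     = map,
        reduce  = 0,                       ## no reduce step, per the list above
        setup   = NULL)

res <- rhread('/tmp/r2time-out')  ## Read the map output back from HDFS.
```

With `mapred.reduce.tasks = 0` the map output is written to HDFS directly, so the counts emitted by each mapper appear as separate key/value pairs in the result.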