Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hbutani/spark-datetime
functionstest
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/hbutani/spark-datetime
- Owner: hbutani
- License: apache-2.0
- Created: 2015-06-23T02:53:25.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-10-25T00:56:49.000Z (about 8 years ago)
- Last Synced: 2024-01-06T10:10:56.956Z (11 months ago)
- Language: Scala
- Size: 1.2 MB
- Stars: 33
- Watchers: 9
- Forks: 11
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# Spark DateTime Library
A library for exposing dateTime functions from the [joda time library](http://www.joda.org/joda-time/) as SQL
functions. It also provides a dsl for dateTime Catalyst expressions; this utilizes the
[nscala-time wrapper library](https://github.com/nscala-time/nscala-time).

[![Build Status](https://travis-ci.org/SparklineData/spark-datetime.svg?branch=master)](https://travis-ci.org/SparklineData/spark-datetime)
## Requirements
This library requires Spark 1.4+
## Linking
You can link against this library in your program at the following coordinates:

```
groupId: org.sparklinedata
artifactId: spark-datetime_2.10
version: 0.0.1
```
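For example, with sbt the dependency could be declared as follows (a minimal sketch; `%%` appends your Scala binary version, resolving to spark-datetime_2.10 on Scala 2.10, and a resolver for wherever the artifact is hosted may also be needed):

```scala
// build.sbt (sketch): pull in the library using the coordinates listed above
libraryDependencies += "org.sparklinedata" %% "spark-datetime" % "0.0.1"
```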
## Using with Spark shell
This package can be added to Spark using the `--packages` command line option. For example, to include it when starting the Spark shell:

```
$ bin/spark-shell --packages org.sparklinedata:spark-datetime_2.10:0.0.1
```

## Features
* A set of functions from the joda library to operate on dates.
  * `field access`: all functions in the [DateTime class](http://www.joda.org/joda-time/apidocs/org/joda/time/DateTime.html) are
    available as SQL functions. The first argument is the DateTime object on which the function is to be applied.
  * `construction`: functions are available to convert a String or an epoch value to a DateTime.
  * `comparison`: functions are available to compare dates (=, <, <=, >, >=), and also to compare against __now__.
  * `arithmetic`: functions are available to add/subtract a [Period](http://www.joda.org/joda-time/apidocs/org/joda/time/Period.html)
    to/from dates.
  * `intervals`: functions are available to construct [Intervals](http://www.joda.org/joda-time/apidocs/org/joda/time/Interval.html)
    and to compare (contains, overlaps, abuts, gap) intervals and dateTimes.
* A _dsl_ for dateTime Catalyst expressions.
* A _StringContext_ to embed date expressions in SQL statements.

### Function naming convention
* Getter functions on the [DateTime class](http://www.joda.org/joda-time/apidocs/org/joda/time/DateTime.html) are exposed with the same
name, in camelCase, minus the _get_ prefix. So _getYear_ is exposed as _year_, _getMonthOfYear_ is exposed as _monthOfYear_, etc.
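For instance, under this convention the year and month fields can be read directly in SQL (a small sketch; it assumes the functions have been registered as shown in the DSL examples below):

```scala
// sketch: getYear and getMonthOfYear surface as the SQL functions year and monthOfYear;
// assumes Functions.register(sqlContext) has already been called
val t = sql("select dt, year(dateTime(dt)), monthOfYear(dateTime(dt)) from input")
```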
### SQL API
Assume you have a table _input_ with a string column called _dt_:

```sql
select dt, dateTime(dt), dayOfWeek(dateTime(dt)), dayOfWeekName(dateTime(dt)), dayOfWeekName(dateTimeWithTZ(dt))
from input
```

### Date Expressions using the DSL
#### A basic example
```scala
import com.github.nscala_time.time.Imports._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.sparklinedata.spark.dateTime.dsl.expressions._
import org.sparklinedata.spark.dateTime.Functions

// register all functions
Functions.register(sqlContext)

val dT = dateTime('dt)
val dOW = dateTime('dt) dayOfWeek
val dOWNm = dateTime('dt) dayOfWeekName
val dOWNm2 = dateTimeWithTZ('dt) dayOfWeekName
val dTFixed = dateTime("2015-05-22T08:52:41.903-07:00")

val t = sql(date"select dt, $dT, $dOW, $dOWNm, $dOWNm2, $dTFixed," +
  " dateTime(\"2015-05-22T08:52:41.903-07:00\") from input")
```

#### An example about periods
```scala
import com.github.nscala_time.time.Imports._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.sparklinedata.spark.dateTime.dsl.expressions._

val dT = dateTime('dt)
val dT1 = dateTime('dt) + 3.months
val dT2 = dateTime('dt) - 3.months
val dT3 = dateTime('dt) + 12.week
val cE = dateTime('dt) + 3.months > (dateTime('dt) + 12.week)

val t = sql(date"select dt, $dT, $dT1, $dT2, $dT3, $cE from input")
```

#### Weekend filter example
```scala
val filter: Expression = ((dateTime('dt) dayOfWeekName) === "Saturday") ||
  ((dateTime('dt) dayOfWeekName) === "Sunday")

val t = sql(date"select dt from input where $filter")
```

#### Group By example
```scala
val dayOfWeek: Expression = dateTime('dt) dayOfWeekName

val t = sql(date"select $dayOfWeek, count(*) from input group by $dayOfWeek")
```

#### Interval example
```scala
val i1 = END_DATE - 15.day to END_DATE - 10.day
val isBefore = i1 isBeforeE dateTime('dt)
val isAfter = i1 isAfterE dateTime('dt)
val i2 = dateTime('dt) to (dateTime('dt) + 5.days)
val overlapsE = i1 overlapsE i2
val abutsE = i1 abutsE i2

val t = sql(date"select dt, $isBefore, $isAfter, $overlapsE, $abutsE from input")
```

#### Time Bucketing
Use this feature to bucket dates into given Periods, e.g. 8.hours, 30.mins, 2.days, etc.

The following example buckets rows into 3-day periods. The *bucket* function on a DateExpression
takes an _origin_ date and a Period specification. The Period is an
[ISO 8601 period specification](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/ISOPeriodFormat.html#standard\(\)).

```scala
val start = dateTime("2015-06-23T17:27:43.769-07:00")
val dT = dateTime('dt)
val timeBucket = dateTime('dt) bucket(start, 3.days)

val t = sql(date"select dt, $dT, $timeBucket from input")
```

The direct SQL for the above query is:
```sql
select dt, dateTime(`dt`),
timeBucket(dateTime(`dt`),dateTime("2015-06-23T17:27:43.769-07:00"),period("P3D"))
from input
```

## Building From Source
This library is built with [SBT](http://www.scala-sbt.org/0.13/docs/Command-Line-Reference.html), which is
automatically downloaded by the included shell script.
To build a JAR file simply run `build/sbt package` from the project root.
The build configuration includes support for both Scala 2.10 and 2.11.
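A cross-build along these lines in the sbt configuration would support that (a sketch; the specific patch versions are assumptions, not taken from the project's build file):

```scala
// build.sbt (sketch): compile against both supported Scala versions;
// `build/sbt +package` then produces spark-datetime_2.10 and spark-datetime_2.11 JARs
crossScalaVersions := Seq("2.10.6", "2.11.8")
```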