https://github.com/qubole/presto-udfs

Plugin for Presto to allow addition of user functions easily
Host: GitHub
URL: https://github.com/qubole/presto-udfs
Owner: qubole
Created: 2015-03-04T20:02:33.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2021-03-31T20:07:29.000Z (over 3 years ago)
Last Synced: 2024-07-20T23:44:22.952Z (4 months ago)
Language: Java
Size: 114 KB
Stars: 115
Watchers: 21
Forks: 65
Open Issues: 1
Metadata Files:
- Readme: README.md
        
# Presto User-Defined Functions (UDFs)

A plugin for Presto that simplifies the process of adding user-defined functions (UDFs).

## Plugging in Presto UDFs

Details on how to plug in Presto UDFs are in [this Qubole blog post](https://www.qubole.com/blog/product/plugging-in-presto-udfs/?nabe=5695374637924352:1).

## Presto Version Compatibility

| Presto Version    | Last Compatible Release |
| ----------------- |:-----------------------:|
| _ver 300+_        | current                 |
| _ver 0.193-0.2xx_ | udfs-2.0.3              |
| _ver 0.180_       | udfs-2.0.2              |
| _ver 0.157_       | udfs-2.0.1              |
| _ver 0.142_       | udfs-1.0.0              |
| _ver 0.119_       | udfs-0.1.3              |

## Implemented User Defined Functions

The repository contains the following UDFs implemented for Presto:

#### HIVE UDFs

* **DATE-TIME Functions**

 1. **to_utc_timestamp(timestamp, string timezone) -> timestamp** 


      Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0). For example, to_utc_timestamp('1970-01-01 00:00:00','PST') returns 1970-01-01 08:00:00.

 2. **from_utc_timestamp(timestamp, string timezone) -> timestamp**


      Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0). For example, from_utc_timestamp('1970-01-01 08:00:00','PST') returns 1970-01-01 00:00:00.

 3. **unix_timestamp() -> timestamp**


      Gets current Unix timestamp in seconds.

 4. **year(string date) -> int**


      Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970.

 5. **month(string date) -> int**


      Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.

 6. **day(string date) -> int**


      Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1.

 7. **hour(string date) -> int**


      Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.

 8. **minute(string date) -> int**


      Returns the minute of the timestamp: minute('2009-07-30 12:58:59') = 58, minute('12:58:59') = 58.

 9. **second(string date) -> int**


      Returns the second of the timestamp: second('2009-07-30 12:58:59') = 59, second('12:58:59') = 59.

 10. **to_date(string timestamp) -> string**


      Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"

 11. **weekofyear(string date) -> int**


      Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44.

 12. **date_sub(string startdate, int days) -> string**


      Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'.

 13. **date_add(string startdate, int days) -> string**


      Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'.

 14. **datediff(string enddate, string startdate) -> string**


      Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2.

 15. **format_unixtimestamp(bigint unixtime[, string format]) -> string**


      Converts the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone. The default format is "1970-01-01 00:00:00"; if a format string is specified, the epoch time is formatted according to it instead. More information about the formatter can be found [here](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html).


      _**NOTE:** Presto 0.142's `from_unixtime(bigint unixtime)` returns a timestamp, whereas Hive's `from_unixtime(bigint unixtime[, string format])` returns a string and supports a format argument. To avoid the name collision, the Hive UDF is implemented as `format_unixtimestamp(bigint unixtime[, string format])`._

 16. **from_duration(string duration, string duration_unit) -> double**


      Converts a string representing time duration in airlift's Duration format (https://github.com/airlift/units/blob/master/src/main/java/io/airlift/units/Duration.java) to a double representing time in specified unit: from_duration('4h', 'ms') = 1.44E7.

 17. **from_datasize(string datasize, string size_unit) -> double**


       Converts a string representing data size in airlift's DataSize format (https://github.com/airlift/units/blob/master/src/main/java/io/airlift/units/DataSize.java) to a double representing size in specified unit: from_datasize('1GB', 'B') = 1.073741824E9.
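The time-zone conversions and day arithmetic described above can be approximated with `java.time`. The following is a minimal sketch of the documented semantics, not the plugin's source: the class `DateTimeUdfSketch` and its method names are hypothetical, and resolving `PST` through `ZoneId.SHORT_IDS` is an assumption about how short zone names are interpreted.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class; it only mirrors the behaviour described in this README.
final class DateTimeUdfSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Assumption: short IDs such as "PST" are resolved through the JDK alias map.
    private static ZoneId zone(String timezone) {
        return ZoneId.of(timezone, ZoneId.SHORT_IDS);
    }

    // Treats the wall-clock value as local to `timezone` and re-expresses it in UTC.
    static String toUtcTimestamp(String timestamp, String timezone) {
        return LocalDateTime.parse(timestamp, FORMAT)
                .atZone(zone(timezone))
                .withZoneSameInstant(ZoneOffset.UTC)
                .format(FORMAT);
    }

    // Treats the wall-clock value as UTC and re-expresses it in `timezone`.
    static String fromUtcTimestamp(String timestamp, String timezone) {
        return LocalDateTime.parse(timestamp, FORMAT)
                .atZone(ZoneOffset.UTC)
                .withZoneSameInstant(zone(timezone))
                .format(FORMAT);
    }

    // Number of whole days from startdate to enddate (both "yyyy-MM-dd").
    static long datediff(String enddate, String startdate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startdate), LocalDate.parse(enddate));
    }
}
```

With these definitions, `toUtcTimestamp("1970-01-01 00:00:00", "PST")` yields `1970-01-01 08:00:00`, and `datediff("2009-03-01", "2009-02-27")` yields `2`, matching the examples in the list above.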

* **MATH Functions**

 1. **pmod(INT a, INT b) -> INT, pmod(DOUBLE a, DOUBLE b) -> DOUBLE**


      Returns the positive value of a mod b: pmod(17, -5) = -3.

 2. **rands(INT seed) -> DOUBLE**


      Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed will make sure the generated random number sequence is deterministic: rands(3) = 0.731057369148862 


      _**NOTE:** Presto 0.142's `rand(int a)` returns a number between 0 and a, whereas Hive's `rand(int seed)` sets the seed for the random number generator. To avoid the name collision, the Hive UDF is implemented as `rands(int seed)`._

 3. **bin(BIGINT a) -> STRING**


      Returns the number in binary format: bin(100) = 1100100.

 4. **hex(BIGINT a) -> STRING, hex(STRING a) -> STRING, hex(BINARY a) -> STRING**


      If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. Otherwise, if the argument is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING: hex(123) = 7b, hex('123') = 7b, hex('1100100') = 64.

 5. **unhex(STRING a) -> BINARY**


      Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number: unhex('7b') = 1111011.
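The `pmod` example above returns a negative value because the result takes the sign of the divisor. The sketch below illustrates this, assuming the common `((a % b) + b) % b` formulation; it also shows `bin` and the numeric form of `hex` using standard `Long` helpers. The class `MathUdfSketch` is hypothetical and is not taken from the plugin's source.

```java
// Hypothetical helper class; it only mirrors the behaviour described in this README.
final class MathUdfSketch {
    // Assumed formulation: shift the Java remainder by b, then reduce again.
    // For a positive b the result is never negative; for a negative b it is never positive.
    static long pmod(long a, long b) {
        return ((a % b) + b) % b;
    }

    // Binary representation of a number, without leading zeros.
    static String bin(long a) {
        return Long.toBinaryString(a);
    }

    // Lower-case hexadecimal representation of a number.
    static String hex(long a) {
        return Long.toHexString(a);
    }
}
```

Here `pmod(17, -5)` evaluates as `((2) + (-5)) % -5`, which is `-3`, while `pmod(-7, 3)` gives `2`; `bin(100)` gives `1100100` and `hex(123)` gives `7b`.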

* **STRING Functions**

 1. **locate(string substr, string str[, int pos]) -> int** 


      Returns the position of the first occurrence of substr in str after position pos: locate('si', 'mississipi', 2) = 4, locate('si', 'mississipi', 5) = 7

 2. **find_in_set(string str, string strList) -> int** 


      Returns the position of the first occurrence of str in strList, where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas: find_in_set('ab', 'abc,b,ab,c,def') returns 3.

 3. **instr(string str, string substr) -> int** 


      Returns the position of the first occurrence of substr in str. Returns null if either argument is null, and returns 0 if substr is not found in str: instr('mississipi' , 'si') = 4.
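All three string functions use 1-based positions and return 0 for "not found". The sketch below shows one way to express that with `String.indexOf`; the class `StringUdfSketch` is hypothetical, null handling is shown only for `findInSet`, and none of this is taken from the plugin's source.

```java
// Hypothetical helper class; it only mirrors the behaviour described in this README.
final class StringUdfSketch {
    // Searches from the 1-based position `pos`; indexOf returns -1 on a miss, so a miss maps to 0.
    static int locate(String substr, String str, int pos) {
        return str.indexOf(substr, pos - 1) + 1;
    }

    // Same search as locate, but always from the start of the string.
    static int instr(String str, String substr) {
        return str.indexOf(substr) + 1;
    }

    // 1-based index of `str` within a comma-delimited list.
    // Returns null for null input and 0 when `str` contains a comma or is absent.
    static Integer findInSet(String str, String strList) {
        if (str == null || strList == null) {
            return null;
        }
        if (str.contains(",")) {
            return 0;
        }
        String[] items = strList.split(",", -1);
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(str)) {
                return i + 1;
            }
        }
        return 0;
    }
}
```

For instance, `locate("si", "mississipi", 2)` gives `4`, `locate("si", "mississipi", 5)` gives `7`, and `findInSet("ab", "abc,b,ab,c,def")` gives `3`.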

* **CONDITIONAL Functions**

  1. **nvl(T value, T default_value) -> T**


      _**NOTE:** Supported only up to v1.0.0 because of the limitations that newer versions of Presto place on plugins._

      Returns default value if value is null else returns value: nvl(3,4) = 3, nvl(NULL,4) = 4.

* **MISCELLANEOUS Functions**

  1. **hash(a1[, a2...]) -> int**


      _**NOTE:** Supported only up to v1.0.0 because of the limitations that newer versions of Presto place on plugins._

      Returns a hash value of the arguments. hash('a','b','c') = 143025634.

## Adding User Defined Functions to Presto-UDFs

Functions can be added using annotations; see https://prestosql.io/docs/current/develop/functions.html for details on how to add functions.

_**NOTE:** Code-generated functions were supported only up to v1.0.0 because of the limitations that newer versions of Presto place on plugins._

## Release a new version of presto-udfs

Releases are always created from `master`. During development, `master` has a version like `X.Y.Z-SNAPSHOT`.

    # Change version as per http://semver.org/
    mvn release:prepare -Prelease
    mvn release:perform -Prelease
    git push
    git push --tags