https://github.com/qubole/presto-udfs
Plugin for Presto to allow addition of user functions easily
- Host: GitHub
- URL: https://github.com/qubole/presto-udfs
- Owner: qubole
- Created: 2015-03-04T20:02:33.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2021-03-31T20:07:29.000Z (over 3 years ago)
- Last Synced: 2024-07-20T23:44:22.952Z (4 months ago)
- Language: Java
- Size: 114 KB
- Stars: 115
- Watchers: 21
- Forks: 65
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Presto User-Defined Functions (UDFs)
Plugin for Presto that allows the addition of user-defined functions. The plugin simplifies the process of adding user functions to Presto.

## Plugging in Presto UDFs
Details on how to plug in Presto UDFs can be found [here](https://www.qubole.com/blog/product/plugging-in-presto-udfs/?nabe=5695374637924352:1).

## Presto Version Compatibility
| Presto Version| Last Compatible Release|
| ---------------- |:----------:|
| _ver 300+_ | current |
| _ver 0.193-0.2xx_| udfs-2.0.3 |
| _ver 0.180_ | udfs-2.0.2 |
| _ver 0.157_ | udfs-2.0.1 |
| _ver 0.142_ | udfs-1.0.0 |
| _ver 0.119_ | udfs-0.1.3 |

## Implemented User Defined Functions
The repository contains the following UDFs implemented for Presto:

#### HIVE UDFs
* **DATE-TIME Functions**
1. **to_utc_timestamp(timestamp, string timezone) -> timestamp**
Assumes given timestamp is in given timezone and converts to UTC (as of Hive 0.8.0). For example, to_utc_timestamp('1970-01-01 00:00:00','PST') returns 1970-01-01 08:00:00.
2. **from_utc_timestamp(timestamp, string timezone) -> timestamp**
Assumes given timestamp is UTC and converts to given timezone (as of Hive 0.8.0). For example, from_utc_timestamp('1970-01-01 08:00:00','PST') returns 1970-01-01 00:00:00.
3. **unix_timestamp() -> timestamp**
Gets current Unix timestamp in seconds.
4. **year(string date) -> int**
Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970.
5. **month(string date) -> int**
Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.
6. **day(string date) -> int**
Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1.
7. **hour(string date) -> int**
Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.
8. **minute(string date) -> int**
Returns the minute of the timestamp: minute('2009-07-30 12:58:59') = 58, minute('12:58:59') = 58.
9. **second(string date) -> int**
Returns the second of the timestamp: second('2009-07-30 12:58:59') = 59, second('12:58:59') = 59.
10. **to_date(string timestamp) -> string**
Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
11. **weekofyear(string date) -> int**
Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44.
12. **date_sub(string startdate, int days) -> string**
Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'.
13. **date_add(string startdate, int days) -> string**
Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'.
14. **datediff(string enddate, string startdate) -> string**
Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2.
15. **format_unixtimestamp(bigint unixtime[, string format]) -> string**
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" unless a format string is specified. If a format string is specified the epoch time is converted in the specified format. More information about the formatter can be found [here](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html).
_**NOTE:** Presto 0.142's `from_unixtime(bigint unixtime)` returns the value as a timestamp type, whereas Hive's `from_unixtime(bigint unixtime[, string format])` returns the value as a string type and supports a formatter. Because of this name collision, the Hive UDF has been implemented as `format_unixtimestamp(bigint unixtime[, string format])`._
16. **from_duration(string duration, string duration_unit) -> double**
Converts a string representing time duration in airlift's Duration format (https://github.com/airlift/units/blob/master/src/main/java/io/airlift/units/Duration.java) to a double representing time in specified unit: from_duration('4h', 'ms') = 1.44E7.
17. **from_datasize(string datasize, string size_unit) -> double**
Converts a string representing data size in airlift's DataSize format (https://github.com/airlift/units/blob/master/src/main/java/io/airlift/units/DataSize.java) to a double representing size in the specified unit: from_datasize('1GB', 'B') = 1.073741824E9.

* **MATH Functions**
1. **pmod(INT a, INT b) -> INT, pmod(DOUBLE a, DOUBLE b) -> DOUBLE**
Returns the positive value of a mod b: pmod(17, -5) = -3.
2. **rands(INT seed) -> DOUBLE**
Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed will make sure the generated random number sequence is deterministic: rands(3) = 0.731057369148862
_**NOTE:** Presto 0.142's `rand(int a)` returns a number between 0 and a, whereas Hive's `rand(int seed)` sets the seed for the random number generator. Because of this name collision, the Hive UDF has been implemented as `rands(int seed)`._
3. **bin(BIGINT a) -> STRING**
Returns the number in binary format: bin(100) = 1100100.
4. **hex(BIGINT a) -> STRING, hex(STRING a) -> STRING, hex(BINARY a) -> STRING**
If the argument is a BIGINT or BINARY, hex returns the value as a STRING in hexadecimal format. If the argument is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING: hex(123) = 7b, hex('123') = 7b, hex('1100100') = 64.
5. **unhex(STRING a) -> BINARY**
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of the number: unhex('7b') = 1111011.

* **STRING Functions**
1. **locate(string substr, string str[, int pos]) -> int**
Returns the position of the first occurrence of substr in str after position pos: locate('si', 'mississipi', 2) = 4, locate('si', 'mississipi', 5) = 7
2. **find_in_set(string str, string strList) -> int**
Returns the position of the first occurrence of str in strList, where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas: find_in_set('ab', 'abc,b,ab,c,def') returns 3.
3. **instr(string str, string substr) -> int**
Returns the position of the first occurrence of substr in str. Returns null if either argument is null and returns 0 if substr could not be found in str: instr('mississipi' , 'si') = 4.

* **CONDITIONAL Functions**
1. **nvl(T value, T default_value) -> T**
_**NOTE:** Supported only up to v1.0.0 due to the limitations that newer versions of Presto put on plugins._
Returns default_value if value is null, otherwise returns value: nvl(3,4) = 3, nvl(NULL,4) = 4.

* **MISCELLANEOUS Functions**
1. **hash(a1[, a2...]) -> int**
_**NOTE:** Supported only up to v1.0.0 due to the limitations that newer versions of Presto put on plugins._
Returns a hash value of the arguments: hash('a','b','c') = 143025634.

## Adding User Defined Functions to Presto-UDFs
Functions can be added using annotations; see https://prestosql.io/docs/current/develop/functions.html for details on how to add functions.

_**NOTE:** Code-generated functions were supported only up to v1.0.0 due to the limitations that newer versions of Presto put on plugins._
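
An annotated scalar function is, underneath, a static Java method. The sketch below shows only the plain-Java logic for three of the functions documented in this README (`pmod`, `find_in_set`, `locate`). It deliberately omits the Presto annotations described in the linked guide so that it compiles without the Presto SPI on the classpath. The class name `UdfLogicSketch` and the method names are hypothetical, and the bodies are assumptions: `pmod` is assumed to follow Hive's `((a % b) + b) % b` formula, and `locate` is assumed to search starting at `pos` inclusive. This is not the plugin's actual source.

```java
/**
 * Plain-Java sketch of the logic behind a few UDFs from this README.
 * Hypothetical names; not taken from the presto-udfs source.
 */
public class UdfLogicSketch {

    // pmod: assumed to follow Hive's formula ((a % b) + b) % b.
    // With a negative divisor the result can be negative, e.g. pmod(17, -5).
    public static long pmod(long a, long b) {
        return ((a % b) + b) % b;
    }

    // find_in_set: 1-based index of str in a comma-delimited list.
    // Returns null if either argument is null, and 0 if str contains
    // a comma or is not present in the list.
    public static Integer findInSet(String str, String strList) {
        if (str == null || strList == null) {
            return null;
        }
        if (str.indexOf(',') >= 0) {
            return 0;
        }
        String[] items = strList.split(",", -1);
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(str)) {
                return i + 1;
            }
        }
        return 0;
    }

    // locate: 1-based position of the first occurrence of substr in str,
    // searching from the 1-based position pos (assumed inclusive).
    // Returns 0 when substr is not found or pos is less than 1.
    public static int locate(String substr, String str, int pos) {
        if (pos < 1) {
            return 0;
        }
        return str.indexOf(substr, pos - 1) + 1;
    }

    public static void main(String[] args) {
        System.out.println(pmod(17, -5));
        System.out.println(findInSet("ab", "abc,b,ab,c,def"));
        System.out.println(locate("si", "mississipi", 2));
        System.out.println(locate("si", "mississipi", 5));
    }
}
```

To turn a method like these into a Presto function, the guide linked above describes annotating it (for example with `@ScalarFunction` and `@SqlType`) and registering the class through the plugin.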
## Release a new version of presto-udfs
Releases are always created from `master`. During development, `master`
has a version like `X.Y.Z-SNAPSHOT`.

    # Change version as per http://semver.org/
    mvn release:prepare -Prelease
    mvn release:perform -Prelease
    git push
    git push --tags