https://github.com/vilcek/HiveKVStorageHandler2
Hive Storage Handler for Oracle NoSQL Database v2
- Host: GitHub
- URL: https://github.com/vilcek/HiveKVStorageHandler2
- Owner: vilcek
- License: apache-2.0
- Created: 2013-03-11T16:35:42.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2014-04-14T23:18:35.000Z (over 10 years ago)
- Last Synced: 2024-02-14T04:33:41.612Z (9 months ago)
- Language: Java
- Size: 8.45 MB
- Stars: 2
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.html
- License: LICENSE
Awesome Lists containing this project
- awesome-hive - Oracle NoSQL
README
HiveKVStorageHandler
About HiveKVStorageHandler:
This is an implementation of a Storage Handler to query data stored in Oracle NoSQL Database via Hive.
Note: This version works only with Oracle NoSQL Database v2.x.
Written by: Alexandre Vilcek
For background on what a Hive Storage Handler is, see the StorageHandlers page of the Apache Hive wiki.
Current Limitations:
- Supports only external non-native Hive tables.
- Writing data to Oracle NoSQL Database is not supported yet.
- Parsing of Hive SerDe properties is still rudimentary: spaces between NoSQL DB key names in the key mapping properties of the Hive CREATE TABLE statement cause key names to be misinterpreted.
- Column names and types specified in the Hive table definition are ignored; only the NoSQL DB Major and Minor Key mappings in the Hive CREATE TABLE statement define the column names.
- The NoSQL DB Value for a given key is always interpreted as a string in the Hive table.
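The whitespace limitation in the list above is easiest to see with a plain comma split. The following Java sketch is not the handler's actual parsing code; the class and method names are hypothetical. It only illustrates why a mapping such as "lastname, firstname" can produce a key name with a leading space, and how trimming each piece would avoid that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the HiveKVStorageHandler source.
class KeyMappingSketch {

    // Naive parsing: split on commas and keep each piece exactly as written.
    static List<String> splitNaive(String mapping) {
        List<String> keys = new ArrayList<>();
        for (String part : mapping.split(",")) {
            keys.add(part);
        }
        return keys;
    }

    // Tolerant parsing: trim surrounding whitespace from each key name.
    static List<String> splitTrimmed(String mapping) {
        List<String> keys = new ArrayList<>();
        for (String part : mapping.split(",")) {
            keys.add(part.trim());
        }
        return keys;
    }

    public static void main(String[] args) {
        String mapping = "lastname, firstname";
        // With the naive split, the second key is " firstname" (leading space),
        // which would not match a key component named "firstname".
        System.out.println(splitNaive(mapping));
        System.out.println(splitTrimmed(mapping));
    }
}
```

Until the handler trims key names itself, the safe choice is to write the mapping properties without any spaces after the commas.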
Hive CREATE TABLE Syntax:
CREATE EXTERNAL TABLE <hive_table_name> (column_name column_type, column_name column_type, ...)
STORED BY 'org.vilcek.hive.kv.KVHiveStorageHandler'
WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "<majorKey1,majorKey2,...>", "kv.minor.keys.mapping" = "<minorKey1,minorKey2,...>")
TBLPROPERTIES ("kv.host.port" = "<kvstore hostname>:<kvstore port number>", "kv.name" = "<kvstore name>");
Example:
Data stored in Oracle NoSQL Database:
/Smith/Bob/-/birthdate: 05/02/1975
/Smith/Bob/-/phonenumber: 1111-1111
/Smith/Bob/-/userid: 1
/Smith/Patricia/-/birthdate: 10/25/1967
/Smith/Patricia/-/phonenumber: 2222-2222
/Smith/Patricia/-/userid: 2
/Wong/Bill/-/birthdate: 03/10/1982
/Wong/Bill/-/phonenumber: 3333-3333
/Wong/Bill/-/userid: 3
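Each key in the listing above is written as a path: the Major Key components, a "-" separator, then the Minor Key components. As a rough sketch of how such a path breaks down, the Java below splits one into its two parts. The class name is hypothetical, and the code assumes that components contain no "/" characters or escape sequences; it does not use the Oracle NoSQL Database client library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; assumes simple, unescaped components.
class KeyPathSketch {
    final List<String> major = new ArrayList<>();
    final List<String> minor = new ArrayList<>();

    // Splits "/Smith/Bob/-/birthdate" into major [Smith, Bob] and minor [birthdate].
    static KeyPathSketch parse(String path) {
        KeyPathSketch key = new KeyPathSketch();
        boolean inMinor = false;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // the leading "/" produces an empty first element
            }
            if (!inMinor && part.equals("-")) {
                inMinor = true; // everything after the first "-" is the minor path
                continue;
            }
            if (inMinor) {
                key.minor.add(part);
            } else {
                key.major.add(part);
            }
        }
        return key;
    }

    public static void main(String[] args) {
        KeyPathSketch key = parse("/Smith/Bob/-/birthdate");
        System.out.println("major=" + key.major + " minor=" + key.minor);
    }
}
```

With the mappings used in the example that follows, the major components would correspond to lastname and firstname, and the minor component would name the remaining column.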
Table definition and query in Hive:
hive> ADD JAR HiveKVStorageHandler.jar;
hive> CREATE EXTERNAL TABLE nosqldbtest (lastname string, firstname string, birthdate string, phonenumber string, userid string)
STORED BY 'org.vilcek.hive.kv.KVHiveStorageHandler'
WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "lastname,firstname", "kv.minor.keys.mapping" = "birthdate,phonenumber,userid")
TBLPROPERTIES ("kv.host.port" = "localhost:5000", "kv.name" = "kvstore");
hive> SELECT * FROM nosqldbtest;
OK
Smith   Patricia   10/25/1967   NULL        NULL
Smith   Patricia   NULL         2222-2222   NULL
Smith   Patricia   NULL         NULL        2
Smith   Bob        05/02/1975   NULL        NULL
Smith   Bob        NULL         1111-1111   NULL
Smith   Bob        NULL         NULL        1
Wong    Bill       03/10/1982   NULL        NULL
Wong    Bill       NULL         3333-3333   NULL
Wong    Bill       NULL         NULL        3
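The result above contains one row per key/value pair: the Major Key components fill the leading columns, and only the column named by the Minor Key carries the value. The Java sketch below reproduces that layout under this assumption, which is inferred from the query output rather than from the handler's source. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the row layout seen in the SELECT * output.
class SparseRowSketch {

    // Builds one row: major key components first, then one slot per minor key column.
    static String[] toRow(List<String> majorColumns, List<String> minorColumns,
                          List<String> majorKey, String minorKey, String value) {
        String[] row = new String[majorColumns.size() + minorColumns.size()];
        // Major key components fill the leading columns in order.
        for (int i = 0; i < majorColumns.size() && i < majorKey.size(); i++) {
            row[i] = majorKey.get(i);
        }
        // Only the column matching the minor key receives the value;
        // every other minor key column stays null (shown as NULL in Hive).
        int index = minorColumns.indexOf(minorKey);
        if (index >= 0) {
            row[majorColumns.size() + index] = value;
        }
        return row;
    }

    public static void main(String[] args) {
        String[] row = toRow(
                List.of("lastname", "firstname"),
                List.of("birthdate", "phonenumber", "userid"),
                List.of("Smith", "Bob"), "phonenumber", "1111-1111");
        // prints [Smith, Bob, null, 1111-1111, null]
        System.out.println(Arrays.toString(row));
    }
}
```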
Note: Consider setting the Hive execution mode to local when working with small datasets. This avoids the overhead of launching MapReduce jobs, and in some cases the query executes much faster. The best way to do that is to let Hive decide when to run jobs locally:
hive> set hive.exec.mode.local.auto=true;
Because every key/value pair is returned as its own row, the rows that share a Major Key can be merged into a single row with GROUP BY and collect_set:
hive> SELECT lastname, firstname, collect_set(birthdate)[0], collect_set(phonenumber)[0], collect_set(userid)[0]
FROM nosqldbtest
GROUP BY lastname, firstname;
OK
Smith   Bob        05/02/1975   1111-1111   1
Smith   Patricia   10/25/1967   2222-2222   2
Wong    Bill       03/10/1982   3333-3333   3
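The effect of the GROUP BY query can be pictured as folding the sparse rows of each person into one row, keeping the first non-NULL value seen in every column. The Java sketch below performs that fold in memory. It is only an analogy for what the query computes, not how Hive executes it; the names are hypothetical, and groups come out in first-seen order rather than in any order Hive might produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogy for GROUP BY with collect_set(col)[0].
class MergeRowsSketch {

    // Groups rows by their first groupColumns values and keeps, for each
    // column, the first non-null value seen within the group.
    static List<String[]> merge(List<String[]> rows, int groupColumns) {
        Map<List<String>, String[]> merged = new LinkedHashMap<>();
        for (String[] row : rows) {
            List<String> groupKey =
                    new ArrayList<>(Arrays.asList(row).subList(0, groupColumns));
            String[] target = merged.computeIfAbsent(groupKey, k -> new String[row.length]);
            for (int i = 0; i < row.length; i++) {
                if (target[i] == null && row[i] != null) {
                    target[i] = row[i];
                }
            }
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"Smith", "Bob", "05/02/1975", null, null},
                new String[] {"Smith", "Bob", null, "1111-1111", null},
                new String[] {"Smith", "Bob", null, null, "1"});
        for (String[] row : merge(rows, 2)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```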