https://github.com/vilcek/HiveKVStorageHandler2

Hive Storage Handler for Oracle NoSQL Database v2

Host: GitHub
URL: https://github.com/vilcek/HiveKVStorageHandler2
Owner: vilcek
License: apache-2.0
Created: 2013-03-11T16:35:42.000Z (over 12 years ago)
Default Branch: master
Last Pushed: 2014-04-14T23:18:35.000Z (about 11 years ago)
Last Synced: 2025-04-11T21:14:30.648Z (3 months ago)
Language: Java
Size: 8.45 MB
Stars: 2
Watchers: 2
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.html
- License: LICENSE

Awesome Lists containing this project

awesome-hive - Oracle NoSQL

README

HiveKVStorageHandler

About HiveKVStorageHandler:

This is an implementation of a Storage Handler to query data stored in Oracle NoSQL Database via Hive.

Note: This version works only with Oracle NoSQL Database v2.x.

Written by: Alexandre Vilcek

If you want to know what a Hive Storage Handler is:

Hive Storage Handler

Current Limitations:

Supports only external non-native Hive tables.

Writing data to Oracle NoSQL DB is not supported yet.

Parsing of the Hive SerDe properties is still very rudimentary: spaces between the NoSQL DB key names in the key mapping properties of the Hive CREATE TABLE statement will cause those key names to be misinterpreted.

Column names and types specified in the Hive table definition are ignored; the column names are defined only by the NoSQL DB major and minor key mappings in the Hive CREATE TABLE statement.

A NoSQL DB value for a given key is always interpreted as a string in the Hive table.
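To illustrate the spacing limitation, here is a minimal Java sketch, assuming the mapping value is split on commas without trimming. The class MappingParseSketch and its methods are hypothetical and are not taken from the handler's source; they only show how "lastname, firstname" can turn into a key name with a leading space, and how trimming would avoid it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the handler's actual code): why spaces in a
// "kv.major.keys.mapping" / "kv.minor.keys.mapping" value can break key names.
public class MappingParseSketch {

    // Naive parsing: split on commas only, so surrounding spaces stay in the names.
    static List<String> naiveSplit(String mapping) {
        return Arrays.asList(mapping.split(","));
    }

    // Tolerant parsing: also strip whitespace around each comma and at both ends.
    static List<String> trimmedSplit(String mapping) {
        return Arrays.asList(mapping.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        String withSpaces = "lastname, firstname";
        System.out.println(naiveSplit(withSpaces));   // [lastname,  firstname]
        System.out.println(trimmedSplit(withSpaces)); // [lastname, firstname]
        // With naive parsing the second name is " firstname", which no longer matches.
        System.out.println(naiveSplit(withSpaces).contains("firstname"));   // false
        System.out.println(trimmedSplit(withSpaces).contains("firstname")); // true
    }
}
```

Until the parsing is improved, the safe choice is to write the mappings with no spaces at all, e.g. "lastname,firstname".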

Hive CREATE TABLE Syntax:

CREATE EXTERNAL TABLE <hive_table_name> (column_name column_type, column_name column_type, ...)

STORED BY 'org.vilcek.hive.kv.KVHiveStorageHandler'

WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "<majorKey1,majorKey2,...>", "kv.minor.keys.mapping" = "<minorKey1,minorKey2,...>")

TBLPROPERTIES ("kv.host.port" = "<kvstore hostname>:<kvstore port number>", "kv.name" = "<kvstore name>");

Example:

Data stored in Oracle NoSQL Database:

/Smith/Bob/-/birthdate: 05/02/1975

/Smith/Bob/-/phonenumber: 1111-1111

/Smith/Bob/-/userid: 1

/Smith/Patricia/-/birthdate: 10/25/1967

/Smith/Patricia/-/phonenumber: 2222-2222

/Smith/Patricia/-/userid: 2

/Wong/Bill/-/birthdate: 03/10/1982

/Wong/Bill/-/phonenumber: 3333-3333

/Wong/Bill/-/userid: 3
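In the keys above, the components before "-" form the major key path (e.g. Smith/Bob) and the components after it form the minor key path (e.g. birthdate). The following minimal Java sketch performs that split with plain string handling. The class KeyPathSketch is hypothetical; it ignores any escaping of "/" inside key components and is not how the handler or the Oracle NoSQL driver necessarily parses keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a key string such as "/Smith/Bob/-/birthdate"
// into its major path and minor path. Escaped characters are not handled.
public class KeyPathSketch {

    static final class KeyPath {
        final List<String> major;
        final List<String> minor;

        KeyPath(List<String> major, List<String> minor) {
            this.major = major;
            this.minor = minor;
        }
    }

    static KeyPath parse(String key) {
        if (!key.startsWith("/")) {
            throw new IllegalArgumentException("key must start with '/': " + key);
        }
        // Drop the leading '/' and split into path components.
        List<String> parts = Arrays.asList(key.substring(1).split("/"));
        // The "-" component separates the major path from the minor path.
        int sep = parts.indexOf("-");
        if (sep < 0) {
            // No "-": the whole key is a major path with an empty minor path.
            return new KeyPath(new ArrayList<>(parts), new ArrayList<>());
        }
        return new KeyPath(new ArrayList<>(parts.subList(0, sep)),
                           new ArrayList<>(parts.subList(sep + 1, parts.size())));
    }

    public static void main(String[] args) {
        KeyPath k = parse("/Smith/Bob/-/birthdate");
        System.out.println(k.major); // [Smith, Bob]
        System.out.println(k.minor); // [birthdate]
    }
}
```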

Table definition and query in Hive:

hive> ADD JAR HiveKVStorageHandler.jar;

hive> CREATE EXTERNAL TABLE nosqldbtest (lastname string, firstname string, birthdate string, phonenumber string, userid string)

STORED BY 'org.vilcek.hive.kv.KVHiveStorageHandler'

WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "lastname,firstname", "kv.minor.keys.mapping" = "birthdate,phonenumber,userid")

TBLPROPERTIES ("kv.host.port" = "localhost:5000", "kv.name" = "kvstore");

hive> SELECT * FROM nosqldbtest;

OK

Smith	Patricia	10/25/1967	NULL	NULL
Smith	Patricia	NULL	2222-2222	NULL
Smith	Patricia	NULL	NULL	2
Smith	Bob	05/02/1975	NULL	NULL
Smith	Bob	NULL	1111-1111	NULL
Smith	Bob	NULL	NULL	1
Wong	Bill	03/10/1982	NULL	NULL
Wong	Bill	NULL	3333-3333	NULL
Wong	Bill	NULL	NULL	3
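The sample output above shows that each key/value pair becomes its own row: the major key components fill the major-mapped columns, and only the column named by the minor key receives the value, while the other minor-mapped columns are NULL. The Java sketch below models that behavior. It is inferred from the output, not taken from the handler's source, and it assumes single-component minor keys; the class RowMappingSketch and method toRow are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how one key/value pair could become one Hive row,
// inferred from the sample output (not the handler's actual implementation).
public class RowMappingSketch {

    static List<String> toRow(List<String> majorMapping, List<String> minorMapping,
                              List<String> majorPath, List<String> minorPath, String value) {
        List<String> row = new ArrayList<>();
        // Major key components fill the major-mapped columns in order.
        for (int i = 0; i < majorMapping.size(); i++) {
            row.add(i < majorPath.size() ? majorPath.get(i) : null);
        }
        // Only the minor-mapped column named by the (single) minor key component
        // receives the value; every other minor-mapped column is NULL.
        String minorName = minorPath.isEmpty() ? null : minorPath.get(0);
        for (String column : minorMapping) {
            row.add(column.equals(minorName) ? value : null);
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> major = List.of("lastname", "firstname");
        List<String> minor = List.of("birthdate", "phonenumber", "userid");
        // Models the pair /Smith/Bob/-/phonenumber: 1111-1111
        System.out.println(toRow(major, minor, List.of("Smith", "Bob"),
                                 List.of("phonenumber"), "1111-1111"));
        // [Smith, Bob, null, 1111-1111, null]
    }
}
```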

Note: Consider setting Hive's execution mode to local when working with small datasets. This avoids the overhead of launching MapReduce jobs, and in some cases the query will execute much faster. A better way to do this is to let Hive decide when to run jobs locally:

hive> set hive.exec.mode.local.auto=true;

hive> SELECT lastname, firstname, collect_set(birthdate)[0], collect_set(phonenumber)[0], collect_set(userid)[0]

FROM nosqldbtest

GROUP BY lastname, firstname;

OK

Smith	Bob	05/02/1975	1111-1111	1
Smith	Patricia	10/25/1967	2222-2222	2
Wong	Bill	03/10/1982	3333-3333	3
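The GROUP BY query works because Hive's collect_set skips NULLs: for each (lastname, firstname) group, every column has exactly one non-NULL value across the sparse rows, so collect_set(...)[0] picks that value. The plain-Java sketch below mimics that merge. It is an illustration only; the class PivotSketch is hypothetical, and "keep the first non-NULL value" is a deterministic stand-in for collect_set(...)[0], whose element order is not guaranteed when a set holds more than one value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Mimics GROUP BY lastname, firstname with collect_set(col)[0] on the sparse rows:
// NULLs are skipped and the single non-NULL value per column is kept.
public class PivotSketch {

    // Each row: [lastname, firstname, birthdate, phonenumber, userid]
    static Map<List<String>, String[]> pivot(List<List<String>> rows) {
        Map<List<String>, String[]> groups = new LinkedHashMap<>();
        for (List<String> row : rows) {
            List<String> key = List.of(row.get(0), row.get(1));
            String[] merged = groups.computeIfAbsent(key, k -> new String[row.size() - 2]);
            for (int i = 2; i < row.size(); i++) {
                // Like collect_set, ignore NULLs; keep the first non-NULL value seen.
                if (row.get(i) != null && merged[i - 2] == null) {
                    merged[i - 2] = row.get(i);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
            Arrays.asList("Smith", "Bob", "05/02/1975", null, null),
            Arrays.asList("Smith", "Bob", null, "1111-1111", null),
            Arrays.asList("Smith", "Bob", null, null, "1"));
        pivot(rows).forEach((k, v) -> System.out.println(k + " " + Arrays.toString(v)));
        // [Smith, Bob] [05/02/1975, 1111-1111, 1]
    }
}
```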
