https://github.com/lyft/dynamodb-hive-serde
Hive Deserializer for DynamoDB backup data format
- Host: GitHub
- URL: https://github.com/lyft/dynamodb-hive-serde
- Owner: lyft
- License: other
- Created: 2015-08-05T23:31:18.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-09-25T14:34:06.000Z (over 1 year ago)
- Last Synced: 2025-04-05T00:41:20.016Z (9 months ago)
- Topics: lyft
- Language: Java
- Homepage:
- Size: 16.6 KB
- Stars: 8
- Watchers: 638
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# dynamodb-hive-serde
Hive Deserializer for DynamoDB backup data format.
When AWS Data Pipeline is used to [export backups](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-pipelinejson-verifydata2.html) of DynamoDB tables, the resulting file format is awkward to parse in Hive. This custom deserializer lets Hive read those files directly, without any pre-processing.
To use it, install the DynamoDbSerDe jar and specify the DynamoDB SerDe as the row format in your table definition. Declare the DynamoDB attribute names you want to access as columns, each with the type it should have. For each line of data, the SerDe locates the columns you specified and coerces their values into the declared types.
Example query:
```sql
ADD jar /path/to/jar/dynamodb-hive-serde-1.0-SNAPSHOT.jar;
CREATE EXTERNAL TABLE dynamodb (id string, updated_at string, created_at string, version int)
ROW FORMAT SERDE 'com.lyft.hive.serde.DynamoDbSerDe'
LOCATION '/dynamodb/input/';
```
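The "locate the columns and coerce the values" step can be pictured with a small Java sketch. This is only an illustration and not the SerDe's actual code: the class `CoercionSketch`, its methods, the set of supported types, and the choice to map a missing attribute to `NULL` are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the dynamodb-hive-serde source.
class CoercionSketch {

    // Convert one raw attribute value into the Java type matching a Hive column type.
    // Assumption: an attribute that is absent from the record becomes NULL.
    static Object coerce(String raw, String hiveType) {
        if (raw == null) {
            return null;
        }
        switch (hiveType) {
            case "string":  return raw;
            case "int":     return Integer.valueOf(raw);
            case "bigint":  return Long.valueOf(raw);
            case "double":  return Double.valueOf(raw);
            case "boolean": return Boolean.valueOf(raw);
            default:
                throw new IllegalArgumentException("unsupported type: " + hiveType);
        }
    }

    // Build a row holding only the declared columns, in declaration order.
    // Attributes that are not declared as columns are ignored.
    static Map<String, Object> toRow(Map<String, String> attributes,
                                     Map<String, String> schema) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : schema.entrySet()) {
            row.put(column.getKey(), coerce(attributes.get(column.getKey()), column.getValue()));
        }
        return row;
    }

    public static void main(String[] args) {
        // Column names and types mirror the example table above.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "string");
        schema.put("updated_at", "string");
        schema.put("created_at", "string");
        schema.put("version", "int");

        // A made-up record: two declared attributes plus one undeclared attribute.
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("id", "abc");
        attributes.put("version", "3");
        attributes.put("extra", "ignored");

        System.out.println(toRow(attributes, schema));
    }
}
```

In this sketch the declared columns drive the lookup, so the table definition decides which attributes appear in a row and in what type.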
## Timestamp format
You can specify a custom timestamp format with the `input.timestamp.format` SerDe property. The pattern is used to construct a [Joda Time DateTimeFormatter](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html). For example:
```sql
CREATE EXTERNAL TABLE dynamodb (id string, updated_at timestamp, created_at timestamp, version int)
ROW FORMAT SERDE 'com.lyft.hive.serde.DynamoDbSerDe'
WITH SERDEPROPERTIES ('input.timestamp.format'='yyyy-MM-dd\'T\'HH:mm:ss.SSSSSSZ')
LOCATION '/dynamodb/input/';
```
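To see what kind of value a pattern like the one above accepts, the following Java sketch parses a sample string with it. Two things here are assumptions: the sample timestamp is made up, and the sketch uses the JDK's `java.time` API as a stand-in so that it runs without extra dependencies, whereas the SerDe itself uses Joda Time. The two libraries may treat some pattern letters differently, so take this as a rough guide only.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustration only: java.time stands in for Joda Time (assumption).
class TimestampFormatSketch {

    // The same pattern string as in the 'input.timestamp.format' example,
    // without the SQL backslash escapes around the literal T.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ";

    // Parse a raw string such as "2015-08-05T23:31:18.000000+0000".
    static OffsetDateTime parse(String raw) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(PATTERN, Locale.ROOT);
        return OffsetDateTime.parse(raw, formatter);
    }

    public static void main(String[] args) {
        // Hypothetical sample value with six fractional digits and a +HHMM offset.
        OffsetDateTime ts = parse("2015-08-05T23:31:18.000000+0000");
        System.out.println(ts.toInstant()); // prints 2015-08-05T23:31:18Z
    }
}
```

A value that does not match the pattern, for example one with a missing offset, makes `parse` throw a `DateTimeParseException` in this sketch.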
## Building
First, install Maven, then run:
```sh
mvn package
```