https://github.com/lyft/dynamodb-hive-serde

Hive Deserializer for DynamoDB backup data format
Host: GitHub
URL: https://github.com/lyft/dynamodb-hive-serde
Owner: lyft
License: other
Created: 2015-08-05T23:31:18.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2024-09-25T14:34:06.000Z (over 1 year ago)
Last Synced: 2025-04-05T00:41:20.016Z (9 months ago)
Topics: lyft
Language: Java
Homepage:
Size: 16.6 KB
Stars: 8
Watchers: 638
Forks: 4
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# dynamodb-hive-serde
Hive Deserializer for DynamoDB backup data format.

When AWS Data Pipeline is used to [export backups](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-pipelinejson-verifydata2.html) of DynamoDB tables, the resulting file format is difficult to parse in Hive. This custom deserializer lets Hive process those files directly, without any pre-processing.

Install the DynamoDbSerDe jar and specify `com.lyft.hive.serde.DynamoDbSerDe` as the row format in your table definitions. Declare the DynamoDB attribute names you want to access as columns, each with the type it should have. For each line of data, the SerDe locates the columns you declared and coerces their values into the declared types.

Example query:
```sql
ADD jar /path/to/jar/dynamodb-hive-serde-1.0-SNAPSHOT.jar;

CREATE EXTERNAL TABLE dynamodb (id string, updated_at string, created_at string, version int)
ROW FORMAT SERDE 'com.lyft.hive.serde.DynamoDbSerDe'
LOCATION '/dynamodb/input/';
```
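
Once the table is defined, it can be queried like any other Hive table. The following is a minimal sketch, assuming the `dynamodb` table from the example above and backup files that actually contain a `version` attribute:

```sql
-- Count rows per version. Only the columns declared in the
-- CREATE EXTERNAL TABLE statement are visible to the query.
SELECT version, COUNT(*) AS row_count
FROM dynamodb
GROUP BY version
ORDER BY version;
```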

## Timestamp format
A custom timestamp format can be set with the `input.timestamp.format` SerDe property. The pattern is used to construct a [Joda Time DateTimeFormatter](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html). For example:
```sql
CREATE EXTERNAL TABLE dynamodb (id string, updated_at timestamp, created_at timestamp, version int)
ROW FORMAT SERDE 'com.lyft.hive.serde.DynamoDbSerDe'
WITH SERDEPROPERTIES ('input.timestamp.format'='yyyy-MM-dd\'T\'HH:mm:ss.SSSSSSZ')
LOCATION '/dynamodb/input/';
```
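
With `updated_at` and `created_at` declared as `timestamp`, the columns can be compared and sorted as timestamps rather than strings. The query below is a sketch that assumes the table definition above; the cutoff date is an arbitrary illustration. Under the pattern shown, a stored value would be expected to look like `2015-08-05T23:31:18.000000+0000`.

```sql
-- Ten most recently updated rows since an illustrative cutoff date.
SELECT id, version, updated_at
FROM dynamodb
WHERE updated_at >= CAST('2015-08-01 00:00:00' AS timestamp)
ORDER BY updated_at DESC
LIMIT 10;
```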

## Building
First, install Maven, then run:
```
mvn package
```
