https://github.com/amazon-archives/dynamodb-import-export-tool

Exports DynamoDB items via parallel scan into a blocking queue, then consumes the queue and import DynamoDB items into a replica table using asynchronous writes.
https://github.com/amazon-archives/dynamodb-import-export-tool

Last synced: 3 months ago
JSON representation

Exports DynamoDB items via parallel scan into a blocking queue, then consumes the queue and import DynamoDB items into a replica table using asynchronous writes.

Host: GitHub
URL: https://github.com/amazon-archives/dynamodb-import-export-tool
Owner: amazon-archives
License: apache-2.0
Archived: true
Created: 2015-08-12T17:41:50.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2019-07-30T22:13:07.000Z (almost 6 years ago)
Last Synced: 2024-04-11T02:52:08.666Z (over 1 year ago)
Language: Java
Size: 48.8 KB
Stars: 90
Watchers: 28
Forks: 42
Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

README

        # DynamoDB Import Export Tool

The DynamoDB Import Export Tool is designed to perform parallel scans on the source table, store scan results in a queue, then consume the queue by writing the items asynchronously to a destination table.

## Requirements ##

* Maven

* JRE 1.7+

* Pre-existing source and destination DynamoDB tables

## Running as an executable

1. Build the library:

```

    mvn install

```

2. This produces the target jar in the target/ directory, to start the replication process:

java -jar dynamodb-import-export-tool.jar

--destinationEndpoint  // the DynamoDB endpoint where the destination table is located.

--destinationTable  // the destination table to write to.

--sourceEndpoint  // the endpoint where the source table is located.

--sourceTable // the source table to read from.

--readThroughputRatio  // the ratio of read throughput to consume from the source table.

--writeThroughputRatio  // the ratio of write throughput to consume from the destination table.

--maxWriteThreads  // (Optional, default=128 * Available_Processors) Maximum number of write threads to create.

--totalSections  // (Optional, default=1) Total number of sections to split the bootstrap into. Each application will only scan and write one section.

--section  // (Optional, default=0) section to read and write. Only will scan this one section of all sections, [0...totalSections-1].

--consistentScan  // (Optional, default=false) indicates whether consistent scan should be used when reading from the source table.

> **NOTE**: To split the replication process across multiple machines, simply use the totalSections & section command line arguments, where each machine will run one section out of [0 ... totalSections-1].

## Using the API

### 1. Transfer Data from One DynamoDB Table to Another DynamoDB Table

The below example will read from "mySourceTable" at 100 reads per second, using 4 threads. And it will write to "myDestinationTable" at 50 writes per second, using 8 threads.

Both tables are located at "dynamodb.us-west-1.amazonaws.com". (to transfer to a different region, create 2 AmazonDynamoDBClients

with different endpoints to pass into the DynamoDBBootstrapWorker and the DynamoDBConsumer.

```java

AmazonDynamoDBClient client = new AmazonDynamoDBClient(new ProfileCredentialsProvider());

client.setEndpoint("dynamodb.us-west-1.amazonaws.com");

DynamoDBBootstrapWorker worker = null;

try {

    // 100.0 read operations per second. 4 threads to scan the table.

    worker = new DynamoDBBootstrapWorker(client,

                100.0, "mySourceTable", 4);

} catch (NullReadCapacityException e) {

    LOGGER.error("The DynamoDB source table returned a null read capacity.", e);

    System.exit(1);

}

 // 50.0 write operations per second. 8 threads to scan the table.

DynamoDBConsumer consumer = new DynamoDBConsumer(client, "myDestinationTable", 50.0, Executors.newFixedThreadPool(8));

try {

    worker.pipe(consumer);

} catch (ExecutionException e) {

    LOGGER.error("Encountered exception when executing transfer.", e);

    System.exit(1);

} catch (InterruptedException e){

    LOGGER.error("Interrupted when executing transfer.", e);

    System.exit(1);

}

```

### 2. Transfer Data From one DynamoDB Table to a Blocking Queue.

The below example will read from a DynamoDB table and export to an array blocking queue. This is useful for when another application would like to consume

the DynamoDB entries but does not have a setup application for it. They can just retrieve the queue (consumer.getQueue()) and then continually pop() from it

to then process the new entries.

```java

AmazonDynamoDBClient client = new AmazonDynamoDBClient(new ProfileCredentialsProvider());

client.setEndpoint("dynamodb.us-west-1.amazonaws.com");

DynamoDBBootstrapWorker worker = null;

try {

    // 100.0 read operations per second. 4 threads to scan the table.

    worker = new DynamoDBBootstrapWorker(client,

                100.0, "mySourceTable", 4);

} catch (NullReadCapacityException e) {

    LOGGER.error("The DynamoDB source table returned a null read capacity.", e);

    System.exit(1);

}

BlockingQueueConsumer consumer = new BlockingQueueConsumer(8);

try {

    worker.pipe(consumer);

} catch (ExecutionException e) {

    LOGGER.error("Encountered exception when executing transfer.", e);

    System.exit(1);

} catch (InterruptedException e){

    LOGGER.error("Interrupted when executing transfer.", e);

    System.exit(1);

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/amazon-archives/dynamodb-import-export-tool

Awesome Lists containing this project

README