# dynamodb-parallel-scan [![CircleCI](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master.svg?style=svg)](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master) ![](https://img.shields.io/badge/code_style-prettier-ff69b4.svg) [![npm (scoped)](https://img.shields.io/npm/v/@shelf/dynamodb-parallel-scan.svg)](https://www.npmjs.com/package/@shelf/dynamodb-parallel-scan)

> Scan a DynamoDB table concurrently (up to 1,000,000 segments) and recursively read all items from every segment

[A blog post going into more detail about this library.](https://vladholubiev.medium.com/how-to-scan-a-23-gb-dynamodb-table-in-1-minute-110730879e2b)

## Install

```sh
$ yarn add @shelf/dynamodb-parallel-scan
```

This library has 2 peer dependencies:

- `@aws-sdk/client-dynamodb`
- `@aws-sdk/lib-dynamodb`

Make sure to install them alongside this library.
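
For example, with yarn (the same works with `npm install`):

```sh
$ yarn add @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
```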

## Why this is better than a regular scan

**Easily parallelize** scan requests to fetch all items from a table at once.
This is useful when you need to scan a large table to find a small subset of items that fits into Node.js memory.

**Scan huge tables using async generator** or stream.
And yes, it supports stream backpressure!
Useful when you need to process a large number of items as you scan them.
It lets you receive chunks of scanned items, wait until you've processed them, and then resume scanning when you're ready.

## Usage

### Fetch everything at once

```js
const {parallelScan} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const items = await parallelScan(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000}
  );

  console.log(items);
})();
```

### Use as async generator (or streams)

Note: `highWaterMark` determines the item count threshold, so a parallel scan can fetch up to `concurrency` × 1 MB of additional data even after `highWaterMark` has been reached.

```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const stream = await parallelScanAsStream(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000, chunkSize: 10000, highWaterMark: 10000}
  );

  for await (const items of stream) {
    console.log(items); // 10k items here
  }
})();
```
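
Because the stream respects backpressure, awaiting asynchronous work inside the loop pauses further scanning until the current chunk is handled. A minimal sketch of this pattern, where `processChunk` is a hypothetical async function standing in for your own processing logic:

```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

// Hypothetical placeholder for your own async processing logic,
// e.g. writing each batch of items to another datastore
async function processChunk(items) {
  // ...
}

(async () => {
  const stream = await parallelScanAsStream(
    {TableName: 'files'},
    {concurrency: 250, chunkSize: 1000, highWaterMark: 1000}
  );

  for await (const items of stream) {
    // Scanning pauses while this promise is pending and resumes afterwards
    await processChunk(items);
  }
})();
```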

## Read

- [Taking Advantage of Parallel Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-query-scan.html)
- [Working with Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html)

![](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/ParallelScan.png)

## Publish

```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```

## License

MIT © [Shelf](https://shelf.io)