https://github.com/shelfio/dynamodb-parallel-scan
Scan large DynamoDB tables faster with parallelism
- Host: GitHub
- URL: https://github.com/shelfio/dynamodb-parallel-scan
- Owner: shelfio
- License: mit
- Created: 2019-02-01T19:27:16.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-04-09T22:23:42.000Z (about 1 year ago)
- Last Synced: 2025-04-09T23:26:32.210Z (about 1 year ago)
- Topics: aws, dynamodb, npm-package
- Language: TypeScript
- Size: 336 KB
- Stars: 73
- Watchers: 22
- Forks: 11
- Open Issues: 15
Metadata Files:
- Readme: readme.md
- License: license
# dynamodb-parallel-scan [CircleCI](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master) [npm](https://www.npmjs.com/package/@shelf/dynamodb-parallel-scan)
> Scan a DynamoDB table concurrently (up to 1,000,000 segments) and recursively read all items from every segment
[A blog post going into details about this library.](https://vladholubiev.medium.com/how-to-scan-a-23-gb-dynamodb-table-in-1-minute-110730879e2b)
## Install
```
$ yarn add @shelf/dynamodb-parallel-scan
```
This library has 2 peer dependencies:
- `@aws-sdk/client-dynamodb`
- `@aws-sdk/lib-dynamodb`
Make sure to install them alongside this library.
## Why this is better than a regular scan
**Easily parallelize** scan requests to fetch all items from a table at once.
This is useful when you need to scan a large table to find a small number of items that fit in Node.js memory.
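As a rough illustration of what parallelizing a scan involves, the sketch below splits a scan into segments with the DynamoDB Scan parameters `Segment` and `TotalSegments` and follows `LastEvaluatedKey` within each segment. It is not this library's implementation: `scanSegment`, `naiveParallelScan`, and `makeFakeScanPage` are hypothetical names, and the in-memory `scanPage` stand-in replaces real DynamoDB calls so the example runs without AWS access.

```js
// Hypothetical helper: reads one segment to the end by following LastEvaluatedKey.
// `scanPage` is an assumed function that takes Scan params and resolves to
// {Items, LastEvaluatedKey}, shaped like a DynamoDB Scan response.
async function scanSegment(scanPage, params, segment, totalSegments) {
  const items = [];
  let ExclusiveStartKey;
  do {
    const page = await scanPage({
      ...params,
      Segment: segment,
      TotalSegments: totalSegments,
      ExclusiveStartKey,
    });
    for (const item of page.Items || []) {
      items.push(item);
    }
    ExclusiveStartKey = page.LastEvaluatedKey;
  } while (ExclusiveStartKey);
  return items;
}

// Hypothetical helper: starts all segments at once and merges their results.
async function naiveParallelScan(scanPage, params, totalSegments) {
  const perSegment = await Promise.all(
    Array.from({length: totalSegments}, (_, segment) =>
      scanSegment(scanPage, params, segment, totalSegments)
    )
  );
  return perSegment.flat();
}

// In-memory stand-in for DynamoDB (an assumption for this example only):
// item i belongs to segment i % TotalSegments, and pages hold at most `pageSize` items.
function makeFakeScanPage(allItems, pageSize = 2) {
  return async ({Segment, TotalSegments, ExclusiveStartKey}) => {
    const own = allItems.filter((_, i) => i % TotalSegments === Segment);
    const start = ExclusiveStartKey ? ExclusiveStartKey.offset : 0;
    const next = start + pageSize;
    return {
      Items: own.slice(start, next),
      LastEvaluatedKey: next < own.length ? {offset: next} : undefined,
    };
  };
}

const fakeTable = Array.from({length: 10}, (_, id) => ({id}));
naiveParallelScan(makeFakeScanPage(fakeTable), {TableName: 'files'}, 3).then(items => {
  console.log(items.length); // prints 10
});
```

Against a real table, `scanPage` could plausibly be `params => docClient.send(new ScanCommand(params))`, with `ScanCommand` from `@aws-sdk/lib-dynamodb`. The sketch holds every item in memory, so it only suits the "small number of items" case described above.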
**Scan huge tables using async generator** or stream.
And yes, it supports stream backpressure!
Useful when you need to process a large number of items while you scan them.
It lets you receive chunks of scanned items, waits while you process them, and resumes scanning when you're ready.
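To make the consumer-driven pacing concrete, here is a minimal sketch using a plain async generator. `chunkedSource` and `consume` are hypothetical names, and the generator stands in for the scan. The library itself fetches concurrently and buffers items, so its behavior differs in detail, but the core idea is the same: the next chunk is requested only after the consumer finishes the current one.

```js
// Hypothetical producer: builds the next chunk only when the consumer asks for it.
async function* chunkedSource(totalItems, chunkSize, log) {
  for (let start = 0; start < totalItems; start += chunkSize) {
    log.push(`fetch ${start}`);
    const end = Math.min(start + chunkSize, totalItems);
    yield Array.from({length: end - start}, (_, i) => ({id: start + i}));
  }
}

// Hypothetical consumer: simulates slow processing between chunks.
async function consume(totalItems, chunkSize) {
  const log = [];
  for await (const chunk of chunkedSource(totalItems, chunkSize, log)) {
    await new Promise(resolve => setTimeout(resolve, 5)); // pretend to do work
    log.push(`processed ${chunk.length}`);
  }
  return log;
}

consume(5, 2).then(log => console.log(log.join(', ')));
// prints "fetch 0, processed 2, fetch 2, processed 2, fetch 4, processed 1"
```

The interleaved log shows that no chunk is produced while the previous one is still being processed.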
## Usage
### Fetch everything at once
```js
const {parallelScan} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const items = await parallelScan(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000}
  );

  console.log(items);
})();
```
### Use as async generator (or streams)
Note: `highWaterMark` is a threshold on the number of items, not on their size, so the parallel scan can fetch up to `concurrency` \* 1 MB more data even after `highWaterMark` has been reached.
```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const stream = await parallelScanAsStream(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000, chunkSize: 10000, highWaterMark: 10000}
  );

  for await (const items of stream) {
    console.log(items); // 10k items here
  }
})();
```
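The note on `highWaterMark` above implies a memory budget worth estimating before choosing `concurrency`. The arithmetic below is a sketch under the assumption that every in-flight request returns a full page. `worstCaseOvershootBytes` is a hypothetical helper, not part of the library.

```js
// DynamoDB returns at most 1 MB of data per Scan request.
const MAX_SCAN_PAGE_BYTES = 1024 * 1024;

// Hypothetical helper: upper bound on the data that in-flight requests can
// still deliver after the highWaterMark threshold has been reached.
function worstCaseOvershootBytes(concurrency) {
  return concurrency * MAX_SCAN_PAGE_BYTES;
}

const bytes = worstCaseOvershootBytes(1000);
console.log(`${bytes / MAX_SCAN_PAGE_BYTES} MB`); // prints "1000 MB"
```

With `concurrency: 1000`, as in the example above, the bound is roughly 1 GB. In practice the overshoot is often much smaller, because the 1 MB limit applies before `FilterExpression` is evaluated and only matching items are returned.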
## Read
- [Taking Advantage of Parallel Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-query-scan.html)
- [Working with Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html)

## Publish
```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```
## License
MIT © [Shelf](https://shelf.io)