Scan large DynamoDB tables faster with parallelism
https://github.com/shelfio/dynamodb-parallel-scan
aws dynamodb npm-package
- Host: GitHub
- URL: https://github.com/shelfio/dynamodb-parallel-scan
- Owner: shelfio
- License: mit
- Created: 2019-02-01T19:27:16.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-12-22T04:50:44.000Z (20 days ago)
- Last Synced: 2024-12-27T08:32:07.255Z (15 days ago)
- Topics: aws, dynamodb, npm-package
- Language: TypeScript
- Size: 337 KB
- Stars: 72
- Watchers: 23
- Forks: 10
- Open Issues: 15
Metadata Files:
- Readme: readme.md
- License: license
README
# dynamodb-parallel-scan [![CircleCI](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master.svg?style=svg)](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master) ![](https://img.shields.io/badge/code_style-prettier-ff69b4.svg) [![npm (scoped)](https://img.shields.io/npm/v/@shelf/dynamodb-parallel-scan.svg)](https://www.npmjs.com/package/@shelf/dynamodb-parallel-scan)
> Scan DynamoDB table concurrently (up to 1,000,000 segments), recursively read all items from every segment
[A blog post going into details about this library.](https://vladholubiev.medium.com/how-to-scan-a-23-gb-dynamodb-table-in-1-minute-110730879e2b)
## Install
```
$ yarn add @shelf/dynamodb-parallel-scan
```

This library has 2 peer dependencies:

- `@aws-sdk/client-dynamodb`
- `@aws-sdk/lib-dynamodb`

Make sure to install them alongside this library.
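For example, the peer dependencies can be added the same way (assuming yarn, as above):

```sh
$ yarn add @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
```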
## Why this is better than a regular scan
**Easily parallelize** scan requests to fetch all items from a table at once.
This is useful when you need to scan a large table to find a small number of items that will fit into Node.js memory.

**Scan huge tables using an async generator** or stream.
And yes, it supports stream backpressure!
Useful when you need to process a large number of items while you scan them.
It lets you receive chunks of scanned items, wait until you process them, and then resume scanning when you're ready.

## Usage
### Fetch everything at once
```js
const {parallelScan} = require('@shelf/dynamodb-parallel-scan');

(async () => {
const items = await parallelScan(
{
TableName: 'files',
FilterExpression: 'attribute_exists(#fileSize)',
ExpressionAttributeNames: {
'#fileSize': 'fileSize',
},
ProjectionExpression: 'fileSize',
},
{concurrency: 1000}
  );

  console.log(items);
})();
```

### Use as async generator (or streams)
Note: `highWaterMark` is a threshold on the number of buffered items, so a parallel scan can fetch up to `concurrency` \* 1MB of additional data even after `highWaterMark` has been reached.
```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

(async () => {
const stream = await parallelScanAsStream(
{
TableName: 'files',
FilterExpression: 'attribute_exists(#fileSize)',
ExpressionAttributeNames: {
'#fileSize': 'fileSize',
},
ProjectionExpression: 'fileSize',
},
{concurrency: 1000, chunkSize: 10000, highWaterMark: 10000}
  );

  for await (const items of stream) {
console.log(items); // 10k items here
}
})();
```

## Read
- [Taking Advantage of Parallel Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-query-scan.html)
- [Working with Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html)

![](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/ParallelScan.png)
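The parallel scan pattern described in the AWS docs above splits a table into `TotalSegments` independent segments and scans each one concurrently. A minimal sketch of that fan-out, with a mocked scan over an in-memory table (`scanSegment`, `parallelScanSketch`, and `data` are illustrative, not part of this library):

```js
// Mock "table" of 10 items standing in for a DynamoDB table.
const data = Array.from({length: 10}, (_, i) => ({id: i}));

// Stands in for a real Scan call with {Segment, TotalSegments}:
// each segment sees a disjoint slice of the table.
async function scanSegment(segment, totalSegments) {
  return data.filter(item => item.id % totalSegments === segment);
}

// Fan out one scan per segment, run them concurrently, merge results.
async function parallelScanSketch(totalSegments) {
  const segments = Array.from({length: totalSegments}, (_, i) => i);
  const results = await Promise.all(
    segments.map(s => scanSegment(s, totalSegments))
  );
  return results.flat();
}

parallelScanSketch(4).then(items => console.log(items.length)); // 10
```

With a real table, each `scanSegment` call would also follow `LastEvaluatedKey` pagination within its segment, which is what this library handles for you.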
## Publish
```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```

## License
MIT © [Shelf](https://shelf.io)