# dynamodb-parallel-scan [![CircleCI](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master.svg?style=svg)](https://circleci.com/gh/shelfio/dynamodb-parallel-scan/tree/master) ![](https://img.shields.io/badge/code_style-prettier-ff69b4.svg) [![npm (scoped)](https://img.shields.io/npm/v/@shelf/dynamodb-parallel-scan.svg)](https://www.npmjs.com/package/@shelf/dynamodb-parallel-scan)

> Scan a DynamoDB table concurrently (up to 1,000,000 segments) and recursively read all items from every segment

[A blog post going into detail about this library.](https://vladholubiev.medium.com/how-to-scan-a-23-gb-dynamodb-table-in-1-minute-110730879e2b)

## Install

```sh
$ yarn add @shelf/dynamodb-parallel-scan
```

This library has two peer dependencies:

- `@aws-sdk/client-dynamodb`
- `@aws-sdk/lib-dynamodb`

Make sure to install them alongside this library.

## Why this is better than a regular scan

**Easily parallelize** scan requests to fetch all items from a table at once.
This is useful when you need to scan a large table for a set of items small enough to fit in Node.js memory.

**Scan huge tables using an async generator** or stream.
And yes, it supports stream backpressure!
This is useful when you need to process a large number of items while you scan them.
You receive chunks of scanned items, wait until you have processed them, and then resume scanning when you're ready.

## Usage

### Fetch everything at once

```js
const {parallelScan} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const items = await parallelScan(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000}
  );

  console.log(items);
})();
```

### Use as an async generator (or stream)

Note: `highWaterMark` is a threshold counted in items, not bytes. Each of the `concurrency` in-flight Scan requests can return up to 1 MB, so Parallel Scan may fetch up to `concurrency` × 1 MB of additional data even after `highWaterMark` is reached.

```js
const {parallelScanAsStream} = require('@shelf/dynamodb-parallel-scan');

(async () => {
  const stream = await parallelScanAsStream(
    {
      TableName: 'files',
      FilterExpression: 'attribute_exists(#fileSize)',
      ExpressionAttributeNames: {
        '#fileSize': 'fileSize',
      },
      ProjectionExpression: 'fileSize',
    },
    {concurrency: 1000, chunkSize: 10000, highWaterMark: 10000}
  );

  for await (const items of stream) {
    console.log(items); // up to 10k items per chunk (chunkSize)
  }
})();
```

## Read

- [Taking Advantage of Parallel Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-query-scan.html)
- [Working with Scans](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html)

![DynamoDB parallel scan diagram](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/images/ParallelScan.png)

## Publish

```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```

## License

MIT © [Shelf](https://shelf.io)
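## How a parallel scan works (sketch)

The technique this README describes (split a table into `TotalSegments` segments, scan them concurrently, and keep following `LastEvaluatedKey` until each segment is exhausted) can be sketched in plain JavaScript. This is a minimal illustration of DynamoDB's parallel-scan technique under stated assumptions, not this library's implementation: `scanSegment`, `scanAllSegments`, and `mockScanPage` are hypothetical names, and `mockScanPage` is an in-memory stand-in for a real Scan request so the sketch runs without AWS.

```javascript
// Minimal sketch of the parallel-scan technique, NOT this library's code.
// `scanPage` is a hypothetical stand-in for one DynamoDB Scan request.

// Reads one segment page by page. Because it is an async generator, the next
// page is only requested when the consumer asks for it (backpressure).
async function* scanSegment(scanPage, params, segment, totalSegments) {
  let exclusiveStartKey;
  do {
    const page = await scanPage({
      ...params,
      Segment: segment,
      TotalSegments: totalSegments,
      ExclusiveStartKey: exclusiveStartKey,
    });
    yield page.Items ?? [];
    // DynamoDB returns LastEvaluatedKey while the segment still has data.
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
}

// Scans all segments concurrently and collects every item in memory.
async function scanAllSegments(scanPage, params, totalSegments) {
  const readSegment = async (segment) => {
    const items = [];
    for await (const pageItems of scanSegment(scanPage, params, segment, totalSegments)) {
      items.push(...pageItems);
    }
    return items;
  };
  const perSegment = await Promise.all(
    Array.from({length: totalSegments}, (_, segment) => readSegment(segment))
  );
  return perSegment.flat();
}

// Hypothetical in-memory table so the sketch runs without AWS.
const mockTable = Array.from({length: 10}, (_, i) => ({id: i}));

// Mock Scan: assigns items to segments by id and returns pages of `Limit`
// items. The `{offset}` LastEvaluatedKey is a simplification; real keys are
// the primary key of the last evaluated item.
async function mockScanPage({Segment, TotalSegments, ExclusiveStartKey, Limit = 2}) {
  const rows = mockTable.filter((item) => item.id % TotalSegments === Segment);
  const start = ExclusiveStartKey ? ExclusiveStartKey.offset : 0;
  const end = start + Limit;
  return {
    Items: rows.slice(start, end),
    LastEvaluatedKey: end < rows.length ? {offset: end} : undefined,
  };
}

scanAllSegments(mockScanPage, {TableName: 'files'}, 4).then((items) => {
  console.log(items.length); // 10
});
```

Against a real table, `scanPage` could be `(input) => docClient.send(new ScanCommand(input))`, using `DynamoDBDocumentClient` and `ScanCommand` from `@aws-sdk/lib-dynamodb`. Consuming `scanSegment` directly with `for await`, instead of collecting everything, illustrates the backpressure idea: no new Scan request is sent for a segment until the previous page has been handled.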