https://github.com/widdix/s3-getobject-accelerator

Get large objects from S3 by using parallel byte-range fetches/parts to improve performance.

Host: GitHub
URL: https://github.com/widdix/s3-getobject-accelerator
Owner: widdix
License: mit
Created: 2023-03-21T07:58:41.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-06-03T18:03:44.000Z (about 1 year ago)
Last Synced: 2025-06-24T09:07:33.438Z (about 1 year ago)
Topics: aws, aws-nodejs, aws-s3
Language: JavaScript
Homepage:
Size: 265 KB
Stars: 17
Watchers: 1
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# S3 GetObject Accelerator

Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.

> We measured a throughput of 6.5 Gbit/s on an m5zn.6xlarge in eu-west-1 using this lib with these settings: `{concurrency: 64}`.

## Installation

```bash
npm i s3-getobject-accelerator
```

## Examples

### Compact

```js
const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');

pipeline(
  download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
  createWriteStream('/tmp/test'),
  (err) => {
    if (err) {
      console.error('something went wrong', err);
    } else {
      console.log('done');
    }
  }
);
```

### More verbose

Listen to the part events for insight into the download, and write directly to a file (without a stream) if the object is no larger than 1 TiB:

```js
const {download} = require('s3-getobject-accelerator');

const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});

d.on('part:downloading', ({partNo}) => {
  console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
  console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
  console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
  console.log('part written to disk', partNo);
});

d.meta((err, metadata) => {
  if (err) {
    console.error('something went wrong', err);
  } else {
    if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
      console.error('file is larger than 1 TiB');
    } else {
      d.file('/tmp/test', (err) => {
        if (err) {
          console.error('something went wrong', err);
        } else {
          console.log('done');
        }
      });
    }
  }
});
```
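The part events can also feed a simple progress counter. The following is a minimal sketch, not part of the library: `trackParts` is a hypothetical helper, and a plain `EventEmitter` stands in for the download object so the snippet runs without S3 access. It assumes that every part emits `part:downloading` before `part:downloaded`.

```js
const {EventEmitter} = require('node:events');

// Hypothetical helper: counts part events emitted by a download object.
// Only the event names come from this README; the counters are illustrative.
function trackParts(d) {
  const stats = {downloading: 0, downloaded: 0, written: 0};
  d.on('part:downloading', () => {
    stats.downloading += 1;
  });
  d.on('part:downloaded', () => {
    stats.downloading -= 1;
    stats.downloaded += 1;
  });
  d.on('part:done', () => {
    stats.written += 1;
  });
  return stats;
}

// Stand-in emitter (assumption) that replays a few events in order.
const fake = new EventEmitter();
const stats = trackParts(fake);
fake.emit('part:downloading', {partNo: 1});
fake.emit('part:downloading', {partNo: 2});
fake.emit('part:downloaded', {partNo: 1});
fake.emit('part:writing', {partNo: 1});
fake.emit('part:done', {partNo: 1});
console.log(stats);
```

With a real download, pass the object returned by `download(...)` to `trackParts` before calling `file()` or `readStream()`.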

## API

### download(s3source, options)

* `s3source`
  * `bucket`
  * `key`
  * `version` (optional)
* `options`
  * `partSizeInMegabytes` (optional, defaults to the part size used when the object was uploaded)
  * `concurrency`
  * `requestTimeoutInMilliseconds` Maximum time for a request to complete from start to finish (optional, defaults to 300,000, 0 := no timeout)
  * `resolveTimeoutInMilliseconds` Maximum time for a DNS query to resolve (optional, defaults to 3,000, 0 := no timeout)
  * `connectionTimeoutInMilliseconds` Maximum time for a socket to connect (optional, defaults to 3,000, 0 := no timeout)
  * `readTimeoutInMilliseconds` Maximum time to read the response body (optional, defaults to 300,000, 0 := no timeout)
  * `dataTimeoutInMilliseconds` Maximum time between two data events while reading the response body (optional, defaults to 3,000, 0 := no timeout)
  * `writeTimeoutInMilliseconds` Maximum time to write the request body (optional, defaults to 300,000, 0 := no timeout)
  * `region` (optional, for the default see [AWS region](#aws-region))
  * `v2AwsSdkCredentials` (optional)
  * `endpointHostname` (optional, defaults to `${bucket}.s3.${region}.amazonaws.com`, or `s3.${region}.amazonaws.com` if the bucket name contains a dot)
  * `agent` (optional)
* Returns:
  * `meta(cb)` Get metadata before starting the download (downloads the first part and keeps the body in memory until the download starts)
    * `cb(err, metadata)`
      * `err`
      * `metadata`
        * `lengthInBytes`
        * `parts` Number of parts available (optional)
  * `readStream()` Start download
    * Returns: [ReadStream](https://nodejs.org/api/stream.html#class-streamreadable)
  * `file(path, cb)` Start download
    * `path`
    * `cb(err)`
      * `err`
  * `abort([err])` Abort download
    * `err`
  * `partsDownloading()` Number of parts downloading at the moment
    * Returns: number
  * `addListener(eventName, listener)` See https://nodejs.org/api/events.html#emitteraddlistenereventname-listener
  * `off(eventName, listener)` See https://nodejs.org/api/events.html#emitteroffeventname-listener
  * `on(eventName, listener)` See https://nodejs.org/api/events.html#emitteroneventname-listener
  * `once(eventName, listener)` See https://nodejs.org/api/events.html#emitteronceeventname-listener
  * `removeListener(eventName, listener)` See https://nodejs.org/api/events.html#emitterremovelistenereventname-listener
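The documented default for `endpointHostname` can be restated as a small function. This is only an illustration of the rule described above, not the library's actual code; `defaultEndpointHostname` is a hypothetical name.

```js
// Illustrative only: restates the documented default for `endpointHostname`.
// Bucket names containing a dot fall back to the regional endpoint.
function defaultEndpointHostname(bucket, region) {
  return bucket.includes('.')
    ? `s3.${region}.amazonaws.com`
    : `${bucket}.s3.${region}.amazonaws.com`;
}

console.log(defaultEndpointHostname('my-bucket', 'eu-west-1'));
// my-bucket.s3.eu-west-1.amazonaws.com
console.log(defaultEndpointHostname('my.bucket', 'eu-west-1'));
// s3.eu-west-1.amazonaws.com
```

A likely reason for the special case is that dots in a bucket name do not match the wildcard TLS certificate used for virtual-hosted-style hostnames.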

## AWS credentials

AWS credentials are fetched in the following order:

1. `options.v2AwsSdkCredentials`
2. Environment variables
   * `AWS_ACCESS_KEY_ID`
   * `AWS_SECRET_ACCESS_KEY`
   * `AWS_SESSION_TOKEN` (optional)
3. IMDSv2

## AWS region

AWS region is fetched in the following order:

1. `options.region`
2. Environment variable `AWS_REGION`
3. IMDSv2
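The lookup orders for credentials and region can be sketched as follows. `resolveCredentials` and `resolveRegion` are hypothetical helpers written for illustration, not the library's internals. The IMDSv2 step is left out: the helpers return `null` where a real implementation would query the instance metadata service. The credential field names follow the AWS SDK v2 credentials object.

```js
// Hypothetical helpers mirroring the documented lookup order.
// A null result means: fall back to IMDSv2 (not implemented in this sketch).
function resolveCredentials(options = {}, env = process.env) {
  if (options.v2AwsSdkCredentials) {
    return {source: 'options', credentials: options.v2AwsSdkCredentials};
  }
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return {
      source: 'env',
      credentials: {
        accessKeyId: env.AWS_ACCESS_KEY_ID,
        secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
        sessionToken: env.AWS_SESSION_TOKEN // optional, may be undefined
      }
    };
  }
  return null;
}

function resolveRegion(options = {}, env = process.env) {
  return options.region || env.AWS_REGION || null;
}

console.log(resolveRegion({region: 'eu-west-1'}, {AWS_REGION: 'us-east-1'}));
// eu-west-1
```

Passing `env` as a parameter keeps the helpers easy to test without changing the process environment.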

## Considerations

* Typical values for `partSizeInMegabytes` are 8 or 16 (that is, 8 MB or 16 MB parts). If an object was uploaded using a multipart upload, it is good practice to download it with the same part size (do not specify `partSizeInMegabytes`), or at least aligned to part boundaries, for best performance (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html).
* Keep in mind that you pay per GET request to Amazon S3. The smaller the parts, the more requests a download needs and the more expensive it is.
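To get a feeling for how the part size drives the number of GET requests, the split into byte ranges can be sketched like this. `planParts` is a hypothetical helper, not the library's implementation, and the snippet assumes 1 MB = 1,000,000 bytes. HTTP `Range` values are inclusive on both ends.

```js
// Hypothetical planner: splits an object into inclusive HTTP byte ranges.
// Each range corresponds to one GET request.
function planParts(lengthInBytes, partSizeInBytes) {
  const ranges = [];
  for (let start = 0; start < lengthInBytes; start += partSizeInBytes) {
    const end = Math.min(start + partSizeInBytes, lengthInBytes) - 1;
    ranges.push(`bytes=${start}-${end}`);
  }
  return ranges;
}

const MB = 1000000; // assumption: 1 MB = 10^6 bytes
const length = 100 * MB;

console.log(planParts(length, 8 * MB).length);  // 13 GET requests
console.log(planParts(length, 16 * MB).length); // 7 GET requests
console.log(planParts(20, 8));
// [ 'bytes=0-7', 'bytes=8-15', 'bytes=16-19' ]
```

Doubling the part size roughly halves the request count, at the price of more memory per part in flight.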
