https://github.com/widdix/s3-getobject-accelerator

# S3 GetObject Accelerator

Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.

> We measured a throughput of 6.5 Gbit/s on an m5zn.6xlarge in eu-west-1 using this library with these settings: `{concurrency: 64}`.

## Installation

```bash
npm i s3-getobject-accelerator
```

## Examples

### Compact

```js
const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');

pipeline(
  download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
  createWriteStream('/tmp/test'),
  (err) => {
    if (err) {
      console.error('something went wrong', err);
    } else {
      console.log('done');
    }
  }
);
```

### More verbose

Get insights into the part downloads and, if the object is not larger than 1 TiB, write it directly to a file without using a stream:

```js
const {download} = require('s3-getobject-accelerator');

const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});

d.on('part:downloading', ({partNo}) => {
  console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
  console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
  console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
  console.log('part written to disk', partNo);
});

d.meta((err, metadata) => {
  if (err) {
    console.error('something went wrong', err);
  } else {
    if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
      console.error('file is larger than 1 TiB');
    } else {
      d.file('/tmp/test', (err) => {
        if (err) {
          console.error('something went wrong', err);
        } else {
          console.log('done');
        }
      });
    }
  }
});
```
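The part events shown above can also be aggregated into simple counters. The following is a minimal sketch, not part of the library: `trackProgress` is a hypothetical helper, and a plain `EventEmitter` stands in for the object returned by `download()` so the snippet runs without S3 access.

```js
const {EventEmitter} = require('node:events');

// Hypothetical helper (not part of s3-getobject-accelerator):
// counts how often each documented part event has fired.
function trackProgress(d) {
  const counts = {downloading: 0, downloaded: 0, writing: 0, done: 0};
  for (const name of Object.keys(counts)) {
    d.on(`part:${name}`, () => {
      counts[name] += 1;
    });
  }
  return counts;
}

// Stand-in emitter that mimics the events of a download object.
const fake = new EventEmitter();
const counts = trackProgress(fake);
for (const partNo of [1, 2]) {
  fake.emit('part:downloading', {partNo});
  fake.emit('part:downloaded', {partNo});
}
fake.emit('part:writing', {partNo: 1});
fake.emit('part:done', {partNo: 1});
console.log(counts);
```

With a real download, pass the object returned by `download(...)` to `trackProgress` before calling `file()` or `readStream()`.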

## API

### download(s3source, options)

* `s3source`
  * `bucket`
  * `key`
  * `version` (optional)
* `options`
  * `partSizeInMegabytes` (optional, defaults to uploaded part size)
  * `concurrency`
  * `requestTimeoutInMilliseconds` Maximum time for a request to complete from start to finish (optional, defaults to 300,000, 0 := no timeout)
  * `resolveTimeoutInMilliseconds` Maximum time for a DNS query to resolve (optional, defaults to 3,000, 0 := no timeout)
  * `connectionTimeoutInMilliseconds` Maximum time for a socket to connect (optional, defaults to 3,000, 0 := no timeout)
  * `readTimeoutInMilliseconds` Maximum time to read the response body (optional, defaults to 300,000, 0 := no timeout)
  * `dataTimeoutInMilliseconds` Maximum time between two data events while reading the response body (optional, defaults to 3,000, 0 := no timeout)
  * `writeTimeoutInMilliseconds` Maximum time to write the request body (optional, defaults to 300,000, 0 := no timeout)
  * `region` (optional, defaults to [see AWS credentials & region](#aws-region))
  * `v2AwsSdkCredentials` (optional)
  * `endpointHostname` (optional, defaults to `${bucket}.s3.${region}.amazonaws.com`, or `s3.${region}.amazonaws.com` if the bucket name contains a dot)
  * `agent` (optional)
* Returns:
  * `meta(cb)` Get metadata before starting the download (downloads the first part and keeps the body in memory until the download starts)
    * `cb(err, metadata)`
      * `err`
      * `metadata`
        * `lengthInBytes`
        * `parts` Number of parts available (optional)
  * `readStream()` Start download
    * Returns: [ReadStream](https://nodejs.org/api/stream.html#class-streamreadable)
  * `file(path, cb)` Start download
    * `path`
    * `cb(err)`
      * `err`
  * `abort([err])` Abort download
    * `err`
  * `partsDownloading()` Returns the number of parts downloading at the moment
  * `addListener(eventName, listener)` See https://nodejs.org/api/events.html#emitteraddlistenereventname-listener
  * `off(eventName, listener)` See https://nodejs.org/api/events.html#emitteroffeventname-listener
  * `on(eventName, listener)` See https://nodejs.org/api/events.html#emitteroneventname-listener
  * `once(eventName, listener)` See https://nodejs.org/api/events.html#emitteronceeventname-listener
  * `removeListener(eventName, listener)` See https://nodejs.org/api/events.html#emitterremovelistenereventname-listener
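Because `meta(cb)` and `file(path, cb)` use Node-style callbacks, they can be wrapped in promises for use with `async`/`await`. The sketch below is an illustration only: `metaAsync` and `fileAsync` are hypothetical helpers that are not exported by the library, and `stub` is a stand-in with the same callback shapes so the snippet runs without S3 access.

```js
// Hypothetical helpers (not part of s3-getobject-accelerator).
function metaAsync(d) {
  return new Promise((resolve, reject) => {
    d.meta((err, metadata) => (err ? reject(err) : resolve(metadata)));
  });
}

function fileAsync(d, path) {
  return new Promise((resolve, reject) => {
    d.file(path, (err) => (err ? reject(err) : resolve()));
  });
}

// Stand-in object that mimics the documented callback signatures.
const stub = {
  meta: (cb) => setImmediate(cb, null, {lengthInBytes: 42, parts: 1}),
  file: (path, cb) => setImmediate(cb, null)
};

async function main() {
  const metadata = await metaAsync(stub);
  await fileAsync(stub, '/tmp/test');
  return metadata.lengthInBytes;
}

const result = main();
result.then((lengthInBytes) => console.log('done', lengthInBytes));
```

With a real download, replace `stub` with the object returned by `download(...)`.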

## AWS credentials

AWS credentials are fetched in the following order:

1. `options.v2AwsSdkCredentials`
2. Environment variables
   * `AWS_ACCESS_KEY_ID`
   * `AWS_SECRET_ACCESS_KEY`
   * `AWS_SESSION_TOKEN` (optional)
3. IMDSv2
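
The lookup order above can be pictured roughly as follows. This is a sketch of the documented precedence, not the library's actual implementation: `credentialSource` is a hypothetical helper, it only reports which source would be used, and it assumes that both `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` must be set for the environment to count.

```js
// Hypothetical helper (not part of s3-getobject-accelerator).
// Reports which credential source the documented order would pick.
// It never contacts IMDSv2; that case is only named as the fallback.
function credentialSource(options = {}, env = process.env) {
  if (options.v2AwsSdkCredentials) {
    return 'options.v2AwsSdkCredentials';
  }
  // Assumption: both variables are needed for the environment to be used.
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return 'environment';
  }
  return 'IMDSv2';
}

console.log(credentialSource({}, {AWS_ACCESS_KEY_ID: 'id', AWS_SECRET_ACCESS_KEY: 'secret'}));
console.log(credentialSource({}, {}));
```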

## AWS region

AWS region is fetched in the following order:

1. `options.region`
2. Environment variable `AWS_REGION`
3. IMDSv2
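
Once the region is known, the default `endpointHostname` described in the API section follows from the bucket name. A minimal sketch of both steps, with hypothetical helper names (`resolveRegion`, `defaultEndpointHostname`) that are not part of the library and without the IMDSv2 lookup:

```js
// Hypothetical helpers (not part of s3-getobject-accelerator).

// Applies the first two steps of the documented order.
// Returns undefined when IMDSv2 would have to be queried.
function resolveRegion(options = {}, env = process.env) {
  return options.region || env.AWS_REGION;
}

// Default hostname as documented for `endpointHostname`:
// path-style host if the bucket name contains a dot, virtual-hosted otherwise.
function defaultEndpointHostname(bucket, region) {
  return bucket.includes('.')
    ? `s3.${region}.amazonaws.com`
    : `${bucket}.s3.${region}.amazonaws.com`;
}

const region = resolveRegion({}, {AWS_REGION: 'eu-west-1'});
console.log(defaultEndpointHostname('my-bucket', region));
console.log(defaultEndpointHostname('my.bucket', region));
```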

## Considerations

* Typical values for `partSizeInMegabytes` are 8 or 16 (that is, 8 MB or 16 MB). If objects are uploaded using a multipart upload, it's good practice to download them in the same part sizes (do not specify `partSizeInMegabytes`), or at least aligned to part boundaries, for best performance (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html).
* Keep in mind that you pay per GET request to Amazon S3. The smaller the parts, the more expensive a download is.
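
To make the trade-off between part size and request count concrete, the sketch below splits an object into byte ranges and counts the resulting GET requests. It is an illustration only: `byteRanges` is a hypothetical helper, and treating 1 MB as 1,000,000 bytes is an assumption that may differ from how the library converts `partSizeInMegabytes`.

```js
// Assumption: 1 MB = 1,000,000 bytes (the library may convert differently).
const BYTES_PER_MEGABYTE = 1000 * 1000;

// Hypothetical helper (not part of s3-getobject-accelerator):
// returns one HTTP Range header value per part, with inclusive end offsets.
function byteRanges(lengthInBytes, partSizeInMegabytes) {
  const partSize = partSizeInMegabytes * BYTES_PER_MEGABYTE;
  const ranges = [];
  for (let start = 0; start < lengthInBytes; start += partSize) {
    const end = Math.min(start + partSize, lengthInBytes) - 1;
    ranges.push(`bytes=${start}-${end}`);
  }
  return ranges;
}

const lengthInBytes = 100 * BYTES_PER_MEGABYTE;
console.log(byteRanges(lengthInBytes, 8).length);  // 13 GET requests
console.log(byteRanges(lengthInBytes, 16).length); // 7 GET requests
```

Halving the part size roughly doubles the number of billed GET requests for the same object.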