https://github.com/widdix/s3-getobject-accelerator
Get large objects from S3 by using parallel byte-range fetches/parts to improve performance.
aws aws-nodejs aws-s3
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/widdix/s3-getobject-accelerator
- Owner: widdix
- License: mit
- Created: 2023-03-21T07:58:41.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-06-03T18:03:44.000Z (about 1 year ago)
- Last Synced: 2025-06-24T09:07:33.438Z (about 1 year ago)
- Topics: aws, aws-nodejs, aws-s3
- Language: JavaScript
- Homepage:
- Size: 265 KB
- Stars: 17
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# S3 GetObject Accelerator
Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.
> We measured a throughput of 6.5 Gbit/s on an m5zn.6xlarge in eu-west-1 using this lib with these settings: `{concurrency: 64}`.
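The idea behind parallel byte-range fetches can be sketched in a few lines: the object is split into consecutive byte ranges, and each range becomes one GET request with an HTTP `Range` header, so several ranges can be in flight at the same time. `byteRanges` is a hypothetical helper written for this illustration, not part of this library's API, and it assumes the object length is already known (for example from `metadata.lengthInBytes`).

```javascript
// Hypothetical helper (not part of s3-getobject-accelerator): split an object of
// `lengthInBytes` into consecutive ranges of at most `partSizeInBytes` bytes,
// formatted as HTTP Range header values. Each range would be one GET request.
function byteRanges(lengthInBytes, partSizeInBytes) {
  if (!Number.isInteger(lengthInBytes) || lengthInBytes < 0) {
    throw new RangeError('lengthInBytes must be a non-negative integer');
  }
  if (!Number.isInteger(partSizeInBytes) || partSizeInBytes <= 0) {
    throw new RangeError('partSizeInBytes must be a positive integer');
  }
  const ranges = [];
  for (let start = 0; start < lengthInBytes; start += partSizeInBytes) {
    // HTTP byte ranges are inclusive on both ends, hence the "- 1".
    const end = Math.min(start + partSizeInBytes, lengthInBytes) - 1;
    ranges.push({partNo: ranges.length + 1, start, end, header: `bytes=${start}-${end}`});
  }
  return ranges;
}

// Example: a 20 MiB object fetched in 8 MiB parts needs three ranges,
// the last one shorter than the others.
const MiB = 1024 * 1024;
console.log(byteRanges(20 * MiB, 8 * MiB).map((r) => r.header));
```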
## Installation
```bash
npm i s3-getobject-accelerator
```
## Examples
### Compact
```js
const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');
pipeline(
  download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
  createWriteStream('/tmp/test'),
  (err) => {
    if (err) {
      console.error('something went wrong', err);
    } else {
      console.log('done');
    }
  }
);
```
### More verbose
Observe the individual part downloads via events and, if the object is smaller than 1 TiB, write it directly to a file instead of using a stream:
```js
const {download} = require('s3-getobject-accelerator');
const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});
d.on('part:downloading', ({partNo}) => {
  console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
  console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
  console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
  console.log('part written to disk', partNo);
});
d.meta((err, metadata) => {
  if (err) {
    console.error('something went wrong', err);
  } else {
    if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
      console.error('file is larger than 1 TiB');
    } else {
      d.file('/tmp/test', (err) => {
        if (err) {
          console.error('something went wrong', err);
        } else {
          console.log('done');
        }
      });
    }
  }
});
```
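The event sequence in this example (`part:downloaded`, then `part:writing` "in correct order") implies that parts may finish downloading out of order but are written in ascending part order. A reorder buffer keyed by part number is one common way to achieve that. The sketch below illustrates the idea with a hypothetical `InOrderWriter` class; it is not taken from the library's source.

```javascript
// Hypothetical helper (not part of s3-getobject-accelerator): parts may finish
// downloading out of order, but are handed to `write` in ascending partNo order.
class InOrderWriter {
  constructor(write, firstPartNo = 1) {
    this.write = write; // called synchronously as write(partNo, body)
    this.nextPartNo = firstPartNo;
    this.pending = new Map(); // partNo -> body, waiting for an earlier part
  }

  // Call whenever a part has finished downloading.
  partDownloaded(partNo, body) {
    if (partNo < this.nextPartNo || this.pending.has(partNo)) {
      throw new Error(`duplicate part ${partNo}`);
    }
    this.pending.set(partNo, body);
    // Flush every part that is now contiguous with what was already written.
    while (this.pending.has(this.nextPartNo)) {
      const next = this.pending.get(this.nextPartNo);
      this.pending.delete(this.nextPartNo);
      this.write(this.nextPartNo, next);
      this.nextPartNo += 1;
    }
  }

  // Number of parts held in memory while waiting for an earlier part.
  bufferedParts() {
    return this.pending.size;
  }
}

// Usage: parts 2 and 3 arrive before part 1.
const written = [];
const writer = new InOrderWriter((partNo, body) => written.push([partNo, body]));
writer.partDownloaded(2, 'b');
writer.partDownloaded(3, 'c');
writer.partDownloaded(1, 'a');
console.log(written.map(([partNo]) => partNo)); // [ 1, 2, 3 ]
```

Parts that arrive early stay in memory until the gap before them is filled, so in this sketch memory use grows with how far the downloads run ahead of the writer.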
## API
### download(s3source, options)
* `s3source`
  * `bucket`
  * `key`
  * `version` (optional)
* `options`
  * `partSizeInMegabytes` (optional, defaults to the part size used for the multipart upload)
  * `concurrency` Number of parts to download in parallel
  * `requestTimeoutInMilliseconds` Maximum time for a request to complete from start to finish (optional, defaults to 300,000, 0 := no timeout)
  * `resolveTimeoutInMilliseconds` Maximum time for a DNS query to resolve (optional, defaults to 3,000, 0 := no timeout)
  * `connectionTimeoutInMilliseconds` Maximum time for a socket to connect (optional, defaults to 3,000, 0 := no timeout)
  * `readTimeoutInMilliseconds` Maximum time to read the response body (optional, defaults to 300,000, 0 := no timeout)
  * `dataTimeoutInMilliseconds` Maximum time between two data events while reading the response body (optional, defaults to 3,000, 0 := no timeout)
  * `writeTimeoutInMilliseconds` Maximum time to write the request body (optional, defaults to 300,000, 0 := no timeout)
  * `region` (optional, defaults to the lookup order described in [AWS region](#aws-region))
  * `v2AwsSdkCredentials` (optional, see [AWS credentials](#aws-credentials))
  * `endpointHostname` (optional, defaults to `${bucket}.s3.${region}.amazonaws.com`, or `s3.${region}.amazonaws.com` if the bucket name contains a dot)
  * `agent` (optional)
* Returns:
  * `meta(cb)` Get metadata before starting the download (downloads the first part and keeps its body in memory until the download starts)
    * `cb(err, metadata)`
      * `err`
      * `metadata`
        * `lengthInBytes`
        * `parts` Number of parts available (optional)
  * `readStream()` Start the download
    * Returns: [stream.Readable](https://nodejs.org/api/stream.html#class-streamreadable)
  * `file(path, cb)` Start the download and write the object to the file at `path`
    * `path`
    * `cb(err)`
      * `err`
  * `abort([err])` Abort the download
    * `err` (optional)
  * `partsDownloading()` Returns the number of parts downloading at the moment
  * `addListener(eventName, listener)` See https://nodejs.org/api/events.html#emitteraddlistenereventname-listener
  * `off(eventName, listener)` See https://nodejs.org/api/events.html#emitteroffeventname-listener
  * `on(eventName, listener)` See https://nodejs.org/api/events.html#emitteroneventname-listener
  * `once(eventName, listener)` See https://nodejs.org/api/events.html#emitteronceeventname-listener
  * `removeListener(eventName, listener)` See https://nodejs.org/api/events.html#emitterremovelistenereventname-listener
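The default for `endpointHostname` can be expressed as a small function. `defaultEndpointHostname` is a hypothetical name for this sketch; it only mirrors the rule stated in the option list. A likely reason for the dot check: S3's wildcard TLS certificate for `*.s3.<region>.amazonaws.com` matches a single DNS label, so a dotted bucket name in the hostname would fail certificate validation over HTTPS.

```javascript
// Hypothetical sketch of the documented default for `endpointHostname`.
function defaultEndpointHostname(bucket, region) {
  if (bucket.includes('.')) {
    // Dotted bucket names do not match the single-label wildcard certificate,
    // so the regional endpoint is used (the bucket then has to be addressed
    // in the request path rather than in the hostname).
    return `s3.${region}.amazonaws.com`;
  }
  // Virtual-hosted-style hostname: the bucket is the first DNS label.
  return `${bucket}.s3.${region}.amazonaws.com`;
}

console.log(defaultEndpointHostname('example-bucket', 'eu-west-1')); // example-bucket.s3.eu-west-1.amazonaws.com
console.log(defaultEndpointHostname('example.bucket', 'eu-west-1')); // s3.eu-west-1.amazonaws.com
```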
## AWS credentials
AWS credentials are fetched in the following order:
1. `options.v2AwsSdkCredentials`
2. Environment variables
   * `AWS_ACCESS_KEY_ID`
   * `AWS_SECRET_ACCESS_KEY`
   * `AWS_SESSION_TOKEN` (optional)
3. IMDSv2
## AWS region
AWS region is fetched in the following order:
1. `options.region`
2. Environment variable `AWS_REGION`
3. IMDSv2
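Both lookup orders can be sketched as plain functions. This is an illustration under assumptions, not the library's code: `resolveCredentials` and `resolveRegion` are hypothetical names, environment credentials are assumed to count only when both the key ID and the secret are set, and the IMDSv2 step is only reported, not performed (a real lookup would query the EC2 instance metadata service).

```javascript
// Hypothetical sketch of the documented credential lookup order.
// Assumption: env credentials require both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
function resolveCredentials(options, env = process.env) {
  if (options.v2AwsSdkCredentials) {
    return {source: 'options.v2AwsSdkCredentials', credentials: options.v2AwsSdkCredentials};
  }
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return {
      source: 'environment',
      credentials: {
        accessKeyId: env.AWS_ACCESS_KEY_ID,
        secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
        sessionToken: env.AWS_SESSION_TOKEN, // optional, may be undefined
      },
    };
  }
  // Not performed here: the next step would be IMDSv2.
  return {source: 'imdsv2'};
}

// Hypothetical sketch of the documented region lookup order.
function resolveRegion(options, env = process.env) {
  if (options.region) return {source: 'options.region', region: options.region};
  if (env.AWS_REGION) return {source: 'environment', region: env.AWS_REGION};
  // Not performed here: the next step would be IMDSv2.
  return {source: 'imdsv2'};
}

console.log(resolveRegion({}, {AWS_REGION: 'eu-west-1'}).region); // eu-west-1
console.log(resolveCredentials({}, {}).source); // imdsv2
```

Passing `env` explicitly, as in the usage lines, keeps the functions easy to test without touching `process.env`.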
## Considerations
* Typical values for `partSizeInMegabytes` are 8 or 16. If an object was uploaded using a multipart upload, it is good practice to download it in the same part size (do not specify `partSizeInMegabytes`), or at least in ranges aligned to part boundaries, for best performance (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html).
* Keep in mind that you pay per GET request to Amazon S3, and each part is fetched with its own GET request. The smaller the parts, the more requests a download needs and the more expensive it is.
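The cost and alignment advice above comes down to simple arithmetic. `planDownload` is a hypothetical helper for this sketch; it assumes one GET request per part (ignoring retries) and treats a megabyte as 1024 × 1024 bytes, which may differ from how the library interprets `partSizeInMegabytes`. It reports the sizes as aligned when one is a whole multiple of the other, which is one reading of "aligned to part boundaries".

```javascript
// Hypothetical planning helper, not part of s3-getobject-accelerator.
// Assumptions: one GET per part, no retries, 1 "megabyte" = 1024 * 1024 bytes.
function planDownload({lengthInBytes, partSizeInMegabytes, uploadPartSizeInMegabytes}) {
  const MB = 1024 * 1024;
  const partSize = partSizeInMegabytes * MB;
  const getRequests = Math.ceil(lengthInBytes / partSize);
  let alignedToUploadParts; // stays undefined if the upload part size is unknown
  if (uploadPartSizeInMegabytes !== undefined) {
    const uploadPartSize = uploadPartSizeInMegabytes * MB;
    // If one size is a whole multiple of the other, every download range either
    // covers whole upload parts or lies inside a single upload part.
    alignedToUploadParts = partSize % uploadPartSize === 0 || uploadPartSize % partSize === 0;
  }
  return {getRequests, alignedToUploadParts};
}

const GiB = 1024 * 1024 * 1024;
console.log(planDownload({lengthInBytes: GiB, partSizeInMegabytes: 8}).getRequests); // 128
console.log(planDownload({lengthInBytes: GiB, partSizeInMegabytes: 16}).getRequests); // 64
```

Halving the part size roughly doubles the number of GET requests, and therefore the request charges, for the same object.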