- Host: GitHub
- URL: https://github.com/auth0/node-s3-client
- Owner: auth0
- License: MIT
- Fork: true (andrewrk/node-s3-client)
- Created: 2018-11-01T21:09:42.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-09-24T13:31:49.000Z (over 5 years ago)
- Last Synced: 2024-12-27T10:17:19.030Z (about 1 year ago)
- Language: JavaScript
- Size: 255 KB
- Stars: 41
- Watchers: 5
- Forks: 25
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# High Level Amazon S3 Client
Forked from https://github.com/andrewrk/node-s3-client.
## Installation
`npm install @auth0/s3 --save`
## Features
* Automatically retries a configurable number of times when S3 returns an error.
* Makes multiple requests when a listing exceeds S3's 1000-object-per-request limit.
* Lets you cap the maximum parallelization of S3 requests.
  Retries get pushed to the end of the parallelization queue.
* Syncs a directory to and from S3.
* Reports progress.
* Supports files of any size (up to S3's maximum 5 TB object size limit).
* Uploads large files quickly using parallel multipart uploads.
* Uses heuristics to compute multipart ETags client-side to avoid uploading
  or downloading files unnecessarily.
* Automatically provides `Content-Type` for uploads based on file extension.
* Supports third-party S3-compatible services such as Ceph.
See also the companion CLI tool which is meant to be a drop-in replacement for
s3cmd: [s3-cli](https://github.com/andrewrk/node-s3-cli).
## Synopsis
### Create a client
```js
var s3 = require('s3');

var client = s3.createClient({
  maxAsyncS3: 20,     // this is the default
  s3RetryCount: 3,    // this is the default
  s3RetryDelay: 1000, // this is the default
  multipartUploadThreshold: 20971520, // this is the default (20 MB)
  multipartUploadSize: 15728640,      // this is the default (15 MB)
  s3Options: {
    accessKeyId: "your s3 key",
    secretAccessKey: "your s3 secret",
    region: "your region",
    // endpoint: 's3.yourdomain.com',
    // sslEnabled: false,
    // any other options are passed to new AWS.S3()
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#constructor-property
  },
});
```
### Create a client from existing AWS.S3 object
```js
var s3 = require('s3');

// s3.AWS is the aws-sdk module bundled with this package (see API docs below).
var s3Options = { /* options passed to new AWS.S3(), see AWS SDK docs */ };
var awsS3Client = new s3.AWS.S3(s3Options);

var options = {
  s3Client: awsS3Client,
  // more options available. See API docs below.
};
var client = s3.createClient(options);
```
### Upload a file to S3
```js
var params = {
  localFile: "some/local/file",
  s3Params: {
    Bucket: "s3 bucket name",
    Key: "some/remote/file",
    // other options supported by putObject, except Body and ContentLength.
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  },
};
var uploader = client.uploadFile(params);
uploader.on('error', function(err) {
  console.error("unable to upload:", err.stack);
});
uploader.on('progress', function() {
  console.log("progress", uploader.progressMd5Amount,
      uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log("done uploading");
});
```
### Download a file from S3
```js
var params = {
  localFile: "some/local/file",
  s3Params: {
    Bucket: "s3 bucket name",
    Key: "some/remote/file",
    // other options supported by getObject
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
  },
};
var downloader = client.downloadFile(params);
downloader.on('error', function(err) {
  console.error("unable to download:", err.stack);
});
downloader.on('progress', function() {
  console.log("progress", downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function() {
  console.log("done downloading");
});
```
### Sync a directory to S3
```js
var params = {
  localDir: "some/local/dir",
  deleteRemoved: true, // default false, whether to remove s3 objects
                       // that have no corresponding local file.
  s3Params: {
    Bucket: "s3 bucket name",
    Prefix: "some/remote/dir/",
    // other options supported by putObject, except Body and ContentLength.
    // See: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
  },
};
var uploader = client.uploadDir(params);
uploader.on('error', function(err) {
  console.error("unable to sync:", err.stack);
});
uploader.on('progress', function() {
  console.log("progress", uploader.progressAmount, uploader.progressTotal);
});
uploader.on('end', function() {
  console.log("done uploading");
});
```
## Tips
* Consider increasing the socket pool size in the `http` and `https` global
  agents. This will improve bandwidth when using the `uploadDir` and
  `downloadDir` functions. For example:
```js
var http = require('http');
var https = require('https');

http.globalAgent.maxSockets = https.globalAgent.maxSockets = 20;
```
## API Documentation
### s3.AWS
This contains a reference to the aws-sdk module. It is a valid use case to use
both this module and the lower level aws-sdk module in tandem.
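For example, a minimal sketch that uses the bundled SDK for an operation this
module does not wrap (the credentials and bucket name are placeholders):
```js
var s3 = require('s3');

// Use the bundled aws-sdk directly for calls this module does not wrap.
var awsS3 = new s3.AWS.S3({
  accessKeyId: "your s3 key",
  secretAccessKey: "your s3 secret",
});
awsS3.getBucketLocation({Bucket: "s3 bucket name"}, function(err, data) {
  if (err) return console.error(err);
  console.log("bucket location:", data.LocationConstraint);
});
```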
### s3.createClient(options)
Creates an S3 client.
`options`:
* `s3Client` - optional, an instance of `AWS.S3`. Leave blank if you provide `s3Options`.
* `s3Options` - optional. Leave blank if you provide `s3Client`.
- See AWS SDK documentation for available options which are passed to `new AWS.S3()`:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#constructor-property
* `maxAsyncS3` - maximum number of simultaneous requests this client will
ever have open to S3. defaults to `20`.
* `s3RetryCount` - how many times to try an S3 operation before giving up.
Default 3.
* `s3RetryDelay` - how many milliseconds to wait before retrying an S3
operation. Default 1000.
* `multipartUploadThreshold` - if a file is this many bytes or greater, it
will be uploaded via a multipart request. Default is 20MB. Minimum is 5MB.
Maximum is 5GB.
* `multipartUploadSize` - when uploading via multipart, this is the part size.
The minimum size is 5MB. The maximum size is 5GB. Default is 15MB. Note that
S3 has a maximum of 10000 parts for a multipart upload, so if this value is
too small, it will be ignored in favor of the minimum necessary value
required to upload the file.
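As a back-of-the-envelope illustration of that constraint (not the module's
exact code): a 200 GiB file cannot be uploaded in 15 MB parts, because that
would take more than 10000 parts, so the effective part size must grow.
```js
// Illustration only: clamping a part size to S3's 10000-part limit.
var MAX_PARTS = 10000;

function effectivePartSize(fileSize, configuredPartSize) {
  var minPartSize = Math.ceil(fileSize / MAX_PARTS);
  return Math.max(configuredPartSize, minPartSize);
}

// 15 MB parts are fine for a 100 GiB file...
console.log(effectivePartSize(100 * 1024 * 1024 * 1024, 15728640)); // 15728640
// ...but a 200 GiB file forces ~20.5 MB parts (214748364800 / 10000, rounded up).
console.log(effectivePartSize(200 * 1024 * 1024 * 1024, 15728640)); // 21474837
```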
### s3.getPublicUrl(bucket, key, [bucketLocation])
* `bucket` S3 bucket
* `key` S3 key
* `bucketLocation` string, one of these:
- "" (default) - US Standard
- "eu-west-1"
- "us-west-1"
- "us-west-2"
- "ap-southeast-1"
- "ap-southeast-2"
- "ap-northeast-1"
- "sa-east-1"
You can find out your bucket location programmatically by using this API:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getBucketLocation-property
Returns a string which looks like this:
`https://s3.amazonaws.com/bucket/key`
or maybe this if you are not in US Standard:
`https://s3-eu-west-1.amazonaws.com/bucket/key`
### s3.getPublicUrlHttp(bucket, key)
* `bucket` S3 Bucket
* `key` S3 Key
Works for any region, and returns a string which looks like this:
`http://bucket.s3.amazonaws.com/key`
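A usage sketch for both helpers (the bucket and key are placeholders); the
outputs follow the formats described above:
```js
var s3 = require('s3');

console.log(s3.getPublicUrl("my-bucket", "some/remote/file"));
// => https://s3.amazonaws.com/my-bucket/some/remote/file

console.log(s3.getPublicUrl("my-bucket", "some/remote/file", "eu-west-1"));
// => https://s3-eu-west-1.amazonaws.com/my-bucket/some/remote/file

console.log(s3.getPublicUrlHttp("my-bucket", "some/remote/file"));
// => http://my-bucket.s3.amazonaws.com/some/remote/file
```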
### client.uploadFile(params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
`params`:
* `s3Params`: params to pass to AWS SDK `putObject`.
* `localFile`: path to the file on disk you want to upload to S3.
* (optional) `defaultContentType`: Unless you explicitly set the `ContentType`
parameter in `s3Params`, it will be automatically set for you based on the
file extension of `localFile`. If the extension is unrecognized,
`defaultContentType` will be used instead. Defaults to
`application/octet-stream`.
The difference between using AWS SDK `putObject` and this one:
* This works with files, not streams or buffers.
* If the reported MD5 upon upload completion does not match, it retries.
* If the file size is large enough, uses multipart upload to upload parts in
parallel.
* Retry based on the client's retry settings.
* Progress reporting.
* Sets the `ContentType` based on file extension if you do not provide it.
Returns an `EventEmitter` with these properties:
* `progressMd5Amount`
* `progressAmount`
* `progressTotal`
And these events:
* `'error' (err)`
* `'end' (data)` - emitted when the file is uploaded successfully
- `data` is the same object that you get from `putObject` in AWS SDK
* `'progress'` - emitted when `progressMd5Amount`, `progressAmount`, and
`progressTotal` properties change. Note that it is possible for progress to
go backwards when an upload fails and must be retried.
* `'fileOpened' (fdSlicer)` - emitted when `localFile` has been opened. The file
is opened with the [fd-slicer](https://github.com/andrewrk/node-fd-slicer)
module because we might need to read from multiple locations in the file at
the same time. `fdSlicer` is an object for which you can call
`createReadStream(options)`. See the fd-slicer README for more information.
* `'fileClosed'` - emitted when `localFile` has been closed.
And these methods:
* `abort()` - call this to stop the upload.
### client.downloadFile(params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
`params`:
* `localFile` - the destination path on disk to write the s3 object into
* `s3Params`: params to pass to AWS SDK `getObject`.
The difference between using AWS SDK `getObject` and this one:
* This works with a destination file, not a stream or a buffer.
* If the reported MD5 upon download completion does not match, it retries.
* Retry based on the client's retry settings.
* Progress reporting.
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `progressTotal`
And these events:
* `'error' (err)`
* `'end'` - emitted when the file is downloaded successfully
* `'progress'` - emitted when `progressAmount` and `progressTotal`
properties change.
### client.downloadBuffer(s3Params)
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
* `s3Params`: params to pass to AWS SDK `getObject`.
The difference between using AWS SDK `getObject` and this one:
* This works with a buffer only.
* If the reported MD5 upon download completion does not match, it retries.
* Retry based on the client's retry settings.
* Progress reporting.
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `progressTotal`
And these events:
* `'error' (err)`
* `'end' (buffer)` - emitted when the file is downloaded successfully.
`buffer` is a `Buffer` containing the object data.
* `'progress'` - emitted when `progressAmount` and `progressTotal`
properties change.
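A usage sketch based on the events and properties documented above (the bucket
and key are placeholders):
```js
var downloader = client.downloadBuffer({
  Bucket: "s3 bucket name",
  Key: "some/remote/file",
});
downloader.on('error', function(err) {
  console.error("unable to download:", err.stack);
});
downloader.on('progress', function() {
  console.log("progress", downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function(buffer) {
  console.log("downloaded", buffer.length, "bytes");
});
```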
### client.downloadStream(s3Params)
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property
* `s3Params`: params to pass to AWS SDK `getObject`.
The difference between using AWS SDK `getObject` and this one:
* This works with a stream only.
If you want retries, progress, or MD5 checking, you must code it yourself.
Returns a `ReadableStream` with these additional events:
* `'httpHeaders' (statusCode, headers)` - contains the HTTP response
headers and status code.
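A usage sketch that pipes the object to a local file (the paths, bucket, and
key are placeholders):
```js
var fs = require('fs');

var stream = client.downloadStream({
  Bucket: "s3 bucket name",
  Key: "some/remote/file",
});
stream.on('error', function(err) {
  console.error("unable to download:", err.stack);
});
stream.on('httpHeaders', function(statusCode, headers) {
  console.log("response status:", statusCode);
});
stream.pipe(fs.createWriteStream("some/local/file"));
```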
### client.listObjects(params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property
`params`:
* `s3Params` - params to pass to AWS SDK `listObjects`.
* (optional) `recursive` - `true` or `false` whether or not you want to recurse
into directories. Default `false`.
Note that if you set `Delimiter` in `s3Params` then you will get a list of
objects and folders in the directory you specify. You probably do not want to
set `recursive` to `true` at the same time as specifying a `Delimiter` because
this will cause a request per directory. If you want all objects that share a
prefix, leave the `Delimiter` option `null` or `undefined`.
Be sure that `s3Params.Prefix` ends with a trailing slash (`/`) unless you
are requesting the top-level listing, in which case `s3Params.Prefix` should
be an empty string.
The difference between using AWS SDK `listObjects` and this one:
* Retries based on the client's retry settings.
* Supports recursive directory listing.
* Makes multiple requests if the number of objects to list is greater than 1000.
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `objectsFound`
* `dirsFound`
And these events:
* `'error' (err)`
* `'end'` - emitted when done listing and no more 'data' events will be emitted.
* `'data' (data)` - emitted when a batch of objects is found. `data` is the
  same object that you get from `listObjects` in the AWS SDK.
* `'progress'` - emitted when `progressAmount`, `objectsFound`, and
`dirsFound` properties change.
And these methods:
* `abort()` - call this to stop the find operation.
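A usage sketch that collects all keys under a prefix (the bucket and prefix
are placeholders):
```js
var lister = client.listObjects({
  recursive: true,
  s3Params: {
    Bucket: "s3 bucket name",
    Prefix: "some/remote/dir/",
  },
});
var objects = [];
lister.on('error', function(err) {
  console.error("unable to list:", err.stack);
});
lister.on('data', function(data) {
  objects = objects.concat(data.Contents);
});
lister.on('end', function() {
  console.log("found", objects.length, "objects");
});
```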
### client.deleteObjects(s3Params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#deleteObjects-property
`s3Params` are the same.
The difference between using AWS SDK `deleteObjects` and this one:
* Retry based on the client's retry settings.
* Make multiple requests if the number of objects you want to delete is
greater than 1000.
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `progressTotal`
And these events:
* `'error' (err)`
* `'end'` - emitted when all objects are deleted.
* `'progress'` - emitted when the `progressAmount` or `progressTotal` properties change.
* `'data' (data)` - emitted when a delete request completes. More `data` events may follow.
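A usage sketch; as with the AWS SDK, the keys to delete go in `Delete.Objects`
(the bucket and keys are placeholders):
```js
var deleter = client.deleteObjects({
  Bucket: "s3 bucket name",
  Delete: {
    Objects: [
      {Key: "some/remote/file1"},
      {Key: "some/remote/file2"},
    ],
  },
});
deleter.on('error', function(err) {
  console.error("unable to delete:", err.stack);
});
deleter.on('end', function() {
  console.log("done deleting");
});
```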
### client.uploadDir(params)
Syncs an entire directory to S3.
`params`:
* `localDir` - source path on local file system to sync to S3
* `s3Params`
- `Prefix` (required)
- `Bucket` (required)
* (optional) `deleteRemoved` - delete s3 objects with no corresponding local file.
default false
* (optional) `getS3Params` - function which will be called for every file that
needs to be uploaded. You can use this to skip some files. See below.
* (optional) `defaultContentType`: Unless you explicitly set the `ContentType`
  parameter in `s3Params`, it will be automatically set for each file based on
  its extension. If the extension is unrecognized, `defaultContentType` will
  be used instead. Defaults to `application/octet-stream`.
* (optional) `followSymlinks` - Set this to `false` to ignore symlinks.
Defaults to `true`.
```js
function getS3Params(localFile, stat, callback) {
  // To report an error: callback(new Error("something went wrong"));
  // To skip uploading this file: callback(null, null);
  var s3Params = {
    ContentType: getMimeType(localFile), // just an example
  };
  callback(null, s3Params);
}
```
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `progressTotal`
* `progressMd5Amount`
* `progressMd5Total`
* `deleteAmount`
* `deleteTotal`
* `filesFound`
* `objectsFound`
* `doneFindingFiles`
* `doneFindingObjects`
* `doneMd5`
And these events:
* `'error' (err)`
* `'end'` - emitted when all files are uploaded
* `'progress'` - emitted when any of the above progress properties change.
* `'fileUploadStart' (localFilePath, s3Key)` - emitted when a file begins
uploading.
* `'fileUploadEnd' (localFilePath, s3Key)` - emitted when a file successfully
finishes uploading.
`uploadDir` works like this:
1. Start listing all S3 objects for the target `Prefix`. S3 guarantees
   returned objects to be in sorted order.
2. Meanwhile, recursively find all files in `localDir`.
3. Once all local files are found, we sort them (the same way that S3 sorts).
4. Next we iterate over the sorted local file list one at a time, computing
   MD5 sums.
5. Now S3 object listing and MD5 sum computing are happening in parallel. As
   each operation progresses we compare both sorted lists side-by-side,
   iterating over them one at a time, uploading files whose MD5 sums don't
   match the remote object (or the remote object is missing), and, if
   `deleteRemoved` is set, deleting remote objects whose corresponding local
   files are missing.
### client.downloadDir(params)
Syncs an entire directory from S3.
`params`:
* `localDir` - destination directory on local file system to sync to
* `s3Params`
- `Prefix` (required)
- `Bucket` (required)
* (optional) `deleteRemoved` - delete local files with no corresponding s3 object. default `false`
* (optional) `getS3Params` - function which will be called for every object that
needs to be downloaded. You can use this to skip downloading some objects.
See below.
* (optional) `followSymlinks` - Set this to `false` to ignore symlinks.
Defaults to `true`.
```js
function getS3Params(localFile, s3Object, callback) {
  // localFile is the destination path where the object will be written to.
  // s3Object is the same as one element of the `Contents` array from:
  // http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property
  // To report an error: callback(new Error("something went wrong"));
  // To skip downloading this object: callback(null, null);
  var s3Params = {
    VersionId: "abcd", // just an example
  };
  callback(null, s3Params);
}
```
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `progressTotal`
* `progressMd5Amount`
* `progressMd5Total`
* `deleteAmount`
* `deleteTotal`
* `filesFound`
* `objectsFound`
* `doneFindingFiles`
* `doneFindingObjects`
* `doneMd5`
And these events:
* `'error' (err)`
* `'end'` - emitted when all files are downloaded
* `'progress'` - emitted when any of the progress properties above change
* `'fileDownloadStart' (localFilePath, s3Key)` - emitted when a file begins
downloading.
* `'fileDownloadEnd' (localFilePath, s3Key)` - emitted when a file successfully
finishes downloading.
`downloadDir` works like this:
1. Start listing all S3 objects for the target `Prefix`. S3 guarantees
   returned objects to be in sorted order.
2. Meanwhile, recursively find all files in `localDir`.
3. Once all local files are found, we sort them (the same way that S3 sorts).
4. Next we iterate over the sorted local file list one at a time, computing
   MD5 sums.
5. Now S3 object listing and MD5 sum computing are happening in parallel. As
   each operation progresses we compare both sorted lists side-by-side,
   iterating over them one at a time, downloading objects whose MD5 sums don't
   match the local file (or the local file is missing), and, if
   `deleteRemoved` is set, deleting local files whose corresponding objects
   are missing.
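A usage sketch, mirroring the `uploadDir` example in the synopsis (the paths,
bucket, and prefix are placeholders):
```js
var downloader = client.downloadDir({
  localDir: "some/local/dir",
  deleteRemoved: true, // default false
  s3Params: {
    Bucket: "s3 bucket name",
    Prefix: "some/remote/dir/",
  },
});
downloader.on('error', function(err) {
  console.error("unable to sync:", err.stack);
});
downloader.on('progress', function() {
  console.log("progress", downloader.progressAmount, downloader.progressTotal);
});
downloader.on('end', function() {
  console.log("done downloading");
});
```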
### client.deleteDir(s3Params)
Deletes an entire directory on S3.
`s3Params`:
* `Bucket`
* `Prefix`
* (optional) `MFA`
Returns an `EventEmitter` with these properties:
* `progressAmount`
* `progressTotal`
And these events:
* `'error' (err)`
* `'end'` - emitted when all objects are deleted.
* `'progress'` - emitted when the `progressAmount` or `progressTotal` properties change.
`deleteDir` works like this:
1. Start listing all objects in a bucket recursively. S3 returns 1000 objects
   per response.
2. For each response that comes back with a list of objects in the bucket,
   immediately send a delete request for all of them.
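A usage sketch (the bucket and prefix are placeholders):
```js
var deleter = client.deleteDir({
  Bucket: "s3 bucket name",
  Prefix: "some/remote/dir/",
});
deleter.on('error', function(err) {
  console.error("unable to delete:", err.stack);
});
deleter.on('progress', function() {
  console.log("progress", deleter.progressAmount, deleter.progressTotal);
});
deleter.on('end', function() {
  console.log("done deleting");
});
```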
### client.copyObject(s3Params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#copyObject-property
`s3Params` are the same. Don't forget that `CopySource` must contain the
source bucket name as well as the source key name.
The difference between using AWS SDK `copyObject` and this one:
* Retry based on the client's retry settings.
Returns an `EventEmitter` with these events:
* `'error' (err)`
* `'end' (data)`
### client.moveObject(s3Params)
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#copyObject-property
`s3Params` are the same. Don't forget that `CopySource` must contain the
source bucket name as well as the source key name.
Under the hood, this uses `copyObject` and then `deleteObjects` only if the
copy succeeded.
Returns an `EventEmitter` with these events:
* `'error' (err)`
* `'copySuccess' (data)`
* `'end' (data)`
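A usage sketch; as in the AWS SDK, `Bucket` and `Key` name the destination
while `CopySource` contains both the source bucket and source key (all values
here are placeholders):
```js
var mover = client.moveObject({
  Bucket: "destination bucket",
  Key: "destination/key",
  CopySource: "source-bucket/source/key", // source bucket AND source key
});
mover.on('error', function(err) {
  console.error("unable to move:", err.stack);
});
mover.on('copySuccess', function(data) {
  console.log("copied; now deleting the source object");
});
mover.on('end', function(data) {
  console.log("done moving");
});
```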
## Examples
### Check if a file exists in S3
Using the AWS SDK, you can send a HEAD request, which will tell you if a file exists at `Key`.
See http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#headObject-property
```js
var client = require('s3').createClient({ /* options */ });

// client.s3 is the underlying AWS.S3 instance.
client.s3.headObject({
  Bucket: 's3 bucket name',
  Key: 'some/remote/file',
}, function(err, data) {
  if (err) {
    // file does not exist (err.statusCode === 404)
    return;
  }
  // file exists
});
```
## Testing
`aws-vault exec -- S3_KEY= npm test`
The tests upload and download large amounts of data to and from S3. The test
timeout is set to 40 seconds because Internet connectivity varies wildly.