{"id":13526642,"url":"https://github.com/peak/s5cmd","last_synced_at":"2025-05-12T07:50:07.224Z","repository":{"id":38523338,"uuid":"73909333","full_name":"peak/s5cmd","owner":"peak","description":"Parallel S3 and local filesystem execution tool.","archived":false,"fork":false,"pushed_at":"2025-01-17T07:10:56.000Z","size":27330,"stargazers_count":3037,"open_issues_count":131,"forks_count":261,"subscribers_count":32,"default_branch":"master","last_synced_at":"2025-05-12T05:42:37.641Z","etag":null,"topics":["aws","cli","filesystem","go","s3","s5cmd","storage"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/peak.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-16T10:31:15.000Z","updated_at":"2025-05-11T12:30:51.000Z","dependencies_parsed_at":"2023-02-18T07:31:15.172Z","dependency_job_id":"b6e0c0af-44aa-4823-a41e-110b89f8efb9","html_url":"https://github.com/peak/s5cmd","commit_stats":{"total_commits":526,"total_committers":47,"mean_commits":"11.191489361702128","dds":0.6406844106463878,"last_synced_commit":"c1c7ee35acfc16aaf4a2b060129aa60b75bb5ef5"},"previous_names":["peakgames/s5cmd"],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peak%2Fs5cmd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peak%2Fs5cmd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peak%2Fs5cmd/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peak%2Fs5cmd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/peak","download_url":"https://codeload.github.com/peak/s5cmd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253692207,"owners_count":21948312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","cli","filesystem","go","s3","s5cmd","storage"],"created_at":"2024-08-01T06:01:32.538Z","updated_at":"2025-05-12T07:50:07.199Z","avatar_url":"https://github.com/peak.png","language":"Go","readme":"[![Go Report](https://goreportcard.com/badge/github.com/peak/s5cmd/v2)](https://goreportcard.com/report/github.com/peak/s5cmd/v2) ![Github Actions Status](https://github.com/peak/s5cmd/actions/workflows/ci.yml/badge.svg)\n\n![](./doc/s5cmd_header.jpg)\n\n\n## Overview\n`s5cmd` is a very fast S3 and local filesystem execution tool. 
It comes with support\nfor a multitude of operations including tab completion and wildcard support\nfor files, which can be very handy for your object storage workflow while working\nwith a large number of files.\n\nThere are already other utilities to work with S3 and similar object storage\nservices, thus it is natural to wonder what `s5cmd` has to offer that others don't.\n\nIn short, *`s5cmd` is very fast.*\nThanks to [Joshua Robinson](https://github.com/joshuarobinson) for his\nstudy and experimentation on `s5cmd`; to quote his Medium [post](https://medium.com/@joshua_robinson/s5cmd-for-high-performance-object-storage-7071352cc09d):\n\u003e For uploads, s5cmd is 32x faster than s3cmd and 12x faster than aws-cli.\n\u003eFor downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s), whereas s3cmd\n\u003eand aws-cli can only reach 85 MB/s and 375 MB/s respectively.\n\nIf you would like to know more about the performance of `s5cmd` and the\nreasons for its speed, refer to the [benchmarks](./README.md#Benchmarks) section.\n\n## Features\n![](./doc/usage.png)\n\n`s5cmd` supports a wide range of object management tasks both for cloud\nstorage services and local filesystems.\n\n- List buckets and objects\n- Upload, download or delete objects\n- Move, copy or rename objects\n- Set Server Side Encryption using AWS Key Management Service (KMS)\n- Set Access Control List (ACL) for objects/files on upload, copy and move\n- Print object contents to stdout\n- Select JSON records from objects using SQL expressions\n- Create or remove buckets\n- Summarize object sizes, grouping by storage class\n- Wildcard support for all operations\n- Multiple arguments support for delete operation\n- Command file support to run commands in batches at very high execution speeds\n- Dry run support\n- [S3 Transfer Acceleration](https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html) support\n- Google Cloud Storage (and any other S3 API compatible service) support\n- 
Structured logging for querying command outputs\n- Shell auto-completion\n- S3 ListObjects API backward compatibility\n\n## Installation\n\n### Official Releases\n\n#### Binaries\n\nThe [Releases](https://github.com/peak/s5cmd/releases) page provides pre-built\nbinaries for Linux, macOS and Windows.\n\n#### Homebrew\n\nFor macOS, a [homebrew](https://brew.sh) tap is provided:\n\n    brew install peak/tap/s5cmd\n\n### Unofficial Releases (by Community)\n[![Packaging status](https://repology.org/badge/tiny-repos/s5cmd.svg)](https://repology.org/project/s5cmd/versions)\n\u003e **Warning**\n\u003e These releases are maintained by the community. They might be out of date compared to the official releases.\n\n#### MacPorts\nYou can also install `s5cmd` from [MacPorts](https://ports.macports.org/port/s5cmd/summary) on macOS:\n\n    sudo port selfupdate\n    sudo port install s5cmd\n\n#### Conda\n`s5cmd` is [included](https://anaconda.org/conda-forge/s5cmd ) in the [conda-forge]( https://conda-forge.org ) channel, and it can be downloaded through the [Conda](https://docs.conda.io/).\n\n\u003e Installing `s5cmd` from the `conda-forge` channel can be achieved by adding `conda-forge` to your channels with:\n\u003e ```\n\u003e conda config --add channels conda-forge\n\u003e conda config --set channel_priority strict\n\u003e ```\n\u003e\n\u003e Once the `conda-forge` channel has been enabled, `s5cmd` can be installed with `conda`:\n\u003e\n\u003e ```\n\u003e conda install s5cmd\n\u003e ```\nps.  Quoted from [s5cmd feedstock](https://github.com/conda-forge/s5cmd-feedstock). 
You can also find further instructions on its [README](https://github.com/conda-forge/s5cmd-feedstock/blob/main/README.md).\n\n#### FreeBSD\n\nOn FreeBSD you can install s5cmd as a package:\n\n```\npkg install s5cmd\n```\n\nor via ports:\n\n```\ncd /usr/ports/net/s5cmd\nmake install clean\n```\n\n### Build from source\n\nYou can build `s5cmd` from source if you have [Go](https://golang.org/dl/) 1.19+\ninstalled.\n\n    go install github.com/peak/s5cmd/v2@master\n\n⚠️ Please note that building from `master` is not guaranteed to be stable since\ndevelopment happens on the `master` branch.\n\n### Docker\n\n#### Hub\n    $ docker pull peakcom/s5cmd\n    $ docker run --rm -v ~/.aws:/root/.aws peakcom/s5cmd \u003cS3 operation\u003e\n\nℹ️ `/aws` directory is the working directory of the image. Mounting your current working directory to it allows you to run `s5cmd` as if it was installed on your system:\n\n    docker run --rm -v $(pwd):/aws -v ~/.aws:/root/.aws peakcom/s5cmd \u003cS3 operation\u003e\n\n#### Build\n    $ git clone https://github.com/peak/s5cmd \u0026\u0026 cd s5cmd\n    $ docker build -t s5cmd .\n    $ docker run --rm -v ~/.aws:/root/.aws s5cmd \u003cS3 operation\u003e\n\n## Usage\n\n`s5cmd` supports multiple-level wildcards for all S3 operations. This is\nachieved by listing all S3 objects with the prefix up to the first wildcard,\nthen filtering the results in-memory. For example, for the following command:\n\n    s5cmd cp 's3://bucket/logs/2020/03/*' .\n\nfirst a `ListObjects` request is sent, then the copy operation is executed\nagainst each matching object, in parallel.\n\n\n### Specifying credentials\n\n`s5cmd` uses the official AWS SDK to access S3. The SDK requires credentials to sign\nrequests to AWS. 
Credentials can be provided in a [variety of ways](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html):\n\n- Command line options `--profile` to use a named profile, `--credentials-file` flag to use the specified credentials file\n\n    ```sh\n    # Use your company profile in AWS default credential file\n    s5cmd --profile my-work-profile ls s3://my-company-bucket/\n\n    # Use your company profile in your own credential file\n    s5cmd --credentials-file ~/.your-credentials-file --profile my-work-profile ls s3://my-company-bucket/\n    ```\n\n- Environment variables\n\n    ```sh\n    # Export your AWS access key and secret pair\n    export AWS_ACCESS_KEY_ID='\u003cyour-access-key-id\u003e'\n    export AWS_SECRET_ACCESS_KEY='\u003cyour-secret-access-key\u003e'\n    export AWS_PROFILE='\u003cyour-profile-name\u003e'\n    export AWS_REGION='\u003cyour-bucket-region\u003e'\n\n    s5cmd ls s3://your-bucket/\n    ```\n\n- If `s5cmd` runs on an Amazon EC2 instance, EC2 IAM role\n- If `s5cmd` runs on EKS, Kube IAM role\n- Or, you can send requests anonymously with `--no-sign-request` option\n\n    ```sh\n    # List objects in a public bucket\n    s5cmd --no-sign-request ls s3://public-bucket/\n    ```\n\n### Region detection\n\nWhile executing the commands, `s5cmd` detects the region according to the following order of priority:\n\n1. `--source-region` or `--destination-region` flags of `cp` command.\n2. `AWS_REGION` environment variable.\n3. Region section of AWS profile.\n4. Auto detection from bucket region (via `HeadBucket` API call).\n5. 
`us-east-1` as default region.\n\n### Examples\n\n#### Check if a bucket exists\n\n    s5cmd head s3://bucket/\n\n#### Print a remote object's metadata\n\n    s5cmd head s3://bucket/object.gz\n\n#### Download a single S3 object\n\n    s5cmd cp s3://bucket/object.gz .\n\n#### Download multiple S3 objects\n\nSuppose we have the following objects:\n```\ns3://bucket/logs/2020/03/18/file1.gz\ns3://bucket/logs/2020/03/19/file2.gz\ns3://bucket/logs/2020/03/19/originals/file3.gz\n```\n\n    s5cmd cp 's3://bucket/logs/2020/03/*' logs/\n\n\n`s5cmd` will match the given wildcards and arguments by doing an efficient\nsearch against the given prefixes. All matching objects will be downloaded in\nparallel. `s5cmd` will create the destination directory if it is missing.\n\n`logs/` directory content will look like:\n\n```\n$ tree\n.\n└── logs\n    ├── 18\n    │   └── file1.gz\n    └── 19\n        ├── file2.gz\n        └── originals\n            └── file3.gz\n\n4 directories, 3 files\n```\n\nℹ️ `s5cmd` preserves the source directory structure by default. 
If you want to\nflatten the source directory structure, use the `--flatten` flag.\n\n    s5cmd cp --flatten 's3://bucket/logs/2020/03/*' logs/\n\n`logs/` directory content will look like:\n\n```\n$ tree\n.\n└── logs\n    ├── file1.gz\n    ├── file2.gz\n    └── file3.gz\n\n1 directory, 3 files\n```\n\n#### Upload a file to S3\n\n    s5cmd cp object.gz s3://bucket/\n\nTo set server side encryption (*aws kms*) for the file:\n\n    s5cmd cp -sse aws:kms -sse-kms-key-id \u003cyour-kms-key-id\u003e object.gz s3://bucket/\n\nTo set the Access Control List (*acl*) policy of the object:\n\n    s5cmd cp -acl bucket-owner-full-control object.gz s3://bucket/\n\n#### Upload multiple files to S3\n\n    s5cmd cp directory/ s3://bucket/\n\nThis will upload all files in the given directory to S3 while keeping the folder hierarchy\nof the source.\n\n#### Stream stdin to S3\nYou can upload remote objects by piping stdin to `s5cmd`:\n\n    curl https://github.com/peak/s5cmd/ | s5cmd pipe s3://bucket/s5cmd.html\n\nOr you can compress the data before uploading:\n\n    gzip -c file | s5cmd pipe s3://bucket/file.gz\n\n#### Delete an S3 object\n\n    s5cmd rm s3://bucket/logs/2020/03/18/file1.gz\n\n#### Delete multiple S3 objects\n\n    s5cmd rm 's3://bucket/logs/2020/03/19/*'\n\nThis will remove all matching objects:\n\n```\ns3://bucket/logs/2020/03/19/file2.gz\ns3://bucket/logs/2020/03/19/originals/file3.gz\n```\n\n`s5cmd` uses the S3 batch delete API. If there are up to 1000 matching objects,\nthey'll be deleted in a single request. However, it should be noted that commands such as\n\n    s5cmd rm s3://bucket-foo/object s3://bucket-bar/object\n\nare not supported by `s5cmd` and result in an error (since we have 2 different buckets), as this is at odds with the benefit of performing batch delete requests. 
Thus, if in need, one can use `s5cmd run` mode for this case, i.e,\n\n    $ s5cmd run\n    rm s3://bucket-foo/object\n    rm s3://bucket-bar/object\n\nmore details and examples on `s5cmd run` are presented in a [later section](./README.md#L293).\n\n#### Copy objects from S3 to S3\n\n`s5cmd` supports copying objects on the server side as well.\n\n    s5cmd cp 's3://bucket/logs/2020/*' s3://bucket/logs/backup/\n\nWill copy all the matching objects to the given S3 prefix, respecting the source\nfolder hierarchy.\n\n⚠️ Copying objects (from S3 to S3) larger than 5GB is not supported yet. We have\nan [open ticket](https://github.com/peak/s5cmd/issues/29) to track the issue.\n\n#### Using Exclude and Include Filters\n`s5cmd` supports the `--exclude` and `--include` flags, which can be used to specify patterns for objects to be excluded or included in commands.\n\n- The `--exclude` flag specifies objects that should be excluded from the operation. Any object that matches the pattern will be skipped.\n- The `--include` flag specifies objects that should be included in the operation. Only objects that match the pattern will be handled.\n- If both flags are used, `--exclude` has precedence over `--include`. This means that if an object URL matches any of the `--exclude` patterns, the object will be skipped, even if it also matches one of the `--include` patterns.\n- The order of the flags does not affect the results (unlike `aws-cli`).\n\nThe command below will delete only objects that end with `.log`.\n\n    s5cmd rm --include \"*.log\" 's3://bucket/logs/2020/*'\n\nThe command below will delete all objects except those that end with `.log` or `.txt`.\n\n    s5cmd rm --exclude \"*.log\" --exclude \"*.txt\" 's3://bucket/logs/2020/*'\n\nIf you wish, you can use multiple flags, like below. 
It will download objects that start with `request` or end with `.log`.\n\n    s5cmd cp --include \"*.log\" --include \"request*\" 's3://bucket/logs/2020/*' .\n\nUsing a combination of `--include` and `--exclude` is also possible. The command below will only sync objects that end with `.log` or `.txt` but exclude those that start with `access_`. For example, `request.log` and `license.txt` will be included, while `access_log.txt` and `readme.md` are excluded.\n\n    s5cmd sync --include \"*.log\" --exclude \"access_*\" --include \"*.txt\" 's3://bucket/logs/*' .\n\n#### Select JSON object content using SQL\n\n`s5cmd` supports the `SelectObjectContent` S3 operation, and will run your\n[SQL query](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference.html)\nagainst objects matching normal wildcard syntax and emit matching JSON records via stdout. Records\nfrom multiple objects will be interleaved, and the order of the records is not guaranteed (though it's\nlikely that the records from a single object will arrive in order, even if interleaved with other\nrecords).\n\n    $ s5cmd select --compression GZIP \\\n      --query \"SELECT s.timestamp, s.hostname FROM S3Object s WHERE s.ip_address LIKE '10.%' OR s.application='unprivileged'\" \\\n      's3://bucket-foo/object/2021/*'\n    {\"timestamp\":\"2021-07-08T18:24:06.665Z\",\"hostname\":\"application.internal\"}\n    {\"timestamp\":\"2021-07-08T18:24:16.095Z\",\"hostname\":\"api.github.com\"}\n\nAt the moment this operation _only_ supports JSON records selected with SQL. S3 calls this\nlines-type JSON, but it seems that it works even if the records aren't line-delimited. YMMV.\n\n#### Count objects and determine total size\n\n    $ s5cmd du --humanize 's3://bucket/2020/*'\n\n    30.8M bytes in 3 objects: s3://bucket/2020/*\n\n#### Run multiple commands in parallel\n\nThe most powerful feature of `s5cmd` is the commands file. 
Thousands of S3 and\nfilesystem commands are declared in a file (or simply piped in from another\nprocess) and they are executed using multiple parallel workers. Since only one\nprogram is launched, thousands of unnecessary fork-exec calls are avoided. This\nway S3 throughput can reach a few thousand operations per second.\n\n    s5cmd run commands.txt\n\nor\n\n    cat commands.txt | s5cmd run\n\n`commands.txt` content could look like:\n\n```\ncp 's3://bucket/2020/03/*' logs/2020/03/\n\n# line comments are supported\nrm s3://bucket/2020/03/19/file2.gz\n\n# empty lines are OK too like above\n\n# rename an S3 object\nmv s3://bucket/2020/03/18/file1.gz s3://bucket/2020/03/18/original/file.gz\n```\n\n#### Sync\nThe `sync` command synchronizes S3 buckets, prefixes, directories and files. It works between S3 buckets and prefixes as well.\nIt compares files between source and destination, taking source files as the **source of truth**:\n\n* copies files that do not exist in the destination\n* copies files that exist in both locations if the comparison made with the sync strategy allows it\n\nIt makes a one-way synchronization from source to destination without modifying any of the source files or deleting any of the destination files (unless the `--delete` flag is passed).\n\nSuppose we have the following files:\n```\n   -  29 Sep 10:00 .\n5000  29 Sep 11:00 ├── favicon.ico\n 300  29 Sep 10:00 ├── index.html\n  50  29 Sep 10:00 ├── readme.md\n  80  29 Sep 11:30 └── styles.css\n```\n\n```\ns5cmd ls s3://bucket/static/\n2021/09/29 10:00:01               300 index.html\n2021/09/29 11:10:01                10 readme.md\n2021/09/29 10:00:01                90 styles.css\n2021/09/29 11:10:01                10 test.html\n```\nRunning `s5cmd sync . s3://bucket/static/` would:\n* copy `favicon.ico`\n  * file does not exist in destination.\n* copy `styles.css`\n  * source file is newer than its remote counterpart.\n* copy `readme.md`\n  * even though the source one is older, its size differs from the destination one; assuming the source file 
is the source of truth.\n```\ns5cmd sync . s3://bucket/static/\n\ncp favicon.ico s3://bucket/static/favicon.ico\ncp styles.css s3://bucket/static/styles.css\ncp readme.md s3://bucket/static/readme.md\n```\n\nRunning with the `--delete` flag would also delete files that do not exist in the source:\n```\ns5cmd sync --delete . s3://bucket/static/\n\nrm s3://bucket/static/test.html\ncp favicon.ico s3://bucket/static/favicon.ico\ncp styles.css s3://bucket/static/styles.css\ncp readme.md s3://bucket/static/readme.md\n```\n\nIt's also possible to use wildcards to sync only a subset of files.\n\nTo sync only the `.html` files in the S3 bucket above to the same local directory:\n\n```\ns5cmd sync 's3://bucket/static/*.html' .\n\ncp s3://bucket/static/index.html index.html\ncp s3://bucket/static/test.html test.html\n```\n\nWe don't support syncing between 2 storage endpoints out of the box. The current solution is to sync remote objects to your local disk first, then sync your local files to the target remote storage. For example, if you'd like to sync S3 and Google Cloud Storage:\n\n```\ns5cmd sync 's3://s3-bucket/path/*' download_folder/\n\ns5cmd --endpoint-url \u003cgcs-endpoint\u003e sync 'download_folder/*' s3://gcs-bucket/path/\n```\n\n##### Strategy\n###### Default\nBy default `s5cmd` compares both the size **and** the modification time of files, treating source files as the **source of truth**. A size difference or a newer source file causes `s5cmd` to copy the source object to the destination.\n\nmod time    |  size        |  should sync\n------------|--------------|-------------\nsrc \u003e dst   |  src != dst  |  ✅\nsrc \u003e dst   |  src == dst  |  ✅\nsrc \u003c= dst  |  src != dst  |  ✅\nsrc \u003c= dst  |  src == dst  |  ❌\n\n###### Size only\nWith the `--size-only` flag, `s5cmd` uses a strategy that only compares file sizes. 
The source is treated as the **source of truth**, and any difference in size causes `s5cmd` to copy the source object to the destination.\n\nmod time    |  size        |  should sync\n------------|--------------|-------------\nsrc \u003e dst   |  src != dst  |  ✅\nsrc \u003e dst   |  src == dst  |  ❌\nsrc \u003c= dst  |  src != dst  |  ✅\nsrc \u003c= dst  |  src == dst  |  ❌\n\n### Dry run\nThe `--dry-run` flag will output what operations will be performed without actually\ncarrying out those operations. For example, given the objects\n\n    s3://bucket/pre/file1.gz\n    ...\n    s3://bucket/pre/last.txt\n\nrunning\n\n    s5cmd --dry-run cp 's3://bucket/pre/*' s3://another-bucket/\n\nwill output\n\n    cp s3://bucket/pre/file1.gz s3://another-bucket/file1.gz\n    ...\n    cp s3://bucket/pre/last.txt s3://another-bucket/last.txt\n\nhowever, those copy operations will not be performed. The output only displays what\n`s5cmd` will do when run without `--dry-run`.\n\nNote that `--dry-run` can be used with any operation that has a side effect, e.g.,\ncp, mv, rm, mb ...\n\n### S3 ListObjects API Backward Compatibility\n\nThe `--use-list-objects-v1` flag will force using the S3 ListObjectsV1 API. This\nflag is useful for services that do not support the ListObjectsV2 API.\n\n```\ns5cmd --use-list-objects-v1 ls s3://bucket/\n```\n\n\n### Shell auto-completion\n\nShell completion is supported for bash, pwsh (PowerShell) and zsh.\n\nRun `s5cmd --install-completion` to obtain the appropriate auto-completion script for your shell. Note that `install-completion` does not install the auto-completion but merely gives the instructions to install. 
The name is kept as it is for backward compatibility.\n\nTo actually enable auto-completion:\n#### in bash and zsh:\nAdd the auto-completion script to your `.bashrc` or `.zshrc` file.\n#### in pwsh:\nSave the auto-completion script to a file named `s5cmd.ps1` and add the full path of the `s5cmd.ps1` file to your profile file (which you can locate with `$profile`).\n\n\nFinally, restart your shell to activate the changes.\n\n\u003e **Note**\nThe environment variable `SHELL` must be accurate for the autocompletion to function properly. That is, it should point to the `bash` binary in bash, to the `zsh` binary in zsh and to the `pwsh` binary in PowerShell.\n\n\n\u003e **Note**\nThe autocompletion is tested with the following versions of the shells: \\\n***zsh*** 5.8.1 (x86_64-apple-darwin21.0) \\\nGNU ***bash***, version 5.1.16(1)-release (x86_64-apple-darwin21.1.0) \\\n***PowerShell*** 7.2.6\n\n### Google Cloud Storage support\n\n`s5cmd` supports S3 API compatible services, such as GCS, Minio or your favorite\nobject storage.\n\n    s5cmd --endpoint-url https://storage.googleapis.com ls\n\nor, alternatively, with an environment variable\n\n    S3_ENDPOINT_URL=\"https://storage.googleapis.com\" s5cmd ls\n\n    # or\n\n    export S3_ENDPOINT_URL=\"https://storage.googleapis.com\"\n    s5cmd ls\n\nall variants will return your GCS buckets.\n\n`s5cmd` reads `.aws/credentials` to access Google Cloud Storage. Populate the `aws_access_key_id` and `aws_secret_access_key` fields in `.aws/credentials` with an HMAC key created using this [procedure](https://cloud.google.com/storage/docs/authentication/managing-hmackeys#create).\n\n`s5cmd` will use virtual-host style bucket resolving for S3, S3 transfer\nacceleration and GCS. If a custom endpoint is provided, it'll fall back to\npath-style.\n\n### Retry logic\n\n`s5cmd` uses an exponential backoff retry mechanism for transient or potential\nserver-side throttling errors. 
Non-retryable errors, such as `invalid\ncredentials`, `authorization errors` etc., will not be retried. By default,\n`s5cmd` will retry 10 times for up to a minute. The number of retries is adjustable\nvia the `--retry-count` flag.\n\nℹ️ Enable debug level logging for displaying retryable errors.\n\n### Integrity Verification\n`s5cmd` verifies the integrity of files uploaded to Amazon S3 by checking the `Content-MD5` and `X-Amz-Content-Sha256` headers. These headers are added by the AWS SDK for both standard and multipart uploads.\n\n* `Content-MD5` is a checksum of the file's contents, calculated using the `MD5` algorithm.\n* `X-Amz-Content-Sha256` is a checksum of the file's contents, calculated using the `SHA256` algorithm.\n\nIf the checksums in these headers do not match the checksum of the file that was actually uploaded, then `s5cmd` will fail the upload. This helps to ensure that the file was not corrupted during transmission.\n\nIf the checksum calculated by S3 does not match the checksums provided in the `Content-MD5` and `X-Amz-Content-Sha256` headers, S3 will not store the object. Instead, it will return an error message to `s5cmd` with the error code `InvalidDigest` for an `MD5` mismatch or `XAmzContentSHA256Mismatch` for a `SHA256` mismatch.\n\n| Error Code | Description |\n|---|---|\n| `InvalidDigest` | The checksum provided in the `Content-MD5` header does not match the checksum calculated by S3. |\n| `XAmzContentSHA256Mismatch` | The checksum provided in the `X-Amz-Content-Sha256` header does not match the checksum calculated by S3. |\n\nIf `s5cmd` receives either of these error codes, it will not retry the upload, and the exit code will be `1`.\n\nIf the `MD5` checksum mismatches, you will see an error like the one below.\n\n    ERROR \"cp file.log s3://bucket/file.log\": InvalidDigest: The Content-MD5 you specified was invalid. 
status code: 400, request id: S3TR4P2E0A2K3JMH7, host id: XTeMYKd2KECOHWk5S\n\nIf the `SHA256` checksum mismatches, you will see an error like the one below.\n\n    ERROR \"cp file.log s3://bucket/file.log\": XAmzContentSHA256Mismatch: The provided 'x-amz-content-sha256' header does not match what was computed. status code: 400, request id: S3TR4P2E0A2K3JMH7, host id: XTeMYKd2KECOHWk5S\n\n`aws-cli` and `s5cmd` are both command-line tools that can be used to interact with Amazon S3. However, there are some differences between the two tools in terms of how they verify the integrity of data uploaded to S3.\n\n* **Number of retries:** `aws-cli` will retry up to five times to upload a file, while `s5cmd` will not retry.\n* **Checksums:** If you enable `Signature Version 4` in your `~/.aws/config` file, `aws-cli` will only check the `SHA256` checksum of a file  while `s5cmd` will check both the `MD5` and `SHA256` checksums.\n\n**Sources:**\n- [AWS Go SDK](https://github.com/aws/aws-sdk-go/blob/b75b2a7b3cb40ece5774ed07dde44903481a2d4d/service/s3/customizations.go#L56)\n- [AWS CLI Docs](https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html)\n- [AWS S3 Docs](https://aws.amazon.com/getting-started/hands-on/amazon-s3-with-additional-checksums/)\n\n## Using wildcards\n\nOn some shells, like zsh, the `*` character gets treated as a file globbing\nwildcard, which causes unexpected results for `s5cmd`. 
You might see an output\nlike:\n\n```\nzsh: no matches found\n```\n\nIf that happens, you need to wrap your wildcard expression in single quotes, like:\n\n```\ns5cmd cp '*.gz' s3://bucket/\n```\n\n## Output\n\n`s5cmd` supports both structured and unstructured outputs.\n* unstructured output\n\n```shell\n$ s5cmd cp s3://bucket/testfile .\n\ncp s3://bucket/testfile testfile\n```\n\n```shell\n$ s5cmd cp --no-clobber s3://somebucket/file.txt file.txt\n\nERROR \"cp s3://somebucket/file.txt file.txt\": object already exists\n```\n\n* If `--json` flag is provided:\n\n```json\n{\n    \"operation\": \"cp\",\n    \"success\": true,\n    \"source\": \"s3://bucket/testfile\",\n    \"destination\": \"testfile\",\n    \"object\": \"[object]\"\n}\n{\n    \"operation\": \"cp\",\n    \"job\": \"cp s3://somebucket/file.txt file.txt\",\n    \"error\": \"'cp s3://somebucket/file.txt file.txt': object already exists\"\n}\n```\n\n## Configuring Concurrency\n\n### numworkers\n\n`numworkers` is a global option that sets the size of the global worker pool. Default value of `numworkers` is [256](https://github.com/peak/s5cmd/blob/master/command/app.go#L18).\nCommands such as `cp`, `select` and `run`, which can benefit from parallelism use this worker pool to execute tasks. A task can be an upload, a download or anything in a [`run` file](https://github.com/peak/s5cmd/blob/master/command/app.go#L18).\n\nFor example, if you are uploading 100 files to an S3 bucket and the `--numworkers` is set to 10, then `s5cmd` will limit the number of files concurrently uploaded to 10.\n\n```\ns5cmd --numworkers 10 cp '/Users/foo/bar/*' s3://mybucket/foo/bar/\n```\n\n### concurrency\n\n`concurrency` is a `cp` command option. It sets the number of parts that will be uploaded or downloaded in parallel for a single file.\nThis parameter is used by the AWS Go SDK. 
Default value of `concurrency` is `5`.\n\n`numworkers` and `concurrency` options can be used together:\n\n```\ns5cmd --numworkers 10 cp --concurrency 10 '/Users/foo/bar/*' s3://mybucket/foo/bar/\n```\n\nIf you have a few, large files to download, setting `--numworkers` to a very high value will not affect download speed. In this scenario setting `--concurrency` to a higher value may have a better impact on the download speed.\n\n## Benchmarks\nSome benchmarks regarding the performance of `s5cmd` are introduced below. For more\ndetails refer to this [post](https://medium.com/@joshua_robinson/s5cmd-for-high-performance-object-storage-7071352cc09d)\nwhich is the source of the benchmarks to be presented.\n\n*Upload/download of single large file*\n\n\u003cimg src=\"./doc/benchmark1.png\" alt=\"get/put performance graph\" height=\"75%\" width=\"75%\"\u003e\n\n*Uploading large number of small-sized files*\n\n\u003cimg src=\"./doc/benchmark2.png\" alt=\"multi-object upload performance graph\" height=\"75%\" width=\"75%\"\u003e\n\n*Performance comparison on different hardware*\n\n\u003cimg src=\"./doc/benchmark3.png\" alt=\"s3 upload speed graph\" height=\"75%\" width=\"75%\"\u003e\n\n*So, where does all this speed come from?*\n\nThere are mainly two reasons for this:\n- It is written in Go, a statically compiled language designed to make development\nof concurrent systems easy and make full utilization of multi-core processors.\n- *Parallelization.* `s5cmd` starts out with concurrent worker pools and parallelizes\nworkloads as much as possible while trying to achieve maximum throughput.\n\n## performance regression tests\n\n[`bench.py`](benchmark/bench.py) script can be used to compare performance of two different s5cmd builds. 
Refer to this [readme](benchmark/README.md) file for further details.\n\n# Advanced Usage\n\nSome of the advanced usage patterns provided below are inspired by the following [article](https://medium.com/@joshua_robinson/s5cmd-hits-v1-0-and-intro-to-advanced-usage-37ad02f7e895) (thank you! [@joshuarobinson](https://github.com/joshuarobinson))\n\n## Integrate s5cmd operations with Unix commands\nAssume we have a set of objects on S3, and we would like to list them in sorted fashion according to object names.\n\n    $ s5cmd ls s3://bucket/reports/ | sort -k 4\n    2020/08/17 09:34:33              1364 antalya.csv\n    2020/08/17 09:34:33                 0 batman.csv\n    2020/08/17 09:34:33             23114 istanbul.csv\n    2020/08/17 09:34:33             26154 izmir.csv\n    2020/08/17 09:34:33               112 samsun.csv\n    2020/08/17 09:34:33             12552 van.csv\n\nFor a more practical scenario, let's say we have an [avocado prices](https://www.kaggle.com/neuromusic/avocado-prices) dataset, and we would like to take a peek at the few lines of the data by fetching only the necessary bytes.\n\n    $ s5cmd cat s3://bucket/avocado.csv.gz | gunzip | xsv slice --len 5 | xsv table\n        Date        AveragePrice  Total Volume  4046     4225       4770   Total Bags  Small Bags  Large Bags  XLarge Bags  type          year  region\n    0   2015-12-27  1.33          64236.62      1036.74  54454.85   48.16  8696.87     8603.62     93.25       0.0          conventional  2015  Albany\n    1   2015-12-20  1.35          54876.98      674.28   44638.81   58.33  9505.56     9408.07     97.49       0.0          conventional  2015  Albany\n    2   2015-12-13  0.93          118220.22     794.7    109149.67  130.5  8145.35     8042.21     103.14      0.0          conventional  2015  Albany\n    3   2015-12-06  1.08          78992.15      1132.0   71976.41   72.58  5811.16     5677.4      133.76      0.0          conventional  2015  Albany\n    4   2015-11-29  1.28          
51039.6       941.48   43838.39   75.78  6183.95     5986.26     197.69      0.0          conventional  2015  Albany\n\n\n## Beast Mode s5cmd\n\n`s5cmd` allows a file containing a list of operations to be passed as an argument to the `run` command, as illustrated in the [above](./README.md#L293) example. Alternatively, one can pipe commands into\n`run`:\n\n    BUCKET=s5cmd-test; s5cmd ls \"s3://$BUCKET/*test\" | grep -v DIR | awk '{print $NF}' |\n    xargs -I {} echo \"cp s3://$BUCKET/{} /local/directory/\" | s5cmd run\n\nThe above command performs two `s5cmd` invocations: the first searches for files with the *test* suffix, then a *copy to local directory* command is created for each matching file, and finally those commands are piped into `run`.\n\nLet's examine another usage instance, where we migrate files older than\n30 days to a cloud object storage:\n\n    find /mnt/joshua/nachos/ -type f -mtime +30 | awk '{print \"mv \"$1\" s3://joshuarobinson/backup/\"$1}' |\n    s5cmd run\n\nIt is worth mentioning that the `run` command should not be considered a *silver bullet* for all operations. For example, assume we want to remove the following objects:\n\n    s3://bucket/prefix/2020/03/object1.gz\n    s3://bucket/prefix/2020/04/object1.gz\n    ...\n    s3://bucket/prefix/2020/09/object77.gz\n\nRather than executing\n\n    rm s3://bucket/prefix/2020/03/object1.gz\n    rm s3://bucket/prefix/2020/04/object1.gz\n    ...\n    rm s3://bucket/prefix/2020/09/object77.gz\n\nwith the `run` command, it is better to just use\n\n    rm 's3://bucket/prefix/2020/0*/object*.gz'\n\nThe latter sends a single delete request per thousand objects, whereas the former approach\nsends a separate delete request for each subcommand provided to `run`. Thus, there can be a\nsignificant runtime difference between those two approaches.\n\n# LICENSE\n\nMIT. 
See [LICENSE](https://github.com/peak/s5cmd/blob/master/LICENSE).\n","funding_links":[],"categories":["Go","Software Packages","软件包","HarmonyOS","go","DevOps Tools","Development","Go Tools","Open Source Repos","Repositories","Go 工具"],"sub_categories":["DevOps Tools","DevOps 工具","Windows Manager","Devops","S3","代码分析","devops 工具"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeak%2Fs5cmd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpeak%2Fs5cmd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeak%2Fs5cmd/lists"}