{"id":13437401,"url":"https://github.com/whitfin/s3-concat","last_synced_at":"2026-03-07T02:35:06.185Z","repository":{"id":57666122,"uuid":"156038700","full_name":"whitfin/s3-concat","owner":"whitfin","description":"Concatenate Amazon S3 files remotely using flexible patterns","archived":false,"fork":false,"pushed_at":"2021-02-01T02:11:18.000Z","size":38,"stargazers_count":38,"open_issues_count":2,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-13T23:37:21.476Z","etag":null,"topics":["aws","aws-s3","concatenation","filesystem","text-processing","tooling"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/whitfin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-04T01:24:14.000Z","updated_at":"2025-04-08T21:20:24.000Z","dependencies_parsed_at":"2022-09-26T20:31:39.199Z","dependency_job_id":null,"html_url":"https://github.com/whitfin/s3-concat","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/whitfin/s3-concat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fs3-concat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fs3-concat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fs3-concat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fs3-concat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/whitfin","download_url":"https://codeload.github.com/whitfin/s3-concat/tar.gz/ref
s/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/whitfin%2Fs3-concat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30206085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T19:07:06.838Z","status":"online","status_checked_at":"2026-03-07T02:00:06.765Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-s3","concatenation","filesystem","text-processing","tooling"],"created_at":"2024-07-31T03:00:56.667Z","updated_at":"2026-03-07T02:35:06.115Z","avatar_url":"https://github.com/whitfin.png","language":"Rust","funding_links":[],"categories":["Applications","应用","应用 Applications"],"sub_categories":["Utilities","实用","公用事业 Utilities"],"readme":"# S3 Concat\n[![Crates.io](https://img.shields.io/crates/v/s3-concat.svg)](https://crates.io/crates/s3-concat) [![Build Status](https://img.shields.io/github/workflow/status/whitfin/s3-concat/CI)](https://github.com/whitfin/s3-concat/actions)\n\n**This tool has been migrated into [s3-utils](https://github.com/whitfin/s3-utils), please use that crate for future updates.**\n\nA small utility to concatenate files in AWS S3. Designed to be simple and quick, this tool uses the Multipart Upload API provided by AWS to concatenate files. This avoids the need to download files to the local machines, although it does come with caveats. 
S3 interaction is controlled by [rusoto_s3](https://crates.io/crates/rusoto_s3), so check out those docs for authorization practices.\n\n## Installation\n\nYou can install `s3-concat` from either this repository, or from Crates (once it's published):\n\n```shell\n# install from Cargo\n$ cargo install s3-concat\n\n# install the latest from GitHub\n$ cargo install --git https://github.com/whitfin/s3-concat.git\n```\n\n## Usage\n\nCredentials can be configured by following the instructions on the [AWS Documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-environment.html), although examples will use environment variables for the sake of clarity.\n\nYou can concatenate files in a basic manner just by providing a source pattern, and a target file path:\n\n```shell\n$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \\\n    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \\\n    AWS_DEFAULT_REGION=us-west-2 \\\n    s3-concat my.bucket.name 'archives/*.gz' 'archive.gz'\n```\n\nIf you're working with long paths, you can add a prefix to the bucket name to avoid typing it out multiple times. In the following case, `*.gz` and `archive.gz` are relative to the `my/annoyingly/nested/path/` prefix.\n\n```shell\n$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \\\n    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \\\n    AWS_DEFAULT_REGION=us-west-2 \\\n    s3-concat my.bucket.name/my/annoyingly/nested/path/ '*.gz' 'archive.gz'\n```\n\nYou can also use pattern matching (driven by the official `regex` crate) to use segments of the source paths in your target paths. 
Here is an example of mapping a date hierarchy (`YYYY/MM/DD`) to a flat structure (`YYYY-MM-DD`):\n\n```shell\n$ AWS_ACCESS_KEY_ID=MY_ACCESS_KEY_ID \\\n    AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \\\n    AWS_DEFAULT_REGION=us-west-2 \\\n    s3-concat my.bucket.name 'date-hierarchy/(\\d{4})/(\\d{2})/(\\d{2})/*.gz' 'flat-hierarchy/$1-$2-$3.gz'\n```\n\nIn this case, all files in `2018/01/01/*` would be mapped to `2018-01-01.gz`. Don't forget to add single quotes around your expressions to avoid any pesky shell expansions!\n\nFor any other functionality, check out the help menu (although this example below might be outdated):\n\n```shell\n$ s3-concat -h\ns3-concat 1.0.0\nIsaac Whitfield \u003ciw@whitfin.io\u003e\nConcatenate Amazon S3 files remotely using flexible patterns\n\nUSAGE:\n    s3-concat [FLAGS] \u003cbucket\u003e \u003csource\u003e \u003ctarget\u003e\n\nFLAGS:\n    -c, --cleanup    Removes source files after concatenation\n    -d, --dry-run    Only print out the calculated writes\n    -h, --help       Prints help information\n    -q, --quiet      Only prints errors during execution\n    -V, --version    Prints version information\n\nARGS:\n    \u003cbucket\u003e    An S3 bucket prefix to work within\n    \u003csource\u003e    A source pattern to use to locate files\n    \u003ctarget\u003e    A target pattern to use to concatenate files into\n```\n\n## Limitations\n\nIn order to concatenate files remotely (i.e. without pulling them to your machine), this tool uses the Multipart Upload API of S3. This means that all limitations of that API are inherited by this tool. Usually, this isn't an issue, but one of the more noticeable problems is that files smaller than 5MB cannot be concatenated. To avoid wasted AWS calls, this is currently caught in the client layer and will result in a client side error. 
Due to the complexity in working around this, it's currently unsupported to join files with a size smaller than 5MB.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwhitfin%2Fs3-concat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwhitfin%2Fs3-concat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwhitfin%2Fs3-concat/lists"}