{"id":24344390,"url":"https://github.com/xtream1101/s3-concat","last_synced_at":"2025-04-09T17:15:18.381Z","repository":{"id":47048136,"uuid":"173357016","full_name":"xtream1101/s3-concat","owner":"xtream1101","description":"Concat multiple files in s3","archived":false,"fork":false,"pushed_at":"2023-04-27T19:28:26.000Z","size":46,"stargazers_count":39,"open_issues_count":5,"forks_count":12,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-09T17:15:10.336Z","etag":null,"topics":["cli","concatenation","python","s3","s3-concat"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xtream1101.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-01T19:28:49.000Z","updated_at":"2024-01-29T12:59:05.000Z","dependencies_parsed_at":"2024-06-19T18:59:41.959Z","dependency_job_id":"ddc3f7a3-aa88-4344-a67c-bd6eebb53d03","html_url":"https://github.com/xtream1101/s3-concat","commit_stats":{"total_commits":46,"total_committers":4,"mean_commits":11.5,"dds":0.08695652173913049,"last_synced_commit":"4f9d7d46bd5a8a0e61222f8e0528a49e29fa6f19"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtream1101%2Fs3-concat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtream1101%2Fs3-concat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xtream1101%2Fs3-concat/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/xtream1101%2Fs3-concat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xtream1101","download_url":"https://codeload.github.com/xtream1101/s3-concat/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248074921,"owners_count":21043490,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","concatenation","python","s3","s3-concat"],"created_at":"2025-01-18T09:35:39.054Z","updated_at":"2025-04-09T17:15:18.362Z","avatar_url":"https://github.com/xtream1101.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python S3 Concat\n\n[![PyPI](https://img.shields.io/pypi/v/s3-concat.svg)](https://pypi.python.org/pypi/s3-concat)\n[![PyPI](https://img.shields.io/pypi/l/s3-concat.svg)](https://pypi.python.org/pypi/s3-concat)  \n\n\nS3 Concat is used to concatenate many small files in an s3 bucket into fewer larger files.\n\n\n## Install\n`pip install s3-concat`\n\n\n## Usage\n\n### Command Line\n`$ s3-concat -h`\n\n### Import\n```python\nfrom s3_concat import S3Concat\n\nbucket = 'YOUR_BUCKET_NAME'\npath_to_concat = 'PATH_TO_FILES_TO_CONCAT'\nconcatenated_file = 'FILE_TO_SAVE_TO.json'\n# Setting this to a size will always add a part number at the end of the file name\nmin_file_size = '50MB'  # ex: FILE_TO_SAVE_TO-1.json, FILE_TO_SAVE_TO-2.json, ...\n# Setting this to None will concat all files into a single file\n# min_file_size = None  ex: FILE_TO_SAVE_TO.json\n\n# Init the job\njob = S3Concat(bucket, concatenated_file, 
min_file_size,\n               content_type='application/json',\n              #  session=boto3.session.Session(),  # For custom aws session\n              # s3_client_kwargs={}  # Use to pass arguments allowed by the s3 client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html\n               )\n# Add files, can call multiple times to add files from other directories\njob.add_files(path_to_concat)\n# Add a single file at a time\njob.add_file('some/file_key.json')\n# Only use small_parts_threads if you need to. See Advanced Usage section below.\njob.concat(small_parts_threads=4, main_threads=2)\n```\n\n## Advanced Usage\n\nDepending on your use case, you may want to use more than one thread.  \n\n  - `main_threads` is the number of threads to use when uploading files to S3. This helps when many files are already larger than the configured `min_file_size`.\n\n  - `small_parts_threads` is only used when the files you are trying to concat are smaller than 5MB. These threads are spawned from _inside_ of the `main_threads`. Due to the limitations of the S3 multipart upload API (see *Limitations* below), any files smaller than 5MB need to be downloaded locally, concatenated together, then re-uploaded. Setting this thread count downloads the parts in parallel, which speeds up the concatenation.  \n\nThe right values for these arguments depend on your use case and the system you are running this on.\n\n\n## Limitations\nThis uses the S3 multipart upload API, whose limits are documented at https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxtream1101%2Fs3-concat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxtream1101%2Fs3-concat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxtream1101%2Fs3-concat/lists"}