{"id":26198178,"url":"https://github.com/machu-gwu/s3pathlib-project","last_synced_at":"2025-12-25T23:22:37.824Z","repository":{"id":281230272,"uuid":"448969676","full_name":"MacHu-GWU/s3pathlib-project","owner":"MacHu-GWU","description":"s3pathlib is the python package provides the Pythonic objective oriented programming (OOP) interface to manipulate AWS S3 object / directory. The api is similar to the pathlib standard library and very intuitive for human.","archived":false,"fork":false,"pushed_at":"2025-08-12T17:03:37.000Z","size":628,"stargazers_count":4,"open_issues_count":4,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-23T05:12:28.951Z","etag":null,"topics":["aws","aws-s3","filesystem","python"],"latest_commit_sha":null,"homepage":"https://s3pathlib.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MacHu-GWU.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.rst","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-01-17T16:34:18.000Z","updated_at":"2025-10-03T14:52:32.000Z","dependencies_parsed_at":"2025-06-13T18:36:11.563Z","dependency_job_id":null,"html_url":"https://github.com/MacHu-GWU/s3pathlib-project","commit_stats":null,"previous_names":["machu-gwu/s3pathlib-project"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/MacHu-GWU/s3pathlib-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fs3pathlib-project","tags_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/MacHu-GWU%2Fs3pathlib-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fs3pathlib-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fs3pathlib-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MacHu-GWU","download_url":"https://codeload.github.com/MacHu-GWU/s3pathlib-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fs3pathlib-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28040476,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-25T02:00:05.988Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-s3","filesystem","python"],"created_at":"2025-03-12T02:50:46.716Z","updated_at":"2025-12-25T23:22:37.819Z","avatar_url":"https://github.com/MacHu-GWU.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. image:: https://readthedocs.org/projects/s3pathlib/badge/?version=latest\n    :target: https://s3pathlib.readthedocs.io/en/latest/\n    :alt: Documentation Status\n\n.. 
image:: https://github.com/MacHu-GWU/s3pathlib-project/actions/workflows/main.yml/badge.svg\n    :target: https://github.com/MacHu-GWU/s3pathlib-project/actions?query=workflow:CI\n\n.. image:: https://codecov.io/gh/MacHu-GWU/s3pathlib-project/branch/main/graph/badge.svg\n    :target: https://codecov.io/gh/MacHu-GWU/s3pathlib-project\n\n.. image:: https://img.shields.io/pypi/v/s3pathlib.svg\n    :target: https://pypi.python.org/pypi/s3pathlib\n\n.. image:: https://img.shields.io/pypi/l/s3pathlib.svg\n    :target: https://pypi.python.org/pypi/s3pathlib\n\n.. image:: https://img.shields.io/pypi/pyversions/s3pathlib.svg\n    :target: https://pypi.python.org/pypi/s3pathlib\n    \n.. image:: https://img.shields.io/pypi/dm/s3pathlib.svg\n    :target: https://pypi.python.org/pypi/s3pathlib\n\n.. image:: https://img.shields.io/badge/✍️_Release_History!--None.svg?style=social\u0026logo=github\n    :target: https://github.com/MacHu-GWU/s3pathlib-project/blob/main/release-history.rst\n\n.. image:: https://img.shields.io/badge/⭐_Star_me_on_GitHub!--None.svg?style=social\u0026logo=github\n    :target: https://github.com/aws-samples/s3pathlib-project\n\n------\n\n.. image:: https://img.shields.io/badge/Link-API-blue.svg\n    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html\n\n.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg\n    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html\n\n.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg\n    :target: https://github.com/aws-samples/s3pathlib-project/issues\n\n.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg\n    :target: https://github.com/aws-samples/s3pathlib-project/issues\n\n.. 
image:: https://img.shields.io/badge/Link-Download-blue.svg\n    :target: https://pypi.org/pypi/s3pathlib#files\n\n\nWelcome to ``s3pathlib`` Documentation\n==============================================================================\n`s3pathlib \u003chttps://s3pathlib.readthedocs.io/en/latest/\u003e`_ is a Python package that offers an object-oriented programming (OOP) interface to work with AWS S3 objects and directories. Its API is designed to be similar to the standard library `pathlib \u003chttps://docs.python.org/3/library/pathlib.html\u003e`_ and is user-friendly. The package also `supports versioning \u003chttps://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html\u003e`_ in AWS S3.\n\n.. note::\n\n    You may not be viewing the full document. `THE FULL DOCUMENT IS HERE \u003chttps://s3pathlib.readthedocs.io/en/latest/\u003e`_\n\n\nQuick Start\n------------------------------------------------------------------------------\n.. note::\n\n    `A COMPREHENSIVE GUIDE covering features and best practices can be found HERE \u003chttps://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide\u003e`_\n\n**Import the library, declare an S3Path object**\n\n.. 
code-block:: python\n\n    # import\n    \u003e\u003e\u003e from s3pathlib import S3Path\n\n    # construct from string, auto join parts\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"folder\", \"file.txt\")\n    # construct from S3 URI works too\n    \u003e\u003e\u003e p = S3Path(\"s3://bucket/folder/file.txt\")\n    # construct from S3 ARN works too\n    \u003e\u003e\u003e p = S3Path(\"arn:aws:s3:::bucket/folder/file.txt\")\n    \u003e\u003e\u003e p.bucket\n    'bucket'\n    \u003e\u003e\u003e p.key\n    'folder/file.txt'\n    \u003e\u003e\u003e p.uri\n    's3://bucket/folder/file.txt'\n    \u003e\u003e\u003e p.console_url # click to preview it in AWS console\n    'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'\n    \u003e\u003e\u003e p.arn\n    'arn:aws:s3:::bucket/folder/file.txt'\n\n**Talk to AWS S3 and get some information**\n\n.. code-block:: python\n\n    # s3pathlib maintains a \"context\" object that holds the AWS authentication information\n    # you just need to build your own boto session object and attach to it\n    \u003e\u003e\u003e import boto3\n    \u003e\u003e\u003e from s3pathlib import context\n    \u003e\u003e\u003e context.attach_boto_session(\n    ...     boto3.session.Session(\n    ...         region_name=\"us-east-1\",\n    ...         profile_name=\"my_aws_profile\",\n    ...     )\n    ... 
)\n\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"folder\", \"file.txt\")\n    \u003e\u003e\u003e p.write_text(\"a lot of data ...\")\n    \u003e\u003e\u003e p.etag\n    '3e20b77868d1a39a587e280b99cec4a8'\n    \u003e\u003e\u003e p.size\n    56789000\n    \u003e\u003e\u003e p.size_for_human\n    '54.16 MB'\n\n    # folders work too, you just need a trailing \"/\" to identify one\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"datalake/\")\n    \u003e\u003e\u003e p.count_objects()\n    7164 # number of files under this prefix\n    \u003e\u003e\u003e p.calculate_total_size()\n    (7164, 236483701963) # 7164 objects, 220.24 GB\n    \u003e\u003e\u003e p.calculate_total_size(for_human=True)\n    (7164, '220.24 GB') # 7164 objects, 220.24 GB\n\n**Manipulate Folders in S3**\n\nThe native S3 write APIs (the operations that change the state of S3) only operate at the object level, and the `list_objects \u003chttps://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2\u003e`_ API returns at most 1000 objects per call, so manipulating objects recursively takes additional effort. ``s3pathlib`` **CAN SAVE YOUR LIFE**\n\n.. code-block:: python\n\n    # declare an S3 folder\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"github\", \"repos\", \"my-repo/\")\n\n    # upload all Python files from /my-repo to s3://bucket/github/repos/my-repo/\n    \u003e\u003e\u003e p.upload_dir(\"/my-repo\", pattern=\"**/*.py\", overwrite=False)\n\n    # copy the entire S3 folder to another S3 folder\n    \u003e\u003e\u003e p2 = S3Path(\"bucket\", \"github\", \"repos\", \"another-repo/\")\n    \u003e\u003e\u003e p.copy_to(p2, overwrite=True)\n\n    # delete all objects in the folder, recursively, to clean up your test bucket\n    \u003e\u003e\u003e p.delete()\n    \u003e\u003e\u003e p2.delete()\n\n**S3 Path Filter**\n\nEver wanted to filter S3 objects by their attributes, such as dirname, basename, file extension, etag, size, or modified time? 
It is supposed to be simple in Python:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e s3bkt = S3Path(\"bucket\") # assume you have lots of files in this bucket\n    \u003e\u003e\u003e iterproxy = s3bkt.iter_objects().filter(\n    ...     S3Path.size \u003e= 10_000_000, S3Path.ext == \".csv\" # add filter\n    ... )\n\n    \u003e\u003e\u003e iterproxy.one() # fetch one\n    S3Path('s3://bucket/larger-than-10MB-1.csv')\n\n    \u003e\u003e\u003e iterproxy.many(3) # fetch three\n    [\n        S3Path('s3://bucket/larger-than-10MB-1.csv'),\n        S3Path('s3://bucket/larger-than-10MB-2.csv'),\n        S3Path('s3://bucket/larger-than-10MB-3.csv'),\n    ]\n\n    \u003e\u003e\u003e for p in iterproxy: # iterate over the rest\n    ...     print(p)\n\n\n**File Like Object for Simple IO**\n\n``S3Path`` is a file-like object. It supports ``open`` and the context manager syntax out of the box. Here are a few highlights:\n\n.. code-block:: python\n\n    # Stream a big file line by line\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"log.txt\")\n    \u003e\u003e\u003e with p.open(\"r\") as f:\n    ...     for line in f:\n    ...         print(line)  # do whatever you want with each line\n\n    # JSON IO\n    \u003e\u003e\u003e import json\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"config.json\")\n    \u003e\u003e\u003e with p.open(\"w\") as f:\n    ...     json.dump({\"password\": \"mypass\"}, f)\n\n    # pandas IO\n    \u003e\u003e\u003e import pandas as pd\n    \u003e\u003e\u003e p = S3Path(\"bucket\", \"dataset.csv\")\n    \u003e\u003e\u003e df = pd.DataFrame(...)\n    \u003e\u003e\u003e with p.open(\"w\") as f:\n    ...     
df.to_csv(f)\n\nNow that you have a basic understanding of s3pathlib, read the `full document \u003chttps://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide\u003e`_ to explore its capabilities in greater depth.\n\n\nGetting Help\n------------------------------------------------------------------------------\nPlease use the ``python-s3pathlib`` tag on Stack Overflow to get help.\n\nSubmit an ``I want help`` issue ticket on `GitHub Issues \u003chttps://github.com/aws-samples/s3pathlib-project/issues/new/choose\u003e`_\n\n\nContributing\n------------------------------------------------------------------------------\nPlease see the `Contribution Guidelines \u003chttps://github.com/aws-samples/s3pathlib-project/blob/main/CONTRIBUTING.rst\u003e`_.\n\n\nCopyright\n------------------------------------------------------------------------------\ns3pathlib is an open source project. See the `LICENSE \u003chttps://github.com/aws-samples/s3pathlib-project/blob/main/LICENSE\u003e`_ file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachu-gwu%2Fs3pathlib-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmachu-gwu%2Fs3pathlib-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachu-gwu%2Fs3pathlib-project/lists"}