{"id":15521347,"url":"https://github.com/andrewrosss/gcs-uri","last_synced_at":"2025-04-23T04:45:21.030Z","repository":{"id":40536177,"uuid":"478730833","full_name":"andrewrosss/gcs-uri","owner":"andrewrosss","description":"Simple API to copy files to and from Google Cloud Storage","archived":false,"fork":false,"pushed_at":"2022-05-03T17:40:57.000Z","size":71,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T06:17:05.986Z","etag":null,"topics":["gcp","gcp-storage","google-cloud","google-cloud-storage"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrewrosss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-06T21:21:15.000Z","updated_at":"2023-02-16T05:41:09.000Z","dependencies_parsed_at":"2022-07-27T09:32:13.754Z","dependency_job_id":null,"html_url":"https://github.com/andrewrosss/gcs-uri","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewrosss%2Fgcs-uri","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewrosss%2Fgcs-uri/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewrosss%2Fgcs-uri/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewrosss%2Fgcs-uri/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrewrosss","download_url":"https://codeload.github.com/andrewrosss/gcs-uri/tar.gz/refs/heads/main","host":{"name":"
GitHub","url":"https://github.com","kind":"github","repositories_count":250372935,"owners_count":21419722,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gcp","gcp-storage","google-cloud","google-cloud-storage"],"created_at":"2024-10-02T10:34:00.436Z","updated_at":"2025-04-23T04:45:21.015Z","avatar_url":"https://github.com/andrewrosss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gcs-uri\n\nSimple API to copy files to and from Google Cloud Storage\n\n[![PyPI Version](https://img.shields.io/pypi/v/gcs-uri.svg)](https://pypi.org/project/gcs-uri/)\n\n## Installation\n\n```bash\npip install gcs-uri\n```\n\n## Usage\n\n`gcs-uri` exposes the following functions as its main public API:\n\n- `copy_file`\n- `copy_dir`\n- `copy_files`\n\nThese functions do exactly what they sound like they do.\n\n`copy_file` will copy a source file (either a local file or a remote blob in GCS) to a destination file (either a local file or a remote blob in GCS).\n\n`copy_dir` will recursively copy the contents of a directory (either a local directory or a remote \"directory\" in GCS) to a destination directory (either a local directory or a remote \"directory\" in GCS).\n\n`copy_files` will copy a list of source files (local files, remote blobs in GCS, or a mix of both) to a corresponding set of destination files (local files, remote blobs in GCS, or a mix of both).\n\n\u003e If the second argument to `copy_files` is of type `str | Path | Blob` (as opposed to a Sequence), then this argument is treated like 
a directory and each of the source files is \"flattened\" (i.e. folder delimiters are removed) and copied under the destination directory.\n\nThe idea is that you can pass just about any object to these functions and they will figure out how to do the copying.\n\n## Examples\n\n### Local file -\u003e local file\n\nIn this case `copy_file` behaves just like `shutil.copy2` or `cp`, copying the source file to the destination file locally.\n\n```python\nfrom gcs_uri import copy_file\n\nsrc = '/my/src/file.txt'\ndst = '/my/dst/file.txt'\n\ncopy_file(src, dst)\n```\n\n`src` and `dst` can also be `pathlib.Path` objects:\n\n```python\nfrom pathlib import Path\n\nsrc = Path('/my/src/file.txt')\ndst = Path('/my/dst/file.txt')\n\ncopy_file(src, dst)\n```\n\n### Local dir -\u003e local dir\n\nIn this case `copy_dir` behaves just like `shutil.copytree` (or somewhat like rsync, except that `copy_dir` will \"re-copy\" all files to the destination whether or not they already exist there).\n\n```python\nsrc = '/my/src'\ndst = '/my/dst'\n\ncopy_dir(src, dst)\n\n# if there was a file `/my/src/a/b.txt`, then after `copy_dir`\n# there would be a file `/my/dst/a/b.txt`\n```\n\nThe source and destination can include or omit a trailing slash and the results are the same as above.\n\n### Local file -\u003e remote file (upload)\n\nTo copy a file to a Google Cloud Storage bucket, barely anything has to change; the destination should simply be a Google Storage URI:\n\n```python\nsrc = '/my/src/file.txt'\ndst = 'gs://my-bkt/dst/file.txt'\n\ncopy_file(src, dst)\n```\n\nIf you would like `gcs-uri` to use a particular Google Storage Client, this can be provided as a keyword(-only) argument (the same applies to `copy_dir`):\n\n```python\nfrom google.cloud import storage\n\nclient = storage.Client()\n\nsrc = '/my/src/file.txt'\ndst = 'gs://my-bkt/dst/file.txt'\n\ncopy_file(src, dst, client=client)\n```\n\nIf no client is provided and either the source or the destination (or both) is determined to represent a remote location 
then `gcs-uri` will try to instantiate a client by calling `storage.Client()`.\n\nNote that we can also provide `gcs-uri` with \"richer\" objects (instead of just strings):\n\n```python\nfrom pathlib import Path\nfrom google.cloud import storage\n\nclient = storage.Client()\n\nsrc = Path('/my/src/file.txt')\ndst = storage.Blob.from_string('gs://my-bkt/dst/file.txt', client=client)\n\ncopy_file(src, dst)\n```\n\n### Local dir -\u003e remote dir (upload)\n\nThe concepts from the previous sections apply here:\n\n```python\nsrc = '/my/src'\ndst = 'gs://my-bkt/dst'\n\ncopy_dir(src, dst)\n\n# if there was a file `/my/src/a/b.txt`, then after `copy_dir`\n# there would be a blob `gs://my-bkt/dst/a/b.txt`\n```\n\n### Remote file -\u003e local file (download)\n\n```python\nsrc = 'gs://my-bkt/src/file.txt'\ndst = '/my/dst/file.txt'\n\ncopy_file(src, dst)\n```\n\n### Remote dir -\u003e local dir (download)\n\n```python\nsrc = 'gs://my-bkt/src'\ndst = '/my/dst'\n\ncopy_dir(src, dst)\n```\n\n### Remote file -\u003e remote file (transfer)\n\n```python\nsrc = 'gs://my-bkt/src/file.txt'\ndst = 'gs://my-other-bkt/dst/file.txt'\n\ncopy_file(src, dst)\n```\n\n### Remote dir -\u003e remote dir (transfer)\n\n```python\nsrc = 'gs://my-bkt/src'\ndst = 'gs://my-other-bkt/dst'\n\ncopy_dir(src, dst)\n```\n\n### List of local files -\u003e list of remote files\n\n```python\nsrcs = ['/my/src/file1.txt', '/my/src/file2.txt']\ndsts = ['gs://my-bkt/dst/file1.txt', 'gs://my-bkt/dst/file2.txt']\n\ncopy_files(srcs, dsts)\n# copies: /my/src/file1.txt -\u003e gs://my-bkt/dst/file1.txt\n# copies: /my/src/file2.txt -\u003e gs://my-bkt/dst/file2.txt\n```\n\n### List of local files -\u003e remote dir\n\n```python\nsrcs = ['/my/src/file1.txt', '/my/src/file2.txt']\ndst = 'gs://my-bkt/dst'\n\ncopy_files(srcs, dst)\n# copies: /my/src/file1.txt -\u003e gs://my-bkt/dst/my-src-file1.txt\n# copies: /my/src/file2.txt -\u003e gs://my-bkt/dst/my-src-file2.txt\n```\n\n## API\n\n```python\n# src/gcs_uri.py\n\ndef copy_file(\n 
   src: str | Path | Blob,\n    dst: str | Path | Blob,\n    *,\n    client: Client | None = None,\n    quiet: bool = False,\n) -\u003e None:\n    \"\"\"Copy a single file.\n\n    If `src` and `dst` are both determined to be local files then `client` is ignored.\n    \"\"\"\n\ndef copy_dir(\n    src: str | Path | Blob,\n    dst: str | Path | Blob,\n    *,\n    client: Client | None = None,\n    quiet: bool = False,\n) -\u003e None:\n    \"\"\"Copy a directory (recursively).\n\n    If `src` and `dst` are both determined to be local directories\n    then `client` is ignored.\n    \"\"\"\n\ndef copy_files(\n    srcs: Sequence[str | Path | Blob],\n    dsts: str | Path | Blob | Sequence[str | Path | Blob],\n    *,\n    client: Client | None = None,\n    quiet: bool = False,\n) -\u003e None:\n    \"\"\"Copy a list of files.\n\n    If `dsts` is a `str | Path | Blob` it is treated as a directory\n    and each of the files in `srcs` will have its name \"flattened\" and will be\n    copied under `dsts`.\n\n    if `dsts` is a `Sequence[str | Path | Blob]` it is zipped with `srcs`, i.e.\n    each file in `srcs` is copied to its corresponding entry in `dsts`.\n    \"\"\"\n```\n\n## Tests\n\nThis package comes with some basic end-to-end (e2e) tests. 
They require an active Google Cloud project with the Google Storage API enabled.\n\nTo help with running them there is a utility script in the root of this repo: `run_e2e_tests.py`.\n\n```text\nusage: run_e2e_tests.py [-h] [-v] [-c GOOGLE_APPLICATION_CREDENTIALS]\n                        [-u TEST_STORAGE_URI]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -v, --version         show program's version number and exit\n  -c GOOGLE_APPLICATION_CREDENTIALS, --google-application-credentials GOOGLE_APPLICATION_CREDENTIALS\n                        Google cloud service account to use.\n  -u TEST_STORAGE_URI, --test-storage-uri TEST_STORAGE_URI\n                        Google storage uri to use when running e2e tests.\n```\n\nThis script requires you to provide a service account JSON file as well as a URI to a location in Google Cloud which the tests will use to copy blobs to/from. (**IMPORTANT**: **_all_** blobs at and beneath the location you specify will be removed - the bucket itself will **not** be removed).\n\nSo, run the e2e tests with something like:\n\n```bash\npython -m run_e2e_tests -c \"path/to/service-account.json\" -u \"gs://my-bkt/gcs-uri-tests\"\n```\n\n## Contributing\n\n1. Have or install a recent version of `poetry` (version \u003e= 1.1)\n1. Fork the repo\n1. Set up a virtual environment (however you prefer)\n1. Run `poetry install`\n1. Run `pre-commit install`\n1. Add your changes (adding/updating tests is always nice too)\n1. Commit your changes + push to your fork\n1. Open a PR\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewrosss%2Fgcs-uri","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrewrosss%2Fgcs-uri","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewrosss%2Fgcs-uri/lists"}