{"id":17061344,"url":"https://github.com/roeap/object-store-python","last_synced_at":"2025-05-08T05:55:36.322Z","repository":{"id":59037840,"uuid":"535070477","full_name":"roeap/object-store-python","owner":"roeap","description":"Python bindings and arrow integration for the rust object_store crate.","archived":false,"fork":false,"pushed_at":"2024-08-05T21:30:51.000Z","size":536,"stargazers_count":64,"open_issues_count":14,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-08T05:55:30.543Z","etag":null,"topics":["azure","gcp-storage","object-store","python","rust","s3"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roeap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-10T17:30:46.000Z","updated_at":"2025-04-06T02:38:08.000Z","dependencies_parsed_at":"2024-06-09T14:37:52.286Z","dependency_job_id":"187a589d-3164-49d2-9a25-4c12fd438e68","html_url":"https://github.com/roeap/object-store-python","commit_stats":{"total_commits":57,"total_committers":2,"mean_commits":28.5,"dds":0.03508771929824561,"last_synced_commit":"06c74161cad5eccec1f50d9409d2c22a1fafb8e5"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roeap%2Fobject-store-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roeap%2Fobject-store-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roeap%2Fob
ject-store-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roeap%2Fobject-store-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roeap","download_url":"https://codeload.github.com/roeap/object-store-python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253009891,"owners_count":21839714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","gcp-storage","object-store","python","rust","s3"],"created_at":"2024-10-14T10:46:50.513Z","updated_at":"2025-05-08T05:55:36.286Z","avatar_url":"https://github.com/roeap.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# object-store-python\n\n[![CI][ci-img]][ci-link]\n[![code style: black][black-img]][black-link]\n![PyPI](https://img.shields.io/pypi/v/object-store-python)\n[![PyPI - Downloads][pypi-img]][pypi-link]\n\nPython bindings and integrations for the excellent [`object_store`][object-store] crate.\nThe main idea is to provide a common interface to various storage backends, including the\nobject stores of most major cloud providers.
The APIs are deliberately focused and tailored\ntowards modern cloud-native applications, hiding away many of the features (and complexities)\nencountered in full-fledged file systems.\n\nThe included backends are:\n\n- Amazon S3 and S3-compliant APIs\n- Google Cloud Storage Buckets\n- Azure Blob Gen1 and Gen2 accounts (including ADLS Gen2)\n- local storage\n- in-memory store\n\n## Installation\n\nThe `object-store-python` package is available on PyPI and can be installed with poetry:\n\n```sh\npoetry add object-store-python\n```\n\nor with pip:\n\n```sh\npip install object-store-python\n```\n\n## Usage\n\nThe main [`ObjectStore`](#object-store-python) API mirrors the native [`object_store`][object-store]\nimplementation, with some slight adjustments for ease of use in Python programs.\n\n### `ObjectStore` API\n\n```py\nfrom object_store import ObjectStore, ObjectMeta, Path\n\n# we use an in-memory store for demonstration purposes.\n# data will not be persisted and is not shared across store instances\nstore = ObjectStore(\"memory://\")\n\nstore.put(Path(\"data\"), b\"some data\")\n\ndata = store.get(\"data\")\nassert data == b\"some data\"\n\n# list all objects in the store\nblobs = store.list()\n\n# fetch metadata for a single object without reading its contents\nmeta = store.head(\"data\")\n\n# read only the first 4 bytes; named `partial` to avoid shadowing the builtin `range`\npartial = store.get_range(\"data\", start=0, length=4)\nassert partial == b\"some\"\n\nstore.copy(\"data\", \"copied\")\ncopied = store.get(\"copied\")\nassert copied == data\n```\n\n#### Async API\n\n```py\nimport asyncio\n\nfrom object_store import ObjectStore, ObjectMeta, Path\n\n\nasync def main():\n    # we use an in-memory store for demonstration purposes.\n    # data will not be persisted and is not shared across store instances\n    store = ObjectStore(\"memory://\")\n\n    path = Path(\"data\")\n    await store.put_async(path, b\"some data\")\n\n    data = await store.get_async(path)\n    assert data == b\"some data\"\n\n    blobs = await store.list_async()\n\n    meta = await store.head_async(path)\n\n    partial = await store.get_range_async(path, start=0, length=4)\n    assert partial == b\"some\"\n\n    await store.copy_async(Path(\"data\"), Path(\"copied\"))\n    copied = await store.get_async(Path(\"copied\"))\n    assert copied == data\n\n\n# await is only valid inside a coroutine, so run it on an event loop\nasyncio.run(main())\n```\n\n### Configuration\n\nAs far as possible, we aim to make access to the various storage backends depend\nonly on runtime configuration. The kind of service is always derived from the\nURL used to specify the storage location. Some basic configuration can also be\nderived from the URL string, depending on the chosen URL format.\n\n```py\nfrom object_store import ObjectStore\n\nstorage_options = {\n    \"azure_storage_account_name\": \"\u003cmy-account-name\u003e\",\n    \"azure_client_id\": \"\u003cmy-client-id\u003e\",\n    \"azure_client_secret\": \"\u003cmy-client-secret\u003e\",\n    \"azure_tenant_id\": \"\u003cmy-tenant-id\u003e\"\n}\n\nstore = ObjectStore(\"az://\u003ccontainer-name\u003e\", storage_options)\n```\n\nThe same configuration can also be provided via environment variables.\n\n```py\nimport os\nfrom object_store import ObjectStore\n\nos.environ[\"AZURE_STORAGE_ACCOUNT_NAME\"] = \"\u003cmy-account-name\u003e\"\nos.environ[\"AZURE_CLIENT_ID\"] = \"\u003cmy-client-id\u003e\"\nos.environ[\"AZURE_CLIENT_SECRET\"] = \"\u003cmy-client-secret\u003e\"\nos.environ[\"AZURE_TENANT_ID\"] = \"\u003cmy-tenant-id\u003e\"\n\nstore = ObjectStore(\"az://\u003ccontainer-name\u003e\")\n```\n\n#### Azure\n\nThe recommended URL format is `az://\u003ccontainer\u003e/\u003cpath\u003e`, and Azure always requires\n`azure_storage_account_name` to be configured.\n\n- [shared key][azure-key]\n  - `azure_storage_account_key`\n- [service principal][azure-ad]\n  - `azure_client_id`\n  - `azure_client_secret`\n  - `azure_tenant_id`\n- [shared access signature][azure-sas]\n  - `azure_storage_sas_key` (as provided by Azure Storage Explorer)\n- bearer token\n  - `azure_storage_token`\n- [managed identity][azure-managed]\n  - if using a user-assigned identity, one of `azure_client_id`, `azure_object_id`, `azure_msi_resource_id`\n  - if no other credential can be created, managed identity will be tried\n- [workload
identity][azure-workload]\n  - `azure_client_id`\n  - `azure_tenant_id`\n  - `azure_federated_token_file`\n\n#### S3\n\nThe recommended URL format is `s3://\u003cbucket\u003e/\u003cpath\u003e`. S3 storage always requires a\nregion to be specified via either `aws_region` or `aws_default_region`.\n\n- [access key][aws-key]\n  - `aws_access_key_id`\n  - `aws_secret_access_key`\n- [session token][aws-sts]\n  - `aws_session_token`\n- [IMDS instance metadata][aws-imds]\n  - `aws_metadata_endpoint`\n- [profile][aws-profile]\n  - `aws_profile`\n\nAWS supports [virtual hosting of buckets][aws-virtual], which can be configured by setting\n`aws_virtual_hosted_style_request` to \"true\".\n\nWhen an alternative implementation or a mocked service like LocalStack is used, the service\nendpoint must be specified explicitly via `aws_endpoint`.\n\n#### GCS\n\nThe recommended URL format is `gs://\u003cbucket\u003e/\u003cpath\u003e`.\n\n- service account\n  - `google_service_account`\n\n### With `pyarrow`\n\n```py\nfrom pathlib import Path\n\nimport numpy as np\nimport pyarrow as pa\nimport pyarrow.fs as fs\nimport pyarrow.dataset as ds\nimport pyarrow.parquet as pq\n\nfrom object_store import ArrowFileSystemHandler\n\ntable = pa.table({\"a\": range(10), \"b\": np.random.randn(10), \"c\": [1, 2] * 5})\n\n# wrap an object_store handler rooted at the current directory in a pyarrow filesystem\nbase = Path.cwd()\nstore = fs.PyFileSystem(ArrowFileSystemHandler(str(base.absolute())))\n\n# Table.slice(offset, length): write rows 0-4 and rows 5-9 to separate files\npq.write_table(table.slice(0, 5), \"data/data1.parquet\", filesystem=store)\npq.write_table(table.slice(5, 5), \"data/data2.parquet\", filesystem=store)\n\n# read both files back as a single dataset\ndataset = ds.dataset(\"data\", format=\"parquet\", filesystem=store)\n```\n\n## Development\n\n### Prerequisites\n\n- [poetry](https://python-poetry.org/docs/)\n- [Rust toolchain](https://www.rust-lang.org/tools/install)\n- [just](https://github.com/casey/just#readme)\n\n### Running tests\n\nIf you do not have [`just`](https://github.com/casey/just#readme) installed and do not wish to install it,\nhave a look at the
[`justfile`](https://github.com/roeap/object-store-python/blob/main/justfile) to see the raw commands.\n\nTo set up the development environment and install a development build of the native package, run:\n\n```sh\njust init\n```\n\nThis will also configure [`pre-commit`](https://pre-commit.com/) hooks in the repository.\n\nTo run both the Rust and the Python tests:\n\n```sh\njust test\n```\n\n[object-store]: https://crates.io/crates/object_store\n[pypi-img]: https://img.shields.io/pypi/dm/object-store-python\n[pypi-link]: https://pypi.org/project/object-store-python/\n[ci-img]: https://github.com/roeap/object-store-python/actions/workflows/ci.yaml/badge.svg\n[ci-link]: https://github.com/roeap/object-store-python/actions/workflows/ci.yaml\n[black-img]: https://img.shields.io/badge/code%20style-black-000000.svg\n[black-link]: https://github.com/psf/black\n[aws-virtual]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html\n[azure-managed]: https://learn.microsoft.com/en-gb/azure/app-service/overview-managed-identity\n[azure-sas]: https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview\n[azure-ad]: https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory\n[azure-key]: https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key\n[azure-workload]: https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview\n[aws-imds]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html\n[aws-profile]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html\n[aws-sts]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html\n[aws-key]:
https://docs.aws.amazon.com/accounts/latest/reference/credentials-access-keys-best-practices.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froeap%2Fobject-store-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froeap%2Fobject-store-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froeap%2Fobject-store-python/lists"}