Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/roeap/object-store-python
Python bindings and arrow integration for the rust object_store crate.
https://github.com/roeap/object-store-python
azure gcp-storage object-store python rust s3
Last synced: 6 days ago
JSON representation
Python bindings and arrow integration for the rust object_store crate.
- Host: GitHub
- URL: https://github.com/roeap/object-store-python
- Owner: roeap
- License: apache-2.0
- Created: 2022-09-10T17:30:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-05T21:30:51.000Z (5 months ago)
- Last Synced: 2024-12-28T18:18:09.847Z (13 days ago)
- Topics: azure, gcp-storage, object-store, python, rust, s3
- Language: Rust
- Homepage:
- Size: 523 KB
- Stars: 61
- Watchers: 2
- Forks: 9
- Open Issues: 14
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# object-store-python
[![CI][ci-img]][ci-link]
[![code style: black][black-img]][black-link]
![PyPI](https://img.shields.io/pypi/v/object-store-python)
[![PyPI - Downloads][pypi-img]][pypi-link]Python bindings and integrations for the excellent [`object_store`][object-store] crate.
The main idea is to provide a common interface to various storage backends including the
objects stores from most major cloud providers. The APIs are very focussed and taylored
towards modern cloud native applications by hiding away many features (and complexities)
encountered in full fledges file systems.Among the included backend are:
- Amazon S3 and S3 compliant APIs
- Google Cloud Storage Buckets
- Azure Blob Gen1 and Gen2 accounts (including ADLS Gen2)
- local storage
- in-memory store## Installation
The `object-store-python` package is available on PyPI and can be installed via
```sh
poetry add object-store-python
```or using pip
```sh
pip install object-store-python
```## Usage
The main [`ObjectStore`](#object-store-python) API mirrors the native [`object_store`][object-store]
implementation, with some slight adjustments for ease of use in python programs.### `ObjectStore` api
```py
from object_store import ObjectStore, ObjectMeta, Path# we use an in-memory store for demonstration purposes.
# data will not be persisted and is not shared across store instances
store = ObjectStore("memory://")store.put(Path("data"), b"some data")
data = store.get("data")
assert data == b"some data"blobs = store.list()
meta = store.head("data")
range = store.get_range("data", start=0, length=4)
assert range == b"some"store.copy("data", "copied")
copied = store.get("copied")
assert copied == data
```#### Async api
```py
from object_store import ObjectStore, ObjectMeta, Path# we use an in-memory store for demonstration purposes.
# data will not be persisted and is not shared across store instances
store = ObjectStore("memory://")path = Path("data")
await store.put_async(path, b"some data")data = await store.get_async(path)
assert data == b"some data"blobs = await store.list_async()
meta = await store.head_async(path)
range = await store.get_range_async(path, start=0, length=4)
assert range == b"some"await store.copy_async(Path("data"), Path("copied"))
copied = await store.get_async(Path("copied"))
assert copied == data
```### Configuration
As much as possible we aim to make access to various storage backends dependent
only on runtime configuration. The kind of service is always derived from the
url used to specifiy the storage location. Some basic configuration can also be
derived from the url string, dependent on the chosen url format.```py
from object_store import ObjectStorestorage_options = {
"azure_storage_account_name": "",
"azure_client_id": "",
"azure_client_secret": "",
"azure_tenant_id": ""
}store = ObjectStore("az://", storage_options)
```We can provide the same configuration via the environment.
```py
import os
from object_store import ObjectStoreos.environ["AZURE_STORAGE_ACCOUNT_NAME"] = ""
os.environ["AZURE_CLIENT_ID"] = ""
os.environ["AZURE_CLIENT_SECRET"] = ""
os.environ["AZURE_TENANT_ID"] = ""store = ObjectStore("az://")
```#### Azure
The recommended url format is `az:///` and Azure always requieres
`azure_storage_account_name` to be configured.- [shared key][azure-key]
- `azure_storage_account_key`
- [service principal][azure-ad]
- `azure_client_id`
- `azure_client_secret`
- `azure_tenant_id`
- [shared access signature][azure-sas]
- `azure_storage_sas_key` (as provided by StorageExplorer)
- bearer token
- `azure_storage_token`
- [managed identity][azure-managed]
- if using user assigned identity one of `azure_client_id`, `azure_object_id`, `azure_msi_resource_id`
- if no other credential can be created, managed identity will be tried
- [workload identity][azure-workload]
- `azure_client_id`
- `azure_tenant_id`
- `azure_federated_token_file`#### S3
The recommended url format is `s3:///` S3 storage always requires a
region to be specified via one of `aws_region` or `aws_default_region`.- [access key][aws-key]
- `aws_access_key_id`
- `aws_secret_access_key`
- [session token][aws-sts]
- `aws_session_token`
- [imds instance metadata][aws-imds]
- `aws_metadata_endpoint`
- [profile][aws-profile]
- `aws_profile`AWS supports [virtual hosting of buckets][aws-virtual], which can be configured by setting
`aws_virtual_hosted_style_request` to "true".When an alternative implementation or a mocked service like localstack is used, the service
endpoint needs to be explicitly specified via `aws_endpoint`.#### GCS
The recommended url format is `gs:///`.
- service account
- `google_service_account`### with `pyarrow`
```py
from pathlib import Pathimport numpy as np
import pyarrow as pa
import pyarrow.fs as fs
import pyarrow.dataset as ds
import pyarrow.parquet as pqfrom object_store import ArrowFileSystemHandler
table = pa.table({"a": range(10), "b": np.random.randn(10), "c": [1, 2] * 5})
base = Path.cwd()
store = fs.PyFileSystem(ArrowFileSystemHandler(str(base.absolute())))pq.write_table(table.slice(0, 5), "data/data1.parquet", filesystem=store)
pq.write_table(table.slice(5, 10), "data/data2.parquet", filesystem=store)dataset = ds.dataset("data", format="parquet", filesystem=store)
```## Development
### Prerequisites
- [poetry](https://python-poetry.org/docs/)
- [Rust toolchain](https://www.rust-lang.org/tools/install)
- [just](https://github.com/casey/just#readme)### Running tests
If you do not have [`just`](https://github.com/casey/just#readme) installed and do not wish to install it,
have a look at the [`justfile`](https://github.com/roeap/object-store-python/blob/main/justfile) to see the raw commands.To set up the development environment, and install a dev version of the native package just run:
```sh
just init
```This will also configure [`pre-commit`](https://pre-commit.com/) hooks in the repository.
To run the rust as well as python tests:
```sh
just test
```[object-store]: https://crates.io/crates/object_store
[pypi-img]: https://img.shields.io/pypi/dm/object-store-python
[pypi-link]: https://pypi.org/project/object-store-python/
[ci-img]: https://github.com/roeap/object-store-python/actions/workflows/ci.yaml/badge.svg
[ci-link]: https://github.com/roeap/object-store-python/actions/workflows/ci.yaml
[black-img]: https://img.shields.io/badge/code%20style-black-000000.svg
[black-link]: https://github.com/psf/black
[aws-virtual]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
[azure-managed]: https://learn.microsoft.com/en-gb/azure/app-service/overview-managed-identity
[azure-sas]: https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview
[azure-ad]: https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory
[azure-key]: https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key
[azure-workload]: https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview
[aws-imds]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
[aws-profile]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
[aws-sts]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html
[aws-key]: https://docs.aws.amazon.com/accounts/latest/reference/credentials-access-keys-best-practices.html