Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/gsy0911/azfs

AzFS is to provide convenient Python read/write functions for Azure Storage Account.
https://github.com/gsy0911/azfs

azure azure-storage-account dataframe pandas-dataframe python

Last synced: 24 days ago
JSON representation

AzFS is to provide convenient Python read/write functions for Azure Storage Account.

Awesome Lists containing this project

README

        

# AzFS

[![pytest](https://github.com/gsy0911/azfs/workflows/pytest/badge.svg)](https://github.com/gsy0911/azfs/actions?query=workflow%3Apytest)
[![codecov](https://codecov.io/gh/gsy0911/azfs/branch/master/graph/badge.svg)](https://codecov.io/gh/gsy0911/azfs)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/gsy0911/azfs.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/gsy0911/azfs/context:python)
[![Documentation Status](https://readthedocs.org/projects/azfs/badge/?version=latest)](https://azfs.readthedocs.io/en/latest/?badge=latest)

[![PythonVersion](https://img.shields.io/badge/python-3.6|3.7|3.8-blue.svg)](https://www.python.org/downloads/release/python-377/)
[![PiPY](https://img.shields.io/pypi/v/azfs.svg)](https://pypi.org/project/azfs/)
[![Downloads](https://pepy.tech/badge/azfs)](https://pepy.tech/project/azfs)

AzFS is to provide convenient Python read/write functions for Azure Storage Account.

`AzFS` can

* list files in blob (also with wildcard `*`),
* check if file exists,
* read csv as pd.DataFrame, and json as dict from blob,
* write pd.DataFrame as csv, and dict as json to blob.

## install

```bash
$ pip install azfs
```

## usage

For `Blob` Storage.

```python
import azfs
from azure.identity import DefaultAzureCredential
import pandas as pd

# credential is not required if your environment is on AAD(Azure Active Directory)
azc = azfs.AzFileClient()

# credential is required if your environment is not on AAD
credential = "[your storage account credential]"
# or
credential = DefaultAzureCredential()
azc = azfs.AzFileClient(credential=credential)

# connection_string is also supported
connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net"
azc = azfs.AzFileClient(connection_string=connection_string)

# data paths
csv_path = "https://testazfs.blob.core.windows.net/test_caontainer/test_file.csv"

# read csv as pd.DataFrame
df = azc.read_csv(csv_path, index_col=0)
# or
with azc:
df = pd.read_csv_az(csv_path, header=None)

# write csv
azc.write_csv(path=csv_path, df=df)
# or
with azc:
df.to_csv_az(path=csv_path, index=False)

# you can read multiple files
csv_pattern_path = "https://testazfs.blob.core.windows.net/test_caontainer/*.csv"
df = azc.read().csv(csv_pattern_path)

# to apply additional filter or another process
df = azc.read().apply(function=lambda x: x[x['id'] == 'AAA']).csv(csv_pattern_path)

# in addition, you can use multiprocessing
df = azc.read(use_mp=True).apply(function=lambda x: x[x['id'] == 'AAA']).csv(csv_pattern_path)
```

For `Queue` Storage

```python
import azfs
queue_url = "https://{storage_account}.queue.core.windows.net/{queue_name}"

azc = azfs.AzFileClient()
queue_message = azc.get(queue_url)
# message will not be deleted if `delete=False`
# queue_message = azc.get(queue_url, delete=False)

# get message content
queue_content = queue_message.get('content')

```

For `Table` Storage

```python

import azfs
cons = {
"account_name": "{storage_account_name}",
"account_key": "{credential}",
"database_name": "{database_name}"
}

table_client = azfs.TableStorageWrapper(**cons)

# put data, according to the keyword you put
table_client.put(id_="1", message="hello_world")

# get data
table_client.get(id_="1")

```

check more details in [![Documentation Status](https://readthedocs.org/projects/azfs/badge/?version=latest)](https://azfs.readthedocs.io/en/latest/?badge=latest)

### types of authorization

Supported authentication types are
* [Azure Active Directory (AAD) token credential](https://docs.microsoft.com/azure/storage/common/storage-auth-aad).
* connection_string, like `DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net`

### types of storage account kind

The table below shows if `AzFS` provides read/write functions for the storage.

| account kind | Blob | Data Lake | Queue | File | Table |
|:--|:--:|:--:|:--:|:--:|:--:|
| StorageV2 | O | O | O | X | O |
| StorageV1 | O | O | O | X | O |
| BlobStorage | O | - | - | - | - |

* O: provides basic functions
* X: not provides
* -: storage type unavailable

## dependencies

```
pandas
azure-identity >= "1.3.1"
azure-storage-blob >= "12.3.0"
azure-storage-file-datalake >= "12.0.0"
azure-storage-queue >= "12.1.1"
azure-cosmosdb-table
```

## references

* [azure-sdk-for-python/storage](https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage)
* [filesystem_spec](https://github.com/intake/filesystem_spec)