https://github.com/snakemake/snakemake-interface-storage-plugins
This package provides a stable interface for interactions between Snakemake and its storage plugins.
- Host: GitHub
- URL: https://github.com/snakemake/snakemake-interface-storage-plugins
- Owner: snakemake
- License: mit
- Created: 2023-09-21T16:30:37.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-18T05:10:39.000Z (over 2 years ago)
- Last Synced: 2024-04-07T11:00:29.250Z (about 2 years ago)
- Language: Python
- Homepage:
- Size: 107 KB
- Stars: 1
- Watchers: 1
- Forks: 4
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# snakemake-interface-storage-plugins
This package provides a stable interface for interactions between Snakemake and its storage plugins.
Plugins should implement the following skeleton to comply with this interface.
It is recommended to use [Snakedeploy to set up the skeleton](https://snakedeploy.readthedocs.io/en/stable/snakemake_developers/scaffold_snakemake_plugins.html) (and automated testing) within a Python package.
```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional, List

from snakemake_interface_storage_plugins.settings import StorageProviderSettingsBase
from snakemake_interface_storage_plugins.storage_provider import (
    StorageProviderBase,
    StorageQueryValidationResult,
    ExampleQuery,
    Operation,
)
from snakemake_interface_storage_plugins.storage_object import (
    StorageObjectRead,
    StorageObjectWrite,
    StorageObjectGlob,
    StorageObjectTouch,
    retry_decorator,
)
from snakemake_interface_storage_plugins.io import IOCacheStorageInterface
# Optional:
# Define settings for your storage plugin (e.g. host url, credentials).
# They will occur in the Snakemake CLI as --storage-<storage-plugin-name>-<param-name>
# Make sure that all defined fields are 'Optional' and specify a default value
# of None or anything else that makes sense in your case.
# Note that we allow storage plugin settings to be tagged by the user. That means
# that each of them can be specified multiple times (an implicit nargs=+), and
# the user can add a tag in front of each value (e.g. tagname1:value1 tagname2:value2).
# This way, a storage plugin can be used multiple times within a workflow with different
# settings.
@dataclass
class StorageProviderSettings(StorageProviderSettingsBase):
    myparam: Optional[int] = field(
        default=None,
        metadata={
            "help": "Some help text",
            # Optionally request that the setting is also available for specification
            # via an environment variable. The variable will be named automatically as
            # SNAKEMAKE_<storage-plugin-name>_<param-name>, all upper case.
            # This mechanism should only be used for passwords, usernames, and other
            # credentials.
            # For other items, we rather recommend letting people use a profile
            # for setting defaults
            # (https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles).
            "env_var": False,
            # Optionally specify a function that parses the value given by the user.
            # This is useful to create complex types from the user input.
            "parse_func": ...,
            # If a parse_func is specified, you also have to specify an unparse_func
            # that converts the parsed value back to a string.
            "unparse_func": ...,
            # Optionally specify that the setting is required when the storage plugin
            # is in use.
            "required": True,
            # Optionally specify multiple args with "nargs": True
        },
    )
# Required:
# Implementation of your storage provider
# This class can be empty as the one below.
# You can however use it to store global information or maintain e.g. a connection
# pool.
# Inside of the provider, you can use self.logger (a normal Python logger of type
# logging.Logger) to log any additional information or
# warnings.
class StorageProvider(StorageProviderBase):
    # For compatibility with future changes, you should not overwrite the __init__
    # method. Instead, use __post_init__ to set additional attributes and initialize
    # further stuff.

    def __post_init__(self):
        # This is optional and can be removed if not needed.
        # Alternatively, you can e.g. prepare a connection to your storage backend here
        # and set additional attributes.
        pass

    @classmethod
    def example_queries(cls) -> List[ExampleQuery]:
        """Return valid example queries (at least one) with description."""
        ...

    def rate_limiter_key(self, query: str, operation: Operation) -> Any:
        """Return a key for identifying a rate limiter given a query and an operation.

        This is used to identify a rate limiter for the query.
        E.g. for a storage provider like http that would be the host name.
        For s3 it might be just the endpoint URL.
        """
        ...

    def default_max_requests_per_second(self) -> float:
        """Return the default maximum number of requests per second for this storage
        provider."""
        ...

    def use_rate_limiter(self) -> bool:
        """Return False if no rate limiting is needed for this provider."""
        ...

    @classmethod
    def is_valid_query(cls, query: str) -> StorageQueryValidationResult:
        """Return whether the given query is valid for this storage provider."""
        # Ensure that also queries containing wildcards (e.g. {sample}) are accepted
        # and considered valid. The wildcards will be resolved before the storage
        # object is actually used.
        ...

    # If required, overwrite the method postprocess_query from StorageProviderBase
    # in order to e.g. normalize the query or add information from the settings to it.
    # Otherwise, remove this method as it will be inherited from the base class.
    def postprocess_query(self, query: str) -> str:
        return query

    # This can be used to change how the rendered query is displayed in the logs to
    # prevent accidentally printing sensitive information e.g. tokens in a URL.
    def safe_print(self, query: str) -> str:
        """Process the query to remove potentially sensitive information when printing.
        """
        return query
# Required:
# Implementation of storage object. If certain methods cannot be supported by your
# storage (e.g. because it is read-only; see
# snakemake-storage-http for comparison), remove the corresponding base classes
# from the list of inherited items.
# Inside of the object, you can use self.provider to access the provider (e.g. for
# self.provider.logger, see above, or self.provider.settings).
class StorageObject(
    StorageObjectRead,
    StorageObjectWrite,
    StorageObjectGlob,
    StorageObjectTouch,
):
    # For compatibility with future changes, you should not overwrite the __init__
    # method. Instead, use __post_init__ to set additional attributes and initialize
    # further stuff.

    def __post_init__(self):
        # This is optional and can be removed if not needed.
        # Alternatively, you can e.g. prepare a connection to your storage backend here
        # and set additional attributes.
        pass

    async def inventory(self, cache: IOCacheStorageInterface):
        """From this file, try to find as much existence and modification date
        information as possible. Only retrieve that information that comes for free
        given the current object.
        """
        # This is optional and can be left as is
        # If this is implemented in a storage object, results have to be stored in
        # the given IOCache object, using self.cache_key() as key.
        # Optionally, this can take a custom local suffix, needed e.g. when you want
        # to cache more items than the current query: self.cache_key(local_suffix=...)
        pass

    def get_inventory_parent(self) -> Optional[str]:
        """Return the parent directory of this object."""
        # this is optional and can be left as is
        return None

    def local_suffix(self) -> str:
        """Return a unique suffix for the local path, determined from self.query."""
        ...

    def cleanup(self):
        """Perform local cleanup of any remainders of the storage object."""
        # self.local_path() should not be removed, as this is taken care of by
        # Snakemake.
        ...

    # Fallible methods should implement some retry logic.
    # The easiest way to do this (but not the only one) is to use the retry_decorator
    # provided by snakemake-interface-storage-plugins.
    @retry_decorator
    def exists(self) -> bool:
        # return True if the object exists
        ...

    @retry_decorator
    def mtime(self) -> float:
        # return the modification time
        ...

    @retry_decorator
    def size(self) -> int:
        # return the size in bytes
        ...

    @retry_decorator
    def local_footprint(self) -> int:
        # Local footprint is the size of the object on the local disk.
        # For directories, this should return the recursive sum of the
        # directory file sizes.
        # If the storage provider supports ondemand eligibility (see retrieve_object()
        # below), this should return 0 if the object is not downloaded but e.g.
        # mounted upon retrieval.
        # If this method is not overwritten here, it defaults to self.size().
        ...

    @retry_decorator
    def retrieve_object(self):
        # Ensure that the object is accessible locally under self.local_path()
        # Optionally, this can make use of the attribute self.is_ondemand_eligible,
        # which indicates that the object could be retrieved on demand,
        # e.g. by only symlinking or mounting it from whatever network storage this
        # plugin provides. For example, objects with self.is_ondemand_eligible == True
        # could mount the object via fuse instead of downloading it.
        # The job can then transparently access only the parts that matter to it
        # without having to wait for the full download.
        # On demand eligibility is calculated via Snakemake's access pattern annotation.
        # If no access pattern is annotated by the workflow developers,
        # self.is_ondemand_eligible is by default set to False.
        ...

    # The following two methods are only required if the class inherits from
    # StorageObjectWrite.
    @retry_decorator
    def store_object(self):
        # Ensure that the local file or directory found at self.local_path() is
        # stored (uploaded) in the storage.
        ...

    @retry_decorator
    def remove(self):
        # Remove the object from the storage.
        ...

    # The following method is only required if the class inherits from
    # StorageObjectGlob.
    @retry_decorator
    def list_candidate_matches(self) -> Iterable[str]:
        """Return a list of candidate matches in the storage for the query."""
        # This is used by glob_wildcards() to find matches for wildcards in the query.
        # The method has to return concretized queries without any remaining wildcards.
        # Use snakemake_interface_storage_plugins.io.get_constant_prefix(self.query)
        # to get the prefix of the query before the first wildcard.
        ...

    # The following method is only required if the class inherits from
    # StorageObjectTouch
    @retry_decorator
    def touch(self):
        """Touch the object, updating its modification date."""
        ...
```
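
The skeleton leaves `parse_func` and `unparse_func` as `...`. As a minimal sketch of what such a pair might look like, assume a hypothetical setting `endpoints` that is given as a comma-separated string and should become a list. The class below uses only the standard library and deliberately does not inherit from `StorageProviderSettingsBase`, so it runs without Snakemake installed. The names `parse_endpoints`, `unparse_endpoints` and `ExampleSettings` are made up for this illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


def parse_endpoints(value: str) -> List[str]:
    """Turn a comma-separated string into a list of non-empty, stripped items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def unparse_endpoints(value: List[str]) -> str:
    """Inverse of parse_endpoints: join the items back into one string."""
    return ",".join(value)


# Stand-in for a settings class. In a real plugin this would inherit from
# StorageProviderSettingsBase; that base class is omitted here on purpose.
@dataclass
class ExampleSettings:
    endpoints: Optional[List[str]] = field(
        default=None,
        metadata={
            "help": "Comma-separated list of endpoints (hypothetical setting)",
            "env_var": False,
            "parse_func": parse_endpoints,
            "unparse_func": unparse_endpoints,
            "required": False,
        },
    )


# Look up both functions from the field metadata and check that they round-trip.
meta = fields(ExampleSettings)[0].metadata
parsed = meta["parse_func"]("host-a, host-b,host-c")
settings = ExampleSettings(endpoints=parsed)
restored = meta["unparse_func"](settings.endpoints)
print(parsed)
print(restored)
```

The point of the pair is that `unparse_func` produces a string that `parse_func` would accept again, which is why the skeleton asks for both whenever a `parse_func` is given.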
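
For URL-like queries, the bodies of `rate_limiter_key`, `safe_print` and `local_suffix` could be built on `urllib.parse`. The following is only a sketch under that assumption: it uses free functions with hypothetical names instead of the methods of the skeleton, so that it runs without the interface package, and it assumes that credentials appear either as `user:password@` in the host part or in the query string.

```python
from urllib.parse import urlparse, urlunparse


def _host(query: str) -> str:
    """Return the host part of the query without any "user:password@" prefix."""
    return urlparse(query).netloc.rsplit("@", 1)[-1]


def rate_limiter_key_for(query: str) -> str:
    """Use the host name as rate limiter key, as suggested for http-like storage."""
    return _host(query)


def safe_print_query(query: str) -> str:
    """Drop credentials and the query string before the query is logged."""
    parsed = urlparse(query)
    return urlunparse((parsed.scheme, _host(query), parsed.path, "", "", ""))


def local_suffix_for(query: str) -> str:
    """Build a local path suffix from the host and path of the query."""
    return f"{_host(query)}{urlparse(query).path}"


# Hypothetical query with credentials, a wildcard and a token.
query = "https://user:secret@example.org/data/{sample}.txt?token=abc123"
print(rate_limiter_key_for(query))
print(safe_print_query(query))
print(local_suffix_for("https://example.org/data/a.txt"))
```

In a plugin, the same logic would live in the corresponding methods and read the query from the `query` argument or from `self.query`. Wildcards such as `{sample}` pass through unchanged, in line with the requirement that `is_valid_query` accepts them.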
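
The comment on `list_candidate_matches` describes listing everything below the constant prefix of the query. A rough, self-contained illustration of that idea follows. Here `constant_prefix` is a simplified stand-in written for this example, not the library's `get_constant_prefix`, and the in-memory list `stored` replaces what a real plugin would fetch from its backend.

```python
from typing import Iterable, List


def constant_prefix(query: str) -> str:
    """Simplified stand-in: return the part of the query before the first wildcard."""
    first_wildcard = query.find("{")
    return query if first_wildcard == -1 else query[:first_wildcard]


def candidate_matches(query: str, available: Iterable[str]) -> List[str]:
    """Return concrete queries that share the constant prefix of the given query."""
    prefix = constant_prefix(query)
    return sorted(key for key in available if key.startswith(prefix))


# Hypothetical storage content; a real plugin would list this from the backend.
stored = [
    "bucket/results/a.txt",
    "bucket/results/b.txt",
    "bucket/logs/a.log",
]
matches = candidate_matches("bucket/results/{sample}.txt", stored)
print(matches)
```

The returned entries contain no wildcards, as the skeleton requires. The candidates are only narrowed down by prefix here; matching them against the full wildcard pattern is left to the caller, which the skeleton names as `glob_wildcards()`.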