https://github.com/nicholasjng/shelf
A type-aware, fsspec-based general artifact store client.
- Host: GitHub
- URL: https://github.com/nicholasjng/shelf
- Owner: nicholasjng
- License: apache-2.0
- Created: 2023-10-01T14:13:41.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-01-26T21:25:26.000Z (over 1 year ago)
- Last Synced: 2025-04-03T13:44:23.663Z (about 2 months ago)
- Topics: fsspec, machine-learning, mlops, python
- Language: Python
- Homepage:
- Size: 69.3 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# shelf - a lightweight Python artefact store client
## What is it?
shelf combines the [pytree registry](https://jax.readthedocs.io/en/latest/pytrees.html) from JAX with the [fsspec](https://filesystem-spec.readthedocs.io/en/latest/index.html) project.
As in JAX, registering a pair of serialization and deserialization callbacks lets you save your custom Python types as files _anywhere_ fsspec can reach!
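The registry idea can be sketched in a few lines of plain Python. This is a hypothetical stand-in to illustrate the dispatch pattern, not shelf's actual implementation; the `dump`, `load`, and `Point` names are invented for the example:

```python
import pickle

# Hypothetical stand-in for a type registry, illustrating the pattern:
# a mapping from a Python type to (serialize, deserialize) callbacks.
_REGISTRY: dict = {}


def register_type(tp, serialize, deserialize):
    _REGISTRY[tp] = (serialize, deserialize)


def dump(obj) -> bytes:
    # Dispatch on the object's type to find its serializer.
    serialize, _ = _REGISTRY[type(obj)]
    return serialize(obj)


def load(tp, raw: bytes):
    _, deserialize = _REGISTRY[tp]
    return deserialize(raw)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


register_type(
    Point,
    lambda p: pickle.dumps((p.x, p.y)),
    lambda raw: Point(*pickle.loads(raw)),
)

restored = load(Point, dump(Point(1, 2)))
print(restored.x, restored.y)  # 1 2
```

shelf applies the same dispatch, but routes the bytes through an fsspec filesystem instead of returning them directly.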
## A ⚡️-quick demo
Here's how you register a custom neural network type that uses [pickle](https://docs.python.org/3/library/pickle.html) to store trained models on disk.
```python
# my_model.py
import pickle

import numpy as np

import shelf


class MyModel:
    def __call__(self):
        return 42

    def train(self, data: np.ndarray):
        pass

    def score(self, data: np.ndarray):
        return 1.


def save_to_disk(model: MyModel, ctx: shelf.Context) -> None:
    """Dumps the model to a file using `pickle`."""
    fp = ctx.file("my-model.pkl", mode="wb")
    pickle.dump(model, fp)


def load_from_disk(ctx: shelf.Context) -> MyModel:
    """Reloads the previously pickled model."""
    fname, = ctx.filenames
    fp = ctx.file(fname, mode="rb")
    model: MyModel = pickle.load(fp)
    return model


shelf.register_type(MyModel, save_to_disk, load_from_disk)
```

Now, for example in your training loop, save the model anywhere using a `Shelf`:
```python
import numpy as np

from shelf import Shelf

from my_model import MyModel


def train():
    # Initialize a `Shelf` to handle remote I/O.
    shelf = Shelf()
    model = MyModel()
    data = np.random.randn(100)

    # Train your model...
    for epoch in range(10):
        model.train(data)

    # ... and save it to S3...
    shelf.put(model, "s3://my-bucket/my-model.pkl")
    # ... or GCS if you prefer...
    shelf.put(model, "gs://my-bucket/my-model.pkl")
    # ... or Azure!
    shelf.put(model, "az://my-blob/my-model.pkl")
```

Conversely, if you want to reinstantiate a remotely stored model:
```python
def score():
    model = shelf.get("s3://my-bucket/my-model.pkl", MyModel)
    accuracy = model.score(np.random.randn(100))
    print(f"And here's how accurately it predicts: {accuracy:.2%}")
```

And just like that, push and pull your custom models and data artifacts anywhere you like - your service of choice just needs a supporting `fsspec` [filesystem implementation](https://github.com/fsspec/filesystem_spec/blob/master/fsspec/registry.py).
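Which backend package a given URL needs is determined by its protocol prefix alone. A rough, non-exhaustive sketch of that mapping (the package names are the commonly used fsspec backends; the fsspec registry linked above is authoritative):

```python
from urllib.parse import urlsplit

# Illustrative mapping of common fsspec protocol prefixes to the
# optional packages that implement them (non-exhaustive).
BACKENDS = {
    "s3": "s3fs",
    "gs": "gcsfs",
    "az": "adlfs",
    "file": "built into fsspec",
}


def backend_for(url: str) -> str:
    # Fall back to the local filesystem when no protocol is given.
    protocol = urlsplit(url).scheme or "file"
    return BACKENDS.get(protocol, "unknown")


print(backend_for("s3://my-bucket/my-model.pkl"))  # s3fs
```

Installing the matching package (e.g. `pip install s3fs`) is all that is needed for the corresponding `shelf.put`/`shelf.get` calls to work.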
## Installation
⚠️ `shelf` is an experimental project - expect bugs and sharp edges.
Install it directly from source, for example with `pip` or `poetry`:
```shell
pip install git+https://github.com/nicholasjng/shelf.git
# or
poetry add git+https://github.com/nicholasjng/shelf.git
```

A PyPI package release is planned for the future.