{"id":15640754,"url":"https://github.com/graykode/matorage","last_synced_at":"2025-04-14T03:13:20.564Z","repository":{"id":39724602,"uuid":"269370547","full_name":"graykode/matorage","owner":"graykode","description":"Matorage is tensor(multidimensional matrix) object storage manager for deep learning framework(Pytorch, Tensorflow V2, Keras)","archived":false,"fork":false,"pushed_at":"2023-03-25T00:47:54.000Z","size":396,"stargazers_count":73,"open_issues_count":9,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-14T03:13:12.883Z","etag":null,"topics":["deep-learning","pytorch","storage-manager","tensorflow"],"latest_commit_sha":null,"homepage":"https://matorage.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graykode.png","metadata":{"files":{"readme":"README.md","changelog":"change_logs/v0.2.0.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-04T13:43:03.000Z","updated_at":"2024-04-12T06:34:57.000Z","dependencies_parsed_at":"2024-10-23T05:27:47.062Z","dependency_job_id":null,"html_url":"https://github.com/graykode/matorage","commit_stats":{"total_commits":281,"total_committers":2,"mean_commits":140.5,"dds":0.003558718861209953,"last_synced_commit":"b17859976becd1fca30a0ea897928a08157d22a2"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmatorage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmatorage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/graykode%2Fmatorage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fmatorage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graykode","download_url":"https://codeload.github.com/graykode/matorage/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248813802,"owners_count":21165634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","pytorch","storage-manager","tensorflow"],"created_at":"2024-10-03T11:39:39.380Z","updated_at":"2025-04-14T03:13:20.539Z","avatar_url":"https://github.com/graykode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [matorage](https://matorage.readthedocs.io/en/latest)\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://travis-ci.com/github/graykode/matorage\"\u003e\u003cimg alt=\"Build Status\" src=\"https://travis-ci.com/graykode/matorage.svg?branch=master\"\u003e\u003c/a\u003e\n\u003ca href=\"https://matorage.readthedocs.io/en/latest/?badge=latest\"\u003e\u003cimg alt=\"Documentation Status\" src=\"https://readthedocs.org/projects/matorage/badge/?version=latest\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/graykode/matorage/blob/master/LICENSE\"\u003e\u003cimg alt=\"License: Apache 2.0\" src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/matorage/\"\u003e\u003cimg alt=\"PyPI\" 
src=\"https://img.shields.io/pypi/v/matorage\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pepy.tech/project/matorage\"\u003e\u003cimg alt=\"Downloads\" src=\"https://static.pepy.tech/badge/matorage\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n**An efficient way to store, load and manage datasets, models and optimizers for deep learning with matorage!**\n\nMatorage is a tensor (multidimensional matrix) object storage manager for deep learning frameworks (PyTorch, TensorFlow V2, Keras).\n\n## Features\n\n- Boilerplate data pipeline for datasets, models and optimizers.\n- High performance on tensor storage\n\n**For researchers who need to focus on model training**:\n\n- Store data as pre-processed tensors (multidimensional matrices), which removes pre-processing from training time.\n- Reduce storage space through multiple compression methods.\n- Manage data and models while training\n\n**For AI developers who need to focus on creating data pipelines:**\n\n- Concurrent data save \u0026 load\n- Compatible with object storage such as MinIO, S3\n- Generate pipelines from user endpoint data.\n\n## Quick Start with Pytorch Example\n\nFor a TensorFlow example, refer to the detailed documentation.\nFor the full code, see the examples below:\n\n- [Pytorch Mnist Example](examples/pytorch/mnist)\n- [Tensorflow Mnist Example](examples/tensorflow/mnist)\n- [SQuAD 1.1/2.0 Example](examples/pytorch/squad)\n\n- Content\n    - [0. Install matorage with pip](https://github.com/graykode/matorage#0-install-matorage-with-pip)\n    - [1. Set up Minio Server with docker](https://github.com/graykode/matorage#1-set-up-minio-server-with-docker)\n    - [2. Save pre-processed dataset](https://github.com/graykode/matorage#2-save-pre-processed-dataset)\n    - [3. 
Load dataset from matorage](https://github.com/graykode/matorage#3-load-dataset-from-matorage)\n    - [4. Save \u0026 Load Model when training](https://github.com/graykode/matorage#4-save--load-model-when-training)\n    - [5. Save \u0026 Load Optimizer when training](https://github.com/graykode/matorage#5-save--load-optimizer-when-training)\n- [Unittest](https://github.com/graykode/matorage#unittest)\n\n#### 0. Install matorage with pip\n\n```bash\n$ pip install matorage\n```\n\n\n#### 1. Set up Minio Server with docker\n\nQuick start with NAS (network-attached storage) using Docker.\nThe server can be managed from a web browser at http://127.0.0.1:9000/, and access is secured with ``MINIO_ACCESS_KEY`` and ``MINIO_SECRET_KEY``.\n\n```bash\n$ mkdir ~/shared # create nas storage folder\n$ docker run -it -p 9000:9000 \\\n    --restart always -e \\\n    \"MINIO_ACCESS_KEY=minio\" -e \\\n    \"MINIO_SECRET_KEY=miniosecretkey\" \\\n    -v ~/shared:/container/vol \\\n    minio/minio gateway nas /container/vol\n```\n\n\n#### 2. 
Save pre-processed dataset\n\nFirst, import matorage and create a ``DataConfig``.\nThis example pre-processes MNIST and stores it in distributed storage.\n``additional`` is a free-form dict, and ``attributes`` records the name, type and shape of each tensor to be stored.\n\n```python\nfrom matorage import DataConfig\n\ntraindata_config = DataConfig(\n    endpoint='127.0.0.1:9000',\n    access_key='minio',\n    secret_key='miniosecretkey',\n    dataset_name='mnist',\n    additional={\n        \"mode\": \"train\",\n        \"framework\" : \"pytorch\",\n        ...\n        \"blah\" : \"blah\"\n    },\n    attributes=[\n        ('image', 'float32', (1, 28, 28)),\n        ('target', 'int64', (1))\n    ]\n)\n```\n\nNow do some simple pre-processing and save the data.\n\n```python\nfrom torch.utils.data import DataLoader\nfrom tqdm import tqdm\n\nfrom matorage import DataSaver\n\ntraindata_saver = DataSaver(config=traindata_config)\n# `dataset` is the pre-processed MNIST training set, defined elsewhere.\ntrain_loader = DataLoader(dataset, batch_size=64, num_workers=8)\nfor (image, target) in tqdm(train_loader):\n    # image shape : torch.Size([64, 1, 28, 28])\n    # target shape : torch.Size([64])\n    traindata_saver({\n        'image': image,\n        'target': target\n    })\ntraindata_saver.disconnect()\n```\n\n\n#### 3. Load dataset from matorage\n\nWhen training, fetch data iteratively from storage using the same config the dataset was saved with.\n\n```python\nfrom matorage.torch import Dataset\n\ntrain_dataset = Dataset(config=traindata_config, clear=True)\ntrain_loader = DataLoader(\n    train_dataset, batch_size=64, num_workers=8, shuffle=True\n)\n\nfor batch_idx, (image, target) in enumerate(tqdm(train_loader)):\n    image, target = image.to(device), target.to(device)\n```\n\nA single index can also be fetched through lazy loading.\n\n```python\ntrain_dataset = Dataset(config=traindata_config, clear=True)\nprint(train_dataset[0], len(train_dataset))\n```\n\n\n#### 4. 
Save \u0026 Load Model when training\n\nDuring training, you can save models at specific steps or epochs to distributed storage and load them back, entirely in memory.\nFirst, create the model config in the same way as the dataset config.\n\n```python\nfrom matorage import ModelConfig\nfrom matorage.torch import ModelManager\n\nmodel_config = ModelConfig(\n    endpoint='127.0.0.1:9000',\n    access_key='minio',\n    secret_key='miniosecretkey',\n    model_name='mnist_simple_training',\n    additional={\n        \"version\" : \"1.0.1\",\n        ...\n        \"blah\" : \"blah\"\n    }\n)\n\nmodel_manager = ModelManager(config=model_config)\nprint(model_manager.get_metadata)\nmodel_manager.save(model, epoch=1)\nprint(model_manager.get_metadata)\n```\n\nWhen an empty model is loaded with a specific step or epoch, the corresponding weights are filled into the model.\n\n```python\nprint(model.state_dict())\nmodel_manager.load(model, epoch=1)\nprint(model.state_dict())\n# load the weight of a single layer.\nprint(model_manager.load('net1.0.weight', step=0))\n```\n\n\n#### 5. 
Save \u0026 Load Optimizer when training\n\nSaving and loading an optimizer is similar to managing a model.\n\n```python\nfrom matorage import OptimizerConfig\nfrom matorage.torch import OptimizerManager\n\noptimizer_config = OptimizerConfig(\n    endpoint='127.0.0.1:9000',\n    access_key='minio',\n    secret_key='miniosecretkey',\n    optimizer_name='adam',\n    additional={\n        \"model\" : \"1.0.1\",\n        ...\n        \"blah\" : \"blah\"\n    }\n)\n\noptimizer_manager = OptimizerManager(config=optimizer_config)\nprint(optimizer_manager.get_metadata)\n# The optimizer contains information about the step.\noptimizer_manager.save(optimizer)\nprint(optimizer_manager.get_metadata)\n```\n\nWhen an empty optimizer is loaded with a specific step, the corresponding state is filled into the optimizer.\n\n```python\noptimizer = optim.Adam(model.parameters(), lr=0.01)\noptimizer_manager.load(optimizer, step=938)\n```\n\n\n### Unittest\n```bash\n$ git clone https://github.com/graykode/matorage \u0026\u0026 cd matorage\n$ python -m tests.test_suite\n```\n\n\n### Framework Requirement\n\n- torch(\u003e=1.0.0), torchvision(\u003e=0.2.2)\n- tensorflow(\u003e=2.2), tensorflow_io(\u003e=0.13)\n\n### Author\n\n[Tae Hwan Jung(@graykode)](https://github.com/graykode/matorage)\nWe are looking for contributors.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fmatorage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraykode%2Fmatorage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fmatorage/lists"}