.. image:: https://img.shields.io/badge/doc-latest-brightgreen
   :target: https://cea-list.github.io/RPCDataloader
   :alt: Documentation
.. image:: https://github.com/CEA-LIST/RPCDataloader/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/CEA-LIST/RPCDataloader/actions/workflows/tests.yml
   :alt: Continuous tests

==============
RPC Dataloader
==============

This library implements a variant of the PyTorch Dataloader that uses remote workers.
It lets you distribute the data loading workers over remote servers instead of running them on the machine that runs the main script.

To use it, start one or more worker daemons on remote computers.
The machines running the data loaders then dispatch item requests to these workers and await the returned values.

Although it is similar to `torch.rpc <https://pytorch.org/docs/stable/rpc.html>`_, this library uses its own implementation of RPC (Remote Procedure Call), which is simpler (it needs no initialization step) and does not conflict with the one from PyTorch.


Installation
============

.. code:: shell

    pip install rpcdataloader


.. _Usage:

Usage
=====

To use the RPC dataloader, first start a few workers, either from the command line:

.. code:: shell

    python -m rpcdataloader.launch --host=0.0.0.0 --port=6543

or by calling :code:`rpcdataloader.run_worker` directly from a Python script.

Then instantiate a remote dataset and dataloader:

.. code:: python

    import rpcdataloader
    import torchvision

    # `args` and `train_transform` come from the surrounding training script.
    dataset = rpcdataloader.RPCDataset(
        workers=['node01:6543', 'node02:5432'],  # host:port of each running worker
        dataset=torchvision.datasets.ImageFolder,
        root=args.data_path + "/train",
        transform=train_transform,
    )

    dataloader = rpcdataloader.RPCDataloader(
        dataset,
        batch_size=2,
        shuffle=True,
        pin_memory=True)

    for minibatch in dataloader:
        ...


Further reading
===============

- `API documentation <https://cea-list.github.io/RPCDataloader>`_
- `ResNet50 training on ImageNet dataset <docs/example_rpc.py>`_
- `Slurm integration using heterogeneous jobs <docs/example_rpc.slurm>`_
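The dispatch pattern described at the top of this README (worker daemons serve dataset items; the loader sends requests and awaits the replies) can be illustrated with a small standard-library sketch. It is not rpcdataloader's implementation or API: the names :code:`SquaresDataset`, :code:`start_worker`, :code:`RemoteLoader` and :code:`AUTHKEY` are hypothetical. The transport is :code:`multiprocessing.connection` rather than the library's own RPC layer, and the round-robin assignment of batches to workers is an assumption made for illustration. Workers run as threads on localhost, so the sketch runs on a single machine.

```python
"""Hypothetical stdlib-only sketch of remote data-loading workers.

NOT rpcdataloader's implementation or API: it only illustrates a client
that dispatches item requests to worker daemons and awaits the results.
"""
import random
import threading
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.connection import Client, Listener

AUTHKEY = b"demo-only"  # hypothetical shared secret for this sketch


class SquaresDataset:
    """Toy map-style dataset standing in for e.g. ImageFolder."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        return index * index  # the "preprocessing" runs on the worker side


def _serve(listener, dataset_cls, kwargs):
    """Worker loop: build the dataset locally, then answer requests."""
    dataset = dataset_cls(**kwargs)
    with listener, listener.accept() as conn:
        while True:
            try:
                cmd, arg = conn.recv()
            except (EOFError, OSError):  # the client closed its connection
                break
            if cmd == "len":
                conn.send(len(dataset))
            else:  # "getitems": a whole batch of indices per round trip
                conn.send([dataset[i] for i in arg])


def start_worker(dataset_cls, **kwargs):
    """Start one worker thread on a free localhost port; return its address."""
    listener = Listener(("127.0.0.1", 0), authkey=AUTHKEY)
    address = listener.address  # read before the worker thread can close it
    threading.Thread(target=_serve, args=(listener, dataset_cls, kwargs),
                     daemon=True).start()
    return address


class RemoteLoader:
    """Send batches of indices round-robin to workers; yield results in order."""

    def __init__(self, workers, batch_size, shuffle=False, seed=0):
        self.conns = [Client(addr, authkey=AUTHKEY) for addr in workers]
        self.locks = [threading.Lock() for _ in self.conns]
        self.batch_size, self.shuffle = batch_size, shuffle
        self.rng = random.Random(seed)
        self.pool = ThreadPoolExecutor(max_workers=len(self.conns))
        self.length = self._call(0, "len", None)  # ask a worker for len()

    def _call(self, worker, cmd, arg):
        with self.locks[worker]:  # one request in flight per connection
            self.conns[worker].send((cmd, arg))
            return self.conns[worker].recv()

    def __iter__(self):
        order = list(range(self.length))
        if self.shuffle:
            self.rng.shuffle(order)
        batches = [order[i:i + self.batch_size]
                   for i in range(0, self.length, self.batch_size)]
        # Dispatch every batch up front, round-robin over the workers...
        futures = [self.pool.submit(self._call, b % len(self.conns), "getitems", idx)
                   for b, idx in enumerate(batches)]
        for future in futures:  # ...then await the results in order.
            yield future.result()

    def close(self):
        self.pool.shutdown()
        for conn in self.conns:
            conn.close()  # workers see EOF and exit their loop


if __name__ == "__main__":
    workers = [start_worker(SquaresDataset, size=10) for _ in range(2)]
    loader = RemoteLoader(workers, batch_size=4)
    print(list(loader))  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
    loader.close()
```

Sending a whole batch of indices per round trip and allowing at most one request in flight per connection (the per-worker lock) keeps the protocol trivially ordered. This sketch does not handle worker failures or pipeline several requests over one connection.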