{"id":29606818,"url":"https://github.com/lancedb/lance-distributed-training","last_synced_at":"2025-10-31T05:32:34.468Z","repository":{"id":304612300,"uuid":"1019296603","full_name":"lancedb/lance-distributed-training","owner":"lancedb","description":"Examples and guides for distributed training with lance","archived":false,"fork":false,"pushed_at":"2025-07-14T07:40:37.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-14T09:25:20.907Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancedb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-14T05:45:27.000Z","updated_at":"2025-07-14T07:40:40.000Z","dependencies_parsed_at":"2025-07-14T09:29:19.262Z","dependency_job_id":"4df4b0aa-8274-472c-a594-e0efe5f15e28","html_url":"https://github.com/lancedb/lance-distributed-training","commit_stats":null,"previous_names":["lancedb/lance-distributed-training"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/lancedb/lance-distributed-training","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-distributed-training","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-distributed-training/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-distributed-training/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-distributed-training/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lancedb","download_url":"https://codeload.github.com/lancedb/lance-distributed-training/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flance-distributed-training/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266165228,"owners_count":23886555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-20T17:30:45.914Z","updated_at":"2025-10-31T05:32:34.462Z","avatar_url":"https://github.com/lancedb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Distributed training with Lance\n\n## Examples\n### classification\n1. Create lance dataset of FOOD101:\n```\npython create_datasets/classficiation.py\n```\n2. train using map-style dataset:\n```\ntorchrun --nproc-per-node=2  lance_map_style.py --batch_size 128\n```\n\n3. train using iterable dataset:\n```\ntorchrun --nproc-per-node=2  lance_iterable.py --batch_size 128\n```\n\n## Docs\n\nThere are 2 ways to load data for training models using Lance’s pytorch integration. \n\n1. Iterable style dataset (`LanceDataset`) - Suitable for streaming. Works with inbuilt distributed samplers\n2. 
Map-style dataset (`SafeLanceDataset`) - Suitable as a default choice unless you have a specific reason to use an iterable dataset.\n\nA key difference between the two is:\n\n- In the **iterable-style** (`LanceDataset`), the data transformation (decoding bytes, applying transforms, stacking) **must happen before the `DataLoader` receives the data**. This is done inside the `to_tensor_fn` (`decode_tensor_image` in the example below).\n\nIf your dataset contains a binary field, it can't be converted to a tensor directly, so you need to handle it in a custom\n`to_tensor_fn`. This is similar to the `collate_fn` used with a map-style dataset.\n\u003cdetails\u003e\n\u003csummary\u003eExample: Decoding images from LanceDataset using a custom `to_tensor_fn` \u003c/summary\u003e\n\n```python\nimport io\n\nimport torch\nfrom PIL import Image\nfrom lance.torch.data import LanceDataset\n\n\ndef decode_tensor_image(batch, **kwargs):\n    # `_food101_transform` is the image transform defined elsewhere in the training script.\n    images = []\n    labels = []\n    for item in batch.to_pylist():\n        # Decode the raw image bytes, then apply the transform.\n        img = Image.open(io.BytesIO(item[\"image\"])).convert(\"RGB\")\n        img = _food101_transform(img)\n        images.append(img)\n        labels.append(item[\"label\"])\n    batch = {\n        \"image\": torch.stack(images),\n        \"label\": torch.tensor(labels, dtype=torch.long)\n    }\n    return batch\n\n\n# `dataset_path`, `batch_size` and `sampler` come from the training script.\nds = LanceDataset(\n    dataset_path,\n    to_tensor_fn=decode_tensor_image,\n    batch_size=batch_size,\n    sampler=sampler\n)\n```\n\n\u003c/details\u003e\n\nIn the **map-style** (`SafeLanceDataset`), the `DataLoader`'s workers fetch the raw data, and the transformation happens later in the `collate_fn`.\n\n\u003cdetails\u003e\n\u003csummary\u003eExample: Decoding images from SafeLanceDataset using `collate_fn` \u003c/summary\u003e\n\n```python\nimport io\n\nimport torch\nfrom PIL import Image\nfrom torchvision import transforms\nfrom lance.torch.data import SafeLanceDataset, get_safe_loader\n\ndef collate_fn(batch_of_dicts):\n    \"\"\"\n    Collates a list of dictionaries from SafeLanceDataset into a single batch.\n    This function handles decoding the image bytes and applying transforms.\n    \"\"\"\n    images = []\n    
labels = []\n\n    transform = transforms.Compose([\n        transforms.Resize((224, 224)),\n        transforms.ToTensor(),\n    ])\n\n    for item in batch_of_dicts:\n        image_bytes = item[\"image\"]\n        img = Image.open(io.BytesIO(image_bytes)).convert(\"RGB\")\n        img_tensor = transform(img)\n        images.append(img_tensor)\n        labels.append(item[\"label\"])\n\n    return {\n        \"image\": torch.stack(images),\n        \"label\": torch.tensor(labels, dtype=torch.long)\n    }\n\n\n# `dataset`, `sampler` and `args` come from the surrounding training script.\nloader = get_safe_loader(\n    dataset,\n    batch_size=args.batch_size,\n    sampler=sampler,\n    shuffle=(sampler is None),\n    num_workers=args.num_workers,\n    collate_fn=collate_fn,\n    pin_memory=True,\n    persistent_workers=True\n)\n```\n\u003c/details\u003e\n\n\n## When to use Map-style or Iterable dataset\nThese are some rules of thumb for deciding whether to use a Map-style or an Iterable dataset.\n\n### Map-Style Dataset (`torch.utils.data.Dataset` \\ `SafeLanceDataset`)\n\n**Standard Datasets (Default Choice)**: Use this for any dataset where you have a finite collection of data points that can be indexed. This covers almost all standard use cases like image classification (ImageNet, CIFAR) or text classification where each file/line is a sample.\n\n**When You Need High Performance**: This is the only way to get the full performance benefit of PyTorch's `DataLoader` with `num_workers \u003e 0`. 
The `DataLoader` can give each worker a list of indices to fetch in parallel, which is extremely efficient.\nIn short, this should be your default choice unless you have a specific reason to use an iterable dataset.\n\nLance doesn't materialise indexes in memory, so you can almost always use map style.\n\n### Iterable-Style Dataset (`torch.utils.data.IterableDataset` \\ `LanceDataset`)\nThis type of dataset works like a Python generator or a data stream.\n\n**Core Idea**: It only knows how to give you the next item when you iterate over it (`__iter__`). It often has no known length, and you cannot ask for the N-th item directly.\n\n## Lance's sampling guide\n\n- `FullScanSampler`: **Not DDP-aware**. Intentionally designed for each process to scan the entire dataset. It is useful for single-GPU evaluation or when you need every process to see all the data for some reason (which is rare in DDP training).\n- `ShardedBatchSampler`: **DDP-aware**. Splits the total set of **batches** evenly among GPUs. Provides near-perfect workload balance.\n- `ShardedFragmentSampler`: **DDP-aware**. Splits the list of **fragments** among GPUs. Can result in an unbalanced workload if fragments have uneven sizes. This needs to be handled to prevent synchronization errors.\n\n## FullScanSampler\n\nThis is the simplest sampler. It inherits from `FragmentSampler` and implements the `iter_fragments` method. Its implementation is a single loop that gets all fragments from the dataset and yields each one sequentially.\n\n### Behavior in DDP\n\nThe `FullScanSampler` is **not DDP-aware**. It contains no logic that checks for a `rank` or `world_size`. Consequently, when used in a distributed setting, **every single process (GPU) will scan the entire dataset**.\n\n- **Use Case:** This sampler is primarily intended for single-GPU scenarios, such as validation, inference, or debugging, where you need one process to read all the data. 
It is not suitable for distributed training.\n\n## ShardedFragmentSampler\n\nThis sampler also inherits from `FragmentSampler` and works by dividing the **list of fragments** among the available processes. Its `iter_fragments` method gets the full list of fragments and then yields only the ones corresponding to its assigned `rank`.\n\n- **Rank 0** gets fragments 0, 2, 4, ...\n- **Rank 1** gets fragments 1, 3, 5, ...\n\nand so on for higher ranks.\n\n### Behavior in DDP\n\nThis sampler is **DDP-aware**, but it operates at the fragment level.\n\n- **Pros:** It can be very I/O efficient. Since each process is assigned whole fragments, it can read them in long, sequential blocks. The documentation notes this is more efficient for large datasets.\n- **Cons:** It can lead to **workload imbalance**. Lance datasets can have fragments of varying sizes (e.g., the last fragment is often smaller). If one rank is assigned fragments that have more total rows than another rank, it will have more batches to process. This imbalance can lead to DDP deadlocks if not handled with padding.\n\n\nLeft unhandled, this **workload imbalance can eventually make training error out**, as the log in the following example shows.\n\n\u003cdetails\u003e\n\u003csummary\u003eExample: DDP error due to imbalanced fragment sampler \u003c/summary\u003e\n\n```python\n\nEpoch 1/10: 300it [07:12,  1.44s/it, loss=1.07]  \n[Epoch 0] Loss: 980.4352, Epoch Time: 432.61s\nEpoch 2/10: 133it [03:17,  1.48s/it, loss=5.98]\nEpoch 2/10: 300it [07:24,  1.48s/it, loss=2.49]\n[Epoch 1] Loss: 1200.9648, Epoch Time: 444.51s\nEpoch 3/10: 300it [07:22,  1.48s/it, loss=3.24]\n[Epoch 2] Loss: 1324.9992, Epoch Time: 442.84s\nEpoch 4/10: 300it [07:23,  1.48s/it, loss=3.69]\n[Epoch 3] Loss: 1371.6891, Epoch Time: 443.10s\nEpoch 5/10: 300it [07:23,  1.48s/it, loss=3.91]\n[Epoch 4] Loss: 1384.9732, Epoch Time: 443.12s, Val Acc: 0.0196\nEpoch 6/10: 300it [07:24,  1.48s/it, loss=3.94]\n[Epoch 5] Loss: 1388.0216, Epoch Time: 444.14s\nEpoch 7/10: 300it [07:24,  1.48s/it, loss=4]   \n[Epoch 6] Loss: 1388.9526, 
Epoch Time: 444.02s\nEpoch 8/10: 300it [07:24,  1.48s/it, loss=3.99]\n[Epoch 7] Loss: 1388.8115, Epoch Time: 444.43s\nEpoch 9/10: 300it [07:24,  1.48s/it, loss=2.29]\n[Epoch 8] Loss: 1314.3089, Epoch Time: 444.65s\nEpoch 9/10: 300it [07:24,  1.48s/it, loss=2.29]]\n[Epoch 8] Loss: 1314.3089, Epoch Time: 444.65s\nEpoch 10/10: 240it [05:55,  1.47s/it, loss=5.46][rank0]:[E709 17:05:38.162555850 ProcessGroupNCCL.cpp:632] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=20585, OpType=ALLREDUCE, NumelIn=1259621, NumelOut=1259621, Timeout(ms)=600000) ran for 600000 milliseconds before timing out.\n[rank0]:[E709 17:05:38.162814866 ProcessGroupNCCL.cpp:2271] [PG ID 0 PG GUID 0(default_pg) Rank 0]  failure detected by watchdog at work sequence id: 20585 PG status: last enqueued work: 20589, last completed work: 20584\n[rank0]:[E709 17:05:38.162832798 ProcessGroupNCCL.cpp:670] Stack trace of the failed collective not found, potentially because FlightRecorder is disabled. You can enable it by setting TORCH_NCCL_TRACE_BUFFER_SIZE to a non-zero value.\n[rank0]:[E709 17:05:38.162895613 ProcessGroupNCCL.cpp:2106] [PG ID 0 PG GUID 0(default_pg) Rank 0] First PG on this rank to signal dumping.\n[rank0]:[E709 17:05:38.482119928 ProcessGroupNCCL.cpp:1746] [PG ID 0 PG GUID 0(default_pg) Rank 0] Received a dump signal due to a collective timeout from this local rank and we will try our best to dump the debug info. Last enqueued NCCL work: 20589, last completed NCCL work: 20584.This is most likely caused by incorrect usages of collectives, e.g., wrong sizes used across ranks, the order of collectives is not same for all ranks or the scheduled collective, for some reason, didn't run. Additionally, this can be caused by GIL deadlock or other reasons such as network errors or bugs in the communications library (e.g. NCCL), etc. 
\n[rank0]:[E709 17:05:38.482326987 ProcessGroupNCCL.cpp:1536] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL preparing to dump debug info. Include stack trace: 1\nEpoch 10/10: 241it [15:55, 181.19s/it, loss=5.09][rank0]:[E709 17:05:39.081662161 ProcessGroupNCCL.cpp:684] [Rank 0] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.\n[rank0]:[E709 17:05:39.081690629 ProcessGroupNCCL.cpp:698] [Rank 0] To avoid data inconsistency, we are taking the entire process down.\n[rank0]:[E709 17:05:39.083402482 ProcessGroupNCCL.cpp:1899] [PG ID 0 PG GUID 0(default_pg) Rank 0] Process group watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=20585, OpType=ALLREDUCE, NumelIn=1259621, NumelOut=1259621, Timeout(ms)=600000) ran for 600000 milliseconds before timing out.\nException raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:635 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string\u003cchar, std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x98 (0x7f92e62535e8 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)\nframe #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional\u003cstd::chrono::duration\u003clong, std::ratio\u003c1l, 1000l\u003e \u003e \u003e) + 0x23d (0x7f92e756ea6d in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0xc80 (0x7f92e75707f0 in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f92e7571efd in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #4: \u003cunknown function\u003e + 0xd8198 (0x7f92d7559198 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #5: \u003cunknown 
function\u003e + 0x7ea7 (0x7f933d48dea7 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)\nframe #6: clone + 0x3f (0x7f933d25eadf in /usr/lib/x86_64-linux-gnu/libc.so.6)\n\nterminate called after throwing an instance of 'c10::DistBackendError'\n  what():  [PG ID 0 PG GUID 0(default_pg) Rank 0] Process group watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=20585, OpType=ALLREDUCE, NumelIn=1259621, NumelOut=1259621, Timeout(ms)=600000) ran for 600000 milliseconds before timing out.\nException raised from checkTimeout at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:635 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string\u003cchar, std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x98 (0x7f92e62535e8 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)\nframe #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional\u003cstd::chrono::duration\u003clong, std::ratio\u003c1l, 1000l\u003e \u003e \u003e) + 0x23d (0x7f92e756ea6d in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0xc80 (0x7f92e75707f0 in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f92e7571efd in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #4: \u003cunknown function\u003e + 0xd8198 (0x7f92d7559198 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #5: \u003cunknown function\u003e + 0x7ea7 (0x7f933d48dea7 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)\nframe #6: clone + 0x3f (0x7f933d25eadf in /usr/lib/x86_64-linux-gnu/libc.so.6)\n\nException raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1905 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string\u003cchar, 
std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x98 (0x7f92e62535e8 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)\nframe #1: \u003cunknown function\u003e + 0x11b4abe (0x7f92e7540abe in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #2: \u003cunknown function\u003e + 0xe07bed (0x7f92e7193bed in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)\nframe #3: \u003cunknown function\u003e + 0xd8198 (0x7f92d7559198 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #4: \u003cunknown function\u003e + 0x7ea7 (0x7f933d48dea7 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)\nframe #5: clone + 0x3f (0x7f933d25eadf in /usr/lib/x86_64-linux-gnu/libc.so.6)\n\nE0709 17:05:39.816000 56204 site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: -6) local_rank: 0 (pid: 56213) of binary: /opt/conda/bin/python3.10\nTraceback (most recent call last):\n  File \"/opt/conda/bin/torchrun\", line 8, in \u003cmodule\u003e\n    sys.exit(main())\n  File \"/opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 355, in wrapper\n    return f(*args, **kwargs)\n  File \"/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py\", line 892, in main\n    run(args)\n  File \"/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py\", line 883, in run\n    elastic_launch(\n  File \"/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py\", line 139, in __call__\n    return launch_agent(self._config, self._entrypoint, list(args))\n  File \"/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py\", line 270, in launch_agent\n    raise ChildFailedError(\ntorch.distributed.elastic.multiprocessing.errors.ChildFailedError: \n============================================================\ntrainer.py FAILED\n------------------------------------------------------------\nFailures:\n  
\u003cNO_OTHER_FAILURES\u003e\n------------------------------------------------------------\nRoot Cause (first observed failure):\n[0]:\n  time      : 2025-07-09_17:05:39\n  host      : distributed-training.us-central1-a.c.lance-dev-ayush.internal\n  rank      : 0 (local_rank: 0)\n  exitcode  : -6 (pid: 56213)\n  error_file: \u003cN/A\u003e\n  traceback : Signal 6 (SIGABRT) received by PID 56213\n============================================================\n```\n\u003c/details\u003e\n\n## ShardedBatchSampler\n\nThis sampler provides near-perfectly balanced sharding by operating at the **batch level**, not the fragment level. It calculates row ranges for each batch and deals those ranges out to the different ranks.\n\nThis logic gives interleaved batches to each process:\n\n- **Rank 0** gets row ranges for Batch 0, Batch 2, Batch 4, ...\n- **Rank 1** gets row ranges for Batch 1, Batch 3, Batch 5, ...\n\n### Behavior in DDP\n\nThis sampler is **DDP-aware** and is the safest choice for balanced distributed training.\n\n- **Pros:** It guarantees that every process receives almost exactly the same number of batches, preventing workload imbalance and DDP deadlocks.\n- **Cons:** It can be slightly less I/O efficient than `ShardedFragmentSampler`. To construct a batch, it may need to perform a specific range read from a fragment, which can be less optimal than reading the entire fragment at once.\n\n\nNote that you cannot use the `lance` samplers (like `ShardedBatchSampler` or `ShardedFragmentSampler`) with a map-style dataset.\n\nThe two systems are fundamentally incompatible by design:\n\n1. **Lance Samplers** are designed to work *inside* the iterable `LanceDataset`. They don't generate indices. Instead, they directly control how the `lance` file scanner reads and yields entire batches of data. 
They are tightly coupled to the `LanceDataset`'s streaming (`__iter__`) mechanism.\n2. **PyTorch's `DistributedSampler`** works by generating a list of **indices** (e.g., `[10, 5, 22]`). The `DataLoader` then takes these indices and fetches each item individually from a map-style dataset using its `__getitem__` method (e.g., `dataset[10]`).\n\nBecause the `lance` samplers don't produce the indices that a map-style `DataLoader` needs, you cannot use them together. You have to choose one of the two paths:\n\n- **Path A (Lance Control):** Use the iterable `LanceDataset` with a `lance` sampler. **Benefit:** Uses `lance`'s native, optimized sampling. **Limitation:** Must use `num_workers=0`.\n- **Path B (PyTorch Control):** Use a map-style dataset (like the `LanceMapDataset` we built, or `torchvision`'s) with PyTorch's `DistributedSampler`. **Benefit:** Allows for high-performance parallel data loading with `num_workers \u003e 0`. **Limitation:** Does not use `lance`'s specific sampling logic.\n\n\n### Running benchmarks\nAll the scripts log metrics to the same wandb project. Simply run the training scripts and check the dashboard for\ntraining time per epoch or other metrics.\n\nThe same training loop, implemented on torchvision's `ImageFolder` dataset, is provided in the `torch_version/` folder.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Flance-distributed-training","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancedb%2Flance-distributed-training","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Flance-distributed-training/lists"}