{"id":13719961,"url":"https://github.com/zincware/dask4dvc","last_synced_at":"2025-05-07T12:30:33.350Z","repository":{"id":104068302,"uuid":"521259992","full_name":"zincware/dask4dvc","owner":"zincware","description":"Use dask to run the DVC Graph","archived":true,"fork":false,"pushed_at":"2024-07-18T15:45:33.000Z","size":2509,"stargazers_count":16,"open_issues_count":19,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-09T02:41:38.746Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zincware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-04T12:35:20.000Z","updated_at":"2024-10-01T12:58:47.000Z","dependencies_parsed_at":"2023-10-03T12:53:13.186Z","dependency_job_id":"f3a8a81d-67b5-4821-9c80-de1dcf78d75e","html_url":"https://github.com/zincware/dask4dvc","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2Fdask4dvc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2Fdask4dvc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2Fdask4dvc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2Fdask4dvc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zincware","download_url":"https://codeload.github.com/zincware/dask4dvc/tar.gz/ref
s/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252876279,"owners_count":21818155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T01:00:58.226Z","updated_at":"2025-05-07T12:30:32.563Z","avatar_url":"https://github.com/zincware.png","language":"Python","funding_links":[],"categories":["Tools \u0026 Libraries"],"sub_categories":[],"readme":"\n\u003e [!NOTE]\n\u003e Combining `dask` and `distributed` with the task of implementing DVC experiments made this project very convoluted.\n\u003e It will no longer be maintained; check out https://github.com/zincware/paraffin for a simpler version instead.\n\n\n\n[![Coverage Status](https://coveralls.io/repos/github/zincware/dask4dvc/badge.svg?branch=main)](https://coveralls.io/github/zincware/dask4dvc?branch=main)\n[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/zincware/dask4dvc/main.svg)](https://results.pre-commit.ci/latest/github/zincware/dask4dvc/main)\n![PyTest](https://github.com/zincware/dask4dvc/actions/workflows/pytest.yaml/badge.svg)\n[![PyPI version](https://badge.fury.io/py/dask4dvc.svg)](https://badge.fury.io/py/dask4dvc)\n[![zincware](https://img.shields.io/badge/Powered%20by-zincware-darkcyan)](https://github.com/zincware)\n\n# Dask4DVC - Distributed Node Execution\n\n[DVC](https://dvc.org) provides tools for building and executing the computational graph\nlocally through various methods. 
The `dask4dvc` package combines\n[Dask Distributed](https://distributed.dask.org/) with DVC to make it easier to\nuse with HPC managers like [Slurm](https://github.com/SchedMD/slurm).\n\nThe `dask4dvc repro` command runs the DVC graph in parallel where possible.\nCurrently, `dask4dvc run` will not run stages per experiment sequentially.\n\n\u003e :warning: This is an experimental package **not** affiliated in any way with\n\u003e iterative or DVC.\n\n## Usage\n\nDask4DVC provides a CLI similar to DVC.\n\n- `dvc repro` becomes `dask4dvc repro`.\n- `dvc queue start` becomes `dask4dvc run`.\n\nYou can follow the progress using `dask4dvc \u003ccmd\u003e --dashboard`.\n\n### SLURM Cluster\n\nYou can use `dask4dvc` easily with a Slurm cluster. This requires a running Dask\nscheduler:\n\n```python\nfrom dask_jobqueue import SLURMCluster\n\ncluster = SLURMCluster(\n    cores=1, memory='128GB',\n    queue=\"gpu\",\n    processes=1,\n    walltime='8:00:00',\n    job_cpu=1,\n    job_extra=['-N 1', '--cpus-per-task=1', '--tasks-per-node=64', \"--gres=gpu:1\"],\n    scheduler_options={\"port\": 31415}\n)\ncluster.adapt()\n```\n\nWith this setup, you can then run `dask4dvc repro --address 127.0.0.1:31415` on\nthe example port `31415`.\n\nYou can also use config files with `dask4dvc repro --config myconfig.yaml`. All\n`dask.distributed` clusters should be supported.\n\n```yaml\ndefault:\n  SGECluster:\n    queue: regular\n    cores: 10\n    memory: 16 GB\n```\n\n![dask4dvc repro](https://raw.githubusercontent.com/zincware/dask4dvc/main/misc/dask4dvc_1.gif \"dask4dvc repro\")\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzincware%2Fdask4dvc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzincware%2Fdask4dvc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzincware%2Fdask4dvc/lists"}