{"id":26057226,"url":"https://github.com/ncar/daskrun","last_synced_at":"2025-08-23T10:32:15.594Z","repository":{"id":96025747,"uuid":"154187768","full_name":"NCAR/daskrun","owner":"NCAR","description":"mpirun-like operation with dask-jobqueue","archived":false,"fork":false,"pushed_at":"2018-10-23T19:25:56.000Z","size":58,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-15T11:49:41.548Z","etag":null,"topics":["dask","dask-jobqueue","daskrun"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NCAR.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.rst","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-10-22T17:37:39.000Z","updated_at":"2019-10-16T13:36:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"12ef007b-f330-45fe-9cab-694fd376746f","html_url":"https://github.com/NCAR/daskrun","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/NCAR/daskrun","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fdaskrun","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fdaskrun/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fdaskrun/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fdaskrun/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NCAR","download_url":"https://codeload.github.com/NCAR/daskru
n/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCAR%2Fdaskrun/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271746655,"owners_count":24813575,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","dask-jobqueue","daskrun"],"created_at":"2025-03-08T11:07:32.826Z","updated_at":"2025-08-23T10:32:15.582Z","avatar_url":"https://github.com/NCAR.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# daskrun\n daskrun allows the user to run a script that uses Dask for parallelism in the same way as running a script that uses MPI for parallelism.\n\n## Difference between `mpirun` and `daskrun`\n\nTo illustrate differences between `mpirun` and `daskrun`, we are going to assume that we have a python script called `example.py`.\n\nTo execute this script with mpi, you might have to write another script, `submit_job.sh` with the following content:\n\n```bash\n  #!/bin/bash\n  #PBS -N pangeo\n  #PBS -q ${QUEUE}\n  #PBS -A ${ACCOUNT}\n  #PBS -l select=${NODES}:ncpus=${CORES}:mpiprocs=${CORES}\n  #PBS -l walltime=$WALLTIME\n  #PBS -j oe\n\n  mpirun -np $CORES python example.py\n```\nNext, you would submit this script to the scheduler (in this case, we will assume that we are 
using `PBS`) by running:\n\n        qsub submit_job.sh\n\n\nWith `daskrun`, everything is done for you from the command line:\n\n    daskrun --script example.py --cores $NCORES --project $ACCOUNT --queue $QUEUE --walltime $WALLTIME\n\n\nAnother difference is that the `--cores`, `--memory`, and `--num-processes` options in `daskrun` describe not your full desired deployment but the size of a single job, which should be no larger than a single machine in your cluster. \nSeparately, **the number of jobs** to deploy equals the number of workers specified via the `--num-workers` option. \n\n\nUnder the hood, `daskrun` does the following:\n- It collects the scheduler-specific options such as `project`, `queue`, and `walltime`, and submits jobs to the scheduler via [dask-jobqueue](https://dask-jobqueue.readthedocs.io/en/latest/). This creates a dask cluster with the specified resources.  \n- Dask then launches `dask-workers` on the requested resources.\n- Once the `dask-workers` are up and running, `dask-scheduler` can launch and manage jobs on the created `dask-workers`. \n- Once all jobs are finished, the dask cluster is torn down, and we are done. \n\n\n## Installation \n\nTo install the most recent stable version (`v.0.1.0`), run:\n```bash\npip install git+https://github.com/NCAR/daskrun.git@v.0.1.0\n```\n\n\n## Usage \n\n`daskrun` allows you to specify the following arguments:\n\n```bash\nabanihi@cheyenne5: ~ $ daskrun --help\nUsage: daskrun [OPTIONS]\n\nOptions:\n  --version                Show the version and exit.\n  -s, --script PATH        Script to run\n  -q, --queue TEXT         Destination queue for each worker job. Passed to\n                           #PBS -q option.  [default: economy]\n  -p, --project TEXT       Accounting string associated with each worker job.\n                           Passed to #PBS -A option.  [default: ]\n  -w, --walltime TEXT      Walltime for each worker job.  
[default: 00:20:00]\n  --num-workers INTEGER    Number of workers  [default: 1]\n  --num-processes INTEGER  Number of Python processes to cut up each job\n                           [default: 1]\n  --cores INTEGER          Total number of cores per job  [default: 1]\n  --memory TEXT            Total amount of memory per job\n  --local-directory TEXT   Location to put temporary data if necessary\n                           [default: /glade/scratch/abanihi]\n  --help                   Show this message and exit.\n```\n\nTo use daskrun, you need to include the following lines in your script:\n\n```python\nfrom dask.distributed import Client\nfrom daskrun.config import scheduler\n\nclient = Client(scheduler)\n```\n\nThis lets the script find out where the dask scheduler is running. \n\nAn example `example.py` script is provided below:\n\n```python\nfrom dask.distributed import Client\nimport dask\n\n# Make sure you include the following line in your script\n# to get the scheduler information\nfrom daskrun.config import scheduler\n\nclient = Client(scheduler)\ndf = dask.datasets.timeseries()\nprint(df.head(20))\nprint(df.describe().compute())\nclient.write_scheduler_file(\"./dask-scheduler.json\")\n```\n\n```daskrun --script example.py --num-workers 2 --project PROJECTID --cores 1```\n\n\nNOTE: when you execute ```daskrun ...[options] --script myscript.py```, the total number of submitted jobs equals the number of dask-workers specified via the `--num-workers` argument. 
In other words, each dask-worker is launched in an independent job.\n\nFor instance, this is what we get when we run the `example.py` script with two `dask-workers`:\n\n```bash\nabanihi@cheyenne5: ~ $ qstat -u $USER\n\nchadmin1: \n                                                            Req'd  Req'd   Elap\nJob ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time\n--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----\n3085027.chadmin abanihi  economy  dask-worke    --    1   1    --  00:20 Q   -- \n3085028.chadmin abanihi  economy  dask-worke    --    1   1    --  00:20 Q   -- \n```\n\nTo verify that our `example.py` script was executed with two `dask-workers`, let's inspect the created `dask-scheduler.json` file. We expect to see information for two dask workers alongside the scheduler's own information.\n\n```json\n{\n  \"type\": \"Scheduler\",\n  \"id\": \"Scheduler-f828319a-a327-4860-90b9-d863ef97cd9b\",\n  \"address\": \"tcp://xx.xxx.x.x:51034\",\n  \"services\": {\n    \"bokeh\": 8787\n  },\n  \"workers\": {\n    \"tcp://xx.xxx.x.x:34137\": {\n      \"type\": \"Worker\",\n      \"id\": \"dask-worker--3081414--\",\n      \"host\": \"xx.xxx.x.xxx\",\n      \"resources\": {},\n      \"local_directory\": \"/glade/scratch/abanihi/worker-i68z06t5\",\n      \"name\": \"dask-worker--3081414--\",\n      \"ncores\": 1,\n      \"memory_limit\": 3000000000,\n      \"last_seen\": 1540312311.5554748,\n      \"services\": {\n        \"nanny\": 41957,\n        \"bokeh\": 33914\n      },\n      \"metrics\": {\n        \"cpu\": 108.2,\n        \"memory\": 123023360,\n        \"time\": 1540312311.0555336,\n        \"read_bytes\": 488117.31111868663,\n        \"write_bytes\": 66659.12169260635,\n        \"num_fds\": 27,\n        \"executing\": 0,\n        \"in_memory\": 94,\n        \"ready\": 0,\n        \"in_flight\": 0\n      }\n    },\n    \"tcp://xx.xxx.x.x:57886\": {\n      \"type\": \"Worker\",\n      \"id\": 
\"dask-worker--3081413--\",\n      \"host\": \"xx.xxx.x.x\",\n      \"resources\": {},\n      \"local_directory\": \"/glade/scratch/abanihi/worker-tfduys7x\",\n      \"name\": \"dask-worker--3081413--\",\n      \"ncores\": 1,\n      \"memory_limit\": 3000000000,\n      \"last_seen\": 1540312311.5556846,\n      \"services\": {\n        \"nanny\": 55003,\n        \"bokeh\": 57435\n      },\n      \"metrics\": {\n        \"cpu\": 112.0,\n        \"memory\": 128286720,\n        \"time\": 1540312311.0591698,\n        \"read_bytes\": 528809.3225060791,\n        \"write_bytes\": 65662.37663631311,\n        \"num_fds\": 26,\n        \"executing\": 0,\n        \"in_memory\": 52,\n        \"ready\": 0,\n        \"in_flight\": 0\n      }\n    }\n  }\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncar%2Fdaskrun","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fncar%2Fdaskrun","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncar%2Fdaskrun/lists"}