{"id":15654438,"url":"https://github.com/basnijholt/adaptive-scheduler","last_synced_at":"2026-02-25T01:02:14.407Z","repository":{"id":34868222,"uuid":"184797329","full_name":"basnijholt/adaptive-scheduler","owner":"basnijholt","description":"Run many functions (adaptively) on many cores (\u003e10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. :tada:","archived":false,"fork":false,"pushed_at":"2025-03-24T18:28:34.000Z","size":993,"stargazers_count":30,"open_issues_count":29,"forks_count":11,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-28T19:09:57.664Z","etag":null,"topics":["active-learning","adaptive","adaptive-learning","dask","distributed-computing","interactive","ipyparallel","loky","mpi4py","parallel-computing","pbs","python","slurm"],"latest_commit_sha":null,"homepage":"http://adaptive-scheduler.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/basnijholt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-03T17:46:40.000Z","updated_at":"2025-03-11T22:49:50.000Z","dependencies_parsed_at":"2023-09-26T06:40:07.252Z","dependency_job_id":"4f4bf379-333e-4637-b1ce-c45e0645c82c","html_url":"https://github.com/basnijholt/adaptive-scheduler","commit_stats":{"total_commits":822,"total_committers":11,"mean_commits":74.72727272727273,"dds":0.402676399026764,"last_synced_commit":"aec20887d71e456a945dda555ffa2cb50d40c7b6"},"previous_names":[],"tags_count":108,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/basnijholt%2Fadaptive-scheduler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basnijholt%2Fadaptive-scheduler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basnijholt%2Fadaptive-scheduler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basnijholt%2Fadaptive-scheduler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/basnijholt","download_url":"https://codeload.github.com/basnijholt/adaptive-scheduler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247423515,"owners_count":20936626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active-learning","adaptive","adaptive-learning","dask","distributed-computing","interactive","ipyparallel","loky","mpi4py","parallel-computing","pbs","python","slurm"],"created_at":"2024-10-03T12:51:48.531Z","updated_at":"2026-02-25T01:02:05.338Z","avatar_url":"https://github.com/basnijholt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Asynchronous Job Scheduler for Adaptive 
:rocket:\n\n[![PyPI](https://img.shields.io/pypi/v/adaptive-scheduler.svg)](https://pypi.python.org/pypi/adaptive-scheduler)\n[![Conda](https://img.shields.io/conda/v/conda-forge/adaptive-scheduler.svg?label=conda-forge)](https://anaconda.org/conda-forge/adaptive-scheduler)\n[![Downloads](https://anaconda.org/conda-forge/adaptive-scheduler/badges/downloads.svg)](https://anaconda.org/conda-forge/adaptive-scheduler)\n[![Build Status](https://github.com/basnijholt/adaptive-scheduler/actions/workflows/pytest.yml/badge.svg)](https://github.com/basnijholt/adaptive-scheduler/actions/workflows/pytest.yml)\n[![Documentation Status](https://readthedocs.org/projects/adaptive-scheduler/badge/?version=latest)](https://adaptive-scheduler.readthedocs.io/en/latest/?badge=latest)\n[![CodeCov](https://codecov.io/gh/basnijholt/adaptive-scheduler/branch/main/graph/badge.svg)](https://codecov.io/gh/basnijholt/adaptive-scheduler)\n\nThis is an asynchronous job scheduler for [`Adaptive`](https://github.com/python-adaptive/adaptive/), designed to run many `adaptive.Learner`s on many cores (\u003e***10k-100k***) using `mpi4py.futures`, `ipyparallel`, `loky`, `concurrent.futures.ProcessPoolExecutor`, or `dask.distributed`.\n\n\u003c!-- toc-start --\u003e\n## :books: Table of Contents\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n- [:thinking: What is this?](#thinking-what-is-this)\n- [:dart: Design Goals](#dart-design-goals)\n- [:test_tube: How does it work?](#test_tube-how-does-it-work)\n- [:mag: But how does it *really* work?](#mag-but-how-does-it-really-work)\n- [:notebook: Jupyter Notebook Example](#notebook-jupyter-notebook-example)\n- [:computer: Installation](#computer-installation)\n- [:hammer_and_wrench: Development](#hammer_and_wrench-development)\n- [:warning: Limitations](#warning-limitations)\n\n\u003c!-- END doctoc generated TOC please keep comment 
here to allow auto update --\u003e\n\u003c!-- toc-end --\u003e\n\n## :thinking: What is this?\n\nAdaptive Scheduler is designed to address the challenge of executing a large number of `adaptive.Learner`s in parallel, even when using more than **10k-100k cores**.\nTraditional engines like `ipyparallel` and `distributed` can struggle with such high core counts because there is a central process that communicates with each worker.\n\nThis library schedules a separate job for each `adaptive.Learner`, and manages the creation and execution of these jobs.\nThis ensures that your calculations will run even if the cluster is currently fully occupied (because the job will simply wait in the queue).\nThe approach allows for nearly limitless core usage, whether you allocate 10 nodes for a single job or 1 core for a single job while scheduling hundreds of jobs.\n\nThe computation is designed for maximum locality.\nIf a job crashes, a new one is automatically scheduled and the calculation continues from where it left off, thanks to Adaptive's periodic saving functionality.\nEven if the central \"job manager\" fails, the jobs will continue to run, although no new jobs will be scheduled.\n\n## :dart: Design Goals\n\n1. Needs to be able to run efficiently on \u003e30k cores.\n2. Works seamlessly with the Adaptive package.\n3. Minimal load on the file system.\n4. Removes all boilerplate of working with a scheduler:\n   - Writes job scripts.\n   - (Re)submits job scripts.\n5. Handles random crashes (or node evictions) with minimal data loss.\n6. Preserves the Python kernel and variables inside a job (in contrast to submitting jobs for every parameter).\n7. Separates the simulation definition code from the code that runs the simulation.\n8. 
Maximizes computation locality: jobs continue to run when the main process dies.\n\n## :test_tube: How does it work?\n\nYou create a list of `learners` and a matching list of `fnames` (the file each learner is saved to and loaded from), like:\n\n```python\nfrom functools import partial\n\nimport adaptive\n\ndef h(x, pow, a):\n    return a * x**pow\n\ncombos = adaptive.utils.named_product(\n    pow=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],\n    a=[0.1, 0.5],\n)  # returns list of dicts, cartesian product of all values\n\nlearners = [\n    adaptive.Learner1D(partial(h, **combo), bounds=(-1, 1)) for combo in combos\n]\nfnames = [f\"data/{combo}\" for combo in combos]\n```\n\nThen you start a process that creates and submits as many job scripts as there are learners, like:\n\n```python\nimport adaptive_scheduler\n\ndef goal(learner):\n    return learner.npoints \u003e 200\n\nscheduler = adaptive_scheduler.scheduler.SLURM(cores=10)  # every learner gets this many cores\n\nrun_manager = adaptive_scheduler.server_support.RunManager(\n    scheduler,\n    learners,\n    fnames,\n    goal=goal,\n    log_interval=30,  # write info such as npoints, cpu_usage, time, etc. to the job log file\n    save_interval=300,  # save the data every 300 seconds\n)\nrun_manager.start()\n```\n\nThat's it! 
You can run `run_manager.info()`, which displays an interactive `ipywidget` showing the number of running, pending, and finished jobs, buttons to cancel your jobs, and other useful information.\n\n![Widget demo](https://user-images.githubusercontent.com/6897215/232347580-37f8faa9-53f0-45f5-a34b-856cd8e62b83.gif)\n\n## :mag: But how does it *really* work?\n\nThe `adaptive_scheduler.server_support.RunManager` basically does the following:\n\n- *You* need to create `N` `learners` and `fnames` (like in the section above).\n- Then a \"job manager\" writes and submits `min(N, max_simultaneous_jobs)` job scripts but *doesn't know* which learner each job is going to run!\n- Assigning learners to jobs is the responsibility of the \"database manager\", which keeps a database of `job_id \u003c--\u003e learner`.\n- The job script starts a Python file `run_learner.py` in which the learner is run.\n\nIn a Jupyter notebook, you can start the \"job manager\" and the \"database manager\", and create the `run_learner.py` like:\n\n```python\nimport adaptive_scheduler\nfrom adaptive_scheduler import server_support\n\n# create a scheduler\nscheduler = adaptive_scheduler.scheduler.SLURM(cores=10)\n\n# create a new database that keeps track of job \u003c-\u003e learner\n# (`learners` and `fnames` are the lists created in the section above)\ndb_fname = \"running.json\"\nurl = server_support.get_allowed_url()  # get a url where we can run the database_manager\ndatabase_manager = server_support.DatabaseManager(\n    url, scheduler, db_fname, learners, fnames\n)\ndatabase_manager.start()\n\n# create unique names for the jobs\nn_jobs = len(learners)\njob_names = [f\"test-job-{i}\" for i in range(n_jobs)]\n\njob_manager = server_support.JobManager(\n    job_names,\n    database_manager,\n    scheduler,\n    save_interval=300,\n    log_interval=30,\n    goal=0.01,\n)\njob_manager.start()\n```\n\nThen, when the jobs have been running for a while, you can check `server_support.parse_log_files(database_manager, scheduler)`.\n\nAnd use `scheduler.cancel(job_names)` to cancel the 
jobs.\n\nYou never actually have to leave the Jupyter notebook; take a look at the [`example notebook`](https://github.com/basnijholt/adaptive-scheduler/blob/main/example.ipynb).\n\n## :notebook: Jupyter Notebook Example\n\nSee [`example.ipynb`](https://github.com/basnijholt/adaptive-scheduler/blob/main/example.ipynb).\n\n## :computer: Installation\n\nInstall the **latest stable** version from conda (recommended):\n\n```bash\nconda install adaptive-scheduler\n```\n\nor from PyPI:\n\n```bash\npip install adaptive_scheduler\n```\n\nor install the **main** branch with:\n\n```bash\npip install -U https://github.com/basnijholt/adaptive-scheduler/archive/main.zip\n```\n\nor clone the repository and do a dev install (recommended for development):\n\n```bash\ngit clone git@github.com:basnijholt/adaptive-scheduler.git\ncd adaptive-scheduler\npip install -e .\n```\n\n## :hammer_and_wrench: Development\n\nIn order not to pollute the history with the output of the notebooks, please set up the git filter by executing:\n\n```bash\npython ipynb_filter.py\n```\n\nin the repository.\n\nWe also use `pre-commit`, so `pip install pre-commit` and run:\n\n```bash\npre-commit install\n```\n\nin the repository.\n\n## :warning: Limitations\n\nCurrently, `adaptive_scheduler` only works for SLURM and PBS.\nHowever, supporting another type of scheduler only requires implementing a class like the one in [`adaptive_scheduler/scheduler.py`](https://github.com/basnijholt/adaptive-scheduler/blob/main/adaptive_scheduler/scheduler.py#L471).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasnijholt%2Fadaptive-scheduler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasnijholt%2Fadaptive-scheduler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasnijholt%2Fadaptive-scheduler/lists"}