{"id":13493939,"url":"https://github.com/facebookincubator/submitit","last_synced_at":"2025-05-14T08:06:14.026Z","repository":{"id":37814872,"uuid":"258441818","full_name":"facebookincubator/submitit","owner":"facebookincubator","description":"Python 3.8+ toolbox for submitting jobs to Slurm","archived":false,"fork":false,"pushed_at":"2025-04-24T14:21:37.000Z","size":224,"stargazers_count":1415,"open_issues_count":53,"forks_count":135,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-04-24T14:24:17.366Z","etag":null,"topics":["clusters","python","slurm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookincubator.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-24T07:41:09.000Z","updated_at":"2025-04-24T13:04:11.000Z","dependencies_parsed_at":"2023-10-04T14:57:41.568Z","dependency_job_id":"149cb7a9-f92f-465c-9be2-60f443b4aee8","html_url":"https://github.com/facebookincubator/submitit","commit_stats":{"total_commits":121,"total_committers":24,"mean_commits":5.041666666666667,"dds":0.6446280991735538,"last_synced_commit":"4cf1462d7216f9dcc530daeb703ce07c37cf9d72"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookincubator%2Fsubmitit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookincubator%2Fsubmitit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/facebookincubator%2Fsubmitit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookincubator%2Fsubmitit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookincubator","download_url":"https://codeload.github.com/facebookincubator/submitit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254101615,"owners_count":22014909,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clusters","python","slurm"],"created_at":"2024-07-31T19:01:20.169Z","updated_at":"2025-05-14T08:06:09.018Z","avatar_url":"https://github.com/facebookincubator.png","language":"Python","readme":"[![CircleCI](https://circleci.com/gh/facebookincubator/submitit.svg?style=svg)](https://circleci.com/gh/facebookincubator/workflows/submitit)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Pypi](https://img.shields.io/pypi/v/submitit)](https://pypi.org/project/submitit/)\n[![conda-forge](https://img.shields.io/conda/vn/conda-forge/submitit)](https://anaconda.org/conda-forge/submitit)\n# Submit it!\n\n## What is submitit?\n\nSubmitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster.\nIt wraps job submission and provides access to results, logs and more.\n[Slurm](https://slurm.schedmd.com/quickstart.html) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux 
clusters.\nSubmitit lets you switch seamlessly between executing on Slurm and executing locally.\n\n### An example is worth a thousand words: performing an addition\n\nFrom inside an environment with `submitit` installed:\n\n```python\nimport submitit\n\ndef add(a, b):\n    return a + b\n\n# executor is the submission interface (logs are dumped in the folder)\nexecutor = submitit.AutoExecutor(folder=\"log_test\")\n# set timeout in min, and partition for running the job\nexecutor.update_parameters(timeout_min=1, slurm_partition=\"dev\")\njob = executor.submit(add, 5, 7)  # will compute add(5, 7)\nprint(job.job_id)  # ID of your job\n\noutput = job.result()  # waits for completion and returns output\nassert output == 12  # 5 + 7 = 12...  your addition was computed in the cluster\n```\n\nThe `Job` class also provides tools for reading the log files (`job.stdout()` and `job.stderr()`).\n\nIf what you want to run is a command, turn it into a Python function using `submitit.helpers.CommandFunction`, then submit it.\nBy default stdout is silenced in `CommandFunction`, but it can be unsilenced with `verbose=True`.\n\n**Find more examples [here](docs/examples.md)!!!**\n\nSubmitit is a Python 3.8+ toolbox for submitting jobs to Slurm.\nIt aims at running Python functions from Python code.\n\n\n## Install\n\nQuick install, in a virtualenv/conda environment where `pip` is installed (check `which pip`):\n- stable release:\n  ```\n  pip install submitit\n  ```\n- stable release using __conda__:\n  ```\n  conda install -c conda-forge submitit\n  ```\n- main branch:\n  ```\n  pip install git+https://github.com/facebookincubator/submitit@main#egg=submitit\n  ```\n\nYou can try running the [MNIST example](docs/mnist.py) to check that everything is working as expected (requires sklearn).\n\n\n## Documentation\n\nSee the following pages for more detailed information:\n\n- [Examples](docs/examples.md): for a bunch of examples dealing with errors, concurrency, multi-tasking, etc.\n- [Structure 
and main objects](docs/structure.md): to get a better understanding of how `submitit` works, which files are created for each job, and the main objects you will interact with.\n- [Checkpointing](docs/checkpointing.md): to understand how you can configure your job to get checkpointed when preempted and/or timed-out.\n- [Tips and caveats](docs/tips.md): for a bunch of information that can be handy when working with `submitit`.\n- [Hyperparameter search with nevergrad](docs/nevergrad.md): basic example of `nevergrad` usage and how it interfaces with `submitit`.\n\n\n### Goals\n\nThe aim of this Python3 package is to be able to launch jobs on Slurm painlessly from *inside Python*, using the same submission and job patterns as the standard library package `concurrent.futures`.\n\nHere are a few benefits of using this lightweight package:\n - submit any function, even lambda and script-defined functions.\n - raises an error with stack trace if the job failed.\n - requeue preempted jobs (Slurm only)\n - swap between `submitit` executor and one of `concurrent.futures` executors in a line, so that it is easy to run your code either on slurm, or locally with multithreading for instance.\n - checkpoints stateful callables when preempted or timed-out and requeue from current state (advanced feature).\n - easy access to task local/global rank for multi-node/multi-task jobs.\n - same code can work for different clusters thanks to a plugin system.\n\nSubmitit is used by FAIR researchers on the FAIR cluster.\nThe defaults are chosen to make their lives easier, and might not be ideal for every cluster.\n\n### Non-goals\n\n- a commandline tool for running slurm jobs. Here, everything happens inside Python. 
To this end, you can however use [Hydra](https://hydra.cc/)'s [submitit plugin](https://hydra.cc/docs/next/plugins/submitit_launcher) (version \u003e= 1.0.0).\n- a task queue: this only implements the ability to launch tasks and does not schedule them in any way.\n- being used in Python2! This is a Python3.8+ only package :)\n\n\n### Comparison with dask.distributed\n\n[`dask`](https://distributed.dask.org/en/latest/) is a nice framework for distributed computing. `dask.distributed` provides the same `concurrent.futures` executor API as `submitit`:\n\n```python\nfrom distributed import Client\nfrom dask_jobqueue import SLURMCluster\ncluster = SLURMCluster(processes=1, cores=2, memory=\"2GB\")\ncluster.scale(2)  # this may take a few seconds to launch\nexecutor = Client(cluster)\nexecutor.submit(...)\n```\n\nThe key difference with `submitit` is that `dask.distributed` distributes the jobs to a pool of workers (see the `cluster` variable above), while each `submitit` job is directly a job on the cluster. In that sense `submitit` is a lower-level interface than `dask.distributed` and you get more direct control over your jobs, including individual `stdout` and `stderr`, and possibly checkpointing in case of preemption and timeout. 
On the other hand, you should avoid submitting many small tasks with `submitit`, which would create many independent jobs and possibly overload the cluster, while you can do it without any problem through `dask.distributed`.\n\n\n## Contributors\n\nIn chronological order: Jérémy Rapin, Louis Martin, Lowik Chanussot, Lucas Hosseini, Fabio Petroni, Francisco Massa, Guillaume Wenzek, Thibaut Lavril, Vinayak Tantia, Andrea Vedaldi, Max Nickel, Quentin Duval (feel free to [contribute](.github/CONTRIBUTING.md) and add your name ;) )\n\n## License\n\nSubmitit is released under the [MIT License](LICENSE).\n","funding_links":[],"categories":["Production Level","Python","分布式机器学习"],"sub_categories":["Loss Functions","Problems"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookincubator%2Fsubmitit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookincubator%2Fsubmitit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookincubator%2Fsubmitit/lists"}