https://github.com/lartpang/runit

A simple program scheduler for your code on different devices.

deeplearning-tool multi-gpu-scheduler multi-process multi-process-scheduler python python3 scheduler scheduler-tool single-file-scheduler tool utility


Host: GitHub
URL: https://github.com/lartpang/runit
Owner: lartpang
License: mit
Created: 2021-06-13T12:54:59.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2024-08-15T04:29:17.000Z (11 months ago)
Last Synced: 2024-12-28T23:32:37.193Z (7 months ago)
Topics: deeplearning-tool, multi-gpu-scheduler, multi-process, multi-process-scheduler, python, python3, scheduler, scheduler-tool, single-file-scheduler, tool, utility
Language: Python
Homepage:
Size: 34.2 KB
Stars: 11
Watchers: 3
Forks: 1
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


# RunIt

> [!NOTE]
> This tool still has some limitations.
> If you encounter any problems in use, please feel free to ask.

A simple program scheduler for your code on different devices.

Let the machine move!

Putting the machine to sleep is a disrespect for time.

## Usage

> [!NOTE]
> 2024-8-14: The config file now contains the information about your GPUs and jobs. More details can be found in [config.py](./examples/config.py).

### Dependency

- PyYAML==6.0

- nvidia-ml-py (provides `pynvml`; only needed for `runit_based_on_detected_memory.py`)
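
For orientation, the per-GPU memory counters that `pynvml` exposes can be read roughly as follows. This is a minimal sketch and not the code used by `runit_based_on_detected_memory.py`; the function name `query_gpu_memory_mib` is hypothetical, and the sketch assumes it is acceptable to return an empty list when `pynvml` or an NVIDIA driver is missing.

```python
from typing import List, Tuple


def query_gpu_memory_mib() -> List[Tuple[int, int, int]]:
    """Return (index, used_mib, total_mib) for each visible NVIDIA GPU.

    Hypothetical helper: returns an empty list when pynvml or the
    NVIDIA driver is not available on this machine.
    """
    try:
        import pynvml  # installed by the nvidia-ml-py package
    except ImportError:
        return []

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No driver or no NVML library on this machine.
        return []

    try:
        stats = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # fields are in bytes
            stats.append((index, info.used // 2**20, info.total // 2**20))
        return stats
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    for index, used, total in query_gpu_memory_mib():
        print(f"GPU {index}: {used} / {total} MiB used")
```

The numbers are a snapshot taken at call time, so they can lag behind a job whose memory use is still growing.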

### Scripts

We provide three scripts, each of which runs jobs in a different way.

- `runit_with_exclusive_gpu.py`: One GPU can only be used by one job at a time.

- `runit_based_on_memory.py`: One GPU can be used by several jobs at a time, based on their memory usage.

- `runit_based_on_detected_memory.py`: Use `pynvml` for detecting the total memory usage of each GPU. *But this may not be suitable for scenarios where the memory used by a running GPU application is unstable.*
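
The difference between the exclusive and the memory-based modes can be illustrated with two small selection functions. This is only a sketch of the idea, not the logic of the scripts themselves: the function names, the dictionaries, and the "most free memory first" tie-breaking rule are all assumptions made for the example.

```python
from typing import Dict, Optional


def pick_exclusive_gpu(busy: Dict[int, bool]) -> Optional[int]:
    """Exclusive mode: return the lowest-numbered GPU with no running job."""
    for gpu_id, is_busy in sorted(busy.items()):
        if not is_busy:
            return gpu_id
    return None


def pick_gpu_by_memory(free_mib: Dict[int, int], required_mib: int) -> Optional[int]:
    """Memory mode: return a GPU whose free memory still fits the job."""
    candidates = [
        (free, gpu_id) for gpu_id, free in free_mib.items() if free >= required_mib
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the most headroom, break ties by lowest id.
    _, gpu_id = max(candidates, key=lambda item: (item[0], -item[1]))
    return gpu_id


if __name__ == "__main__":
    busy = {0: True, 1: False}
    free_mib = {0: 4000, 1: 10000}
    print(pick_exclusive_gpu(busy))             # 1
    print(pick_gpu_by_memory(free_mib, 8000))   # 1
    print(pick_gpu_by_memory(free_mib, 12000))  # None
```

In the memory-based mode the free amounts would come from the job's declared requirement or, in the detected variant, from a live query, which is why unstable memory usage can mislead the scheduler.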

## Demo

```shell
$ python run_it.py --config ./examples/config.yaml
$ python run_it.py --max-workers 3 --config ./examples/config.yaml
```

```mermaid
graph TD
    A[Start] --> B[Read Configuration and Command Pool]
    B --> C[Initialize Shared Resources]
    C -->|Maximum number of requirements met| D[Loop Until All Jobs Done]
    D --> E[Check Available GPUs]
    E -->|Enough GPUs| F[Run Job in Separate Process]
    E -->|Not Enough GPUs| G[Wait and Retry]
    F --> H[Job Completes]
    F --> I[Job Fails]
    H --> J[Update Job Status and Return GPUs]
    I --> J
    G --> D
    J -->|All Jobs Done| K[End]
    C -->|Maximum number of requirements not met| L[Terminate Workers]
    L --> M[Shutdown Manager and Join Pool]
    M --> K
```
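
The core loop of the flowchart (wait for a free GPU, run the job in a child process, return the GPU whether the job succeeds or fails) can be sketched as follows. This is a simplified illustration, not the scheduler shipped in this repository: it uses a thread pool and a `queue.Queue` of GPU ids instead of a multiprocessing manager, and the names `run_job`, `schedule`, and the `max_workers` parameter are assumptions chosen for the example.

```python
import os
import queue
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Sequence


def run_job(command: List[str], gpu_pool: "queue.Queue[int]") -> int:
    """Borrow one GPU id, run the command in a child process, return the GPU."""
    gpu_id = gpu_pool.get()  # blocks until a GPU is free (the "wait and retry" edge)
    try:
        # Expose only the borrowed GPU to the child process.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        return subprocess.run(command, env=env).returncode
    finally:
        gpu_pool.put(gpu_id)  # returned on success and on failure


def schedule(
    commands: Sequence[List[str]],
    gpu_ids: Sequence[int],
    max_workers: Optional[int] = None,
) -> Dict[int, int]:
    """Run every command once and map its position to its exit code."""
    gpu_pool: "queue.Queue[int]" = queue.Queue()
    for gpu_id in gpu_ids:
        gpu_pool.put(gpu_id)
    workers = max_workers or len(gpu_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_job, cmd, gpu_pool) for cmd in commands]
        return {index: future.result() for index, future in enumerate(futures)}


if __name__ == "__main__":
    jobs = [
        [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],  # a failing job
    ]
    print(schedule(jobs, gpu_ids=[0]))
```

Each job here needs exactly one GPU; jobs that request several GPUs at once would need an atomic "take N ids" step rather than a single `get()`.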

## Thanks

- [@BitCalSaul](https://github.com/BitCalSaul): Thanks for the positive feedback!

- https://www.jb51.net/article/142787.htm

- https://docs.python.org/zh-cn/3/library/subprocess.html

- https://stackoverflow.com/a/23616229

- https://stackoverflow.com/a/14533902
