https://github.com/braun-steven/torch-utils

Utility functions that reoccur in my PyTorch experiments.

Host: GitHub
URL: https://github.com/braun-steven/torch-utils
Owner: braun-steven
License: mit
Created: 2019-06-22T10:58:57.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-09-27T11:12:13.000Z (almost 6 years ago)
Last Synced: 2025-03-31T11:02:13.641Z (3 months ago)
Topics: pytorch, utility
Language: Python
Homepage:
Size: 38.1 KB
Stars: 10
Watchers: 0
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Torch Utilities

This repo contains utility functions that I constantly reuse when writing experiment code with PyTorch.

Feel free to contribute your helper functions which you could not live without!

## Utility Functions

The following functions are available:

- `time_delta_now`: Create a human-readable string of the time elapsed since a given timestamp.

- `ensure_dir`: Ensure that a directory exists.

- `count_params`: Count the number of learnable parameters in an `nn.Module` object.

- `generate_run_base_dir`: Generate a base directory for experiment runs.

- `setup_logging`: Set up Python logging with the `logging` module. Starts logging to both a file and `stdout`.

- `set_seed`: Set the random seed for Python, NumPy, and PyTorch.

- `set_cuda_device`: Set the `CUDA_VISIBLE_DEVICES` environment variable.

- `make_multi_gpu`: Convert an `nn.Module` to a multi-GPU module.

- `load_args`: Load stored command-line arguments from the `argparse` module.

- `save_args`: Store command-line arguments from `argparse` in a file.

- `clone_args`: Clone an arguments object from `argparse`.

- `plot_samples`: Plot `(x, y)` samples in a grid.
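
As a rough illustration of what a few of the helpers listed above might look like, here is a standard-library-only sketch. It is not the repository's code: the signatures, the JSON file format, and the layout of the time string are all assumptions made for this example.

```python
import argparse
import copy
import json
import os
import time


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


def time_delta_now(t_start):
    """Format the time elapsed since `t_start` (a `time.time()` value)."""
    elapsed = int(time.time() - t_start)
    hours, rest = divmod(elapsed, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}h:{minutes:02d}m:{seconds:02d}s"


def save_args(args, path):
    """Write an `argparse.Namespace` to `path` as JSON (assumed format)."""
    with open(path, "w") as f:
        json.dump(vars(args), f, indent=2)


def load_args(path):
    """Read a file written by `save_args` back into a Namespace."""
    with open(path) as f:
        return argparse.Namespace(**json.load(f))


def clone_args(args):
    """Return an independent deep copy of a Namespace."""
    return copy.deepcopy(args)
```

Storing the arguments as JSON keeps the file readable and diffable, but it only works for JSON-serializable argument values; that trade-off is a choice of this sketch.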

## qdaq: CUDA Experiment Queue

`qdaq` makes it easy to run multiple PyTorch or TensorFlow experiments on more than one GPU.

The typical use case is a grid search over hyperparameters for the same experiment with several GPUs available. Without a queue, you would have to split the hyperparameter space manually and start the experiments on different GPUs yourself. This has multiple drawbacks:

- It's tedious to split experiment setups and start them on the appropriate GPU device by hand.

- If 10 experiments are started sequentially on GPU0 and 10 on GPU1, GPU0 may finish much earlier while GPU1 still has a queue of experiments to run. This leaves GPU0 idle (bad load balancing).

The main purpose of `qdaq` is to provide a multiprocessing queue of available CUDA devices. The idea is to define a `Job` (e.g. your experiment), create a bunch of jobs with different settings, specify a list of available CUDA devices, and let the internals handle the rest. In the background, each job is put into a queue and grabs a CUDA device as soon as one is available.

### Quick Start

The following is a quick example of how to use `qdaq`:

```python
import torch

from utils.qdaq import Job, start


# Create a job class that implements `run`
class Foo(Job):
    def __init__(self, q):
        # Save some job parameters
        self.q = q

    def run(self, cuda_device_id):
        # Get cuda device
        device = torch.device(f"cuda:{cuda_device_id}")

        # Send data to device
        x = torch.arange(1, 3).to(device)

        # Compute stuff
        y = x.pow(self.q)
        print(f"Cuda device {cuda_device_id}, Exponent {self.q}, Result {y}")


if __name__ == "__main__":
    exps = [Foo(i) for i in range(9)]
    start(exps, [1, 3])

# Output:
# Cuda device 3, Exponent 1, Result tensor([1, 2], device='cuda:3')
# Cuda device 1, Exponent 0, Result tensor([1, 1], device='cuda:1')
# Cuda device 3, Exponent 2, Result tensor([1, 4], device='cuda:3')
# Cuda device 1, Exponent 3, Result tensor([1, 8], device='cuda:1')
# Cuda device 1, Exponent 5, Result tensor([ 1, 32], device='cuda:1')
# Cuda device 3, Exponent 4, Result tensor([ 1, 16], device='cuda:3')
# Cuda device 1, Exponent 6, Result tensor([ 1, 64], device='cuda:1')
# Cuda device 3, Exponent 8, Result tensor([  1, 256], device='cuda:3')
# Cuda device 1, Exponent 7, Result tensor([  1, 128], device='cuda:1')
```
