https://github.com/psteinb/sota_on_uncertainties

trying to obtain uncertainties from training accuracies using timm

Host: GitHub
URL: https://github.com/psteinb/sota_on_uncertainties
Owner: psteinb
License: bsd-3-clause
Created: 2022-02-09T16:38:40.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-06-17T09:11:31.000Z (almost 4 years ago)
Last Synced: 2025-04-12T23:42:00.494Z (about 1 year ago)
Language: Python
Size: 5.19 MB
Stars: 9
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


# SOTA with uncertainties

Obtain uncertainties from training accuracies with [timm](https://github.com/rwightman/pytorch-image-models/), using ensembles produced by an automated workflow. More details are available in [our paper](https://ml-eval.github.io/accepted-papers/#11).
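
As a rough illustration of the idea, the snippet below summarizes the accuracies of several repeated training runs as a mean and a standard deviation. It is only a sketch: the accuracy values and the function name `summarize_accuracies` are made up for this example, and the estimators used in the paper may differ.

```python
import statistics


def summarize_accuracies(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample std, n - 1 in the denominator
    return mean, std


# Hypothetical top-1 accuracies (in percent) from five repeated trainings.
runs = [91.2, 90.8, 91.5, 90.9, 91.1]
mean, std = summarize_accuracies(runs)
print(f"top-1 accuracy: {mean:.2f} +/- {std:.2f} %")
```

Reporting the spread next to the mean makes it easier to judge whether a difference between two models exceeds the run-to-run variation.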

# Required Environment

## instructions for a vanilla python installation

We assume you have a GPU available together with the required runtime environment. If not, you can try to execute the workflow on CPU-only hardware. Note, though, that training the networks on CPU-only hardware can be very slow.

Checking the python version:

```bash

python --version

```

should give:

```

Python 3.8.5

```
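
The same check can be done from inside the interpreter, for instance at the top of a script. This is a minimal sketch; the helper `version_matches` is hypothetical, and comparing only the major and minor version is an assumption about how strict the requirement is.

```python
import sys

# Assumption: only major.minor of the version shown above matters.
EXPECTED = (3, 8)


def version_matches(expected=EXPECTED, actual=None):
    """Return True if the major.minor version equals the expected one."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(expected)


if not version_matches():
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"warning: running Python {found}, the instructions assume 3.8")
```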

Set up the `venv`:

```bash

python -m venv 

```

Source the `venv`:

```bash

source /bin/activate

```

## instructions for Jusuf

For development:

```bash

salloc -N 1 -p gpus -A  -t 01:00:00

```

```bash

srun --cpu_bind=none --pty /bin/bash -i

```

Set up the software environment:

```bash

ml add Stages/2020 GCCcore/.10.3.0 CUDA/11.3 Python/3.8.5

```

To list the modules that are now loaded, run:

```bash

ml

```

This produces:

```
Currently Loaded Modules:
  1) Stages/2020            (S)     9) ncurses/.6.2     (H)  17) util-linux/.2.36    (H)  25) libxml2/.2.9.10  (H)  33) libspatialindex/.1.9.3 (H)
  2) StdEnv/2020                   10) libreadline/.8.0 (H)  18) fontconfig/.2.13.92 (H)  26) libxslt/.1.1.34  (H)  34) NASM/.2.15.03          (H)
  3) GCCcore/.10.3.0        (H)    11) Tcl/8.6.10            19) xorg-macros/.1.19.2 (H)  27) libffi/.3.3      (H)  35) libjpeg-turbo/.2.0.5   (H)
  4) binutils/.2.36.1       (H)    12) SQLite/.3.32.3   (H)  20) libpciaccess/.0.16  (H)  28) libyaml/.0.2.5   (H)  36) Python/3.8.5
  5) zlib/.1.2.11           (H)    13) expat/.2.2.9     (H)  21) X11/20200222             29) Java/15.0.1
  6) nvidia-driver/.default (H,g)  14) libpng/.1.6.37   (H)  22) Tk/.8.6.10          (H)  30) PostgreSQL/12.3
  7) CUDA/11.3              (g)    15) freetype/.2.10.1 (H)  23) GMP/6.2.0                31) protobuf/.3.13.0 (H)
  8) bzip2/.1.0.8           (H)    16) gperf/.3.1       (H)  24) XZ/.5.2.5           (H)  32) gflags/.2.2.2    (H)

  Where:
   S:  Module is Sticky, requires --force to unload or purge
   g:  built for GPU
   H:             Hidden Module
```

Checking the python version:

```bash

python --version

```

should give:

```

Python 3.8.5

```

Set up the `venv`:

```bash

python -m venv 

```

Source the `venv`:

```bash

source /bin/activate

```

For the sake of good practice, update `pip` whenever possible:

```bash

python -m pip install -U pip

```

# Required Python Packages

### Prepare for full training

To prepare the environment and set up `timm` for complete training, we need to install our own PyTorch build with support for Nvidia GPUs. As `timm` supports CUDA exclusively at the moment, we follow suit and require our users to have access to Nvidia GPUs:

```bash

python -m pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113  -f https://download.pytorch.org/whl/cu113/torch_stable.html

```
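
After the installation, it can be useful to confirm that PyTorch actually sees a GPU before starting long trainings. The following sketch uses only standard `torch.cuda` calls; the helper name `describe_torch_setup` is hypothetical and not part of this repository.

```python
import importlib.util


def describe_torch_setup():
    """Return a one-line description of the installed torch and its CUDA support."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__} with CUDA, device 0: {name}"
    return f"torch {torch.__version__} without a usable CUDA device"


print(describe_torch_setup())
```

If the last message appears on a GPU node, the installed wheel and the loaded CUDA runtime probably do not match.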

To conduct training, you need to install the full software stack:

```bash

python -m pip install -r requirements-full.txt

```

**NB** We did not test our workflow in this scenario.

### Prepare for reproducing the figures only

**Note**: If you are not interested in rerunning the machine learning training, you can go without `pytorch` and `timm`:

```bash

python -m pip install -r requirements.txt

```

# Running the experiments

## Getting the data

Note, this repo involves 360 1h runs on an Nvidia V100. If you'd like to repeat the experiments, you need to download the `imagenette2` dataset as documented in the timmdocs. Our workflow can do that automatically. In case you want to look around first, here is how to obtain it:

```bash

mkdir data

cd data

wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz

tar xf imagenette2-320.tgz

```
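
To look around the extracted archive, a small script can count the images per class. This sketch assumes the archive unpacks to `data/imagenette2-320` with `train` and `val` folders that contain one sub-folder per class; the function `count_images_per_class` is hypothetical and not part of the workflow.

```python
from pathlib import Path


def count_images_per_class(split_dir):
    """Map each class folder name in split_dir to its number of JPEG files."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in {".jpeg", ".jpg"}
        )
    return counts


# Assumed location of the extracted dataset, relative to the repository root.
root = Path("data/imagenette2-320")
for split in ("train", "val"):
    if (root / split).is_dir():
        print(split, count_images_per_class(root / split))
```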

## training all models

To run the 360 experiments sequentially, first switch the current working directory to the root of the repository:

```bash
cd /root/to/repo
```

Then run the training:

```bash

snakemake -j1 imagenette2_train

```

Please use the issue tracker to report any shortcomings.

## parallel execution on a cluster

This workflow setup is prepared with a [slurm](https://slurm.schedmd.com) cluster in mind. JUSUF at JSC is managed by slurm.

Before you do anything, the destination folders for the slurm logs have to be created.

```bash

cd /path/to/repo

mkdir -p outputs/slurm/logs/

```

On JUSUF, you can run all model trainings with:

```bash

snakemake -j40 -p --profile config/slurm/jusuf imagenette2_train

```

Note, this will submit `360` jobs in total, but only run `40` jobs at a time. You can only invoke this command from the `venv` described above. If you'd like to run this on another cluster, adjust `config/slurm/jusuf/config.yaml` to your needs (see [slurm profile](https://github.com/Snakemake-Profiles/slurm) for the API documentation of `config.yaml`).
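
For planning, the numbers above translate into a rough wall-clock estimate. The sketch below assumes every job takes about one hour and ignores queue waiting time, so the real duration on a shared cluster will likely be longer; the function name is made up for this example.

```python
import math


def estimate_wall_clock_hours(n_jobs, max_parallel, hours_per_job):
    """Estimate total run time when jobs are processed in waves of max_parallel."""
    waves = math.ceil(n_jobs / max_parallel)
    return waves * hours_per_job


# 360 jobs of roughly 1h each, 40 at a time (as with -j40 above).
print(estimate_wall_clock_hours(360, 40, 1.0))  # 9.0
# The same jobs run one after another (as with -j1).
print(estimate_wall_clock_hours(360, 1, 1.0))  # 360.0
```

Under these assumptions, the parallel run takes about 9 hours instead of 15 days of sequential training.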

## Inference (default workflow target)

The default workflow target runs inference on the validation datasets. This requires at least one GPU and all `last.pth.tar` model files generated by `timm`, arranged in the folder structure the workflow expects. To run it:

```bash

snakemake -j80 -p --profile config/slurm/jusuf imagenette2_inference_last

```
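
Before launching inference, it may help to check which `last.pth.tar` files are present. The sketch below simply searches a directory tree for them; the search root `outputs` is an assumption, and the exact folder structure the workflow expects is defined in the Snakemake rules, not here.

```python
from pathlib import Path


def find_last_checkpoints(root):
    """Return all last.pth.tar files below root, sorted by path."""
    return sorted(Path(root).rglob("last.pth.tar"))


# Assumed search root; adjust to wherever the training outputs were written.
root = Path("outputs")
if root.is_dir():
    checkpoints = find_last_checkpoints(root)
    print(f"found {len(checkpoints)} checkpoint(s)")
    for ckpt in checkpoints:
        print(ckpt)
```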

# Viewing the execution graph

```bash

snakemake -j1 -F --dag results/figures/imagenette2_compare_meanstd_approx.png | dot -Tsvg > ~/imagenette2_compare_meanstd_approx_dag.svg

```

# Citation

```
@article{https://doi.org/10.48550/arxiv.2204.05173,
  doi = {10.48550/ARXIV.2204.05173},
  url = {https://ml-eval.github.io/accepted-papers/#11},
  author = {Steinbach, Peter and Gernhardt, Felicita and Tanveer, Mahnoor and Schmerler, Steve and Starke, Sebastian},
  keywords = {Machine Learning (cs.LG), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Machine Learning State-of-the-Art with Uncertainties},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International},
  note = {ICLR22, ML Evaluation Standards workshop}
}
```
