## [Memory-augmented Online Video Anomaly Detection (MOVAD)](https://doi.org/10.1109/ICASSP48485.2024.10447554)

Official PyTorch implementation of **MOVAD**, whose paper was accepted at ICASSP 2024.

We propose **MOVAD**, a brand new architecture for online (frame-level) video
anomaly detection.

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/memory-augmented-online-video-anomaly/online-video-anomaly-detection-on-detection)](https://paperswithcode.com/sota/online-video-anomaly-detection-on-detection?p=memory-augmented-online-video-anomaly)

![MOVAD Architecture](images/arch.jpg)

Authors: Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini,
Massimo Bertozzi, Andrea Prati.

[IMP Lab](http://implab.ce.unipr.it/) -
Dipartimento di Ingegneria e Architettura

University of Parma, Italy

![Video Anomaly Detection with MOVAD](images/video1.gif)

## Abstract

The ability to understand the surrounding scene is of paramount importance for
Autonomous Vehicles (AVs).

This paper presents a system capable of working in an online fashion,
giving an immediate response when anomalies arise around the AV,
exploiting only the videos captured by a dash-mounted camera.

Our architecture, called MOVAD, relies on two main modules:
a Short-Term Memory Module that extracts information related to the ongoing
action, implemented by a Video Swin Transformer (VST), and a Long-Term Memory
Module injected inside the classifier that also considers remote past
information and action context thanks to a Long Short-Term Memory (LSTM)
network.
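
To make the two-memory design concrete, here is a minimal, hypothetical PyTorch sketch of how a VST-based short-term branch could feed an LSTM-based long-term classifier. Module names, dimensions, and the `vst_backbone` placeholder are illustrative assumptions, not the official implementation (see `src/` for that):

```python
import torch
import torch.nn as nn


class MOVADSketch(nn.Module):
    """Illustrative two-memory sketch; not the official implementation."""

    def __init__(self, vst_backbone, feat_dim=1024, lstm_cells=2, hidden_dim=512):
        super().__init__()
        # Short-term memory: a Video Swin Transformer over a short clip of recent frames.
        self.short_term = vst_backbone
        # Long-term memory: stacked LSTM cells injected inside the classifier.
        self.long_term = nn.LSTM(feat_dim, hidden_dim, num_layers=lstm_cells, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)  # per-frame anomaly score

    def forward(self, clip, state=None):
        # clip: (B, C, T, H, W), a short window of RGB frames ending at the current one
        feat = self.short_term(clip)                        # (B, feat_dim)
        out, state = self.long_term(feat.unsqueeze(1), state)
        score = torch.sigmoid(self.classifier(out[:, -1]))  # anomaly probability
        return score, state                                 # state is carried to the next frame
```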

The strengths of MOVAD are not only its excellent performance, but also its
straightforward and modular architecture, trained end-to-end with only RGB
frames and as few assumptions as possible, which makes it easy to implement
and experiment with.

We evaluated the performance of our method on the Detection of Traffic Anomaly
(DoTA) dataset, a challenging collection of dash-mounted camera videos of
accidents.

After an extensive ablation study, MOVAD reaches an AUC score of
82.17%, surpassing the current state of the art by +2.87 AUC.

## Usage

### Installation
```bash
$ git clone -b movad_vad https://github.com/IMPLabUniPr/movad.git
$ cd movad
$ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_base_patch244_window1677_sthv2.pth -O pretrained/swin_base_patch244_window1677_sthv2.pth
$ conda env create -n movad_env --file environment.yml
$ conda activate movad_env
```
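
Note that the `wget` command above writes into `pretrained/`, so create that directory first (`mkdir -p pretrained`) if it is not already present after cloning. As an optional sanity check (not part of the official instructions), you can confirm that PyTorch sees a GPU and that the VST checkpoint is in place:

```bash
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
$ ls -lh pretrained/swin_base_patch244_window1677_sthv2.pth
```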

### Download the DoTA dataset

Please download the dataset from the [official website](https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly)
and save it inside the `data/dota` directory.

You should obtain the following structure:

```
data/dota
├── annotations
│   ├── 0qfbmt4G8Rw_000306.json
│   ├── 0qfbmt4G8Rw_000435.json
│   ├── 0qfbmt4G8Rw_000602.json
│   ...
├── frames
│   ├── 0qfbmt4G8Rw_000072
│   ├── 0qfbmt4G8Rw_000306
│   ├── 0qfbmt4G8Rw_000435
│   ...
└── metadata
    ├── metadata_train.json
    ├── metadata_val.json
    ├── train_split.txt
    └── val_split.txt
```
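
To quickly verify that the extraction produced the expected layout, a small script along these lines can be used (an illustrative check, not part of the repository):

```python
from pathlib import Path

root = Path("data/dota")

# Count annotation files and per-video frame folders.
n_ann = len(list((root / "annotations").glob("*.json")))
n_clips = sum(1 for d in (root / "frames").iterdir() if d.is_dir())
print(f"annotations: {n_ann}, frame folders: {n_clips}")

# Verify the metadata/split files are where the code expects them.
for name in ("metadata_train.json", "metadata_val.json", "train_split.txt", "val_split.txt"):
    path = root / "metadata" / name
    print(path, "ok" if path.exists() else "MISSING")
```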

### Download models pretrained on the DoTA dataset

Open the [Release v1.0](https://github.com/IMPLabUniPr/movad/releases/tag/v1.0)
page and download the `.pt` (pretrained model) and `.pkl` (results) files.
Unzip them inside the `output` directory, obtaining the following directory
structure:

```
output/
├── v4_1
│   ├── checkpoints
│   │   └── model-640.pt
│   └── eval
│       └── results-640.pkl
└── v4_2
    ├── checkpoints
    │   └── model-690.pt
    └── eval
        └── results-690.pkl
```
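
To confirm that the files were unzipped into the right place, a released checkpoint can be opened with plain `torch.load`; since the internal layout of the checkpoint is not documented here, the snippet below only prints its top-level keys:

```python
import torch

# Load one of the released checkpoints on CPU and show its top-level structure;
# the exact dictionary layout depends on how the training script saved it.
ckpt = torch.load("output/v4_2/checkpoints/model-690.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```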

### Train
```bash
python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase train --epochs 1000 --epoch -1
```

### Eval
```bash
python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase test --epoch 690
```
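
To reproduce both released results in one go, the two configurations can be evaluated with a small loop; this is a convenience sketch that assumes the checkpoints from Release v1.0 are already under `output/`:

```bash
# Evaluate both released configurations with their matching epochs.
for pair in v4_1:640 v4_2:690; do
    cfg=${pair%%:*}; epoch=${pair##*:}
    python src/main.py --config cfgs/${cfg}.yml --output output/${cfg}/ --phase test --epoch ${epoch}
done
```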

### Play: generate video
```bash
python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase play --epoch 690
```

## Results

### Table 1

Memory modules effectiveness.

| # | Short-term | Long-term | AUC | Conf |
|:---:|:---:|:---:|:---:|:---:|
| 1 | | | 66.53 | [conf](cfgs/v0_1.yml) |
| 2 | X | | 74.46 | [conf](cfgs/v2_3.yml) |
| 3 | | X | 68.76 | [conf](cfgs/v1_1.yml) |
| 4 | X | X | 79.21 | [conf](cfgs/v1_3.yml) |

### Figure 2

Short-term memory module.

| Name | Conf |
|:---:|:---:|
| NF 1 | [conf](cfgs/v1_1.yml) |
| NF 2 | [conf](cfgs/v1_2.yml) |
| NF 3 | [conf](cfgs/v1_3.yml) |
| NF 4 | [conf](cfgs/v1_4.yml) |
| NF 5 | [conf](cfgs/v1_5.yml) |

### Figure 3

Long-term memory module.

| Name | Conf |
|:---:|:---:|
| w/out LSTM | [conf](cfgs/v2_1.yml) |
| LSTM (1 cell) | [conf](cfgs/v2_2.yml) |
| LSTM (2 cells) | [conf](cfgs/v1_3.yml) |
| LSTM (3 cells) | [conf](cfgs/v2_3.yml) |
| LSTM (4 cells) | [conf](cfgs/v2_4.yml) |

### Figure 4

Video clip length (VCL).

| Name | Conf |
|:---:|:---:|
| 4 frames | [conf](cfgs/v3_1.yml) |
| 8 frames | [conf](cfgs/v1_3.yml) |
| 12 frames | [conf](cfgs/v3_2.yml) |
| 16 frames | [conf](cfgs/v3_3.yml) |

### Table 2

Comparison with the state of the art.

| # | Method | Input | AUC | Conf |
|:---:|:---:|:---:|:---:|:---:|
| 9 | Ours (MOVAD) | RGB (320x240) | 80.09 | [conf](cfgs/v4_1.yml) |
| 10 | Ours (MOVAD) | RGB (640x480) | 82.17 | [conf](cfgs/v4_2.yml) |

## License

This project is released under the [GPL v2](./LICENSE) license.

## Acknowledgement

This research benefits from the HPC (High Performance Computing) facility
of the University of Parma, Italy.

## Citation
If you find our work useful in your research, please cite:

```
@inproceedings{rossi2024memory,
  title={Memory-augmented Online Video Anomaly Detection},
  author={Rossi, Leonardo and Bernuzzi, Vittorio and Fontanini, Tomaso and Bertozzi, Massimo and Prati, Andrea},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6590--6594},
  year={2024},
  organization={IEEE}
}
```