ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection (TPAMI 2024)
https://github.com/lartpang/zoomnext
- Host: GitHub
- URL: https://github.com/lartpang/zoomnext
- Owner: lartpang
- Created: 2023-10-30T07:15:08.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-13T06:54:17.000Z (about 1 year ago)
- Topics: camouflage-detection, camouflaged-object-detection, camouflaged-target-detection, image-camouflaged-object-detection, image-video-unified-model, unified-model, video-camouflaged-object-detection
- Language: Python
- Homepage: https://arxiv.org/abs/2310.20208
- Size: 77.1 KB
- Stars: 39
- Watchers: 6
- Forks: 6
- Open Issues: 2
Metadata Files:
- Readme: README.md
# ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection (TPAMI 2024)
PapersWithCode leaderboards: [CAMO](https://paperswithcode.com/sota/camouflaged-object-segmentation-on-camo?p=zoomnext-a-unified-collaborative-pyramid) | [CHAMELEON](https://paperswithcode.com/sota/camouflaged-object-segmentation-on-chameleon?p=zoomnext-a-unified-collaborative-pyramid) | [COD](https://paperswithcode.com/sota/camouflaged-object-segmentation-on-cod?p=zoomnext-a-unified-collaborative-pyramid) | [NC4K](https://paperswithcode.com/sota/camouflaged-object-segmentation-on-nc4k?p=zoomnext-a-unified-collaborative-pyramid) | [MoCA-Mask](https://paperswithcode.com/sota/camouflaged-object-segmentation-on-moca-mask?p=zoomnext-a-unified-collaborative-pyramid) | [camouflaged-object-segmentation-on](https://paperswithcode.com/sota/camouflaged-object-segmentation-on?p=zoomnext-a-unified-collaborative-pyramid)
```bibtex
@ARTICLE{ZoomNeXt,
  title   = {ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection},
  author  = {Youwei Pang and Xiaoqi Zhao and Tian-Zhu Xiang and Lihe Zhang and Huchuan Lu},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2024},
  doi     = {10.1109/TPAMI.2024.3417329},
}
```
## Weights and Results
See [Google Drive](https://drive.google.com/drive/folders/1Hp3GIqossOrJYs3bRzJICujMKbZy4WxO?usp=drive_link).
### Performance
| Backbone | CAMO-TE | | | CHAMELEON | | | COD10K-TE | | | NC4K | | |
| --------------- | ------- | -------------------- | ----- | --------- | -------------------- | ----- | --------- | -------------------- | ----- | ----- | -------------------- | ----- |
| | $S_m$ | $F^{\omega}_{\beta}$ | MAE | $S_m$ | $F^{\omega}_{\beta}$ | MAE | $S_m$ | $F^{\omega}_{\beta}$ | MAE | $S_m$ | $F^{\omega}_{\beta}$ | MAE |
| ResNet-50 | 0.833 | 0.774 | 0.065 | 0.908 | 0.858 | 0.021 | 0.861 | 0.768 | 0.026 | 0.874 | 0.816 | 0.037 |
| EfficientNet-B1 | 0.848 | 0.803 | 0.056 | 0.916 | 0.870 | 0.020 | 0.863 | 0.773 | 0.024 | 0.876 | 0.823 | 0.036 |
| EfficientNet-B4 | 0.867 | 0.824 | 0.046 | 0.911 | 0.865 | 0.020 | 0.875 | 0.797 | 0.021 | 0.884 | 0.837 | 0.032 |
| PVTv2-B2 | 0.874 | 0.839 | 0.047 | 0.922 | 0.884 | 0.017 | 0.887 | 0.818 | 0.019 | 0.892 | 0.852 | 0.030 |
| PVTv2-B3 | 0.885 | 0.854 | 0.042 | 0.927 | 0.898 | 0.017 | 0.895 | 0.829 | 0.018 | 0.900 | 0.861 | 0.028 |
| PVTv2-B4 | 0.888 | 0.859 | 0.040 | 0.925 | 0.897 | 0.016 | 0.898 | 0.838 | 0.017 | 0.900 | 0.865 | 0.028 |
| PVTv2-B5 | 0.889 | 0.857 | 0.041 | 0.924 | 0.885 | 0.018 | 0.898 | 0.827 | 0.018 | 0.903 | 0.863 | 0.028 |

| Backbone | CAD | | | | | MoCA-Mask-TE | | | | |
| -------------- | ----- | -------------------- | ----- | ----- | ----- | ------------ | -------------------- | ----- | ----- | ----- |
| | $S_m$ | $F^{\omega}_{\beta}$ | MAE | mDice | mIoU | $S_m$ | $F^{\omega}_{\beta}$ | MAE | mDice | mIoU |
| PVTv2-B5 (T=5) | 0.757 | 0.593 | 0.020 | 0.599 | 0.510 | 0.734 | 0.476 | 0.010 | 0.497 | 0.422 |
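
In the tables above, $S_m$ is the structure measure, $F^{\omega}_{\beta}$ the weighted F-measure, and MAE the mean absolute error (lower is better); the video benchmarks additionally report mean Dice (mDice) and mean IoU (mIoU). As a rough reference only (not the official evaluation code, which also implements $S_m$ and $F^{\omega}_{\beta}$), a minimal sketch of MAE, Dice, and IoU for a single prediction/GT pair:

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a prediction map and a binary GT mask, both in [0, 1]."""
    return float(np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean())

def dice_iou(pred: np.ndarray, gt: np.ndarray, thresh: float = 0.5) -> tuple[float, float]:
    """Dice and IoU of the thresholded prediction against the binary GT mask."""
    p = pred >= thresh
    g = gt >= 0.5
    inter = np.logical_and(p, g).sum()
    dice = 2.0 * inter / (p.sum() + g.sum() + 1e-8)
    iou = inter / (np.logical_or(p, g).sum() + 1e-8)
    return float(dice), float(iou)
```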
## Prepare Data
> [!note]
>
> The CAD dataset can be downloaded from https://drive.google.com/file/d/1XhrC6NSekGOAAM7osLne3p46pj1tLFdI/view?usp=sharing
>
> With the data setup below, the VCOD results produced directly by the training script are consistent with the numbers reported in the paper.
List all dataset information in `dataset.yaml`. An example configuration:
```yaml
# VCOD Datasets
moca_mask_tr:
{
root: "YOUR-VCOD-DATASETS-ROOT/MoCA-Mask/MoCA_Video/TrainDataset_per_sq",
image: { path: "*/Imgs", suffix: ".jpg" },
mask: { path: "*/GT", suffix: ".png" },
start_idx: 0,
end_idx: 0
}
moca_mask_te:
{
root: "YOUR-VCOD-DATASETS-ROOT/MoCA-Mask/MoCA_Video/TestDataset_per_sq",
image: { path: "*/Imgs", suffix: ".jpg" },
mask: { path: "*/GT", suffix: ".png" },
start_idx: 0,
end_idx: -2
}
cad:
{
root: "YOUR-VCOD-DATASETS-ROOT/CamouflagedAnimalDataset",
image: { path: "original_data/*/frames", suffix: ".png" },
mask: { path: "converted_mask/*/groundtruth", suffix: ".png" },
start_idx: 0,
end_idx: 0
}
# ICOD Datasets
cod10k_tr:
{
root: "YOUR-ICOD-DATASETS-ROOT/Train/COD10K-TR",
image: { path: "Image", suffix: ".jpg" },
mask: { path: "Mask", suffix: ".png" },
}
camo_tr:
{
root: "YOUR-ICOD-DATASETS-ROOT/Train/CAMO-TR",
image: { path: "Image", suffix: ".jpg" },
mask: { path: "Mask", suffix: ".png" },
}
cod10k_te:
{
root: "YOUR-ICOD-DATASETS-ROOT/Test/COD10K-TE",
image: { path: "Image", suffix: ".jpg" },
mask: { path: "Mask", suffix: ".png" },
}
camo_te:
{
root: "YOUR-ICOD-DATASETS-ROOT/Test/CAMO-TE",
image: { path: "Image", suffix: ".jpg" },
mask: { path: "Mask", suffix: ".png" },
}
chameleon:
{
root: "YOUR-ICOD-DATASETS-ROOT/Test/CHAMELEON",
image: { path: "Image", suffix: ".jpg" },
mask: { path: "Mask", suffix: ".png" },
}
nc4k:
{
root: "YOUR-ICOD-DATASETS-ROOT/Test/NC4K",
image: { path: "Imgs", suffix: ".jpg" },
mask: { path: "GT", suffix: ".png" },
}
```
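
The `*` wildcards in the VCOD entries expand to the per-sequence sub-folders. As an illustration of how such an entry can be enumerated (a sketch only; the repository has its own dataset classes, and `list_pairs` is a hypothetical helper, not part of the project):

```python
import os
from glob import glob

import yaml  # pip install pyyaml

def list_pairs(entry: dict) -> list[tuple[str, str]]:
    """Enumerate (image, mask) paths for one dataset.yaml entry.

    Illustration only: the '*' in the VCOD paths expands to per-sequence folders,
    and pairing by sorted order assumes identical frame naming for images and masks.
    The start_idx/end_idx fields are ignored here.
    """
    root = entry["root"]
    images = sorted(glob(os.path.join(root, entry["image"]["path"], "*" + entry["image"]["suffix"])))
    masks = sorted(glob(os.path.join(root, entry["mask"]["path"], "*" + entry["mask"]["suffix"])))
    assert len(images) == len(masks), "image/mask counts should match"
    return list(zip(images, masks))

if __name__ == "__main__":
    with open("dataset.yaml") as f:
        cfg = yaml.safe_load(f)
    print(f"cod10k_tr: {len(list_pairs(cfg['cod10k_tr']))} image/mask pairs")
```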
## Install Requirements
* torch==2.1.2
* torchvision==0.16.2
* Others: `pip install -r requirements.txt`
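
Optionally, a quick environment check (not part of the repository) to confirm the pinned versions and GPU visibility:

```python
import torch
import torchvision

# Expected pins from this README: torch==2.1.2, torchvision==0.16.2.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```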
## Evaluation
```shell
# ICOD
python main_for_image.py --config configs/icod_train.py --model-name <model-name> --evaluate --load-from <checkpoint-path>
# VCOD
python main_for_video.py --config configs/vcod_finetune.py --model-name <model-name> --evaluate --load-from <checkpoint-path>
```
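
Before passing a downloaded checkpoint to `--load-from`, it can help to inspect it quickly. A minimal sketch (the filename is a placeholder, and whether the weights are wrapped under a key such as `state_dict` is an assumption to check against the released files):

```python
import torch

# Placeholder filename; use the file downloaded from the Google Drive link above.
ckpt = torch.load("pvtv2b5_zoomnext.pth", map_location="cpu")

# Released checkpoints may be a plain state_dict or a dict wrapping one.
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} entries")
for name, value in list(state_dict.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```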
## Training
### Image Camouflaged Object Detection
```shell
python main_for_image.py --config configs/icod_train.py --pretrained --model-name EffB1_ZoomNeXt
python main_for_image.py --config configs/icod_train.py --pretrained --model-name EffB4_ZoomNeXt
python main_for_image.py --config configs/icod_train.py --pretrained --model-name PvtV2B2_ZoomNeXt
python main_for_image.py --config configs/icod_train.py --pretrained --model-name PvtV2B3_ZoomNeXt
python main_for_image.py --config configs/icod_train.py --pretrained --model-name PvtV2B4_ZoomNeXt
python main_for_image.py --config configs/icod_train.py --pretrained --model-name PvtV2B5_ZoomNeXt
python main_for_image.py --config configs/icod_train.py --pretrained --model-name RN50_ZoomNeXt
```
### Video Camouflaged Object Detection
1. Pretrain on COD10K-TR: `python main_for_image.py --config configs/icod_pretrain.py --info pretrain --model-name PvtV2B5_ZoomNeXt --pretrained`
2. Finetune on MoCA-Mask-TR: `python main_for_video.py --config configs/vcod_finetune.py --info finetune --model-name videoPvtV2B5_ZoomNeXt --load-from <checkpoint-from-step-1>`
> [!note]
> If you run into out-of-memory (OOM) errors, try reducing the batch size or switching on the `--use-checkpoint` flag, i.e. add `--use-checkpoint` to the `main_for_image.py` / `main_for_video.py` command.
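
`--use-checkpoint` presumably trades extra compute for lower memory via activation checkpointing (an assumption about this repository's implementation). The snippet below shows the general PyTorch mechanism, not the project's code:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """Stand-in for a heavy backbone/decoder block."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(dim, dim, kernel_size=3, padding=1), torch.nn.ReLU()
        )

    def forward(self, x: torch.Tensor, use_checkpoint: bool = False) -> torch.Tensor:
        if use_checkpoint and self.training:
            # Recompute activations in the backward pass instead of storing them.
            return checkpoint(self.net, x, use_reentrant=False)
        return self.net(x)

x = torch.randn(2, 64, 96, 96, requires_grad=True)
y = Block().train()(x, use_checkpoint=True)
y.mean().backward()  # activations inside self.net are recomputed here
```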