https://github.com/coincheung/mfm
code for paper "Masked Frequency Modeling for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf)
- Host: GitHub
- URL: https://github.com/coincheung/mfm
- Owner: CoinCheung
- License: MIT
- Created: 2022-07-27T06:51:25.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-02-03T04:46:08.000Z (over 2 years ago)
- Last Synced: 2025-05-07T11:56:46.642Z (5 months ago)
- Topics: fft, frequency, mfm, pretrain, self-supervised-learning, ssl
- Language: Python
- Homepage:
- Size: 276 KB
- Stars: 24
- Watchers: 2
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MFM
Unofficial code for the paper "Masked Frequency Modeling for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf).

Below are experiments with resnet50. Although a better result than the paper's is achieved, the scratch baseline here is also much higher than in the paper.
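For intuition, the frequency masking at the core of MFM can be sketched with `torch.fft` as below. This is a simplified illustration of the idea, not this repo's actual implementation; the function name, the circular mask, and the `radius` / low-vs-high-pass choice are placeholder assumptions.

```python
import torch

def mask_frequencies(img, radius=16, mask_low=True):
    # Sketch of the MFM corruption step (simplified): transform the image to
    # the frequency domain, zero out the frequencies inside (low-pass mask)
    # or outside (high-pass mask) a centered circle of `radius`, then
    # transform back to pixel space.
    h, w = img.shape[-2:]
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float().sqrt()
    masked = dist <= radius if mask_low else dist > radius
    freq = freq * (~masked).float()  # zero the selected frequencies
    out = torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1)))
    return out.real
```

The model is then trained to recover the masked frequency content from the corrupted image.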
|  | top-1 acc | pretrain | finetune |
|--|--|--|--|
| paper scratch | 78.1 | - | - |
| paper mfm pretrain | 78.5 | - | - |
| scratch | 78.542 | - | link |
| supervised pretrain | 78.942 | - | link |

**Note**: Supervised pretrain means finetuning from the torchvision resnet50 weights (loaded by setting `pretrained=True`). It seems that supervised pretraining works better than the proposed mfm pretraining.
## Platform
* pytorch 1.13.1
* torchvision 0.14.1
* dali 1.21.0
* cuda 11.6
* V100 GPU(32G) x 8
* driver: 470.82.01

## Dataset
Prepare the imagenet train and val sets in the same way as the pytorch official classification [example](https://github.com/pytorch/examples/tree/main/imagenet), and then link them into the folder of this repo:
```
$ mkdir -p imagenet
$ ln -s /path/to/imagenet/train ./imagenet/train
$ ln -s /path/to/imagenet/val ./imagenet/val
```

## Train
The pretraining and finetuning commands are [here](./dist_train.sh).

## More ablations
Here are some points that affect the results:

1. finetune `--val-resize-size`
When we evaluate the model after finetuning, we always resize the short side of the image to a fixed value before the center crop. I find that the value of this fixed short-side size can affect the accuracy by a noticeable margin. Take the "supervised pretrain" model as an example:
| val-resize-size | 234 | 235 | 236 |
|--|--|--|--|
| top-1 acc | 78.856 | 78.942 | 78.794 |
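The `--val-resize-size` knob only changes the short-side target of the resize that precedes the 224 center crop. A small helper (hypothetical, for illustration) makes the geometry explicit:

```python
def short_side_resize(width, height, short=235):
    # Scale so the shorter side equals `short`, preserving aspect ratio,
    # as torchvision's Resize(short) does before CenterCrop(224).
    if width <= height:
        return short, round(height * short / width)
    return round(width * short / height), short
```

For example, a 640x480 image resized with `short=235` becomes 313x235 before the 224x224 crop, so the crop covers a slightly different fraction of the image for each setting.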
2. finetune with bce loss is important
We can see this by finetuning from scratch with CE (cross-entropy) loss and with BCE (binary cross-entropy) loss; the results are:
| loss | CE | BCE |
|--|--|--|
| top-1 acc | 78.542 | 78.952 |
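A minimal sketch of the BCE finetuning loss, assuming plain one-hot targets (the actual recipe may combine it with label smoothing or mixup, and the helper name is mine):

```python
import torch
import torch.nn.functional as F

def bce_cls_loss(logits, target, num_classes=1000):
    # Treat each class as an independent binary problem (one-vs-all) instead
    # of a softmax over all classes; `target` holds integer class indices.
    onehot = F.one_hot(target, num_classes).float()
    return F.binary_cross_entropy_with_logits(logits, onehot)
```

Compared with CE, each logit is pushed toward its own 0/1 target independently, which is the property recent ImageNet finetuning recipes exploit.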
3. pretrain random crop area
We usually crop a region whose area is a certain ratio of the original image area; the default range of this ratio in torchvision's `RandomResizedCrop` is `0.08-1.0`. Different self-supervised learning methods tend to prefer different ranges: for example, MAE uses `0.2-1.0`, MAE3d uses `0.5-1.0`, and SimMIM uses `0.67-1.0`. Here I find that the smaller lower bound of `0.2-1.0` is better:
| random area ratio | 0.67-1.0 | 0.2-1.0 | 0.1-1.0 |
|--|--|--|--|
| top-1 acc | 78.770 | 78.826 | 78.842 |
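What the area-ratio range controls can be seen in a stripped-down version of the `RandomResizedCrop` parameter sampling. This is a hypothetical re-implementation for illustration; torchvision's version handles the out-of-bounds fallback differently.

```python
import math
import random

def sample_crop_size(img_w, img_h, scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3)):
    # Draw a target area as a fraction of the image area (the `scale` range
    # discussed above) and an aspect ratio, then solve for crop width/height.
    area = img_w * img_h
    for _ in range(10):
        target_area = area * random.uniform(*scale)
        aspect = math.exp(random.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= img_w and 0 < h <= img_h:
            return w, h
    side = min(img_w, img_h)  # fallback: largest centered square
    return side, side
```

Lowering the first element of `scale` lets the crop cover a much smaller patch of the image, i.e. a more aggressive spatial augmentation during pretraining.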
Though `0.1-1.0` is better than `0.2-1.0` here, I still use the latter, since with `0.1-1.0` the finetuning eval result is more affected by `val-resize-size`:
| val-resize-size | 234 | 235 | 236 |
|--|--|--|--|
| 0.2-1.0 | 78.816 | 78.826 | 78.796 |
| 0.1-1.0 | 78.730 | 78.842 | 78.738 |
4. model variance
Here I pretrain the model 4 times (twice on 8 V100 GPUs and twice on 8 P40 GPUs) with identical configurations, and then finetune each pretrained model 3 times (on 8 P40 GPUs). The results are listed below. They vary by a sizeable margin, so the good results above may partly come down to luck; I cannot yet claim to have certainly reproduced the results in the paper.
| pretrain | finetune | acc1(235) | mean/std |
|--|--|--|--|
| round 1 | round 1 | 78.654 | 78.644/0.024 |
| | round 2 | 78.61 | |
| | round 3 | 78.668 | |
| round 2 | round 1 | 78.646 | 78.642/0.122 |
| | round 2 | 78.79 | |
| | round 3 | 78.49 | |
| round 3 | round 1 | 78.516 | 78.612/0.073 |
| | round 2 | 78.626 | |
| | round 3 | 78.694 | |
| round 4 | round 1 | 78.608 | 78.584/0.080 |
| | round 2 | 78.668 | |
| | round 3 | 78.476 | |

The overall mean/std across all 12 finetune runs is 78.621/0.08.
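The per-pretrain mean/std column can be reproduced from the three finetune accuracies of each round; a quick check for pretrain round 1 with the `statistics` module (the population standard deviation comes out close to, though not exactly, the 0.024 in the table, as the exact rounding convention isn't stated):

```python
import statistics

# Finetune accuracies of pretrain round 1 from the variance table above.
accs = [78.654, 78.61, 78.668]
mean = statistics.mean(accs)
std = statistics.pstdev(accs)  # population standard deviation
print(f"{mean:.3f}/{std:.3f}")
```

Running the same computation on all 12 accuracies pooled together reproduces the overall spread across runs.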