https://github.com/coincheung/mfm
code for paper "Masked Frequency Modeling for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf)
- Host: GitHub
- URL: https://github.com/coincheung/mfm
- Owner: CoinCheung
- License: MIT
- Created: 2022-07-27T06:51:25.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-02-03T04:46:08.000Z (over 2 years ago)
- Last Synced: 2025-05-07T11:56:46.642Z (5 months ago)
- Topics: fft, frequency, mfm, pretrain, self-supervised-learning, ssl
- Language: Python
- Homepage:
- Size: 276 KB
- Stars: 24
- Watchers: 2
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MFM
Unofficial code for the paper "Masked Frequency Modeling for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf).

Below are experiments with resnet50. Although a better result than the paper's is achieved, the scratch baseline here is also much higher than in the paper.
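For intuition, the frequency masking at the core of MFM can be sketched with `torch.fft` as below. This is a simplified illustration of the idea, not this repo's actual implementation; the function name, the circular mask, and the `radius` / low-vs-high-pass choice are placeholder assumptions.

```python
import torch

def mask_frequencies(img, radius=16, mask_low=True):
    # Sketch of the MFM corruption step (simplified): transform the image to
    # the frequency domain, zero out the frequencies inside (low-pass mask)
    # or outside (high-pass mask) a centered circle of `radius`, then
    # transform back to pixel space.
    h, w = img.shape[-2:]
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float().sqrt()
    masked = dist <= radius if mask_low else dist > radius
    freq = freq * (~masked).float()  # zero the selected frequencies
    out = torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1)))
    return out.real
```

The model is then trained to recover the masked frequency content from the corrupted image.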
|  | top-1 acc | pretrain | finetune |
|--|--|--|--|
| paper scratch | 78.1 | - | - |
| paper mfm pretrain | 78.5 | - | - |
| scratch | 78.542 | - | link |
| supervised pretrain | 78.942 | - | link |

**Note**: Supervised pretrain means finetuning from the torchvision resnet50 weights (loaded by setting `pretrained=True`). It seems that supervised pretraining works better than the proposed mfm pretraining.
## Platform
* pytorch 1.13.1
* torchvision 0.14.1
* dali 1.21.0
* cuda 11.6
* V100 GPU(32G) x 8
* driver: 470.82.01

## Dataset
Prepare the imagenet train and val sets in the same way as the pytorch official classification [example](https://github.com/pytorch/examples/tree/main/imagenet), and then link them into the folder of this repo:
```
$ mkdir -p imagenet
$ ln -s /path/to/imagenet/train ./imagenet/train
$ ln -s /path/to/imagenet/val ./imagenet/val
```

## Train
The pretraining and finetuning commands are [here](./dist_train.sh).

## More ablations
Here are some points that affect the results:

1. finetune `--val-resize-size`
When we evaluate the model after finetuning, we always resize the short side of the image to a fixed value before the center crop. I find that the value of this fixed short-side size can affect the accuracy by a noticeable margin. Take the "supervised pretrain" model as an example:
| val-resize-size | 234 | 235 | 236 |
|--|--|--|--|
| top-1 acc | 78.856 | 78.942 | 78.794 |
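The `--val-resize-size` knob only changes the short-side target of the resize that precedes the 224 center crop. A small helper (hypothetical, for illustration) makes the geometry explicit:

```python
def short_side_resize(width, height, short=235):
    # Scale so the shorter side equals `short`, preserving aspect ratio,
    # as torchvision's Resize(short) does before CenterCrop(224).
    if width <= height:
        return short, round(height * short / width)
    return round(width * short / height), short
```

For example, a 640x480 image resized with `short=235` becomes 313x235 before the 224x224 crop, so the crop covers a slightly different fraction of the image for each setting.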
2. finetune with bce loss is important
We can see this by finetuning from scratch with CE (cross-entropy) loss and with BCE (binary cross-entropy) loss; the results are:
| loss | CE | BCE |
|--|--|--|
| top-1 acc | 78.542 | 78.952 |
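A minimal sketch of the BCE finetuning loss, assuming plain one-hot targets (the actual recipe may combine it with label smoothing or mixup, and the helper name is mine):

```python
import torch
import torch.nn.functional as F

def bce_cls_loss(logits, target, num_classes=1000):
    # Treat each class as an independent binary problem (one-vs-all) instead
    # of a softmax over all classes; `target` holds integer class indices.
    onehot = F.one_hot(target, num_classes).float()
    return F.binary_cross_entropy_with_logits(logits, onehot)
```

Compared with CE, each logit is pushed toward its own 0/1 target independently, which is the property recent ImageNet finetuning recipes exploit.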
3. pretrain random crop area
We usually crop a region whose area is a certain ratio of the original image area; the default range of this ratio in torchvision's `RandomResizedCrop` is `0.08-1.0`. Different self-supervised learning methods tend to prefer different ranges: for example, MAE uses `0.2-1.0`, MAE3d uses `0.5-1.0`, and SimMIM uses `0.67-1.0`. Here I find that the smaller lower bound of `0.2-1.0` is better:
| random area ratio | 0.67-1.0 | 0.2-1.0 | 0.1-1.0 |
|--|--|--|--|
| top-1 acc | 78.770 | 78.826 | 78.842 |
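What the area-ratio range controls can be seen in a stripped-down version of the `RandomResizedCrop` parameter sampling. This is a hypothetical re-implementation for illustration; torchvision's version handles the out-of-bounds fallback differently.

```python
import math
import random

def sample_crop_size(img_w, img_h, scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3)):
    # Draw a target area as a fraction of the image area (the `scale` range
    # discussed above) and an aspect ratio, then solve for crop width/height.
    area = img_w * img_h
    for _ in range(10):
        target_area = area * random.uniform(*scale)
        aspect = math.exp(random.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= img_w and 0 < h <= img_h:
            return w, h
    side = min(img_w, img_h)  # fallback: largest centered square
    return side, side
```

Lowering the first element of `scale` lets the crop cover a much smaller patch of the image, i.e. a more aggressive spatial augmentation during pretraining.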
Though `0.1-1.0` is better than `0.2-1.0` here, I still use the latter, since with `0.1-1.0` the finetuning eval result is more affected by `val-resize-size`:
| val-resize-size | 234 | 235 | 236 |
|--|--|--|--|
| 0.2-1.0 | 78.816 | 78.826 | 78.796 |
| 0.1-1.0 | 78.730 | 78.842 | 78.738 |
4. model variance
Here I pretrain the model 4 times (twice on 8 V100 GPUs and twice on 8 P40 GPUs) with identical configurations, and then finetune each pretrained model 3 times (on 8 P40 GPUs). The results are listed below. They vary by a sizeable margin, so the good results above may partly come down to luck; I cannot yet claim to have certainly reproduced the results in the paper.
| pretrain | finetune | acc1(235) | mean/std |
|--|--|--|--|
| round 1 | round 1 | 78.654 | 78.644/0.024 |
| | round 2 | 78.61 | |
| | round 3 | 78.668 | |
| round 2 | round 1 | 78.646 | 78.642/0.122 |
| | round 2 | 78.79 | |
| | round 3 | 78.49 | |
| round 3 | round 1 | 78.516 | 78.612/0.073 |
| | round 2 | 78.626 | |
| | round 3 | 78.694 | |
| round 4 | round 1 | 78.608 | 78.584/0.080 |
| | round 2 | 78.668 | |
| | round 3 | 78.476 | |

The overall mean/std across all 12 finetune runs is 78.621/0.08.
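The per-pretrain mean/std column can be reproduced from the three finetune accuracies of each round; a quick check for pretrain round 1 with the `statistics` module (the population standard deviation comes out close to, though not exactly, the 0.024 in the table, as the exact rounding convention isn't stated):

```python
import statistics

# Finetune accuracies of pretrain round 1 from the variance table above.
accs = [78.654, 78.61, 78.668]
mean = statistics.mean(accs)
std = statistics.pstdev(accs)  # population standard deviation
print(f"{mean:.3f}/{std:.3f}")
```

Running the same computation on all 12 accuracies pooled together reproduces the overall spread across runs.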