https://github.com/kahsolt/neural-classifier-prototypes
Evolving from a pure-color image or random noise towards class-wise inherent prototypes for a neural classifier
- Host: GitHub
- URL: https://github.com/kahsolt/neural-classifier-prototypes
- Owner: Kahsolt
- License: mit
- Created: 2022-10-10T09:16:56.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-21T12:30:06.000Z (about 3 years ago)
- Last Synced: 2025-01-03T16:29:00.681Z (about 1 year ago)
- Language: Python
- Size: 38.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Neural-Classifier-Prototypes
Evolving from a pure-color image or random noise towards class-wise inherent prototypes for a neural classifier
----
For a well-trained neural classifier `f`, we call an input `x` a prototype of class `y` iff `f(x) == y` and `d(f(x)) / d(x) == 0`.
That is, `x` is an extremum of `f`: it yields zero gradient on `f` and is correctly classified by `f` (with the highest confidence).
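In code, this condition can be checked numerically; here is a minimal sketch (not the repo's code), where differentiating the top logit and the tolerance `tol` are illustrative assumptions:

```python
import torch

def is_prototype(f, x, y, tol=1e-6):
    """Check whether x is (numerically) a class-y prototype of classifier f."""
    x = x.clone().requires_grad_(True)
    logits = f(x)                        # shape [1, num_classes]
    pred = logits.argmax(dim=-1).item()
    logits[0, pred].backward()           # d f(x) / d x, taken w.r.t. the top logit
    grad_norm = x.grad.norm().item()     # should vanish at a prototype
    return pred == y and grad_norm < tol
```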
Due to the inherent nature of NN models, a 1000-class classifier may have far more than 1000 class-wise prototypes, in contrast to centroid-based clustering models.
Even so, there are vastly more inputs that are NOT exactly equal to any prototype, and they produce a non-zero gradient on `f`.
An adversarial attacker usually moves along the direction of maximum gradient `max. d(f(x)) / d(x)`, looking for a small perturbation `dx` such that `f(x + dx) != y`, e.g. with the naive FGSM or PGD method.
We now consider the opposite: for any given input `x`, find its closest class-wise prototypes inherent to `f`.
In addition, we are quite curious about this: when `x` starts evolving/alienating from a pure blank image or Gaussian noise, **to what degree** will `f` start to recognize it (with high confidence), and is the generated texture also recognizable to a human being?
Yeah, you might've got it - we are treating adversarial attacks as a kind of generative model, with the frozen classifier as its discriminator counterpart ;)
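Before the experiments, here is a minimal hand-rolled sketch of what such a prototype search could look like: a targeted, PGD-style loop that pulls the prediction towards a chosen class while also penalizing the input-gradient norm, i.e. pushing `d(f(x)) / d(x)` towards 0. The name `evolve_to_prototype`, the loss composition and the default hyperparameters are illustrative assumptions, not the repo's actual implementation.

```python
import torch
import torch.nn.functional as F

def evolve_to_prototype(f, x0, target, eps=0.1, alpha=1e-3, steps=100, lam=1.0):
    """Evolve x0 towards a prototype of class `target` inside an L-inf eps-ball."""
    y = torch.tensor([target])
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        loss_cls = F.cross_entropy(f(x), y)
        # input gradient of the classification loss; create_graph=True lets us
        # also differentiate its norm, driving d(f(x)) / d(x) towards 0
        g, = torch.autograd.grad(loss_cls, x, create_graph=True)
        loss = loss_cls + lam * g.pow(2).sum()
        step, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x = x - alpha * step.sign()           # PGD-style signed step
            x = x0 + (x - x0).clamp(-eps, eps)    # project back into the eps-ball
            x = x.clamp(0.0, 1.0)                 # assume inputs live in [0, 1]
    return x.detach()
```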
### Experiments
#### Alienate from blank / pure color image
Employing any label-agnostic, gradient-based adversarial attack (we try `PGD`, `MIFGSM` and `PGDL2`), for each starting input to the given classifier `f`:
- Find perturbations `dx_{i}` st. `f(c + dx_{i}) == y_{i}` and `d(f(c + dx_{i})) / d(c + dx_{i}) -> 0`, where `c` is a constant (pure-color) image
- Find perturbations `dx_{i}` st. `f(c + dx_{i}) == y_{i}` and `| d(f(c + dx_{i})) / d(c + dx_{i}) | -> inf`
We test model `f=resnet18` (pretrained on ImageNet), victim dataset `X=imagenet-1k`, attack setting `atk=('pgd', 0.1, 0.001)`
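Concretely, this setting could be instantiated as follows (a minimal sketch reusing the hypothetical `evolve_to_prototype` helper above; the mid-gray start and the target class index are arbitrary choices, not the repo's):

```python
import torch
from torchvision.models import resnet18

f = resnet18(pretrained=True).eval()       # frozen classifier
c = torch.full((1, 3, 224, 224), 0.5)      # pure-color (mid-gray) start image
# eps/alpha mirror atk=('pgd', 0.1, 0.001); class index 0 is an arbitrary target
proto = evolve_to_prototype(f, c, target=0, eps=0.1, alpha=1e-3)
```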
#### Alienate from random noise
For each starting input to the given classifier `f`:
- Find perturbations `dx_{i}` st. `f(r + dx_{i}) == y_{i}` and `d(f(r + dx_{i})) / d(r + dx_{i}) -> 0`, where `r` follows some stochastic distribution
- Find perturbations `dx_{i}` st. `f(r + dx_{i}) == y_{i}` and `| d(f(r + dx_{i})) / d(r + dx_{i}) | -> inf`
We test model `f=resnet18` (pretrained on ImageNet), victim dataset `X=imagenet-1k`, attack setting `atk=('pgd', 0.1, 0.001)`
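The noise start only changes the initial image; one plausible choice (the exact distribution is an assumption, and `f` and `evolve_to_prototype` are reused from the sketches above):

```python
import torch

# Gaussian noise around mid-gray, clamped to the valid pixel range
r = (0.5 + 0.25 * torch.randn(1, 3, 224, 224)).clamp(0.0, 1.0)
proto = evolve_to_prototype(f, r, target=0, eps=0.1, alpha=1e-3)
```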
#### Alienate from a given picture
- Find perturbations `dx_{i}` st. `f(x + dx_{i}) == y_{i}` and `d(f(x + dx_{i})) / d(x + dx_{i}) -> 0`, where `i` enumerates over all target classes of `f`
- Find perturbations `dx_{i}` st. `f(x + dx_{i}) == y_{i}` and `| d(f(x + dx_{i})) / d(x + dx_{i}) | -> inf`
We test model `f=resnet18` (pretrained on ImageNet), victim dataset `X=imagenet-1k`, attack setting `atk=('pgd', 0.1, 0.001)`
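One way to make "closest" concrete is to run the search once per target class and rank by perturbation size; the L2 ranking and the `closest_prototypes` name are assumptions, again building on the hypothetical helper above.

```python
import torch

def closest_prototypes(f, x, num_classes=1000, **atk_kwargs):
    """Rank target classes by how small a perturbation turns x into a prototype."""
    ranking = []
    for y in range(num_classes):
        p = evolve_to_prototype(f, x, target=y, **atk_kwargs)
        ranking.append((y, (p - x).norm().item()))   # (class, L2 distance)
    return sorted(ranking, key=lambda t: t[1])       # nearest prototype first
```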
```
log\<model>_<dataset>_<attack>_e<eps>_a<alpha>.pkl
resnet18_imagenet-min-svhn_pgd_e3e-2_a1e-3
```
----
by Armit
2022/10/09