# Deep Learning 2024 - Project Assignment

Project developed by Matteo Beltrami (@matteobeltrami), Pietro Bologna (@bolognapietro), and Christian Sassi for the Deep Learning course.

## Introduction
Deep neural networks often suffer from severe performance degradation when tested on images that differ visually from those encountered during training. This degradation is caused by factors such as domain shift, noise, or changes in lighting.
Recent research has focused on domain adaptation techniques to build deep models that can adapt from an annotated source dataset to a target dataset. However, such methods usually require access to downstream training data, which can be challenging to collect.
An alternative approach is **Test-Time Adaptation (TTA)**, which aims to improve the robustness of a pre-trained neural network on a test dataset, typically by adapting the network to one test sample at a time. Two notable TTA methods for image classification are:
- **[Marginal Entropy Minimization with One test point (MEMO)](https://arxiv.org/pdf/2110.09506)**: This method uses pre-trained models directly without making any assumptions about their specific training procedures or architectures, requiring only a single test input for adaptation.
- **[Test-Time Prompt Tuning (TPT)](https://arxiv.org/pdf/2209.07511)**: This method adapts vision-language models such as CLIP by tuning the text prompt at test time, requiring only a single unlabeled test sample and no labeled data from the target domain.

## MEMO
For this project, MEMO was applied to a pretrained Vision Transformer, **ViT-B/16**, using the **ImageNetV2** dataset. This network operates as follows: given a test point $x \in X$, it produces a conditional output distribution $p(y | x; w)$ over a set of classes $Y$, and predicts a label $\hat{y}$ as:

$$ \hat{y} = M(x | w) = \arg \max_{y \in Y} p(y | x; w) $$
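As a minimal sketch of this prediction rule (assuming PyTorch and torchvision; the exact model variant and preprocessing used in the notebook may differ, and `predict` is a hypothetical helper):

```python
import torch
from torchvision import models

# Pretrained ViT-B/16 and its matching preprocessing pipeline.
weights = models.ViT_B_16_Weights.IMAGENET1K_V1
model = models.vit_b_16(weights=weights).eval()
preprocess = weights.transforms()

@torch.no_grad()
def predict(image):
    """Return y_hat = argmax_y p(y | x; w) for a single PIL image x."""
    x = preprocess(image).unsqueeze(0)    # (1, 3, 224, 224)
    probs = model(x).softmax(dim=-1)      # p(y | x; w) over the ImageNet classes
    return probs.argmax(dim=-1).item()
```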
*Fig. 1: MEMO overview.*

Let $A = \{a_1, \dots, a_M\}$ be a set of augmentations (resizing, cropping, color jittering, etc.). Each augmentation $a_i \in A$ can be applied to an input sample $x$, resulting in a transformed sample denoted $a_i(x)$, as shown in Fig. 1. The objective is to make the model's prediction invariant to these specific transformations.
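The exact augmentation set is a design choice (the original MEMO paper uses AugMix). As an illustrative sketch with standard torchvision transforms, a pool $A$ and a sampler for it might look like this (`A` and `augment` are hypothetical names, not necessarily those used in the notebook):

```python
import random
from torchvision import transforms

# Illustrative pool A = {a_1, ..., a_M}; the notebook's actual augmentations may differ.
A = [
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.RandomRotation(degrees=15),
]

def augment(image, B=8):
    """Apply B augmentations sampled with replacement from A to a PIL image x."""
    return [random.choice(A)(image) for _ in range(B)]
```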
MEMO starts by applying a set of $B$ augmentation functions sampled from $A$ to $x$. It then calculates the average, or marginal, output distribution $ \bar{p}(y | x; w) $ by averaging the conditional output distributions over these augmentations, represented as:
$$ \bar{p}(y | x; w) = \frac{1}{B} \sum_{i=1}^B p(y | a_i(x); w) $$
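In code, this averaging step could be sketched as follows, reusing the hypothetical `model`, `preprocess`, and `augment` helpers from the snippets above:

```python
import torch

def marginal_distribution(model, preprocess, views):
    """Compute p_bar(y | x; w) as the mean softmax output over B augmented views."""
    batch = torch.stack([preprocess(v) for v in views])  # (B, 3, 224, 224)
    probs = model(batch).softmax(dim=-1)                 # p(y | a_i(x); w), one row per view
    return probs.mean(dim=0)                             # p_bar(y | x; w), shape (num_classes,)

# Usage: p_bar = marginal_distribution(model, preprocess, augment(image, B=8))
```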
Since the true label $y$ is not available during testing, the objective of Test-Time Adaptation (TTA) is twofold: (i) to ensure that the model predicts the same label $y$ across the various augmented versions of the test sample, and (ii) to increase the confidence of the model's predictions, given that the augmented versions share the same label. To this end, the model is trained to minimize the entropy of the marginal output distribution across augmentations, defined as:
$$ L(w; x) = H(\bar{p}(\cdot | x; w)) = -\sum_{y \in Y} \bar{p}(y | x; w) \log \bar{p}(y | x; w) $$
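Putting the pieces together, a single MEMO adaptation step (one gradient step on the marginal entropy, followed by a prediction on the original test point) might be sketched as below; the optimizer, learning rate, and `B` are illustrative assumptions, not necessarily the notebook's settings:

```python
import copy
import torch

def memo_adapt_and_predict(model, preprocess, image, augment, B=8, lr=1e-4):
    """Adapt a copy of the model on one test point by minimizing H(p_bar), then predict."""
    adapted = copy.deepcopy(model)  # episodic adaptation: the original weights stay intact
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)

    # Marginal output distribution over B augmented views of x.
    views = torch.stack([preprocess(v) for v in augment(image, B)])
    p_bar = adapted(views).softmax(dim=-1).mean(dim=0)

    # Marginal entropy L(w; x) = -sum_y p_bar(y) log p_bar(y).
    loss = -(p_bar * p_bar.clamp_min(1e-12).log()).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Predict on the un-augmented test point with the adapted weights.
    with torch.no_grad():
        x = preprocess(image).unsqueeze(0)
        return adapted(x).softmax(dim=-1).argmax(dim=-1).item()
```

Copying the model before each update keeps the adaptation episodic: every test point is handled from the same pre-trained weights, so errors on one sample do not accumulate into the next.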
## How to Run
1. Clone the repository:
```bash
git clone https://github.com/christiansassi/deep-learning-project
cd deep-learning-project
```
2. Upload the notebook `deep_learning.ipynb` to [Google Colab](https://colab.research.google.com/). *NOTE: Make sure you use the T4 GPU runtime.*

# Contacts
Matteo Beltrami - [[email protected]](mailto:[email protected])
Pietro Bologna - [[email protected]](mailto:[email protected])
Christian Sassi - [[email protected]](mailto:[email protected])