https://github.com/anto18671/efficientvit-b4.r256

Pretraining the EfficientViT-B4 model on the ImageNet-1k dataset

computer-vision efficientvit imagenet-1k pretraining vision-transformer

Host: GitHub
URL: https://github.com/anto18671/efficientvit-b4.r256
Owner: anto18671
License: mit
Created: 2024-10-06T16:49:13.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-06T20:30:29.000Z (over 1 year ago)
Last Synced: 2025-10-04T05:52:58.658Z (9 months ago)
Topics: computer-vision, efficientvit, imagenet-1k, pretraining, vision-transformer
Language: Python
Homepage:
Size: 11.7 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# EfficientViT-B4 Pretraining on ImageNet-1k

This repository contains the code and configuration for pretraining the **EfficientViT-B4** model on the **ImageNet-1k** dataset. EfficientViT is a family of vision models that aims for a good trade-off between accuracy and computational cost.

## Installation

Clone the repository and install the required dependencies:

```bash
git clone https://github.com/anto18671/efficientvit-b4.r256.git
cd efficientvit-b4.r256
pip install -r requirements.txt
```

The dependencies include:
- **PyTorch**
- **torchvision**
- **timm** (PyTorch Image Models)
- **Hugging Face `datasets`**
- **torchsummary**
- **tqdm**

## Dataset

Pretraining uses the **ImageNet-1k** dataset, which contains roughly 1.28 million training images across 1,000 categories. The dataset is loaded automatically through Hugging Face's `datasets` library.

## Pretraining

To start pretraining, first make sure the following prerequisites are met.

### Prerequisites
1. **GPU Support**: Pretraining is set up to run on systems with NVIDIA GPUs. Ensure CUDA and the necessary drivers are installed on your machine.
   - CUDA: version 12.4 (or a compatible version)
   - cuDNN: version 9

2. **Environment Setup**:
   - Ensure a build of **PyTorch** with GPU support is installed.
   - Your GPU needs enough memory for the configured batch size. Reduce the batch size if you run out of memory.

3. **Hugging Face Authentication**:
   - You need to authenticate with Hugging Face to access the ImageNet-1k dataset. Set your Hugging Face token in the environment:

   ```bash
   export HUGGINGFACE_TOKEN="<your_token>"
   ```
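A script can fail fast when the token is missing instead of failing partway through a dataset download. The following is a minimal sketch, assuming the script reads the `HUGGINGFACE_TOKEN` variable directly; the helper name `get_hf_token` is hypothetical and is not taken from `pre.py`.

```python
import os


def get_hf_token(env_var: str = "HUGGINGFACE_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or raise a clear error."""
    # Treat an unset variable and an empty/whitespace-only value the same way.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting pretraining."
        )
    return token
```

Calling `get_hf_token()` once at startup surfaces a missing or empty token before any GPU time is spent.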

### Starting Pretraining

Once the environment is set up and a GPU is available, run the `pre.py` script to begin pretraining:

```bash
python pre.py
```

This script will:
- Initialize the **EfficientViT-B4** model.
- Set up the data pipelines with transformations (resizing, augmentation, normalization).
- Configure the optimizer (AdamW) and the learning rate scheduler.
- Start pretraining from scratch, or resume from the last saved checkpoint if one exists.
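The optimizer, scheduler, and training-loop steps listed above could look roughly like the sketch below. It is not the code from `pre.py`: the tiny linear model is a stand-in for EfficientViT-B4, the random tensors are a stand-in for the ImageNet-1k data pipeline, and the `weight_decay` and `gamma` values are assumptions, since this README does not state them.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import ExponentialLR

ACCUM_STEPS = 3  # gradient accumulation steps, as listed under Training Configuration


def train_one_epoch(model, batches, optimizer, accum_steps=ACCUM_STEPS):
    """Run one epoch with gradient accumulation; return the number of optimizer steps."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    optimizer.zero_grad()
    steps = 0
    for i, (images, labels) in enumerate(batches, start=1):
        # Scale the loss so the accumulated gradient is an average over micro-batches.
        loss = criterion(model(images), labels) / accum_steps
        loss.backward()
        if i % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            steps += 1
    # Note: gradients from a trailing, incomplete accumulation group are not applied here.
    return steps


# Stand-in model and data (assumptions for illustration only).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)  # weight_decay assumed
scheduler = ExponentialLR(optimizer, gamma=0.9)  # gamma assumed
batches = [(torch.randn(4, 3, 8, 8), torch.randint(0, 10, (4,))) for _ in range(6)]

steps = train_one_epoch(model, batches, optimizer)
scheduler.step()  # decay the learning rate once per epoch (assumed schedule granularity)
```

With six micro-batches and three accumulation steps, the optimizer is stepped twice in this toy epoch.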

### Running in a Docker Environment

If you're using Docker for pretraining, follow these steps:

1. **Pull the Docker Image**:

```bash
docker pull ghcr.io/anto18671/efficientvit-b4.r256:latest
```

2. **Run the Docker Container with GPU Support**:

```bash
docker run --gpus all --env HUGGINGFACE_TOKEN="<your_token>" ghcr.io/anto18671/efficientvit-b4.r256:latest
```

Ensure that Docker has GPU support enabled; on Linux this typically means installing the NVIDIA Container Toolkit. The `--gpus all` flag exposes all available GPUs to the container.

### Checkpoints

- **Best model**: Automatically saved whenever the validation accuracy improves.
- **Last checkpoint**: Saved at the end of each epoch to allow resuming from the most recent state.
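The two checkpoint rules above can be sketched as a small helper. This is an illustration under assumptions, not the repository's code: the file names `last.pt` and `best.pt` and the function name `save_checkpoints` are hypothetical, and the actual writer is passed in as `save_fn` so the logic stays independent of the serialization library.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def save_checkpoints(
    state: Dict[str, Any],
    val_acc: float,
    best_acc: float,
    out_dir: Path,
    save_fn: Callable[[Dict[str, Any], Path], None],
) -> float:
    """Write last.pt every epoch and best.pt on improvement; return the best accuracy so far."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Always refresh the checkpoint used for resuming.
    save_fn(state, out_dir / "last.pt")
    if val_acc > best_acc:
        # Only overwrite the best model when validation accuracy improves.
        save_fn(state, out_dir / "best.pt")
        return val_acc
    return best_acc
```

In a PyTorch script, `save_fn` would typically be `torch.save`, and `state` a dictionary holding the epoch number plus the `state_dict()` of the model, optimizer, and scheduler.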

## Model Architecture

The **EfficientViT-B4** model is part of the EfficientViT family, which is designed to balance speed and accuracy in vision tasks. This implementation uses custom configuration settings to trade off computational cost against model performance.

- **Model architecture**: EfficientViT-B4
- **Input size**: 256x256 pixels
- **Pretraining**: The model is trained from scratch; no pretrained weights are loaded.

## Training Configuration

- **Optimizer**: AdamW with weight decay
- **Learning Rate**: 1e-4 (with exponential decay)
- **Batch Size**: 42 (adjustable based on GPU memory)
- **Gradient Accumulation**: 3 steps (effective batch size of 126) to limit memory usage
- **Epochs**: 16
- **Data Augmentation**: Resize, Color Jitter, Random Horizontal Flip, and Normalization
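To make the interaction between these settings concrete, the sketch below works through the arithmetic. The decay factor `GAMMA` is an assumed value (the README only says "exponential decay"), the decay is assumed to be applied once per epoch, and the step counts assume the full ImageNet-1k training split of 1,281,167 images with the last partial batch kept.

```python
import math

BATCH_SIZE = 42            # from the configuration above
ACCUM_STEPS = 3            # from the configuration above
BASE_LR = 1e-4             # from the configuration above
GAMMA = 0.9                # assumed decay factor; not stated in this README
TRAIN_IMAGES = 1_281_167   # size of the ImageNet-1k training split

# Samples contributing to each optimizer update.
effective_batch = BATCH_SIZE * ACCUM_STEPS
# Forward/backward passes per epoch, keeping the last partial batch.
micro_batches = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
# Optimizer updates per epoch.
optimizer_steps = math.ceil(micro_batches / ACCUM_STEPS)


def lr_at_epoch(epoch: int) -> float:
    """Learning rate after `epoch` per-epoch decays of an exponential schedule."""
    return BASE_LR * GAMMA ** epoch


print(effective_batch)   # 126
print(micro_batches)     # 30504
print(optimizer_steps)   # 10168
```

Changing the batch size to fit a smaller GPU changes the effective batch size unless the accumulation steps are adjusted to compensate.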

## Resume Pretraining

If pretraining is interrupted, the script will automatically resume from the last checkpoint. The model, optimizer, and scheduler states are restored from the latest saved checkpoint.
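A resume step of this kind might be sketched as follows. The checkpoint keys (`model`, `optimizer`, `scheduler`, `epoch`) and the function name `resume_if_available` are assumptions for illustration; the actual layout used by `pre.py` is not documented here.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def resume_if_available(
    path: Path,
    load_fn: Callable[[Path], Dict[str, Any]],
    model: Any,
    optimizer: Any,
    scheduler: Any,
) -> int:
    """Restore states from `path` if it exists; return the epoch to start from."""
    if not path.exists():
        return 0  # no checkpoint found: start from scratch
    ckpt = load_fn(path)
    # Each object is expected to expose load_state_dict, as PyTorch modules,
    # optimizers, and schedulers do.
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["epoch"] + 1  # continue with the epoch after the saved one
```

With PyTorch, `load_fn` could be `lambda p: torch.load(p, map_location="cpu")`, and the returned value would seed the epoch loop, for example `for epoch in range(start_epoch, 16)`.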

## Results and Validation

During pretraining, validation runs at the end of each epoch to evaluate the model's performance, and metrics such as loss and accuracy are logged.
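Per-epoch loss and accuracy are usually accumulated batch by batch. The class below is a minimal, framework-independent sketch of that bookkeeping; the name `RunningMetrics` is hypothetical, and it assumes each batch reports its mean loss together with predicted and true class indices.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunningMetrics:
    """Accumulates mean loss and top-1 accuracy over validation batches."""

    loss_sum: float = 0.0
    correct: int = 0
    seen: int = 0

    def update(self, batch_loss: float, preds: Sequence[int], labels: Sequence[int]) -> None:
        n = len(labels)
        # Weight the mean batch loss by batch size so uneven batches average correctly.
        self.loss_sum += batch_loss * n
        self.correct += sum(int(p == y) for p, y in zip(preds, labels))
        self.seen += n

    @property
    def loss(self) -> float:
        return self.loss_sum / max(self.seen, 1)

    @property
    def accuracy(self) -> float:
        return self.correct / max(self.seen, 1)
```

At the end of an epoch, `metrics.accuracy` is the value that would be compared against the best accuracy so far when deciding whether to save a new best model.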

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
