https://github.com/sayakpaul/big_vision_experiments

Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k.
computer-vision google-cloud image-recognition jax large-scale-pretraining tpu

Host: GitHub
URL: https://github.com/sayakpaul/big_vision_experiments
Owner: sayakpaul
License: apache-2.0
Created: 2022-05-11T11:34:57.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-01-16T15:43:35.000Z (over 2 years ago)
Last Synced: 2025-05-05T21:17:06.531Z (5 months ago)
Topics: computer-vision, google-cloud, image-recognition, jax, large-scale-pretraining, tpu
Language: Jupyter Notebook
Homepage:
Size: 240 KB
Stars: 22
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Experiments with `big_vision`

Contains my experiments with the [`big_vision`](https://github.com/google-research/big_vision) repository to train ViTs on ImageNet-1k.

## What is `big_vision`?

From the repository:

> This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and TensorFlow Datasets for scalable input pipelines in the Cloud.

> big_vision aims to support research projects at Google. We are unlikely to work on feature requests or accept external contributions, unless they were pre-approved (ask in an issue first).

## Why this repository?

* I really like how `big_vision` is organized into composable modules.
* I wanted to reproduce some of the ImageNet-1k results reported by the `big_vision` authors.
* `big_vision` reports scores not only for the ImageNet-1k validation set but also
for ImageNetV2 and ImageNet-Real.
* I wanted to run the entire training on Cloud TPUs and improve my JAX skills
along the way.
* I wanted to improve my chops in large-scale pre-training. Large-scale pre-training is a goldmine of
deep learning that continues to benefit downstream applications. Programs like [TRC](https://sites.research.google/trc) make it possible for
the community to learn the nitty-gritty of large-scale pre-training by providing
TPU support.
* For the sheer joy of training models to SoTA.

This repository also hosts the trained checkpoints and the training logs. Additionally,
the Colab Notebook [`notebooks/analyze-metrics.ipynb`](https://colab.research.google.com/github/sayakpaul/big_vision_experiments/blob/main/notebooks/analyze-metrics.ipynb) takes the raw training logs and plots the accuracies
across three benchmarks: the ImageNet-1k validation set, ImageNetV2, and ImageNet-Real.
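To make the log-analysis step concrete, here is a minimal sketch of the parsing half of such a notebook. It is not the code from `analyze-metrics.ipynb`. It assumes the raw log is a JSON-lines file in which each record carries a `step` field plus metric values, and the metric keys (`val_prec@1`, `v2_prec@1`, `real_prec@1`) are hypothetical names that you should replace with whatever your log actually contains. The sample records are synthetic and only illustrate the shape of the data.

```python
import json
from collections import defaultdict

# Hypothetical metric keys (an assumption, not taken from the repository):
# adjust them to match the keys in your own training log.
BENCHMARK_KEYS = {
    "ImageNet-1k val": "val_prec@1",
    "ImageNetV2": "v2_prec@1",
    "ImageNet-Real": "real_prec@1",
}


def parse_metrics(lines, benchmark_keys=BENCHMARK_KEYS):
    """Collect (step, accuracy) pairs per benchmark from JSON-lines records."""
    curves = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        step = record.get("step")
        if step is None:
            continue  # skip records that are not tied to a training step
        for name, key in benchmark_keys.items():
            if key in record:
                curves[name].append((step, float(record[key])))
    # Sort each curve by step so it can be plotted directly.
    return {name: sorted(points) for name, points in curves.items()}


def best_accuracy(curves):
    """Return the highest accuracy seen for each benchmark."""
    return {name: max(acc for _, acc in points) for name, points in curves.items()}


# Synthetic records for illustration only; these are not real training results.
sample_log = [
    '{"step": 100, "train_loss": 6.2}',
    '{"step": 2000, "val_prec@1": 0.55, "v2_prec@1": 0.41, "real_prec@1": 0.62}',
    '{"step": 1000, "val_prec@1": 0.31, "v2_prec@1": 0.22, "real_prec@1": 0.37}',
]

curves = parse_metrics(sample_log)
best = best_accuracy(curves)
print(best)
```

Each entry of `curves` is a step-sorted list of `(step, accuracy)` pairs, so it can be handed to a plotting library such as matplotlib to draw one line per benchmark.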

Here's one such plot, generated from the ViT S/16 checkpoints, which reach 76.23% top-1 accuracy on the ImageNet-1k validation set within 90 epochs of training:

Training was performed on a TPU v3-8 VM and took 7 hours 22 minutes to complete.

The performance of this model is also quite in line with what's reported in [1].
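As a rough sanity check on the training time quoted above, the sketch below estimates the implied throughput. It rests on assumptions that are not stated in this README: that each epoch is one full pass over the standard ImageNet-1k training split (1,281,167 images), that the whole wall-clock time was spent training (in practice it also covers evaluation, compilation, and checkpointing), and that the work is spread evenly over the 8 cores of a TPU v3-8. Treat the result as a ballpark figure, not a measured number.

```python
# Figures quoted in the text above.
EPOCHS = 90
WALL_CLOCK_SECONDS = 7 * 3600 + 22 * 60  # 7 hours 22 minutes

# Assumptions (not from the README): full ImageNet-1k training split per epoch,
# and the 8 cores of a TPU v3-8 sharing the work evenly.
IMAGES_PER_EPOCH = 1_281_167
NUM_CORES = 8


def images_per_second(epochs, images_per_epoch, seconds):
    """Average number of training images processed per second of wall-clock time."""
    return epochs * images_per_epoch / seconds


throughput = images_per_second(EPOCHS, IMAGES_PER_EPOCH, WALL_CLOCK_SECONDS)
per_core = throughput / NUM_CORES

print(f"~{throughput:.0f} images/s overall, ~{per_core:.0f} images/s per core")
```

Under these assumptions the run averages roughly 4,350 images per second overall. If the training configuration holds out part of the training split for validation, the true per-epoch image count, and therefore the throughput, would be slightly lower.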

## Checkpoints and training logs

* [`vit_s16_imagenet_1k`](https://github.com/sayakpaul/big_vision_experiments/releases/tag/v0.1.0)

## Setup

Although the `big_vision` repository provides setup instructions, I found them a bit incomplete,
so I wrote my own: [`setup.md`](https://github.com/sayakpaul/big_vision_experiments/blob/main/setup.md).

## References

[1] Better plain ViT baselines for ImageNet-1k: https://arxiv.org/abs/2205.01580

## Acknowledgements

* [TRC (TPU Research Cloud)](https://sites.research.google/trc) for providing TPU access.
* [ML-GDE program](https://developers.google.com/programs/experts/) for providing GCP credits.
