https://github.com/sayakpaul/big_vision_experiments
Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k.
- Host: GitHub
- URL: https://github.com/sayakpaul/big_vision_experiments
- Owner: sayakpaul
- License: apache-2.0
- Created: 2022-05-11T11:34:57.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-16T15:43:35.000Z (over 2 years ago)
- Last Synced: 2025-05-05T21:17:06.531Z (5 months ago)
- Topics: computer-vision, google-cloud, image-recognition, jax, large-scale-pretraining, tpu
- Language: Jupyter Notebook
- Homepage:
- Size: 240 KB
- Stars: 22
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Experiments with `big_vision`
Contains my experiments with the [`big_vision`](https://github.com/google-research/big_vision) repository to train ViTs on ImageNet-1k.
## What is `big_vision`?
From the repository:
> This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and TensorFlow Datasets for scalable input pipelines in the Cloud.
> big_vision aims to support research projects at Google. We are unlikely to work on feature requests or accept external contributions, unless they were pre-approved (ask in an issue first).
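Concretely, every training run in `big_vision` is driven by a small Python config that returns an `ml_collections.ConfigDict`, and the trainer is pointed at that config (roughly `python -m big_vision.train --config <config>.py --workdir <gs://bucket/dir>` on a TPU VM; see the setup notes below for the exact invocation). The sketch below is only illustrative: the field names are simplified stand-ins, not the exact schema of the config used for these experiments (the upstream repo ships the real ImageNet-1k ViT configs).

```python
# A rough, illustrative sketch of a big_vision-style config.
# Field names below are simplified/hypothetical; see the ImageNet-1k ViT
# configs shipped in the upstream repo for the exact schema.
import ml_collections


def get_config():
    config = ml_collections.ConfigDict()

    # Input pipeline: big_vision reads data through TFDS + tf.data.
    config.dataset = "imagenet2012"
    config.train_split = "train"

    # Model: a plain ViT-S/16.
    config.model_name = "vit"
    config.model = dict(variant="S/16", pool_type="gap", posemb="sincos2d")

    # Schedule (illustrative values only).
    config.total_epochs = 90
    config.batch_size = 1024

    return config
```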
## Why this repository?
* I really like how `big_vision` is organized into composable modules.
* I wanted to reproduce some of the ImageNet-1k results reported by the `big_vision` authors.
* `big_vision` reports scores not only for the ImageNet-1k validation set but also for
ImageNet-V2 and ImageNet-Real.
* I wanted to run the entire training on Cloud TPUs while improving my JAX skills at the same time.
* I wanted to improve my chops in large-scale pre-training, which continues to be a goldmine for
deep learning and keeps benefiting downstream applications. Programs like [TRC](https://sites.research.google/trc) provide
TPU support that makes it possible for the community to learn the nitty-gritty of large-scale pre-training.
* For the sheer joy of training models to SoTA.

This repository also contains the trained checkpoints and the training logs. Additionally,
the Colab notebook ([`notebooks/analyze-metrics.ipynb`](https://colab.research.google.com/github/sayakpaul/big_vision_experiments/blob/main/notebooks/analyze-metrics.ipynb)) takes the raw training logs and generates a plot reporting accuracies
across three benchmarks: the ImageNet-1k validation set, ImageNet-V2, and ImageNet-Real.

Here's one such plot, generated from the ViT S/16 checkpoints (which reach 76.23% top-1 accuracy on the ImageNet-1k validation set within 90 epochs of training):
*(Plot: top-1 accuracy on the ImageNet-1k validation set, ImageNet-V2, and ImageNet-Real for the ViT S/16 run.)*
Training was performed on a TPU v3-8 VM and took 7 hours 22 minutes to complete.
The performance of this model is also quite in line with what's reported in [1].
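If you want to produce a similar plot from your own runs, the sketch below shows the general idea the notebook follows: read the raw metrics log and plot one curve per benchmark. The log file name and metric keys here are assumptions for illustration only; refer to the notebook above for the parsing that matches the actual logs in this repo.

```python
# Minimal sketch: parse JSON-lines training logs and plot benchmark accuracies.
# NOTE: the file name and metric keys below are assumptions for illustration;
# adapt them to the actual log format (see notebooks/analyze-metrics.ipynb).
import json

import matplotlib.pyplot as plt

LOG_FILE = "big_vision_metrics.txt"  # hypothetical path to the raw metrics log
METRICS = ["val/prec@1", "v2/prec@1", "real/prec@1"]  # assumed metric names

records = []
with open(LOG_FILE) as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))

for metric in METRICS:
    steps = [r["step"] for r in records if metric in r]
    values = [r[metric] for r in records if metric in r]
    plt.plot(steps, values, label=metric)

plt.xlabel("Training step")
plt.ylabel("Top-1 accuracy")
plt.legend()
plt.savefig("metrics.png")
```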
## Checkpoints and training logs

* [`vit_s16_imagenet_1k`](https://github.com/sayakpaul/big_vision_experiments/releases/tag/v0.1.0)
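To poke at the released weights programmatically, a minimal sketch is below. It assumes the checkpoint is a flat `.npz` archive of parameter arrays with `/`-separated keys (a convention big_vision commonly uses); the filename and key layout are assumptions, so verify them against the actual release asset.

```python
# Sketch: load a released checkpoint, assuming it is a flat .npz archive whose
# keys are "/"-joined parameter paths. Verify against the actual release asset.
import numpy as np
from flax import traverse_util

ckpt_path = "vit_s16_imagenet_1k.npz"  # hypothetical local filename of the release asset

with np.load(ckpt_path) as data:
    flat_params = {k: np.array(v) for k, v in data.items()}

# Rebuild the nested parameter pytree expected by a Flax ViT implementation.
params = traverse_util.unflatten_dict(flat_params, sep="/")
print(list(params.keys()))
```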
## Setup
Even though the `big_vision` repository provides instructions for setting things up, I found them a bit incomplete.
Hence, I put together my own. Find it here: [`setup.md`](https://github.com/sayakpaul/big_vision_experiments/blob/main/setup.md).

## References
[1] Better plain ViT baselines for ImageNet-1k: https://arxiv.org/abs/2205.01580
## Acknowledgements
* [TRC (TPU Research Cloud)](https://sites.research.google/trc) for providing TPU access.
* [ML-GDE program](https://developers.google.com/programs/experts/) for providing GCP credits.