Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sigopt/stanford-car-classification
Classifying the Stanford Car dataset using ResNet 50
- Host: GitHub
- URL: https://github.com/sigopt/stanford-car-classification
- Owner: sigopt
- License: mit
- Archived: true
- Created: 2019-03-13T22:30:02.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-08-17T13:14:47.000Z (about 1 year ago)
- Last Synced: 2024-08-01T15:29:50.066Z (3 months ago)
- Language: Python
- Size: 31.3 KB
- Stars: 25
- Watchers: 6
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Classifying the Stanford Cars Dataset
This repository uses hyperparameter optimization to tune pretrained models from the [PyTorch model zoo](https://pytorch.org/docs/stable/torchvision/models.html) to classify images of cars in the Stanford Cars dataset.
This repository offers the option to tune only the fully connected layer of the pretrained network or to fine-tune the whole network.
The supported pretrained models are ResNet 18 and ResNet 50, both trained on ImageNet-1000.

## Getting Started
### Clone repository
```bash
git clone https://github.com/sigopt/stanford-car-classification
mkdir -p stanford-car-classification/data
```
### Download data
The Stanford Cars dataset can be found [here](https://ai.stanford.edu/~jkrause/cars/car_dataset.html).
The dataset includes:
* images of cars (car_ims.tgz)
* labels (cars_annos.mat)
* devkit including human readable labels (cars_meta.mat)

Download with `wget`:
```bash
wget http://imagenet.stanford.edu/internal/car196/car_ims.tgz -P stanford-car-classification/data
wget http://imagenet.stanford.edu/internal/car196/cars_annos.mat -P stanford-car-classification/data
wget https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz -P stanford-car-classification/data
```
or using `curl`:
```bash
curl http://imagenet.stanford.edu/internal/car196/car_ims.tgz -o stanford-car-classification/data/car_ims.tgz
curl http://imagenet.stanford.edu/internal/car196/cars_annos.mat -o stanford-car-classification/data/cars_annos.mat
curl https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz -o stanford-car-classification/data/car_devkit.tgz
```
Unzip folders:
```bash
tar -C stanford-car-classification/data -xzvf stanford-car-classification/data/car_ims.tgz
tar -C stanford-car-classification/data -xzvf stanford-car-classification/data/car_devkit.tgz
```
Keep only the meta file from the devkit:
```bash
mv ./stanford-car-classification/data/devkit/cars_meta.mat ./stanford-car-classification/data
rm -r ./stanford-car-classification/data/devkit
```
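If you want to sanity-check the download before training, the annotation files can be inspected with `scipy.io.loadmat` (requires `scipy`, installed as part of the virtual environment set up below). This is a hypothetical check, not part of the repository; the key names assume the standard Stanford Cars `.mat` layout.
```python
# Hypothetical sanity check of the downloaded annotation files (not part of this repo).
# Key names ("annotations", "class_names") assume the standard Stanford Cars .mat layout.
import scipy.io as sio

annos = sio.loadmat("stanford-car-classification/data/cars_annos.mat", squeeze_me=True)
meta = sio.loadmat("stanford-car-classification/data/cars_meta.mat", squeeze_me=True)

print(len(annos["annotations"]), "annotated images")  # expected: 16,185 images
print(len(meta["class_names"]), "class names")        # expected: 196 classes
```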
## Virtual environment set up
### Creating a new virtualenv
#### Installing pip3
MacOS:
```
brew install python3
```

Ubuntu:
```
sudo apt-get update
sudo apt-get -y install python3-pip
```
#### Installing virtualenv
```bash
pip3 install virtualenv
```
#### Set up new virtualenv
```bash
python3 -m virtualenv [PATH TO VIRTUALENV]
```
Example:
```bash
python3 -m virtualenv ./stanford-car-classification-venv
```
#### Installing requirements in the virtual environment
Use the `stanford_cars_venv_requirements.txt` file in this repository to install the requirements into your virtual environment.
```bash
source [PATH TO VIRTUALENV]/bin/activate   # e.g. ./stanford-car-classification-venv/bin/activate
pip3 install -r stanford_cars_venv_requirements.txt
```
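As a quick optional check that the environment is ready, you can confirm the core packages import cleanly. This assumes `torch`, `torchvision`, and `scipy` are among the pinned requirements, which the training and data-loading steps rely on.
```python
# Optional: run inside the activated virtualenv to confirm the core packages import.
# Assumes torch, torchvision, and scipy are pinned in stanford_cars_venv_requirements.txt.
import scipy
import torch
import torchvision

print("torch", torch.__version__, "| torchvision", torchvision.__version__, "| scipy", scipy.__version__)
```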
## Tuning Pre-trained ResNet Models
### Command Line Interface
```
python run_resnet_training_cli.py --path_images
--path_data
--path_labels
[--path_model_checkpoint ]
[--checkpoint_frequency , default: No checkpointing]
--model {ResNet18 | ResNet50}
--epochs
--validation_frequency
--number_of_classes
--data_subset
--learning_rate_scheduler
--batch_size
--weight_decay
--momentum
--learning_rate
--scheduler_rate
{--nesterov | --no-nesterov} {--freeze_weights | --no-freeze_weights}
```
To use Nesterov momentum during training, include `--nesterov`; to disable it, include `--no-nesterov`.
To train only the fully connected layer, include `--freeze_weights`; to fine-tune the whole network, include `--no-freeze_weights` (see the sketch below).
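For readers unfamiliar with what freezing means here, the sketch below shows roughly what `--freeze_weights` amounts to in PyTorch: gradients are disabled for the pretrained backbone and only a freshly initialized fully connected head (sized for the 196 car classes) is trained. This is an illustrative sketch, not the repository's actual code; see `run_resnet_training_cli.py` for the real implementation.
```python
# Illustrative sketch of --freeze_weights vs --no-freeze_weights (not the repo's actual code).
import torch.nn as nn
from torchvision import models

def build_resnet(num_classes=196, freeze_weights=True):
    model = models.resnet18(pretrained=True)   # or models.resnet50(pretrained=True)
    if freeze_weights:
        # --freeze_weights: keep the pretrained backbone fixed.
        for param in model.parameters():
            param.requires_grad = False
    # --no-freeze_weights: skip the loop above, so the whole network is updated.
    # Replace the ImageNet-1000 head with a new fully connected layer for 196 classes;
    # its parameters have requires_grad=True by default, so it is always trained.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```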
The `--data_subset` option specifies the fraction of the Stanford Cars dataset to use (`--data_subset 0.5` means 50% of the data is used); a 20% validation split is applied to the data used (see the rough illustration below).
Logs will be written to the working directory.
An output directory is also created.
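As a rough illustration of the arithmetic behind `--data_subset` and the fixed 20% validation split (not the repository's actual data-loading code):
```python
# Rough illustration of --data_subset and the 20% validation split (not the repo's code).
total_images = 16185        # images in the Stanford Cars dataset
data_subset = 0.5           # --data_subset 0.5

used = int(total_images * data_subset)   # 8092 images used
validation = int(0.2 * used)             # 1618 images held out for validation
training = used - validation             # 6474 images used for training
print(used, validation, training)
```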
#### Tuning the Fully Connected Layer Only
Example:
```
source ./stanford-car-classification-venv/bin/activate
python run_resnet_training_cli.py --path_images ./stanford-car-classification/data/ --path_data ./stanford-car-classification/data/cars_annos.mat --path_labels ./stanford-car-classification/data/cars_meta.mat --path_model_checkpoint ./stanford-car-classification --checkpoint_frequency 10 --model ResNet18 --epochs 35 --validation_frequency 10 --number_of_classes 196 --data_subset 1.0 --learning_rate_scheduler 0.2 --batch_size 6 --weight_decay -3 --momentum 0.9 --learning_rate -1 --scheduler_rate 5 --nesterov --freeze_weights
```
The above example tunes the fully connected layer of a pretrained ResNet18 with the specified SGD parameters including Nesterov.
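The example values `--learning_rate -1` and `--weight_decay -3` suggest these flags are log10 exponents. Assuming that, and assuming the scheduler flags configure a step learning-rate schedule, the training set-up corresponds roughly to the PyTorch sketch below. This is an interpretation, not the repository's code; check `run_resnet_training_cli.py` for the exact mapping.
```python
# Hedged sketch of the optimizer/scheduler the example flags appear to correspond to.
# Assumptions: --learning_rate / --weight_decay are log10 exponents, and
# --learning_rate_scheduler / --scheduler_rate configure a StepLR-style schedule.
import torch
from torchvision import models

model = models.resnet18(pretrained=True)
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=10 ** -1,             # --learning_rate -1
    momentum=0.9,            # --momentum 0.9
    weight_decay=10 ** -3,   # --weight_decay -3
    nesterov=True,           # --nesterov
)
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer,
    step_size=5,             # --scheduler_rate 5 (assumed: epochs between decays)
    gamma=0.2,               # --learning_rate_scheduler 0.2 (assumed: decay factor)
)
```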
#### Fine Tuning the Network
```
source ./stanford-car-classification-venv/bin/activate
python run_resnet_training_cli.py --path_images ./stanford-car-classification/data/ --path_data ./stanford-car-classification/data/cars_annos.mat --path_labels ./stanford-car-classification/data/cars_meta.mat --path_model_checkpoint ./stanford-car-classification --checkpoint_frequency 10 --model ResNet50 --epochs 35 --validation_frequency 10 --number_of_classes 196 --data_subset 1.0 --learning_rate_scheduler 0.2 --batch_size 6 --weight_decay -3 --momentum 0.9 --learning_rate -3 --scheduler_rate 5 --no-nesterov --no-freeze_weights
```
The above example fine tunes the whole ResNet50 architecture with the specified SGD parameters and no Nesterov.
## Hyperparameter Optimization for Tuning Pre-trained ResNet Models
The hyperparameter optimization (HPO) is a layer on top of the model tuning described above.
### Prerequisites
The HPO uses [SigOpt](https://sigopt.com/)'s [Multitask](https://app.sigopt.com/docs/overview/multitask) feature as the optimizer as well as [Orchestrate](https://app.sigopt.com/docs/orchestrate) to manage AWS clusters.
In order to be able to run the optimization, please set up your SigOpt account and walk through the [Orchestrate tutorial](https://app.sigopt.com/docs/orchestrate/tutorial/1).
At the end of the tutorial, you should have an Orchestrate-specific virtual environment, which we will use later.

### Command Line Interface
```
python run_resnet_training_cli.py
--path_images
--path_data
--path_labels
[--path_model_checkpoint ]
[--checkpoint_frequency , default: No check-pointing]
--model {ResNet18 | ResNet50}
--epochs
--validation_frequency
--number_of_classes
--data_subset
{--freeze_weights | --no-freeze_weights}
```
To run HPO on tuning only the fully connected layer, include `--freeze_weights`; to run HPO on fine-tuning the whole network, include `--no-freeze_weights`.
The `--data_subset` option specifies the fraction of the Stanford Cars dataset to use (`--data_subset 0.5` means 50% of the data is used); a 20% validation split is applied to the data used.
Logs will be written to the working directory. We suggest not checkpointing during hyperparameter optimization.
SigOpt suggests values for the following hyperparameters:
* batch size
* learning rate
* learning rate scheduler
* scheduler rate
* weight decay
* momentum
* nesterov

The bounds of these hyperparameters are specified in the Orchestrate experiment configuration file `orchestrate_stanford_cars_tuning_config.yml`.
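For readers who have not used SigOpt before, the snippet below sketches how parameter bounds like these are expressed with the SigOpt Python client. It is purely illustrative: the parameter names, types, and ranges are made up for this example, and this repository declares its actual bounds in the Orchestrate YAML config instead.
```python
# Illustrative only: how hyperparameter bounds look in the SigOpt Python client.
# This repo declares its bounds in orchestrate_stanford_cars_tuning_config.yml instead;
# the names and ranges below are hypothetical.
from sigopt import Connection

conn = Connection(client_token="YOUR_SIGOPT_API_TOKEN")
experiment = conn.experiments().create(
    name="stanford-cars-resnet-tuning (illustrative)",
    parameters=[
        dict(name="log_learning_rate", type="double", bounds=dict(min=-5, max=0)),
        dict(name="log_weight_decay", type="double", bounds=dict(min=-5, max=-1)),
        dict(name="batch_size", type="int", bounds=dict(min=4, max=32)),
        dict(name="nesterov", type="categorical", categorical_values=["True", "False"]),
    ],
    metrics=[dict(name="validation_accuracy", objective="maximize")],
    observation_budget=30,
)
print(experiment.id)
```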
Example:
```bash
source ./stanford-car-classification-venv/bin/activate
python run_resnet_training_cli.py --path_images ./stanford-car-classification/data/ --path_data ./stanford-car-classification/data/cars_annos.mat --path_labels ./stanford-car-classification/data/cars_meta.mat --model ResNet18 --epochs 2 --validation_frequency 10 --data_subset 1.0 --number_of_classes 196 --no-freeze_weights
```

### Cluster Configuration
As seen in the [Orchestrate tutorial](https://app.sigopt.com/docs/orchestrate/tutorial/1), a cluster configuration file is necessary to deploy a cluster.
The following snippet is an example cluster configuration, `orchestrate_cluster_deploy_sample.yml`, that deploys 2 p2.xlarge EC2 instances on AWS.
```yaml
# must be a .yml file
# AWS is currently our only supported provider for cluster create
provider: aws

# We have provided a name that is short and descriptive
cluster_name: stanford-cars-run-gpu-cluster

# Your cluster config can have CPU nodes, GPU nodes, or both.
# The configuration of your nodes is defined in the sections below.

# Define GPU compute here
gpu:
  # AWS GPU-enabled instance type
  # This can be any p* instance type
  instance_type: p2.xlarge
  max_nodes: 2
  min_nodes: 2
```
To see more options for AWS EC2 instances, please read through the EC2 [specification and pricing](https://aws.amazon.com/ec2/pricing/on-demand/) page.
### Orchestrate Experiment Configuration
The configuration file used to run SigOpt optimization using Orchestrate is in the repository as `orchestrate_stanford_cars_tuning_config.yml`.
Please note how the bounds for the hyperparameters are specified, as well as the framework and language to be used.
### To Run The Hyperparameter Optimization
```bash
source [PATH TO ORCHESTRATE VIRTUALENV]/bin/activate
sigopt cluster create -f cluster_deploy.yml
sigopt run -f orchestrate_stanford_cars_tuning_config.yml
```
To follow the progress of your job, check the SigOpt dashboard or use the following commands:
Status of all jobs in Orchestrate:
`sigopt status-all`
Status of a single job:
`sigopt status `
Status of a pod in the cluster:
`sigopt kubectl logs -n orchestrate`