https://github.com/nirabo/carnd-traffic-sign-classifier

Traffic Sign Classification using a CNN
Host: GitHub
URL: https://github.com/nirabo/carnd-traffic-sign-classifier
Owner: nirabo
Created: 2017-04-04T11:17:23.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2018-01-15T22:51:00.000Z (over 8 years ago)
Last Synced: 2024-06-03T12:05:47.371Z (about 2 years ago)
Language: Jupyter Notebook
Homepage:
Size: 21.3 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
# **CarND Project 2: Traffic Sign Recognition**

Author: Lyuboslav Petrov

[snow]: ./data/de_traffic_signs/1_snow.png "General Caution in Snow"

[noentry]: ./data/de_traffic_signs/2_noentry.png "No Entry"

[noentry2]: ./data/de_traffic_signs/3_noentry.png "No Entry drawing"

[roundabout]: ./data/de_traffic_signs/4_roundabout.png "Roundabout"

[limit30]: ./data/de_traffic_signs/5_limit30.png "Speed Limit 30"

[softmax]: ./doc/softmax.png "Softmax probabilities"

[real-world]: ./doc/real-world.png "Real World Accuracy"

[results]: ./doc/results.png "Training Results after 50 Epochs"

[balanced_train]: ./doc/balanced_train.png "Balanced out classes"

[balanced]: ./doc/balanced.png "Balanced out classes"

[perturbations]: ./doc/perturbations.png "Image Perturbations"

[classes]: ./doc/all_classes.png "Examples of all classes"

[distribution]: ./doc/distribution.png "Sample Size Distribution"

---

## Summary

This document outlines the work performed in analyzing the [German Traffic Sign Benchmark Dataset](http://benchmark.ini.rub.de/) from the Ruhr-University Bochum, Germany, as
part of the [Self Driving Car Engineer](https://www.udacity.com/) nanodegree from Udacity.

A convolutional neural network was trained to a validation accuracy of ~94% and a testing
accuracy of ~95%. Real-world testing with images from the internet showed results
approaching 30% accuracy.

---

## Introduction

Signs and markings are among the main characteristics of a regulated road.
It is therefore of great interest to the self-driving car research domain to
find accurate, fast and resilient algorithms for image-based road sign detection
and classification. This work details the ***classification*** aspect, assuming the
signs have already been detected.

---

## Methods

### Data Summary

The dataset consists of labeled images organized in train, test and validation sets:

  * Number of training examples: **34799**

  * Number of testing examples: **4410**

  * Image data shape: **32, 32, 3**

  * Number of classes: **43**

![][classes]

It is evident from the above figure that there is great variation in brightness,
contrast and resolution in the data. However, the targets/signs are brought to the
image foreground and populate the centre of every sample, with the majority of pixels,
in most cases, belonging to the signs.

The samples per class distribution of all the sets can be seen below:

![][distribution]

As can be seen, the distributions across the different sets are very close, but
the number of samples varies greatly between classes. It was therefore
necessary to balance out the classes by generating *surrogate* data based on the
existing dataset.

### Pre-processing Steps

Although the images have already undergone preprocessing steps (ROI cropping),
further preprocessing was seen as an efficient method for optimizing performance.

#### Class Balancing with Surrogate data

Since the sample distribution among classes was seen to vary greatly, it was
decided to augment the lower-sample-count classes with artificially created data.
Another beneficial aspect of adding perturbations to the data is that
the network becomes more robust and less likely to overfit.

The methods for creating this data were all based on perturbing the existing
samples. The perturbations chosen were:

1. Image **Rotation** by +- 6 to 9 degrees around the image centre

2. Image **Translation** by +- 3 pixels along the x and y axes

3. Image **Affine** transformation

4. Image **Perspective** warping

![][perturbations]
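
The first two perturbations in the list can be sketched with NumPy alone. The helper names (`translate`, `rotate`, `random_perturbation`), the nearest-neighbour sampling and the zero fill at the borders are assumptions made for illustration; the notebook's own implementation is not reproduced here and may well rely on a library routine instead.

```python
import numpy as np

def translate(img, dx, dy):
    """Shift an image by (dx, dy) pixels, filling exposed borders with zeros."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Source and destination windows that still overlap after the shift
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def rotate(img, angle_deg):
    """Rotate an image about its centre (nearest-neighbour sampling, zero fill)."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up the source pixel
    x_src = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    y_src = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi, yi = np.rint(x_src).astype(int), np.rint(y_src).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(img)
    out[valid] = img[yi[valid], xi[valid]]
    return out

rng = np.random.default_rng(0)

def random_perturbation(img):
    """Rotate by 6 to 9 degrees (either sign), then shift by up to 3 px per axis."""
    angle = rng.uniform(6, 9) * rng.choice([-1, 1])
    dx, dy = rng.integers(-3, 4, size=2)
    return translate(rotate(img, angle), int(dx), int(dy))
```

Calling `random_perturbation` repeatedly on one sample yields distinct surrogate images of the same shape and class. The affine and perspective warps are omitted from this sketch.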

Using these techniques for image generation and class balancing, several balancing
thresholds were tested, namely the **median**, **mean** and **max** sample counts
among the classes. The **max** threshold was chosen as final.
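
The effect of the three thresholds can be sketched in plain Python. The function name `surrogate_counts` and the toy labels are hypothetical; the sketch only counts how many surrogate images each class would need, assuming classes already at or above the threshold are left untouched.

```python
from collections import Counter
from statistics import mean, median

def surrogate_counts(labels, threshold="max"):
    """Return {class: number of surrogate samples needed to reach the threshold}."""
    counts = Counter(labels)
    target = {
        "median": median(counts.values()),
        "mean": mean(counts.values()),
        "max": max(counts.values()),
    }[threshold]
    target = int(round(target))
    # Classes at or above the target need no extra samples
    return {c: max(0, target - n) for c, n in counts.items()}

labels = [0] * 10 + [1] * 4 + [2] * 1      # toy labels: class counts 10, 4, 1
print(surrogate_counts(labels, "max"))     # {0: 0, 1: 6, 2: 9}
print(surrogate_counts(labels, "median"))  # {0: 0, 1: 0, 2: 3}
```

With the **max** threshold every class ends up with the same number of samples, at the cost of generating the most surrogate data.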

The results from class balancing can be seen below.

***NOTE:*** Images were converted to float32 and therefore their colorspace
is depicted differently by matplotlib.

![][balanced]

The resulting distribution for the training set is shown below. The total training
set size changed from 34799 to 86429, hence the surrogate data represents **~60%** of
all training data!

![][balanced_train]

#### Grayscale and Normalization

Images were then converted to grayscale and normalized to the range 0 to 1.
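
A minimal NumPy sketch of this step is given below. The exact conversion used in the notebook is not stated, so the luma weights (0.299, 0.587, 0.114), the division by 255 and the function name `to_gray_normalized` are all assumptions.

```python
import numpy as np

def to_gray_normalized(images):
    """Convert an (N, H, W, 3) uint8 batch to (N, H, W, 1) float32 in [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # assumed luma weights
    gray = images.astype(np.float32) @ weights   # weighted sum over the channel axis
    gray = np.clip(gray / 255.0, 0.0, 1.0)       # assumes 8-bit input
    return gray[..., np.newaxis]                 # keep an explicit channel axis

batch = np.zeros((2, 32, 32, 3), dtype=np.uint8)
print(to_gray_normalized(batch).shape)           # (2, 32, 32, 1)
```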

### Network Architecture

Several network architectures were iterated through. First, the LeNet convolutional network was taken and adapted to work with the traffic sign data set by changing it to 43 categories instead of 10. In the first iterations it was observed that the 3 channels of the image do not contribute towards better accuracy, so the pre-processing was extended to include not only normalization but also a colorspace conversion to grayscale. In addition, multiple filter sizes were tested with the LeNet architecture, at which point the necessity of parametrization was recognized (see below). Further, two dropout layers were added after the first two fully connected layers, which brought the accuracy to 0.8-0.9. Multiple filter depths were tested, and with filter depths of (64, 128) for the first two convolutional layers, the network reached 0.91 accuracy. A further test was made with the addition of a third convolutional layer, where final results came to ~0.95 accuracy.

Details of the layer dimensions can be found below.

In order to iterate through multiple network architectures, it is necessary
to make the network models parametric, so interdependencies between variables can
be resolved dynamically.

First, the layer dimensions are sequentially defined. Example:

    n_channels = 1  # grayscale input

    layers = {}
    # Convolutional layers: 'd' is the filter depth, 'fx'/'fy' the filter size
    layers.update({
        'c1': {
            'd': n_channels * 9,
            'fx': 5,
            'fy': 5
        }
    })
    layers.update({
        'c2': {
            'd': layers['c1']['d'] * 6,
            'fx': 5,
            'fy': 5
        }
    })
    layers.update({
        'c3': {
            'd': layers['c2']['d'] * 4,
            'fx': 5,
            'fy': 5
        }
    })
    # Fully connected layers: each input width follows from the previous layer
    layers.update({
        'f0': {
            # Resulting flat size = n_channels * 9 * 6 * 4 = 1 * 9 * 6 * 4 = 216
            'in': layers['c3']['d'],
            'out': 480
        }
    })
    layers.update({
        'f1': {
            'in': layers['f0']['out'],
            'out': 240
        }
    })
    layers.update({
        'f2': {
            'in': layers['f1']['out'],
            'out': 43
        }
    })

Next, the weight and bias objects (Python dictionaries) are constructed:

    # Convolution weights have shape (fx, fy, input depth, output depth)
    weights = {
        'wc1': tfhe((layers['c1']['fx'], layers['c1']['fy'], n_channels, layers['c1']['d'])),
        'wc2': tfhe((layers['c2']['fx'], layers['c2']['fy'], layers['c1']['d'], layers['c2']['d'])),
        'wc3': tfhe((layers['c3']['fx'], layers['c3']['fy'], layers['c2']['d'], layers['c3']['d'])),
        'wf0': tfhe((layers['f0']['in'], layers['f0']['out'])),
        'wf1': tfhe((layers['f1']['in'], layers['f1']['out'])),
        'wf2': tfhe((layers['f2']['in'], layers['f2']['out']))
    }

    # Biases start at zero, one per output channel or unit
    biases = {
        'bc1': tf.Variable(tf.zeros(layers['c1']['d'])),
        'bc2': tf.Variable(tf.zeros(layers['c2']['d'])),
        'bc3': tf.Variable(tf.zeros(layers['c3']['d'])),
        'bf0': tf.Variable(tf.zeros(layers['f0']['out'])),
        'bf1': tf.Variable(tf.zeros(layers['f1']['out'])),
        'bf2': tf.Variable(tf.zeros(layers['f2']['out']))
    }

where the *tfhe* function points to the initialization routine detailed in [1].
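
The `tfhe` helper itself is not reproduced in this document. As a rough NumPy sketch of the initialization described in [1], weights can be drawn from a zero-mean normal distribution with variance 2 / fan_in. The function name `he_normal`, the use of an untruncated normal and the NumPy (rather than TensorFlow) return type are assumptions, not the author's code.

```python
import numpy as np

def he_normal(shape, rng=None):
    """Sample weights from N(0, 2 / fan_in) for a conv or dense weight shape."""
    rng = np.random.default_rng() if rng is None else rng
    # For (fx, fy, in, out) or (in, out) shapes, fan_in is every axis but the last
    fan_in = int(np.prod(shape[:-1]))
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=shape).astype(np.float32)

w = he_normal((5, 5, 1, 9), np.random.default_rng(0))
print(w.shape)  # (5, 5, 1, 9)
```

Scaling by the fan-in keeps the variance of ReLU activations roughly constant from layer to layer, which is the motivation given in [1].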

The initial architecture chosen was the LeNet convolutional network as detailed [here](https://classroom.udacity.com/nanodegrees/nd013/parts/fbf77062-5703-404e-b60c-95b78b2f3f9e/modules/6df7ae49-c61c-4bb2-a23e-6527e69209ec/lessons/601ae704-1035-4287-8b11-e2c2716217ad/concepts/d4aca031-508f-4e0b-b493-e7b706120f81). The parameter tweaking and
performance testing showed that stacking another convolutional layer is of greater
benefit than increasing the number of parameters (i.e. depth vs width).

The final architecture chosen is three consecutive convolutional layers with average pooling,
equal strides and equal filter widths (**w_c(0,1,2) = 5x5**), and respective filter
depths (**d_c(0,1,2) = 9, 54, 216**). These are followed by three consecutive fully connected layers with respective widths (**w_fc(0,1,2) = 480, 240, 43**).

| # | Layer                 | Description                                   | Output   |
|:-:|:---------------------:|:---------------------------------------------:|:--------:|
| 1 | Input                 | Grayscale image                               | 32x32x1  |
| 2 | Convolution (5x5x9)   | 1x1 Stride, Valid Padding                     | 28x28x9  |
| 3 | ReLU                  |                                               | 28x28x9  |
| 4 | Average pooling       | 2x2 stride                                    | 14x14x9  |
| 5 | Convolution (5x5x54)  | 1x1 Stride, Valid Padding                     | 10x10x54 |
| 6 | ReLU                  |                                               | 10x10x54 |
| 7 | Average pooling       | 2x2 stride                                    | 5x5x54   |
| 8 | Convolution (5x5x216) | 1x1 Stride, Valid Padding                     | 1x1x216  |
| 9 | ReLU                  |                                               | 1x1x216  |
| 10 | Average pooling      | 2x2 stride                                    | 1x1x216  |
| 11 | Fully connected      | Flattened network (1x216)                     | 1x480    |
| 12 | Dropout              | val=0.85                                      | 1x480    |
| 13 | Fully connected      |                                               | 1x240    |
| 14 | Dropout              | val=0.85                                      | 1x240    |
| 15 | Fully connected      |                                               | 1x43     |
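
The spatial sizes in the table follow from the usual output-size arithmetic, which the short sketch below replays. The helper names are hypothetical, and SAME padding for the pooling layers is an assumption, chosen because it reproduces the 1x1 output of the last pooling layer.

```python
import math

def conv_valid(size, kernel, stride=1):
    """Output size of a convolution with VALID padding."""
    return (size - kernel) // stride + 1

def pool_same(size, stride=2):
    """Output size of a pooling layer with SAME padding (assumed)."""
    return math.ceil(size / stride)

shapes = []
size = 32  # input is 32x32x1
for depth in (9, 54, 216):
    size = conv_valid(size, kernel=5)
    shapes.append((size, depth))   # after convolution + ReLU
    size = pool_same(size)
    shapes.append((size, depth))   # after average pooling

for s, d in shapes:
    print(f"{s}x{s}x{d}")
# 28x28x9, 14x14x9, 10x10x54, 5x5x54, 1x1x216, 1x1x216
```

Because the last feature map is 1x1x216, the flattened input to the first fully connected layer has exactly 216 values.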

### Train - Validate - Test

The network was trained and optimized for **50 Epochs** with a **Batch Size of 128** using:

| # | Operation             | Description                                   | Output |
|:-:|:---------------------:|:---------------------------------------------:|:------:|
| 1 | Softmax               | Cross Entropy with Logits                     | 1x43   |
| 2 | Loss Operation        | Reduce entropy with mean                      | 1x43   |
| 3 | Optimizer             | Adam Optimizer (learning_rate = 0.0007)       | 1x43   |
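
Rows 1 and 2 of the table can be illustrated with a small NumPy sketch of a softmax cross-entropy loss reduced by its mean. The function names are hypothetical and the Adam update itself is not shown; this is not the notebook's TensorFlow graph.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-sample cross entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def mean_loss(logits, labels):
    """Scalar training loss: the mean of the per-sample cross entropies."""
    return softmax_cross_entropy(logits, labels).mean()

logits = np.zeros((128, 43))        # one batch of 128 samples, 43 classes
labels = np.zeros(128, dtype=int)
print(mean_loss(logits, labels))    # uniform predictions give ln(43), about 3.76
```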

---

## Results

### Validation Accuracy

![][results]

### Testing Accuracy

The testing accuracy achieved was in the range of 0.950-0.960.

### Real World Testing Accuracy

Testing with images downloaded from a Google image search with the keywords "German traffic signs"
resulted in an accuracy of **0.30**.

![][real-world]

The softmax probabilities for 5 randomly chosen real-world images are as follows:

![][softmax]
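
For reference, top-5 softmax probabilities of this kind can be obtained from a logit vector as in the NumPy sketch below. The function name `top_k_softmax` and the logits are made up for illustration.

```python
import numpy as np

def top_k_softmax(logits, k=5):
    """Return (class_ids, probabilities) of the k most likely classes, best first."""
    exp = np.exp(logits - np.max(logits))   # subtract the max for stability
    probs = exp / exp.sum()
    top = np.argsort(probs)[::-1][:k]
    return top, probs[top]

logits = np.array([0.1, 2.0, -1.0, 4.0, 0.5, 3.0, 1.0])  # made-up logits, 7 classes
ids, p = top_k_softmax(logits)
print(ids)  # [3 5 1 6 4]
```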

The individual images can be seen below in full size with their supporting discussion.

#### 1. General Caution in Snow

![][snow]

The top 5 probabilities are far from the correct class.

Difficulties for classification:

1. Snow! This is an image for a General Caution sign in the winter, partially covered in snow.

2. Size ratio - the sign area is much smaller than the complete image area (<< 0.5), whereas the training set had a sign to image size ratio of approx 0.5

3. Multiple signs and overlaid text

#### 2. No Entry under a high angle

![][noentry]

Difficulties for classification:

1. Sign centre is shifted towards the upper edge of the image

2. The pose of the sign relative to the camera is not favorable to the algorithm

3. Size ratio

#### 3. No Entry drawing

![][noentry2]

This image is a drawing and is, as expected, classified with a probability of 1.0.

#### 4. Roundabout

![][roundabout]

The roundabout mandatory sign is also classified with a high probability.

#### 5. Small Limit 30

![][limit30]

Problems with this image are:

1. Size ratio (sign area to image area)

## Discussion

The task at hand was successfully completed by achieving a higher accuracy (testing and validation)
than the desired 93%. Further testing with random images from the internet showed the weakness
of the model, namely that it expects the sign to be centrally located within the
image, to be of a certain resolution and to occupy the bulk of the area within the image.

Prior to data augmentation, real-world results had an accuracy < 0.1. After augmentation, the results improved significantly.

During training and testing it was noted that increasing the dropout rate above
0.5 up to 0.9 had only beneficial effects. This can be attributed to the intentionally
wide fully connected layers at the end, where a large dropout will still result in
enough nodes left to achieve a good result.
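
How many units survive in a 480-wide layer can be sketched with inverted dropout in NumPy. The architecture table lists `val=0.85` without stating whether it is a drop rate or a keep probability, so the sketch takes a keep probability explicitly and tries several values; the function name and the chosen values are assumptions.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0), mask

rng = np.random.default_rng(0)
activations = np.ones((1, 480))   # width of the first fully connected layer
for keep_prob in (0.85, 0.5, 0.15):
    _, mask = dropout(activations, keep_prob, rng)
    print(keep_prob, int(mask.sum()), "of 480 units kept")
```

On average `480 * keep_prob` units remain active, so even an aggressive setting leaves several dozen units in the first fully connected layer.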

The conversion to grayscale also proved very valuable. This operation is essentially
a pre-convolution that reduces the dimensionality of the input set, which in turn
drastically reduces the requirements for the network size with negligible information loss.

Increasing network depth was the final step following which the desired accuracy was exceeded.

## Visualizing the network state

N/A

## References

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", CoRR, 2015