https://github.com/logancyang/behavioral-cloning
Udacity SDCND: Teach a car to drive itself in a simulator by training convolutional neural networks using TensorFlow and Keras
https://github.com/logancyang/behavioral-cloning
Last synced: 10 months ago
JSON representation
Udacity SDCND: Teach a car to drive itself in a simulator by training convolutional neural networks using TensorFlow and Keras
- Host: GitHub
- URL: https://github.com/logancyang/behavioral-cloning
- Owner: logancyang
- Created: 2017-07-22T22:36:10.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-24T20:59:51.000Z (almost 9 years ago)
- Last Synced: 2025-04-02T23:29:41.673Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 437 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# **Behavioral Cloning**
---
The goals / steps of this project are the following:
* Use the simulator to collect data of good driving behavior
* Build, a convolution neural network in Keras that predicts steering angles from images
* Train and validate the model with a training and validation set
* Test that the model successfully drives around track one without leaving the road
* Summarize the results with a written report
[//]: # (Image References)
[image1]: ./examples/steering_hist.png "Distribution of Steering Angles in Training Data"
[image2]: ./examples/train_val_loss.png "Training and Validation loss"
[image3]: ./examples/left.jpg "Left camera"
[image4]: ./examples/middle.jpg "Middle camera"
[image5]: ./examples/right.jpg "Right camera"
[image6]: ./examples/flipped_middle.jpg "Flipped image"
---
### Summary of Training Data
The driving log data consists of 8036 rows, each row has 3 images recorded by 3 virtual cameras on the vehicle: center, left and right.
Here is a histogram to see the distribution of the steering angles. It is a little bit imbalanced because the vehicle drives
counter-clockwise on the track.
![alt text][image1]
The training data consists of three types of images: left, middle, right generated from the cameras on the car. The sample images are shown below.
![alt text][image3] ![alt text][image4] ![alt text][image5]
Data augmentation is used to generate more training data. The technique is to flip the image randomly and also flip the steering input. This simple technique was able to improve the performance of the model significantly.
![alt text][image4] ![alt text][image6]
### Model Architecture and Training Strategy
#### 1. An appropriate model architecture has been employed
My initial model is LeNet but with continuous output. The initial data was collected by myself driving in the simulator
for 2 laps using only keyboard. The result wasn't very good because as some fellow students pointed out, keyboard's
abrupted movements are not smooth and suitable for the model to learn from. The steering should be as smooth as possible, so
a game controller is recommended over the keyboard. But since I don't have a game controller at this time, I tried the sample
data instead. LeNet performed pretty well on the sample data. It was not bad for the straight or slightly curved lanes but when
it reached sharper curves or the bridge where the ground has a different texture, it drove the car onto the curb and got stuck.
My next attempts were to augment the data as suggested by the lecture, and adopt a well-tested architecture in the literature,
which is NVIDIA's "End to End Learning for Self-Driving Cars"
[paper] (https://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf) (model.py lines 19-60)
The model includes RELU layers to introduce nonlinearity, and the data is normalized in the model using a Keras lambda layer (code line 20).
#### 2. Attempts to reduce overfitting in the model
The model was trained and validated on different data sets to ensure that the model was not overfitting (model.py line 64-70).
The number of training samples for each epoch is 20000, and the number of validation samples is set to 6400.
The model was tested by running it through the simulator and ensuring that the vehicle could stay on the track.
#### 3. Model parameter tuning
The model used an adam optimizer, so the learning rate was not tuned manually (model.py line 61). It was trained for
4 epochs because further training didn't reduce the loss by a noticeable amount.
#### 4. Appropriate training data
Initially the data I collected was via keyboard. It was quite abrupt and jerky because it was hard to maintain a smooth input
with the keyboard. As expected, the result was also jerky. The car frequently adjusted steering angles even on straight lanes.
This is a regression task using convolutional neural networks,
hence there is an important note for training these models - "garbage in, garbage out". With the sample data and data augmentation,
I was able to improve the model output by a lot. I trained for 4 epochs since further training didn't appear to be very helpful.
The final validation loss was 0.0102.
The following diagram shows the training and validation losses in the training process over the number of epochs,
![alt text][image2]
### Model Architecture and Training Strategy
#### 1. Solution Design Approach
Following the suggestion of the lecture, I tried the LeNet model first and then moved to the NVidia model.
In order to gauge how well the model was working, I implemented the image batch generator to populate the training and
validation sets. I found that my first model had a low mse loss on the training set but a high loss on the validation set.
This implied that the model was overfitting. So I added data augmentation to generate more training samples. Since the car
drives counter-clockwise on the track, the left and right steering data are not balanced. So I augmented the image data by flipping them
to horizontally symmetric ones randomly at a probability of 0.5. I also cropped and resized the images to only focus on
the road instead of the irrelevant parts.
The final step was to run the simulator to see how well the car was driving around track one. Without data augmentation, the car tended
to steer to the right more because of the imbalance left and right steering angles in the data. With random flipping, the car drove significantly
better.
On the model side, the NVidia paper provided a powerful solution to this problem. The paper described the architecture as follows.
The network consists of 9 layers, including a normalization layer, 5 convolutional layers and 3 fully connected layers.
The first layer of the network performs image normalization. Performing normalization in the network allows the
normalization scheme to be altered with the network architecture and to be accelerated via GPU processing.
The convolutional layers were designed to perform feature extraction and were chosen empirically through a series of
experiments that varied layer configurations. The model used strided convolutions in the first three convolutional layers with a
2×2 stride and a 5×5 kernel and a non-strided convolution with a 3×3 kernel size in the last two convolutional layers.
It follows the five convolutional layers with three fully connected layers leading to an output control value which is
the inverse turning radius. The fully connected layers are designed to function as a controller for steering.
The final model appeared to be easy to train and effective. At the end of the process,
the vehicle is able to drive autonomously around the track without leaving the road.
#### 2. Final Model Architecture
The architecture is shown below.
```
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
lambda_1 (Lambda) (None, 64, 64, 3) 0 lambda_input_1[0][0]
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D) (None, 32, 32, 24) 1824 lambda_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 32, 32, 24) 0 convolution2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D) (None, 31, 31, 24) 0 activation_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 16, 16, 36) 21636 maxpooling2d_1[0][0]
____________________________________________________________________________________________________
activation_2 (Activation) (None, 16, 16, 36) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D) (None, 15, 15, 36) 0 activation_2[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 8, 8, 48) 43248 maxpooling2d_2[0][0]
____________________________________________________________________________________________________
activation_3 (Activation) (None, 8, 8, 48) 0 convolution2d_3[0][0]
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D) (None, 7, 7, 48) 0 activation_3[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 7, 7, 64) 27712 maxpooling2d_3[0][0]
____________________________________________________________________________________________________
activation_4 (Activation) (None, 7, 7, 64) 0 convolution2d_4[0][0]
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D) (None, 6, 6, 64) 0 activation_4[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 6, 6, 64) 36928 maxpooling2d_4[0][0]
____________________________________________________________________________________________________
activation_5 (Activation) (None, 6, 6, 64) 0 convolution2d_5[0][0]
____________________________________________________________________________________________________
maxpooling2d_5 (MaxPooling2D) (None, 5, 5, 64) 0 activation_5[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten) (None, 1600) 0 maxpooling2d_5[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 1164) 1863564 flatten_1[0][0]
____________________________________________________________________________________________________
activation_6 (Activation) (None, 1164) 0 dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 100) 116500 activation_6[0][0]
____________________________________________________________________________________________________
activation_7 (Activation) (None, 100) 0 dense_2[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 50) 5050 activation_7[0][0]
____________________________________________________________________________________________________
activation_8 (Activation) (None, 50) 0 dense_3[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 10) 510 activation_8[0][0]
____________________________________________________________________________________________________
activation_9 (Activation) (None, 10) 0 dense_4[0][0]
____________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 11 activation_9[0][0]
====================================================================================================
Total params: 2,116,983
Trainable params: 2,116,983
Non-trainable params: 0
```
#### 3. End Result
I recorded the final result in autonomous mode into a mp4 file and uploaded it
[here](https://www.youtube.com/watch?v=pDdN28Bdm-o&feature=youtu.be).
#### References
- [Nvidia: End to End Learning for Self-Driving Cars](https://arxiv.org/abs/1604.07316)
- [Must Know Tips/Tricks in Deep Neural Networks](http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html)
- [An overview of gradient descent optimization algorithms](http://ruder.io/optimizing-gradient-descent/index.html)
- [Striving for Simplicity: The All Convolutional Net](https://arxiv.org/abs/1412.6806)
- [Spatial Dropout](https://faroit.github.io/keras-docs/1.1.1/layers/core/#spatialdropout2d)