# Behavior-Clone
## Data Gathering
I mainly used Udacity's data, but I also collected my own data and used other people's data as well.
Here are the distributions of the different data sets I used:

I used the new simulator with a mouse, so I have smooth angles. I drove many rounds in both the forward and backward directions. My data has about 20,000 samples.

Udacity's data has about 8,000 samples.

This data set has about 20,000 samples.
Udacity's data and this other data set were collected with the old simulator, so their steering-angle distributions are more zero-centered.
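For reference, here is a minimal sketch (not part of this repo) of how these steering-angle distributions can be inspected, assuming the standard simulator `driving_log.csv` with a header row and a `steering` column:
```
import pandas as pd
import matplotlib.pyplot as plt

# Assumes a header row with a 'steering' column, as in the standard simulator log.
log = pd.read_csv('data/driving_log.csv')
plt.hist(log['steering'], bins=50)
plt.xlabel('steering angle')
plt.ylabel('count')
plt.show()
```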
I also recorded many edge-recovery runs and collected some recovery data from other people, but during my experiments I found that I didn't need much recovery data. So I just used the recovery data from [Somnath Banerjee](https://github.com/cssomnath/udacity-sdc/tree/master/carnd-projects/CarND-Behavioral-Cloning). I also considered some of the ideas from his approach in this [post](https://medium.com/@somnath.banerjee/behavioral-cloning-project-of-self-driving-car-nano-degree-9381aaa4da13).
## Architecture
I tried many different networks, from big networks such as VGG16 plus 2 fully connected layers with 2048 nodes each, down to small networks with only 2 convolutions and 2 fully connected layers.
One observation is that a big network such as VGG requires much more data to train, and once trained, it overfits very easily. I used VGG16 with ImageNet weights and tried 2 approaches: the first froze the VGG weights and trained only the dense layers' weights; the second trained all layers. Both produced good results, but they took too long to train and were hard to tune.
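For reference, the frozen-weights variant might be set up roughly like this (a hypothetical sketch using the `keras.applications` VGG16 model, not the code actually used in this project):
```
from keras.applications.vgg16 import VGG16
from keras.layers import Input, Flatten, Dense
from keras.models import Model

# Hypothetical sketch: VGG16 convolutional base with ImageNet weights, frozen,
# plus two 2048-node dense layers and a single steering-angle output.
inp = Input(shape=(66, 200, 3))
base = VGG16(weights='imagenet', include_top=False, input_tensor=inp)
for layer in base.layers:
    layer.trainable = False  # first approach: freeze the VGG weights
x = Flatten()(base.output)
x = Dense(2048, activation='relu')(x)
x = Dense(2048, activation='relu')(x)
out = Dense(1)(x)
model = Model(input=inp, output=out)
model.compile(optimizer='adam', loss='mean_squared_error')
```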
A small network is much easier to train, but it's harder to get good results, and once I added recovery data its behavior became erratic. This made sense: it's very hard for a human to drive consistently across different data-gathering attempts, so more data introduces more entropy into the system, and a small network has a hard time finding patterns in such chaos.
Therefore, I used the NVIDIA architecture. It is powerful enough to learn many patterns and at the same time small enough to be trained in a short time.
I tried different variations of the NVIDIA architecture but didn't find much difference, so for this report I used the original NVIDIA architecture.
The model looks like this
```
from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dropout, Dense
from keras.optimizers import Adam

learning_rate = 0.0001
dropout = 0.5

model = Sequential()
# Convolutional layers from the NVIDIA architecture: strided (subsampled) 5x5
# convolutions instead of pooling, followed by two 3x3 convolutions.
model.add(Convolution2D(24, 5, 5, subsample=(2, 2), input_shape=(66, 200, 3),
                        activation='relu'))
model.add(Convolution2D(36, 5, 5, subsample=(2, 2), activation='relu'))
model.add(Convolution2D(48, 5, 5, subsample=(2, 2), activation='relu'))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(Convolution2D(64, 3, 3, activation='relu'))
# Fully connected layers with dropout to reduce overfitting; the last layer
# outputs a single steering angle.
model.add(Flatten())
model.add(Dropout(dropout))
model.add(Dense(1164, activation='relu'))
model.add(Dropout(dropout))
model.add(Dense(100, activation='relu'))
model.add(Dropout(dropout))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
opt = Adam(lr=learning_rate)
model.compile(optimizer=opt, loss='mean_squared_error')
```
As you can see, unlike VGG, this model uses strided convolutions (subsampling) instead of pooling. The dense layers are standard fully connected layers. I added 3 dropout layers to reduce the chance of overfitting.
For the activation function, I tried ELU and ReLU, but the performance was almost the same. Since ReLU is cheaper to compute, I just used ReLU.
The remaining sections of this report are all based on this architecture.
## Method
I set up 4 comparison groups:
- Model 1: trained with center images from Udacity's data only
- Model 2: trained with center images from Udacity's data and recovery data
- Model 3: trained with center, left, right images from Udacity's data only
- Model 4: trained with center, left, right images from Udacity's data and recovery data
During my actual experiments, I had many more comparison groups, such as different learning rates, different optimizers, and different data preprocessing methods, but it would take too many pages to cover all of them.
For the above 4 comparison groups, I used the Adam optimizer with a 0.0001 learning rate and no exponential decay. I also tried Adadelta, but it took much longer to converge.
### Data/Image preprocessing
I used all 3 channels. Only two preprocessing steps are done.
__The first step__ is to crop away the upper part of the image, because the upper part contains useless information.

This is an original image. As you can see, we probably don't need the sky and environment. We only need the road.

This is the cropped image. Only the road is left.

Then the image is resized to 200 (width) by 66 (height) for the model to consume. I tried bilinear, bicubic and Lanczos downsampling. The choice of downsampling method doesn't influence the final model very much, at least not observably to me. In this report, plain bilinear is used.
__The second step__ is to normalize the image. I computed the mean and standard deviation of each image, then subtracted the mean and divided by the standard deviation, as shown in the code:
```
import numpy as np
from scipy.misc import imresize

def normalize(img):
    # Zero-center each image and scale by its own standard deviation.
    mean = np.mean(img, axis=(0, 1))
    std = np.std(img, axis=(0, 1))
    return (img - mean) / std

def preprocess_image(img):
    # Crop away the sky/environment, then resize to 66 x 200 (height x width)
    # to match the model's input_shape=(66, 200, 3); imresize takes (height, width).
    img = imresize(img[60:150, :, :], (66, 200))
    return normalize(img)
```
I also tried a simple linear interpolation which maps values in [0, 255] to [-1.0, 1.0] and/or [-0.5, 0.5].
This submission used `(img - mean) / std`.
The choice of normalization doesn't influence the end result very much, as long as the normalized matrix (image) is zero-centered and roughly in the range [-1, 1]. In practice, linear interpolation should be a better choice: it is a fixed, cheap per-pixel mapping, whereas the mean/std method has to compute per-image statistics at test time as well.
In the case of `drive.py`, this extra preprocessing during real-time driving delays the system's response time.
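For comparison, here is a minimal sketch of the linear-interpolation alternative (an illustrative helper, not the normalization used in this submission):
```
import numpy as np

def normalize_linear(img):
    # Map pixel values from [0, 255] to [-1.0, 1.0] with a fixed linear transform.
    return img.astype(np.float32) / 127.5 - 1.0
```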
### Training
The training is simple: I created 4 models and trained them with different configurations. The validation split is 10% for all cases. I only trained for 30 epochs for this report because I noticed that more training started to hurt performance (overfitting). The angle offset is 0.25; I noticed that the offset can be anywhere between 0.2 and 0.25 with no big difference.
```
# Center images only, without and with sharp-turn data.
create_model('nvidia_center.h5')
train('nvidia_center.h5', batch_size=512, nb_epoch=30, valid_split=0.1, center_only=True, angle=0.25, sharp_turn=False)
create_model('nvidia_center_sharp_turn.h5')
train('nvidia_center_sharp_turn.h5', batch_size=512, nb_epoch=30, valid_split=0.1, center_only=True, angle=0.25, sharp_turn=True)

# Center, left and right images, with and without sharp-turn data.
create_model('nvidia_cfr25_sharp_turn.h5')
train('nvidia_cfr25_sharp_turn.h5', batch_size=512, nb_epoch=30, valid_split=0.1, center_only=False, angle=0.25, sharp_turn=True)
create_model('nvidia_cfr25.h5')
train('nvidia_cfr25.h5', batch_size=512, nb_epoch=30, valid_split=0.1, center_only=False, angle=0.25, sharp_turn=False)
```
You can look at `model.py` for more details.
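As a rough illustration of how the left/right images and the 0.25 angle offset can be combined (a hypothetical sketch; the actual logic lives in `model.py` and may differ):
```
import csv

def load_samples(log_path='data/driving_log.csv', angle_offset=0.25):
    # Assumes the standard simulator log columns:
    # center, left, right, steering, throttle, brake, speed.
    samples = []
    with open(log_path) as f:
        for row in csv.reader(f):
            try:
                angle = float(row[3])
            except ValueError:
                continue  # skip a header row if present
            samples.append((row[0].strip(), angle))                 # center image
            samples.append((row[1].strip(), angle + angle_offset))  # left image: steer back right
            samples.append((row[2].strip(), angle - angle_offset))  # right image: steer back left
    return samples
```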
## Result
### Training History
I used 10% of the center images as validation and trained for 30 epochs for all 4 experiments. I didn't use a fixed random seed for shuffling and I shuffled the training set before each epoch. Therefore, the performance across different runs varied.

I first trained the network with Udacity center images only. Surprisingly, the validation loss was almost parallel to the training loss and about 0.0002 lower. The validation loss got as low as 0.00075. I tried training for more epochs, but that introduced overfitting.
The car was able to drive across the bridge and made the first sharp turn after the bridge. But it was not able to make the second sharp turn.

Using center, left and right images, we still got a loss under 0.001, but we saw a small amount of overfitting. The car was not able to drive across the bridge because it swerved left and right too much.

With center images only but sharp-turn data added, the result is about the same as with center-only data and no sharp-turn data. Although we saw much worse overfitting, the driving performance was not hurt: the car was able to drive across the bridge and make the first sharp turn, and the second sharp turn was half successful. It did not have enough "knowledge" to make sharp turns.
For the last configuration (center, left and right images plus sharp-turn data), the training loss was higher, but the car actually performed much better. It was able to drive track 1 successfully and could drive most of track 2, only failing at the very end.
I included the videos for track 1 and track 2 for this experiment.
### Video
[Track 1](https://www.youtube.com/watch?v=F70QZbMNi1I)
[Track 2](https://www.youtube.com/watch?v=ukBT-UFJVhE)
## Discussion and Future Work
As you can see, with Udacity's data alone, I am able to train a generalized regressor, though it still fails in some edge cases. I didn't use random lighting/color adjustment or shifting/rotation to augment the data, but I believe that with more data and data augmentation, the model is powerful enough to learn more patterns.
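The kind of augmentation mentioned above could look like this rough sketch (random brightness plus a horizontal shift with a steering correction); it is illustrative only, was not used in this submission, and the 0.002-per-pixel correction is an assumed value:
```
import numpy as np
import cv2

def augment(img, angle):
    # img is assumed to be an RGB uint8 array; angle is the steering label.
    # Random brightness change in HSV space.
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    v = hsv[:, :, 2].astype(np.float32) * np.random.uniform(0.5, 1.2)
    hsv[:, :, 2] = np.clip(v, 0, 255).astype(np.uint8)
    img = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
    # Random horizontal shift, compensated with an assumed 0.002 angle per pixel.
    shift = np.random.uniform(-40, 40)
    M = np.float32([[1, 0, shift], [0, 1, 0]])
    img = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    return img, angle + shift * 0.002
```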
Also, from a self-driving-car point of view, the behavior-cloning approach has a fatal problem: it is only as good as its driver, and human driving is not consistent or stable. Cloning human driving behavior may not be a good way to produce a good driving agent; it's a way to produce a human-like driving agent. However, the goal of a self-driving car is to eliminate the dangerous elements of human driving and replace them with well-designed, engineered, safe and reliable driving.
## Code
Assuming you have your `driving_log.csv` in the `./data` directory, simply run
```
python model.py
```
to build and train the model.
Run
```
python analytic.py
```
to see the analysis.