## ENAS-Tensorflow

I will explain the code of Efficient Neural Architecture Search (ENAS), especially the micro search case.

Unlike the authors' code, this code works in a Windows 10 environment, and you can use PNG files as datasets.

You can also apply data augmentation using "n_aug_img", which is explained below.

## Environment
- OS: Windows 10 (Ubuntu 16.04 also works)

- GPU / RAM: 1080 Ti / 32 GB

- Python 3.5

- TensorFlow-GPU version: 1.4.0rc2

- OpenCV 3.4.1

## How to run

**First, unpack the attached data as shown below.**

![Picture 1](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/unpack.PNG)

**Next, you should change the flags below to suit your situation.**

```python
DEFINE_string("output_dir", "./output", "")
DEFINE_string("train_data_dir", "./data/train", "")
DEFINE_string("val_data_dir", "./data/valid", "")
DEFINE_string("test_data_dir", "./data/test", "")
DEFINE_integer("channel", 1, "MNIST: 1, Cifar10: 3")
DEFINE_integer("img_size", 32, "enlarge image size")
DEFINE_integer("n_aug_img", 1, "if 2: num_img: 55000 -> aug_img: 110000, elif 1: False")
```
It is recommended to set "n_aug_img" to 1 when searching for the child network, and to 2~4 when training the found child network.
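These DEFINE_* helpers are not TensorFlow built-ins; here is a minimal sketch of how they are presumably defined, assuming thin wrappers around tf.app.flags (an assumption; check the repo's flag utilities for the real definitions):

```python
# Presumed definitions of the DEFINE_* helpers used above: thin wrappers
# around tf.app.flags (TF 1.x). This is an assumption, not repo code.
import tensorflow as tf

def DEFINE_string(name, default_value, doc_string):
    tf.app.flags.DEFINE_string(name, default_value, doc_string)

def DEFINE_integer(name, default_value, doc_string):
    tf.app.flags.DEFINE_integer(name, default_value, doc_string)

FLAGS = tf.app.flags.FLAGS  # e.g. FLAGS.n_aug_img is available after parsing
```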

**Then, you can train the ENAS controller with the following short command:**
```
python main_controller_child_trainer.py
```
**After it finishes, you can train the child network with the following commands:**

```
# Case of MNIST
python main_child_trainer.py --child_fixed_arc "1 2 1 3 0 1 0 4 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 4 1 0 2 0 0 3 1 1 0 0 0 0 4 1 1 0"
```

```
# Case of CIFAR 10
python main_child_trainer.py --child_fixed_arc "1 0 1 1 1 1 0 0 1 1 0 0 0 3 0 3 1 3 1 1 1 1 0 3 0 3 0 3 1 3 0 1 1 3 0 2 0 3 1 0"
```

```
# Case of Welding Defects
python main_child_trainer.py --child_fixed_arc "1 0 0 1 0 0 1 1 2 2 1 1 1 1 1 2 1 0 0 0 0 0 0 3 2 2 1 0 2 0 2 3 0 3 4 0 1 0 3 2"
```

A string like "1 2 1 3 0 1 ~" in the commands above is the output of main_controller_child_trainer.py.

The first 20 numbers encode the convolution cell, and the remaining 20 encode the reduction cell, as sketched below.
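For concreteness, here is a hypothetical decoder (not part of the repo) for such a string, assuming each node contributes four numbers (input index, op, input index, op) and the five candidate operations listed in _enas_layers below:

```python
# Hypothetical decoder for a child_fixed_arc string. Assumes each of the 5
# nodes per cell contributes 4 numbers (input index, op, input index, op),
# and ops 0-4 map to the candidates listed in _enas_layers below.
OPS = ["conv 3x3", "conv 5x5", "avg pool", "max pool", "identity"]

def decode_arc(arc_str):
    nums = [int(n) for n in arc_str.split()]
    for name, cell in [("Convolution cell", nums[:20]),
                       ("Reduction cell", nums[20:])]:
        print(name)
        for node, i in enumerate(range(0, len(cell), 4)):
            idx_x, op_x, idx_y, op_y = cell[i:i + 4]
            print("  node %d: (input %d, %s) + (input %d, %s)"
                  % (node + 2, idx_x, OPS[op_x], idx_y, OPS[op_y]))

decode_arc("1 2 1 3 0 1 0 4 1 1 1 1 0 1 0 1 1 0 0 1 "
           "0 1 0 4 1 0 2 0 0 3 1 1 0 0 0 0 4 1 1 0")
```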

## Result

### 1. ENAS cells discovered in the micro search space

After training, we obtained the following child_arc_seq for each dataset and visualized it as shown below.

#### MNIST

```
"1 2 1 3 0 1 0 4 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 4 1 0 2 0 0 3 1 1 0 0 0 0 4 1 1 0"
```


![Picture 2](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/MNIST_convCell.png)


![Picture 3](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/MNIST_Reduction_cell.png)

#### CIFAR 10

```
"1 0 1 1 1 1 0 0 1 1 0 0 0 3 0 3 1 3 1 1 1 1 0 3 0 3 0 3 1 3 0 1 1 3 0 2 0 3 1 0"
```


![Picture 2](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/CIFAR10_Convolution_cell.png)


![Picture 3](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/CIFAR10_Reduction_cell.png)

#### Welding Defects

```
"1 0 0 1 0 0 1 1 2 2 1 1 1 1 1 2 1 0 0 0 0 0 0 3 2 2 1 0 2 0 2 3 0 3 4 0 1 0 3 2"
```


![Picture 2](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Welding_Defects_Convolutional_Cell.png)


![Picture 3](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Welding_Defects_Reduction_Cell.png)

### 2. Final structure of the child network

#### MNIST

![Picture 4](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/MNIST_final_network.png)

#### CIFAR 10

![Picture 4](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/CIFAR10_final_network.png)

#### Welding Defects

![Picture 4](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Welding_final_network.png)

### 3. Test Accuracy

```
MNIST
Test Accuracy: 99.77%
```

```
CIFAR 10
Test Accuracy:
```

```
Welding Defects
Test Accuracy: 100.00%
```

### 4. Graphs

- Controller validation accuracy (reward)
- Child network loss & test accuracy for the MNIST dataset
- Child network loss & test accuracy for the Welding Defects dataset

## Explanation

### 1. Controller

First, we will build the sampler as shown in the picture below.


![Picture 5](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Controller_init.png)


Then we will build the controller using the sampler's outputs "next_c_1, next_h_1".


![Picture 6](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Controller.PNG)


After obtaining "next_c_5, next_h_5", you must do the following to update "Anchors" and "Anchors_w_1".


![Picture 7](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Anchors_appen.PNG)

### 2. Controller_Loss

To enable the Controller to make better networks, ENAS uses REINFORCE with a moving average baseline to reduce variance.

```python
# Pseudocode: accumulate log-probabilities and entropies over all sampled
# input indices and operations ("for all index" / "for all op_id" are schematic).
for all index:
    curr_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=index)
    log_prob += curr_log_prob
    curr_ent = tf.stop_gradient(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits, labels=tf.nn.softmax(logits)))
    entropy += curr_ent

for all op_id:
    curr_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=op_id)
    log_prob += curr_log_prob
    curr_ent = tf.stop_gradient(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits, labels=tf.nn.softmax(logits)))
    entropy += curr_ent

# One sampler pass per cell type; the second pass reuses the LSTM state.
arc_seq_1, entropy_1, log_prob_1, c, h = self._build_sampler(use_bias=True)  # convolution cell
arc_seq_2, entropy_2, log_prob_2, _, _ = self._build_sampler(prev_c=c, prev_h=h)  # reduction cell
self.sample_entropy = entropy_1 + entropy_2
self.sample_log_prob = log_prob_1 + log_prob_2
```

```python
# Reward = shuffled-validation accuracy, optionally plus an entropy bonus;
# the baseline is an exponential moving average of past rewards.
self.valid_acc = (tf.to_float(child_model.valid_shuffle_acc) /
                  tf.to_float(child_model.batch_size))
self.reward = self.valid_acc

if self.entropy_weight is not None:
    self.reward += self.entropy_weight * self.sample_entropy

self.sample_log_prob = tf.reduce_sum(self.sample_log_prob)
self.baseline = tf.Variable(0.0, dtype=tf.float32, trainable=False)
baseline_update = tf.assign_sub(
    self.baseline, (1 - self.bl_dec) * (self.baseline - self.reward))

with tf.control_dependencies([baseline_update]):
    self.reward = tf.identity(self.reward)

# REINFORCE loss: log-probability of the sampled architecture,
# scaled by the advantage (reward minus baseline).
self.loss = self.sample_log_prob * (self.reward - self.baseline)
```
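The tf.assign_sub update above makes the baseline an exponential moving average of past rewards. A toy sketch in plain Python, with hypothetical reward values, shows how the baseline tracks the reward and shrinks the advantage:

```python
# Toy illustration of the moving-average baseline above:
# baseline -= (1 - bl_dec) * (baseline - reward)  is the same as
# baseline  = bl_dec * baseline + (1 - bl_dec) * reward.
def update_baseline(baseline, reward, bl_dec=0.99):
    return bl_dec * baseline + (1 - bl_dec) * reward

baseline = 0.0
for reward in [0.50, 0.55, 0.60]:  # hypothetical validation accuracies
    baseline = update_baseline(baseline, reward)
    print("baseline=%.4f  advantage=%.4f" % (baseline, reward - baseline))
```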

### 3. Child Network

(1) Schematic of Child Network


![Picture 8](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Schematic_child_network.png)

(2) _enas_layers

```python
def _enas_layers(self, layer_id, prev_layers, arc, out_filters):
    '''
    prev_layers: the previous two layers, each of shape [None, H, W, C].
    arc: e.g. "0 1 0 1 0 3 0 0 2 2 0 2 1 0 0 1 1 3 0 1 1 1 0 1 0 1 2 1 0 0 0 0 0 0 1 3 1 1 0 1"

    Each node picks two inputs and applies one of five candidate operations to each:
    out = [self._enas_conv(x, curr_cell, prev_cell, 3, out_filters),
           self._enas_conv(x, curr_cell, prev_cell, 5, out_filters),
           avg_pool,
           max_pool,
           x]
    '''
    return output  # computed from arc; np.shape(output) = [None, H, W, out_filters]
    # If child_fixed_arc is not None, np.shape(output) = [None, H, W, n*out_filters],
    # where n is the number of unused nodes in the convolution or reduction cell.
```

(3) factorized_reduction

```python
def factorized_reduction(self, x, out_filters, stride=2, is_training=True):
    '''
    x: the last previous layer's output.
    out_filters: 2 * (previous layer's channels)
    '''
    stride_spec = self._get_strides(stride)  # [1, 2, 2, 1]

    # Skip path 1: strided 1x1 average pooling, then a 1x1 conv to
    # half of out_filters.
    path1 = tf.nn.avg_pool(x, [1, 1, 1, 1], stride_spec, "VALID",
                           data_format=self.data_format)
    with tf.variable_scope("path1_conv"):
        inp_c = self._get_C(path1)
        w = create_weight("w", [1, 1, inp_c, out_filters // 2])
        path1 = tf.nn.conv2d(path1, w, [1, 1, 1, 1], "VALID",
                             data_format=self.data_format)

    # Skip path 2: first pad with 0s on the right and bottom, then shift
    # the input by one pixel so the strided pooling samples the
    # complementary positions.
    if self.data_format == "NHWC":
        pad_arr = [[0, 0], [0, 1], [0, 1], [0, 0]]
        path2 = tf.pad(x, pad_arr)[:, 1:, 1:, :]
        concat_axis = 3
    else:
        pad_arr = [[0, 0], [0, 0], [0, 1], [0, 1]]
        path2 = tf.pad(x, pad_arr)[:, :, 1:, 1:]
        concat_axis = 1

    path2 = tf.nn.avg_pool(path2, [1, 1, 1, 1], stride_spec, "VALID",
                           data_format=self.data_format)
    with tf.variable_scope("path2_conv"):
        inp_c = self._get_C(path2)
        w = create_weight("w", [1, 1, inp_c, out_filters // 2])
        path2 = tf.nn.conv2d(path2, w, [1, 1, 1, 1], "VALID",
                             data_format=self.data_format)

    # Concatenate the two paths along the channel axis and apply BN.
    final_path = tf.concat(values=[path1, path2], axis=concat_axis)
    final_path = batch_norm(final_path, is_training,
                            data_format=self.data_format)

    return final_path
```
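Why two paths? The one-pixel shift in path 2 lets the stride-2 pooling sample exactly the positions that path 1 skips, so the reduction discards no input pixels. A toy illustration in plain Python:

```python
# Toy illustration of the two-path trick in factorized_reduction: a stride-2
# sampler over the original input plus the same sampler over the one-pixel-
# shifted input together cover every position.
row = list(range(8))                 # pixel positions 0..7
path1 = row[::2]                     # stride 2 on the original: [0, 2, 4, 6]
path2 = row[1:][::2]                 # shift by one, then stride 2: [1, 3, 5, 7]
print(sorted(path1 + path2) == row)  # True: all positions covered
```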

(4) _maybe_calibrate_size

```python
def _maybe_calibrate_size(self, layers, out_filters, is_training):
    """Makes sure layers[0] and layers[1] have the same shapes."""
    hw = [self._get_HW(layer) for layer in layers]
    c = [self._get_C(layer) for layer in layers]

    with tf.variable_scope("calibrate"):
        x = layers[0]
        if hw[0] != hw[1]:
            # layers[0] is one reduction behind: halve its spatial size.
            assert hw[0] == 2 * hw[1]
            with tf.variable_scope("pool_x"):
                x = tf.nn.relu(x)
                x = self._factorized_reduction(x, out_filters, 2, is_training)
        elif c[0] != out_filters:
            # Only the channel count differs: fix it with a 1x1 conv.
            with tf.variable_scope("pool_x"):
                w = create_weight("w", [1, 1, c[0], out_filters])
                x = tf.nn.relu(x)
                x = tf.nn.conv2d(x, w, [1, 1, 1, 1], "SAME",
                                 data_format=self.data_format)
                x = batch_norm(x, is_training, data_format=self.data_format)

        y = layers[1]
        if c[1] != out_filters:
            with tf.variable_scope("pool_y"):
                w = create_weight("w", [1, 1, c[1], out_filters])
                y = tf.nn.relu(y)
                y = tf.nn.conv2d(y, w, [1, 1, 1, 1], "SAME",
                                 data_format=self.data_format)
                y = batch_norm(y, is_training, data_format=self.data_format)
    return [x, y]
```
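A toy trace of the branching above, with hypothetical shapes (hw = spatial size, c = channels):

```python
# Toy trace of _maybe_calibrate_size's branching, given the previous two
# layers' spatial sizes (hw) and channel counts (c). Hypothetical numbers.
def calibrate_plan(hw, c, out_filters):
    if hw[0] != hw[1]:
        plan_x = "factorized_reduction"   # halve layers[0]'s spatial size
    elif c[0] != out_filters:
        plan_x = "1x1 conv"               # fix the channel count only
    else:
        plan_x = "identity"
    plan_y = "1x1 conv" if c[1] != out_filters else "identity"
    return plan_x, plan_y

print(calibrate_plan(hw=[32, 16], c=[16, 32], out_filters=32))
# ('factorized_reduction', 'identity')
```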

(5) Others

You can see more details of the child network in

### 4. Summary of learning mechanism

```
1. Train the child network for 1 epoch (Momentum optimizer).
※ 1 epoch = (total data size / batch size) parameter updates.

2. Train the controller 'FLAGS.controller_train_steps x FLAGS.controller_num_aggregate' times (Adam optimizer).

3. Repeat "1" and "2" as many times as we want (160 epochs).

4. Choose the child network architecture with the highest validation accuracy.
```

```
1. Train the child network selected above for as long as we want (Momentum optimizer, 660 epochs).
```
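A quick sanity check of the update counts above, with hypothetical MNIST-style numbers and assumed controller settings:

```python
# Hypothetical numbers: 55,000 training images, batch size 100, and assumed
# controller flag values. Only the arithmetic is the point.
total_data, batch_size = 55000, 100
updates_per_child_epoch = total_data // batch_size
print(updates_per_child_epoch)  # 550 parameter updates per child epoch

controller_train_steps, controller_num_aggregate = 50, 20  # assumed values
print(controller_train_steps * controller_num_aggregate)   # 1000 controller updates per round
```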

## Augmentation

### 1. Code

```python
def aug(image, idx):
    # Map idx to an augmentation; lambdas keep the operations lazy, so only
    # the selected augmentation is computed.
    augmentation_dic = {0: lambda img: enlarge(img, 1.2),
                        1: rotation,
                        2: random_bright_contrast,
                        3: gaussian_noise,
                        4: Flip}
    image = augmentation_dic[idx](image)
    return image
```

The functions enlarge, rotation, random_bright_contrast, gaussian_noise, and Flip are written using cv2.
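For illustration, here are hedged sketches of two of these augmentations; the repo's actual parameters and implementations may differ:

```python
# Hedged sketches of two augmentations using OpenCV. Assumes images are
# HxWxC numpy arrays; the repo's exact implementations may differ.
import cv2
import numpy as np

def Flip(image):
    # Horizontal flip; flipCode=1 flips around the vertical axis.
    return cv2.flip(image, 1)

def rotation(image, max_deg=15):
    # Rotate by a random angle about the image center (max_deg is a guess).
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_deg, max_deg)
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(image, M, (w, h))
```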

In the case of the MNIST data, I do not apply Flip, since a mirrored digit is no longer a valid digit! You can check more details in

### 2. Images


#### MNIST
![Picture 9](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/MNIST_AUG.png)

#### CIFAR10
![Picture 9](https://github.com/MINGUKKANG/ENAS-Tensorflow/blob/master/images/Cifar10_AUG.png)

#### Welding Defects

- Welding OK
- Welding NG


## References
**Paper: https://arxiv.org/abs/1802.03268**

**Authors' implementation: https://github.com/melodyguan/enas**

**Data Pipeline: https://github.com/MINGUKKANG/MNIST-Tensorflow-Code**

## License
All rights related to this code are reserved to the authors of ENAS

(Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean).