Wednesday, November 29, 2017

A Guide to TF Layers: Building a Convolutional Neural Network

井民全, Jing ,mqjing@gmail.com
Fig. A Regular 3-layer Neural Network Model (Source: ref).
ETA: 60 min
Training ETA: > 5 hours (depends on your machine)

Preface

If you have played with mnist_softmax [3], you know that its accuracy lands at roughly 91%–92%. If you use a more elaborate CNN model (more layers, more neurons, more connections), the result improves to around 97.3%.
There are many excellent CNN tutorials online. This document follows the official guide, A Guide to TF Layers: Building a Convolutional Neural Network [1]. You should also know that the best document for explaining the shape of each layer in a CNN model is CS231n Convolutional Neural Networks for Visual Recognition [2]. So, if you are interested, you can read the official documents directly and get everything you need, unless you would rather read my version.
With this document you can write your first CNN TensorFlow program. I have turned it into step-by-step instructions. Following these steps, you can reach the goals below within 60 minutes:
  1. The ability to build your own CNN model
    1. Understand the shape of each layer in the CNN model (input volume and output volume), and the relationships between layers.
  2. The ability to experiment:
    1. Set up a TensorFlow development environment.
    2. Complete the TensorFlow code from the official tutorial and start playing with CNNs.

Discussion

  1. How do different model parameters affect the results?

Key Point

  • The key to understanding the model is knowing the exact input volume (shape) and output volume (shape) of every layer.
  • The shape calculation (W-F+2P)/S + 1 is the key formula (ref).
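The formula can be checked with a tiny helper (a minimal sketch; the numbers correspond to the layers built later in this guide):

```python
def output_size(w, f, p, s):
    """Spatial output size of a conv/pool layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

# Conv Layer #1: 28x28 input, 5x5 filter, "same" padding (P = 2), stride 1
print(output_size(28, 5, 2, 1))  # 28 -- width/height preserved

# Pooling Layer #1: 28x28 input, 2x2 pool, no padding, stride 2
print(output_size(28, 2, 0, 2))  # 14
```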

Show Me the Code

GitHub: cnn_mnist.py
#  Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
"""Convolutional Neural Network Estimator for MNIST, built with tf.layers."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)


def cnn_model_fn(features, labels, mode):
 """Model function for CNN."""
 # Input Layer
 # Reshape X to 4-D tensor: [batch_size, width, height, channels]
 # MNIST images are 28x28 pixels, and have one color channel
 input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

 # Convolutional Layer #1
 # Computes 32 features using a 5x5 filter with ReLU activation.
 # Padding is added to preserve width and height.
 # Input Tensor Shape: [batch_size, 28, 28, 1]
 # Output Tensor Shape: [batch_size, 28, 28, 32]
 conv1 = tf.layers.conv2d(
     inputs=input_layer,
     filters=32,
     kernel_size=[5, 5],
     padding="same",
     activation=tf.nn.relu)

 # Pooling Layer #1
 # First max pooling layer with a 2x2 filter and stride of 2
 # Input Tensor Shape: [batch_size, 28, 28, 32]
 # Output Tensor Shape: [batch_size, 14, 14, 32]
 pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

 # Convolutional Layer #2
 # Computes 64 features using a 5x5 filter.
 # Padding is added to preserve width and height.
 # Input Tensor Shape: [batch_size, 14, 14, 32]
 # Output Tensor Shape: [batch_size, 14, 14, 64]
 conv2 = tf.layers.conv2d(
     inputs=pool1,
     filters=64,
     kernel_size=[5, 5],
     padding="same",
     activation=tf.nn.relu)

 # Pooling Layer #2
 # Second max pooling layer with a 2x2 filter and stride of 2
 # Input Tensor Shape: [batch_size, 14, 14, 64]
 # Output Tensor Shape: [batch_size, 7, 7, 64]
 pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

 # Flatten tensor into a batch of vectors
 # Input Tensor Shape: [batch_size, 7, 7, 64]
 # Output Tensor Shape: [batch_size, 7 * 7 * 64]
 pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

 # Dense Layer #1
 # Densely connected layer with 1024 neurons
 # Input Tensor Shape: [batch_size, 7 * 7 * 64]
 # Output Tensor Shape: [batch_size, 1024]
 dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

 # Add dropout operation; 0.6 probability that element will be kept
 dropout = tf.layers.dropout(
     inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
 # Dense Layer #2
 # Logits layer
 # Input Tensor Shape: [batch_size, 1024]
 # Output Tensor Shape: [batch_size, 10]
 logits = tf.layers.dense(inputs=dropout, units=10)

 predictions = {
     # Generate predictions (for PREDICT and EVAL mode)
     "classes": tf.argmax(input=logits, axis=1),
     # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
     # `logging_hook`.
     "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
 }

 # PREDICT mode: return the predictions
 if mode == tf.estimator.ModeKeys.PREDICT:
   return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

 # Define the LOSS function
 onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
 loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)

 # Training Op (for TRAIN mode)
 if mode == tf.estimator.ModeKeys.TRAIN:
   optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
   train_op = optimizer.minimize(
       loss=loss,
       global_step=tf.train.get_global_step())
   return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

 # Evaluation metrics (for EVAL mode)
 eval_metric_ops = {
     "accuracy": tf.metrics.accuracy(
         labels=labels, predictions=predictions["classes"])}
 return tf.estimator.EstimatorSpec(
     mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)


def main(unused_argv):
 # Load training and eval data
 mnist = tf.contrib.learn.datasets.load_dataset("mnist")

 train_data = mnist.train.images  # Returns np.array
 train_labels = np.asarray(mnist.train.labels, dtype=np.int32)

 eval_data = mnist.test.images  # Returns np.array
 eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

 # Create the Estimator
 mnist_classifier = tf.estimator.Estimator(
     model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

 # Set up logging for predictions
 # Log the values in the "Softmax" tensor with label "probabilities"
 tensors_to_log = {"probabilities": "softmax_tensor"}
 logging_hook = tf.train.LoggingTensorHook(
     tensors=tensors_to_log, every_n_iter=50)

 # Create the training data set
 train_input_fn = tf.estimator.inputs.numpy_input_fn(
     x={"x": train_data},
     y=train_labels,
     batch_size=100,
     num_epochs=None,
     shuffle=True)

 # Train the model
 mnist_classifier.train(
     input_fn=train_input_fn,
     steps=20000,
     hooks=[logging_hook])

 # Create the evaluation data set
 eval_input_fn = tf.estimator.inputs.numpy_input_fn(
     x={"x": eval_data},
     y=eval_labels,
     num_epochs=1,
     shuffle=False)

 # Evaluate the model and print results
 eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
 print(eval_results)


if __name__ == "__main__":
 tf.app.run()

Concept

What is a CNN

CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification.

Three Components of the Model

Convolutional layers

  • Purpose: Feature extraction
  • Input: the layer applies a specified number of convolution filters to the image
  • Output: a single value in the output feature map for each subregion. A ReLU activation function is then applied to the map to introduce nonlinearities into the model.
Fig. A typical ReLU activation function [ref].

Pooling layers

  • Purpose: Reduce the dimensionality of the feature map
  • Algorithm: max pooling
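The downsampling can be sketched in plain NumPy (a minimal illustration of 2x2 max pooling with stride 2, not the tf.layers implementation, which operates on batched 4-D tensors):

```python
import numpy as np

def max_pool_2x2(x):
    # Split a (H, W) map into non-overlapping 2x2 blocks and keep each block's max
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [7, 8, 0, 1],
    [6, 5, 2, 3],
])
print(max_pool_2x2(feature_map))
# [[4 5]
#  [8 3]]
```

A 28x28 input shrinks to 14x14, matching the shapes in the table below.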

Dense (fully connected) layers

  • Purpose: Perform classification on the extracted features
  • Input: the features extracted by the convolutional layers and downsampled by the pooling layers
  • Output: a softmax activation function generates the probabilities of each class for an input image

Define the CNN Essential Module

An Overview of the Structure

The best explanation of the CNN structure (view)

How to Calculate the Output Volume



(W-F+2P)/S+1
(ref)

The formula is just a simple matter of sliding the filter across the input and sampling.

W: the input layer's dimension
F: the filter size (kernel size)
P: the number of "0" padding pixels around the image
S: the stride, i.e., the number of pixels the filter moves per step

Example

(W-F+2P)/S+1 = (28-2+0)/2+1 = 14

W: 28
F: 2    (filter: 2x2)
P: 0    (padding: 0)
S: 2    (stride: 2)


Fig. The Output Volume for Pooling Layer #1.


Step 1: Define the CNN MNIST Classifier

Module: Input Layer
Purpose: Represent the input picture
Arguments: W = 28 (image size)
Output shape: {28, 28, 1}

Code
# Input Layer  
input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])
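The -1 in the reshape lets TensorFlow infer the batch dimension from the total number of elements; NumPy behaves the same way (an illustrative sketch, not TF code):

```python
import numpy as np

# 100 flattened MNIST images, each 784 = 28*28 pixels
batch = np.zeros((100, 784))

# -1 infers the batch size: 100*784 / (28*28*1) = 100
input_layer = batch.reshape(-1, 28, 28, 1)
print(input_layer.shape)   # (100, 28, 28, 1)
```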
Module: Convolutional Layer #1
Purpose: Feature extraction
Arguments:
  W = 28 (image size)
  Filter (kernel size) = 5x5
  Depth (number of filters) = 32
  Padding = same
  Stride = 1 (when the stride is 1, the filter moves one pixel at a time)
  Activation: ReLU
Output shape: {28, 28, 32}

Code
# Convolutional Layer #1
conv1 = tf.layers.conv2d(
     inputs=input_layer,
     filters=32,
     kernel_size=[5, 5],
     padding="same",
     activation=tf.nn.relu)
Module: Pooling Layer #1
Purpose: Downsampling
Arguments:
  Pool size = 2x2
  Stride = 2
  Padding = valid
Output shape: {14, 14, 32}

Code
# Pooling Layer #1
 pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
Module: Convolutional Layer #2
Purpose: Feature extraction
Arguments:
  Filter (kernel size) = 5x5
  Depth (number of filters) = 64
  Padding = same
  Stride = 1
  Activation: ReLU
Output shape: {14, 14, 64}

Code
# Convolutional Layer #2
 conv2 = tf.layers.conv2d(
     inputs=pool1,
     filters=64,
     kernel_size=[5, 5],
     padding="same",
     activation=tf.nn.relu)
 

Module: Pooling Layer #2
Purpose: Downsampling
Arguments:
  Pool size = 2x2
  Stride = 2
  Padding = valid
Output shape: {7, 7, 64}

Code
# Pooling Layer #2
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
Module: Dense Layer #1
Purpose: Flatten the feature maps into a vector, then apply a densely connected layer with 1024 neurons; a dropout layer is added to prevent overfitting
Output shape: {7 x 7 x 64 = 3136} -> {1024}

Code
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
dropout = tf.layers.dropout(
     inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)


Dropout the nodes
Purpose
  • Reduce overfitting by dropping out nodes together with their incoming/outgoing edges.
  • Speed up the training process.
Fig. The standard network (left) and a thinned network after dropout (right); a network with n nodes yields 2^n possible thinned networks.
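The mechanism can be sketched in NumPy (a minimal illustration of inverted dropout with the same rate = 0.4 used in the model above; `dropout` here is an illustrative helper, not the TF call):

```python
import numpy as np

def dropout(x, rate=0.4, training=True, rng=np.random.default_rng(0)):
    # During training, keep each activation with probability 1 - rate and
    # scale by 1/(1 - rate) so the expected value is unchanged.
    if not training:
        return x  # inference: no-op
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

activations = np.ones(10)
print(dropout(activations))                  # ~60% of entries kept, scaled to 1/0.6
print(dropout(activations, training=False))  # unchanged at inference time
```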
Module: Dense Layer #2 (Logits)
Purpose: Classification
Arguments:
  Units: 10
  Activation: linear
Output shape: {10}

Code
# Logits Layer
 logits = tf.layers.dense(inputs=dropout, units=10)

Step 2: Define Prediction Model

Purpose
Create a prediction model for the handwritten-digit application based on the logits tensor: the class with the highest raw value, and the class probabilities computed from the logits layer.
Procedure

Step 1: Get the index of the highest raw value

Step 2: Get the probabilities

predictions = {
     # Generate predictions (for PREDICT and EVAL mode)
     "classes": tf.argmax(input=logits, axis=1),

     # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
     # `logging_hook`.
     "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
 }
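What the two entries compute can be sketched in NumPy on a made-up logits vector for one image (10 classes):

```python
import numpy as np

logits = np.array([1.0, 2.0, 0.5, 8.0, 1.5, 0.2, 3.0, 0.1, 0.7, 1.1])

# "classes": the index of the highest raw value
predicted_class = int(np.argmax(logits))

# "probabilities": softmax turns logits into a distribution that sums to 1
exp = np.exp(logits - logits.max())   # shift by the max for numerical stability
probabilities = exp / exp.sum()

print(predicted_class)    # 3
print(probabilities[3])   # close to 1 -- the model is confident in class 3
```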
  
Step 3: Create a prediction model based on the current mode
if mode == tf.estimator.ModeKeys.PREDICT:
 return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

Code
def cnn_model_fn(features, labels, mode):

  # <code that builds the neural network model> ...
  logits = ...

  # <the prediction mechanism built on the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),

      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  # For PREDICT mode
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)


Step 3: Define Loss Function

Purpose
Measure how closely the model's predictions match the target classes. For a multiclass classification problem, we use cross entropy as the loss metric.
Code
 # Define the LOSS function
 onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
 loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
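The same computation can be sketched in NumPy (an illustrative re-implementation of one-hot encoding plus softmax cross entropy, not the TF call itself):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

labels = np.array([3, 1])            # two example labels
onehot = np.eye(10)[labels]          # one-hot encoding, shape (2, 10)

logits = np.zeros((2, 10))           # an untrained model: uniform predictions
probs = softmax(logits)

# Average cross entropy between the one-hot labels and the predictions
loss = -np.mean(np.sum(onehot * np.log(probs), axis=1))
print(loss)   # ln(10) ~ 2.3026 -- the expected loss before any training
```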

Step 4: Define Estimator Spec

Define Training Spec

Purpose
Via training, we configure the model to optimize the loss value. Here, stochastic gradient descent is used as the optimization algorithm.
Code
def cnn_model_fn(features, labels, mode):

  # <code that builds the neural network model> ...
  logits = ...

  # <the loss between the current model and the ground truth> ...
  loss = ...

  # <optimize the model with gradient descent>
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
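What a single optimizer step does to each weight is the plain gradient-descent update w <- w - learning_rate * dL/dw. A toy sketch with a known closed-form gradient (illustrative only, not the TF optimizer):

```python
import numpy as np

learning_rate = 0.001   # same value passed to GradientDescentOptimizer above

w = np.array([4.0, -2.0])
grad = 2 * w                      # gradient of the toy loss L(w) = w1^2 + w2^2
w = w - learning_rate * grad
print(w)   # [ 3.992 -1.996] -- each weight nudged toward the minimum at 0
```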

Define Evaluation Spec

Purpose
Provide an accuracy metric for our model.

Code
def cnn_model_fn(features, labels, mode):

  # <code that builds the neural network model> ...
  logits = ...

  # <the loss between the current model and the ground truth> ...
  loss = ...

  # <the prediction mechanism built on the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),

      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  ...

  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}

  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
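The metric itself reduces to a simple comparison (a sketch; tf.metrics.accuracy additionally maintains running totals across evaluation batches):

```python
import numpy as np

labels = np.array([7, 2, 1, 0, 4])             # ground-truth digits
predicted_classes = np.array([7, 2, 1, 9, 4])  # model output: one mistake

# Accuracy = fraction of predictions that match the labels
accuracy = np.mean(predicted_classes == labels)
print(accuracy)   # 0.8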

  

Prepare Training/Evaluating Process

Step 1:  Loading Training and Test Data

def main(unused_argv):
 # Load training and eval data
 mnist = tf.contrib.learn.datasets.load_dataset("mnist")

 # training part
 train_data = mnist.train.images # Returns np.array
 train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
 
 # testing part
 eval_data = mnist.test.images # Returns np.array
 eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

Step 2: Create the Estimator

Purpose
Provide high-level operations for training, evaluation, and inference for our model. Remember, the estimator is created based on the EstimatorSpec defined in the cnn_model_fn function.
Code
def main(unused_argv):
  # Load training and eval data
  ...

  # Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")


Step 3: Setup a Logging Hook for Tracking the Process

Purpose
Since CNNs can take a while to train, we use tf.train.LoggingTensorHook to track the training process. Simply put, we show the training progress.

Here we log the probability values from the softmax layer of our CNN.
Code
def main(unused_argv):
  # Load training and eval data
  ...

  # Create the Estimator
  mnist_classifier = ...

  # Set up logging for predictions (log every 50 steps of training)
  tensors_to_log = {"probabilities": "softmax_tensor"}
  logging_hook = tf.train.LoggingTensorHook(
      tensors=tensors_to_log, every_n_iter=50)

def cnn_model_fn(features, labels, mode):

  ...
  # <the prediction mechanism built on the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),

      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

Step 4: Training the Model

Purpose
Train the model by feeding it the training set.
Code
def main(unused_argv):

 # Load training and eval data
 mnist = tf.contrib.learn.datasets.load_dataset("mnist")
 train_data = mnist.train.images  # Returns np.array
 train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
 eval_data = mnist.test.images  # Returns np.array
 eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

# Create the Estimator
 mnist_classifier = tf.estimator.Estimator(
     model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

 # Set up logging for predictions
...

# Create the training data set
 train_input_fn = tf.estimator.inputs.numpy_input_fn(
     x={"x": train_data},
     y=train_labels,
     batch_size=100,
     num_epochs=None,
     shuffle=True)

 # Train the model
 mnist_classifier.train(
     input_fn=train_input_fn,
     steps=20000,
     hooks=[logging_hook])

Step 5: Evaluate the Model

Purpose
Evaluate the model by feeding it the test set.
Code
def main(unused_argv):

 # Load training and eval data
 mnist = tf.contrib.learn.datasets.load_dataset("mnist")
 train_data = mnist.train.images  # Returns np.array
 train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
 eval_data = mnist.test.images  # Returns np.array
 eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

# Create the Estimator
 mnist_classifier = tf.estimator.Estimator(
     model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

# Create the training data set
..
# Train the model
...

# Evaluate the model and print results
 eval_input_fn = tf.estimator.inputs.numpy_input_fn(
     x={"x": eval_data},
     y=eval_labels,
     num_epochs=1,
     shuffle=False)

 eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
 print(eval_results)

Setup Python Environment

  1. [install, Anaconda] How to install TensorFlow using Anaconda (view)

Run

Command
source activate tensorflow
python cnn_mnist.py


References

  1. CS231n Convolutional Neural Networks for Visual Recognition (the best explanation of CNN structure), https://cs231n.github.io/convolutional-networks/
  2. [ts, digital] MNIST for ML Beginners -- Digital number recognition, https://docs.google.com/document/d/1Rlj86PFq_--5DUVJ2ce3zCS9GWbzHdg8bKjEEbQphig/edit?usp=sharing

Further Reading

  1. Why the dropout layer works (ref)