A Guide to TF Layers: Building a Convolutional Neural Network
井民全, Jing ,mqjing@gmail.com
ETA: 60 min
Training ETA: > 5 hours (depends on your machine)
Google doc: This document
Preface
If you have played with mnist_softmax [3], you know that its accuracy lands somewhere around 91%-92%. With a more sophisticated CNN model (more layers, more neurons, more connections), the result climbs to around 97.3%.
There are many well-written CNN tutorials online; this document follows the official guide, A Guide to TF Layers: Building a Convolutional Neural Network [1]. You should also know that the best reference for the shape of each layer in a CNN model is CS231n Convolutional Neural Networks for Visual Recognition [2]. So, if you are interested, you can read the official documents directly and get everything you need, unless you would rather read my version.
With this document you can write your first CNN TensorFlow program. I have turned it into step-by-step instructions; by following them, you can reach the goals below within 60 minutes:
- Build a CNN model on your own
- Understand the shape of each layer in the CNN model (input volume and output volume) and the relationships between the layers
- Hands-on skills:
- Set up a TensorFlow development environment
- Complete the TensorFlow code from the official tutorial and start playing with CNNs
Discussion
- How do different model parameters affect the results?
Key Point
- The detailed input volume (shape) and output volume (shape) of each layer are the key to understanding the model.
Table of Contents
Show Me the Code
GitHub: cnn_mnist.py
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convolutional Neural Network Estimator for MNIST, built with tf.layers."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)


def cnn_model_fn(features, labels, mode):
  """Model function for CNN."""
  # Input Layer
  # Reshape X to 4-D tensor: [batch_size, width, height, channels]
  # MNIST images are 28x28 pixels, and have one color channel
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

  # Convolutional Layer #1
  # Computes 32 features using a 5x5 filter with ReLU activation.
  # Padding is added to preserve width and height.
  # Input Tensor Shape: [batch_size, 28, 28, 1]
  # Output Tensor Shape: [batch_size, 28, 28, 32]
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #1
  # First max pooling layer with a 2x2 filter and stride of 2
  # Input Tensor Shape: [batch_size, 28, 28, 32]
  # Output Tensor Shape: [batch_size, 14, 14, 32]
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2
  # Computes 64 features using a 5x5 filter.
  # Padding is added to preserve width and height.
  # Input Tensor Shape: [batch_size, 14, 14, 32]
  # Output Tensor Shape: [batch_size, 14, 14, 64]
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #2
  # Second max pooling layer with a 2x2 filter and stride of 2
  # Input Tensor Shape: [batch_size, 14, 14, 64]
  # Output Tensor Shape: [batch_size, 7, 7, 64]
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Flatten tensor into a batch of vectors
  # Input Tensor Shape: [batch_size, 7, 7, 64]
  # Output Tensor Shape: [batch_size, 7 * 7 * 64]
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

  # Dense Layer #1
  # Densely connected layer with 1024 neurons
  # Input Tensor Shape: [batch_size, 7 * 7 * 64]
  # Output Tensor Shape: [batch_size, 1024]
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

  # Add dropout operation; 0.6 probability that element will be kept
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Dense Layer #2 (Logits layer)
  # Input Tensor Shape: [batch_size, 1024]
  # Output Tensor Shape: [batch_size, 10]
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  # Define the PREDICT mode
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Define the LOSS function
  onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
  loss = tf.losses.softmax_cross_entropy(
      onehot_labels=onehot_labels, logits=logits)

  # Training Op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)


def main(unused_argv):
  # Load training and eval data
  mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

  # Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

  # Set up logging for predictions
  # Log the values in the "Softmax" tensor with label "probabilities"
  tensors_to_log = {"probabilities": "softmax_tensor"}
  logging_hook = tf.train.LoggingTensorHook(
      tensors=tensors_to_log, every_n_iter=50)

  # Create the training data set
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": train_data},
      y=train_labels,
      batch_size=100,
      num_epochs=None,
      shuffle=True)

  # Train the model
  mnist_classifier.train(
      input_fn=train_input_fn, steps=20000, hooks=[logging_hook])

  # Create the evaluation data set
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data}, y=eval_labels, num_epochs=1, shuffle=False)

  # Evaluate the model and print results
  eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
  print(eval_results)


if __name__ == "__main__":
  tf.app.run()
Concept
What is CNN
CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification.
Three Components of the Model
Convolutional layers
- Purpose: Feature extraction
- Operation: Apply a specified number of convolution filters to the image; for each subregion, the layer produces a single value in the output feature map
- A ReLU activation function is then applied to the feature map to introduce nonlinearities into the model
Pooling layers
- Purpose: Reduce the dimensionality of the feature map
- Algorithm: max pooling
Dense (fully connected) layers
- Purpose: Perform classification on features
- Input: the features extracted by the convolutional layers and downsampled by the pooling layers
- Output: a softmax activation function generates a probability for each target class of the input image
Define the CNN Essential Module
An Overview of the Structure
How to Calculate the Output Volume
(W - F + 2P)/S + 1
The idea behind the formula is simply a sliding-window sampling problem.
W: the input volume's dimension
F: the filter size (kernel size)
P: the number of zero-padding pixels around the image
S: the stride, i.e., the number of pixels the filter moves per step
Example (Pooling Layer #1)
(W - F + 2P)/S + 1 = (28 - 2 + 0)/2 + 1 = 14
W: 28
F: 2 (filter: 2x2)
P: 0 (no padding)
S: 2 (stride: 2)
Shape
Fig. The Output Volume for Pooling Layer #1.
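The formula is easy to check with a small helper. This is a sketch; `conv_output_size` is a name introduced here, not part of TensorFlow.

```python
def conv_output_size(w, f, p, s):
    """Output width/height of a conv or pooling layer: (W - F + 2P)/S + 1."""
    return (w - f + 2 * p) // s + 1

# Pooling Layer #1: 28x28 input, 2x2 filter, no padding, stride 2
print(conv_output_size(w=28, f=2, p=0, s=2))  # 14

# Convolutional Layer #1 uses padding="same" with a 5x5 filter and stride 1,
# which corresponds to P = 2: (28 - 5 + 2*2)/1 + 1 = 28
print(conv_output_size(w=28, f=5, p=2, s=1))  # 28

# Pooling Layer #2: 14x14 input, 2x2 filter, no padding, stride 2
print(conv_output_size(w=14, f=2, p=0, s=2))  # 7
```

Running it for every layer reproduces the shape column of the table in Step 1.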
Step 1: Define the CNN MNIST Classifier
Input Layer
- Purpose: Represent the input picture
- Arguments: W = 28 (image size)
- Output shape: {28, 28, 1}

Convolutional Layer #1
- Purpose: Feature extraction
- Arguments: W = 28 (image size); filter (kernel size) = 5x5; depth (number of filters) = 32; padding = same; stride = 1; activation = ReLU
- Output shape: {28, 28, 32}
- Note: when the stride is 1, the filters move one pixel at a time.

Pooling Layer #1
- Purpose: Downsampling
- Arguments: pool size = 2x2; stride = 2; padding = valid
- Output shape: {14, 14, 32}

Convolutional Layer #2
- Purpose: Feature extraction
- Arguments: filter (kernel size) = 5x5; depth (number of filters) = 64; padding = same; stride = 1; activation = ReLU
- Output shape: {14, 14, 64}

Pooling Layer #2
- Purpose: Downsampling
- Arguments: pool size = 2x2; stride = 2; padding = valid
- Output shape: {7, 7, 64}

Dense Layer #1
- Purpose: Flatten the feature maps into a single feature vector; a dropout layer prevents overfitting
- Arguments: units = 1024; activation = ReLU; dropout rate = 0.4
- Output shape: {7 x 7 x 64 = 3136} -> {1024}

Dropout
- Purpose: Randomly drop nodes during training to prevent overfitting.
- Fig. The standard network (left) and a thinned network produced by dropout (right); with n nodes, dropout samples from up to 2^n thinned sub-networks.

Dense Layer #2 (Logits)
- Purpose: Classification
- Arguments: units = 10; activation = linear
- Output shape: {10}
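The dropout step in Dense Layer #1 (rate = 0.4, active only during training) can be sketched in plain NumPy. This illustrates inverted dropout, in which dropped elements are zeroed and the survivors are rescaled by 1/keep_prob; the helper name `dropout` is introduced here for the sketch and is not the TensorFlow API.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero out about `rate` of the elements, rescale the rest."""
    if not training:
        return x  # at inference time, dropout is a no-op
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True = keep this activation
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones(1000)
y = dropout(x, rate=0.4, training=True, rng=rng)
print((y == 0).mean())   # roughly 0.4 of the activations are dropped
```

The rescaling by 1/keep_prob keeps the expected sum of activations the same whether dropout is on or off, so nothing special is needed at inference time.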
Step 2: Define Prediction Model
Purpose
Create a prediction model for the handwritten-digit application that returns both the predicted class (the logits entry with the highest raw value) and the class probabilities computed from the logits layer.
Procedure
Step 1: Get the index for the highest raw value
Step 2: Get the probabilities
predictions = {
    # Generate predictions (for PREDICT and EVAL mode)
    "classes": tf.argmax(input=logits, axis=1),
    # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
    # `logging_hook`.
    "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
}
Step 3: Create a prediction model based on the current settings
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
Code
def cnn_model_fn(features, labels, mode):
  # <code that builds the network model> ...
  logits = ...

  # <build the prediction mechanism on top of the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  # For PREDICT mode
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
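To see what the two dictionary entries compute, here is a NumPy sketch on a made-up 10-class logits vector (the values are illustrative only):

```python
import numpy as np

logits = np.array([1.0, 2.0, 0.5, 8.0, 1.5, 0.2, 0.1, 3.0, 0.8, 1.2])

# "classes": the index of the highest raw value
predicted_class = int(np.argmax(logits))

# "probabilities": softmax over the logits
exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
probabilities = exp / exp.sum()

print(predicted_class)                # 3
print(probabilities[predicted_class]) # the winning class gets almost all the mass
```

Note that argmax on the logits and argmax on the softmax probabilities always agree, since softmax is monotonic; the probabilities are only needed when you want calibrated confidence values.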
Step 3: Define Loss Function
Purpose
Measure how closely the model's predictions match the target classes. For a multiclass classification problem like this one, we use cross entropy as the loss metric.
Code
# Define the LOSS function
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits)
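What tf.losses.softmax_cross_entropy computes can be written out by hand for a single example. This is a NumPy sketch under simplifying assumptions: TensorFlow additionally averages the loss over the batch, and the helper names here are introduced for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(logits, label, depth=10):
    onehot = np.eye(depth)[label]        # one-hot encode the integer label
    p = softmax(logits)
    return -np.sum(onehot * np.log(p))   # -log(probability of the true class)

logits = np.array([0.1, 0.2, 3.0, 0.1, 0.0, 0.0, 0.1, 0.2, 0.1, 0.0])
print(cross_entropy(logits, label=2))  # small loss: the model favors class 2
print(cross_entropy(logits, label=0))  # large loss: the model disagrees
```

The loss is near zero when the softmax probability of the true class is near 1, and grows without bound as that probability approaches 0, which is what makes it a useful training signal.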
Step 4: Define Estimator Spec
Define Training Spec
Purpose
Through training, we configure the model to minimize the loss value. Here, stochastic gradient descent is used as the optimization algorithm.
Code
def cnn_model_fn(features, labels, mode):
  # <code that builds the network model> ...
  logits = ...
  # <loss function: the gap between the model and the targets> ...
  loss = ...
  # <optimize the model with gradient descent>
if mode == tf.estimator.ModeKeys.TRAIN:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
loss=loss,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
Define Evaluation Spec
Purpose
Provide an accuracy metric for our model.
Code
def cnn_model_fn(features, labels, mode):
  # <code that builds the network model> ...
  logits = ...
  # <loss function: the gap between the model and the targets> ...
  loss = ...
  # <build the prediction mechanism on top of the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }
  ...
eval_metric_ops = {
"accuracy": tf.metrics.accuracy(
labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
Prepare Training/Evaluating Process
Step 1: Loading Training and Test Data
def main(unused_argv):
# Load training and eval data
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
# training part
train_data = mnist.train.images # Returns np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
# testing part
eval_data = mnist.test.images # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
Step 2: Create the Estimator
Purpose
Provide high-level operations for training, evaluation, and inference on our model. Remember, the Estimator is created based on the EstimatorSpec objects defined in the cnn_model_fn function.
Code
def main(unused_argv):
# Load training and eval data
...
# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")
Step 3: Set Up a Logging Hook for Tracking the Process
Purpose
Since CNNs can take a while to train, we use tf.train.LoggingTensorHook to track the training process; put simply, it displays training progress as we go.
Here we log the probability values from the softmax layer of our CNN.
Code
def main(unused_argv):
# Load training and eval data
...
# Create the Estimator
mnist_classifier = ...
# Set up logging for predictions (log every 50 steps of training)
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)
def cnn_model_fn(features, labels, mode):
...
  # <build the prediction mechanism on top of the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }
Step 4: Train the Model
Purpose
Train the model by feeding it the training set.
Code
def main(unused_argv):
# Load training and eval data
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
# Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

  # Set up logging for predictions
  ...

  # Create the training data set
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": train_data},
      y=train_labels,
      batch_size=100,
      num_epochs=None,
      shuffle=True)
# Train the model
  mnist_classifier.train(
      input_fn=train_input_fn, steps=20000, hooks=[logging_hook])
Step 5: Evaluate the Model
Purpose
Evaluate the model by feeding it the test set.
Code
def main(unused_argv):
# Load training and eval data
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
# Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")
# Create the training data set
  ...
# Train the model
...
# Evaluate the model and print results
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data}, y=eval_labels, num_epochs=1, shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
Setup Python Environment
Run
Command
source activate tensorflow
python cnn_mnist.py
References
- CS231n Convolutional Neural Networks for Visual Recognition (the best explanation of CNN structure), https://cs231n.github.io/convolutional-networks/
- [ts, digital] MNIST for ML Beginners -- Digital number recognition, https://docs.google.com/document/d/1Rlj86PFq_--5DUVJ2ce3zCS9GWbzHdg8bKjEEbQphig/edit?usp=sharing