A Guide to TF Layers: Building a Convolutional Neural Network
井民全, Jing ,mqjing@gmail.com
ETA: 60 min
Training ETA: > 5 hours (depends on your machine)
Google doc: This document
Preface
If you have played with mnist_softmax [3], you know that its accuracy lands somewhere around 91%-92%. With a more sophisticated CNN model (more layers, more neurons, more connections), the result climbs to around 97.3%.
There are many well-written CNN tutorials online; this document follows the official guide, A Guide to TF Layers: Building a Convolutional Neural Network [1]. You should also know that the best reference for the shape of each layer in a CNN model is CS231n Convolutional Neural Networks for Visual Recognition [2]. So, if you are interested, you can read the official documents directly and get everything you need, unless you would rather read my version.
With this document you can write your first CNN TensorFlow program. I have turned it into step-by-step instructions; by following them, you can reach the goals below within 60 minutes:
- Build a CNN model on your own
- Understand the shape of each layer in the CNN model (input volume and output volume) and the relationships between the layers
- Hands-on skills:
- Set up a TensorFlow development environment
- Complete the TensorFlow code from the official tutorial and start playing with CNNs
Discussion
- How do different model parameters affect the results?
Key Point
- The detailed input volume (shape) and output volume (shape) of each layer are the key to understanding the model.
Table of Contents
Show Me the Code
GitHub: cnn_mnist.py
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convolutional Neural Network Estimator for MNIST, built with tf.layers."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)


def cnn_model_fn(features, labels, mode):
  """Model function for CNN."""
  # Input Layer
  # Reshape X to 4-D tensor: [batch_size, width, height, channels]
  # MNIST images are 28x28 pixels, and have one color channel
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

  # Convolutional Layer #1
  # Computes 32 features using a 5x5 filter with ReLU activation.
  # Padding is added to preserve width and height.
  # Input Tensor Shape: [batch_size, 28, 28, 1]
  # Output Tensor Shape: [batch_size, 28, 28, 32]
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #1
  # First max pooling layer with a 2x2 filter and stride of 2
  # Input Tensor Shape: [batch_size, 28, 28, 32]
  # Output Tensor Shape: [batch_size, 14, 14, 32]
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2
  # Computes 64 features using a 5x5 filter.
  # Padding is added to preserve width and height.
  # Input Tensor Shape: [batch_size, 14, 14, 32]
  # Output Tensor Shape: [batch_size, 14, 14, 64]
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #2
  # Second max pooling layer with a 2x2 filter and stride of 2
  # Input Tensor Shape: [batch_size, 14, 14, 64]
  # Output Tensor Shape: [batch_size, 7, 7, 64]
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Flatten tensor into a batch of vectors
  # Input Tensor Shape: [batch_size, 7, 7, 64]
  # Output Tensor Shape: [batch_size, 7 * 7 * 64]
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

  # Dense Layer #1
  # Densely connected layer with 1024 neurons
  # Input Tensor Shape: [batch_size, 7 * 7 * 64]
  # Output Tensor Shape: [batch_size, 1024]
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

  # Add dropout operation; 0.6 probability that element will be kept
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Dense Layer #2 (Logits layer)
  # Input Tensor Shape: [batch_size, 1024]
  # Output Tensor Shape: [batch_size, 10]
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  # Define the PREDICT mode
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Define the LOSS function
  onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
  loss = tf.losses.softmax_cross_entropy(
      onehot_labels=onehot_labels, logits=logits)

  # Training Op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)


def main(unused_argv):
  # Load training and eval data
  mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

  # Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

  # Set up logging for predictions
  # Log the values in the "Softmax" tensor with label "probabilities"
  tensors_to_log = {"probabilities": "softmax_tensor"}
  logging_hook = tf.train.LoggingTensorHook(
      tensors=tensors_to_log, every_n_iter=50)

  # Create the training data set
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": train_data},
      y=train_labels,
      batch_size=100,
      num_epochs=None,
      shuffle=True)

  # Train the model
  mnist_classifier.train(
      input_fn=train_input_fn, steps=20000, hooks=[logging_hook])

  # Create the evaluation data set
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data}, y=eval_labels, num_epochs=1, shuffle=False)

  # Evaluate the model and print results
  eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
  print(eval_results)


if __name__ == "__main__":
  tf.app.run()
Concept
What is CNN
CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification.
Three Components of the Model
Convolutional layers
- Purpose: Feature extraction
- Operation: Apply a specified number of convolution filters to the image; for each subregion, the layer produces a single value in the output feature map
- A ReLU activation function is then applied to the feature map to introduce nonlinearities into the model
Pooling layers
- Purpose: Reduce the dimensionality of the feature map
- Algorithm: max pooling
Dense (fully connected) layers
- Purpose: Perform classification on features
- Input: the features extracted by the convolutional layers and downsampled by the pooling layers
- Output: a softmax activation function generates a probability for each target class of the input image
Define the CNN Essential Module
An Overview of the Structure
How to Calculate the Output Volume
(W - F + 2P)/S + 1
The idea behind the formula is simply a sliding-window sampling problem.
W: the input volume's dimension
F: the filter size (kernel size)
P: the number of zero-padding pixels around the image
S: the stride, i.e., the number of pixels the filter moves per step
Example (Pooling Layer #1)
(W - F + 2P)/S + 1 = (28 - 2 + 0)/2 + 1 = 14
W: 28
F: 2 (filter: 2x2)
P: 0 (no padding)
S: 2 (stride: 2)
Shape
Fig. The Output Volume for Pooling Layer #1.
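The formula is easy to check with a small helper. This is a sketch; `conv_output_size` is a name introduced here, not part of TensorFlow.

```python
def conv_output_size(w, f, p, s):
    """Output width/height of a conv or pooling layer: (W - F + 2P)/S + 1."""
    return (w - f + 2 * p) // s + 1

# Pooling Layer #1: 28x28 input, 2x2 filter, no padding, stride 2
print(conv_output_size(w=28, f=2, p=0, s=2))  # 14

# Convolutional Layer #1 uses padding="same" with a 5x5 filter and stride 1,
# which corresponds to P = 2: (28 - 5 + 2*2)/1 + 1 = 28
print(conv_output_size(w=28, f=5, p=2, s=1))  # 28

# Pooling Layer #2: 14x14 input, 2x2 filter, no padding, stride 2
print(conv_output_size(w=14, f=2, p=0, s=2))  # 7
```

Running it for every layer reproduces the shape column of the table in Step 1.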
Step 1: Define the CNN MNIST Classifier
Input Layer
- Purpose: Represent the input picture
- Arguments: W = 28 (image size)
- Output shape: {28, 28, 1}

Convolutional Layer #1
- Purpose: Feature extraction
- Arguments: W = 28 (image size); filter (kernel size) = 5x5; depth (number of filters) = 32; padding = same; stride = 1; activation = ReLU
- Output shape: {28, 28, 32}
- Note: when the stride is 1, the filters move one pixel at a time.

Pooling Layer #1
- Purpose: Downsampling
- Arguments: pool size = 2x2; stride = 2; padding = valid
- Output shape: {14, 14, 32}

Convolutional Layer #2
- Purpose: Feature extraction
- Arguments: filter (kernel size) = 5x5; depth (number of filters) = 64; padding = same; stride = 1; activation = ReLU
- Output shape: {14, 14, 64}

Pooling Layer #2
- Purpose: Downsampling
- Arguments: pool size = 2x2; stride = 2; padding = valid
- Output shape: {7, 7, 64}

Dense Layer #1
- Purpose: Flatten the feature maps into a single feature vector; a dropout layer prevents overfitting
- Arguments: units = 1024; activation = ReLU; dropout rate = 0.4
- Output shape: {7 x 7 x 64 = 3136} -> {1024}

Dropout
- Purpose: Randomly drop nodes during training to prevent overfitting.
- Fig. The standard network (left) and a thinned network produced by dropout (right); with n nodes, dropout samples from up to 2^n thinned sub-networks.

Dense Layer #2 (Logits)
- Purpose: Classification
- Arguments: units = 10; activation = linear
- Output shape: {10}
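The dropout step in Dense Layer #1 (rate = 0.4, active only during training) can be sketched in plain NumPy. This illustrates inverted dropout, in which dropped elements are zeroed and the survivors are rescaled by 1/keep_prob; the helper name `dropout` is introduced here for the sketch and is not the TensorFlow API.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero out about `rate` of the elements, rescale the rest."""
    if not training:
        return x  # at inference time, dropout is a no-op
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True = keep this activation
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones(1000)
y = dropout(x, rate=0.4, training=True, rng=rng)
print((y == 0).mean())   # roughly 0.4 of the activations are dropped
```

The rescaling by 1/keep_prob keeps the expected sum of activations the same whether dropout is on or off, so nothing special is needed at inference time.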
Step 2: Define Prediction Model
Purpose
Create a prediction model for the handwritten-digit application that returns both the predicted class (the logits entry with the highest raw value) and the class probabilities computed from the logits layer.
Procedure
Step 1: Get the index for the highest raw value
Step 2: Get the probabilities
predictions = {
    # Generate predictions (for PREDICT and EVAL mode)
    "classes": tf.argmax(input=logits, axis=1),
    # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
    # `logging_hook`.
    "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
}
Step 3: Create a prediction model based on the current settings
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
Code
def cnn_model_fn(features, labels, mode):
  # <code that builds the network model> ...
  logits = ...

  # <build the prediction mechanism on top of the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  # For PREDICT mode
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
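To see what the two dictionary entries compute, here is a NumPy sketch on a made-up 10-class logits vector (the values are illustrative only):

```python
import numpy as np

logits = np.array([1.0, 2.0, 0.5, 8.0, 1.5, 0.2, 0.1, 3.0, 0.8, 1.2])

# "classes": the index of the highest raw value
predicted_class = int(np.argmax(logits))

# "probabilities": softmax over the logits
exp = np.exp(logits - logits.max())   # subtract the max for numerical stability
probabilities = exp / exp.sum()

print(predicted_class)                # 3
print(probabilities[predicted_class]) # the winning class gets almost all the mass
```

Note that argmax on the logits and argmax on the softmax probabilities always agree, since softmax is monotonic; the probabilities are only needed when you want calibrated confidence values.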
Step 3: Define Loss Function
Purpose
Measure how closely the model's predictions match the target classes. For a multiclass classification problem like this one, we use cross entropy as the loss metric.
Code
# Define the LOSS function
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits)
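What tf.losses.softmax_cross_entropy computes can be written out by hand for a single example. This is a NumPy sketch under simplifying assumptions: TensorFlow additionally averages the loss over the batch, and the helper names here are introduced for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(logits, label, depth=10):
    onehot = np.eye(depth)[label]        # one-hot encode the integer label
    p = softmax(logits)
    return -np.sum(onehot * np.log(p))   # -log(probability of the true class)

logits = np.array([0.1, 0.2, 3.0, 0.1, 0.0, 0.0, 0.1, 0.2, 0.1, 0.0])
print(cross_entropy(logits, label=2))  # small loss: the model favors class 2
print(cross_entropy(logits, label=0))  # large loss: the model disagrees
```

The loss is near zero when the softmax probability of the true class is near 1, and grows without bound as that probability approaches 0, which is what makes it a useful training signal.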
Step 4: Define Estimator Spec
Define Training Spec
Purpose
Through training, we configure the model to minimize the loss value. Here, stochastic gradient descent is used as the optimization algorithm.
Code
def cnn_model_fn(features, labels, mode):
  # <code that builds the network model> ...
  logits = ...
  # <loss function: the gap between the model and the targets> ...
  loss = ...
  # <optimize the model with gradient descent>
if mode == tf.estimator.ModeKeys.TRAIN:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
loss=loss,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
Define Evaluation Spec
Purpose
Provide an accuracy metric for our model.
Code
def cnn_model_fn(features, labels, mode):
  # <code that builds the network model> ...
  logits = ...
  # <loss function: the gap between the model and the targets> ...
  loss = ...
  # <build the prediction mechanism on top of the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }
  ...
eval_metric_ops = {
"accuracy": tf.metrics.accuracy(
labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
Prepare Training/Evaluating Process
Step 1: Loading Training and Test Data
def main(unused_argv):
# Load training and eval data
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
# training part
train_data = mnist.train.images # Returns np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
# testing part
eval_data = mnist.test.images # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
Step 2: Create the Estimator
Purpose
Provide high-level operations for training, evaluation, and inference on our model. Remember, the Estimator is created based on the EstimatorSpec objects defined in the cnn_model_fn function.
Code
def main(unused_argv):
# Load training and eval data
...
# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")
Step 3: Set Up a Logging Hook for Tracking the Process
Purpose
Since CNNs can take a while to train, we use tf.train.LoggingTensorHook to track the training process; put simply, it displays training progress as we go.
Here we log the probability values from the softmax layer of our CNN.
Code
def main(unused_argv):
# Load training and eval data
...
# Create the Estimator
mnist_classifier = ...
# Set up logging for predictions (log every 50 steps of training)
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=50)
def cnn_model_fn(features, labels, mode):
...
  # <build the prediction mechanism on top of the network>
  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }
Step 4: Train the Model
Purpose
Train the model by feeding it the training set.
Code
def main(unused_argv):
# Load training and eval data
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
# Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

  # Set up logging for predictions
  ...

  # Create the training data set
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": train_data},
      y=train_labels,
      batch_size=100,
      num_epochs=None,
      shuffle=True)
# Train the model
  mnist_classifier.train(
      input_fn=train_input_fn, steps=20000, hooks=[logging_hook])
Step 5: Evaluate the Model
Purpose
Evaluate the model by feeding it the test set.
Code
def main(unused_argv):
# Load training and eval data
mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
# Create the Estimator
  mnist_classifier = tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")
# Create the training data set
  ...
# Train the model
...
# Evaluate the model and print results
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data}, y=eval_labels, num_epochs=1, shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
Setup Python Environment
Run
Command
source activate tensorflow
python cnn_mnist.py
References
- CS231n Convolutional Neural Networks for Visual Recognition (the best explanation of CNN structure), https://cs231n.github.io/convolutional-networks/
- [ts, digital] MNIST for ML Beginners -- Digital number recognition, https://docs.google.com/document/d/1Rlj86PFq_--5DUVJ2ce3zCS9GWbzHdg8bKjEEbQphig/edit?usp=sharing