Importing the packages
Since this code was written in a Jupyter notebook, the '%matplotlib inline' magic command below draws figures directly in the page; if you are not using Jupyter notebook, simply delete that line.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from tensorflow.examples.tutorials.mnist import input_data
Loading the data
This part downloads and loads the MNIST dataset using TensorFlow's built-in module; 'input_data/' is the relative path where the MNIST files are stored.
X and Y_ below are not concrete values; they are placeholders declared in TensorFlow, which are only given values when a computation is actually run. The dtype and shape are declared in tf.placeholder(): in the code below we tell TensorFlow that the input X is a float32 tensor of shape [None,28,28,1], where None stands for the batch size (fixed only at run time) and 28, 28, 1 are the width, height and number of channels of an input image. Since the model has 10 output classes, Y_ has shape [None,10]. Note the arguments to read_data_sets(): one_hot=True encodes each label as a 10-dimensional one-hot vector, and reshape=False keeps each image as a 28×28×1 tensor instead of a flattened 784-vector.
mnist = input_data.read_data_sets('input_data/',one_hot=True,reshape=False)
X = tf.placeholder(tf.float32,[None,28,28,1])
Y_ = tf.placeholder(tf.float32,[None,10])
Extracting input_data/train-images-idx3-ubyte.gz
Extracting input_data/train-labels-idx1-ubyte.gz
Extracting input_data/t10k-images-idx3-ubyte.gz
Extracting input_data/t10k-labels-idx1-ubyte.gz
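To make the placeholder mechanism concrete, here is a minimal throwaway sketch (the names a and double are hypothetical, not part of the model) showing how a value is bound to a placeholder at run time through feed_dict:
a = tf.placeholder(tf.float32,[None,2])   # None: batch size fixed only at run time
double = a*2.0
with tf.Session() as s:
    print(s.run(double,feed_dict={a:[[1.0,2.0]]}))   # -> [[2. 4.]]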
Parameter initialization
This part initializes all of the model's parameters (weights and bias terms) using TensorFlow's Variable(). The first two layers are convolutional; taking W2 as an example, its four dimensions [5,5,K,L] mean a 5×5 convolution kernel with K (=32) input channels and L (=64) output channels.
K = 32    # channels produced by the first convolutional layer
L = 64    # channels produced by the second convolutional layer
M = 1024  # units in the fully connected layer
W1 = tf.Variable(tf.truncated_normal([5,5,1,K],stddev=0.1))   # 5x5 conv, 1 -> K channels
B1 = tf.Variable(tf.constant(0.1,tf.float32,[K]))
W2 = tf.Variable(tf.truncated_normal([5,5,K,L],stddev=0.1))   # 5x5 conv, K -> L channels
B2 = tf.Variable(tf.constant(0.1,tf.float32,[L]))
W3 = tf.Variable(tf.truncated_normal([7*7*L,M],stddev=0.1))   # fully connected: flattened 7x7xL -> M
B3 = tf.Variable(tf.constant(0.1,tf.float32,[M]))
W4 = tf.Variable(tf.truncated_normal([M,10],stddev=0.1))      # output layer: M -> 10 classes
B4 = tf.Variable(tf.constant(0.1,tf.float32,[10]))
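As a quick sanity check on why W3's first dimension is 7*7*L (a throwaway sketch, not part of the model): each of the two 2×2 max-pooling layers defined in the next section uses stride 2 with SAME padding, which halves the spatial size rounding up, so 28 → 14 → 7:
size = 28
for _ in range(2):       # the two pooling layers
    size = (size+1)//2   # SAME padding, stride 2: ceil(size/2)
print(size, size*size*L) # -> 7 3136, i.e. 7*7*64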
Building the model
The model consists of two convolutional layers (each followed by max-pooling) and one fully connected layer, with a softmax layer at the end for multi-class classification; dropout is applied after the fully connected layer. Note that the output of the last pooling layer must first be reshaped into one row vector per example before entering the fully connected layer; otherwise the dimensions in the fully connected layer's matrix multiplication will not match.
keep_prob = tf.placeholder(tf.float32)  # dropout keep probability, fed at run time
conv1 = tf.nn.relu(tf.nn.conv2d(X,W1,strides=[1,1,1,1],padding='SAME')+B1)      # ? x 28 x 28 x K
pool1 = tf.nn.max_pool(conv1,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')  # ? x 14 x 14 x K
conv2 = tf.nn.relu(tf.nn.conv2d(pool1,W2,strides=[1,1,1,1],padding='SAME')+B2)  # ? x 14 x 14 x L
pool2 = tf.nn.max_pool(conv2,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')  # ? x 7 x 7 x L
pool2_flat = tf.reshape(pool2,[-1,7*7*L])      # flatten to one row vector per example
fc1 = tf.nn.relu(tf.matmul(pool2_flat,W3)+B3)
fc1_drop = tf.nn.dropout(fc1,keep_prob)        # dropout, active only during training
Ylogits = tf.matmul(fc1_drop,W4)+B4            # raw logits, fed to the loss below
Y = tf.nn.softmax(Ylogits)                     # predicted class probabilities
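As an optional check that the layer shapes line up with the arithmetic above, the static shapes of these tensors can be printed directly (the values in the comments are what the definitions above imply):
print(conv1.get_shape())    # (?, 28, 28, 32)
print(pool1.get_shape())    # (?, 14, 14, 32)
print(pool2.get_shape())    # (?, 7, 7, 64)
print(Ylogits.get_shape())  # (?, 10)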
Loss function
The loss used in the code is the cross entropy. Note that tf.nn.softmax_cross_entropy_with_logits applies the softmax internally, so it is given the raw logits Ylogits rather than Y; the result is then averaged over the batch, and the factor of 100.0 merely rescales the reported loss (for a 100-image batch it equals the sum over the batch) without changing the location of the minimum.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits,labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy)*100.0
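For reference, the same loss written out by hand would look like the sketch below (manual_ce is a hypothetical name, shown only for illustration); the fused op above is preferred because it is numerically stable, which is also why it takes Ylogits instead of the softmax output Y:
manual_ce = -tf.reduce_sum(Y_*tf.log(Y+1e-10),axis=1)   # per-example cross entropy; 1e-10 guards against log(0)
manual_ce = tf.reduce_mean(manual_ce)*100.0             # same averaging and scaling as above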
Accuracy
Accuracy equals the number of correctly classified samples divided by the total number of test samples. Note that tf.equal() in the first line checks whether the predicted class and the true class agree, returning True or False for each sample; since these are booleans, they must be converted to floats with tf.cast() before the mean in the second line can be computed.
is_accuracy = tf.equal(tf.argmax(Y_,1),tf.argmax(Y,1))
accuracy = tf.reduce_mean(tf.cast(is_accuracy,tf.float32))
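A toy example (hypothetical values, independent of the model) of how argmax, equal and cast combine into an accuracy:
labels = tf.constant([[0.,1.],[1.,0.],[0.,1.]])        # true classes: 1, 0, 1
preds  = tf.constant([[0.2,0.8],[0.9,0.1],[0.7,0.3]])  # predicted classes: 1, 0, 0
correct = tf.equal(tf.argmax(labels,1),tf.argmax(preds,1))     # [True, True, False]
with tf.Session() as s:
    print(s.run(tf.reduce_mean(tf.cast(correct,tf.float32))))  # -> 0.6666667 (2 of 3 correct)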
Training the model
This part runs the training of the whole model and records the results. A few points to note about the code:
1. The learning rate changes at every iteration (exponential decay), so it has to be declared as a placeholder: lr = tf.placeholder(tf.float32); see the sketch after this list.
2. The training steps (including parameter initialization) must all be executed inside a Session(); e.g. sess.run(init) runs the parameter initialization.
3. The training-related data (a batch of training samples, the learning rate for the current iteration, and the dropout keep_prob) and the test data are passed to the session as dictionaries via the feed_dict argument.
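To get a feel for the decay schedule in point 1 before running the full loop, here is a throwaway sketch using the same constants as the training code below:
max_lr, min_lr, decay_speed = 0.003, 0.0001, 2000.0
for i in [0, 100, 500, 999]:
    print(i, min_lr+(max_lr-min_lr)*np.exp(-i/decay_speed))
# -> starts at 0.003 and decays towards min_lr (about 0.00186 by step 999)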
epoch = 1000   # number of training iterations (each step consumes one 100-image batch)
batch = 100    # batch size
train_acc = []
train_loss = []
test_acc = []
test_loss = []
lr = tf.placeholder(tf.float32)   # learning rate, fed anew at every iteration
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(epoch):
        # exponential decay of the learning rate from max_lr towards min_lr
        max_lr = 0.003
        min_lr = 0.0001
        decay_speed = 2000.0
        learning_rate = min_lr+(max_lr-min_lr)*np.exp(-i/decay_speed)
        batch_X,batch_Y = mnist.train.next_batch(batch)
        train_data = {X:batch_X, Y_:batch_Y, lr:learning_rate, keep_prob:0.5}
        sess.run(train_step,feed_dict=train_data)
        # evaluate on the same batch (dropout still active via train_data)
        acc,loss = sess.run([accuracy,cross_entropy],feed_dict=train_data)
        train_acc.append(acc)
        train_loss.append(loss)
        # keep_prob is 1.0 at test time: dropout is disabled for evaluation
        test_data = {X:mnist.test.images,Y_:mnist.test.labels,keep_prob:1.0}
        acc,loss = sess.run([accuracy,cross_entropy],feed_dict=test_data)
        test_acc.append(acc)
        test_loss.append(loss)
        if i%100 == 0:
            print("epoch = %d, test accuracy = %.4f, test loss = %.6f, learning rate = %.6f"
                  % (i, test_acc[i], test_loss[i], learning_rate))
    print("test accuracy = %.4f, test loss = %.6f" % (test_acc[-1], test_loss[-1]))
epoch = 0, test accuracy = 0.1135, test loss = 3775.333008, learning rate = 0.003000
epoch = 100, test accuracy = 0.9294, test loss = 22.953661, learning rate = 0.002859
epoch = 200, test accuracy = 0.9523, test loss = 15.348906, learning rate = 0.002724
epoch = 300, test accuracy = 0.9624, test loss = 11.892620, learning rate = 0.002596
epoch = 400, test accuracy = 0.9683, test loss = 9.448619, learning rate = 0.002474
epoch = 500, test accuracy = 0.9740, test loss = 7.751043, learning rate = 0.002359
epoch = 600, test accuracy = 0.9741, test loss = 7.173435, learning rate = 0.002248
epoch = 700, test accuracy = 0.9811, test loss = 5.837584, learning rate = 0.002144
epoch = 800, test accuracy = 0.9800, test loss = 5.772215, learning rate = 0.002044
epoch = 900, test accuracy = 0.9765, test loss = 7.619859, learning rate = 0.001949
test accuracy = 0.9833, test loss = 5.115160
Visualizing the training process
This plots how the accuracy and the loss evolve during training. The y-axis of the loss plot is clipped to [0,100] because the loss in the first few iterations is far larger (over 3700 at step 0).
plt.figure()
plt.plot(train_acc,'r',label='train_acc')
plt.plot(test_acc,'b',label='test_acc')
plt.legend()
plt.axis([0,epoch,0,1])
plt.show()
plt.figure()
plt.plot(train_loss,'r',label='train_loss')
plt.plot(test_loss,'b',label='test_loss')
plt.legend()
plt.axis([0,epoch,0,100])
plt.show()
References
- Tensorflow and deep learning - without a PhD (blocked in mainland China; requires a VPN)
- Tensorflow and deep learning - without a PhD: video + slides
- Tensorflow and deep learning - without a PhD: code
- Tensorflow and deep learning - without a PhD: Chinese translation