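First, a brief introduction to AlexNet. In 2012, Alex Krizhevsky, a student of Hinton, proposed the deep convolutional neural network AlexNet, which can be seen as a wider and deeper version of LeNet. AlexNet brought together several techniques and was the first CNN to successfully apply tricks such as ReLU, Dropout, and LRN. It also used GPUs to accelerate computation, and the authors open-sourced the CUDA code they used to train the network on GPUs. AlexNet has 630 million connections, 60 million parameters, and 650,000 neurons. It has 5 convolutional layers, 3 of which are followed by a max pooling layer, and ends with 3 fully connected layers. AlexNet won the highly competitive ILSVRC 2012 by a wide margin, lowering the top-5 error rate to 16.4%, a large improvement over the runner-up's 26.2%. It was arguably the first major statement from neural networks after their long slump: it established deep learning's dominance in computer vision and helped push deep learning into speech recognition, natural language processing, reinforcement learning, and other fields.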
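AlexNet carries LeNet's ideas forward, applying the basic principles of CNNs to a much deeper and wider network. The main new techniques it relies on are the ones named above: ReLU, Dropout, LRN, and GPU-accelerated training. The code below builds the convolutional part of AlexNet in TensorFlow and measures its speed.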
# coding=utf-8
# Written against the TensorFlow 1.x API (tf.Session, tf.truncated_normal, ...).
from datetime import datetime
import math
import time
import tensorflow as tf

# Use batch_size = 32 and time num_batches = 100 batches per run.
batch_size = 32
num_batches = 100


def print_activations(t):
    # Print the op name and the output shape of a layer.
    print(t.op.name, ' ', t.get_shape().as_list())


def inference(images):
    parameters = []
    # First convolutional layer: 64 kernels of size 11x11 over 3 input channels, stride 4.
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]
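After the first convolutional layer we add an LRN layer and a max pooling layer. First, tf.nn.lrn is applied to conv1, the tensor output by the previous step. Here depth_radius is set to 4, bias to 1, alpha to 0.001/9, and beta to 0.75, which largely follow the values recommended in the AlexNet paper. Apart from AlexNet, however, most other classic convolutional network models have dropped LRN (mainly because its effect is not significant), and using LRN also slows down the forward and backward passes considerably (overall speed drops to about 1/3), so readers can decide for themselves whether to use it. Next, tf.nn.max_pool is applied to the LRN output lrn1. The pooling window is 3×3, meaning each 3×3 block of pixels is reduced to a single 1×1 pixel, and the sampling stride is 2×2. The padding mode is VALID, so the window may not extend past the border, unlike SAME mode, which can pad points outside the border. Finally, the shape of the output pool1 is printed.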
    # Add the LRN layer and the max pooling layer.
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
    print_activations(pool1)
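To make the difference between SAME and VALID concrete, here is a minimal sketch of the output-size rule TensorFlow documents for the two padding modes, applied to conv1 and pool1. The helper name `out_size` is hypothetical (it is not part of TensorFlow), and the 224-pixel input is the image size this benchmark feeds in.

```python
import math


def out_size(in_size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer along one dimension."""
    if padding == 'SAME':
        # SAME pads the border, so only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        # VALID never reads outside the input, so the window size matters too.
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError('unknown padding mode: %r' % padding)


conv1 = out_size(224, kernel=11, stride=4, padding='SAME')
pool1 = out_size(conv1, kernel=3, stride=2, padding='VALID')
print(conv1, pool1)  # prints: 56 27
```

These are the spatial sizes that print_activations should report for conv1 and pool1 when the input is 224×224.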
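Next comes the second convolutional layer. The kernel size is 5×5, the number of input channels (that is, the number of output channels of the previous layer, which equals its number of kernels) is 64, and the number of kernels is 192. The convolution stride is set to 1 in every direction, so every pixel of the input is scanned.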
    # Second convolutional layer: 192 kernels of size 5x5 over 64 input channels, stride 1.
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)
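Next we process conv2, the output of the second convolutional layer. As before, LRN is applied first and max pooling second, with exactly the same parameters as after conv1.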
    # Apply LRN and max pooling to conv2.
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
    print_activations(pool2)
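As a rough illustration of what the LRN step does with these parameters, the sketch below normalizes the channel values of a single pixel in plain Python. It follows the formula given in the tf.nn.lrn documentation (each value is divided by `(bias + alpha * sqr_sum) ** beta`, where `sqr_sum` sums the squares of the channels within `depth_radius` on either side). The function name `lrn_pixel` is hypothetical, and this is only a sketch of the arithmetic, not how TensorFlow implements it.

```python
def lrn_pixel(channels, depth_radius=4, bias=1.0, alpha=0.001 / 9, beta=0.75):
    """Apply local response normalization to one pixel's list of channel values."""
    out = []
    for d, value in enumerate(channels):
        # Window of neighbouring channels, clipped at both ends of the channel axis.
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(v * v for v in channels[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out


activations = [10.0] * 10
normalized = lrn_pixel(activations)
# Channels in the middle see a full 9-channel window, so they are damped more
# than the channels at the edges, whose windows are clipped.
print(normalized[0] > normalized[4])  # prints: True
```

With alpha as small as 0.001/9, the normalization only changes values noticeably when nearby channels have large activations.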
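Now we create the third convolutional layer. Its basic structure is similar to the previous two; only the parameters differ. The kernel size is 3×3, the number of input channels is 192, and the number of kernels grows further to 384. The convolution stride is again 1 in every direction, and everything else stays the same as before.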
    # Third convolutional layer: 384 kernels of size 3x3 over 192 input channels, stride 1.
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)
The fourth convolutional layer is similar. Its kernel size is 3×3 and it has 384 input channels, but the number of kernels drops to 256.
    # Fourth convolutional layer: 256 kernels of size 3x3 over 384 input channels, stride 1.
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)
The fifth and last convolutional layer also uses 3×3 kernels, with 256 input channels and 256 kernels.
    # Fifth convolutional layer: 256 kernels of size 3x3 over 256 input channels, stride 1.
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)
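After the fifth convolutional layer there is one more max pooling layer, configured the same way as the pooling layers that follow the first two convolutional layers. Finally we return its output, pool5. That completes the inference function, which builds the convolutional part of AlexNet. To actually train or predict with AlexNet, you would still need to add 3 fully connected layers with 4096, 4096, and 1000 hidden nodes respectively. Because the computational cost of these last 3 fully connected layers is small, they are left out of the speed benchmark; their effect on the running time is minor.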
    # Max pooling layer pool5.
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
    print_activations(pool5)
    # The fully connected layers are not added because they have little effect on the running time.
    return pool5, parameters
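A back-of-the-envelope check of the claim that the fully connected layers cost little: the sketch below counts multiply-accumulate operations per image for the five convolutional layers defined above and for three fully connected layers of width 4096, 4096, and 1000. Two things here are assumptions rather than part of the benchmark: that the first fully connected layer takes the flattened pool5 output as its input, and that multiply-accumulate counts are a reasonable proxy for running time. All helper names are hypothetical.

```python
import math

# (kernel size, input channels, output channels, stride, followed by 3x3/2 max pool)
CONV_LAYERS = [
    (11, 3, 64, 4, True),
    (5, 64, 192, 1, True),
    (3, 192, 384, 1, False),
    (3, 384, 256, 1, False),
    (3, 256, 256, 1, True),
]


def conv_macs(image_size=224):
    """Return (total conv multiply-accumulates per image, spatial size of pool5)."""
    size, total = image_size, 0
    for kernel, c_in, c_out, stride, pooled in CONV_LAYERS:
        size = math.ceil(size / stride)            # SAME padding
        total += size * size * c_out * kernel * kernel * c_in
        if pooled:
            size = math.ceil((size - 3 + 1) / 2)   # VALID 3x3 pooling, stride 2
    return total, size


def fc_macs(pool_size, channels=256, widths=(4096, 4096, 1000)):
    """Multiply-accumulates per image for the fully connected layers (assumed layout)."""
    fan_in, total = pool_size * pool_size * channels, 0
    for width in widths:
        total += fan_in * width
        fan_in = width
    return total


conv_total, pool5_size = conv_macs()
fc_total = fc_macs(pool5_size)
print(pool5_size, conv_total, fc_total)  # prints: 6 658145280 58621952
```

Under these assumptions the fully connected layers account for less than a tenth of the multiply-accumulates, which is consistent with leaving them out of the timing.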
# Timing function: measures how long each AlexNet iteration takes.
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10  # warm-up iterations, excluded from the statistics
    total_durations = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_durations += duration
            total_duration_squared += duration * duration
    # Compute the mean time per iteration and its standard deviation sd, then print the result.
    mn = total_durations / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))
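The statistics in the timing function come from two running sums: the mean is the sum of durations divided by the count, and the variance is the mean of the squared durations minus the squared mean. The sketch below isolates that calculation and applies the same warm-up-then-measure pattern to an ordinary Python callable, so it runs without TensorFlow. The names `mean_and_std` and `time_callable` are hypothetical, and the `max(variance, 0.0)` guard against floating-point round-off is an addition of this sketch, not part of the benchmark code.

```python
import math
import time


def mean_and_std(durations):
    """Mean and standard deviation from the sum and the sum of squares."""
    n = len(durations)
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    variance = total_squared / n - mean * mean
    # Round-off can push the variance slightly below zero; clamp before sqrt.
    return mean, math.sqrt(max(variance, 0.0))


def time_callable(fn, num_batches=5, burn_in=2):
    """Call fn repeatedly, discard the warm-up calls, and summarize the rest."""
    durations = []
    for i in range(num_batches + burn_in):
        start = time.time()
        fn()
        elapsed = time.time() - start
        if i >= burn_in:
            durations.append(elapsed)
    return mean_and_std(durations)


mean, sd = time_callable(lambda: sum(range(10000)))
print('%.6f +/- %.6f sec / batch' % (mean, sd))
```

Keeping only two running sums avoids storing every duration, which is why the benchmark accumulates total_durations and total_duration_squared instead of a list.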
# Main function run_benchmark: times the forward pass, then the forward and backward pass.
def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        # Random images stand in for real data; only the speed is measured.
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        pool5, parameters = inference(images)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        time_tensorflow_run(sess, pool5, "Forward")
        # Use an L2 loss on pool5 so that gradients for all parameters can be timed.
        objective = tf.nn.l2_loss(pool5)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, "Forward-backward")


run_benchmark()
Complete code for the implementation
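# coding=utf-8
# Written against the TensorFlow 1.x API (tf.Session, tf.truncated_normal, ...).
from datetime import datetime
import math
import time
import tensorflow as tf

# Use batch_size = 32 and time num_batches = 100 batches per run.
batch_size = 32
num_batches = 100


# print_activations displays the structure of each layer: it prints the size of
# the output tensor of every convolutional or pooling layer.
def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())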


# Build the AlexNet network structure (convolutional part only).
def inference(images):
    parameters = []
    # First convolutional layer.
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]
    # Add the LRN layer and the max pooling layer.
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
    print_activations(pool1)
    # Second convolutional layer.
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)
    # Apply LRN and max pooling to conv2.
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
    print_activations(pool2)
    # Third convolutional layer.
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)
    # Fourth convolutional layer.
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)
    # Fifth convolutional layer.
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)
    # Max pooling layer pool5.
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
    print_activations(pool5)
    # The fully connected layers are not added because they have little effect on the running time.
    return pool5, parameters


# Timing function: measures how long each AlexNet iteration takes.
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10  # warm-up iterations, excluded from the statistics
    total_durations = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_durations += duration
            total_duration_squared += duration * duration
    # Compute the mean time per iteration and its standard deviation sd, then print the result.
    mn = total_durations / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))


# Main function run_benchmark: times the forward pass, then the forward and backward pass.
def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        # Random images stand in for real data; only the speed is measured.
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        pool5, parameters = inference(images)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        time_tensorflow_run(sess, pool5, "Forward")
        # Use an L2 loss on pool5 so that gradients for all parameters can be timed.
        objective = tf.nn.l2_loss(pool5)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, "Forward-backward")


run_benchmark()