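TensorFlow AI Tutorial for Beginners, Part 11: The Strongest Networks, DLSTM (Bidirectional Long Short-Term Memory Network), as Used in Alibaba's Xiao AI...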


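Insomnia.... The last chapter covered one of the strongest networks, RSNN, the deep residual network. This chapter covers another very strong model: the bidirectional LSTM, an important part of the Xiao AI that Alibaba was bragging about a while back. The real system is of course more complex than this, with more layers and several networks working together. It is like AlphaGo: in practice, combining several networks and reusing multiple layers tends to give the best results. Think of how the brain is our primary central nervous system while the spinal cord is a secondary one. The spinal cord can control the movement of certain muscles and joints and coordinates with the brain; the two pass information to each other over the nerves and regulate each other, with the brain in charge and the spinal cord assisting.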

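I have been learning piano recently, and it is really hard. Some things only make sense once you reach a certain level. So I share my understanding on this blog. Everything here is material I taught myself and worked out on my own. Mainly I hope it helps me organize my own knowledge, and it gives me something to write when I'm bored. These topics are not really important to me, but they let everyone get to know artificial intelligence a little. Is this all there is to AI? No, this is the foundation. Every previous chapter is foundation, basic knowledge, and even if you master all of it you are still an outsider. What matters most is being able to use it fluently, for whatever purpose, freely and adapted to the situation; knowing how to apply it is what really counts. So I am sharing this knowledge, which is fairly simple for me, here and in later chapters. Some of what I explain may be my own interpretation, but it will not affect how you use it. Last year my startup was built on TensorFlow: I integrated it with Spark, rewrote the backpropagation (BP) layer and the feed-forward (FF) layer along with parts of the code, and got TensorFlow 0.6 training in parallel on Spark. So I may not have a strong mathematical background, but my understanding of the code is decent. The startup project was essentially about applying and using AI.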

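The bidirectional LSTM is what Alibaba's Xiao AI uses. My guess is that it uses a bidirectional LSTM followed by an RNN layer, plus reinforcement learning. Either way, the most important part of Xiao AI is this bidirectional LSTM, combined with RNNs, several other kinds of networks, and reinforcement learning.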

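LSTM was designed to solve some of the problems of RNNs, in particular that the influence of early inputs fades over long sequences. It improves the hidden layer so that earlier context leaves a longer-lasting impression, that is, a memory.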
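To put a number on "a longer-lasting memory", here is a deliberately simplified scalar toy of my own (not from the original): it measures how strongly the state after 50 steps still depends on the starting state. In a plain tanh RNN, d h_T / d h_0 is a product of per-step factors w*(1 - h_t^2) that shrinks geometrically; along the LSTM cell-state path the per-step factor is the forget-gate value f, so with f fixed at 1 nothing decays. The weight 0.9, the input 0.5 and the constant gate values are arbitrary assumptions; in a real LSTM the gates depend on the input and the previous hidden state.

```python
import math


def rnn_sensitivity(w, x, steps, h0=0.0):
    """d h_T / d h_0 for the scalar toy RNN h_t = tanh(w * h_{t-1} + x)."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: dh_t/dh_{t-1} = w * (1 - h_t^2)
    return grad


def lstm_cell_sensitivity(forget_gate, steps):
    """d c_T / d c_0 along the cell path c_t = f * c_{t-1} + i * g, gates held constant (toy assumption)."""
    grad = 1.0
    for _ in range(steps):
        grad *= forget_gate  # dc_t/dc_{t-1} = f when f does not depend on c
    return grad


plain_rnn = rnn_sensitivity(0.9, 0.5, 50)
lstm_f1 = lstm_cell_sensitivity(1.0, 50)
lstm_f09 = lstm_cell_sensitivity(0.9, 50)
print(plain_rnn, lstm_f1, lstm_f09)
```

The plain RNN value is vanishingly small, the LSTM path with f = 1 stays exactly 1.0, and with f = 0.9 it is 0.9^50, roughly 0.005: the forget gate is what lets the network choose how quickly to let go.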

An LSTM network is still fundamentally an RNN. Among the architectural variants built on LSTM-based RNNs, the first was the BRNN (bidirectional RNN).

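LSTM introduces the cell. Rather than calling LSTM a kind of RNN structure, it is more accurate to call it a heavily modified RNN component: replace the little circles in the RNN diagrams seen earlier with LSTM blocks, and you have what is called an LSTM. So what does a block look like?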

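Cell: our little notebook; it has a parameter called state that keeps a record of things.
Input Gate and Output Gate: act when values are fed in and read out, computing how much gets through.
Forget Gate: works like the human forgetting curve. It is precisely because we forget selectively that we can tell what matters more. In the original LSTM there was no gate in this position, just a fixed value of 1 on the connection carrying the cell state to the next time step, so old information was held too firmly and recent information could not be retained well. That is why selective forgetting is needed: by forgetting some things, the network learns which information is more important and worth remembering.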
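The cell and three gates above can be written out as a single LSTM time step. This NumPy sketch is the standard LSTM with a forget gate, using a weight layout of my own choosing (one stacked matrix W for all four pre-activations, in the assumed order input, forget, output, candidate); TensorFlow's BasicLSTMCell arranges its weights differently. Like BasicLSTMCell's forget_bias, the forget_bias argument here is added to the forget gate's pre-activation, so a freshly initialized cell leans toward remembering.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM time step.

    x: (batch, n_in); h_prev, c_prev: (batch, n_hidden)
    W: (n_in + n_hidden, 4 * n_hidden); b: (4 * n_hidden,)
    The gate order inside W and b is an assumption of this sketch.
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, f, o, g = np.split(z, 4, axis=1)
    i = sigmoid(i)                # input gate: how much new content to write
    f = sigmoid(f + forget_bias)  # forget gate: how much of the old cell state to keep
    o = sigmoid(o)                # output gate: how much of the cell to expose
    g = np.tanh(g)                # candidate content
    c = f * c_prev + i * g        # cell state ("the notebook")
    h = o * np.tanh(c)            # hidden state / output of this step
    return h, c


rng = np.random.default_rng(0)
n_in, n_hidden, batch = 3, 4, 2
W = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros((batch, n_hidden))
c = np.zeros((batch, n_hidden))
for t in range(5):
    x_t = rng.normal(size=(batch, n_in))
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

With forget_bias=1.0 and small weights, the forget gate starts out near sigmoid(1), about 0.73, so most of the cell state is carried forward at first.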

Two chapters ago we covered how to use RNN/LSTM, so if any of the operations here are unclear, go back to that chapter.

Now for the bidirectional LSTM.


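In most applications where context over time matters, such as NLP and automatic question answering, the usual implementation is a bidirectional LSTM combined with further LSTM/RNN layers, and it works very well. Many of the hyped-up systems in China seem to be built on this architecture; the names differ, but it is really the same thing.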

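As the name suggests, a bidirectional LSTM uses LSTM cells that run in both directions, so at every time step it can access both the preceding context and the following context.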

Now let's look at the structure of a BiRNN.
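The BiRNN structure can be sketched in a few lines of NumPy: one plain tanh RNN reads the sequence forward, a second, independent one reads it backward, and the two hidden states for each time step are concatenated. The sizes, the random weights and the helper names (rnn_pass, birnn, make_params) are illustrative assumptions. The check at the end perturbs only the last input and shows that the output at t = 0 changes only in its backward half, which is exactly the "can also see the following context" property.

```python
import numpy as np


def rnn_pass(xs, W_x, W_h, b):
    """Plain tanh RNN over xs of shape (T, n_in); returns hidden states of shape (T, n_hidden)."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(x @ W_x + h @ W_h + b)
        hs.append(h)
    return np.stack(hs)


def birnn(xs, fw_params, bw_params):
    """Bidirectional RNN: two independent RNNs, outputs concatenated per time step."""
    fw = rnn_pass(xs, *fw_params)              # reads x_1 ... x_T
    bw = rnn_pass(xs[::-1], *bw_params)[::-1]  # reads x_T ... x_1, flipped back into time order
    return np.concatenate([fw, bw], axis=1)    # row t = [forward h_t, backward h_t]


rng = np.random.default_rng(1)
T, n_in, H = 5, 3, 4


def make_params():
    return rng.normal(size=(n_in, H)), 0.5 * rng.normal(size=(H, H)), np.zeros(H)


fw_params, bw_params = make_params(), make_params()

xs = rng.normal(size=(T, n_in))
xs_changed = xs.copy()
xs_changed[-1] += 1.0  # perturb only the LAST input

out = birnn(xs, fw_params, bw_params)
out_changed = birnn(xs_changed, fw_params, bw_params)
fw_diff = np.abs(out[0, :H] - out_changed[0, :H]).max()  # forward half of the output at t = 0
bw_diff = np.abs(out[0, H:] - out_changed[0, H:]).max()  # backward half of the output at t = 0
print(fw_diff, bw_diff)
```

fw_diff is exactly zero, because the forward RNN at t = 0 has not yet seen the last input, while bw_diff is non-zero: the backward RNN is what gives every time step access to what comes after it.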

As discussed above, an LSTM is really just an RNN with some of its components swapped out and a cell (the memory unit) added. So a bidirectional LSTM is simply the bidirectional RNN above with every round h unit replaced by an LSTM unit.
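Swapping the circular h units for LSTM units looks like this in a NumPy sketch: the bidirectional wrapper is unchanged, only the per-direction pass now uses an LSTM cell. The weight layout (one stacked matrix, gate order input/forget/output/candidate), the sizes and the helper names are my own assumptions, not TensorFlow's internals.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_pass(xs, W, b, n_hidden, forget_bias=1.0):
    """Unidirectional LSTM over xs of shape (T, n_in); returns hidden outputs (T, n_hidden)."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outs = []
    for x in xs:
        z = np.concatenate([x, h]) @ W + b
        i, f, o, g = np.split(z, 4)  # gate order is this sketch's own convention
        c = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs)


def bidirectional_lstm(xs, fw_params, bw_params, n_hidden):
    """A BiRNN whose h units are LSTM cells: one LSTM per direction, outputs concatenated."""
    W_fw, b_fw = fw_params
    W_bw, b_bw = bw_params
    out_fw = lstm_pass(xs, W_fw, b_fw, n_hidden)
    out_bw = lstm_pass(xs[::-1], W_bw, b_bw, n_hidden)[::-1]  # re-align backward outputs in time
    return np.concatenate([out_fw, out_bw], axis=1)           # shape (T, 2 * n_hidden)


rng = np.random.default_rng(0)
T, n_in, n_hidden = 28, 28, 16


def make_params():
    W = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))
    return W, np.zeros(4 * n_hidden)


seq = rng.normal(size=(T, n_in))  # e.g. one 28x28 image read row by row
outputs = bidirectional_lstm(seq, make_params(), make_params(), n_hidden)
print(outputs.shape)
```

Two details carry over to the TensorFlow sample code. Each time step's output has width 2 * n_hidden, which is why its 'out' weight matrix is [2*n_hidden, n_classes]. And the last row, the position that outputs[-1] picks there, pairs the forward LSTM's state after the whole sequence with the backward LSTM's state after reading only the final input.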

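The difference between the bidirectional LSTM in TensorFlow and the article from two chapters ago is this: there we built the RNN directly with rnn.rnn and passed in an LSTM cell, while for the bidirectional version we use rnn.bidirectional_rnn, which takes one cell for the forward direction and one for the backward direction.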

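Everything else is basically the same as two chapters ago; swap this in and make a few small changes. If anything is unclear, go back to that chapter's LSTM/RNN material.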

Here is the sample code:

```python
# Written against the TensorFlow 0.6-era API: tensorflow.models.rnn, tf.split(split_dim, num_split, value),
# tf.initialize_all_variables and positional softmax_cross_entropy_with_logits(logits, labels).
from __future__ import print_function

import input_data
import tensorflow as tf
from tensorflow.python.ops.constant_op import constant
from tensorflow.models.rnn import rnn, rnn_cell
import numpy as np

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 28    # MNIST data input (img shape: 28*28)
n_steps = 28    # timesteps
n_hidden = 128  # hidden layer num of features
n_classes = 10  # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
# Tensorflow LSTM cell requires 2x n_hidden length (state & cell)
istate_fw = tf.placeholder("float", [None, 2*n_hidden])
istate_bw = tf.placeholder("float", [None, 2*n_hidden])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    # Hidden layer weights => 2*n_hidden because of forward + backward cells
    'hidden': tf.Variable(tf.random_normal([n_input, 2*n_hidden])),
    'out': tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([2*n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}


def BiRNN(_X, _istate_fw, _istate_bw, _weights, _biases, _batch_size, _seq_len):
    # BiRNN requires to supply sequence_length as [batch_size, int64]
    # Note: Tensorflow 0.6.0 requires BiRNN sequence_length parameter to be set
    _seq_len = tf.fill([_batch_size], constant(_seq_len, dtype=tf.int64))

    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input])  # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']

    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X)  # n_steps * (batch_size, 2*n_hidden)

    # Get lstm cell output
    outputs = rnn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, _X,
                                    initial_state_fw=_istate_fw,
                                    initial_state_bw=_istate_bw,
                                    sequence_length=_seq_len)

    # Linear activation on the inner loop's last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']


pred = BiRNN(x, istate_fw, istate_bw, weights, biases, batch_size, n_steps)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))  # Softmax loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)  # Adam Optimizer

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_xs = batch_xs.reshape((batch_size, n_steps, n_input))
        # Same feed (zero initial states for both directions) for training and evaluation
        feed = {x: batch_xs, y: batch_ys,
                istate_fw: np.zeros((batch_size, 2*n_hidden)),
                istate_bw: np.zeros((batch_size, 2*n_hidden))}
        # Fit training using batch data
        sess.run(optimizer, feed_dict=feed)
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict=feed)
            # Calculate batch loss
            loss = sess.run(cost, feed_dict=feed)
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss) +
                  ", Training Accuracy= " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    # Keep test_len equal to batch_size: sequence_length was built with batch_size entries
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={
        x: test_data, y: test_label,
        istate_fw: np.zeros((test_len, 2*n_hidden)),
        istate_bw: np.zeros((test_len, 2*n_hidden))}))
```
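To see what the transpose / reshape / matmul / split sequence inside BiRNN does to the data, this NumPy sketch repeats the same steps on a small random batch. Random numbers stand in for MNIST, the batch size is 4 instead of 128, W_hidden and b_hidden stand in for weights['hidden'] and biases['hidden'], and np.split(X, n_steps, axis=0) plays the role of the old tf.split(0, n_steps, _X). It confirms that element t of the resulting list holds row t of every image, projected to 2 * n_hidden features: the list-of-time-steps input the RNN inner loop needs.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_steps, n_input, n_hidden = 4, 28, 28, 128

# Stand-in for mnist.train.next_batch: flattened 28x28 images, shape (batch_size, 784)
flat = rng.random((batch_size, n_steps * n_input))
images = flat.reshape((batch_size, n_steps, n_input))  # 28 time steps of 28 pixels each

X = np.transpose(images, (1, 0, 2))                 # (n_steps, batch_size, n_input)
X = X.reshape(-1, n_input)                          # (n_steps * batch_size, n_input)
W_hidden = rng.normal(size=(n_input, 2 * n_hidden))  # stands in for weights['hidden']
b_hidden = rng.normal(size=2 * n_hidden)             # stands in for biases['hidden']
X = X @ W_hidden + b_hidden                         # (n_steps * batch_size, 2 * n_hidden)
steps = np.split(X, n_steps, axis=0)                # n_steps arrays of (batch_size, 2 * n_hidden)

print(len(steps), steps[0].shape)
```

The transpose is what makes the split work: after it, consecutive blocks of batch_size rows belong to the same time step, so cutting the matrix into n_steps equal pieces yields one (batch, features) slice per step.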