Author: 小景森的童年 | Source: Internet | 2023-08-31 19:39
RNN-based networks are already in wide use. This article introduces how to build RNN networks with TensorFlow, covering:
- the basic cell: tf.nn.rnn_cell
- multi-step unrolling: tf.nn.dynamic_rnn
- bidirectional multi-step unrolling: tf.nn.bidirectional_dynamic_rnn
- multi-layer networks: tf.nn.rnn_cell.MultiRNNCell
1. A Single RNN Cell
TensorFlow provides the module tf.nn.rnn_cell, which contains 10 classes:
- class BasicLSTMCell: Basic LSTM recurrent network cell.
- class BasicRNNCell: The most basic RNN cell.
- class DeviceWrapper: Operator that ensures an RNNCell runs on a particular device.
- class DropoutWrapper: Operator adding dropout to inputs and outputs of the given cell.
- class GRUCell: Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078).
- class LSTMCell: Long short-term memory unit (LSTM) recurrent network cell.
- class LSTMStateTuple: Tuple used by LSTM cells for state_size, zero_state, and output state.
- class MultiRNNCell: RNN cell composed sequentially of multiple simple cells.
- class RNNCell: Abstract object representing an RNN cell.
- class ResidualWrapper: RNNCell wrapper that ensures cell inputs are added to the outputs.
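The wrapper classes compose with the concrete cells. As a minimal sketch of this (my addition, assuming TF 1.x; the keep probability of 0.8 is purely illustrative):
import tensorflow as tf
# Wrap a GRUCell so that dropout is applied to its outputs
cell = tf.nn.rnn_cell.GRUCell(num_units=128)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.8)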
1.1 BasicRNNCell
We take BasicRNNCell as an example; it is the simplest RNN cell.
It can be referred to in either of the following two ways, which are aliases of each other:
tf.nn.rnn_cell.BasicRNNCell
tf.contrib.rnn.BasicRNNCell
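A quick way to check the alias relationship (a sketch; this assumes a recent TF 1.x release where both names point to the same implementation):
import tensorflow as tf
# Expected to print True when both names resolve to the same class
print(tf.nn.rnn_cell.BasicRNNCell is tf.contrib.rnn.BasicRNNCell)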
We have introduced the RNN recurrence many times before (the diagram from the original post is omitted here). For an LSTM cell the state carried between time steps is a pair (c_t, h_t), so given an initial state the hidden state we get back is a tuple. We build it as shown in the following code:
import tensorflow as tf

batch_size = 32   # batch size
input_size = 100  # dimension of the input vector x_t
state_size = 128  # dimension of the hidden state h_t

# Input for a single time step, shape = (batch_size, input_size)
inputs = tf.random_normal(shape=[batch_size, input_size], dtype=tf.float32)
# Create a BasicLSTMCell; num_units is the state_size
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=state_size)
print(lstm_cell.state_size)
# The initial state is also a tuple of tensors, each of shape (batch_size, state_size)
h0 = lstm_cell.zero_state(batch_size, tf.float32)
print(type(h0))
# Run one step of computation to get output and h1
output, h1 = lstm_cell(inputs, h0)
print(h1.h, h1.h.shape)
print(h1.c, h1.c.shape)
------------------------------------------------------------------------------
Output:
LSTMStateTuple(c=128, h=128)
<class 'tensorflow.python.ops.rnn_cell_impl.LSTMStateTuple'>
Tensor("basic_lstm_cell_9/Mul_2:0", shape=(32, 128), dtype=float32) (32, 128)
Tensor("basic_lstm_cell_9/Add_1:0", shape=(32, 128), dtype=float32) (32, 128)
Here the state is an LSTMStateTuple containing c and h; both hidden-state vectors have dimension state_size = 128.
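Although this subsection is titled BasicRNNCell, the example above actually uses an LSTM cell. For completeness, here is a minimal sketch (my addition, reusing the sizes above) of the same single-step call with BasicRNNCell, whose state is a single tensor rather than a tuple:
import tensorflow as tf

batch_size = 32
input_size = 100
state_size = 128

inputs = tf.random_normal(shape=[batch_size, input_size], dtype=tf.float32)
# BasicRNNCell keeps a single state tensor of shape (batch_size, state_size)
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=state_size)
h0 = rnn_cell.zero_state(batch_size, tf.float32)
# For BasicRNNCell the output and the new state are the same tensor
output, h1 = rnn_cell(inputs, h0)
print(rnn_cell.state_size)  # 128
print(h1.shape)             # (32, 128)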
2. tf.nn.dynamic_rnn
With only an RNNCell, each call advances a single step in the sequence (i.e. in time), so a sequence of length T requires T calls. In Tensorflow实战(1): 实现深层循环神经网络 I introduced the following approach:
# Define the LSTM structure; in TensorFlow one line is enough to build a complete LSTM cell
lstm = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)  # hidden_size is the number of units in the LSTM cell
# Initialize the LSTM state to all zeros; BasicLSTMCell provides zero_state for this
state = lstm.zero_state(batch_size, tf.float32)
# Define the loss
loss = 0
for i in range(num_steps):
    if i > 0: tf.get_variable_scope().reuse_variables()
    # Each iteration processes one time step: feed the current input and the previous
    # state into the LSTM to get the current output (h_t) and the updated state
    # (h_t and c_t); lstm_output is passed on to other layers, state to the next step
    lstm_output, state = lstm(current_input, state)
    # Pass the LSTM output through a fully connected layer to get the final output
    final_output = fully_connected(lstm_output)
    # Compute the loss at the current time step
    loss += calc_loss(final_output, expected_output)
# Run the optimization
.......
Clearly this approach is cumbersome. TensorFlow therefore provides the function tf.nn.dynamic_rnn to perform the repeated calls for us. Its signature is:
tf.nn.dynamic_rnn(
cell,
inputs,
sequence_length=None,
initial_state=None,
dtype=None,
parallel_iterations=None,
swap_memory=False,
time_major=False,
scope=None
)
where:
- cell must be an instance of RNNCell, e.g. BasicLSTMCell
- inputs is the RNN input; if time_major == False its shape is (batch_size, max_time, input_size), and if time_major == True its shape is (max_time, batch_size, input_size)
- sequence_length is optional, an int32/int64 vector of shape [batch_size] giving the valid length of each sequence in the batch; past that length the outputs are zeroed and the state is copied through unchanged
- initial_state is the initial state handed to the RNN
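As a minimal sketch of a single-direction call (my addition, reusing the sizes from the other examples in this article):
import tensorflow as tf

batch_size = 32
input_size = 100
state_size = 128
time_steps = 10

# time_major=False, so inputs has shape (batch_size, time_steps, input_size)
inputs = tf.random_normal(shape=[batch_size, time_steps, input_size], dtype=tf.float32)
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=state_size)
# Optional per-example valid lengths; here every sequence uses the full length
seq_len = tf.fill([batch_size], time_steps)
# dynamic_rnn unrolls the cell over all time steps in a single call
outputs, state = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=inputs,
                                   sequence_length=seq_len, dtype=tf.float32)
print(outputs)  # shape (32, 10, 128): the output at every time step
print(state)    # LSTMStateTuple of c and h, each of shape (32, 128): the final state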
3. Bidirectional RNN tf.nn.bidirectional_dynamic_rnn
We now turn to the bidirectional case and its TensorFlow implementation. tf.nn.bidirectional_dynamic_rnn is used as follows:
tf.nn.bidirectional_dynamic_rnn(
cell_fw, # forward cell
cell_bw, # backward cell
inputs, # inputs
sequence_length=None, # valid length of each input sequence
initial_state_fw=None, # initial state of the forward cell
initial_state_bw=None, # initial state of the backward cell
dtype=None,
parallel_iterations=None,
swap_memory=False,
time_major=False,
scope=None
)
It returns a tuple (outputs, output_states), where
- outputs is a tuple (output_fw, output_bw) containing the per-step outputs of the forward and backward RNNs, each of shape (batch_size, time_steps, state_size), i.e. the state output at every time step for every sequence in the batch
- output_states is a tuple (output_state_fw, output_state_bw) containing the final states of the forward and backward RNNs, each of shape (batch_size, state_size)
Usage is shown in the following code:
import tensorflow as tf

batch_size = 32   # batch size
input_size = 100  # dimension of the input vector x_t
state_size = 128  # dimension of the hidden state h_t
time_steps = 10   # sequence length

inputs = tf.random_normal(shape=[batch_size, time_steps, input_size], dtype=tf.float32)
print(inputs.shape)
lstm_cell_fw = tf.nn.rnn_cell.BasicLSTMCell(num_units=state_size, state_is_tuple=True)
lstm_cell_bw = tf.nn.rnn_cell.BasicLSTMCell(num_units=state_size, state_is_tuple=True)
outputs, state = tf.nn.bidirectional_dynamic_rnn(lstm_cell_fw, lstm_cell_bw,
                                                 inputs, dtype=tf.float32)
print(outputs)
print(state)
------------------------------------------------------------------------------
Output:
(32, 10, 100)
(<tf.Tensor 'bidirectional_rnn/fw/fw/transpose_1:0' shape=(32, 10, 128) dtype=float32>, <tf.Tensor 'ReverseV2:0' shape=(32, 10, 128) dtype=float32>)
(LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_3:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_4:0' shape=(32, 128) dtype=float32>), LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_3:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_4:0' shape=(32, 128) dtype=float32>))
The output consists of:
- outputs, a tuple containing
  - output_fw, shape (32, 10, 128)
  - output_bw, shape (32, 10, 128)
- output_states, a tuple containing
  - the forward LSTMStateTuple(c, h), each tensor of shape (32, 128)
  - the backward LSTMStateTuple(c, h), each tensor of shape (32, 128)
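In practice the forward and backward outputs are often concatenated along the feature axis before being fed to later layers. A minimal sketch (my addition, continuing from the code above):
output_fw, output_bw = outputs
# Concatenating along the last axis gives shape (batch_size, time_steps, 2 * state_size)
bi_outputs = tf.concat([output_fw, output_bw], axis=-1)
print(bi_outputs.shape)  # (32, 10, 256)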
4. Multi-layer RNN tf.nn.rnn_cell.MultiRNNCell
The learning capacity of a single-layer RNN is often limited, so multi-layer networks are also widely used. TensorFlow provides tf.nn.rnn_cell.MultiRNNCell for stacking multiple RNN cells. Usage is shown below:
import tensorflow as tf

batch_size = 32   # batch size
input_size = 100  # dimension of the input vector x_t
state_size = 128  # dimension of the hidden state h_t
time_steps = 10   # sequence length

# Build the input
inputs = tf.random_normal(shape=[batch_size, time_steps, input_size], dtype=tf.float32)
# Build the individual cells
cells = [tf.nn.rnn_cell.BasicRNNCell(num_units=state_size) for _ in range(3)]
# MultiRNNCell combines the cells into a single cell; here we stack 3 layers
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
print(multi_rnn_cell.state_size)
# Call dynamic_rnn to run the computation over the time dimension
outputs, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell, inputs=inputs, dtype=tf.float32)
print(outputs)
print(state)
------------------------------------------------------------------------------
Output:
(128, 128, 128)
Tensor("rnn/transpose_1:0", shape=(32, 10, 128), dtype=float32)
(a tuple of three state tensors, one per BasicRNNCell layer, each of shape (32, 128))
The outputs tensor has the same form as the single-layer dynamic_rnn output in Section 2, while the state is now a tuple with one entry per layer.
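All of the snippets above only build the computation graph; to evaluate the tensors you must run them in a session. A minimal sketch (my addition, standard TF 1.x usage, continuing from the multi-layer example):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    outputs_val, state_val = sess.run([outputs, state])
    print(outputs_val.shape)  # (32, 10, 128)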