当前位置: 开发笔记 > 编程语言 > 正文

端到端的TTS深度学习模型tacotron(中文语音合成)

作者：plumscape_191 | 来源：互联网 | 2023-02-03 22:30

TACONTRON:AFullyEnd-to-EndText-To-SpeechSynthesisModel通常的TTS模型包含许多模块，例如文本分析，声学模型，音频合成等

TACONTRON: A Fully End-to-End Text-To-Speech Synthesis Model
通常的TTS模型包含许多模块，例如文本分析，声学模型，音频合成等。而构建这些模块需要大量专业相关的知识以及特征工程，这将花费大量的时间和精力，而且各个模块之间组合在一起也会产生很多新的问题。TACOTRON是一个端到端的深度学习TTS模型，它可以说是将这些模块都放在了一个黑箱子里，我们不用花费大量的时间去了解TTS中需要用的的模块或者领域知识，直接用深度学习的方法训练出一个TTS模型，模型训练完成后，给定input,模型就能生成对应的音频。

TACOTRON是一个端到端的TTS模型，模型核心是seq2seq + attention。模型的输入为一系列文本字向量，输出spectrogram frame, 然后在使用Griffin_lim算法生成对应音频。模型结构如下图：

模型结构：

CBHG模块
encoder
decoder
post-processing net and waveform synthesis

Character Embedding

我们知道在训练模型的时候，我们拿到的数据是一条长短不一的(text, audio)的数据，深度学习的核心其实就是大量的矩阵乘法，对于模型而言，文本类型的数据是不被接受的，所以这里我们需要先把文本转化为对应的向量。这里涉及到如下几个操作

构造字典

因为纯文本数据是没法作为深度学习输入的，所以我们首先得把文本转化为一个个对应的向量，这里我使用字典下标作为字典中每一个字对应的id, 然后每一条文本就可以通过遍历字典转化成其对应的向量了。所以字典主要是应用在将文本转化成其在字典中对应的id, 根据语料库构造，这里我使用的方法是根据语料库中的字频构造字典(我使用的是基于语料库中的字构造字典，有的人可能会先分词，基于词构造。不使用基于词是现在就算是最好的分词都会有一些误分词问题，而且基于字还可以在一定程度上缓解OOV的问题)。下面是构造字典的代码：

def create_vocabulary(vocabulary_path, data_paths, max_vocabulary_size, tokenizer=None):
    if not os.path.exists(vocabulary_path):
        print("Creating vocabulary %s from data %s" % (vocabulary_path, str(data_paths)))
        vocab = defaultdict(int)
        for path in data_paths:
            with codecs.open(path, mode="r", encoding="utf-8") as fr:
                counter = 0
                for line in fr:
                    counter += 1
                    if counter % 100000 == 0:
                        print(" processing line %d" % (counter,))
                    tokens = tokenizer(line)
                    for w in tokens:
                        word = re.sub(_DIGIT_RE, " ", w)
# word = w
                        vocab[word] += 1

        vocab_list = sorted(vocab, key=vocab.get, reverse=True)
        print("Vocabulary size: %d" % len(vocab_list))
        if len(vocab_list) > max_vocabulary_size:
            vocab_list = vocab_list[:max_vocabulary_size]
        with codecs.open(vocabulary_path, mode="w", encoding="utf-8") as vocab_file:
            for w in vocab_list:
                vocab_file.write(w + "\n")

然后我们就可以将文本数据转化成对应的向量作为模型的输入。

embed layer

光有对应的id，没法很好的表征文本信息，这里就涉及到构造词向量，关于词向量不在说明，网上有很多资料，模型中使用词嵌入层，通过训练不断的学习到语料库中的每个字的词向量，代码如下:

def embed(inputs, vocab_size, num_units, zero_pad=True, scope="embedding", reuse=None):
    '''Embeds a given tensor. Args: inputs: A `Tensor` with type `int32` or `int64` containing the ids to be looked up in `lookup table`. vocab_size: An int. Vocabulary size. num_units: An int. Number of embedding hidden units. zero_pad: A boolean. If True, all the values of the fist row (id 0) should be constant zeros. scope: Optional scope for `variable_scope`. reuse: Boolean, whether to reuse the weights of a previous layer by the same name. Returns: A `Tensor` with one more rank than inputs's. The last dimesionality should be `num_units`. '''
    with tf.variable_scope(scope, reuse=reuse):
        lookup_table = tf.get_variable('lookup_table', 
                                       dtype=tf.float32, 
                                       shape=[vocab_size, num_units],
                                       initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.01))
        if zero_pad:
            lookup_table = tf.concat((tf.zeros(shape=[1, num_units]), 
                                      lookup_table[1:, :]), 0)
    return tf.nn.embedding_lookup(lookup_table, inputs)

值得注意的是，这里是随机初始化词嵌入层，另一种方法是引入预先在语料库训练的词向量(word2vec)，可以在一定程度上提升模型的效果。

音频特征提取

对于音频，我们主要是提取出它的melspectrogram音频特征。MFCC是一种比较常用的音频特征，对于声音来说，它其实是一个一维的时域信号，直观上很难看出频域的变化规律，我们知道，可以使用傅里叶变化，得到它的频域信息，但是又丢失了时域信息，无法看到频域随时域的变化，这样就没法很好的描述声音，为了解决这个问题，很多时频分析手段应运而生。短时傅里叶，小波，Wigner分布等都是常用的时频域分析方法。这里我们使用短时傅里叶。

所谓短时傅里叶变换，顾名思义，是对短时的信号做傅里叶变化。那么短时的信号怎么得到的? 是长时的信号分帧得来的。这么一想，STFT的原理非常简单，把一段长信号分帧(傅里叶变换适用于分析平稳的信号。我们假设在较短的时间跨度范围内，语音信号的变换是平坦的，这就是为什么要分帧的原因)、加窗，再对每一帧做傅里叶变换（FFT），最后把每一帧的结果沿另一个维度堆叠起来，得到类似于一幅图的二维信号形式。如果我们原始信号是声音信号，那么通过STFT展开得到的二维信号就是所谓的声谱图。

声谱图往往是很大的一张图，为了得到合适大小的声音特征，往往把它通过梅尔标度滤波器组（mel-scale filter banks），变换为梅尔频谱。在梅尔频谱上做倒谱分析（取对数，做DCT变换）就得到了梅尔倒谱。音频MFCC特征提取代码如下，这里主要使用第三方库librosa提取MFCC特征：

def get_spectrograms(sound_file): 
    '''Extracts melspectrogram and log magnitude from given `sound_file`. Args: sound_file: A string. Full path of a sound file. Returns: Transposed S: A 2d array. A transposed melspectrogram with shape of (T, n_mels) Transposed magnitude: A 2d array.Has shape of (T, 1+hp.n_fft//2) '''
    # Loading sound file
    y, sr = librosa.load(sound_file, sr=hp.sr) # or set sr to hp.sr.

    # stft. D: (1+n_fft//2, T)
    D = librosa.stft(y=y,
                     n_fft=hp.n_fft, 
                     hop_length=hp.hop_length, 
                     win_length=hp.win_length) 

    # magnitude spectrogram
    magnitude = np.abs(D) #(1+n_fft/2, T)

    # power spectrogram
    power = magnitude**2 #(1+n_fft/2, T) 

    # mel spectrogram
    S = librosa.feature.melspectrogram(S=power, n_mels=hp.n_mels) #(n_mels, T)

    return np.transpose(S.astype(np.float32)), np.transpose(magnitude.astype(np.float32)) # (T, n_mels), (T, 1+n_fft/2)

Encoder pre-net module

embeding layer之后是一个encoder pre-net模块，它有两个隐藏层，层与层之间的连接均是全连接；第一层的隐藏单元数目与输入单元数目一致，第二层的隐藏单元数目为第一层的一半；两个隐藏层采用的激活函数均为ReLu，并保持0.5的dropout来提高泛化能力
这里写图片描述

def prenet(inputs, is_training=True, scope="prenet", reuse=None):
    '''Prenet for Encoder and Decoder. Args: inputs: A 3D tensor of shape [N, T, hp.embed_size]. is_training: A boolean. scope: Optional scope for `variable_scope`. reuse: Boolean, whether to reuse the weights of a previous layer by the same name. Returns: A 3D tensor of shape [N, T, num_units/2]. '''
    with tf.variable_scope(scope, reuse=reuse):
        outputs = tf.layers.dense(inputs, units=hp.embed_size, activation=tf.nn.relu, name="dense1")
        outputs = tf.nn.dropout(outputs, keep_prob=.5 if is_training==True else 1., name="dropout1")
        outputs = tf.layers.dense(outputs, units=hp.embed_size//2, activation=tf.nn.relu, name="dense2")
        outputs = tf.nn.dropout(outputs, keep_prob=.5 if is_training==True else 1., name="dropout2") 
    return outputs # (N, T, num_units/2)

CBHG Module

CBHG模块由1-D convolution bank ，highway network ，bidirectional GRU 组成。它的功能是从输入中提取有价值的特征，有利于提高模型的泛化能力。

1-D convolution bank：

输入序列首先会经过一个卷积层，注意这个卷积层，它有K个大小不同的1维的filter，其中filter的大小为1,2,3…K。这些大小不同的卷积核提取了长度不同的上下文信息。其实就是n-gram语言模型的思想，K的不同对应了不同的gram, 例如unigrams, bigrams, up to K-grams，然后，将经过不同大小的k个卷积核的输出堆积在一起（注意：在做卷积时，运用了padding，因此这k个卷积核输出的大小均是相同的），也就是把不同的gram提取到的上下文信息组合在一起，下一层为最大池化层，stride为1，width为2。代码如下：

def conv1d_banks(inputs, K=16, is_training=True, scope="conv1d_banks", reuse=None):
    '''Applies a series of conv1d separately. Args: inputs: A 3d tensor with shape of [N, T, C] K: An int. The size of conv1d banks. That is, The `inputs` are convolved with K filters: 1, 2, ..., K. is_training: A boolean. This is passed to an argument of `batch_normalize`. Returns: A 3d tensor with shape of [N, T, K*Hp.embed_size//2]. '''
    with tf.variable_scope(scope, reuse=reuse):
        outputs = conv1d(inputs, hp.embed_size//2, 1) # k=1
        for k in range(2, K+1): # k = 2...K
            with tf.variable_scope("num_{}".format(k)):
                output = conv1d(inputs, hp.embed_size // 2, k)
                outputs = tf.concat((outputs, output), -1)
        outputs = normalize(outputs, type=hp.norm_type, is_training=is_training, 
                            activation_fn=tf.nn.relu)
    return outputs # (N, T, Hp.embed_size//2*K)

注意ouputs = normalize(outputs, type=hp.norm_type, is_training=is_training, activation_fn=tf.nn.relu)这行代码，这里对output做了batch normalization处理(每个mini_batch中)，至于BN的作用，网上有很多资料，我这里就简单说下，我们知道不加BN的神经网络，每层都会有一个非线性话操作，如sigmoid或者RELU，这样一方面每层的输入分布和上一层都不相同，这样会使得模型的收敛和预测能力下降。其次，对于深层网络而言，这样会带来梯度弥散和梯度爆炸的问题，因为模型在back propogation的时候是依据链式法则，深度越深，问题越严重，BN引入其他参数抹去了w的scale的影响。公式网上都有，这里就不在搬了。

经过池化之后，会再经过两层一维的卷积层。第一个卷积层的filter大小为3，stride为1，采用的激活函数为ReLu；第二个卷积层的filter大小为3，stride为1，没有采用激活函数（在这两个一维的卷积层之间都会进行batch normalization）。

residual connection：

经过卷积层之后，会进行一个residual connection。也就是把卷积层输出的和embeding之后的序列相加起来。使用residual connection也是一个缓解神经网络太深带来的梯度弥散问题的方法。我们知道，在训练神经网络的时候，一个合适的layer size是很重要的，网络太深，会带来梯度弥散的问题，太浅又不能很好的学到特征，residual connection可以缓解网络太深带来的问题，就是把输入和经过卷积后结果相加，这样可以确保经过多层卷积后，没有丢失太多之前输入的信息。代码如下:

enc += prenet_out # (N, T, E/2) # residual connections

这里就是把前面prenet层后经过多层卷积后的结果和prenet的输出相加。

highway network：

下一层输入到highway layers，highway nets的每一层结构为：把输入同时放入到两个一层的全连接网络中，这两个网络的激活函数分别采用了ReLu和sigmoid函数，假定输入为input，ReLu的输出为output1，sigmoid的输出为output2，那么highway layer的输出为output=output1∗output2+input∗（1−output2)。论文中使用了4层highway layer。

代码如下：

def highwaynet(inputs, num_units=None, scope="highwaynet", reuse=None):
    '''Highway networks, see https://arxiv.org/abs/1505.00387 Args: inputs: A 3D tensor of shape [N, T, W]. num_units: An int or `None`. Specifies the number of units in the highway layer or uses the input size if `None`. scope: Optional scope for `variable_scope`. reuse: Boolean, whether to reuse the weights of a previous layer by the same name. Returns: A 3D tensor of shape [N, T, W]. '''
    if not num_units:
        num_units = inputs.get_shape()[-1]

    with tf.variable_scope(scope, reuse=reuse):
        H = tf.layers.dense(inputs, units=num_units, activation=tf.nn.relu, name="dense1")
        T = tf.layers.dense(inputs, units=num_units, activation=tf.nn.sigmoid, name="dense2")
        C = 1. - T
        outputs = H * T + inputs * C
    return outputs

从代码中我们也能看出highway network公式为：
这里写图片描述
其中C等于1-T，x为输入， y 为对应的输出，T为transfer gate，C为carry gate，其实就是让网络的输出由两部分组成，分别是网络的直接输入以及输入变形后的部分。为什么要使用highway network的节奏呢，其实说白了也是一种减少缓解网络加深带来过拟合问题，以及减少较深网络的训练难度的一个trick。它主要受到LSTM门限机制的启发。

GRU：

然后将输出输入到双向的GRU中，从GRU中输出的结果就是encoder的输出。GRU是RNN的一个变体，它和LSTM一样都使用了门限机制，不同的是它只有更新门和重置门。公式如下：
这里写图片描述

Reset gate

r(t) 负责决定h(t−1) 对new memory h^(t) 的重要性有多大，如果r(t)约等于0 的话，h(t−1) 就不会传递给new memory h^(t)

new memory

h^(t) 是对新的输入x(t) 和上一时刻的hidden state h(t−1) 的总结。计算总结出的新的向量h^(t) 包含上文信息和新的输入x(t).

Update gate

z(t) 负责决定传递多少ht−1给ht 。如果z(t) 约等于1的话，ht−1 几乎会直接复制给ht ，相反，如果z(t) 约等于0， new memory h^(t) 直接传递给ht.

Hidden state:

h(t) 由 h(t−1) 和)h^(t) 相加得到，两者的权重由update gate z(t) 控制。

代码如下:

def gru(inputs, num_units=None, bidirection=False, scope="gru", reuse=None):
    '''Applies a GRU. Args: inputs: A 3d tensor with shape of [N, T, C]. num_units: An int. The number of hidden units. bidirection: A boolean. If True, bidirectional results are concatenated. scope: Optional scope for `variable_scope`. reuse: Boolean, whether to reuse the weights of a previous layer by the same name. Returns: If bidirection is True, a 3d tensor with shape of [N, T, 2*num_units], otherwise [N, T, num_units]. '''
    with tf.variable_scope(scope, reuse=reuse):
        if num_units is None:
            num_units = inputs.get_shape().as_list[-1]

        cell = tf.contrib.rnn.GRUCell(num_units)  
        if bidirection: 
            cell_bw = tf.contrib.rnn.GRUCell(num_units)
            outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell, cell_bw, inputs, 
                                                         dtype=tf.float32)
            return tf.concat(outputs, 2)  
        else:
            outputs, _ = tf.nn.dynamic_rnn(cell, inputs, 
                                           dtype=tf.float32)
            return outputs

到这里，CBHG的部分（encoder）就总结完毕了，CBHG的代码：

def encode(inputs, is_training=True, scope="encoder", reuse=None):
    ''' Args: inputs: A 2d tensor with shape of [N, T], dtype of int32. seqlens: A 1d tensor with shape of [N,], dtype of int32. masks: A 3d tensor with shape of [N, T, 1], dtype of float32. is_training: Whether or not the layer is in training mode. scope: Optional scope for `variable_scope` reuse: Boolean, whether to reuse the weights of a previous layer by the same name. Returns: A collection of Hidden vectors, whose shape is (N, T, E). '''
    with tf.variable_scope(scope, reuse=reuse):
        # Load vocabulary 
# char2idx, idx2char = load_vocab()
        vocab, revocab = load_vocab()

        # Character Embedding
        inputs = embed(inputs, len(vocab), hp.embed_size) # (N, T, E) 

        # Encoder pre-net
        prenet_out = prenet(inputs, is_training=is_training) # (N, T, E/2)

        # Encoder CBHG 
        ## Conv1D bank 
        enc = conv1d_banks(prenet_out, K=hp.encoder_num_banks, is_training=is_training) # (N, T, K * E / 2)

        ### Max pooling
        enc = tf.layers.max_pooling1d(enc, 2, 1, padding="same")  # (N, T, K * E / 2)

        ### Conv1D projections
        enc = conv1d(enc, hp.embed_size//2, 3, scope="conv1d_1") # (N, T, E/2)
        enc = normalize(enc, type=hp.norm_type, is_training=is_training, 
                            activation_fn=tf.nn.relu, scope="norm1")
        enc = conv1d(enc, hp.embed_size//2, 3, scope="conv1d_2") # (N, T, E/2)
        enc = normalize(enc, type=hp.norm_type, is_training=is_training, 
                            activation_fn=None, scope="norm2")
        enc += prenet_out # (N, T, E/2) # residual connections

        ### Highway Nets
        for i in range(hp.num_highwaynet_blocks):
            enc = highwaynet(enc, num_units=hp.embed_size//2, 
                                 scope='highwaynet_{}'.format(i)) # (N, T, E/2)

        ### Bidirectional GRU
        memory = gru(enc, hp.embed_size//2, True) # (N, T, E)

    return memory

decoder模块

decoder模块主要分为三部分：
- pre-net
- Attention-RNN
- Decoder-RNN

Pre-net的结构与encoder中的pre-net相同，主要是对输入做一些非线性变换。

Attention-RNN的结构为一层包含256个GRU的RNN，它将pre-net的输出和attention的输出作为输入，经过GRU单元后输出到decoder-RNN中。

Decode-RNN为两层residual GRU，它的输出为输入与经过GRU单元输出之和。每层同样包含了256个GRU单元。第一步decoder的输入为0矩阵，之后都会把第t步的输出作为第t+1步的输入。(这里paper中使用了一个trick，就是每次decoder的时候，不仅仅预测1帧的数据，而是预测多个非重叠的帧。因为就像我们前面说到的提取音频特征的时候，我们会先分帧，相邻的帧其实是有一定的关联性的，所以每个字符在发音的时候，可能对应了多个帧，因此每个GRU单元输出为多个帧的音频文件。)

为什么这样做呢：
- 减小了model size，比如当前需要预测n个帧，如果每次一个帧的话，就对应了n个GRU，而如果每次预测r个帧的话，只需要n/r个GRU，从而减小了模型的复杂度
- 减少了模型训练和预测的时间
- 提高收敛的速度
其实下两点好处都是第一点好处带来的。代码如下：

def decode1(decoder_inputs, memory, is_training=True, scope="decoder1", reuse=None):
    '''
    Args:
      decoder_inputs: A 3d tensor with shape of [N, T', C'], where C'=hp.n_mels*hp.r, 
        dtype of float32. Shifted melspectrogram of sound files. 
      memory: A 3d tensor with shape of [N, T, C], where C=hp.embed_size.
      is_training: Whether or not the layer is in training mode.
      scope: Optional scope for `variable_scope`
      reuse: Boolean, whether to reuse the weights of a previous layer
        by the same name.

    Returns
      Predicted melspectrogram tensor with shape of [N, T', C'].
    '''
    with tf.variable_scope(scope, reuse=reuse):
        # Decoder pre-net
        dec = prenet(decoder_inputs, is_training=is_training) # (N, T', E/2)

        # Attention RNN
        dec = attention_decoder(dec, memory, num_units=hp.embed_size) # (N, T', E)

        # Decoder RNNs
        dec += gru(dec, hp.embed_size, False, scope="decoder_gru1") # (N, T', E)
        dec += gru(dec, hp.embed_size, False, scope="decoder_gru2") # (N, T', E)

        # Outputs => (N, T', hp.n_mels*hp.r)
        out_dim = decoder_inputs.get_shape().as_list()[-1]
        outputs = tf.layers.dense(dec, out_dim) 

    return outputs

post-processing

和seq2seq网络不通的是，tacotron在decoder-RNN输出之后并没有直将其作为输出通过Griffin-Lim算法合成音频，而是添加了一层post-processing模块。

为什么要添加这一层呢？
首先是因为我们使用了Griffin-Lim重建算法，根据频谱生成音频，Griffin-Lim原理是：我们知道相位是描述波形变化的，我们从频谱生成音频的时候，需要考虑连续帧之间相位变化的规律，如果找不到这个规律，生成的信号和原来的信号肯定是不一样的，Griffin Lim算法解决的就是如何不弄坏左右相邻的幅度谱和自身幅度谱的情况下，求一个近似的相位，因为相位最差和最好情况下天壤之别，所有应该会有一个相位变化的迭代方案会比上一次更好一点，而Griffin Lim算法找到了这个方案。这里说了这么多，其实就是Griffin-Lim算法需要看到所有的帧。post-processing可以在一个线性频率范围内预测幅度谱（spectral magnitude)。

其次，post-processing能看到整个解码的序列，而不像seq2seq那样，只能从左至右的运行。它能够通过正向传播和反向传播的结果来修正每一帧的预测错误。

论文中使用了CBHG的结构来作为post-processing net，前面已经详细介绍过。代码如下：

def decode2(inputs, is_training=True, scope="decoder2", reuse=None):
    ''' Args: inputs: A 3d tensor with shape of [N, T', C'], where C'=hp.n_mels*hp.r, dtype of float32. Log magnitude spectrogram of sound files. is_training: Whether or not the layer is in training mode. scope: Optional scope for `variable_scope` reuse: Boolean, whether to reuse the weights of a previous layer by the same name. Returns Predicted magnitude spectrogram tensor with shape of [N, T', C''], where C'' = (1+hp.n_fft//2)*hp.r. '''
    with tf.variable_scope(scope, reuse=reuse):
        # Decoder pre-net
        prenet_out = prenet(inputs, is_training=is_training) # (N, T'', E/2)

        # Decoder Post-processing net = CBHG
        ## Conv1D bank
        dec = conv1d_banks(prenet_out, K=hp.decoder_num_banks, is_training=is_training) # (N, T', E*K/2)

        ## Max pooling
        dec = tf.layers.max_pooling1d(dec, 2, 1, padding="same") # (N, T', E*K/2)

        ## Conv1D projections
        dec = conv1d(dec, hp.embed_size, 3, scope="conv1d_1") # (N, T', E)
        dec = normalize(dec, type=hp.norm_type, is_training=is_training, 
                            activation_fn=tf.nn.relu, scope="norm1")
        dec = conv1d(dec, hp.embed_size//2, 3, scope="conv1d_2") # (N, T', E/2)
        dec = normalize(dec, type=hp.norm_type, is_training=is_training, 
                            activation_fn=None, scope="norm2")
        dec += prenet_out

        ## Highway Nets
        for i in range(4):
            dec = highwaynet(dec, num_units=hp.embed_size//2, 
                                 scope='highwaynet_{}'.format(i)) # (N, T, E/2)

        ## Bidirectional GRU 
        dec = gru(dec, hp.embed_size//2, True) # (N, T', E) 

        # Outputs => (N, T', (1+hp.n_fft//2)*hp.r)
        out_dim = (1+hp.n_fft//2)*hp.r
        outputs = tf.layers.dense(dec, out_dim)

    return outputs

最后使用Griffin-Lim算法来将post-processing net的输出合成为语音。到这里就基本结束了，感兴趣的话可以看下论文加深理解: https://arxiv.org/pdf/1703.10135.pdf

推荐阅读

char
电话号码的字母组合解题思路和代码示例

本文介绍了力扣题目《电话号码的字母组合》的解题思路和代码示例。通过使用哈希表和递归求解的方法，可以将给定的电话号码转换为对应的字母组合。详细的解题思路和代码示例可以帮助读者更好地理解和实现该题目。 ... [详细]

蜡笔小新 2023-12-14 18:50:22
text
CSS3选择器的使用方法详解，提高Web开发效率和精准度

本文详细介绍了CSS3新增的选择器方法，包括属性选择器的使用。通过CSS3选择器，可以提高Web开发的效率和精准度，使得查找元素更加方便和快捷。同时，本文还对属性选择器的各种用法进行了详细解释，并给出了相应的代码示例。通过学习本文，读者可以更好地掌握CSS3选择器的使用方法，提升自己的Web开发能力。 ... [详细]

蜡笔小新 2023-12-14 14:37:52
char
《数据结构》学习笔记3——串匹配算法性能评估

本文主要讨论串匹配算法的性能评估，包括模式匹配、字符种类数量、算法复杂度等内容。通过借助C++中的头文件和库，可以实现对串的匹配操作。其中蛮力算法的复杂度为O(m*n)，通过随机取出长度为m的子串作为模式P，在文本T中进行匹配，统计平均复杂度。对于成功和失败的匹配分别进行测试，分析其平均复杂度。详情请参考相关学习资源。 ... [详细]

蜡笔小新 2023-12-13 16:16:05
char
自动轮播，反转播放的ViewPagerAdapter的使用方法和效果展示

本文介绍了如何使用自动轮播、反转播放的ViewPagerAdapter，并展示了其效果。该ViewPagerAdapter支持无限循环、触摸暂停、切换缩放等功能。同时提供了使用GIF.gif的示例和github地址。通过LoopFragmentPagerAdapter类的getActualCount、getActualItem和getActualPagerTitle方法可以实现自定义的循环效果和标题展示。 ... [详细]

蜡笔小新 2023-12-13 14:41:31
sum
Support Paged.JS for automatic hugo resume> PDF conversion.

FeatureRequestIsyourfeaturerequestrelatedtoaproblem?Please ... [详细]

蜡笔小新 2023-12-13 11:52:05
text
【机器学习手册】日期和时区操作的重要性及应用

本文介绍了机器学习手册中关于日期和时区操作的重要性以及其在实际应用中的作用。文章以一个故事为背景，描述了学童们面对老先生的教导时的反应，以及上官如在这个过程中的表现。同时，文章也提到了顾慎为对上官如的恨意以及他们之间的矛盾源于早年的结局。最后，文章强调了日期和时区操作在机器学习中的重要性，并指出了其在实际应用中的作用和意义。 ... [详细]

蜡笔小新 2023-12-12 17:40:14
instance
iOS实现UITextField+Limit的字符限制方法

本文介绍了在iOS开发中使用UITextField实现字符限制的方法，包括利用代理方法和使用BNTextField-Limit库的实现策略。通过这些方法，开发者可以方便地限制UITextField的字符个数和输入规则。 ... [详细]

蜡笔小新 2023-12-12 09:50:30
tree
欢乐的票圈重构之旅——RecyclerView的头尾布局增加

项目重构的Git地址：https:github.comrazerdpFriendCircletreemain-dev项目同步更新的文集：http:www.jianshu.comno ... [详细]

蜡笔小新 2023-12-11 19:09:56
tree
EzPP 0.2发布，新增YAML布局渲染功能

EzPP发布了0.2.1版本，新增了YAML布局渲染功能，可以将YAML文件渲染为图片，并且可以复用YAML作为模版，通过传递不同参数生成不同的图片。这个功能可以用于绘制Logo、封面或其他图片，让用户不需要安装或卸载Photoshop。文章还提供了一个入门例子，介绍了使用ezpp的基本渲染方法，以及如何使用canvas、text类元素、自定义字体等。 ... [详细]

蜡笔小新 2023-12-11 12:39:10
char
EPPlus绘制刻度线的方法及示例代码

本文介绍了使用EPPlus绘制刻度线的方法，并提供了示例代码。通过ExcelPackage类和List对象，可以实现在Excel中绘制刻度线的功能。具体的方法和示例代码在文章中进行了详细的介绍和演示。 ... [详细]

蜡笔小新 2023-12-10 19:32:38
instance
十大经典排序算法动图演示+Python实现

本文介绍了十大经典排序算法的原理、演示和Python实现。排序算法分为内部排序和外部排序，常见的内部排序算法有插入排序、希尔排序、选择排序、冒泡排序、归并排序、快速排序、堆排序、基数排序等。文章还解释了时间复杂度和稳定性的概念，并提供了相关的名词解释。 ... [详细]

蜡笔小新 2023-12-10 19:28:59
text
Python实验报告文档中的文件和数据格式化操作

本文介绍了Python语言程序设计中文件和数据格式化的操作，包括使用np.savetext保存文本文件，对文本文件和二进制文件进行统一的操作步骤，以及使用Numpy模块进行数据可视化编程的指南。同时还提供了一些关于Python的测试题。 ... [详细]

蜡笔小新 2023-12-10 17:02:16
char
Python使用Pillow包生成验证码图片的方法

本文介绍了使用Python中的Pillow包生成验证码图片的方法。通过随机生成数字和符号，并添加干扰象素，生成一幅验证码图片。需要配置好Python环境，并安装Pillow库。代码实现包括导入Pillow包和随机模块，定义随机生成字母、数字和字体颜色的函数。 ... [详细]

蜡笔小新 2023-12-10 16:51:25
char
aw多模态融合,多模态话语分析

本博文基于《Amalgamationofproteinsequence,structureandtextualinformationforimprovingprote ... [详细]

蜡笔小新 2023-10-17 19:16:14
jsp
3年半巨亏242亿！商汤高估了深度学习，下错了棋？

转自：新智元三年半研发开支近70亿，累计亏损242亿。AI这门生意好像越来越不好做了。近日，商汤科技已向港交所递交IPO申请。招股书显示& ... [详细]

蜡笔小新 2023-10-17 16:41:52

plumscape_191

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章