Author: 手机用户2602939883 | Source: Internet | 2023-07-11 04:36
This article covers topics in Python, natural language processing, and speech recognition: an explanation of Beam Search and Prefix Beam Search together with Python implementations, contributed by hangguns. Hopefully the techniques and experience here help with related algorithm problems.
Introduction
Beam search is a search algorithm, closely related to dynamic programming methods such as Viterbi, that greatly shrinks the search space and speeds up decoding while keeping the error within an acceptable range. It is commonly used in sequence-to-sequence models, CTC decoding, and similar applications.
Time complexity
For a sequence of $T$ time steps with $N$ candidates per step, enumerating every possible path costs $\mathcal{O}(N + N^2 + N^3 + \dots + N^T)$: the number of nodes to visit grows exponentially with each time step. The Viterbi algorithm brings this down to $\mathcal{O}(N + (T-1)N^2)$, since each time step takes the $N$ best nodes as input and needs $N^2$ comparisons; this is still too expensive when $N$ is large. In that case beam search is the better choice, with complexity $\mathcal{O}(N + (T-1) \cdot beamsize \cdot N)$: each time step takes the $beamsize$ best nodes as input and needs only $beamsize \cdot N$ comparisons.
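To make the asymptotics concrete, here is a toy operation count under assumed values $T=4$, $N=100$, $beamsize=5$ (these numbers are chosen purely for illustration and do not appear in the original text):

```python
# Rough operation counts for exhaustive search, Viterbi, and beam search
# on a toy problem: T = 4 time steps, N = 100 labels, beam size 5.
T, N, beam = 4, 100, 5

exhaustive_ops = sum(N**t for t in range(1, T + 1))  # O(N + N^2 + ... + N^T)
viterbi_ops = N + (T - 1) * N**2                     # O(N + (T-1)*N^2)
beam_ops = N + (T - 1) * beam * N                    # O(N + (T-1)*beamsize*N)

print(exhaustive_ops, viterbi_ops, beam_ops)  # → 101010100 30100 1600
```

Even for this small setting, beam search needs several orders of magnitude fewer comparisons than exhaustive enumeration.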
Standard Beam Search (BS)
As shown in the figure above, at each time step standard beam search compares every incoming path against all $N$ candidates, and from the $beamsize \cdot N$ scored extensions keeps $beamsize$ paths as input to the next time step. A simple Python implementation:
import numpy as np
import math

def beam_search(nodes, topk=1):
    # initialize the paths with the log probabilities of the first time step
    paths = {'A': math.log(nodes[0]['A']),
             'B': math.log(nodes[0]['B']),
             'C': math.log(nodes[0]['C'])}
    calculations = []
    for l in range(1, len(nodes)):
        paths_ = paths.copy()
        paths = {}
        nows = {}
        cur_cal = 0
        # extend every surviving path with every candidate of this time step
        for i in nodes[l].keys():
            for j in paths_.keys():
                nows[j + i] = paths_[j] + math.log(nodes[l][i])
                cur_cal += 1
        calculations.append(cur_cal)
        # keep only the top-k extended paths
        indices = np.argpartition(list(nows.values()), -topk)[-topk:]
        for k in indices:
            paths[list(nows.keys())[k]] = list(nows.values())[k]
    print(f'calculation number {calculations}')
    return paths

nodes = [{'A': 0.1, 'B': 0.3, 'C': 0.6},
         {'A': 0.2, 'B': 0.4, 'C': 0.4},
         {'A': 0.6, 'B': 0.2, 'C': 0.2},
         {'A': 0.3, 'B': 0.3, 'C': 0.4}]
print(beam_search(nodes, topk=2))
Output:
calculation number [9, 6, 6]
{'CBAC': -2.854232711280291, 'CCAC': -2.854232711280291}
We can see that with $N=3$ and $beamsize=2$, each time step after the first requires 6 comparisons (the first step expands all $3 \times 3 = 9$ combinations).
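As a sanity check (this snippet is not part of the original article), exhaustively enumerating all $3^4 = 81$ paths over the same toy nodes confirms that the topk=2 beam reaches a globally optimal path here:

```python
import math
import itertools

# Exhaustive search over the same 4-step, 3-label toy distribution.
nodes = [{'A': 0.1, 'B': 0.3, 'C': 0.6},
         {'A': 0.2, 'B': 0.4, 'C': 0.4},
         {'A': 0.6, 'B': 0.2, 'C': 0.2},
         {'A': 0.3, 'B': 0.3, 'C': 0.4}]

best_path, best_score = None, -math.inf
for combo in itertools.product(*[n.keys() for n in nodes]):
    score = sum(math.log(nodes[t][c]) for t, c in enumerate(combo))
    if score > best_score:
        best_path, best_score = ''.join(combo), score

# The optimum ties between 'CBAC' and 'CCAC', both with probability
# 0.6 * 0.4 * 0.6 * 0.4 = 0.0576; iteration order returns 'CBAC' first.
print(best_path, best_score)
```

On larger problems such a tie-aware exhaustive check quickly becomes infeasible, which is exactly the case beam search is designed for.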
Prefix Beam Search (PBS)
In the CTC algorithm, a blank symbol is introduced, and repeated characters with no blank between them are merged. An output such as ab can therefore arise from many different alignments, e.g. aab, abb, or a-blank-b, so the probability of ab should be the sum of the probabilities of all of these cases (a logsumexp over their log probabilities). A single beam per path cannot capture this, which motivates the improved prefix beam search. The code is as follows:
""" Code from https://gist.github.com/awni/56369a90d03953e370f3964c826ed4b0 Author: Awni Hannun CTC decoder in python, 简单例子可能不太效率 用于CTC模型的输出的前缀beam search 更多细节参考 https://distill.pub/2017/ctc/#inference https://arxiv.org/abs/1408.2873 """
import numpy as np
import math
import collections
NEG_INF = -float("inf")
def make_new_beam():
fn = lambda: (NEG_INF, NEG_INF)
return collections.defaultdict(fn)
def logsumexp(*args):
""" Stable log sum exp. """
if all(a == NEG_INF for a in args):
return NEG_INF
a_max = max(args)
lsp = math.log(sum(math.exp(a - a_max)
for a in args))
return a_max + lsp
def decode(probs, beam_size=100, blank=0):
""" 对给定输出概率进行预测 Arguments: probs: 输出概率 (e.g. post-softmax) for each time step. Should be an array of shape (time x output dim). beam_size (int): Size of the beam to use during inference. blank (int): Index of the CTC blank label. Returns the output label sequence and the corresponding negative log-likelihood estimated by the decoder. """
T, S = probs.shape
probs = np.log(probs)
beam = [(tuple(), (0.0, NEG_INF))]
for t in range(T):
next_beam = make_new_beam()
for s in range(S):
p = probs[t, s]
for prefix, (p_b, p_nb) in beam:
if s == blank:
n_p_b, n_p_nb = next_beam[prefix]
n_p_b = logsumexp(n_p_b, p_b + p, p_nb + p)
next_beam[prefix] = (n_p_b, n_p_nb)
continue
end_t = prefix[-1] if prefix else None
n_prefix = prefix + (s,)
n_p_b, n_p_nb = next_beam[n_prefix]
if s != end_t:
n_p_nb = logsumexp(n_p_nb, p_b + p, p_nb + p)
else:
n_p_nb = logsumexp(n_p_nb, p_b + p)
next_beam[n_prefix] = (n_p_b, n_p_nb)
if s == end_t:
n_p_b, n_p_nb = next_beam[prefix]
n_p_nb = logsumexp(n_p_nb, p_nb + p)
next_beam[prefix] = (n_p_b, n_p_nb)
beam = sorted(next_beam.items(),
key=lambda x: logsumexp(*x[1]),
reverse=True)
beam = beam[:beam_size]
best = beam[0]
return best[0], -logsumexp(*best[1])
if __name__ == "__main__":
np.random.seed(3)
time = 50
output_dim = 20
probs = np.random.rand(time, output_dim)
probs = probs / np.sum(probs, axis=1, keepdims=True)
labels, score = decode(probs)
print(labels)
print("Score {:.3f}".format(score))
The main differences from standard BS are that PBS distinguishes several cases and changes how the log probability is accumulated.

For BS, $loglikelihood = \log(p_1) + \log(p_2) + \dots$. PBS, however, tracks for each prefix the probability of ending in a blank and of ending in a non-blank separately, and one of the two may be 0 (log probability $-\infty$), so the contributions cannot simply be added in log space; a numerically stable logsumexp is used instead. The update rules are:

- When the prefix is followed by a blank, the prefix is unchanged, and the blank-ending log probability is updated: n_p_b = logsumexp(n_p_b, p_b + p, p_nb + p)
- When the prefix is followed by a repeated character with no blank in between, the prefix is also unchanged (the repeat is merged), and the non-blank-ending log probability is updated: n_p_nb = logsumexp(n_p_nb, p_nb + p)
- When the prefix is followed by a different character, the prefix changes, and the non-blank-ending log probability is updated: n_p_nb = logsumexp(n_p_nb, p_b + p, p_nb + p)
- When the prefix is followed by a repeated character separated by a blank, the prefix changes, and the non-blank-ending log probability is updated: n_p_nb = logsumexp(n_p_nb, p_b + p)

Summary
BS can be written in different ways depending on the scenario, but the core idea is the same: keep only the top-k paths at each time step, which makes the search far more efficient. When beam search generates text, a language-model score can also be added to obtain better results:

$Y^* = \arg\max_Y \; \log P(Y \mid X) + \alpha \log P_{lm}(Y) + \beta \, len(Y)$

The language-model term is typically applied at the moment a prefix is extended by a new character.
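A minimal sketch of this rescoring, assuming a stand-in language model (the uniform character model `toy_lm_logprob` and the weights below are hypothetical, chosen only to illustrate the formula):

```python
import math

def toy_lm_logprob(text):
    # Stand-in for log P_lm(Y): a uniform model over 26 characters.
    # A real system would query an n-gram or neural language model here.
    return len(text) * math.log(1 / 26)

def rescore(acoustic_logprob, text, alpha=0.3, beta=1.0):
    # log P(Y|X) + alpha * log P_lm(Y) + beta * len(Y)
    return (acoustic_logprob
            + alpha * toy_lm_logprob(text)
            + beta * len(text))

# Comparing two candidate prefixes with equal acoustic scores: the length
# bonus beta rewards longer outputs, while alpha weighs the LM's opinion.
print(rescore(-2.0, 'ab'), rescore(-2.0, 'a'))
```

In a prefix beam search loop, this score would replace the plain logsumexp value when sorting and trimming the beam.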
References
Sequence Modeling With CTC
The copyright of this article, "Understanding and Python Implementation of Beam Search and Prefix Beam Search", belongs to hangguns; reuse must follow the CC 4.0 BY-SA license.