A collection of speech-related resources.
Reference: https://blog.csdn.net/chenriwei2/article/details/38064555
I. RNN
1 Recurrent neural network based language model
The seminal work on applying RNNs to language modeling.
2 Statistical Language Models Based on Neural Networks
Mikolov's PhD thesis, which ties together his work on RNN-based language modeling.
3 Extensions of Recurrent Neural Network Language Model
A follow-up to the seminal paper; introduces improvements to the RNN, such as using class information to reduce the number of model parameters.
4 A guide to recurrent neural networks and backpropagation
An introduction to RNNs and their training algorithms; a good starting point for understanding RNNs.
5 Training Recurrent Neural Networks
Ilya Sutskever's PhD thesis. Training RNNs has always been difficult; this work covers optimization methods for RNN training.
6 Strategies for Training Large Scale Neural Network Language Models
Describes practical tricks for training RNN language models.
7 Recurrent Neural Networks for Language Understanding
Work on applying RNNs to language understanding.
8 Empirical Evaluation and Combination of Advanced Language Modeling Techniques
Empirical results on combining language modeling techniques, including combining RNN language models with other models.
9 Speech Recognition with Deep Recurrent Neural Networks
Work on applying RNNs to speech recognition.
10 A Neural Probabilistic Language Model
Not an RNN paper: Yoshua Bengio's early work on training neural network language models, which laid the groundwork for the later use of RNNs in language modeling.
11 On the difficulty of training Recurrent Neural Networks
Discusses the difficulties of training RNNs, such as vanishing gradients, and proposes some remedies.
12 Subword Language Modeling with Neural Networks
Word-level language models handle new words poorly because of the OOV problem, while character-level models avoid it at the cost of more expensive training. To combine the strengths of both, this paper proposes training subword-level RNN language models; it also compresses the model parameters with k-means.
13 Performance Analysis of Neural Networks in Combination with N-Gram Language Models
A performance analysis of combining N-gram and neural network language models; the experiments show that the combination improves performance.
14 Recurrent Neural Network based Language Modeling in Meeting Recognition
Combines an RNN language model with an N-gram model and rescores hypotheses to improve speech recognition performance (see the sketch below).
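To make the combination/rescoring idea in entries 13 and 14 concrete, here is a minimal Python sketch of re-ranking an N-best list with an interpolated language model. All hypotheses, scores, and weights are made-up placeholders, not values from the papers.

```python
import math

def interpolate_logp(rnnlm_logp, ngram_logp, lam=0.5):
    """Linear interpolation of two LM probabilities, returned as a log-probability."""
    return math.log(lam * math.exp(rnnlm_logp) + (1.0 - lam) * math.exp(ngram_logp))

def rescore(nbest, lm_weight=10.0):
    scored = []
    for hyp in nbest:
        lm = interpolate_logp(hyp["rnnlm_logp"], hyp["ngram_logp"])
        scored.append((hyp["acoustic_logp"] + lm_weight * lm, hyp["text"]))
    return max(scored)   # (best combined score, best hypothesis)

# Hypothetical 2-best list with acoustic and LM log-scores.
nbest = [
    {"text": "recognize speech", "acoustic_logp": -120.3, "rnnlm_logp": -8.1, "ngram_logp": -9.0},
    {"text": "wreck a nice beach", "acoustic_logp": -119.8, "rnnlm_logp": -14.2, "ngram_logp": -12.5},
]
print(rescore(nbest))
```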
II. DNN
1 A practical guide to training restricted Boltzmann machines
Introduces RBMs and the many tricks involved in training them; required reading if you want to implement an RBM.
2 A fast learning algorithm for deep belief nets
Hinton's classic paper and the seminal work of deep learning, arguably the starting point of the deep learning boom.
3 A Learning Algorithm for Boltzmann Machines
An older paper (1985) introducing the training algorithm for Boltzmann machines.
4 Greedy Layer-Wise Training of Deep Networks
Can be viewed as Yoshua Bengio's continuation and summary of Hinton's 2006 work; it complements the 2006 paper well and is essential reading for getting into deep learning.
It also covers practical tricks, such as how to handle real-valued units in the first layer.
5 Large Scale Distributed Deep Networks
Work from Jeffrey Dean's group at Google introducing the DistBelief framework; it describes how Google uses distributed training and model partitioning to speed up the training of deep networks.
6 Context Dependent Pretrained Deep Neural Networks for Large Vocabulary Speech Recognition
Microsoft's successful application of deep learning to speech: the recognition system's relative error rate dropped by more than 20%. Arguably the first industrial success story of deep learning, and it caused quite a stir.
7 Deep Belief Networks for phone recognition
Early work from Hinton's group on applying DNNs to speech; the foundation of Microsoft's later work.
8 Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition
Work on DNNs for large-vocabulary conversational speech recognition, with experimental results reported on Voice Search and YouTube data.
9 An Empirical Study of Learning Rates in Deep Neural Networks for Speech Recognition
Google's experience tuning learning rates in its DNN-HMM speech recognition system.
10 Acoustic Modeling using Deep Belief Networks
Early speech work from Hinton's group, focusing on how to apply DNNs to acoustic model training.
11 Deep Neural Networks for Acoustic Modeling in Speech Recognition
The shared views of several industry players, including Microsoft, Google, and IBM, on DNNs for speech recognition.
12 Deep Belief Networks Using Discriminative Features for Phone Recognition
Work by Hinton's group and IBM on training DNNs with discriminative features, using LDA to reduce the dimensionality to 40.
13 A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition
An experimental comparison of DNN training methods, e.g., discriminative pre-training versus generative DBN pre-training, as well as changes to the neuron nonlinearity.
14 Asynchronous Stochastic Gradient Descent for DNN Training
A paper from the Chinese Academy of Sciences on asynchronous parallel GPU training; the idea is essentially the same as DistBelief, except that the hardware is GPUs and the model is not partitioned.
15 Improving Deep Neural Networks For LVCSR using Rectified Linear Units and Dropout
Uses ReLU and dropout to improve a DNN-HMM system.
16 Improving the speed of neural networks on CPUs
Google's work on speeding up neural network forward propagation, e.g., with fixed-point arithmetic and SIMD instructions.
17 Improved Bottleneck Features Using Pretrained Deep Neural Networks
Related work on Microsoft's DNN-HMM system.
18 Improved feature processing for Deep Neural Networks
Improves a DNN-HMM system with feature processing: 13-dimensional MFCC features are spliced over 9 frames and transformed with LDA-MLLT, optionally followed by SAT, yielding processed 40-dimensional features as input to the DNN-HMM system (the splicing step is sketched below).
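To make the splicing step of entry 18 concrete, here is a small NumPy sketch that stacks 9 neighboring frames of 13-dimensional MFCCs into a 117-dimensional vector and projects it down to 40 dimensions. The projection matrix is a random placeholder standing in for the LDA-MLLT transform the paper estimates.

```python
import numpy as np

def splice(feats, left=4, right=4):
    """Stack each frame with its 4 left and 4 right neighbors (9 frames total).
    feats: (num_frames, 13) MFCC matrix -> returns (num_frames, 117)."""
    num_frames, _ = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + num_frames] for i in range(left + right + 1)])

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((200, 13))        # 200 frames of 13-dim MFCCs (placeholder data)
spliced = splice(mfcc)                        # (200, 117)
lda_mllt = rng.standard_normal((117, 40))     # placeholder for the estimated LDA-MLLT transform
dnn_input = spliced @ lda_mllt                # (200, 40) features fed to the DNN-HMM
print(dnn_input.shape)
```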
19 Improving neural networks by preventing co-adaptation of feature detectors
Mainly describes the dropout technique and analyzes comparative experimental results, interpreting dropout as a form of model averaging.
20 Exploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition
Uses soft regularization and convex constraints to make the DNN model sparser; the goal of sparsification is to reduce model complexity, speed up computation, and improve generalization.
21 Feature Learning in Deep Neural Networks Studies on Speech Recognition Tasks
Discusses DNNs from a feature-learning perspective: why deeper networks work better, why DNNs learn more robust features, and so on.
22 Improving Neural Networks with Dropout
The master's thesis of Hinton's student Nitish Srivastava, mainly discussing the role of dropout in neural networks.
23 Learning Features from Music Audio with Deep Belief Networks
An application of deep networks to music classification, using MFCC features and genre labels such as hip-hop and blues.
24 Low-Rank Matrix Factorization for Deep Neural Network Training with High-Dimensional Output Targets
Work from IBM that uses low-rank matrix factorization to address the excessive number of weight parameters in the DNN output (classification) layer.
25 Multilingual Training of Deep Neural Networks
An application of DNNs to multilingual training; during fine-tuning only the classification-layer parameters need to be adjusted.
26 A Cluster-Based Multiple Deep Neural Networks Method for Large Vocabulary Continuous Speech Recognition
Splits the training data by cluster and trains a small model per cluster, then combines the models within a Bayesian framework. This speeds up the overall training process, but accuracy suffers and decoding becomes slower.
27 Restructuring of Deep Neural Network Acoustic Models with Singular Value Decomposition
Proposes compressing the weight matrices with SVD to reduce model complexity (see the sketch below).
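The low-rank idea behind entries 24 and 27 can be sketched in a few lines of NumPy: factor a trained weight matrix with a truncated SVD and replace the single large layer by two thin ones. The matrix below is random and the rank is an arbitrary choice, purely to show the drop in parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 3000))    # e.g. a hidden-to-output weight matrix (random stand-in)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 128                                  # retained rank, chosen arbitrarily for illustration
A = U[:, :k] * s[:k]                     # (1024, k)  first thin layer
B = Vt[:k, :]                            # (k, 3000)  second thin layer

print(W.size, A.size + B.size)           # parameter count before vs. after factorization
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative approximation error: {rel_err:.3f}")
```

In practice the factored layers are fine-tuned after the SVD so that accuracy is recovered despite the truncation.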
28 Sparse Feature Learning for Deep Belief Networks
An unsupervised feature learning method proposed by Marc'Aurelio Ranzato; its advantage is that the learned representations are low-dimensional and sparse. The paper compares against RBMs and PCA.
29 Training Products of Experts by Minimizing Contrastive Divergence
Hinton's product-of-experts (PoE) model; the paper discusses how to train PoE models. The RBM is a special case of a PoE, and RBM training evolved from this work. Required reading if you want to understand the principles behind the CD algorithm.
30 Understanding How Deep Belief Networks Perform Acoustic Modelling
Mainly discusses several reasons why DBN models achieve good system performance in acoustic model training, though without theoretical support.
31 Pipelined Back-Propagation for Context-Dependent Deep Neural Networks
Trains the network in parallel on multiple GPUs in a pipelined fashion; the paper also discusses other parallelization strategies, such as data parallelism and model parallelism.
32 Recent Advances in Deep Learning for Speech Research at Microsoft
Surveys the progress of Microsoft's deep learning work, e.g., returning to raw features, multi-task feature learning, and adaptation of DNN models.
33 Rectified Linear Units Improve Restricted Boltzmann Machines
Describes applying ReLUs to RBMs, i.e., replacing the nonlinearity.
34 Reducing the Dimensionality of Data with Neural Networks
Hinton's Science paper on using neural networks for nonlinear dimensionality reduction, with a comparison against linear PCA.
35 Data Normalization in the Learning of Restricted Boltzmann Machines
A small data-processing trick for RBM training: zero-centering the data makes RBM training more robust.
36 Connectionist Probability Estimators in HMM Speech Recognition
An early method for using neural networks in acoustic model training; effectively the foundation of today's DNN-HMM work.
37 Deep Learning for Robust Feature Generation in Audio-Visual Emotion Recognition
An application of deep learning to audio-visual emotion recognition; the paper proposes several models trained jointly on visual and auditory signals.
38 Improving Training Time of Deep Belief Networks Through Hybrid Pre-Training And Larger Batch Sizes
Uses a hybrid pre-training scheme that combines generative and discriminative pre-training; the paper also argues that larger minibatch sizes increase the granularity of data parallelism.
39 Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient
Proposes PCD, a new algorithm for training RBMs. Unlike CD, it maintains a single persistent Markov chain throughout training instead of restarting a new chain at every parameter update; the underlying assumption is that each update changes the model only slightly, so the paper also recommends a small learning rate (see the sketch below).
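A minimal NumPy sketch of the difference entry 39 describes, for a binary RBM without biases: PCD keeps one persistent set of "fantasy particles" across updates, whereas CD-1 would restart the negative chain from the data every time. Dimensions, learning rate, and data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, batch = 64, 32, 16
W = 0.01 * rng.standard_normal((n_vis, n_hid))      # RBM weights (biases omitted for brevity)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v):
    p = sigmoid(v @ W)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_visible(h):
    p = sigmoid(h @ W.T)
    return p, (rng.random(p.shape) < p).astype(float)

# Persistent fantasy particles: initialized once, then reused across all updates.
v_persist = (rng.random((batch, n_vis)) < 0.5).astype(float)

def pcd_update(v_data, lr=0.01):
    """One PCD parameter update. CD-1 would instead start the negative
    chain from v_data and discard it after the update."""
    global W, v_persist
    h_data_p, _ = sample_hidden(v_data)              # positive phase from the data
    _, h_fant = sample_hidden(v_persist)             # one Gibbs step on the persistent chain
    _, v_persist = sample_visible(h_fant)
    h_fant_p, _ = sample_hidden(v_persist)           # negative-phase statistics
    grad = (v_data.T @ h_data_p - v_persist.T @ h_fant_p) / batch
    W += lr * grad                                    # small lr: the model changes little per step

v_data = (rng.random((batch, n_vis)) < 0.5).astype(float)
pcd_update(v_data)
```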
40 Classification using Discriminative Restricted Boltzmann Machines
Introduces the discriminative RBM: whereas the generative RBM optimizes p(x, y), the discriminative RBM optimizes p(y | x), where y is the label; a hybrid version is also proposed.
41 Learning Multiple Layers of Features from Tiny Images
The master's thesis of Hinton's student Alex Krizhevsky, tying together a line of DNN work.
42 Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition
Discusses how to train DNNs effectively, with an emphasis on parallel training.
43 Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks
A summary of engineering tricks from Tara N. Sainath's group at IBM, focusing on increasing parallelism and reducing the number of model parameters, mainly via low-rank factorization of the classification layer. Although the CNN is an evolution of the DNN with relatively few parameters, the best CNN results in speech recognition at the time were roughly on par with DNNs of similar parameter count.
44 Parallel Training of Neural Networks for Speech Recognition
Work on parallel training of neural networks, in two parts: multi-threaded multi-core parallelization and SIMD-based GPU parallelization.
45 Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices
Google's practical work on on-device speech recognition, covering optimization of both the DNN and the LM. The DNN optimizations include fixed-point arithmetic, SIMD acceleration, batched lazy computation, and frame skipping; the language model is also compressed. A hands-on paper of considerable reference value (the fixed-point idea is sketched below).
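Entry 45's fixed-point trick can be illustrated with a tiny NumPy sketch: quantize float weights and inputs to 8-bit integers with a single scale each, do the multiply-accumulate in integers, and rescale at the end. The sizes and data are placeholders; the paper's actual scheme (and its SIMD kernels) is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((400, 512)).astype(np.float32)   # a layer's float weights (placeholder)
x = rng.standard_normal(512).astype(np.float32)          # one input frame (placeholder)

def quantize(a):
    """Symmetric 8-bit quantization with one scale per tensor."""
    scale = np.abs(a).max() / 127.0
    q = np.clip(np.round(a / scale), -127, 127).astype(np.int8)
    return q, scale

Wq, w_scale = quantize(W)
xq, x_scale = quantize(x)

# Integer multiply-accumulate (int32 accumulator), then rescale back to float.
y_int = Wq.astype(np.int32) @ xq.astype(np.int32)
y_approx = y_int.astype(np.float32) * (w_scale * x_scale)

print(np.max(np.abs(y_approx - W @ x)))   # small quantization error vs. the float result
```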
46 Cross-Language Knowledge Transfer Using Multilingual Deep Neural Network with Shared Hidden Layers
Multilingual DNN training in which all languages share the same hidden layers while each language has its own classification layer. This scheme reduces the error rate by roughly 3-5%; the reason resembles transfer learning, in that knowledge can be transferred across languages (see the sketch below).
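A minimal PyTorch sketch of the shared-hidden-layer architecture described in entries 25 and 46: all languages pass through the same hidden stack, and each language has its own output layer. Layer sizes, language codes, and senone counts are made-up assumptions, not values from the papers.

```python
import torch
import torch.nn as nn

class SharedHiddenDNN(nn.Module):
    """Hidden layers shared across languages; one classification layer per language."""
    def __init__(self, in_dim=440, hid_dim=1024, senones=None):
        super().__init__()
        senones = senones or {"en": 3000, "zh": 4000}   # hypothetical senone counts per language
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.Sigmoid(),
            nn.Linear(hid_dim, hid_dim), nn.Sigmoid(),
        )
        # Language-specific classification layers.
        self.heads = nn.ModuleDict({lang: nn.Linear(hid_dim, n) for lang, n in senones.items()})

    def forward(self, x, lang):
        return self.heads[lang](self.shared(x))

model = SharedHiddenDNN()
feats = torch.randn(8, 440)          # a batch of spliced acoustic feature vectors
print(model(feats, "zh").shape)      # torch.Size([8, 4000]) -> scores from the Mandarin head
```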
47 Improving Wideband Speech Recognition using Mixed-Bandwidth Training Data in CD-DNN-HMM
Mixed-bandwidth CD-DNN-HMM training on both 8 kHz and 16 kHz data; the key issue is how to design and align the filter banks across the different bandwidths. The paper also includes filter-bank training tips, such as whether to train with dynamic as well as static features.
48 Robust Visual Recognition Using Multilayer Generative Neural Networks
The master's thesis of Hinton's student Yichuan Tang, tying together DNN work on visual recognition.
49 Deep Boltzmann Machines
The paper that introduced the DBM model.
50 On Rectified Linear Units for Speech Processing
A performance analysis of ReLUs in speech recognition.
III. CNN
1 Deep Convolutional Network Cascade for Facial Point Detection
Work on applying CNNs to facial keypoint detection.
2 Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition
Applies CNNs to a speech recognition system.
3 ImageNet Classification with Deep Convolutional Neural Networks
The CNN from Hinton's group that won the 2012 ImageNet competition. It is light on details but describes the tricks used in the network, in particular ReLU.
4 Gradient-Based Learning Applied to Document Recognition
Yann LeCun's classic paper and the seminal work on CNNs; read this first if you want to understand CNNs.
5 A Theoretical Analysis of Feature Pooling in Visual Recognition
A theoretical analysis of pooling in visual recognition, plus a summary of related techniques used in vision, such as HOG and SIFT.
6 What is the Best Multi-Stage Architecture for Object Recognition
Discusses how to design multi-stage architectures for good performance on object recognition. The focus is on architectural questions, e.g., which structures yield invariant features and how to combine information across stages. Recommended reading for anyone working in vision.
7 Deep Convolutional Neural Networks for LVCSR
A practical application of CNNs to LVCSR.
8 Learning Mid-Level Features For Recognition
Worth reading for vision researchers: an analysis of the current visual recognition pipeline and how its parts relate, e.g., coding and pooling.
9 Convolutional Networks and Applications in Vision
An analysis of convolutional networks in vision applications; worth reading for vision researchers. The paper argues that hierarchy provides a good internal representation for vision, and analyzes convolutional networks as a stack of filter-bank, nonlinearity, and pooling layers.
10 Convolutional Neural Networks Applied to House Numbers Digit Classification
A case study of convolutional networks for house-number digit classification. The paper uses Lp pooling, in which a Gaussian kernel boosts the weight of stronger features and suppresses weaker ones.
11 Visualizing and Understanding Convolutional Networks
Very worthwhile work on visualizing convolutional network features: a deconvnet is used to visualize the features of each convolutional layer, and these visualizations can help us tune the model.
12 Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
Proposes stochastic pooling: unlike max or average pooling, the pooled activation is chosen at random. The paper argues that stochastic pooling acts as a regularizer much like dropout, equivalent to max pooling over many noisy copies of the input image, and effectively prevents overfitting (see the sketch below).
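A small NumPy sketch of the stochastic pooling idea from entry 12: within each pooling region, one activation is sampled with probability proportional to its value (activations are assumed non-negative, e.g., after a ReLU). The feature map and region size are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool(fmap, size=2):
    """Stochastic pooling over non-overlapping size x size regions of a 2-D map."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            region = fmap[i:i + size, j:j + size].ravel()
            total = region.sum()
            if total == 0:                   # all-zero region: nothing to sample
                continue
            probs = region / total           # probability proportional to activation
            out[i // size, j // size] = rng.choice(region, p=probs)
    return out

fmap = np.maximum(rng.standard_normal((6, 6)), 0.0)   # ReLU-like non-negative feature map
print(stochastic_pool(fmap))
```

At test time the paper instead uses the probability-weighted average of each region, analogous to the weight-scaling rule of dropout.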
13 Adaptive Deconvolutional Networks for Mid and High Level Feature Learning
An unsupervised method for learning mid- and high-level features, learning image features through deconvolutional reconstruction.
14 Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis
Practical work on convolutional networks; its suggestions for dealing with limited training data are worth a look.
15 Multi-column Deep Neural Networks for Image Classification
Combines multiple deep network models by averaging their outputs.
16 Differentiable Pooling for Hierarchical Feature Learning
Proposes a Gaussian-based differentiable pooling; read paper 13 first. Compared with max and average pooling, it has some advantages when reconstructing via deconvolution.
17 Notes on Convolutional Neural Networks
A fairly detailed treatment of convolutional neural networks, including the gradient computations.
18 Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition
PSD, an unsupervised learning algorithm: on top of the sparse coding framework, it adds the constraint that a nonlinear predictor should approximate the sparse codes. When optimizing the objective, some parameters are held fixed in turn, much like coordinate descent.
19 Deep Neural Networks for Object Detection
Google's object detection using DNN (in fact CNN) regression: a mask is predicted first, then the localization is refined.
20 Multi-GPU Training of ConvNets
Engineering tricks for training convolutional networks in parallel on multiple GPUs.
21 Flexible, High Performance Convolutional Neural Networks for Image Classification
A hands-on, relatively early paper on training CNNs on GPUs.
22 Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
Google's Street View digit recognition: a CNN extracts features and the task is cast as recognizing an ordered digit sequence. Traditional OCR digit recognition usually requires segmentation, whereas here the whole sequence is recognized at once; the paper reports accuracy on several datasets. Training again uses Google's DistBelief framework.
IV. Others
1 An Introduction to Deep Learning
A short survey of deep learning; quite brief, only touching on some commonly used deep learning models.
2 The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
Mainly discusses the difficulties of training deep architectures and analyzes the advantages of pre-training from experimental data. One interesting point is that pre-training behaves like a regularizer on the weight matrices.
3 Why Does Unsupervised Pre-training Help Deep Learning
Discusses the ways unsupervised learning helps deep learning and proposes the view of pre-training as a regularizer. The analysis is based on experimental data rather than theory, which is also the most common criticism of deep learning at this stage: it lacks a complete theoretical foundation.
4 Learning Deep Architectures for AI
Yoshua Bengio's survey of deep learning; a good first read, worth skimming, if you want a rough overview of the field.
5 Representation Learning A Review and New Perspectives
Yoshua Bengio's survey of representation learning.
6 On Optimization Methods for Deep Learning
Discusses several optimization methods for deep learning: SGD, L-BFGS, and CG, and experimentally compares their strengths and weaknesses.
7 Using Very Deep Autoencoders for Content-Based Image Retrieval
Uses the middle layer of an autoencoder as a global image representation for image retrieval.
8 Deep Learning For Signal And Information Processing
Li Deng's lecture notes from the 2013 Dragon Star machine learning course, focusing in some detail on deep learning for speech.
9 On the Importance of Initialization and Momentum in Deep Learning
Shows the importance of initialization and momentum in deep learning, mostly through experimental analysis.
10 Dropout Training as Adaptive Regularization
Analyzes dropout from first principles, showing it is equivalent to an adaptive regularizer.
11 Deep learning via Hessian-free optimization
Most deep learning optimization is based on stochastic gradient methods; this paper proposes a Hessian-free second-order optimization algorithm.
12 Deep Stacking Networks For Information Retrieval
Work on applying DSNs to information retrieval.
13 Deep Convex Net: A Scalable Architecture for Speech Pattern Classification
A model designed at Microsoft to overcome the difficulty of parallelizing DNN training; it has a big advantage in computational scalability.
14 Parallel Training of Deep Stacking Networks
Parallelizing DSN training.
15 Scalable Stacking and Learning for Building Deep Architectures
A related DSN paper; the several related papers are best read together.
Introductory Reading
Four frontier directions in speech recognition research
https://blog.csdn.net/haima1998/article/details/79094341
Introductory deep learning papers (speech recognition)
https://blog.csdn.net/youyuyixiu/article/details/53764218
On the three key technologies of speech recognition
https://blog.csdn.net/qq_34231800/article/details/80189617
Deep learning and speech recognition: an overview of common acoustic models
https://blog.csdn.net/dujiajiyi_xue5211314/article/details/53943313
Interesting open-source software: the Kaldi speech recognition toolkit
https://blog.csdn.net/AMDS123/article/details/70313780
Neural networks: CNN architectures and speech recognition applications
https://blog.csdn.net/xmdxcsj/article/details/54695995
An overview of speech recognition
https://blog.csdn.net/shichaog/article/details/72528637
End-to-end speech recognition
https://blog.csdn.net/xmdxcsj/article/details/70300546
Applications of attention in speech recognition
https://blog.csdn.net/quheDiegooo/article/details/76842201
Speech synthesis technology
https://blog.csdn.net/wja8a45TJ1Xa/article/details/78599509?locationNum=8&fps=1
A survey of deep learning for speech synthesis
https://blog.csdn.net/weixin_37598106/article/details/81513816
Tacotron, an end-to-end deep learning TTS model (Chinese speech synthesis)
https://blog.csdn.net/yunnangf/article/details/79585089
Tacotron: end-to-end speech synthesis
https://blog.csdn.net/Left_Think/article/details/74905928
An introduction to speaker recognition technology
https://www.cnblogs.com/wuxian11/p/6498699.html
Speaker recognition: current state, limitations, and trends
https://blog.csdn.net/jojozhangju/article/details/78637221
Speaker recognition
https://www.jianshu.com/p/513dadeef1fd
An introduction to Deep Speaker
https://blog.csdn.net/Lauyeed/article/details/79936632
Papers
Speech Recognition: DNN
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al.
https://ieeexplore.ieee.org/document/5740583/?part=1
Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6296526
Speech Recognition: CNN
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6288864
Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639347
Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al.
https://infoscience.epfl.ch/record/210029/files/Palaz_INTERSPEECH_2015.pdf
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al.
https://pdfs.semanticscholar.org/8043/cbfed66c98d2255ea79254de620837478099.pdf
Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al.
https://arxiv.org/pdf/1509.08967.pdf
Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al.
https://arxiv.org/pdf/1604.01792.pdf
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al.
https://pdfs.semanticscholar.org/716e/60cbbdacf01b3148e91a555358a96308b770.pdf?_ga=2.38333155.198966451.1540996486-1278087525.1535180761
Speech Recognition: LSTM
Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al.
https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43905.pdf
Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al.
https://arxiv.org/pdf/1703.07090.pdf
English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al.
https://arxiv.org/pdf/1703.02136.pdf
Speech Recognition: CTC
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6306&rep=rep1&type=pdf
Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al.
http://proceedings.mlr.press/v32/graves14.pdf
First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al.
https://arxiv.org/pdf/1408.2873.pdf
Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al.
https://arxiv.org/pdf/1412.5567.pdf
Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al.
https://arxiv.org/pdf/1511.06841.pdf
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al.
https://arxiv.org/pdf/1507.06947.pdf
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al.
https://arxiv.org/pdf/1609.06773.pdf
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al.
http://proceedings.mlr.press/v48/amodei16.pdf
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al.
https://arxiv.org/pdf/1609.03193.pdf
Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al.
https://arxiv.org/pdf/1702.06378.pdf
Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al.
https://arxiv.org/pdf/1702.07793.pdf
Speech Recognition: Sequence Transduction
Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al.
https://arxiv.org/pdf/1211.3711.pdf
Speech Recognition: Attention
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al.
https://arxiv.org/pdf/1412.1602.pdf
Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al.
https://arxiv.org/pdf/1506.07503.pdf
End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al.
https://arxiv.org/pdf/1508.04395.pdf
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al.
https://arxiv.org/pdf/1508.01211.pdf
End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian.
https://arxiv.org/pdf/1610.05361.pdf
Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al.
https://arxiv.org/pdf/1703.07754.pdf
Speech Recognition: Multichannel
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al.
http://www.ee.columbia.edu/~ronw/pubs/taslp2017-multichannel.pdf
Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al.
https://arxiv.org/pdf/1703.04783.pdf
Speech Synthesis: SampleRNN
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model(2016), Soroush Mehri et al.
https://arxiv.org/pdf/1612.07837.pdf
Speech Synthesis: WaveNet
WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al.
https://arxiv.org/pdf/1609.03499.pdf
Speech Synthesis: Deep Voice
Deep Voice: Real-time Neural Text-to-Speech(2017), Sercan O. Arik et al.
https://arxiv.org/pdf/1702.07825.pdf
Speech Synthesis: Deep Voice 2
Deep Voice 2: Multi-Speaker Neural Text-to-Speech(2017), Sercan Arik et al.
https://arxiv.org/pdf/1705.08947.pdf
Speech Synthesis: Tacotron
Tacotron: Towards End-to-End Speech Synthesis(2017), Yuxuan Wang et al.
https://pdfs.semanticscholar.org/f258/f0d3260e7fbdd961993086aaafa2afc714c9.pdf
Speech Synthesis: Tacotron 2
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions(2018), Jonathan Shen et al.
https://sigport.org/sites/default/files/docs/ICASSP%202018%20-%20Tacotron%202.pdf
Speech Synthesis: VoiceLoop
VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop(2018), Yaniv Taigman et al.
https://arxiv.org/pdf/1707.06588.pdf
Speaker Recognition: x-vector (TDNN-extracted speech embeddings)
Deep Neural Network Embeddings for Text-Independent Speaker Verification(2017), David Snyder et al.
http://danielpovey.com/files/2017_interspeech_embeddings.pdf
Speaker Recognition: Baidu's end-to-end system with triplet loss
Deep Speaker: an End-to-End Neural Speaker Embedding System(2017), Chao Li et al.
https://arxiv.org/pdf/1705.02304.pdf
Speaker Recognition: 3D convolutional networks
Text-independent speaker verification using 3d convolutional neural networks(2018), Amirsina Torfi et al.
https://arxiv.org/pdf/1705.09422.pdf
Speaker Recognition: end-to-end GE2E loss
Generalized End-to-End Loss for Speaker Verification(2018), Li Wan et al.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8462665
Code
Kaldi
A widely used speech recognition toolkit
https://github.com/kaldi-asr/kaldi
A TensorFlow implementation of Baidu's DeepSpeech architecture
Speech recognition: TensorFlow implementation of Baidu's DeepSpeech
https://github.com/mozilla/DeepSpeech
Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow
Speech recognition: TensorFlow implementation based on DeepMind's WaveNet
https://github.com/buriburisuri/speech-to-text-wavenet
End-to-end automatic speech recognition system implemented in TensorFlow.
End-to-end speech recognition, TensorFlow implementation
https://github.com/zzw922cn/Automatic_Speech_Recognition
A PyTorch Implementation of End-to-End Models for Speech-to-Text
End-to-end speech recognition, PyTorch implementation
https://github.com/awni/speech
A PaddlePaddle implementation of DeepSpeech2 architecture for ASR.
Speech recognition: PaddlePaddle implementation of DeepSpeech2
https://github.com/PaddlePaddle/DeepSpeech
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Speech synthesis: TensorFlow implementation of Tacotron
https://github.com/Kyubyong/tacotron
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
Speech synthesis: PyTorch implementation of Tacotron 2
https://github.com/NVIDIA/tacotron2
Deep neural networks for voice conversion (voice style transfer) in Tensorflow
Voice conversion (voice style transfer), TensorFlow implementation
https://github.com/andabi/deep-voice-conversion
A method to generate speech across multiple speakers
Speech synthesis: Facebook's PyTorch implementation (VoiceLoop)
https://github.com/facebookresearch/loop
Speaker embedding(verification and recognition) using Pytorch
Speaker recognition, PyTorch implementation
https://github.com/qqueing/DeepSpeaker-pytorch
Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Speaker recognition with 3D convolutions, TensorFlow implementation
https://github.com/astorfi/3D-convolutional-speaker-recognition
Products and Applications
Baidu Speech (official site)
http://yuyin.baidu.com/
Tencent AI Open Platform
https://ai.qq.com/product/aaiasr.shtml
iFLYTEK Open Platform
https://xfyun.cn/services/voicedictation
Bing Speech (Microsoft Azure Cognitive Services)
https://azure.microsoft.com/zh-cn/services/cognitive-services/speech/