Links to the files used in this article
Link: https://pan.baidu.com/s/1RWNVHuXMQleOrEi5vig_bQ
Extraction code: p57s
Speech Recognition
Speech recognition identifies the content of a piece of audio (a WAV waveform).
Through the Fourier transform, a time-domain sound can be decomposed into a superposition of sine functions of different frequencies. The characteristic distribution of the resulting spectral lines establishes a correspondence between audio content and text, which serves as the basis for model training.
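To make this concrete, here is a minimal sketch (using NumPy's FFT and a made-up two-tone signal; the frequencies and sample rate are illustrative assumptions) showing how a time-domain signal decomposes into its component frequencies:

```python
import numpy as np

# One second of signal sampled at 1000 Hz, built from two sine waves:
# 50 Hz at full amplitude and 120 Hz at half amplitude.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# The real FFT turns the time-domain samples into a frequency spectrum;
# the two component frequencies show up as isolated spectral peaks.
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(len(sig), d=1 / sample_rate)
peaks = freqs[spectrum > len(sig) / 10]  # simple threshold isolates the peaks
print(peaks)  # -> [ 50. 120.]
```

The same principle underlies MFCC extraction: the spectrum, not the raw waveform, carries the content-related structure.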
Mel-frequency cepstral coefficients (MFCC) describe the energy distribution across 13 characteristic frequencies that are closely related to the content of a sound. We can therefore use the MFCC matrix as the feature for speech recognition: pattern recognition based on hidden Markov models finds the sound model that best matches a test sample, thereby identifying the speech content.
- Prepare multiple sound samples as training data, labeling each audio file with its category.
- Read each audio file and extract its MFCC matrix.
- Train a model for each category, using the MFCC matrices as training samples.
- Score the test samples against each trained model and pick the best match (based on hidden Markov models).
MFCC-related API:
import scipy.io.wavfile as wf
import python_speech_features as sf

sample_rate, sigs = wf.read('../xx.wav')
mfcc = sf.mfcc(sigs, sample_rate)
Example: MFCC extraction
"""
MFCC extraction
"""
import scipy.io.wavfile as wf
import python_speech_features as sf
import matplotlib.pyplot as mp

sample_rate, sigs = wf.read('../ml_data/filter.wav')
mfcc = sf.mfcc(sigs, sample_rate)
print(mfcc.shape)

mp.matshow(mfcc.T, cmap='gist_rainbow')
mp.title('MFCC')
mp.ylabel('Features', fontsize=14)
mp.xlabel('Samples', fontsize=14)
mp.tick_params(labelsize=10)
mp.show()
Hidden Markov model-related API:
import hmmlearn.hmm as hl
model = hl.GaussianHMM(n_components=4, covariance_type='diag', n_iter=1000)
model.fit(mfccs)
score = model.score(test_mfcc)
Example:
"""
Speech recognition
"""
import os
import numpy as np
import scipy.io.wavfile as wf
import python_speech_features as sf
import hmmlearn.hmm as hl


def search_files(directory):
    """Collect .wav file paths under directory, grouped by folder name (label)."""
    directory = os.path.normpath(directory)
    objects = {}
    for curdir, subdirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.wav'):
                label = curdir.split(os.path.sep)[-1]
                if label not in objects:
                    objects[label] = []
                path = os.path.join(curdir, file)
                objects[label].append(path)
    return objects


# Build one stacked MFCC matrix per label from the training audio.
train_samples = search_files('../ml_data/speeches/training')
train_x, train_y = [], []
for label, filenames in train_samples.items():
    mfccs = np.array([])
    for filename in filenames:
        sample_rate, sigs = wf.read(filename)
        mfcc = sf.mfcc(sigs, sample_rate)
        if len(mfccs) == 0:
            mfccs = mfcc
        else:
            mfccs = np.append(mfccs, mfcc, axis=0)
    train_x.append(mfccs)
    train_y.append(label)

# Train one hidden Markov model per label.
models = {}
for mfccs, label in zip(train_x, train_y):
    model = hl.GaussianHMM(n_components=4, covariance_type='diag',
                           n_iter=1000)
    models[label] = model.fit(mfccs)

# Build the test set the same way.
test_samples = search_files('../ml_data/speeches/testing')
test_x, test_y = [], []
for label, filenames in test_samples.items():
    mfccs = np.array([])
    for filename in filenames:
        sample_rate, sigs = wf.read(filename)
        mfcc = sf.mfcc(sigs, sample_rate)
        if len(mfccs) == 0:
            mfccs = mfcc
        else:
            mfccs = np.append(mfccs, mfcc, axis=0)
    test_x.append(mfccs)
    test_y.append(label)

# Score each test sample against every model; predict the best-scoring label.
pred_test_y = []
for mfccs in test_x:
    best_score, best_label = None, None
    for label, model in models.items():
        score = model.score(mfccs)
        if (best_score is None) or (best_score < score):
            best_score, best_label = score, label
    pred_test_y.append(best_label)

print(test_y)
print(pred_test_y)
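Beyond eyeballing the two printed lists, the match can be quantified as an accuracy. A minimal sketch with placeholder labels (in the program above, `test_y` and `pred_test_y` come from the loops over the test samples):

```python
# Placeholder true labels and predictions standing in for the real lists.
test_y = ['apple', 'banana', 'kiwi', 'lime']
pred_test_y = ['apple', 'banana', 'lime', 'lime']

# Fraction of positions where the predicted label equals the true label.
correct = sum(t == p for t, p in zip(test_y, pred_test_y))
accuracy = correct / len(test_y)
print(f'accuracy: {accuracy:.2f}')  # 3 of 4 match -> accuracy: 0.75
```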