Speech Endpoint Detection for Segmentation in Python
Published: 2022/10/23 18:40:16
1. Framing the Speech Signal
A speech signal is a time series that exhibits long-term randomness and short-term stationarity. Long-term randomness means the signal evolves over time as a random process; short-term stationarity means its characteristics stay essentially constant over short intervals, because the articulatory muscles have inertia and cannot move instantaneously from one state to another. Speech is relatively stationary over spans of 10-30 ms, so the first step of almost any speech-processing pipeline is to split the signal into frames, typically 10-30 ms long.
Framing is usually done with a sliding window, which may be a rectangular window, a Hamming window, etc. The window length determines how much of the original signal each frame contains. When the window advances by its full length at each step, frames do not overlap; when it advances by less than its length, adjacent frames overlap. This post uses a rectangular window for framing:
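As a concrete example of the frame-length arithmetic, at an 8 kHz sampling rate a 30 ms frame contains 240 samples (the frame length used in Section 3). A tiny helper, where the sample rates and durations are merely illustrative:

```python
def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in a frame of the given duration."""
    return int(sample_rate_hz * frame_ms / 1000)

# 10-30 ms frames at common sampling rates (illustrative values)
print(frame_length(8000, 30))   # 240 samples
print(frame_length(16000, 25))  # 400 samples
```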
Rectangular window:
$$h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
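The rectangular window is simply a run of ones, so applying it leaves the frame unchanged; a Hamming window instead tapers the frame edges. A minimal sketch with NumPy, where the window length N is an arbitrary illustrative value:

```python
import numpy as np

N = 8                    # window length (illustrative)
rect = np.ones(N)        # h(n) = 1 for 0 <= n <= N-1, 0 elsewhere
hamming = np.hamming(N)  # tapered alternative to the rectangular window

frame = np.random.randn(N)  # one frame of signal
windowed = frame * hamming  # element-wise weighting of the frame
```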
2. Endpoint Detection Method
Endpoint detection means locating the points where the voice starts and stops. Because the short-time characteristics of frames containing voice differ from those of frames that do not, these endpoints can be found effectively. This post combines short-time energy with the short-time zero-crossing rate for endpoint detection.
2.1 Short-Time Energy
The short-time average energy of the n-th frame is defined as:
$$E_n = \sum_{m=n-N+1}^{n} \left[ x(m)\,w(n-m) \right]^2$$
Frames containing voice have a higher short-time average energy than frames that do not.
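With a rectangular window, the per-frame energy reduces to the sum of squared samples in the frame. A sketch, assuming the frames have already been stacked row-wise as the `enframe` function in Section 3 produces them:

```python
import numpy as np

def short_time_energy(frames):
    """E_n for each frame: sum of squared (windowed) samples.
    `frames` is a 2-D array of shape (num_frames, frame_len)."""
    return np.sum(frames ** 2, axis=1)

frames = np.array([[0.0, 0.0, 0.1],    # near-silent frame
                   [0.5, -0.4, 0.6]])  # voiced frame
energy = short_time_energy(frames)
# the voiced frame carries far more energy than the silent one
```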
2.2 Short-Time Zero-Crossing Rate
A zero crossing occurs when the signal passes through zero, i.e. when two adjacent samples differ in sign; the zero-crossing count is the number of such sign changes.
The short-time average zero-crossing rate of the n-th frame is:
$$Z_n = \sum_{m=n-N+1}^{n} \left| \operatorname{sgn}\left[ x(m) \right] - \operatorname{sgn}\left[ x(m-1) \right] \right| w(n-m)$$
$$w(n) = \begin{cases} 1/(2N), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
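Since |sgn[x(m)] − sgn[x(m−1)]| equals 2 at a sign change and 0 otherwise, the weight w(n) = 1/(2N) makes Z_n the number of crossings divided by the frame length. A sketch for a single frame:

```python
import numpy as np

def short_time_zcr(frame):
    """Z_n for one frame: average zero-crossing rate with w(n) = 1/(2N)."""
    N = len(frame)
    s = np.sign(frame)
    # |sgn[x(m)] - sgn[x(m-1)]| is 2 at a sign change, 0 otherwise
    return np.sum(np.abs(s[1:] - s[:-1])) / (2 * N)

frame = np.array([1.0, -1.0, 1.0, -1.0])  # alternates sign every sample
rate = short_time_zcr(frame)              # 3 crossings over N = 4 samples
```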
3. Python Implementation
import wave
import numpy as np
import matplotlib.pyplot as plt
def read(data_path):
    '''Read a speech signal from a WAV file.'''
    f = wave.open(data_path, 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]  # channels, sample width, sample rate, number of samples
    str_data = f.readframes(nframes)  # raw audio as bytes
    f.close()
    wavedata = np.frombuffer(str_data, dtype=np.short)  # bytes -> 16-bit integer samples
    wavedata = wavedata * 1.0 / max(abs(wavedata))  # normalize amplitude to [-1, 1]
    return wavedata, nframes, framerate

def plot(data, time):
    plt.plot(time, data)
    plt.grid(True)
    plt.show()
def enframe(data, win, inc):
    '''Split a speech signal into frames.
    input:  data (1-D array): speech signal
            win (int or 1-D array): window length, or the window itself
            inc (int): hop size (samples the window advances each step)
    output: f (2-D array): one row per sliding-window position
    '''
    nx = len(data)  # signal length
    try:
        nwin = len(win)
    except TypeError:
        nwin = 1
    if nwin == 1:
        wlen = win
    else:
        wlen = nwin
    nf = int(np.fix((nx - wlen) / inc) + 1)    # number of window positions
    indf = inc * np.arange(nf).reshape(-1, 1)  # start index of each frame
    inds = np.arange(wlen)                     # sample offsets within a frame
    f = data[indf + inds]                      # (nf, wlen) frame matrix
    return f
def point_check(wavedata, win, inc):
    '''Speech endpoint detection.
    input:  wavedata (1-D array): raw speech signal
    output: StartPoint (int): start frame of the speech segment
            EndPoint (int): end frame of the speech segment
    '''
    # 1. Short-time zero-crossing rate
    FrameTemp1 = enframe(wavedata[0:-1], win, inc)
    FrameTemp2 = enframe(wavedata[1:], win, inc)
    signs = np.sign(FrameTemp1 * FrameTemp2) < 0    # adjacent samples of opposite sign cross zero
    diffs = np.abs(FrameTemp1 - FrameTemp2) > 0.01  # ignore near-zero jitter
    zcr = list((signs & diffs).sum(axis=1))
    # 2. Short-time energy (sum of magnitudes per frame)
    amp = list(abs(enframe(wavedata, win, inc)).sum(axis=1))
    # 3. Thresholds
    ZcrLow = max([round(np.mean(zcr) * 0.1), 3])  # low zero-crossing threshold
    ZcrHigh = max([round(max(zcr) * 0.1), 5])     # high zero-crossing threshold
    AmpLow = min([min(amp) * 10, np.mean(amp) * 0.2, max(amp) * 0.1])   # low energy threshold
    AmpHigh = max([min(amp) * 10, np.mean(amp) * 0.2, max(amp) * 0.1])  # high energy threshold
    # 4. Endpoint detection state machine
    MaxSilence = 8   # longest allowed silence gap inside speech (frames)
    MinAudio = 16    # shortest valid speech segment (frames)
    Status = 0       # 0: silence, 1: transition, 2: speech, 3: finished
    HoldTime = 0     # speech duration so far (frames)
    SilenceTime = 0  # silence gap duration (frames)
    print('Starting endpoint detection')
    StartPoint = 0
    for n in range(len(zcr)):
        if Status == 0 or Status == 1:
            if amp[n] > AmpHigh or zcr[n] > ZcrHigh:
                StartPoint = n - HoldTime
                Status = 2
                HoldTime = HoldTime + 1
                SilenceTime = 0
            elif amp[n] > AmpLow or zcr[n] > ZcrLow:
                Status = 1
                HoldTime = HoldTime + 1
            else:
                Status = 0
                HoldTime = 0
        elif Status == 2:
            if amp[n] > AmpLow or zcr[n] > ZcrLow:
                HoldTime = HoldTime + 1
            else:
                SilenceTime = SilenceTime + 1
                if SilenceTime < MaxSilence:
                    HoldTime = HoldTime + 1
                elif (HoldTime - SilenceTime) < MinAudio:
                    Status = 0
                    HoldTime = 0
                    SilenceTime = 0
                else:
                    Status = 3
        if Status == 3:
            break
    HoldTime = HoldTime - SilenceTime
    EndPoint = StartPoint + HoldTime
    return StartPoint, EndPoint, FrameTemp1
if __name__ == '__main__':
    data_path = 'audio_data.wav'
    win = 240
    inc = 80
    wavedata, nframes, framerate = read(data_path)
    time_list = np.array(range(0, nframes)) * (1.0 / framerate)
    plot(wavedata, time_list)
    StartPoint, EndPoint, FrameTemp = point_check(wavedata, win, inc)
    print('Speech segment: frames', StartPoint, 'to', EndPoint)
————————————————
Copyright notice: this is an original article by CSDN blogger "weixin_39710106", licensed under CC 4.0 BY-SA; reproduction must include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_39710106/article/details/111444972