relation: https://khub.utp.edu.my/scholars/11193/
title: Variance based time-frequency mask estimation for unsupervised speech enhancement
creator: Saleem, N.
creator: Khattak, M.I.
creator: Witjaksono, G.
creator: Ahmad, G.
description: A variance-based two-dimensional time-frequency mask estimation method for unsupervised speech enhancement is proposed to improve speech quality and intelligibility by reducing the low-frequency residual noise distortion in noisy speech signals. Unlike conventional speech enhancement methods, the proposed method reduces the residual noise distortion by combining a less aggressive Wiener gain with a variance-based two-dimensional time-frequency mask in a two-stage speech enhancement scheme. In the first stage, the less aggressive Wiener gain with a modified a priori signal-to-noise ratio (SNR) estimate is applied to the input noisy speech to obtain a pre-processed speech signal with reduced noise. In the second stage, variance-based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two-dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study is included to demonstrate the effectiveness of the proposed method in various noisy conditions. The experimental results showed large improvements in terms of the perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over those achieved with competing methods at different input SNRs. To measure the intelligibility of the enhanced speech in different noisy conditions, the short-time objective intelligibility (STOI) measure is used, which confirmed the better performance of the proposed method in terms of speech intelligibility. The time-varying spectral analysis validated a significant reduction of the residual noise components in the enhanced speech. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
publisher: Springer New York LLC
date: 2019
type: Article
type: PeerReviewed
identifier: Saleem, N. and Khattak, M.I. and Witjaksono, G. and Ahmad, G. (2019) Variance based time-frequency mask estimation for unsupervised speech enhancement. Multimedia Tools and Applications, 78 (22). pp. 31867-31891. ISSN 13807501
relation: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85069714676&doi=10.1007%2fs11042-019-08032-y&partnerID=40&md5=2f59c3a2e9bac3fb5dd17568d68131b1
relation: 10.1007/s11042-019-08032-y
identifier: 10.1007/s11042-019-08032-y