Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement

Type: Article

Publication Date: 2015-09-06

Citations: 97

DOI: https://doi.org/10.21437/interspeech.2015-358

Abstract

We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS, and potentially improves the learning of the primary target. Furthermore, the learned secondary information as a byproduct can be used for other purposes, e.g., the IBM-based post-processing in this work. A series of experiments shows that joint LPS and MFCC learning improves the SE performance, and IBM-based post-processing further enhances listening quality of the reconstructed speech.
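As a rough illustration of the three ideas in the abstract (an IBM target, a joint loss over primary and secondary targets, and mask-based post-processing), the sketch below works on a single frame of features stored as plain Python lists. It is not the authors' implementation. Several details are assumptions introduced here: LPS is taken to mean natural-log power, the IBM local criterion `lc_db` defaults to 0 dB, the loss weights `alpha` and `beta` are hypothetical, binary cross-entropy is used for the mask output, and `mask_postprocess` applies a hypothetical rule that attenuates bins the estimated mask marks as noise-dominated by `floor_db`. The network that produces the predictions is omitted.

```python
import math
from typing import List, Sequence


def ideal_binary_mask(clean_lps: Sequence[float], noise_lps: Sequence[float],
                      lc_db: float = 0.0) -> List[int]:
    """IBM for one frame: 1 where the local SNR exceeds the local criterion lc_db.

    Assumes LPS values are natural-log power, ln|X|^2, so the local SNR in dB
    is 10 * log10(exp(s - n)) = 10 * (s - n) / ln(10).
    """
    return [1 if 10.0 * (s - n) / math.log(10.0) > lc_db else 0
            for s, n in zip(clean_lps, noise_lps)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, used here for the continuous LPS and MFCC targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def bce(prob: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy for the categorical IBM output (an assumed choice)."""
    total = 0.0
    for p, t in zip(prob, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target)


def multi_objective_loss(pred_lps: Sequence[float], clean_lps: Sequence[float],
                         pred_mfcc: Sequence[float], clean_mfcc: Sequence[float],
                         ibm_prob: Sequence[float], ibm: Sequence[int],
                         alpha: float = 0.1, beta: float = 0.1) -> float:
    """Primary LPS error plus weighted secondary MFCC and IBM terms.

    alpha and beta are hypothetical weights. In a jointly trained network the
    three predictions share parameters, so minimizing this sum updates them all.
    """
    return (mse(pred_lps, clean_lps)
            + alpha * mse(pred_mfcc, clean_mfcc)
            + beta * bce(ibm_prob, ibm))


def mask_postprocess(enhanced_lps: Sequence[float], ibm_prob: Sequence[float],
                     threshold: float = 0.5, floor_db: float = -10.0) -> List[float]:
    """Hypothetical rule: attenuate bins the estimated mask marks as noise-dominated.

    A power gain of floor_db dB corresponds to adding floor_db * ln(10) / 10
    to a natural-log power value.
    """
    offset = floor_db * math.log(10.0) / 10.0
    return [x + offset if p < threshold else x
            for x, p in zip(enhanced_lps, ibm_prob)]


if __name__ == "__main__":
    clean = [2.0, 0.0, -1.0]
    noise = [0.0, 1.0, -1.0]
    print(ideal_binary_mask(clean, noise))  # [1, 0, 0]
    print(mask_postprocess([1.0, 1.0], [0.9, 0.2]))
```

Keeping the secondary terms as separately weighted additions makes it easy to drop one (for example, training with LPS and MFCC only, the combination the abstract reports as helpful) by setting its weight to zero.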

Locations

  • arXiv (Cornell University)
  • Interspeech 2015

Similar Works

  • Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement (2017). Yong Xu, Jun Du, Zhen Ying Huang, Li-Rong Dai, Chin‐Hui Lee
  • Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement (2018). Ziyi Xu, Maximilian Strake, Tim Fingscheidt
  • Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement (2019). Ziyi Xu, Maximilian Strake, Tim Fingscheidt
  • Masks Fusion with Multi-Target Learning For Speech Enhancement (2021). Liangchen Zhou, Wenbin Jiang, Jingyan Xu, Fei Wen, Peilin Liu
  • Components Loss for Neural Networks in Mask-Based Speech Enhancement (2019). Ziyi Xu, Samy Elshamy, Ziyue Zhao, Tim Fingscheidt
  • Distortionless Multi-Channel Target Speech Enhancement for Overlapped Speech Recognition (2020). Bo Wu, Yu Meng, Lianwu Chen, Yong Xu, Chao Weng, Dan Su, Dong Yu
  • Normalized Features for Improving the Generalization of DNN Based Speech Enhancement (2017). Robert Rehr, Timo Gerkmann
  • Complex spectrogram enhancement by convolutional neural network with multi-metrics learning (2017). Szu‐Wei Fu, Ting-Yao Hu, Yu Tsao, Xugang Lu
  • Incorporating Multi-Target in Multi-Stage Speech Enhancement Model for Better Generalization (2021). Lu Zhang, Mingjiang Wang, Andong Li, Zehua Zhang, Xuyi Zhuang
  • Multi-task Single Channel Speech Enhancement Using Speech Presence Probability As A Secondary Task Training Target (2021). Lei Wang, Jie Zhu, Ina Kodrasi
  • MDNet: Learning Monaural Speech Enhancement from Deep Prior Gradient (2022). Andong Li, Chengshi Zheng, Ziyang Zhang, Xiaodong Li
  • Multi-task single channel speech enhancement using speech presence probability as a secondary task training target (2020). L. Wang, J. Zhu, I. Kodrasi
  • Improving the Generalizability of Deep Neural Network Based Speech Enhancement (2017). Robert Rehr, Timo Gerkmann
  • PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement (2022). Xiaofeng Ge, Jiangyu Han, Yanhua Long, Haixin Guan
  • Deep Interaction between Masking and Mapping Targets for Single-Channel Speech Enhancement (2021). Lu Zhang, Mingjiang Wang, Zehua Zhang, Xuyi Zhuang
  • Speech Enhancement using a Deep Mixture of Experts (2017). Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot
  • On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement (2017). Robert Rehr, Timo Gerkmann
  • Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders (2020). Cheng Yu, Ryandhimas E. Zezario, Syu‐Siang Wang, Jonathan H. Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin‐Min Wang, Yu Tsao

Works That Cite This (27)

  • Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling (2019). Peidong Wang, Ke Tan, DeLiang Wang
  • End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks (2017). Szu‐Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai
  • DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score (2018). Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda
  • Hierarchical learning for DNN-based acoustic scene classification (2016). Yong Xu, Qiang Huang, Wenwu Wang, Mark D. Plumbley
  • Incorporating Symbolic Sequential Modeling for Speech Enhancement (2019). Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai
  • Deep Xi as a Front-End for Robust Automatic Speech Recognition (2019). Aaron Nicolson, Kuldip K. Paliwal
  • VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation (2021). Joon‐Young Yang, Joon‐Hyuk Chang
  • Distortionless Multi-Channel Target Speech Enhancement for Overlapped Speech Recognition (2020). Bo Wu, Yu Meng, Lianwu Chen, Yong Xu, Chao Weng, Dan Su, Dong Yu
  • A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition (2020). Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu
  • Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders (2020). Cheng Yu, Ryandhimas E. Zezario, Syu‐Siang Wang, Jonathan H. Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin‐Min Wang, Yu Tsao