End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

Type: Article

Publication Date: 2020-04-09

Citations: 138

DOI: https://doi.org/10.1109/icassp40776.2020.9054177

Abstract

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and number of microphones. The former requires the system to be invariant to different indexings of microphones at the same locations, while the latter requires the system to be able to process inputs of varying dimensions. Conventional optimization-based beamforming techniques satisfy these requirements by definition, whereas deep learning-based end-to-end systems do not fully address them. In this paper, we propose transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation. Building on the filter-and-sum network (FaSNet), a recently proposed end-to-end time-domain beamforming system, we show how TAC significantly improves separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays. Moreover, we show that TAC also significantly improves separation performance with a fixed-geometry array configuration, further supporting the effectiveness of the proposed paradigm for the general problem of multi-microphone speech separation.
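
The abstract names the three TAC stages but does not spell them out, so the following is only a minimal sketch of one plausible reading: a shared transform applied to every channel, an average over channels, and a concatenation of that average back onto each channel's features. The single-layer linear maps with ReLU, the dimensions, and all names (`tac`, `w_transform`, `w_average`, `w_concat`) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    # Matrix-vector product; weights is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def make_weights(rows, cols, rng):
    # Hypothetical fixed random weights standing in for learned layers.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def tac(channels, w_transform, w_average, w_concat):
    # Transform: the same map is applied to every channel independently.
    transformed = [relu(linear(w_transform, ch)) for ch in channels]
    # Average: mean over channels, so order and count do not matter.
    n = len(transformed)
    mean = [sum(col) / n for col in zip(*transformed)]
    pooled = relu(linear(w_average, mean))
    # Concatenate: each channel's features are joined with the pooled vector.
    return [relu(linear(w_concat, t + pooled)) for t in transformed]


rng = random.Random(0)
dim, hidden = 4, 6
w_t = make_weights(hidden, dim, rng)
w_a = make_weights(hidden, hidden, rng)
w_c = make_weights(dim, 2 * hidden, rng)

# Three microphones, each with a feature vector of size dim.
mics = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(3)]
out = tac(mics, w_t, w_a, w_c)

# Re-indexing the microphones only re-indexes the outputs.
perm = [2, 0, 1]
out_perm = tac([mics[i] for i in perm], w_t, w_a, w_c)
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for k in range(len(perm))
    for a, b in zip(out_perm[k], out[perm[k]])
)

# The same weights also accept a different number of microphones.
out_two = tac(mics[:2], w_t, w_a, w_c)
```

Because every weight is shared across channels and the only cross-channel operation is a mean, shuffling the microphones merely shuffles the outputs, and the same parameters apply to any microphone count. These are the two properties the abstract asks for.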

Locations

  • arXiv (Cornell University)
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Similar Works

  • End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation (2019) - Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka
  • Implicit Filter-and-sum Network for Multi-channel Speech Separation (2020) - Yi Luo, Nima Mesgarani
  • Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments (2024) - Ji Hyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang
  • Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training (2021) - Changsheng Quan, Xiaofei Li
  • Multi-Channel Narrow-Band Deep Speech Separation with Full-Band Permutation Invariant Training (2022) - Changsheng Quan, Xiaofei Li
  • FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing (2019) - Yi Luo, Cong Han, Nima Mesgarani, Enea Ceolini, Shih‐Chii Liu
  • FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing (2019) - Yi Luo, Enea Ceolini, Cong Han, Shih‐Chii Liu, Nima Mesgarani
  • DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation (2020) - Ali Aroudi, Sebastian Braun
  • Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters (2023) - Kristina Tesch, Timo Gerkmann
  • Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023) - Kristina Tesch, Timo Gerkmann
  • Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation (2022) - Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu
  • Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation (2022) - Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu
  • VarArray: Array-Geometry-Agnostic Continuous Speech Separation (2021) - Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda
  • VarArray: Array-Geometry-Agnostic Continuous Speech Separation (2022) - Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda
  • Deep Ad-hoc Beamforming (2018) - Xiao-Lei Zhang
  • Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement (2019) - Zhong-Qiu Wang, Hakan Erdoğan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey
  • Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement (2021) - Zhong-Qiu Wang, Hakan Erdoğan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey
  • Learning Filterbanks for End-to-End Acoustic Beamforming (2022) - Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini

Works That Cite This (90)

  • Learning to Rank Microphones for Distant Speech Recognition (2021) - Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini
  • Distortion-Controlled Training for end-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss (2021) - Yi Luo, Cong Han, Nima Mesgarani
  • DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction (2022) - Jiangyu Han, Yanhua Long, Lukáš Burget, Jaň Černocký
  • Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation (2022) - Ziye Yang, Shanzheng Guan, Xiao-Lei Zhang
  • Binaural Multichannel Blind Speaker Separation With a Causal Low-Latency and Low-Complexity Approach (2023) - Nils L. Westhausen, Bernd T. Meyer
  • Dasformer: Deep Alternating Spectrogram Transformer For Multi/Single-Channel Speech Separation (2023) - Shuo Wang, Xiang‐Yu Kong, Xiulian Peng, Hesam Movassagh, Vinod Prakash, Yan Lu
  • A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings (2023) - Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai
  • Low Bit Rate Binaural Link for Improved Ultra Low-Latency Low-Complexity Multichannel Speech Enhancement in Hearing Aids (2023) - Nils L. Westhausen, Bernd T. Meyer
  • Neural Speech Separation Using Spatially Distributed Microphones (2020) - Dongmei Wang, Zhuo Chen, Takuya Yoshioka
  • Embedding and Beamforming: All-Neural Causal Beamformer for Multichannel Speech Enhancement (2022) - Andong Li, Wenzhe Liu, Chengshi Zheng, Xiaodong Li