Attention Is All You Need In Speech Separation

Type: Article

Publication Date: 2021-05-13

Citations: 364

DOI: https://doi.org/10.1109/icassp39728.2021.9413901

Abstract

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short- and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and less memory-demanding than the latest speech separation systems with comparable performance.
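The abstract's central point is that attention computes every output position from all time steps at once, so the work parallelizes over positions, whereas an RNN must walk the sequence step by step. A minimal pure-Python sketch of scaled dot-product attention (the building block of the multi-head mechanism; this is an illustrative sketch, not the SepFormer implementation) makes that concrete:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.

    Each output row is a convex combination of the value rows V,
    weighted by the similarity of its query to every key. Note that
    each row of the loop below is independent of the others, which is
    what allows Transformers to parallelize over time steps.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

With one-hot queries, keys, and values, each output row is simply the attention weights themselves, and each query attends most strongly to its matching key.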

Locations

  • arXiv (Cornell University) - View - PDF
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - View

Works That Cite This (121)

  • MT3: Multi-Task Multitrack Music Transcription (2021) - Josh Gardner, Ian Simon, Ethan Manilow, Curtis Hawthorne, Jesse Engel
  • Exploring Self-Attention Mechanisms for Speech Separation (2023) - Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin, Mirko Bronzi
  • DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction (2022) - Jiangyu Han, Yanhua Long, Lukáš Burget, Jan Černocký
  • Dasformer: Deep Alternating Spectrogram Transformer For Multi/Single-Channel Speech Separation (2023) - Shuo Wang, Xiang-Yu Kong, Xiulian Peng, Hesam Movassagh, Vinod Prakash, Yan Lu
  • Ripple Sparse Self-Attention for Monaural Speech Enhancement (2023) - Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li
  • Robustdistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness (2023) - Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk
  • On The Design and Training Strategies for Rnn-Based Online Neural Speech Separation Systems (2023) - Kai Li, Yi Luo
  • Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation (2023) - Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe
  • Latent Iterative Refinement for Modular Source Separation (2023) - Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux
  • MossFormer: Pushing the Performance Limit of Monaural Speech Separation Using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions (2023) - Shengkui Zhao, Bin Ma