Very Deep Self-Attention Networks for End-to-End Speech Recognition
Recently, end-to-end sequence-to-sequence models for speech recognition have gained significant interest in the research community. While previous architecture choices revolve around time-delay neural networks (TDNN) and long short-term memory (LSTM) recurrent neural networks, we propose to use self-attention via the Transformer architecture as an alternative. Our analysis shows that deep Transformer networks …
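To make the proposed building block concrete, the following is a minimal sketch of single-head scaled dot-product self-attention, the core operation of the Transformer (Vaswani et al., 2017). It is an illustration under simplifying assumptions, not the authors' implementation: all array shapes, the function name, and the toy inputs are chosen here for clarity, and multi-head projections, masking, and positional encodings are omitted.

```python
# Illustrative sketch of scaled dot-product self-attention; names and
# shapes are hypothetical, not taken from the paper's codebase.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence.

    x:             (seq_len, d_model) input frames (e.g. acoustic features)
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q                      # queries
    k = x @ w_k                      # keys
    v = x @ w_v                      # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq_len, seq_len) pairwise similarity
    # softmax over the key dimension, computed stably
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v               # attention-weighted combination of values

# Toy usage: 5 time steps of 8-dimensional features.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Unlike an LSTM, which propagates state step by step, each output position here attends directly to every input position in one matrix operation, which is what allows such layers to be stacked into the very deep networks the paper studies.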