Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks
Self-attention networks (SANs) can benefit significantly from bi-directional representation learning through unsupervised pretraining paradigms such as BERT and XLNet. In this paper, we present an XLNet-like pretraining scheme, "Speech-XLNet", to learn speech representations with SANs. First, we find that by shuffling the speech frame orders, Speech-XLNet serves as a …
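To make the frame-shuffling idea concrete, the following is a minimal sketch of how a permutation over frame indices can be turned into an attention mask, in the spirit of XLNet's permutation-based objective. It is an illustration under stated assumptions, not the exact Speech-XLNet procedure: the function name `permutation_attention_mask`, the `seed` parameter, and the choice to let a frame attend only to frames that come strictly earlier in the sampled order are all hypothetical choices made for this example.

```python
import random


def permutation_attention_mask(num_frames, seed=None):
    """Sample a frame order and build the matching attention mask.

    Assumption (not taken from the paper): frame i may attend to frame j
    only if j comes strictly before i in the sampled order, as in the
    query stream of XLNet-style permutation modeling.
    """
    rng = random.Random(seed)

    # Sample a random factorization order over the frame indices.
    order = list(range(num_frames))
    rng.shuffle(order)

    # rank[t] is the position of frame t in the sampled order.
    rank = [0] * num_frames
    for position, frame in enumerate(order):
        rank[frame] = position

    # mask[i][j] is True when frame i is allowed to attend to frame j.
    mask = [
        [rank[j] < rank[i] for j in range(num_frames)]
        for i in range(num_frames)
    ]
    return order, mask


if __name__ == "__main__":
    order, mask = permutation_attention_mask(6, seed=0)
    print("sampled order:", order)
    for row in mask:
        print("".join("1" if allowed else "0" for allowed in row))
```

Because the order is resampled for every utterance, each frame is predicted from a different subset of the other frames, on both sides of it in time, which is how a permutation-based objective exposes the model to bi-directional context.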