Unsupervised Pre-Training of Bidirectional Speech Encoders via Masked Reconstruction
We propose an approach for pre-training speech representations via a masked reconstruction loss. Our pre-trained encoder networks are bidirectional and can therefore be used directly in typical bidirectional speech recognition models. The pre-trained networks can then be fine-tuned on a smaller amount of supervised data for speech recognition. Experiments with …
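To make the idea concrete, below is a minimal sketch of masked-reconstruction pre-training, not the paper's exact recipe: it masks random time spans only, and all hyperparameters, layer sizes, and names (`MaskedReconstructionEncoder`, `mask_spans`) are hypothetical stand-ins. A bidirectional LSTM encodes partially masked features, a linear head reconstructs them, and the loss is taken only over masked positions; because the encoder is bidirectional, it can be dropped directly into a standard bidirectional ASR model for fine-tuning.

```python
import torch
import torch.nn as nn

class MaskedReconstructionEncoder(nn.Module):
    """Bidirectional encoder pre-trained by reconstructing masked input frames."""
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        # Bidirectional LSTM encoder: reusable directly in downstream ASR models.
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
        # Reconstruction head maps encoder states back to the feature space.
        self.head = nn.Linear(2 * hidden, n_mels)

    def forward(self, x):
        h, _ = self.encoder(x)
        return self.head(h)

def mask_spans(x, span=10, p=0.15):
    """Zero out random time spans; returns the masked input and a boolean mask.
    Span length and masking rate here are illustrative, not the paper's values."""
    b, t, d = x.shape
    mask = torch.zeros(b, t, dtype=torch.bool)
    n_spans = max(1, int(p * t / span))
    for i in range(b):
        starts = torch.randint(0, max(1, t - span), (n_spans,))
        for s in starts:
            mask[i, s:s + span] = True
    return x.masked_fill(mask.unsqueeze(-1), 0.0), mask

# One pre-training step on unlabeled features (batch x time x mels).
model = MaskedReconstructionEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
feats = torch.randn(4, 200, 80)   # stand-in for real unlabeled log-mel features
masked, mask = mask_spans(feats)
recon = model(masked)
# Reconstruction loss is computed only on the masked positions.
loss = ((recon - feats)[mask] ** 2).mean()
loss.backward()
opt.step()
```

After pre-training on unlabeled audio, the reconstruction head would be discarded and the encoder fine-tuned with a supervised ASR objective on the smaller labeled set.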