Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain
Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain
Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior. Such chains separately processed listening and speaking by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) and simultaneously enabled them to teach each other in semi-supervised learning when they …