Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework
Previous research has proposed a machine speech chain that enables automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning, avoiding the need for a large amount of paired speech and text data. However, that framework still requires a large amount of unpaired (speech …