Deep multimodal learning for Audio-Visual Speech Recognition

In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep …
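
The fusion step in the first approach can be illustrated with a minimal sketch. It is not the paper's implementation: the layer sizes, the ReLU nonlinearity, the random weights standing in for separately trained networks, and the choice of concatenation as the fusion operator are all assumptions made for illustration. What is trained on the joint feature space is elided in the abstract above and is not shown here.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights standing in for a separately trained uni-modal network."""
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def final_hidden(x, params):
    """Run x through every layer and return the last hidden activation."""
    h = x
    for w, b in params:
        h = np.maximum(h @ w + b, 0.0)  # ReLU, an assumed nonlinearity
    return h

def fuse(h_audio, h_visual):
    """Fuse by concatenating along the feature axis (an assumed fusion operator)."""
    return np.concatenate([h_audio, h_visual], axis=1)

rng = np.random.default_rng(0)
audio_net = init_mlp([40, 64, 32], rng)    # hypothetical audio feature and layer sizes
visual_net = init_mlp([100, 64, 16], rng)  # hypothetical visual feature and layer sizes

audio_batch = rng.standard_normal((8, 40))    # 8 frames of placeholder audio features
visual_batch = rng.standard_normal((8, 100))  # 8 frames of placeholder visual features

joint = fuse(final_hidden(audio_batch, audio_net),
             final_hidden(visual_batch, visual_net))
print(joint.shape)  # (8, 48)
```

Each frame's joint feature vector has 32 + 16 = 48 dimensions here, so whatever model consumes the joint space sees both modalities side by side.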