End-to-End Audiovisual Fusion with LSTMs
End-to-End Audiovisual Fusion with LSTMs
Several end-to-end deep learning approaches have been recently presented which simultaneously extract visual features from the input images and perform visual speech classification.However, research on jointly extracting audio and visual features and performing classification is very limited.In this work, we present an end-to-end audiovisual model based on Bidirectional Long Short-Term …