LiRA: Learning Visual Speech Representations from Audio Through Self-Supervision
The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning. Recent works have focused on each of these modalities separately, while others have attempted to model both simultaneously in a cross-modal fashion. However, comparatively little attention has been given to leveraging …