SVTS: Scalable Video-to-Speech Synthesis

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio. This task has received an increasing amount of attention due to its self-supervised nature (i.e., can be trained without manual labelling) combined with the ever-growing collection of audio-visual data available online. Despite these strong …