End-to-End Video-to-Speech Synthesis Using Generative Adversarial Networks
Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video and is then decoded into waveform audio using a vocoder or a waveform reconstruction …