CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech
Prosody Transfer (PT) is a technique that aims to use the prosody of a source audio as a reference while synthesising speech. Fine-grained PT aims to capture prosodic aspects such as rhythm, emphasis, melody, duration, and loudness from a source audio at a very granular level and to transfer them when synthesising speech …