One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers.However, such model suffers from the limitation that it can only convert the voice to the speakers in the training data, …