Speaker and Style Disentanglement of Speech Based on Contrastive
Predictive Coding Supported Factorized Variational Autoencoder
Speech signals carry information at multiple levels, including content, speaker, and style. Disentangling this information, although challenging, is important for applications such as voice conversion. The contrastive predictive coding supported factorized variational autoencoder achieves unsupervised disentanglement of a speech signal into speaker and content embeddings by assuming speaker …