InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt
Expressive text-to-speech (TTS) aims to synthesize speech with varying speaking styles to better reflect human speech patterns. In this study, we attempt to use natural language as a style prompt to control the styles in the synthetic speech, e.g., "Sigh tone in full of sad mood with …