DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization
Diffusion models have demonstrated significant potential in speech synthesis tasks, including text-to-speech (TTS) and voice cloning. However, their iterative denoising process is inefficient and hinders end-to-end optimization with perceptual metrics. In this paper, we propose a novel method for distilling TTS diffusion models with direct end-to-end evaluation …