Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous

Type: Preprint

Publication Date: 2018-03-24

Citations: 56


Locations

arXiv (Cornell University)

Similar Works

Title	Year	Authors
CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech	2020	Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman
Fine-grained robust prosody transfer for single-speaker neural text-to-speech	2019	Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis	2021	Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, Marc-André Carbonneau
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis	2022	Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, Marc-André Carbonneau
Robust and fine-grained prosody control of end-to-end speech synthesis	2018	Younggun Lee, Taesu Kim
Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis	2019	Younggun Lee, Taesu Kim
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer	2022	Sri Karlapati, Penny Karanasou, Mateusz Łajszczak, Syed Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman
Cross-lingual Prosody Transfer for Expressive Machine Dubbing	2023	Jakub Świątkowski, Duo Wang, Mikolaj Babianski, Patrick Lumban Tobing, Ravichander Vipperla, Vincent Pollet
Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis	2021	Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, Marc-André Carbonneau
Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows	2021	Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo
Do Prosody Transfer Models Transfer Prosody?	2023	Atli Thor Sigurgeirsson, Simon King
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer	2023	Ammar Abbas, Sri Karlapati, Bastian Schnell, Penny Karanasou, Marcel Granero Moya, Amith Nagaraj, Ayman Boustati, Nicole Peinelt, Alexis Moinet, Thomas Drugman
Expressive TTS Training with Frame and Style Reconstruction Loss	2020	Rui Liu, Berrak Şişman, Guanglai Gao, Haizhou Li
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	2018	Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan

Works That Cite This (39)

Title	Year	Authors
A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities	2019	Deepali Aneja, Daniel McDuff, Shital Shah
Expressive Neural Voice Cloning	2021	Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
Controllable neural text-to-speech synthesis using intuitive prosodic features	2020	Tuomo Raitio, Ramya Rasipuram, Dan Castellani
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization	2019	Ju-Chieh Chou, Cheng-chieh Yeh, Hung-yi Lee
Towards Fine-Grained Prosody Control for Voice Conversion	2019	Zheng Lian, Zhengqi Wen
Sample Efficient Adaptive Text-to-Speech	2018	Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie
Building a mixed-lingual neural TTS system with only monolingual data	2019	Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu
Pitchtron: Towards audiobook generation from ordinary people's voices	2020	Sung-Hee Jung, Hoirin Kim
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features	2019	Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto
Towards Fine-Grained Prosody Control for Voice Conversion	2021	Zheng Lian, Rongxiu Zhong, Zhengqi Wen, Bin Liu, Jianhua Tao

Works Cited by This (11)

Title	Year	Authors
Generating Sequences With Recurrent Neural Networks	2013	Alex Graves
Deep Voice 2: Multi-Speaker Neural Text-to-Speech	2017	Sercan Ö. Arık, Gregory Diamos, Andrew Gibiansky, J. J. Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
Uncovering Latent Style Factors for Expressive Speech Synthesis	2017	Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions	2017	Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan
On Using Backpropagation for Speech Texture Generation and Voice Conversion	2017	Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis	2018	Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Jia Ye, Rif A. Saurous
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift	2015	Sergey Ioffe, Christian Szegedy
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation	2014	Kyunghyun Cho, Bart van Merriënboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
Tacotron: Towards End-to-End Speech Synthesis	2017	Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio
Neural Discrete Representation Learning	2017	Aäron van den Oord, Oriol Vinyals, Koray Kavukcuoglu