Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

Type: Preprint

Publication Date: 2018-03-24

Citations: 56

Locations

  • arXiv (Cornell University)

Similar Works

Title Year Authors
+ Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron 2018 RJ Skerry-Ryan
Eric Battenberg
Ying Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
Rob Clark
Rif A. Saurous
+ CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech 2020 Sri Karlapati
Alexis Moinet
Arnaud Joly
Viacheslav Klimkov
Daniel Sáez-Trigueros
Thomas Drugman
+ Fine-grained robust prosody transfer for single-speaker neural text-to-speech 2019 Viacheslav Klimkov
Srikanth Ronanki
Jonas Rohnke
Thomas Drugman
+ Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech 2019 Viacheslav Klimkov
Srikanth Ronanki
Jonas Rohnke
Thomas Drugman
+ Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis 2021 Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
Marc-André Carbonneau
+ Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis 2022 Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
Marc-André Carbonneau
+ Robust and fine-grained prosody control of end-to-end speech synthesis 2018 Younggun Lee
Taesu Kim
+ Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis 2019 Younggun Lee
Taesu Kim
+ CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer 2022 Sri Karlapati
Penny Karanasou
Mateusz Łajszczak
Syed Ammar Abbas
Alexis Moinet
Peter Makarov
Ray Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
+ Cross-lingual Prosody Transfer for Expressive Machine Dubbing 2023 Jakub Świątkowski
Duo Wang
Mikolaj Babianski
Patrick Lumban Tobing
Ravichander Vipperla
Vincent Pollet
+ CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer 2022 Sri Karlapati
Penny Karanasou
Mateusz Łajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Ray Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
+ Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis 2021 Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
Marc-André Carbonneau
+ Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows 2021 Iván Vallés-Pérez
Julian Roth
Grzegorz Beringer
Roberto Barra-Chicote
Jasha Droppo
+ Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows 2021 Iván Vallés-Pérez
Julian Roth
Grzegorz Beringer
Roberto Barra-Chicote
Jasha Droppo
+ Do Prosody Transfer Models Transfer Prosody? 2023 Atli Thor Sigurgeirsson
Simon King
+ eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer 2023 Ammar Abbas
Sri Karlapati
Bastian Schnell
Penny Karanasou
Marcel Granero Moya
Amith Nagaraj
Ayman Boustati
Nicole Peinelt
Alexis Moinet
Thomas Drugman
+ Expressive TTS Training with Frame and Style Reconstruction Loss 2020 Rui Liu
Berrak Şişman
Guanglai Gao
Haizhou Li
+ Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis 2018 Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan

Works Cited by This (11)

Title Year Authors
+ Generating Sequences With Recurrent Neural Networks 2013 Alex Graves
+ Deep Voice 2: Multi-Speaker Neural Text-to-Speech 2017 Sercan Ö. Arık
Gregory Diamos
Andrew Gibiansky
J. J. Miller
Kainan Peng
Wei Ping
Jonathan Raiman
Yanqi Zhou
+ Uncovering Latent Style Factors for Expressive Speech Synthesis 2017 Yuxuan Wang
RJ Skerry-Ryan
Ying Xiao
Daisy Stanton
Joel Shor
Eric Battenberg
Rob Clark
Rif A. Saurous
+ Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions 2017 Jonathan Shen
Ruoming Pang
Ron J. Weiss
Mike Schuster
Navdeep Jaitly
Zongheng Yang
Zhifeng Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
+ On Using Backpropagation for Speech Texture Generation and Voice Conversion 2017 Jan Chorowski
Ron J. Weiss
Rif A. Saurous
Samy Bengio
+ Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis 2018 Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Ying Xiao
Fei Ren
Jia Ye
Rif A. Saurous
+ Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift 2015 Sergey Ioffe
Christian Szegedy
+ Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation 2014 Kyunghyun Cho
Bart van Merriënboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
+ Tacotron: Towards End-to-End Speech Synthesis 2017 Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Ying Xiao
Zhifeng Chen
Samy Bengio
+ Neural Discrete Representation Learning 2017 Aäron van den Oord
Oriol Vinyals
Koray Kavukcuoglu