Expressive TTS Training with Frame and Style Reconstruction Loss

Type: Preprint

Publication Date: 2020-01-01

Citations: 19

DOI: https://doi.org/10.48550/arxiv.2008.01490

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Title Year Authors
+ Expressive TTS Training With Frame and Style Reconstruction Loss 2021 Rui Liu
Berrak Şişman
Guanglai Gao
Haizhou Li
+ Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis 2021 Xudong Dai
Gong Cheng
Longbiao Wang
Kaili Zhang
+ Expressive Text-to-Speech using Style Tag 2021 Minchan Kim
Sung Jun Cheon
Byoung Jin Choi
Jong Jin Kim
Nam Soo Kim
+ CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training 2023 Zhenhui Ye
Rongjie Huang
Yi Ren
Ziyue Karen Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
+ Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron 2018 RJ Skerry-Ryan
Eric Battenberg
Ying Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
Rob Clark
Rif A. Saurous
+ Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis 2018 Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Ying Xiao
Fei Ren
Ye Jia
Rif A. Saurous
+ Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis 2023 Chunyu Qiang
Peng Yang
Hao Che
Ying Zhang
Xiaorui Wang
Zhongyuan Wang
+ Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows 2021 Iván Vallés-Pérez
Julian Roth
Grzegorz Beringer
Roberto Barra-Chicote
Jasha Droppo
+ ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech 2022 Yi Ren
Ming Lei
Zhiying Huang
Shiliang Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
+ Style Mixture of Experts for Expressive Text-To-Speech Synthesis 2024 Ahad Jawaid
Shreeram Suresh Chandra
Junchen Lu
Berrak Şişman
+ CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer 2022 Sri Karlapati
Penny Karanasou
Mateusz Łajszczak
Syed Ammar Abbas
Alexis Moinet
Peter Makarov
Ray Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
+ StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis 2022 Yinghao Aaron Li
Cong Han
Nima Mesgarani
+ Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS 2021 Tuomo Raitio
Jiangchuan Li
Shreyas Seshadri

Works Cited by This (33)

Title Year Authors
+ LSTM: A Search Space Odyssey 2016 Klaus Greff
Rupesh K. Srivastava
Jan Koutník
Bas R. Steunebrink
Jürgen Schmidhuber
+ Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift 2015 Sergey Ioffe
Christian Szegedy
+ Perceptual Losses for Real-Time Style Transfer and Super-Resolution 2016 Justin Johnson
Alexandre Alahi
Li Fei-Fei
+ Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels 2019 Reza Lotfian
Carlos Busso
+ Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis 2018 Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan
+ Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis 2019 Yu-An Chung
Yuxuan Wang
Wei-Ning Hsu
Yu Zhang
RJ Skerry-Ryan
+ WaveNet: A Generative Model for Raw Audio 2016 Aäron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
Andrew Senior
Koray Kavukcuoglu
+ CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network 2019 Vincent Wan
Chun-an Chan
Tom Kenter
Jakub Vít
Rob Clark
+ An Overview on Data Representation Learning: From Traditional Feature Learning to Recent Deep Learning 2016 Guoqiang Zhong
Lina Wang
Junyu Dong
+ Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study 2018 Siddique Latif
Rajib Rana
Junaid Qadir
Julien Epps