Expressive TTS Training with Frame and Style Reconstruction Loss

Rui Liu, Berrak Şişman, Guanglai Gao, Haizhou Li

Type: Preprint

Publication Date: 2020-01-01

Citations: 19

DOI: https://doi.org/10.48550/arxiv.2008.01490

Locations

arXiv (Cornell University)
DataCite API

Similar Works

Title	Year	Authors
Expressive TTS Training With Frame and Style Reconstruction Loss	2021	Rui Liu, Berrak Şişman, Guanglai Gao, Haizhou Li
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis	2021	Xudong Dai, Gong Cheng, Longbiao Wang, Kaili Zhang
Expressive Text-to-Speech Using Style Tag	2021	Minchan Kim, Sung Jun Cheon, Byoung Jin Choi, Jong Jin Kim, Nam Soo Kim
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training	2023	Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Karen Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron	2018	RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis	2018	Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Jia Ye, Rif A. Saurous
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis	2023	Chunyu Qiang, Peng Yang, Hao Che, Ying Zhang, Xiaorui Wang, Zhongyuan Wang
Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows	2021	Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech	2022	Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao
Style Mixture of Experts for Expressive Text-To-Speech Synthesis	2024	Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Şişman
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer	2022	Sri Karlapati, Penny Karanasou, Mateusz Łajszczak, Syed Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis	2022	Yinghao Aaron Li, Cong Han, Nima Mesgarani
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS	2021	Tuomo Raitio, Jiangchuan Li, Shreyas Seshadri

Works That Cite This (15)

Title	Year	Authors
A Survey on Neural Speech Synthesis	2021	Xu Tan, Tao Qin, Frank K. Soong, Tie-Yan Liu
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training	2021	Kun Zhou, Berrak Şişman, Haizhou Li
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech	2021	Kun Zhou, Berrak Şişman, Haizhou Li
NatiQ: An End-to-end Text-to-Speech System for Arabic	2022	Ahmed Abdelalí, Nadir Durrani, Cenk Demiroğlu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset	2020	Kun Zhou, Berrak Şişman, Rui Liu, Haizhou Li
Emotional Voice Conversion: Theory, Databases and ESD	2021	Kun Zhou, Berrak Şişman, Rui Liu, Haizhou Li
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis	2020	Rui Liu, Berrak Şişman, Haizhou Li
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning	2020	Berrak Şişman, Junichi Yamagishi, Simon King, Haizhou Li
Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows	2021	Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

Works Cited by This (33)

Title	Year	Authors
LSTM: A Search Space Odyssey	2016	Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift	2015	Sergey Ioffe, Christian Szegedy
Perceptual Losses for Real-Time Style Transfer and Super-Resolution	2016	Justin Johnson, Alexandre Alahi, Li Fei-Fei
Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels	2019	Reza Lotfian, Carlos Busso
Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis	2018	Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan
Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis	2019	Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan
WaveNet: A Generative Model for Raw Audio	2016	Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network	2019	Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vít, Rob Clark
An Overview on Data Representation Learning: From Traditional Feature Learning to Recent Deep Learning	2016	Guoqiang Zhong, Lina Wang, Junyu Dong
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study	2018	Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps