Generative Semantic Communication for Text-to-Speech Synthesis

Jiahao Zheng, Jinke Ren, Peng Xu, Zhihao Yuan, Jie Xu, Fangxin Wang, Gui Gui, Shuguang Cui

Type: Preprint

Publication Date: 2024-10-04

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2410.03459

Abstract

Semantic communication is a promising technology for improving communication efficiency by transmitting only the semantic information of the source data. However, traditional semantic communication methods primarily focus on data reconstruction tasks, which may be inefficient for emerging generative tasks such as text-to-speech (TTS) synthesis. To address this limitation, this paper develops a novel generative semantic communication framework for TTS synthesis that leverages generative artificial intelligence technologies. First, we use a pre-trained large speech model, WavLM, together with the residual vector quantization method to construct two semantic knowledge bases (KBs), one at the transmitter and one at the receiver. The KB at the transmitter enables effective semantic extraction, while the KB at the receiver facilitates lifelike speech synthesis. Then, we employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead. Finally, numerical results demonstrate that our framework achieves much higher fidelity for the generated speech than four baselines over both additive white Gaussian noise (AWGN) and Rayleigh fading channels.
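The abstract names residual vector quantization (RVQ) as the tool used with WavLM to build the knowledge bases. As a rough illustration of the generic technique only (not the paper's configuration), the sketch below quantizes a feature vector in stages: each stage picks the codeword nearest to whatever the previous stages left unexplained, so the vector is approximated by the sum of one codeword per stage. The helper names `rvq_encode` and `rvq_decode` and the two toy codebooks are hypothetical; in practice the codebooks would be learned from WavLM features, and the number of stages and codebook sizes used in the paper are not given here.

```python
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Greedy residual vector quantization (hypothetical helper).

    Each stage picks the codeword closest to the current residual,
    records its index, and subtracts it before the next stage.
    """
    residual = list(x)
    indices: List[int] = []
    for codebook in codebooks:
        best = min(range(len(codebook)), key=lambda k: _sq_dist(residual, codebook[k]))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct a vector as the sum of the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy two-stage codebooks (hypothetical values, not taken from the paper).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],               # stage 1: coarse
    [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2], [0.2, 0.2]],  # stage 2: refines the residual
]

if __name__ == "__main__":
    feature = [1.2, 0.8]                    # stand-in for one speech feature frame
    codes = rvq_encode(feature, CODEBOOKS)  # codes -> [1, 1]
    print(codes, rvq_decode(codes, CODEBOOKS))
```

Because each stage only refines the residual of the previous one, adding stages trades a few extra indices for a finer approximation. The greedy per-stage search shown here is the usual way RVQ encodes, though it is not guaranteed to find the jointly optimal combination of codewords.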

Locations

arXiv (Cornell University)

Similar Works

Title	Year	Authors
Robust Semantic Communications for Speech-to-Text Translation	2024	Zhenzi Weng, Zhijin Qin, Xiaoming Tao
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation	2023	Yongxin Zhu, Zhujin Gao, Xinyuan Zhou, Zhongyi Ye, Linli Xu
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis	2022	Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, Geoffrey Ye Li
Deep Learning Enabled Semantic Communications With Speech Recognition and Synthesis	2023	Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, Geoffrey Ye Li
Semantic Communication Systems for Speech Transmission	2021	Zhenzi Weng, Zhijin Qin
Semantic-preserved Communication System for Highly Efficient Speech Transmission	2022	Tianxiao Han, Qianqian Yang, Zhiguo Shi, Shibo He, Zhaoyang Zhang
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation	2023	Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation	2024	Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
Semantic Communications for Speech Recognition	2021	Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation	2024	Dong Zhang, Xin Zhang, Jun Zhan, Shimin Li, Yaqian Zhou, Xipeng Qiu
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning	2023	Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation	2023	Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform	2022	Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, Kentaro Tachibana
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform	2023	Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, Kentaro Tachibana
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models	2024	Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng
Wireless Deep Speech Semantic Transmission	2022	Zixuan Xiao, Shengshi Yao, Jincheng Dai, Sixian Wang, Kai Niu, Ping Zhang
