Generative Semantic Communication for Text-to-Speech Synthesis

Type: Preprint

Publication Date: 2024-10-04

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2410.03459

Abstract

Semantic communication is a promising technology for improving communication efficiency by transmitting only the semantic information of the source data. However, traditional semantic communication methods focus primarily on data reconstruction tasks and may not be efficient for emerging generative tasks such as text-to-speech (TTS) synthesis. To address this limitation, this paper develops a novel generative semantic communication framework for TTS synthesis that leverages generative artificial intelligence technologies. First, we use WavLM, a pre-trained large speech model, together with residual vector quantization to construct two semantic knowledge bases (KBs), one at the transmitter and one at the receiver. The KB at the transmitter enables effective semantic extraction, while the KB at the receiver facilitates lifelike speech synthesis. Then, we employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead. Finally, numerical results demonstrate that our framework achieves much higher fidelity for the generated speech than four baselines over both additive white Gaussian noise (AWGN) and Rayleigh fading channels.
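
The abstract names residual vector quantization as one ingredient of the knowledge bases. As a rough illustration of that general technique only, and not the authors' implementation, the sketch below quantizes a vector in stages: each stage picks the nearest codeword to whatever the previous stages left unexplained. The function names `rvq_encode` and `rvq_decode` and the tiny hand-written codebooks are assumptions made for this example; in the paper the codebooks would presumably be learned from WavLM features.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Return one codeword index per stage for the vector x."""
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for cb in codebooks:
        # Pick the codeword nearest (squared Euclidean) to the current residual.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        # Hand the part this stage did not capture to the next stage.
        residual = residual - cb[idx]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the chosen codeword of each stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))


# Hypothetical two-stage codebooks: a coarse stage and a fine stage.
coarse = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
fine = np.array([[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]])
codebooks = [coarse, fine]

x = np.array([1.2, 0.8])
indices = rvq_encode(x, codebooks)        # [1, 1]
x_hat = rvq_decode(indices, codebooks)    # [1.25, 0.75]
print(indices, x_hat, np.linalg.norm(x - x_hat))
```

Only the short list of indices would need to be stored or transmitted, since a receiver holding the same codebooks can rebuild the approximation from them.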

Location: arXiv (Cornell University)

Similar Works

  • Robust Semantic Communications for Speech-to-Text Translation (2024). Zhenzi Weng, Zhijin Qin, Xiaoming Tao
  • DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation (2023). Yongxin Zhu, Zhujin Gao, Xinyuan Zhou, Zhongyi Ye, Linli Xu
  • Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis (2022). Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, Geoffrey Ye Li
  • Deep Learning Enabled Semantic Communications With Speech Recognition and Synthesis (2023). Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, Geoffrey Ye Li
  • Semantic Communication Systems for Speech Transmission (2021). Zhenzi Weng, Zhijin Qin
  • Semantic-preserved Communication System for Highly Efficient Speech Transmission (2022). Tianxiao Han, Qianqian Yang, Zhiguo Shi, Shibo He, Zhaoyang Zhang
  • DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation (2023). Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
  • DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (2024). Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang
  • Semantic Communications for Speech Recognition (2021). Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li
  • SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation (2024). Dong Zhang, Xin Zhang, Jun Zhan, Shimin Li, Yaqian Zhou, Xipeng Qiu
  • QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning (2023). Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng
  • Vec-Tok Speech: speech vectorization and tokenization for neural speech generation (2023). Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie
  • Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform (2022). Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, Kentaro Tachibana
  • Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform (2023). Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, Kentaro Tachibana
  • SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models (2024). Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng
  • Wireless Deep Speech Semantic Transmission (2022). Zixuan Xiao, Shengshi Yao, Jincheng Dai, Sixian Wang, Kai Niu, Ping Zhang
