Rongjie Huang

All published works
Title Year Authors
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence (2024). Fuming You, Minghui Fang, Tang Li, Rongjie Huang, Yongqi Wang, Zhou Zhao
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup (2024). Xize Cheng, Siqi Zheng, Zehan Wang, Minghui Fang, Ziang Zhang, Rongjie Huang, Ziyang Ma, Shengpeng Ji, Jialong Zuo, Tao Jin
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes (2024). Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Karen Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Jinzheng He, Chen Zhang, Zehan Wang
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control (2024). Huadai Liu, Jialei Wang, Xiangtai Li, Rongjie Huang, Yang Liu, Jiayang Xu, Haiping Hao
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces (2024). Z.Y. Wang, Ziang Zhang, Hang Zhang, Luping Liu, Rongjie Huang, Xize Cheng, Hengshuang Zhao, Zhou Zhao
Accompanied Singing Voice Synthesis with Fully Text-controlled Melody (2024). Ruiqi Li, Zhiqing Hong, Yongqi Wang, Lichao Zhang, Rongjie Huang, Siqi Zheng, Zhou Zhao
Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching (2024). Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao
Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion (2024). Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment (2024). Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang
3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization (2024). Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (2024). Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (2024). Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models (2024). Shengpeng Ji, Minghui Fang, Ziyue Karen Jiang, Rongjie Huang, Jialong Zuo, Shulei Wang, Zhou Zhao
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis (2024). Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Karen Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt (2024). Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition (2023). Xize Cheng, Tao Jin, Rongjie Huang, Linjun Li, Lin Wang, Zehan Wang, Ye Wang, Huadai Liu, Aoxiong Yin, Zhou Zhao
VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement (2023). Chenye Cui, Zhou Zhao, Yi Ren, Jinglin Liu, Rongjie Huang, Feiyang Chen, Zhefeng Wang, Baoxing Huai, Fei Wu
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models (2023). Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt (2023). Dongchao Yang, Songxiang Liu, Rongjie Huang, Guangzhi Lei, Chao Weng, Helen Meng, Dong Yu
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition (2023). Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Lin Wang, Zehan Wang, Huadai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (2023). Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation (2023). Zhenhui Ye, Jinzheng He, Ziyue Karen Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec (2023). Dongchao Yang, Songxiang Liu, Rongjie Huang, Jinchuan Tian, Chao Weng, Yuexian Zou
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment (2023). Ruiqi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis (2023). Jinzheng He, Jinglin Liu, Zhenhui Ye, Rongjie Huang, Chenye Cui, Huadai Liu, Zhou Zhao
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training (2023). Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Karen Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing (2023). Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer (2023). Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models (2023). Ziyue Karen Jiang, Qian Yang, Jialong Zuo, Zhenhui Ye, Rongjie Huang, Yi Ren, Zhou Zhao
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation (2023). Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation (2023). Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao
Make-A-Voice: Unified Voice Synthesis With Discrete Representation (2023). Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Karen Jiang, Chao Weng, Zhou Zhao, Dong Yu
Detector Guidance for Multi-Object Text-to-Image Generation (2023). Luping Liu, Zijian Zhang, Yi Ren, Rongjie Huang, Xiang Yin, Zhou Zhao
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias (2023). Ziyue Karen Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training (2023). Zhenhui Ye, Rongjie Huang, Yi Ren, Ziyue Karen Jiang, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer (2023). Yongqi Wang, Jionghao Bai, Rongjie Huang, Ruiqi Li, Zhiqing Hong, Zhou Zhao
UniAudio: An Audio Foundation Model Toward Universal Audio Generation (2023). Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers (2023). Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation (2023). Xize Cheng, Rongjie Huang, Linjun Li, Tao Jin, Zehan Wang, Aoxiong Yin, Minglei Li, Xinyu Duan, Changpeng Yang, Zhou Zhao
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation (2022). Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, Zhou Zhao, Baoxing Huai, Zhefeng Wang
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis (2022). Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation (2022). Rongjie Huang, Zhou Zhao, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech (2022). Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech (2022). Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement (2022). Chenye Cui, Yi Ren, Jinglin Liu, Rongjie Huang, Zhou Zhao
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus (2021). Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model (2021). Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao
Bilateral Denoising Diffusion Models (2021). Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu
Commonly Cited References
Title Year Authors # of times referenced
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (2020). Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu. Times referenced: 9
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (2020). Jungil Kong, Jaehyeon Kim, Jaekyoung Bae. Times referenced: 9
Attention Is All You Need (2017). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. Times referenced: 8
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis (2022). Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao. Times referenced: 7
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus (2021). Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao. Times referenced: 6
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech (2022). Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren. Times referenced: 5
WaveNet: A Generative Model for Raw Audio (2016). Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alexander Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. Times referenced: 5
Denoising Diffusion Probabilistic Models (2020). Jonathan Ho, Ajay N. Jain, Pieter Abbeel. Times referenced: 5
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (2021). Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Zhou Zhao. Times referenced: 5
FastSpeech: Fast, Robust and Controllable Text to Speech (2019). Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu. Times referenced: 5
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (2020). Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. Times referenced: 5
Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram (2020). Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim. Times referenced: 5
WaveGlow: A Flow-based Generative Network for Speech Synthesis (2019). Ryan Prenger, Rafael Valle, Bryan Catanzaro. Times referenced: 5
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation (2022). Rongjie Huang, Zhou Zhao, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He. Times referenced: 5
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech (2022). Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao. Times referenced: 5
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation (2022). Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, Zhou Zhao, Baoxing Huai, Zhefeng Wang. Times referenced: 4
Tacotron: Towards End-to-End Speech Synthesis (2017). Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio. Times referenced: 4
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search (2020). Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon. Times referenced: 4
DeepSinger: Singing Voice Synthesis with Data Mined From the Web (2020). Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu. Times referenced: 4
End-to-End Adversarial Text-to-Speech (2020). Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan. Times referenced: 4
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (2021). Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. Times referenced: 4
Revisiting Over-Smoothness in Text to Speech (2022). Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu. Times referenced: 3
Conformer: Convolution-augmented Transformer for Speech Recognition (2020). Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu. Times referenced: 3
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech (2021). Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim. Times referenced: 3
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (2021). Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov. Times referenced: 3
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (2021). Jaehyeon Kim, Jungil Kong, Juhee Son. Times referenced: 3
WaveGrad: Estimating Gradients for Waveform Generation (2020). Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan. Times referenced: 3
WaveNet: A Generative Model for Raw Audio (2016). Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. Times referenced: 3
Multi-Band MelGAN: Faster Waveform Generation For High-Quality Text-To-Speech (2021). Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie. Times referenced: 3
DiffWave: A Versatile Diffusion Model for Audio Synthesis (2020). Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro. Times referenced: 3
DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (2021). Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, Zhou Zhao. Times referenced: 3
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (2023). Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu. Times referenced: 3
Diffusion Models Beat GANs on Image Synthesis (2021). Prafulla Dhariwal, Alex Nichol. Times referenced: 3
Image-to-Image Translation with Conditional Adversarial Networks (2017). Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. Times referenced: 3
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Times referenced: 3
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (2019). Kundan Kumar, Rithesh Kumar, T. de Boissière, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, Aaron Courville. Times referenced: 3
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models (2023). Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao. Times referenced: 3
LRS3-TED: a large-scale dataset for visual speech recognition (2018). Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman. Times referenced: 2
Deep Audio-Visual Speech Recognition (2018). Triantafyllos Afouras, Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman. Times referenced: 2
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction (2022). Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed. Times referenced: 2
Bilateral Denoising Diffusion Models (2021). Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu. Times referenced: 2
Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation (2021). Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang. Times referenced: 2
A Survey on Neural Speech Synthesis (2021). Xu Tan, Tao Qin, Frank K. Soong, Tie-Yan Liu. Times referenced: 2
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (2021). Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. Times referenced: 2
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (2020). Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu. Times referenced: 2
Generalized End-to-End Loss for Speaker Verification (2018). Li Wan, Quan Wang, Alan Papir, Ignacio López Moreno. Times referenced: 2
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders (2021). Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma. Times referenced: 2
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation (2021). Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux. Times referenced: 2
Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015). Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli. Times referenced: 2
Denoising Diffusion Implicit Models (2020). Jiaming Song, Chenlin Meng, Stefano Ermon. Times referenced: 2