+
|
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
|
2024
|
Fuming You
Minghui Fang
Tang Li
Rongjie Huang
Yongqi Wang
Zhou Zhao
|
+
|
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
|
2024
|
Xize Cheng
Siqi Zheng
Zehan Wang
Minghui Fang
Ziang Zhang
Rongjie Huang
Ziyang Ma
Shengpeng Ji
Jialong Zuo
Tao Jin
|
+
|
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
|
2024
|
Zhenhui Ye
Tianyun Zhong
Yi Ren
Ziyue Karen Jiang
Jiawei Huang
Rongjie Huang
Jinglin Liu
Jinzheng He
Chen Zhang
Zehan Wang
|
+
|
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
|
2024
|
Huadai Liu
Jialei Wang
Xiangtai Li
Rongjie Huang
Yang Liu
Jiayang Xu
Haiping Hao
|
+
|
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
|
2024
|
Zehan Wang
Ziang Zhang
Hang Zhang
Luping Liu
Rongjie Huang
Xize Cheng
Hengshuang Zhao
Zhou Zhao
|
+
|
Accompanied Singing Voice Synthesis with Fully Text-controlled Melody
|
2024
|
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Lichao Zhang
Rongjie Huang
Siqi Zheng
Zhou Zhao
|
+
|
Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching
|
2024
|
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jiawei Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
|
+
|
Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion
|
2024
|
Zehan Wang
Ziang Zhang
Xize Cheng
Rongjie Huang
Luping Liu
Zhenhui Ye
Haifeng Huang
Yang Zhao
Tao Jin
Peng Gao
|
+
|
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
|
2024
|
Zhiqing Hong
Rongjie Huang
Xize Cheng
Yongqi Wang
Ruiqi Li
Fuming You
Zhou Zhao
Zhimeng Zhang
|
+
|
3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
|
2024
|
Yafeng Chen
Siqi Zheng
Hui Wang
Luyao Cheng
Tinglong Zhu
Changhe Song
Rongjie Huang
Ziyang Ma
Qian Chen
Shiliang Zhang
|
+
|
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
|
2024
|
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
Zhenhui Ye
Yuning Wu
Zhiqing Hong
Jiawei Huang
Jinglin Liu
|
+
|
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
|
2024
|
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
|
+
|
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
|
2024
|
Shengpeng Ji
Minghui Fang
Ziyue Karen Jiang
Rongjie Huang
Jialong Zuo
Shulei Wang
Zhou Zhao
|
+
|
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
|
2024
|
Zhenhui Ye
Tianyun Zhong
Yi Ren
Jiaqi Yang
Weichuang Li
Jiawei Huang
Ziyue Karen Jiang
Jinzheng He
Rongjie Huang
Jinglin Liu
|
+
|
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt
|
2024
|
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
|
+
|
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
|
2023
|
Xize Cheng
Tao Jin
Rongjie Huang
Linjun Li
Lin Wang
Zehan Wang
Ye Wang
Huadai Liu
Aoxiong Yin
Zhou Zhao
|
+
|
VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement
|
2023
|
Chenye Cui
Zhou Zhao
Yi Ren
Jinglin Liu
Rongjie Huang
Feiyang Chen
Zhefeng Wang
Baoxing Huai
Fei Wu
|
+
|
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
|
2023
|
Rongjie Huang
Jiawei Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiang Yin
Zhou Zhao
|
+
|
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
|
2023
|
Dongchao Yang
Songxiang Liu
Rongjie Huang
Guangzhi Lei
Chao Weng
Helen Meng
Dong Yu
|
+
|
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
|
2023
|
Xize Cheng
Linjun Li
Tao Jin
Rongjie Huang
Lin Wang
Zehan Wang
Huadai Liu
Ye Wang
Aoxiong Yin
Zhou Zhao
|
+
|
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
|
2023
|
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
Zhenhui Ye
Yuning Wu
Zhiqing Hong
Jiawei Huang
Jinglin Liu
|
+
|
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
|
2023
|
Zhenhui Ye
Jinzheng He
Ziyue Karen Jiang
Rongjie Huang
Jiawei Huang
Jinglin Liu
Yi Ren
Xiang Yin
Zejun Ma
Zhou Zhao
|
+
|
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
|
2023
|
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
|
+
|
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
|
2023
|
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
|
+
|
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
|
2023
|
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
|
+
|
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
|
2023
|
Zhenhui Ye
Rongjie Huang
Yi Ren
Ziyue Karen Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
|
+
|
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
|
2023
|
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
|
+
|
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
|
2023
|
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
|
+
|
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
|
2023
|
Ziyue Karen Jiang
Qian Yang
Jialong Zuo
Zhenhui Ye
Rongjie Huang
Yi Ren
Zhou Zhao
|
+
|
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
|
2023
|
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Linjun Li
Zhenhui Ye
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiang Yin
|
+
|
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
|
2023
|
Jiawei Huang
Yi Ren
Rongjie Huang
Dongchao Yang
Zhenhui Ye
Chen Zhang
Jinglin Liu
Xiang Yin
Zejun Ma
Zhou Zhao
|
+
|
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
|
2023
|
Rongjie Huang
Chunlei Zhang
Yongqi Wang
Dongchao Yang
Luping Liu
Zhenhui Ye
Ziyue Karen Jiang
Chao Weng
Zhou Zhao
Dong Yu
|
+
|
Detector Guidance for Multi-Object Text-to-Image Generation
|
2023
|
Luping Liu
Zijian Zhang
Yi Ren
Rongjie Huang
Xiang Yin
Zhou Zhao
|
+
|
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
|
2023
|
Ziyue Karen Jiang
Yi Ren
Zhenhui Ye
Jinglin Liu
Chen Zhang
Qian Yang
Shengpeng Ji
Rongjie Huang
Chunfeng Wang
Xiang Yin
|
+
|
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
|
2023
|
Yongqi Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
|
+
|
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
|
2023
|
Dongchao Yang
Jinchuan Tian
Xu Tan
Rongjie Huang
Songxiang Liu
Xuankai Chang
Jiatong Shi
Sheng Zhao
Jiang Bian
Xixin Wu
|
+
|
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
|
2023
|
Haifeng Huang
Zehan Wang
Rongjie Huang
Luping Liu
Xize Cheng
Yang Zhao
Tao Jin
Zhou Zhao
|
+
|
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
|
2023
|
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
|
+
|
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
|
2022
|
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
Zhefeng Wang
|
+
|
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
|
2022
|
Rongjie Huang
Max W. Y. Lam
Jun Wang
Dan Su
Dong Yu
Yi Ren
Zhou Zhao
|
+
|
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
|
2022
|
Rongjie Huang
Zhou Zhao
Jinglin Liu
Huadai Liu
Yi Ren
Lichao Zhang
Jinzheng He
|
+
|
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
|
2022
|
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
|
+
|
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
|
2022
|
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
|
+
|
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
|
2022
|
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
|
+
|
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus
|
2021
|
Rongjie Huang
Feiyang Chen
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
|
+
|
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
|
2021
|
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
|
+
|
Bilateral Denoising Diffusion Models
|
2021
|
Max W. Y. Lam
Jun Wang
Rongjie Huang
Dan Su
Dong Yu
|