MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
The body movements accompanying speech help speakers express their ideas. Co-speech motion generation is an important approach for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal …