Consistent Multimodal Generation via a Unified GAN Framework
We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are both realistic and mutually consistent. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in …
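A minimal sketch of the general idea follows: a generator with a shared backbone whose features are decoded by per-modality branches, so one latent code yields aligned RGB, depth, and normal outputs. This is only an illustrative assumption of the design pattern described above, not the paper's implementation; all module names, layer choices, and sizes (e.g. MultimodalGenerator, the channel counts) are hypothetical.

```python
import torch
import torch.nn as nn

class MultimodalGenerator(nn.Module):
    """Illustrative shared-backbone generator with modality-specific branches."""

    def __init__(self, latent_dim=512, modalities=("rgb", "depth", "normals")):
        super().__init__()
        # Shared backbone: maps a latent code to a common feature map.
        self.backbone = nn.Sequential(
            nn.Linear(latent_dim, 4 * 4 * 256),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        # Modality-specific branches: each decodes the shared features
        # into its own output (3 channels for RGB/normals, 1 for depth).
        out_channels = {"rgb": 3, "depth": 1, "normals": 3}
        self.branches = nn.ModuleDict({
            m: nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, out_channels[m], 3, padding=1),
                nn.Tanh(),
            )
            for m in modalities
        })

    def forward(self, z):
        features = self.backbone(z)  # shared features used by every branch
        return {m: branch(features) for m, branch in self.branches.items()}

# Usage: a single latent code produces spatially aligned outputs per modality.
z = torch.randn(2, 512)
outputs = MultimodalGenerator()(z)
# outputs["rgb"]: (2, 3, 32, 32), outputs["depth"]: (2, 1, 32, 32), ...
```

Because every modality is decoded from the same shared features, the branches see identical scene structure, which is one simple way to encourage cross-modal consistency.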