Modeling Image Composition for Complex Scene Generation

We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling the textures, structures and relationships contained in a complex scene. After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) to explore object-to-object, object-to-patch and patch-to-patch dependencies. Compared …
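To make the three dependency types concrete, here is a minimal sketch of one way a layout-conditioned attention mask over a sequence of object tokens followed by patch tokens could be built. The abstract does not spell out the focal attention rule, so everything below is an illustrative assumption rather than the authors' definition: the helper name `build_layout_mask`, the box format (half-open patch-grid coordinates), the raster-order causal constraint, and the choice to let a patch attend only to objects whose box contains it and to earlier patches that share a box with it.

```python
import numpy as np


def build_layout_mask(boxes, grid):
    """Boolean attention mask (rows = queries, columns = keys).

    Hypothetical helper, not taken from the paper.
    boxes: list of (x0, y0, x1, y1) in patch-grid units, half-open.
    grid:  number of patches per image side.
    Sequence layout (assumed): [object tokens..., patch tokens in raster order].
    """
    n_obj = len(boxes)
    n_patch = grid * grid
    n = n_obj + n_patch

    # Row (y) and column (x) of every patch in raster order.
    ys, xs = np.divmod(np.arange(n_patch), grid)

    # inside[k, p] is True when patch p lies inside box k.
    inside = np.array(
        [(xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
         for x0, y0, x1, y1 in boxes],
        dtype=bool,
    ).reshape(n_obj, n_patch)

    mask = np.zeros((n, n), dtype=bool)

    # Object-to-object: every object token sees every object token (assumption).
    mask[:n_obj, :n_obj] = True

    # Object-to-patch: a patch query sees the objects whose box contains it.
    mask[n_obj:, :n_obj] = inside.T

    # Patch-to-patch: a patch sees itself, plus earlier patches (raster order)
    # that share at least one box with it.
    shared = (inside.T.astype(int) @ inside.astype(int)) > 0
    causal = np.tril(np.ones((n_patch, n_patch), dtype=bool))
    mask[n_obj:, n_obj:] = (shared & causal) | np.eye(n_patch, dtype=bool)

    return mask


# Two objects on a 4x4 patch grid: top-left and bottom-right quadrants.
mask = build_layout_mask([(0, 0, 2, 2), (2, 2, 4, 4)], grid=4)
```

A mask like this would typically be applied by setting the attention logits of disallowed query-key pairs to negative infinity before the softmax. In this sketch, object tokens do not attend to patch tokens; that is also an assumption.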