GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Text-to-video generation models have shown significant progress in recent years. However, they still struggle to generate complex dynamic scenes from compositional text prompts, which involve attribute binding for multiple objects, temporal dynamics associated with different objects, and interactions between objects. Our key motivation is that complex tasks can …