AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Text-guided video prediction (TVP) predicts the motion in future frames from an initial frame according to an instruction, and has wide applications in virtual reality, robotics, and content creation. Previous TVP methods have made significant breakthroughs by adapting Stable Diffusion for this task. However, they struggle with frame consistency and …