AID: Adapting Image2Video Diffusion Models for Instruction-guided Video
Prediction
Text-guided video prediction (TVP) involves predicting the motion of future frames from an initial frame according to an instruction, and has wide applications in virtual reality, robotics, and content creation. Previous TVP methods have made significant breakthroughs by adapting Stable Diffusion for this task. However, they struggle with frame consistency and …