Event-Guided Procedure Planning from Instructional Videos with Text Supervision
In this work, we focus on procedure planning from instructional videos with text supervision, where a model predicts the action sequence that transforms an initial visual state into a goal visual state. A critical challenge of this task is the large semantic gap between observed …