In-Context Ensemble Improves Video-Language Models for Low-Level
Workflow Understanding from Human Demonstrations
A Standard Operating Procedure (SOP) is a low-level, step-by-step written guide for a business software workflow, based on a video demonstration. SOPs are a crucial step toward automating end-to-end software workflows, but creating them manually is time-consuming. Recent advances in large video-language models offer the potential to automate SOP generation …