ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense
Concepts about Actions
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense
Concepts about Actions
Humans observe various actions being performed by other humans (physically or in videos/images) and can draw a wide range of inferences about it beyond what they can visually perceive. Such inferences include determining the aspects of the world that make action execution possible (e.g. liquid objects can undergo pouring), predicting …