Modeling Multimodal Social Interactions: New Challenges and Baselines
with Densely Aligned Representations
Understanding social interactions that involve both verbal and non-verbal cues is essential for interpreting social situations effectively. However, most prior work on multimodal social cues focuses predominantly on single-person behaviors or relies on holistic visual representations that are not densely aligned with utterances in multi-party environments. Such approaches are limited in modeling …