Spatio-Temporal Attention Models for Grounded Video Captioning