Multi-modal Dense Video Captioning

Dense video captioning is the task of localizing interesting events in an untrimmed video and producing a textual description (caption) for each localized event. Most previous work on dense video captioning relies solely on visual information and completely ignores the audio track. However, audio, and speech in particular, …