Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

We present our submission to the Microsoft Video to Language Challenge, whose task is to generate short captions describing the videos in the challenge dataset. Our model is based on the encoder-decoder pipeline that is popular in image and video captioning systems. We propose to utilize two different kinds of video features, one to capture the …