Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network

In this paper, we propose to guide video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of the input video. We construct a novel gated fusion network, with a specially designed cross-gating (CG) block, to effectively encode and fuse different types of representations, e.g., …
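
To make the idea of gating one representation by another concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract as shown does not define the CG block, so the gate form (a sigmoid of a linear map of the other representation), the concatenation used for fusion, and all names (`cross_gate`, `gated_fusion`, `feat_a`, `feat_b`, the weight matrices and biases) are assumptions chosen purely for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def cross_gate(target, driver, weights, bias):
    """Scale each element of `target` by a gate computed from `driver`.

    Assumed gate form: gate = sigmoid(weights @ driver + bias).
    """
    pre_activation = matvec(weights, driver)
    gate = [sigmoid(z + b) for z, b in zip(pre_activation, bias)]
    return [g * t for g, t in zip(gate, target)]


def gated_fusion(feat_a, feat_b, w_a, b_a, w_b, b_b):
    """Gate each representation by the other, then concatenate.

    Concatenation is an illustrative fusion choice, not taken from the source.
    """
    gated_a = cross_gate(feat_a, feat_b, w_a, b_a)
    gated_b = cross_gate(feat_b, feat_a, w_b, b_b)
    return gated_a + gated_b


# Hypothetical two-dimensional features for two representations of one video.
feat_a = [2.0, 4.0]
feat_b = [6.0, 8.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]

# With zero weights and biases every gate equals sigmoid(0) = 0.5.
fused = gated_fusion(feat_a, feat_b, zero_w, zero_b, zero_w, zero_b)
print(fused)  # [1.0, 2.0, 3.0, 4.0]
```

In a trained model the weights and biases would be learned, so each gate could suppress or pass individual feature dimensions depending on what the other representation contains.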