Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation

Text-based video segmentation aims to segment the target object in a video given a sentence that describes it. Combining motion information from optical flow maps with the appearance and linguistic modalities is crucial, yet previous work has largely ignored it. In this paper, we design a method to fuse and align …