Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network

We consider the problem of referring segmentation in images and videos with natural language. Given an input image (or video) and a referring expression, the goal is to segment the entity referred to by the expression in the image or video. In this paper, we propose a cross-modal self-attention (CMSA) module …