X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

3D dense captioning aims to describe individual objects in 3D scenes in natural language, where the scenes are usually represented as RGB-D scans or point clouds. However, by exploiting only single-modal information, e.g., point clouds, previous approaches fail to produce faithful descriptions. Though aggregating 2D features into point clouds may …