Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation
Visual Question Answering on 3D Point Clouds (VQA-3D) is an emerging yet challenging field that aims to answer various types of textual questions given an entire point cloud scene. To tackle this problem, we propose CLEVR3D, a large-scale VQA-3D dataset consisting of 171K questions from 8,771 3D scenes. Specifically, …