DOrA: 3D Visual Grounding with Order-Aware Referring
3D visual grounding aims to identify the target object in a 3D point cloud scene that a natural language description refers to. While previous works attempt to exploit the verbo-visual relation with cross-modal transformers, unstructured natural utterances and scattered objects can degrade their performance. In this paper, we …