SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities. SpatialRGPT advances VLMs' spatial understanding through two key innovations: …