FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts

Existing benchmarks for visual question answering lack visual grounding and complexity, particularly when evaluating spatial reasoning skills. We introduce FlowVQA, a novel benchmark for assessing how well visual question-answering multimodal language models reason with flowcharts as visual contexts. FlowVQA comprises 2,272 carefully generated and human-verified flowchart …