ViUniT: Visual Unit Tests for More Robust Visual Programming
Programming-based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs 33% of the time. These models are often right for the wrong reasons and risk unexpected failures …