Learning to Reason: End-to-End Module Networks for Visual Question Answering
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer “is there an equal number of balls and boxes?” we can look for balls, look for boxes, count them, and compare the results. The recently proposed …
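The balls-and-boxes decomposition can be illustrated with a minimal symbolic sketch. This is not the learned neural modules the paper discusses: the scene representation, the function names (`find`, `count`, `equal`, `answer_equal_number`), and the hand-written composition are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical toy scene: each object is a dict with a "shape" label.
# A real VQA system operates on image features, not symbolic labels.
Scene = List[Dict[str, str]]


def find(scene: Scene, shape: str) -> Scene:
    """Sub-problem 1: look for objects of the given shape."""
    return [obj for obj in scene if obj["shape"] == shape]


def count(objects: Scene) -> int:
    """Sub-problem 2: count the objects that were found."""
    return len(objects)


def equal(a: int, b: int) -> bool:
    """Sub-problem 3: compare two counts."""
    return a == b


def answer_equal_number(scene: Scene, shape_a: str, shape_b: str) -> str:
    """Compose the sub-problems: equal(count(find(a)), count(find(b)))."""
    same = equal(count(find(scene, shape_a)), count(find(scene, shape_b)))
    return "yes" if same else "no"


# Illustrative scene with two balls and two boxes (made-up data).
scene = [
    {"shape": "ball"},
    {"shape": "box"},
    {"shape": "ball"},
    {"shape": "box"},
]
print(answer_equal_number(scene, "ball", "box"))
```

Each function handles one sub-problem, and the answer comes from composing them in the order the question implies. In this sketch the composition is fixed by hand for a single question type.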