Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition
Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model, which ignore the 2D spatial context of visual semantics within and between character instances, making them not generalize well to arbitrary shape …