Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition

Type: Preprint

Publication Date: 2024-05-09

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2405.05841

Abstract

In text recognition, self-supervised pre-training has emerged as an effective way to reduce dependence on expensive annotated real data. Previous studies primarily focus on local visual representations, leveraging masked image modeling or sequence contrastive learning. However, they neglect the linguistic information in text images, which is crucial for recognizing text. To simultaneously capture local character features and linguistic information in visual space, we propose Symmetric Superimposition Modeling (SSM). The objective of SSM is to reconstruct the direction-specific pixel and feature signals from the symmetrically superimposed input. Specifically, we add the original image to its inverted views to create the symmetrically superimposed inputs. At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context. At the feature level, we reconstruct the features of the same original and inverted images under different augmentations to model the semantic-level linguistic context and local character discrimination. By design, the superimposition disrupts character shapes and linguistic rules. Consequently, the dual-level reconstruction facilitates understanding character shapes and linguistic information from the perspective of visual texture and feature semantics. Experiments on various text recognition benchmarks demonstrate the effectiveness and generality of SSM, with 4.1% average performance gains and a new state-of-the-art average word accuracy of 86.6% on the Union14M benchmarks.
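
The superimposition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not enumerate the inverted views or say how the images are combined, so the set of inversions (horizontal flip, vertical flip, 180-degree rotation), the `superimpose` helper, and the `weight` parameter are all assumptions made for illustration. Images are plain nested lists so the example runs without third-party libraries.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


def flip_horizontal(img: Image) -> Image:
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in img[::-1]]


def rotate_180(img: Image) -> Image:
    """Rotate by 180 degrees, i.e. apply both flips."""
    return flip_horizontal(flip_vertical(img))


# ASSUMPTION: this set of "inverted views" is illustrative; the abstract
# does not list which inversions are used.
INVERSIONS: Dict[str, Callable[[Image], Image]] = {
    "hflip": flip_horizontal,
    "vflip": flip_vertical,
    "rot180": rotate_180,
}


def superimpose(img: Image, direction: str, weight: float = 0.5) -> Image:
    """Blend an image with one inverted view of itself.

    ASSUMPTION: `weight` is a hypothetical mixing coefficient; 0.5 gives an
    equal-weight average of the original and the inverted view.
    """
    inverted = INVERSIONS[direction](img)
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row, inv_row)]
        for row, inv_row in zip(img, inverted)
    ]


if __name__ == "__main__":
    toy = [[0.0, 0.2, 1.0],
           [0.4, 0.6, 0.8]]
    mixed = superimpose(toy, "rot180")
    # In a pre-training step, the reconstruction targets would be the two
    # direction-specific signals: the original and its inverted view.
    targets = (toy, rotate_180(toy))
    print(mixed)
```

With equal weights, the blended image is unchanged by the same inversion that produced it, which is one way to read "symmetrically superimposed": the input alone no longer reveals which direction is the original, so a model has to rely on character shape and linguistic context to separate the two signals.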

Locations

  • arXiv (Cornell University)

Similar Works

  • Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition (2024). Zuan Gao, Yuxin Wang, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie
  • Self-supervised Character-to-Character Distillation for Text Recognition (2022). Tongkun Guan, Wei Shen
  • Self-supervised Character-to-Character Distillation for Text Recognition (2023). Tongkun Guan, Wei Shen, Xue Yang, Qi Feng, Zekun Jiang, Xiaokang Yang
  • VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer (2024). Humen Zhong, Zhibo Yang, Zhaohai Li, Peng Wang, Jun Tang, Wenqing Cheng, Cong Yao
  • SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization (2022). Canjie Luo, Lianwen Jin, Jingdong Chen
  • Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition (2022). Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai
  • Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing (2024). Yadong Qu, Yuxin Wang, Bangbang Zhou, Zixiao Wang, Hongtao Xie, Yongdong Zhang
  • I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition (2021). Chuhui Xue, Shijian Lu, Song Bai, Wenqing Zhang, Changhu Wang
  • Self-Supervised Implicit Glyph Attention for Text Recognition (2023). Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Qi Feng, Yudi Zhao, Wei Shen
  • Portmanteauing Features for Scene Text Recognition (2022). Yew Lee Tan, Ernest Yu Kai Chew, Adams Wai-Kin Kong, Jung-Jae Kim, Joo-Hwee Lim
  • Self-supervised Implicit Glyph Attention for Text Recognition (2022). Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Feng Qi, Yudi Zhao, Wei Shen
  • MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition (2024). Chang Liu, Simon Corbillé, Elisa H Barney Smith
  • Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition (2024). Tiancheng Lin, Jinglei Zhang, Yi Xu, Kai Chen, Rui Zhang, Chang Wen Chen
  • Masked and Permuted Implicit Context Learning for Scene Text Recognition (2023). Xiaomeng Yang, Zhi Qiao, Wei Jin, Yu Zhou, Ye Yuan, Zhilong Ji, Dongbao Yang, Weiping Wang
  • Pushing the Performance Limit of Scene Text Recognizer without Human Annotation (2022). Caiyuan Zheng, Hui Li, Seon-Min Rhee, Seungju Han, Jae-Joon Han, Peng Wang
  • Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition (2023). Da Cheng, Peng Wang, Cong Yao
