DocFormer: End-to-End Transformer for Document Understanding

We present DocFormer, a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem that aims to understand documents in their varied formats (forms, receipts, etc.) and layouts. In addition, DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks which …