DocKD: Knowledge Distillation from LLMs for Open-World Document
Understanding Models
Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance the generalizability of small VDU models by distilling knowledge from LLMs. We find that directly prompting LLMs often fails to generate informative and …