DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance the generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and …