InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Type: Preprint

Publication Date: 2024-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2401.13313

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding (2023). Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian
  • Document Understanding Dataset and Evaluation (DUDE) (2023). Jordy Van Landeghem, Rubèn Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny
  • HRVDA: High-Resolution Visual Document Assistant (2024). Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
  • Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering (2023). Wenjin Wang, Yunhao Li, Yixin Ou, Yin Zhang
  • DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models (2024). Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto
  • LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding (2023). Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
  • Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs (2023). Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
  • VisualMRC: Machine Reading Comprehension on Document Images (2021). Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
  • Instruction Makes a Difference (2024). Tosin Adewumi, Nudrat Habib, Lama Alkhaled, Elisa Barney
  • LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding (2024). Masato Fujitake
  • TRINS: Towards Multimodal Language Models that Can Read (2024). Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
  • SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap (2023). Daehee Kim, Yoonsik Kim, Donghyun Kim, Yumin Lim, Geewook Kim, Taeho Kil
  • DocFormerv2: Local Features for Document Understanding (2024). Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha
  • DocFormer: End-to-End Transformer for Document Understanding (2021). Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha
  • DocFormerv2: Local Features for Document Understanding (2023). Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha
  • BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations (2025). Simone Giovannini, Fabio Coppini, Andrea Gemelli, Simone Marinai
  • TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document (2024). Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai

Works That Cite This (0)

Works Cited by This (0)
