InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Ryota Tanaka, Taichi Iki, Kyosuke Nishida, Kuniko Saito, Jun Suzuki

Type: Preprint

Publication Date: 2024-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2401.13313

Locations

arXiv (Cornell University)
DataCite API

Similar Works

Title	Year	Authors
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	2023	Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian
Document Understanding Dataset and Evaluation (DUDE)	2023	Jordy Van Landeghem, Rubèn Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny
HRVDA: High-Resolution Visual Document Assistant	2024	Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering	2023	Wenjin Wang, Yunhao Li, Yixin Ou, Yin Zhang
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models	2024	Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding	2023	Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs	2023	Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
VisualMRC: Machine Reading Comprehension on Document Images	2021	Ryota Tanaka, Kyosuke Nishida, Sen Yoshida
Instruction Makes a Difference	2024	Tosin Adewumi, Nudrat Habib, Lama Alkhaled, Elisa Barney
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding	2024	Masato Fujitake
TRINS: Towards Multimodal Language Models that Can Read	2024	Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap	2023	Daehee Kim, Yoonsik Kim, Donghyun Kim, Yumin Lim, Geewook Kim, Taeho Kil
DocFormerv2: Local Features for Document Understanding	2024	Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha
DocFormer: End-to-End Transformer for Document Understanding	2021	Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha
DocFormerv2: Local Features for Document Understanding	2023	Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations	2025	Simone Giovannini, Fabio Coppini, Andrea Gemelli, Simone Marinai
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document	2024	Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai

Works That Cite This (0)

Works Cited by This (0)
