Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

Type: Preprint

Publication Date: 2023-11

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2311.13194

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • HRVDA: High-Resolution Visual Document Assistant (2024) - Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
  • Kosmos-2.5: A Multimodal Literate Model (2023) - Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Dong Li, Weiyao Luo
  • Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (2024) - Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li
  • DOGE: Towards Versatile Visual Document Grounding and Referring (2024) - Yinan Zhou, Yuxin Chen, Haokun Lin, Shuyu Yang, Li Zhu, Zhongang Qi, Chen Ma, Ying Shan
  • TRINS: Towards Multimodal Language Models that Can Read (2024) - Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
  • LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models (2024) - Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun
  • TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document (2024) - Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai
  • Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge (2024) - Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip H. S. Torr, Lu Yuan
  • MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning (2024) - Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Fenglou Huang, D.K. Shah, Xianzhi Du, B. Zhang, Yanghao Li
  • LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding (2023) - Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
  • DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming (2024) - Jiaxin Zhang, Wentao Yang, Songxuan Lai, Zecheng Xie, Lianwen Jin
  • Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models (2023) - Geewook Kim, Hodong Lee, Daehee Kim, Haeji Jung, Sanghee Park, Yoonsik Kim, Sangdoo Yun, Taeho Kil, Bado Lee, Seunghyun Park
  • InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (2024) - Ryota Tanaka, Taichi Iki, Kyosuke Nishida, Kuniko Saito, Jun Suzuki
  • BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions (2023) - Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
  • BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions (2024) - Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
  • Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models (2024) - Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi
  • UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model (2023) - Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang
  • MMR: Evaluating Reading Ability of Large Multimodal Models (2024) - Jian Chen, Ruiyi Zhang, Yufan Zhou, Ryan A. Rossi, Jiuxiang Gu, Changyou Chen
  • SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension (2024) - Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan
  • MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations (2024) - Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong

Works That Cite This (0)

Works Cited by This (0)