Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

Type: Preprint

Publication Date: 2023-01-01

Citations: 2

DOI: https://doi.org/10.48550/arxiv.2310.04716

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Harnessing Webpage UIs for Text-Rich Visual Understanding (2024): Junpeng Liu, Tianyue Ou, Yifan Song, Yuzhong Qu, Wai Lam, Chenyan Xiong, Wenhu Chen, Graham Neubig, Xiang Yue
  • InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation (2023): Rongyao Fang, Shilin Yan, Zhaoyang Huang, Jingqiu Zhou, Hao Tian, Jifeng Dai, Hongsheng Li
  • HRVDA: High-Resolution Visual Document Assistant (2024): Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
  • Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs (2023): Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
  • ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations (2023): Yue Jiang, Eldon Schoop, Amanda Swearngin, Jeffrey Nichols
  • V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM (2024): Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, A. N. JHA, Mukunda NS, Ishaan Bhola
  • Lexi: Self-Supervised Learning of the UI Language (2023): Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva
  • Lexi: Self-Supervised Learning of the UI Language (2022): Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva
  • Aria-UI: Visual Grounding for GUI Instructions (2024): Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li
  • To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning (2023): Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang
  • ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning (2024): Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, Hongyu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han
  • ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning (2023): Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, Hongyu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han
  • Generative Visual Instruction Tuning (2024): Jefferson Hernandez, Ruben Villegas, Vicente Ordóñez
  • Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (2024): Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li
  • InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (2024): Ryota Tanaka, Taichi Iki, Kyosuke Nishida, Kuniko Saito, Jun Suzuki
  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model (2023): Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue
  • MetaMorph: Multimodal Understanding and Generation via Instruction Tuning (2024): Shengbang Tong, Daiming Fan, Jianfei Zhu, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann LeCun, Saining Xie, Zhuang Liu
  • VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use (2023): Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt
  • Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models (2023): Geewook Kim, Hodong Lee, Daehee Kim, Haeji Jung, Sanghee Park, Yoonsik Kim, Sangdoo Yun, Taeho Kil, Bado Lee, Seunghyun Park
  • Caption Anything: Interactive Image Description with Diverse Multimodal Controls (2023): Teng Wang, Jinrui Zhang, Junjie Fei, Yixiao Ge, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao, Ying Shan

Works That Cite This (0)

  (none)

Works Cited by This (0)

  (none)