Referring to Screen Texts with Voice Assistants

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2306.07298

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Action Title Year Authors
+ Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding 2024 Yue Fan
Lei Ding
Ching-Chen Kuo
Shan Jiang
Yang Zhao
Xinze Guan
Jie Yang
Yi Zhang
Xin Wang
+ Understanding mobile GUI: From pixel-words to screen-sentences 2024 Jingwen Fu
Xiaoyi Zhang
Yuwang Wang
Wenjun Zeng
Nanning Zheng
+ MARRS: Multimodal Reference Resolution System 2023 Halim Cagri Ates
Shruti Bhargava Choubey
Site Li
Jiarui Lu
S. Maddula
Joel Ruben Antony Moniz
Anil Kumar Nalamalapu
Roman Hoang Nguyen
Melis Ozyildirim
Alkesh Patel
+ ScreenAI: A Vision-Language Model for UI and Infographics Understanding 2024 Gilles Baechler
Srinivas Sunkara
Maria Wang
Fedir Zubach
Hassan Mansoor
Vincent Etter
Victor Cărbune
Jason Lin
Jindong Chen
Abhanshu Sharma
+ Understanding Mobile GUI: from Pixel-Words to Screen-Sentences 2021 Jingwen Fu
Xiaoyi Zhang
Yuwang Wang
Wenjun Zeng
Sam Yang
Grayson Hilliard
+ Lexi: Self-Supervised Learning of the UI Language 2023 Pratyay Banerjee
Shweti Mahajan
Kushal Arora
Chitta Baral
Oriana Riva
+ Lexi: Self-Supervised Learning of the UI Language 2022 Pratyay Banerjee
Shweti Mahajan
Kushal Arora
Chitta Baral
Oriana Riva
+ SeeReader: An (Almost) Eyes-Free Mobile Rich Document Viewer 2009 Scott A. Carter
Laurent Denoue
+ Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs 2024 Keen You
Haotian Zhang
Eldon Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
+ Generating Easy-to-Understand Referring Expressions for Target Identifications 2018 M. Tanaka
Takayuki Itamochi
Kenichi Narioka
Ikuro Sato
Yoshitaka Ushiku
Tatsuya Harada
+ Generating Easy-to-Understand Referring Expressions for Target Identifications 2019 M. Tanaka
Takayuki Itamochi
Kenichi Narioka
Ikuro Sato
Yoshitaka Ushiku
Tatsuya Harada
+ Towards Better Semantic Understanding of Mobile Interfaces 2022 Srinivas Sunkara
Maria Wang
Lijuan Liu
Gilles Baechler
Yu-Chung Hsiao
Jindong Chen
Abhanshu Sharma
James W. Stout
+ Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements 2020 Yang Li
Gang Li
Luheng He
Jingjie Zheng
Hong Li
Zhiwei Guan
+ GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking 2021 Anam Ahmad Khan
Joshua Newn
Ryan Kelly
Namrata Srivastava
James Bailey
Eduardo Velloso
+ ExpressEdit: Video Editing with Natural Language and Sketching 2024 Bekzat Tilekbay
Saelyne Yang
Michal A. Lewkowicz
Alex Suryapranata
Juho Kim
+ ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces 2021 Zecheng He
Srinivas Sunkara
Xiaoxue Zang
Ying Xu
Lijuan Liu
Nevan Wichers
Gabriel Schubiner
Ruby Lee
Jindong Chen
+ ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces 2020 Zecheng He
Srinivas Sunkara
Xiaoxue Zang
Ying Xu
Lijuan Liu
Nevan Wichers
Gabriel Schubiner
Ruby Lee
Jindong Chen
Blaise Agüera y Arcas
+ VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models 2023 Zongjie Li
Chaozheng Wang
Chaowei Liu
Pingchuan Ma
Daoyuan Wu
Shuai Wang
Cuiyun Gao

Works That Cite This (0)

Works Cited by This (0)