Linli Xu

All published works
Title Year Authors
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models 2024 Yubo Wang
Chaohu Liu
Yanqiu Qu
Haoyu Cao
Deqiang Jiang
Linli Xu
HRVDA: High-Resolution Visual Document Assistant 2024 Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
Empowering Diffusion Models on the Embedding Space for Text Generation 2024 Zhujin Gao
Junliang Guo
Xu Tan
Yongxin Zhu
Fang Zhang
Jiang Bian
Linli Xu
End-to-End Word-Level Pronunciation Assessment with MASK Pre-training 2023 Yukang Liang
Kaitao Song
Shaoguang Mao
Huiqiang Jiang
Luna Qiu
Yuqing Yang
Dongsheng Li
Linli Xu
Lili Qiu
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA 2023 Yongxin Zhu
Zhen Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation 2022 Peining Zhang
Junliang Guo
Linli Xu
Mu You
Junming Yin
Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation 2022 Jiquan Li
Junliang Guo
Yongxin Zhu
Xin Sheng
Deqiang Jiang
Bo Ren
Linli Xu
Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation 2022 Zhujin Gao
Junliang Guo
Xu Tan
Yongxin Zhu
Fang Zhang
Jiang Bian
Linli Xu
Commonly Cited References
Title Year Authors # of times referenced
VQA: Visual Question Answering 2015 Stanislaw Antol
Aishwarya Agrawal
Jiasen Lu
Margaret Mitchell
Dhruv Batra
C. Lawrence Zitnick
Devi Parikh
1
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering 2018 Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
1
Pythia v0.1: the Winning Entry to the VQA Challenge 2018 2018 Yu Jiang
Vivek Natarajan
Xinlei Chen
Marcus Rohrbach
Dhruv Batra
Devi Parikh
1
Grounding Referring Expressions in Images by Variational Context 2018 Hanwang Zhang
Yulei Niu
Shih-Fu Chang
1
Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression 2019 Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
1
VizWiz Grand Challenge: Answering Visual Questions from Blind People 2018 Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
1
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments 2018 Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton van den Hengel
1
MAttNet: Modular Attention Network for Referring Expression Comprehension 2018 Licheng Yu
Zhe Lin
Xiaohui Shen
Shuicheng Yan
Xin Lu
Mohit Bansal
Tamara L. Berg
1
RoBERTa: A Robustly Optimized BERT Pretraining Approach 2019 Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
Mike Lewis
Luke Zettlemoyer
Veselin Stoyanov
1
Towards VQA Models That Can Read 2019 Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
1
A Fast and Accurate One-Stage Approach to Visual Grounding 2019 Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
1
Scene Text Visual Question Answering 2019 Ali Furkan Biten
Rubèn Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
1
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering 2018 Yash Goyal
Tejas Khot
Aishwarya Agrawal
Douglas Summers-Stay
Dhruv Batra
Devi Parikh
1
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA 2020 Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
1
Mapping Natural Language Instructions to Mobile UI Action Sequences 2020 Yang Li
Jiacong He
Xin Zhou
Yuan Zhang
Jason Baldridge
1
ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network 2020 Yuliang Liu
Hao Chen
Chunhua Shen
Tong He
Lianwen Jin
Liangwei Wang
1
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text 2020 Difei Gao
Ke Li
Ruiping Wang
Shiguang Shan
Xilin Chen
1
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension 2020 Mike Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdelrahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
1
UNITER: UNiversal Image-TExt Representation Learning 2020 Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
1
LayoutLM: Pre-training of Text and Layout for Document Image Understanding 2020 Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
1
TextCaps: A Dataset for Image Captioning with Reading Comprehension 2020 Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
1
Spatially Aware Multimodal Transformers for TextVQA 2020 Yash Kant
Dhruv Batra
Peter Anderson
Alexander G. Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
1
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps 2021 Qi Zhu
Chenyu Gao
Peng Wang
Qi Wu
1
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering 2020 Wei Han
Hantao Huang
Tao Han
1
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding 2021 Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
1
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision 2021 Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
1
Structured Multimodal Attentions for TextVQA 2021 Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton van den Hengel
Qi Wu
1
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling 2021 Xiaopeng Lu
Zhen Fan
Yansen Wang
Jean Oh
Carolyn Penstein Rosé
1
TransVG: End-to-End Visual Grounding with Transformers 2021 Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wengang Zhou
Houqiang Li
1
Flamingo: a Visual Language Model for Few-Shot Learning 2022 Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
Arthur Mensch
Katie Millican
Malcolm Reynolds
1
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding 2022 Jiapeng Wang
Lianwen Jin
Kai Ding
1
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption 2020 Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
Dinei Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
1
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 2019 Colin Raffel
Noam Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
1
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images 2016 Andreas Veit
Tomáš Matera
Lukáš Neumann
Jiří Matas
Serge Belongie
1
LaTr: Layout-Aware Transformer for Scene-Text VQA 2022 Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
1
GIT: A Generative Image-to-text Transformer for Vision and Language 2022 Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
1
Attention Is All You Need 2017 Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
1
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks 2016 Shaoqing Ren
Kaiming He
Ross Girshick
Jian Sun
1