M. C. Spruill. Asymptotic Distribution of Coordinates on High Dimensional Spheres. 2007.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. 2016.
Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. 2017.
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi. BERTScore: Evaluating Text Generation with BERT. 2019.
Pranav Rajpurkar, Robin Jia, Percy Liang. Know What You Don't Know: Unanswerable Questions for SQuAD. 2018.
Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan. DVQA: Understanding Data Visualizations via Question Answering. 2018.
Danna Gurari, Qing Li, Abigale Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham. VizWiz Grand Challenge: Answering Visual Questions from Blind People. 2018.
Panupong Pasupat, Percy Liang. Compositional Semantic Parsing on Semi-Structured Tables. 2015.
Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach. Towards VQA Models That Can Read. 2019.
Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Lluís Gómez, Marçal Rusiñol, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas. Scene Text Visual Question Answering. 2019.
Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. Multilingual Denoising Pre-training for Neural Machine Translation. 2020.
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. 2020.
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2020.
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih. Dense Passage Retrieval for Open-Domain Question Answering. 2020.
Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. LayoutLM: Pre-training of Text and Layout for Document Image Understanding. 2020.
Petru Soviany, Radu Tudor Ionescu, Paolo Rota, Nicu Sebe. Curriculum Learning: A Survey. 2022.
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark. Learning Transferable Visual Models From Natural Language Supervision. 2021.
Ryota Tanaka, Kyosuke Nishida, Sen Yoshida. VisualMRC: Machine Reading Comprehension on Document Images. 2021.
Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha. DocFormer: End-to-End Transformer for Document Understanding. 2021.
Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florêncio, Cha Zhang, Wanxiang Che. LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding. 2021.
Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas. ICDAR 2021 Competition on Document Visual Question Answering. 2021.
Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka. Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer. 2021.
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann. PaLM: Scaling Language Modeling with Pathways. 2022.
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds. Flamingo: a Visual Language Model for Few-Shot Learning. 2022.
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark. Training Compute-Optimal Large Language Models. 2022.
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah B. Henderson, Roman Ring, Susannah Young. Scaling Language Models: Methods, Analysis & Insights from Training Gopher. 2021.
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray. Training language models to follow instructions with human feedback. 2022.
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin. OPT: Open Pre-trained Transformer Language Models. 2022.
Peng Wang, Yang An, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. 2022.
Ahmed Masry, Xuan Long, Jia Qing Tan, Shafiq Joty, Enamul Hoque. ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning. 2022.
Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill. Multimodal Few-Shot Learning with Frozen Language Models. 2021.
Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan. Supervised Contrastive Learning. 2020.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. 2019.
T. B. Brown, Benjamin F. Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell. Language Models are Few-Shot Learners. 2020.
Wenhui Wang, Hangbo Bao, Dong Li, Johan Björck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks. 2022.
Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut. PreSTU: Pre-Training for Scene-Text Understanding. 2022.
Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Dae-Hyun Nam, Sungrae Park. BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents. 2021.
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. 2022.
C. Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, C. H. Mullis, Mitchell Wortsman. LAION-5B: An open large-scale dataset for training next generation image-text models. 2022.
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma. Scaling Instruction-Finetuned Language Models. 2022.
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. OCR-Free Document Understanding Transformer. 2022.
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. 2023.
Brian Davis, Bryan S. Morse, Brian Price, Chris Tensmeyer, Curtis Wigington, Vlad I. Morariu. End-to-End Document Recognition and Understanding with Dessurt. 2023.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar. LLaMA: Open and Efficient Foundation Language Models. 2023.
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu. PaLM-E: An Embodied Multimodal Language Model. 2023.
Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. 2023.
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi. mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality. 2023.
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven C. H. Hoi. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. 2023.
Yuliang Liu, Zhang Li, Hongliang Li, Wenwen Yu, Mingxin Huang, Dezhi Peng, Ming-Yu Liu, Mingrui Chen, Chunyuan Li, Lianwen Jin. On the Hidden Mystery of OCR in Large Multimodal Models. 2023.