+
|
Matterport3D: Learning from RGB-D Data in Indoor Environments
|
2017
|
Angel X. Chang
Angela Dai
Thomas Funkhouser
Maciej Halber
Matthias Nießner
Manolis Savva
Shuran Song
Andy Zeng
Yinda Zhang
|
16
|
+
|
Deep Residual Learning for Image Recognition
|
2016
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
15
|
+
|
ImageNet Large Scale Visual Recognition Challenge
|
2015
|
Olga Russakovsky
Jia Deng
Hao Su
Jonathan Krause
Sanjeev Satheesh
Sean Ma
Zhiheng Huang
Andrej Karpathy
Aditya Khosla
Michael S. Bernstein
|
15
|
+
|
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
|
2019
|
Howard Chen
Alane Suhr
Dipendra Misra
Noah Snavely
Yoav Artzi
|
12
|
+
|
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
|
2019
|
Xin Wang
Qiuyuan Huang
Aslı Çelikyılmaz
Jianfeng Gao
Dinghan Shen
Yuan-Fang Wang
William Yang Wang
Lei Zhang
|
12
|
+
|
CIDEr: Consensus-based image description evaluation
|
2015
|
Ramakrishna Vedantam
C. Lawrence Zitnick
Devi Parikh
|
11
|
+
|
Microsoft COCO Captions: Data Collection and Evaluation Server
|
2015
|
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. Lawrence Zitnick
|
11
|
+
|
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
|
2019
|
Hao Tan
Licheng Yu
Mohit Bansal
|
10
|
+
|
SPICE: Semantic Propositional Image Caption Evaluation
|
2016
|
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
|
10
|
+
|
On Evaluation of Embodied Navigation Agents
|
2018
|
Peter Anderson
Angel X. Chang
Devendra Singh Chaplot
Alexey Dosovitskiy
Saurabh Gupta
Vladlen Koltun
Jana Košecká
Jitendra Malik
Roozbeh Mottaghi
Manolis Savva
|
10
|
+
|
Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View
|
2020
|
Harsh Mehta
Yoav Artzi
Jason Baldridge
Eugene Ie
Piotr Mirowski
|
10
|
+
|
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
|
2018
|
Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton van den Hengel
|
10
|
+
|
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
|
2018
|
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
|
9
|
+
|
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
|
2020
|
Yuankai Qi
Qi Wu
Peter Anderson
Xin Wang
William Yang Wang
Chunhua Shen
Anton van den Hengel
|
9
|
+
|
Hierarchical Question-Image Co-Attention for Visual Question Answering
|
2016
|
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
|
9
|
+
|
Gibson Env: Real-World Perception for Embodied Agents
|
2018
|
Fei Xia
Amir Zamir
Zhiyang He
Alexander F. Sax
Jitendra Malik
Silvio Savarese
|
8
|
+
|
Effective Approaches to Attention-based Neural Machine Translation
|
2015
|
Thang Luong
Hieu Pham
Christopher D. Manning
|
8
|
+
|
Deep visual-semantic alignments for generating image descriptions
|
2015
|
Andrej Karpathy
Li Fei-Fei
|
8
|
+
|
Adam: A Method for Stochastic Optimization
|
2014
|
Diederik P. Kingma
Jimmy Ba
|
8
|
+
|
Long-term recurrent convolutional networks for visual recognition and description
|
2015
|
Jeff Donahue
Lisa Anne Hendricks
Sergio Guadarrama
Marcus Rohrbach
Subhashini Venugopalan
Trevor Darrell
Kate Saenko
|
7
|
+
|
VQA: Visual Question Answering
|
2015
|
Stanislaw Antol
Aishwarya Agrawal
Jiasen Lu
Margaret Mitchell
Dhruv Batra
C. Lawrence Zitnick
Devi Parikh
|
7
|
+
|
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
|
2020
|
Alexander Ku
Peter Anderson
Roma Patel
Eugene Ie
Jason Baldridge
|
7
|
+
|
From captions to visual concepts and back
|
2015
|
Hao Fang
Saurabh Gupta
Forrest Iandola
Rupesh K. Srivastava
Li Deng
Piotr Dollár
Jianfeng Gao
Xiaodong He
Margaret Mitchell
John Platt
|
7
|
+
|
Embodied Question Answering
|
2018
|
Abhishek Das
Samyak Datta
Georgia Gkioxari
Stefan Lee
Devi Parikh
Dhruv Batra
|
7
|
+
|
IQA: Visual Question Answering in Interactive Environments
|
2018
|
Daniel Gordon
Aniruddha Kembhavi
Mohammad Rastegari
Joseph Redmon
Dieter Fox
Ali Farhadi
|
7
|
+
|
Effective and General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping
|
2019
|
Gabriel Magalhaes
Vihan Jain
Alexander Ku
Eugene Ie
Jason Baldridge
|
6
|
+
|
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
|
2017
|
Yash Goyal
Tejas Khot
Douglas Summers-Stay
Dhruv Batra
Devi Parikh
|
6
|
+
|
Neural Machine Translation by Jointly Learning to Align and Translate
|
2015
|
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
|
6
|
+
|
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
|
2016
|
Shaoqing Ren
Kaiming He
Ross Girshick
Jian Sun
|
6
|
+
|
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
|
2018
|
Xin Wang
Wenhan Xiong
Hongmin Wang
William Yang Wang
|
6
|
+
|
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation
|
2019
|
Chih‐Yao Ma
Zuxuan Wu
Ghassan AlRegib
Caiming Xiong
Zsolt Kira
|
6
|
+
|
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
|
2015
|
Shaoqing Ren
Kaiming He
Ross Girshick
Jian Sun
|
6
|
+
|
Generation and Comprehension of Unambiguous Object Descriptions
|
2016
|
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana Camburu
Alan Yuille
Kevin Murphy
|
5
|
+
|
Visual7W: Grounded Question Answering in Images
|
2016
|
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
|
5
|
+
|
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
|
2020
|
Mohit Shridhar
Jesse Thomason
Daniel Gordon
Yonatan Bisk
Winson Han
Roozbeh Mottaghi
Luke Zettlemoyer
Dieter Fox
|
5
|
+
|
Connecting Vision and Language with Localized Narratives
|
2020
|
Jordi Pont-Tuset
Jasper Uijlings
Soravit Changpinyo
Radu Soricut
Vittorio Ferrari
|
5
|
+
|
Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation
|
2019
|
Liyiming Ke
Xiujun Li
Yonatan Bisk
Ari Holtzman
Zhe Gan
Jingjing Liu
Jianfeng Gao
Yejin Choi
Siddhartha S. Srinivasa
|
5
|
+
|
Show and tell: A neural image caption generator
|
2015
|
Oriol Vinyals
Alexander Toshev
Samy Bengio
Dumitru Erhan
|
5
|
+
|
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
|
2017
|
Manolis Savva
Angel X. Chang
Alexey Dosovitskiy
Thomas Funkhouser
Vladlen Koltun
|
5
|
+
|
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
|
2017
|
Vahid Kazemi
Ali Elqursh
|
4
|
+
|
Vision-and-Dialog Navigation
|
2019
|
Jesse Thomason
Michael H. Murray
Maya Çakmak
Luke Zettlemoyer
|
4
|
+
|
Language Modeling with Gated Convolutional Networks
|
2016
|
Yann Dauphin
Angela Fan
Michael Auli
David Grangier
|
4
|
+
|
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
|
2014
|
Junhua Mao
Wei Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
|
4
|
+
|
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
|
2017
|
Jiasen Lu
Caiming Xiong
Devi Parikh
Richard Socher
|
4
|
+
|
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
|
2015
|
Yukun Zhu
Ryan Kiros
Rich Zemel
Ruslan Salakhutdinov
Raquel Urtasun
Antonio Torralba
Sanja Fidler
|
4
|
+
|
Matterport3D: Learning from RGB-D Data in Indoor Environments
|
2017
|
Angel X. Chang
Angela Dai
Thomas Funkhouser
Maciej Halber
Matthias Nießner
Manolis Savva
Shuran Song
Andy Zeng
Yinda Zhang
|
4
|
+
|
Attention Is All You Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
4
|
+
|
VQA: Visual Question Answering
|
2015
|
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. Lawrence Zitnick
Dhruv Batra
Devi Parikh
|
4
|
+
|
Zero-Shot Visual Question Answering
|
2016
|
Damien Teney
Anton van den Hengel
|
4
|
+
|
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
|
2014
|
Kyunghyun Cho
Bart van Merriënboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
|
4
|