+
|
Attention-Based Multimodal Fusion for Video Description
|
2017
|
Chiori Hori
Takaaki Hori
Teng-Yok Lee
Ziming Zhang
Bret Harsham
John R. Hershey
Tim K. Marks
Kazuhiko Sumi
|
8
|
+
|
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
|
2017
|
João Carreira
Andrew Zisserman
|
7
|
+
|
CIDEr: Consensus-based image description evaluation
|
2015
|
Ramakrishna Vedantam
C. Lawrence Zitnick
Devi Parikh
|
6
|
+
|
Very Deep Convolutional Networks for Large-Scale Image Recognition
|
2014
|
Karen Simonyan
Andrew Zisserman
|
6
|
+
|
VQA: Visual Question Answering
|
2015
|
Stanislaw Antol
Aishwarya Agrawal
Jiasen Lu
Margaret Mitchell
Dhruv Batra
C. Lawrence Zitnick
Devi Parikh
|
6
|
+
|
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
|
2016
|
Gunnar A. Sigurdsson
Gül Varol
Xiaolong Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
|
5
|
+
|
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
|
2015
|
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
Wei Xu
|
5
|
+
|
Dense-Captioning Events in Videos
|
2017
|
Ranjay Krishna
Kenji Hata
Frederic Ren
Li Fei-Fei
Juan Carlos Niebles
|
4
|
+
|
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
|
2017
|
Abhishek Das
Satwik Kottur
José M. F. Moura
Stefan Lee
Dhruv Batra
|
4
|
+
|
Neural Machine Translation by Jointly Learning to Align and Translate
|
2014
|
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
|
4
|
+
|
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
|
2016
|
Shaoqing Ren
Kaiming He
Ross Girshick
Jian Sun
|
4
|
+
|
A Neural Conversational Model
|
2015
|
Oriol Vinyals
Quoc V. Le
|
4
|
+
|
The Kinetics Human Action Video Dataset
|
2017
|
Andrew Zisserman
João Carreira
Karen Simonyan
Will Kay
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
T.C. Green
Trevor Back
|
4
|
+
|
The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
|
2015
|
Ryan Lowe
Nissan Pow
Iulian Vlad Serban
Joëlle Pineau
|
4
|
+
|
End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features
|
2019
|
Chiori Hori
Huda Alamri
Jue Wang
Gordon Wichern
Takaaki Hori
Anoop Cherian
Tim K. Marks
Vincent Cartillier
Raphael Gontijo Lopes
Abhishek Das
|
4
|
+
|
Learning Spatiotemporal Features with 3D Convolutional Networks
|
2015
|
Du Tran
Lubomir Bourdev
Rob Fergus
Lorenzo Torresani
Manohar Paluri
|
3
|
+
|
Microsoft COCO Captions: Data Collection and Evaluation Server
|
2015
|
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. Lawrence Zitnick
|
3
|
+
|
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
|
2018
|
Huda Alamri
Vincent Cartillier
Raphael Gontijo Lopes
Abhishek Das
Jue Wang
Irfan Essa
Dhruv Batra
Devi Parikh
Anoop Cherian
Tim K. Marks
|
3
|
+
|
The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
|
2015
|
Ryan Lowe
Nissan Pow
Iulian Vlad Serban
Joëlle Pineau
|
3
|
+
|
Attention Is All You Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
3
|
+
|
Yin and Yang: Balancing and Answering Binary Visual Questions
|
2016
|
Peng Zhang
Yash Goyal
Douglas Summers-Stay
Dhruv Batra
Devi Parikh
|
3
|
+
|
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
|
2015
|
Alessandro Sordoni
Michel Galley
Michael Auli
Chris Brockett
Yangfeng Ji
Margaret Mitchell
Jian-Yun Nie
Jianfeng Gao
Bill Dolan
|
3
|
+
|
MovieQA: Understanding Stories in Movies through Question-Answering
|
2016
|
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
Raquel Urtasun
Sanja Fidler
|
3
|
+
|
Attention is All you Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
3
|
+
|
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
|
2017
|
Yunseok Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
|
3
|
+
|
Very Deep Convolutional Networks for Large-Scale Image Recognition
|
2014
|
Karen Simonyan
Andrew Zisserman
|
3
|
+
|
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
|
2016
|
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
Wei Xu
|
3
|
+
|
VQA: Visual Question Answering
|
2015
|
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. Lawrence Zitnick
Dhruv Batra
Devi Parikh
|
2
|
+
|
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
|
2017
|
Abhishek Das
Satwik Kottur
José M. F. Moura
Stefan Lee
Dhruv Batra
|
2
|
+
|
Show and tell: A neural image caption generator
|
2015
|
Oriol Vinyals
Alexander Toshev
Samy Bengio
Dumitru Erhan
|
2
|
+
|
MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
|
2019
|
Lorenzo Bertoni
S. Kreiss
Alexandre Alahi
|
2
|
+
|
Bounding Box Regression With Uncertainty for Accurate Object Detection
|
2019
|
Yihui He
Chenchen Zhu
Jianren Wang
Marios Savvides
Xiangyu Zhang
|
2
|
+
|
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
|
2014
|
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
|
2
|
+
|
Coherent Multi-sentence Video Description with Variable Level of Detail
|
2014
|
Anna Rohrbach
Marcus Rohrbach
Wei Qiu
Annemarie Friedrich
Manfred Pinkal
Bernt Schiele
|
2
|
+
|
A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference
|
2019
|
Kumar Shridhar
Felix Laumann
Marcus Liwicki
|
2
|
+
|
A Dataset for Document Grounded Conversations
|
2018
|
Kangyan Zhou
Shrimai Prabhumoye
Alan W. Black
|
2
|
+
|
Audio-Visual Scene-Aware Dialog
|
2019
|
Huda Alamri
Vincent Cartillier
Abhishek Das
Jue Wang
Anoop Cherian
Irfan Essa
Dhruv Batra
Tim K. Marks
Chiori Hori
Peter Anderson
|
2
|
+
|
Evaluating and Calibrating Uncertainty Prediction in Regression Tasks
|
2022
|
Dan Levi
Liran Gispan
Niv Giladi
Ethan Fetaya
|
2
|
+
|
Visual7W: Grounded Question Answering in Images
|
2016
|
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
|
2
|
+
|
Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
|
2018
|
Xuanyi Dong
Shoou-I Yu
Xinshuo Weng
Shih-En Wei
Yi Yang
Yaser Sheikh
|
2
|
+
|
Numerical Coordinate Regression with Convolutional Neural Networks
|
2018
|
Aiden Nibali
Zhen He
Stuart Morgan
Luke A. Prendergast
|
2
|
+
|
DeepPose: Human Pose Estimation via Deep Neural Networks
|
2014
|
Alexander Toshev
Christian Szegedy
|
2
|
+
|
2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning
|
2018
|
Diogo Luvizon
David Picard
Hedi Tabia
|
2
|
+
|
Describing Videos by Exploiting Temporal Structure
|
2015
|
Li Yao
Atousa Torabi
Kyunghyun Cho
Nicolas Ballas
Christopher Pal
Hugo Larochelle
Aaron Courville
|
2
|
+
|
Long-Term On-board Prediction of People in Traffic Scenes Under Uncertainty
|
2018
|
Apratim Bhattacharyya
Mario Fritz
Bernt Schiele
|
2
|
+
|
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
|
2012
|
Khurram Soomro
Amir Zamir
Mubarak Shah
|
2
|
+
|
Stacked Hourglass Networks for Human Pose Estimation
|
2016
|
Alejandro Newell
Kaiyu Yang
Jia Deng
|
2
|
+
|
Uncertainty Estimates and Multi-hypotheses Networks for Optical Flow
|
2018
|
Eddy Ilg
Özgün Çiçek
Silvio Galesso
Aaron Klein
Osama Makansi
Frank Hutter
Thomas Brox
|
2
|
+
|
Rethinking on Multi-Stage Networks for Human Pose Estimation
|
2019
|
Wenbo Li
Zhicheng Wang
Binyi Yin
Qixiang Peng
Yuming Du
Tianzi Xiao
Gang Yu
Hongtao Lu
Yichen Wei
Jian Sun
|
2
|
+
|
Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks
|
2018
|
Zhenhua Feng
Josef Kittler
Muhammad Awais
Patrik Huber
Xiao-Jun Wu
|
2
|