From Recognition to Cognition: Visual Commonsense Reasoning | 2019 | Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi | 4
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | 2021 | Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, Chitta Baral | 3
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning | 2020 | Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang | 3
Deep Residual Learning for Image Recognition | 2016 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 3
VQA: Visual Question Answering | 2015 | Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh | 3
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning | 2017 | Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick | 3
Modeling Context in Referring Expressions | 2016 | Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg | 2
Attention Is All You Need | 2017 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin | 2
Composing Text and Image for Image Retrieval - an Empirical Odyssey | 2019 | Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays | 2
Semantically Distributed Robust Optimization for Vision-and-Language Inference | 2022 | Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang | 2
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | 2019 | Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi | 2
Reasoning about Actions over Visual and Linguistic Modalities: A Survey | 2022 | Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral | 2
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images | 2015 | Mateusz Malinowski, Marcus Rohrbach, Mario Fritz | 1
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping | 2014 | Andrej Karpathy, Armand Joulin, Li Fei-Fei | 1
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | 2014 | Junyoung Chung, Çağlar Gülçehre, Kyunghyun Cho, Yoshua Bengio | 1
Going deeper with convolutions | 2015 | Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich | 1
A Dataset for Movie Description | 2015 | Anna Rohrbach, Marcus Rohrbach, Niket Tandon, Bernt Schiele | 1
Show and tell: A neural image caption generator | 2015 | Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan | 1
Very Deep Convolutional Networks for Large-Scale Image Recognition | 2014 | Karen Simonyan, Andrew Zisserman | 1
Enriching Word Vectors with Subword Information | 2017 | Piotr Bojanowski, Édouard Grave, Armand Joulin, Tomáš Mikolov | 1
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | 2018 | Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum | 1
CIDEr: Consensus-based image description evaluation | 2015 | Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh | 1
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | 2016 | Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool | 1
Back to the Future: Knowledge Distillation for Human Action Anticipation | 2019 | Vinh Cao Trần, Yang Wang, Minh Hoai | 1
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos | 2017 | Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang | 1
Encouraging LSTMs to Anticipate Actions Very Early | 2017 | Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson | 1
A Neural Representation of Sketch Drawings | 2017 | David Ha, Douglas Eck | 1
Neural Message Passing for Quantum Chemistry | 2017 | Justin Gilmer, Samuel S. Schoenholz, Patrick Riley, Oriol Vinyals, George E. Dahl | 1
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | 2017 | Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, Gunhee Kim | 1
Action Tubelet Detector for Spatio-Temporal Action Localization | 2017 | Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid | 1
Revisiting Visual Question Answering Baselines | 2016 | Allan Jabri, Armand Joulin, Laurens van der Maaten | 1
Predicting Motivations of Actions by Leveraging Text | 2016 | Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba | 1
Explaining and Harnessing Adversarial Examples | 2014 | Ian Goodfellow, Jonathon Shlens, Christian Szegedy | 1
Learning to Track for Spatio-Temporal Action Localization | 2015 | Philippe Weinzaepfel, Zaïd Harchaoui, Cordelia Schmid | 1
Relational inductive biases, deep learning, and graph networks | 2018 | Peter Battaglia, Jessica B. Hamrick, Victor Bapst, Álvaro Sánchez-González, Vinícius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner | 1
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset | 2018 | Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price | 1
Generative Adversarial Text to Image Synthesis | 2016 | Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee | 1
Which Training Methods for GANs do actually Converge? | 2018 | Lars Mescheder, Andreas Geiger, Sebastian Nowozin | 1
Generalizing to Unseen Domains via Adversarial Data Augmentation | 2018 | Riccardo Volpi, Hongseok Namkoong, Ozan Şener, John C. Duchi, Vittorio Murino, Silvio Savarese | 1
W-TALC: Weakly-Supervised Temporal Activity Localization and Classification | 2018 | Sujoy Paul, Sourya Roy, Amit K. Roy-Chowdhury | 1
ODSQA: Open-Domain Spoken Question Answering Dataset | 2018 | Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-yi Lee | 1
Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions | 2019 | Misha Wagner, Hector Basevi, Rakshith Shetty, Wenbin Li, Mateusz Malinowski, Mario Fritz, Aleš Leonardis | 1
Temporal Relational Reasoning in Videos | 2018 | Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba | 1
Domain Generalization with Domain-Specific Aggregation Modules | 2019 | Antonio D'Innocente, Barbara Caputo | 1
Action and Intention Recognition of Pedestrians in Urban Traffic | 2018 | Dimitrios Varytimidis, Fernando Alonso-Fernandez, Boris Durán, Cristofer Englund | 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 1
Deep Neural Networks as a Computational Model for Human Shape Sensitivity | 2016 | Jonas Kubilius, Stefania Bracci, Hans Op de Beeck | 1
GQA: A new dataset for compositional question answering over real-world images | 2019 | Drew A. Hudson, Christopher D. Manning | 1
Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards | 2017 | Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton van den Hengel | 1
Deeper, Broader and Artier Domain Generalization | 2017 | Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales | 1