+
|
Microsoft COCO Captions: Data Collection and Evaluation Server
|
2015
|
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. Lawrence Zitnick
|
1
|
+
|
VQA: Visual Question Answering
|
2015
|
Stanislaw Antol
Aishwarya Agrawal
Jiasen Lu
Margaret Mitchell
Dhruv Batra
C. Lawrence Zitnick
Devi Parikh
|
1
|
+
|
Deep Residual Learning for Image Recognition
|
2016
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
1
|
+
|
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
|
2017
|
Israel D. Gebru
Silèye Ba
Xiaofei Li
Radu Horaud
|
1
|
+
|
Detecting Engagement in Egocentric Video
|
2016
|
Yu-Chuan Su
Kristen Grauman
|
1
|
+
|
Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
|
2016
|
William Lotter
Gabriel Kreiman
David Cox
|
1
|
+
|
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
|
2012
|
Khurram Soomro
Amir Zamir
Mubarak Shah
|
1
|
+
|
Anticipating Visual Representations from Unlabeled Video
|
2016
|
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
|
1
|
+
|
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
|
2016
|
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
|
1
|
+
|
Joint CTC-attention based end-to-end speech recognition using multi-task learning
|
2017
|
Suyoun Kim
Takaaki Hori
Shinji Watanabe
|
1
|
+
|
Actions ~ Transformations
|
2016
|
Xiaolong Wang
Ali Farhadi
Abhinav Gupta
|
1
|
+
|
Feature Pyramid Networks for Object Detection
|
2017
|
Tsung-Yi Lin
Piotr Dollár
Ross Girshick
Kaiming He
Bharath Hariharan
Serge Belongie
|
1
|
+
|
Transformation-Based Models of Video Sequences
|
2017
|
Joost R. van Amersfoort
Anitha Kannan
Marc'Aurelio Ranzato
Arthur Szlam
Du Tran
Soumith Chintala
|
1
|
+
|
Joint Discovery of Object States and Manipulation Actions
|
2017
|
Jean-Baptiste Alayrac
Josef Šivic
Ivan Laptev
Simon Lacoste-Julien
|
1
|
+
|
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
|
2017
|
Davide Moltisanti
Michael Wray
Walterio Mayol-Cuevas
Dima Damen
|
1
|
+
|
Decomposing Motion and Content for Natural Video Sequence Prediction
|
2017
|
Ruben Villegas
Shuicheng Yan
Seunghoon Hong
Xunyu Lin
Honglak Lee
|
1
|
+
|
The Kinetics Human Action Video Dataset
|
2017
|
Andrew Zisserman
João Carreira
Karen Simonyan
Will Kay
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
T.C. Green
Trevor Back
|
1
|
+
|
The "Something Something" Video Database for Learning and Evaluating Visual Common Sense
|
2017
|
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzyńska
Susanne Westphal
Heuna Kim
Valentin Haenel
Ingo Fruend
P.N. Yianilos
Moritz Mueller-Freitag
|
1
|
+
|
VoxCeleb: A Large-Scale Speaker Identification Dataset
|
2017
|
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
|
1
|
+
|
Detecting the Moment of Completion: Temporal Models for Localising Action Completion
|
2017
|
Farnoosh Heidarivincheh
Majid Mirmehdi
Dima Damen
|
1
|
+
|
Next-active-object prediction from egocentric videos
|
2017
|
Antonino Furnari
Sebastiano Battiato
Kristen Grauman
Giovanni Maria Farinella
|
1
|
+
|
Temporal Relational Reasoning in Videos
|
2018
|
Bolei Zhou
Alex Andonian
Aude Oliva
Antonio Torralba
|
1
|
+
|
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild
|
2018
|
Matthias Müller
Adel Bibi
Silvio Giancola
Salman Alsubaihi
Bernard Ghanem
|
1
|
+
|
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
|
2018
|
Dima Damen
Hazel Doughty
Giovanni Maria Farinella
Sanja Fidler
Antonino Furnari
Evangelos Kazakos
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
|
1
|
+
|
YOLOv3: An Incremental Improvement
|
2018
|
Joseph Redmon
Ali Farhadi
|
1
|
+
|
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
|
2018
|
Andrew Owens
Alexei A. Efros
|
1
|
+
|
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
|
2018
|
Gunnar A. Sigurdsson
Abhinav Gupta
Cordelia Schmid
Ali Farhadi
Karteek Alahari
|
1
|
+
|
VoxCeleb2: Deep Speaker Recognition
|
2018
|
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
|
1
|
+
|
Multi-modal egocentric activity recognition using multi-kernel learning
|
2020
|
Mehmet Ali Arabacı
Fatih Özkan
Elif Sürer
Peter Jančovič
Alptekin Temizel
|
1
|
+
|
Diagnosing Error in Temporal Action Detectors
|
2018
|
Humam Alwassel
Fabian Caba Heilbron
Víctor Escorcia
Bernard Ghanem
|
1
|
+
|
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
|
2018
|
Sourish Chaudhuri
Joseph Roth
Daniel P. W. Ellis
Andrew Gallagher
Liat Kaver
Radhika Marvin
Caroline Pantofaru
Nathan Reale
Loretta Guarino Reid
Kevin Wilson
|
1
|
+
|
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
|
2018
|
Tushar Nagarajan
Kristen Grauman
|
1
|
+
|
Self-Supervised Generation of Spatial Audio for 360 Video
|
2018
|
Pedro Morgado
Nuno Vasconcelos
Timothy Langlois
Oliver Wang
|
1
|
+
|
Deep Audio-Visual Speech Recognition
|
2018
|
Triantafyllos Afouras
Joon Son Chung
Andrew Senior
Oriol Vinyals
Andrew Zisserman
|
1
|
+
|
LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking
|
2019
|
Heng Fan
Liting Lin
Fan Yang
Peng Chu
Ge Deng
Sijia Yu
Hexin Bai
Yong Xu
Chunyuan Liao
Haibin Ling
|
1
|
+
|
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
|
2018
|
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
|
1
|
+
|
GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild
|
2019
|
Lianghua Huang
Xin Zhao
Kaiqi Huang
|
1
|
+
|
Learning Semantic Embedding Spaces for Slicing Vegetables
|
2019
|
Mohit Sharma
Kevin Zhang
Oliver Kroemer
|
1
|
+
|
Objects as Points
|
2019
|
Xingyi Zhou
Dequan Wang
Philipp Krähenbühl
|
1
|
+
|
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
|
2019
|
Daniel Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
Ekin D. Cubuk
Quoc V. Le
|
1
|
+
|
The Replica Dataset: A Digital Replica of Indoor Spaces
|
2019
|
Julian Straub
Thomas J. Whelan
Lingni Ma
Yufan Chen
Erik Wijmans
Simon Green
Jakob Julian Engel
Raul Mur-Artal
Carl Yuheng Ren
Shobhit Verma
|
1
|
+
|
Learning Temporal Transformations From Time-Lapse Videos
|
2016
|
Yipin Zhou
Tamara L. Berg
|
1
|
+
|
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
|
2015
|
Shaoqing Ren
Kaiming He
Ross Girshick
Jian Sun
|
1
|
+
|
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
|
2018
|
Tianwei Lin
Xu Zhao
Haisheng Su
Chongjing Wang
Ming Yang
|
1
|
+
|
LAEO-Net: Revisiting People Looking at Each Other in Videos
|
2019
|
Manuel J. Marín-Jiménez
Vicky Kalogeiton
Pablo Medina-Suarez
Andrew Zisserman
|
1
|
+
|
Multiview RGB-D Dataset for Object Instance Detection
|
2016
|
Georgios Georgakis
Md Alimoor Reza
Arsalan Mousavian
Phi-Hung Le
Jana Košecká
|
1
|
+
|
Non-local Neural Networks
|
2018
|
Xiaolong Wang
Ross Girshick
Abhinav Gupta
Kaiming He
|
1
|
+
|
Learning to Separate Object Sounds by Watching Unlabeled Video
|
2018
|
Ruohan Gao
Rogério Feris
Kristen Grauman
|
1
|
+
|
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
|
2015
|
Xingjian Shi
Zhourong Chen
Hao Wang
Dit-Yan Yeung
Wai Kin Wong
Wang-chun Woo
|
1
|