|
Learning Joint Representations of Videos and Sentences with Web Image Search
|
2016
|
Mayu Otani
Yuta Nakashima
Esa Rahtu
Janne Heikkilä
Naokazu Yokoya
|
3
|
|
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
|
2017
|
João Carreira
Andrew Zisserman
|
3
|
|
Deep Residual Learning for Image Recognition
|
2016
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
3
|
|
Title Generation for User Generated Videos
|
2016
|
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
|
3
|
|
Neural Machine Translation by Jointly Learning to Align and Translate
|
2015
|
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
|
3
|
|
MovieQA: Understanding Stories in Movies through Question-Answering
|
2016
|
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
Raquel Urtasun
Sanja Fidler
|
3
|
|
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
|
2017
|
Jun-Yan Zhu
Taesung Park
Phillip Isola
Alexei A. Efros
|
3
|
|
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
|
2019
|
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
|
3
|
|
Video Captioning via Hierarchical Reinforcement Learning
|
2018
|
Xin Wang
Wenhu Chen
Jiawei Wu
Yuan-Fang Wang
William Yang Wang
|
2
|
|
Localizing Moments in Video with Natural Language
|
2017
|
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Šivic
Trevor Darrell
Bryan Russell
|
2
|
|
A Closer Look at Spatiotemporal Convolutions for Action Recognition
|
2018
|
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
|
2
|
|
You Only Look Once: Unified, Real-Time Object Detection
|
2016
|
Joseph Redmon
Santosh Divvala
Ross Girshick
Ali Farhadi
|
2
|
|
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
|
2014
|
Kyunghyun Cho
Bart van Merriënboer
Çaǧlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
|
2
|
|
Attention Is All You Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
2
|
|
End-to-End Video Captioning With Multitask Reinforcement Learning
|
2019
|
Lijun Li
Boqing Gong
|
2
|
|
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
|
2018
|
Jingwen Wang
Wenhao Jiang
Lin Ma
Wei Liu
Yong Xu
|
2
|
|
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
|
2016
|
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
Sergio Guadarrama
Kate Saenko
Trevor Darrell
|
2
|
|
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
|
2019
|
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Syed Zulqarnain Gilani
Ajmal Mian
|
2
|
|
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
|
2019
|
Lianli Gao
Xiangpeng Li
Jingkuan Song
Heng Tao Shen
|
2
|
|
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
|
2017
|
Youngjae Yu
Jongwook Choi
Yeon‐Hwa Kim
Kyung Yoo
Sang‐Hun Lee
Gunhee Kim
|
2
|
|
Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms
|
2018
|
Sweta Agrawal
Amit Awekar
|
2
|
|
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
|
2014
|
Ross Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
|
2
|
|
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
|
2018
|
Kensho Hara
Hirokatsu Kataoka
Yutaka Satoh
|
2
|
|
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
|
2018
|
Xin Wang
Yuan-Fang Wang
William Yang Wang
|
2
|
|
Sequence to Sequence -- Video to Text
|
2015
|
Subhashini Venugopalan
Marcus Rohrbach
Jeffrey Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
|
2
|
|
Speech recognition with deep recurrent neural networks
|
2013
|
Alex Graves
Abdelrahman Mohamed
Geoffrey E. Hinton
|
2
|
|
CIDEr: Consensus-based image description evaluation
|
2015
|
Ramakrishna Vedantam
C. Lawrence Zitnick
Devi Parikh
|
2
|
|
ECO: Efficient Convolutional Network for Online Video Understanding
|
2018
|
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
|
2
|
|
Long-Term Temporal Convolutions for Action Recognition
|
2017
|
Gül Varol
Ivan Laptev
Cordelia Schmid
|
2
|
|
TGIF: A New Dataset and Benchmark on Animated GIF Description
|
2016
|
Yuncheng Li
Yale Song
Liangliang Cao
Joel Tetreault
Larry Goldberg
Alejandro Jaimes
Jiebo Luo
|
2
|
|
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
|
2018
|
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Murphy
|
2
|
|
Weakly-Supervised Alignment of Video with Text
|
2015
|
Piotr Bojanowski
Rémi Lajugie
Édouard Grave
Francis Bach
Ivan Laptev
Jean Ponce
C. Schmid
|
2
|
|
Deep visual-semantic alignments for generating image descriptions
|
2015
|
Andrej Karpathy
Li Fei-Fei
|
2
|
|
Evaluation of automatic video captioning using direct assessment
|
2018
|
Yvette Graham
George Awad
Alan F. Smeaton
|
2
|
|
From captions to visual concepts and back
|
2015
|
Hao Fang
Saurabh Gupta
Forrest Iandola
Rupesh K. Srivastava
Li Deng
Piotr Dollár
Jianfeng Gao
Xiaodong He
Margaret Mitchell
John Platt
|
2
|
|
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
|
2018
|
Albert Gatt
Emiel Krahmer
|
2
|
|
SPICE: Semantic Propositional Image Caption Evaluation
|
2016
|
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
|
2
|
|
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
|
2016
|
Rakshith Shetty
Jorma Laaksonen
|
2
|
|
Show and tell: A neural image caption generator
|
2015
|
Oriol Vinyals
Alexander Toshev
Samy Bengio
Dumitru Erhan
|
2
|
|
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
|
2015
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
2
|
|
Reconstruction Network for Video Captioning
|
2018
|
Bairui Wang
Lin Ma
Wei Zhang
Wei Liu
|
2
|
|
Aggregated Residual Transformations for Deep Neural Networks
|
2017
|
Saining Xie
Ross Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
|
2
|
|
Video Captioning with Transferred Semantic Attributes
|
2017
|
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
|
2
|
|
Jointly Modeling Embedding and Translation to Bridge Video and Language
|
2016
|
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Yong Rui
|
2
|
|
Video Captioning with Multi-Faceted Attention
|
2018
|
Xiang Long
Chuang Gan
Gerard de Melo
|
2
|
|
FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
|
2017
|
Eddy Ilg
Nikolaus Mayer
Tonmoy Saikia
Margret Keuper
Alexey Dosovitskiy
Thomas Brox
|
2
|
|
End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
|
2017
|
Youngjae Yu
Hyungjin Ko
Jongwook Choi
Gunhee Kim
|
2
|
|
Describing Videos by Exploiting Temporal Structure
|
2015
|
Li Yao
Atousa Torabi
Kyunghyun Cho
Nicolas Ballas
Christopher Pal
Hugo Larochelle
Aaron Courville
|
2
|
|
From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning
|
2018
|
Jingkuan Song
Yuyu Guo
Lianli Gao
Xuelong Li
Alan Hanjalić
Heng Tao Shen
|
2
|
|
Weakly Supervised Dense Video Captioning
|
2017
|
Zhiqiang Shen
Jianguo Li
Su Zhou
Minjun Li
Yurong Chen
Yu‐Gang Jiang
Xiangyang Xue
|
2
|