Yuchong Sun

Commonly Cited References
Action Title Year Authors # of times referenced
+ PDF Chat Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval 2021 Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
3
+ PDF Chat HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips 2019 Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Šivic
3
+ Decoupled Weight Decay Regularization 2017 Ilya Loshchilov
Frank Hutter
2
+ Learning a Text-Video Embedding from Incomplete and Heterogeneous Data 2018 Antoine Miech
Ivan Laptev
Josef Šivic
2
+ HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training 2020 Linjie Li
Yen‐Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
2
+ PDF Chat Deep Residual Learning for Image Recognition 2016 Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
2
+ PDF Chat Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification 2018 Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Murphy
2
+ PDF Chat End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering 2017 Youngjae Yu
Hyungjin Ko
Jongwook Choi
Gunhee Kim
2
+ Use What You Have: Video Retrieval Using Representations From Collaborative Experts 2019 Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
2
+ Support-set bottlenecks for video-text representation learning 2020 Mandela Patrick
Po-Yao Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João F. Henriques
Andrea Vedaldi
2
+ Is Space-Time Attention All You Need for Video Understanding? 2021 Gedas Bertasius
Heng Wang
Lorenzo Torresani
2
+ PDF Chat Image Super-Resolution Via Iterative Refinement 2022 Chitwan Saharia
Jonathan Ho
William Chan
Tim Salimans
David J. Fleet
Mohammad Norouzi
2
+ PDF Chat A Style-Based Generator Architecture for Generative Adversarial Networks 2019 Tero Karras
Samuli Laine
Timo Aila
2
+ PDF Chat VideoBERT: A Joint Model for Video and Language Representation Learning 2019 Chen Sun
Austin Myers
Carl Vondrick
Kevin Murphy
Cordelia Schmid
2
+ PDF Chat Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling 2021 Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
2
+ PDF Chat TediGAN: Text-Guided Diverse Face Image Generation and Manipulation 2021 Weihao Xia
Yujiu Yang
Jing‐Hao Xue
Baoyuan Wu
2
+ PDF Chat A Joint Sequence Fusion Model for Video Question Answering and Retrieval 2018 Youngjae Yu
Jong-Seok Kim
Gunhee Kim
2
+ PDF Chat Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation 2021 Elad Richardson
Yuval Alaluf
Or Patashnik
Yotam Nitzan
Yaniv Azar
Stav Shapiro
Daniel Cohen‐Or
2
+ PDF Chat Hierarchical Conditional Relation Networks for Video Question Answering 2020 Thao Minh Le
Vuong Le
Svetha Venkatesh
Truyen Tran
2
+ PDF Chat ActBERT: Learning Global-Local Video-Text Representations 2020 Linchao Zhu
Yi Yang
2
+ PDF Chat Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning 2021 Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
2
+ PDF Chat Motion-Appearance Co-memory Networks for Video Question Answering 2018 Jiyang Gao
Runzhou Ge
Kan Chen
Ram Nevatia
2
+ PDF Chat End-to-End Learning of Visual Representations From Uncurated Instructional Videos 2020 Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Šivic
Andrew Zisserman
2
+ PDF Chat Localizing Moments in Video with Natural Language 2017 Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Šivic
Trevor Darrell
Bryan Russell
2
+ PDF Chat Multi-modal Transformer for Video Retrieval 2020 Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
2
+ PDF Chat SlowFast Networks for Video Recognition 2019 Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
2
+ PDF Chat TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering 2017 Yunseok Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
2
+ PDF Chat UNITER: UNiversal Image-TExt Representation Learning 2020 Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
2
+ PDF Chat Cross-Modal and Hierarchical Modeling of Video and Text 2018 Bowen Zhang
Hexiang Hu
Fei Sha
2
+ PDF Chat Learning Spatiotemporal Features with 3D Convolutional Networks 2015 Du Tran
Lubomir Bourdev
Rob Fergus
Lorenzo Torresani
Manohar Paluri
2
+ YouTube-8M: A Large-Scale Video Classification Benchmark 2016 Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
George Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
2
+ ALBERT: A Lite BERT for Self-supervised Learning of Language Representations 2019 Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
1
+ PDF Chat What Makes A Good Story? Designing Composite Rewards for Visual Storytelling 2020 Junjie Hu
Yu Cheng
Zhe Gan
Jingjing Liu
Jianfeng Gao
Graham Neubig
1
+ PDF Chat Local Aggregation for Unsupervised Learning of Visual Embeddings 2019 Chengxu Zhuang
Alex Zhai
Daniel Yamins
1
+ ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data 2020 Di Qi
Lin Su
Jia Song
Edward Cui
Taroon Bharti
Arun Sacheti
1
+ UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation. 2020 Huaishao Luo
Lei Ji
Botian Shi
Haoyang Huang
Nan Duan
Tianrui Li
Xilin Chen
Ming Zhou
1
+ REALM: Retrieval-Augmented Language Model Pre-Training 2020 Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming‐Wei Chang
1
+ XGPT: Cross-modal Generative Pre-Training for Image Captioning 2020 Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
1
+ Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers 2020 Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
1
+ Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks 2020 Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Dong Li
Furu Wei
1
+ Language Models are Few-Shot Learners 2020 T. B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
1
+ PDF Chat A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation 2020 Anyi Rao
Linning Xu
Yu Xiong
Guodong Xu
Qingqiu Huang
Bolei Zhou
Dahua Lin
1
+ VirTex: Learning Visual Representations from Textual Annotations 2020 Karan Desai
Justin Johnson
1
+ PDF Chat Momentum Contrast for Unsupervised Visual Representation Learning 2020 Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross Girshick
1
+ Large-Scale Adversarial Training for Vision-and-Language Representation Learning 2020 Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
1
+ ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph 2020 Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
1
+ Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders 2020 Nicola Messina
Giuseppe Amato
Andrea Esuli
Fabrizio Falchi
Claudio Gennaro
Stéphane Marchand‐Maillet
1
+ X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers 2020 Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
1
+ Emerging Trends of Multimodal Research in Vision and Language. 2020 Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumder
Soujanya Poria
Roger Zimmermann
Amir Zadeh
1