Vaishnavi Himakunthala
All published works
Title
Year
Authors
Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings
2023
Daniel M. Rose
Vaishnavi Himakunthala
Andy Ouyang
Ryan He
Alex Mei
Yujie Lu
Michael Saxon
Chinmay Sonar
Diba Mirza
William Yang Wang
Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
2023
Vaishnavi Himakunthala
Andy Ouyang
Daniel M. Rose
Ryan He
Alex Mei
Yujie Lu
Chinmay Sonar
Michael Saxon
William Yang Wang
Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
2023
Vaishnavi Himakunthala
Andy Ouyang
Daniel M. Rose
Ryan He
Alex Mei
Yujie Lu
Chinmay Sonar
Michael Saxon
William Yang Wang
Common Coauthors
Coauthor: Papers Together
William Yang Wang: 3
Alex Mei: 3
Michael Saxon: 3
Yujie Lu: 3
Daniel M. Rose: 3
Chinmay Sonar: 3
Andy Ouyang: 3
Ryan He: 3
Diba Mirza: 1
Commonly Cited References
Title
Year
Authors
# of times referenced
Weakly Supervised Memory Networks.
2015
Sainbayar Sukhbaatar
Arthur Szlam
Jason Weston
Rob Fergus
1
YouTube-8M: A Large-Scale Video Classification Benchmark
2016
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
George Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
1
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering
2017
Tegan Maharaj
Nicolas Ballas
Anna Rohrbach
Aaron Courville
Christopher Pal
1
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
2017
Yunseok Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
1
Multimodal Dual Attention Memory for Video Story Question Answering
2018
Kyung-Min Kim
Seong-Ho Choi
Jin-Hwa Kim
Byoung-Tak Zhang
1
BERTScore: Evaluating Text Generation with BERT
2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
1
Towards Automatic Learning of Procedures From Web Instructional Videos
2018
Luowei Zhou
Chenliang Xu
Jason J. Corso
1
DeepStory: Video Story QA by Deep Embedded Memory Networks
2017
Kyung-Min Kim
Min-Oh Heo
Seong-Ho Choi
Byoung-Tak Zhang
1
TVQA: Localized, Compositional Video Question Answering
2018
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
1
MovieQA: Understanding Stories in Movies through Question-Answering
2016
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
Raquel Urtasun
Sanja Fidler
1
Dense-Captioning Events in Videos
2017
Ranjay Krishna
Kenji Hata
Frederic Ren
Li Fei-Fei
Juan Carlos Niebles
1
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
2019
Zhou Yu
Dejing Xu
Jun Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
1
MarioQA: Answering Questions by Watching Gameplay Videos
2017
Jonghwan Mun
Paul Hongsuck Seo
Ilchae Jung
Bohyung Han
1
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
2019
Nils Reimers
Iryna Gurevych
1
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Šivic
1
VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
2019
Xin Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-Fang Wang
William Yang Wang
1
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
2019
Kexin Yi
Chuang Gan
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
Joshua B. Tenenbaum
1
Leveraging Video Descriptions to Learn Video Question Answering
2017
Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
1
TVQA+: Spatio-Temporal Grounding for Video Question Answering
2020
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
1
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
2020
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
1
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
2021
Wonjae Kim
Bokyung Son
Ildoo Kim
1
Learning Transferable Visual Models From Natural Language Supervision
2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
1
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
2021
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
1
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022
Jason Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Ed H. Chi
Quoc V. Le
Denny Zhou
1
PaLM: Scaling Language Modeling with Pathways
2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
Adam Roberts
Paul Barham
Hyung Won Chung
Charles Sutton
Sebastian Gehrmann
1
Flamingo: a Visual Language Model for Few-Shot Learning
2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
Arthur Mensch
Katie Millican
Malcolm Reynolds
1
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
2022
Junnan Li
Dongxu Li
Caiming Xiong
Steven C. H. Hoi
1
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
2022
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
Xudong Lin
Shuohang Wang
Ziyi Yang
Chenguang Zhu
Derek Hoiem
1
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
2022
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
1
MERLOT: Multimodal Neural Script Knowledge Models
2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
Jae Sung Park
Jize Cao
Ali Farhadi
Yejin Choi
1
Language Models are Few-Shot Learners
2020
T. B. Brown
Benjamin F. Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
1
GRiT: A Generative Region-to-text Transformer for Object Understanding
2022
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
1
Detecting Twenty-Thousand Classes Using Image-Level Supervision
2022
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Philipp Krähenbühl
Ishan Misra
1
PaLM-E: An Embodied Multimodal Language Model
2023
Danny Driess
Fei Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
Brian Ichter
Ayzaan Wahid
Jonathan Tompson
Quan Vuong
Tianhe Yu
1
Visual Instruction Tuning
2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
1
Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings
2023
Daniel M. Rose
Vaishnavi Himakunthala
Andy Ouyang
Ryan He
Alex Mei
Yujie Lu
Michael Saxon
Chinmay Sonar
Diba Mirza
William Yang Wang
1
Otter: A Multi-Modal Model with In-Context Instruction Tuning
2023
Bo Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Jingkang Yang
Ziwei Liu
1
Multilingual Conceptual Coverage in Text-to-Image Models
2023
Michael Saxon
William Yang Wang
1
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
2023
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
Wanrong Zhu
Kalyani Marathe
Yonatan Bitton
Samir Yitzhak Gadre
Shiori Sagawa
1
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
2023
Tsu-Jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
1
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
2023
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
1
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
2023
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xin Wang
Miguel P. Eckstein
William Yang Wang
1
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
2023
Wenliang Dai
Zihan Liu
Ziwei Ji
Dan Su
Pascale Fung
1