Vaishnavi Himakunthala


Common Coauthors
Commonly Cited References
Action Title Year Authors # of times referenced
+ Weakly Supervised Memory Networks. 2015 Sainbayar Sukhbaatar
Arthur Szlam
Jason Weston
Rob Fergus
1
+ YouTube-8M: A Large-Scale Video Classification Benchmark 2016 Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
George Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
1
+ A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering 2017 Tegan Maharaj
Nicolas Ballas
Anna Rohrbach
Aaron Courville
Christopher Pal
1
+ TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering 2017 Yunseok Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
1
+ Multimodal Dual Attention Memory for Video Story Question Answering 2018 Kyung-Min Kim
Seong-Ho Choi
Jin-Hwa Kim
Byoung‐Tak Zhang
1
+ BERTScore: Evaluating Text Generation with BERT 2019 Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
1
+ Towards Automatic Learning of Procedures From Web Instructional Videos 2018 Luowei Zhou
Chenliang Xu
Jason J. Corso
1
+ DeepStory: Video Story QA by Deep Embedded Memory Networks 2017 Kyung-Min Kim
Min-Oh Heo
Seong-Ho Choi
Byoung‐Tak Zhang
1
+ TVQA: Localized, Compositional Video Question Answering 2018 Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
1
+ MovieQA: Understanding Stories in Movies through Question-Answering 2016 Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
Raquel Urtasun
Sanja Fidler
1
+ Dense-Captioning Events in Videos 2017 Ranjay Krishna
Kenji Hata
Frederic Ren
Li Fei-Fei
Juan Carlos Niebles
1
+ ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering 2019 Yu Zhou
Dejing Xu
Jun Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
1
+ MarioQA: Answering Questions by Watching Gameplay Videos 2017 Jonghwan Mun
Paul Hongsuck Seo
Ilchae Jung
Bohyung Han
1
+ Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks 2019 Nils Reimers
Iryna Gurevych
1
+ HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips 2019 Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Šivic
1
+ VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research 2019 Xin Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan‐Fang Wang
William Yang Wang
1
+ CLEVRER: CoLlision Events for Video REpresentation and Reasoning 2019 Kexin Yi
Chuang Gan
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
Joshua B. Tenenbaum
1
+ Leveraging Video Descriptions to Learn Video Question Answering 2017 Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
1
+ TVQA+: Spatio-Temporal Grounding for Video Question Answering 2020 Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
1
+ What is More Likely to Happen Next? Video-and-Language Future Event Prediction 2020 Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
1
+ ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision 2021 Wonjae Kim
Bokyung Son
Ildoo Kim
1
+ Learning Transferable Visual Models From Natural Language Supervision 2021 Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
1
+ Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval 2021 Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
1
+ VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling 2021 Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
1
+ Chain-of-Thought Prompting Elicits Reasoning in Large Language Models 2022 Jason Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Ed H.
Quoc V. Le
Denny Zhou
1
+ PaLM: Scaling Language Modeling with Pathways 2022 Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
Adam Roberts
Paul Barham
Hyung Won Chung
Charles Sutton
Sebastian Gehrmann
1
+ Flamingo: a Visual Language Model for Few-Shot Learning 2022 Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
Arthur Mensch
Katie Millican
Malcolm Reynolds
1
+ BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation 2022 Junnan Li
Dongxu Li
Caiming Xiong
Steven C. H. Hoi
1
+ Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners 2022 Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
Xudong Lin
Shuohang Wang
Ziyi Yang
Chenguang Zhu
Derek Hoiem
1
+ OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework 2022 Peng Wang
Yang An
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
1
+ MERLOT: Multimodal Neural Script Knowledge Models 2021 Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
Jae Sung Park
Jize Cao
Ali Farhadi
Yejin Choi
1
+ Language Models are Few-Shot Learners 2020 T. B. Brown
Benjamin F. Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
1
+ GRiT: A Generative Region-to-text Transformer for Object Understanding 2022 Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
1
+ Detecting Twenty-Thousand Classes Using Image-Level Supervision 2022 Xingyi Zhou
Rohit Girdhar
Armand Joulin
Philipp Krähenbühl
Ishan Misra
1
+ PaLM-E: An Embodied Multimodal Language Model 2023 Danny Driess
Fei Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
Brian Ichter
Ayzaan Wahid
Jonathan Tompson
Quan Vuong
Tianhe Yu
1
+ Visual Instruction Tuning 2023 Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
1
+ Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings 2023 Daniel M. Rose
Vaishnavi Himakunthala
Andy Ouyang
Ryan He
Alex Mei
Yujie Lu
Michael Saxon
Chinmay Sonar
Diba Mirza
William Yang Wang
1
+ Otter: A Multi-Modal Model with In-Context Instruction Tuning 2023 Bo Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Jingkang Yang
Ziwei Liu
1
+ Multilingual Conceptual Coverage in Text-to-Image Models 2023 Michael Saxon
William Yang Wang
1
+ OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models 2023 Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
Wanrong Zhu
Kalyani Marathe
Yonatan Bitton
Samir Yitzhak Gadre
Shiori Sagawa
1
+ Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation 2023 Tsu-Jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
1
+ An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling 2023 Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
1
+ Visualize Before You Write: Imagination-Guided Open-Ended Text Generation 2023 Wanrong Zhu
Yan An
Yujie Lu
Wenda Xu
Xin Wang
Miguel P. Eckstein
William Yang Wang
1
+ Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training 2023 Wenliang Dai
Zihan Liu
Ziwei Ji
Dan Su
Pascale Fung
1