Nitish Shirish Keskar

All published works
Action Title Year Authors
+ GPT-4o System Card 2024 OpenAI
A. M. Hurst
Adam Lerer
Adam P. Goucher
Adam Perelman
Aditya Ramesh
Aidan Clark
AJ Ostrow
Akila Welihinda
+ Modeling Multi-hop Question Answering as Single Sequence Prediction 2022 Semih Yavuz
Kazuma Hashimoto
Yingbo Zhou
Nitish Shirish Keskar
Caiming Xiong
+ Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models 2022 Aarohi Srivastava
Abhinav Rastogi
Abhishek S. Rao
Abu Awal Shoeb
Abubakar Abid
Adam Fisch
Adam R. Brown
Adam Santoro
Aditya Gupta
Adrià Garriga-Alonso
+ Generating Negative Samples for Sequential Recommendation 2022 Yongjun Chen
Jia Li
Zhiwei Liu
Nitish Shirish Keskar
Huan Wang
Julian McAuley
Caiming Xiong
+ GeDi: Generative Discriminator Guided Sequence Generation 2021 Ben Krause
Akhilesh Gotmare
Bryan McCann
Nitish Shirish Keskar
Shafiq Joty
Richard Socher
Nazneen Fatema Rajani
+ Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality 2021 Gustavo Aguilar
Bryan McCann
Tong Niu
Nazneen Fatema Rajani
Nitish Shirish Keskar
Thamar Solorio
+ Unsupervised Paraphrasing with Pretrained Language Models 2021 Tong Niu
Semih Yavuz
Yingbo Zhou
Nitish Shirish Keskar
Huan Wang
Caiming Xiong
+ Combining Data-driven Supervision with Human-in-the-loop Feedback for Entity Resolution 2021 Wenpeng Yin
Shelby Heinecke
Jia Li
Nitish Shirish Keskar
Michael S. Jones
Shouzhong Shi
Stanislav Georgiev
Kurt Milich
J. A. Esposito
Caiming Xiong
+ Char2Subword: Extending the Subword Embedding Space from Pre-trained Models Using Robust Character Compositionality 2020 Gustavo Aguilar
Bryan McCann
Tong Niu
Nazneen Fatema Rajani
Nitish Shirish Keskar
Thamar Solorio
+ Unsupervised Paraphrase Generation via Dynamic Blocking 2020 Tong Niu
Semih Yavuz
Yingbo Zhou
Huan Wang
Nitish Shirish Keskar
Caiming Xiong
+ Unsupervised Paraphrasing with Pretrained Language Models 2020 Tong Niu
Semih Yavuz
Yingbo Zhou
Nitish Shirish Keskar
Huan Wang
Caiming Xiong
+ GeDi: Generative Discriminator Guided Sequence Generation 2020 Ben Krause
Akhilesh Gotmare
Bryan McCann
Nitish Shirish Keskar
Shafiq Joty
Richard Socher
Nazneen Fatema Rajani
+ Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm 2020 Sourya Basu
Govardana Sachitanandam Ramachandran
Nitish Shirish Keskar
Lav R. Varshney
+ ProGen: Language Modeling for Protein Generation 2020 Ali Madani
Bryan McCann
Nikhil Naik
Nitish Shirish Keskar
Namrata Anand
Raphael R. Eguchi
Po‐Ssu Huang
Richard Socher
+ Limits of Detecting Text Generated by Large-Scale Language Models 2020 Lav R. Varshney
Nitish Shirish Keskar
Richard Socher
+ Improving out-of-distribution generalization via multi-task self-supervised pretraining 2020 Isabela Albuquerque
Nikhil Naik
Junnan Li
Nitish Shirish Keskar
Richard Socher
+ Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity 2020 Sourya Basu
Govardana Sachitanandam Ramachandran
Nitish Shirish Keskar
Lav R. Varshney
+ Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality 2020 Gustavo Aguilar
Bryan McCann
Tong Niu
Nazneen Fatema Rajani
Nitish Shirish Keskar
Thamar Solorio
+ Global Capacity Measures for Deep ReLU Networks via Path Sampling 2019 Ryan Theisen
Jason M. Klusowski
Huan Wang
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
+ Neural Text Summarization: A Critical Evaluation 2019 Wojciech Kryściński
Nitish Shirish Keskar
Bryan McCann
Caiming Xiong
Richard Socher
+ Unifying Question Answering and Text Classification via Span Extraction 2019 Nitish Shirish Keskar
Bryan McCann
Caiming Xiong
Richard Socher
+ Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering 2019 Victor W. Zhong
Caiming Xiong
Nitish Shirish Keskar
Richard Socher
+ XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering 2019 Jasdeep Singh
Bryan McCann
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
+ Pretrained AI Models: Performativity, Mobility, and Change 2019 Lav R. Varshney
Nitish Shirish Keskar
Richard Socher
+ CTRL: A Conditional Transformer Language Model for Controllable Generation 2019 Nitish Shirish Keskar
Bryan McCann
Lav R. Varshney
Caiming Xiong
Richard Socher
+ Unifying Question Answering, Text Classification, and Regression via Span Extraction 2019 Nitish Shirish Keskar
Bryan McCann
Caiming Xiong
Richard Socher
+ Balancing Communication and Computation in Distributed Optimization 2018 Albert S. Berahas
Raghu Bollapragada
Nitish Shirish Keskar
Ermin Wei
+ A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation 2018 Akhilesh Gotmare
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
+ The Natural Language Decathlon: Multitask Learning as Question Answering 2018 Bryan McCann
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
+ Using Mode Connectivity for Loss Landscape Analysis 2018 Akhilesh Gotmare
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
+ An Analysis of Neural Language Modeling at Multiple Scales 2018 Stephen Merity
Nitish Shirish Keskar
Richard Socher
+ Identifying Generalization Properties in Neural Networks 2018 Huan Wang
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
+ A limited-memory quasi-Newton algorithm for bound-constrained non-smooth optimization 2017 Nitish Shirish Keskar
Andreas Wächter
+ Regularizing and Optimizing LSTM Language Models 2017 Stephen Merity
Nitish Shirish Keskar
Richard Socher
+ Balancing Communication and Computation in Distributed Optimization 2017 Albert S. Berahas
Raghu Bollapragada
Nitish Shirish Keskar
Ermin Wei
+ Weighted Transformer Network for Machine Translation 2017 Karim Ahmed
Nitish Shirish Keskar
Richard Socher
+ Improving Generalization Performance by Switching from Adam to SGD 2017 Nitish Shirish Keskar
Richard Socher
+ A Limited-Memory Quasi-Newton Algorithm for Bound-Constrained Nonsmooth Optimization 2016 Nitish Shirish Keskar
Andreas Wächter
+ On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima 2016 Nitish Shirish Keskar
Dheevatsa Mudigere
Jorge Nocedal
Mikhail Smelyanskiy
Ping Tang
+ A second-order method for convex $\ell_1$-regularized optimization with active-set prediction 2016 Nitish Shirish Keskar
Jorge Nocedal
Figen Öztoprak
Andreas Wächter
+ adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs 2016 Nitish Shirish Keskar
Albert S. Berahas
+ A Second-Order Method for Convex $\ell_1$-Regularized Optimization with Active Set Prediction 2015 Nitish Shirish Keskar
Jorge Nocedal
Figen Öztoprak
Andreas Wächter
+ adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs 2015 Nitish Shirish Keskar
Albert S. Berahas
Commonly Cited References
Action Title Year Authors # of times referenced
+ The Curious Case of Neural Text Degeneration 2020 Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
8
+ Attention is All you Need 2017 Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
8
+ CTRL: A Conditional Transformer Language Model for Controllable Generation 2019 Nitish Shirish Keskar
Bryan McCann
Lav R. Varshney
Caiming Xiong
Richard Socher
8
+ Learned in translation: contextualized word vectors 2017 Bryan McCann
James Bradbury
Caiming Xiong
Richard Socher
7
+ Get To The Point: Summarization with Pointer-Generator Networks 2017 Abigail See
Peter J. Liu
Christopher D. Manning
7
+ Deep Residual Learning for Image Recognition 2016 Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
7
+ Neural Machine Translation by Jointly Learning to Align and Translate 2015 Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
5
+ Deep contextualized word representations 2018 Matthew E. Peters
Mark E Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
5
+ Deep Contextualized Word Representations 2018 Matthew E. Peters
Mark E Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
5
+ Universal Language Model Fine-tuning for Text Classification 2018 Jeremy Howard
Sebastian Ruder
4
+ Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks 2020 Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
4
+ A Deep Reinforced Model for Abstractive Summarization 2017 Romain Paulus
Caiming Xiong
Richard Socher
4
+ The Natural Language Decathlon: Multitask Learning as Question Answering 2018 Bryan McCann
Nitish Shirish Keskar
Caiming Xiong
Richard Socher
4
+ Distilling the Knowledge in a Neural Network 2015 Geoffrey E. Hinton
Oriol Vinyals
Jeff Dean
4
+ Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books 2015 Yukun Zhu
Ryan Kiros
Rich Zemel
Ruslan Salakhutdinov
Raquel Urtasun
Antonio Torralba
Sanja Fidler
4
+ TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension 2017 Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
4
+ Introductory Lectures on Convex Optimization: A Basic Course 2014 Yu. E. Nesterov
4
+ Attention Is All You Need 2017 Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
4
+ Efficient Estimation of Word Representations in Vector Space 2013 Tomáš Mikolov
Kai Chen
Greg S. Corrado
Jeff Dean
4
+ The Curious Case of Neural Text Degeneration 2019 Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
4
+ SQuAD: 100,000+ Questions for Machine Comprehension of Text 2016 Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
4
+ Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 2016 Yonghui Wu
Mike Schuster
Zhifeng Chen
Quoc V. Le
Mohammad Norouzi
Wolfgang Macherey
Maxim Krikun
Yuan Cao
Qin Gao
Klaus Macherey
4
+ Teaching Machines to Read and Comprehend 2015 Karl Moritz Hermann
Tomáš Kočiský
Edward Grefenstette
Lasse Espeholt
Will Kay
Mustafa Suleyman
Phil Blunsom
4
+ A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference 2017 Adina Williams
Nikita Nangia
Samuel R. Bowman
4
+ Using the Output Embedding to Improve Language Models 2017 Ofir Press
Lior Wolf
3
+ Generating Sentences from a Continuous Space 2016 Samuel R. Bowman
Luke Vilnis
Oriol Vinyals
Andrew M. Dai
Rafał Józefowicz
Samy Bengio
3
+ Adversarial Example Generation with Syntactically Controlled Paraphrase Networks 2018 Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
3
+ Learning to Write with Cooperative Discriminators 2018 Ari Holtzman
Jan Buys
Maxwell Forbes
Antoine Bosselut
David Golub
Yejin Choi
3
+ A Deep Generative Framework for Paraphrase Generation 2018 Ankush Gupta
Arvind Agarwal
Prawaan Singh
Piyush Rai
3
+ Natural Language Processing (almost) from Scratch 2011 Ronan Collobert
Jason Weston
Léon Bottou
Michael Karlen
Koray Kavukcuoglu
Pavel P. Kuksa
3
+ Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling 2016 Hakan Inan
Khashayar Khosravi
Richard Socher
3
+ Speech recognition with deep recurrent neural networks 2013 Alex Graves
Abdelrahman Mohamed
Geoffrey E. Hinton
3
+ A Neural Attention Model for Abstractive Sentence Summarization 2015 Alexander M. Rush
Sumit Chopra
Jason Weston
3
+ A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task 2016 Danqi Chen
Jason Bolton
Christopher D. Manning
3
+ Very Deep Convolutional Networks for Large-Scale Image Recognition 2014 Karen Simonyan
Andrew Zisserman
3
+ Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies 2018 Max Grusky
Mor Naaman
Yoav Artzi
3
+ Unifying Human and Statistical Evaluation for Natural Language Generation 2019 Tatsunori Hashimoto
Hugh Zhang
Percy Liang
3
+ Defending Against Neural Fake News 2019 Rowan Zellers
Ari Holtzman
Hannah Rashkin
Yonatan Bisk
Ali Farhadi
Franziska Roesner
Yejin Choi
3
+ A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations 2019 Mingda Chen
Qingming Tang
Sam Wiseman
Kevin Gimpel
3
+ Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift 2015 Sergey Ioffe
Christian Szegedy
3
+ Fine-tuned Language Models for Text Classification 2018 Jeremy Howard
Sebastian Ruder
3
+ QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension 2018 Adams Wei Yu
David Dohan
Minh-Thang Luong
Rui Zhao
Kai Chen
Mohammad Norouzi
Quoc V. Le
3
+ Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour 2017 Priya Goyal
Piotr Dollár
Ross Girshick
Pieter Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3
+ BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 2018 Jacob Devlin
Ming‐Wei Chang
Kenton Lee
Kristina Toutanova
3
+ Regularizing and Optimizing LSTM Language Models 2017 Stephen Merity
Nitish Shirish Keskar
Richard Socher
3
+ Generating Long Sequences with Sparse Transformers 2019 Rewon Child
Scott Gray
Alec Radford
Ilya Sutskever
3
+ Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation 2017 Melvin Johnson
Mike Schuster
Quoc V. Le
Maxim Krikun
Yonghui Wu
Zhifeng Chen
Nikhil Thorat
Fernanda Viégas
Martin Wattenberg
Greg S. Corrado
3
+ Neural Machine Translation by Jointly Learning to Align and Translate 2014 Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
3
+ Learning to Generate Reviews and Discovering Sentiment 2017 Alec Radford
Rafał Józefowicz
Ilya Sutskever
3