Abhishek Panigrahi

All published works
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? (2025). Simon Park, Abhishek Panigrahi, Yun Chen, Dingli Yu, Anirudh Goyal, Sanjeev Arora.
Progressive distillation induces an implicit curriculum (2024). Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, S.P. Goel.
Representing Rule-based Chatbots with Transformers (2024). Dan Friedman, Abhishek Panigrahi, Danqi Chen.
Efficient Stagewise Pretraining via Progressive Subnetworks (2024). Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar.
Task-Specific Skill Localization in Fine-tuned Language Models (2023). Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora.
Do Transformers Parse while Predicting the Masked Word? (2023). Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora.
Trainable Transformer in Transformer (2023). Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora.
Understanding Gradient Descent on Edge of Stability in Deep Learning (2022). Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi.
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms (2022). Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora.
Learning and Generalization in RNNs (2021). Abhishek Panigrahi, Navin Goyal.
DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow (2019). Suman Kalyan Maity, Abhishek Panigrahi, Sayan Ghosh, Arundhati Banerjee, Pawan Goyal, Animesh Mukherjee.
Effect of Activation Functions on the Training of Overparametrized Neural Nets (2019). Abhishek Panigrahi, Abhishek Shetty, Navin Goyal.
Analysis on Gradient Propagation in Batch Normalized Residual Networks (2018). Abhishek Panigrahi, Yueru Chen, C.-C. Jay Kuo.
Analyzing Social Book Reading Behavior on Goodreads and how it predicts Amazon Best Sellers (2018). Suman Kalyan Maity, Abhishek Panigrahi, Animesh Mukherjee.
Commonly Cited References
Scikit-learn: Machine Learning in Python (2012). Fabián Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron J. Weiss, Vincent Dubourg. Referenced 1 time.
What do Neural Machine Translation Models Learn about Morphology? (2017). Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass. Referenced 1 time.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Referenced 1 time.
Decoupled Weight Decay Regularization (2017). Ilya Loshchilov, Frank Hutter. Referenced 1 time.
Visualizing and Measuring the Geometry of BERT (2019). Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg. Referenced 1 time.
RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019). Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. Referenced 1 time.
Designing and Interpreting Probes with Control Tasks (2019). John Hewitt, Percy Liang. Referenced 1 time.
Probing Natural Language Inference Models through Semantic Fragments (2020). Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal. Referenced 1 time.
Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction (2020). Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee. Referenced 1 time.
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT (2020). Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu. Referenced 1 time.
On the Ability and Limitations of Transformers to Recognize Formal Languages (2020). Satwik Bhattamishra, Kabir Ahuja, Navin Goyal. Referenced 1 time.
Probing Pretrained Language Models for Lexical Semantics (2020). Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen. Referenced 1 time.
On the Computational Power of Transformers and Its Implications in Sequence Modeling (2020). Satwik Bhattamishra, Arkil Patel, Navin Goyal. Referenced 1 time.
A Primer in BERTology: What We Know About How BERT Works (2020). Anna Rogers, Olga Kovaleva, Anna Rumshisky. Referenced 1 time.
Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing (2021). Rowan Hall Maudslay, Ryan Cotterell. Referenced 1 time.
Syntax-Enhanced Pre-trained Model (2021). Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Daxin Jiang, Nan Duan. Referenced 1 time.
Self-Attention Networks Can Process Bounded Hierarchical Languages (2021). Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan. Referenced 1 time.
Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers (2021). Colin Wei, Yining Chen, Tengyu Ma. Referenced 1 time.
Inductive Biases and Variable Creation in Self-Attention Mechanisms (2021). Benjamin Edelman, Surbhi Goel, Sham M. Kakade, Cyril Zhang. Referenced 1 time.
How to Train BERT with an Academic Budget (2021). Peter Izsak, Moshe Berchansky, Omer Levy. Referenced 1 time.
Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees (2021). Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong. Referenced 1 time.
Saturated Transformers are Constant-Depth Threshold Circuits (2022). William Merrill, Ashish Sabharwal, Noah A. Smith. Referenced 1 time.
In-context Learning and Induction Heads (2022). Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen. Referenced 1 time.
Understanding intermediate layers using linear classifier probes (2016). Guillaume Alain, Yoshua Bengio. Referenced 1 time.
Progress measures for grokking via mechanistic interpretability (2023). Neel Nanda, Lawrence Chan, Tom Lieberum, J. Lacey Smith, Jacob Steinhardt. Referenced 1 time.
Transformers Learn Shortcuts to Automata (2022). Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang. Referenced 1 time.
Exploring Length Generalization in Large Language Models (2022). Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur. Referenced 1 time.
Probing for Constituency Structure in Neural Language Models (2022). David Arps, Younes Samih, Laura Kallmeyer, Hassan Sajjad. Referenced 1 time.
Should You Mask 15% in Masked Language Modeling? (2023). Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen. Referenced 1 time.