Eran Malach

All published works
Loss-to-Loss Prediction: Scaling Laws for All Datasets (2024). David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, Sham M. Kakade.
Mixture of Parrots: Experts improve memorization more than reasoning (2024). Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham M. Kakade, Eran Malach.
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks (2024). Akshara Prabhakar, Yuanzhi Li, Karthik Narasimhan, Sham M. Kakade, Eran Malach, Samy Jelassi.
On the Power of Decision Trees in Auto-Regressive Language Modeling (2024). Ye-Hua Gan, Tomer Galanti, Tomaso Poggio, Eran Malach.
Universal Length Generalization with Turing Programs (2024). Kaiying Hou, David Brandfonbrener, Sham M. Kakade, Samy Jelassi, Eran Malach.
A New Perspective on Shampoo's Preconditioner (2024). Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham M. Kakade, Lucas Janson.
Transcendence: Generative Models Can Outperform The Experts That Train Them (2024). Wei Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin Edelman, Milind Tambe, Sham M. Kakade, Eran Malach.
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains (2024). Benjamin Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis.
Repeat After Me: Transformers are Better than State Space Models at Copying (2024). Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach.
Less is More: Selective Layer Finetuning with SubTuning (2023). Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach.
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck (2023). Benjamin Edelman, Surbhi Goel, Sham M. Kakade, Eran Malach, Cyril Zhang.
Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD (2023). E. Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz.
Auto-Regressive Next-Token Predictors are Universal Learners (2023). Eran Malach.
Knowledge Distillation: Bad Models Can Be Good Role Models (2022). Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz.
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit (2022). Boaz Barak, Benjamin Edelman, Surbhi Goel, Sham M. Kakade, Eran Malach, Cyril Zhang.
On the Power of Differentiable Learning versus PAC and SQ Learning (2021). Emmanuel Abbé, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro.
The Connection Between Approximation, Depth Separation and Learnability in Neural Networks (2021). Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir.
Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels (2021). Eran Malach, Pritish Kamath, Emmanuel Abbé, Nathan Srebro.
Learning Parities with Neural Networks (2020). Amit Daniely, Eran Malach.
Proving the Lottery Ticket Hypothesis: Pruning is All You Need (2020). Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir.
When Hardness of Approximation Meets Hardness of Learning (2020). Eran Malach, Shai Shalev-Shwartz.
Computational Separation Between Convolutional and Fully-Connected Networks (2020). Eran Malach, Shai Shalev-Shwartz.
Is Deeper Better only when Shallow is Good? (2019). Eran Malach, Shai Shalev-Shwartz.
Decoupling Gating from Linearity (2019). Jonathan Fiat, Eran Malach, Shai Shalev-Shwartz.
ID3 Learns Juntas for Smoothed Product Distributions (2019). Alon Brutzkus, Amit Daniely, Eran Malach.
On the Optimality of Trees Generated by ID3 (2019). Alon Brutzkus, Amit Daniely, Eran Malach.
Learning Boolean Circuits with Neural Networks (2019). Eran Malach, Shai Shalev-Shwartz.
A Provably Correct Algorithm for Deep Learning that Actually Works (2018). Eran Malach, Shai Shalev-Shwartz.
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data (2017). Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz.
Decoupling "when to update" from "how to update" (2017). Eran Malach, Shai Shalev-Shwartz.
Commonly Cited References
Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018). Simon S. Du, Xiyu Zhai, Barnabás Póczos, Aarti Singh. Referenced 7 times.
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks (2019). Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang. Referenced 6 times.
SGD Learns the Conjugate Kernel Class of the Network (2017). Amit Daniely. Referenced 5 times.
A Convergence Theory for Deep Learning via Over-Parameterization (2018). Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song. Referenced 5 times.
Learning Parities with Neural Networks (2020). Amit Daniely, Eran Malach. Referenced 5 times.
On the Power and Limitations of Random Features for Understanding Neural Networks (2019). Gilad Yehudai, Ohad Shamir. Referenced 5 times.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (2018). Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang. Referenced 4 times.
Benefits of depth in neural networks (2016). Matus Telgarsky. Referenced 4 times.
Towards moderate overparameterization: global convergence guarantees for training shallow neural networks (2019). Samet Oymak, Mahdi Soltanolkotabi. Referenced 4 times.
Distribution-specific hardness of learning neural networks (2018). Ohad Shamir. Referenced 4 times.
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics (2020). Weinan E, Chao Ma, Lei Wu. Referenced 4 times.
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? (2018). Samet Oymak, Mahdi Soltanolkotabi. Referenced 4 times.
Diverse Neural Network Learns True Target Functions (2016). Bo Xie, Yingyu Liang, Le Song. Referenced 4 times.
Wide neural networks of any depth evolve as linear models under gradient descent (2020). Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington. Referenced 4 times.
Backward Feature Correction: How Deep Learning Performs Deep Learning (2020). Zeyuan Allen-Zhu, Yuanzhi Li. Referenced 3 times.
On the number of response regions of deep feed forward networks with piece-wise linear activations (2013). Razvan Pascanu, Guido Montúfar, Yoshua Bengio. Referenced 3 times.
Representation Benefits of Deep Feedforward Networks (2015). Matus Telgarsky. Referenced 3 times.
Depth Separation for Neural Networks (2017). Amit Daniely. Referenced 3 times.
Failures of Gradient-Based Deep Learning (2017). Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah. Referenced 3 times.
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs (2017). Alon Brutzkus, Amir Globerson. Referenced 3 times.
Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks (2018). Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee. Referenced 3 times.
On the Expressive Power of Deep Learning: A Tensor Analysis (2015). Nadav Cohen, Or Sharir, Amnon Shashua. Referenced 3 times.
An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis (2017). Yuandong Tian. Referenced 2 times.
A complete characterization of statistical query learning with applications to evolvability (2012). Vitaly Feldman. Referenced 2 times.
Bounding and Counting Linear Regions of Deep Neural Networks (2017). Thiago Serra, Christian Tjandraatmadja, Srikumar Ramalingam. Referenced 2 times.
Linearized two-layers neural networks in high dimension (2021). Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari. Referenced 2 times.
Decision trees are PAC-learnable from most product distributions: a smoothed analysis (2008). Adam Tauman Kalai, Shang-Hua Teng. Referenced 2 times.
Deep Learning and Hierarchal Generative Models (2016). Elchanan Mossel. Referenced 2 times.
A Provably Correct Algorithm for Deep Learning that Actually Works (2018). Eran Malach, Shai Shalev-Shwartz. Referenced 2 times.
On the Number of Linear Regions of Deep Neural Networks (2014). Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio. Referenced 2 times.
Poly-time universality and limitations of deep learning (2020). Emmanuel Abbé, Colin Sandon. Referenced 2 times.
Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity (2020). Pritish Kamath, Omar Montasser, Nathan Srebro. Referenced 2 times.
Deep vs. shallow networks: An approximation theory perspective (2016). H. N. Mhaskar, Tomaso Poggio. Referenced 2 times.
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning (2014). Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. Referenced 2 times.
When is a Convolutional Filter Easy To Learn? (2017). Simon S. Du, Jason D. Lee, Yuandong Tian. Referenced 2 times.
Beyond the low-degree algorithm: mixtures of subcubes and their applications (2019). Sitan Chen, Ankur Moitra. Referenced 2 times.
ID3 Learns Juntas for Smoothed Product Distributions (2019). Alon Brutzkus, Amit Daniely, Eran Malach. Referenced 2 times.
On the Expressive Power of Deep Neural Networks (2016). Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein. Referenced 2 times.
Recovery Guarantees for One-hidden-layer Neural Networks (2017). Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon. Referenced 2 times.
What Can ResNet Learn Efficiently, Going Beyond Kernels? (2019). Zeyuan Allen-Zhu, Yuanzhi Li. Referenced 2 times.
Provable limitations of deep learning (2018). Emmanuel Abbé, Colin Sandon. Referenced 2 times.
Neural Tangent Kernel: Convergence and Generalization in Neural Networks (2018). Arthur Paul Jacot, Franck Gabriel, Clément Hongler. Referenced 2 times.
Depth Creates No Bad Local Minima (2017). Haihao Lu, Kenji Kawaguchi. Referenced 1 time.
Analysis of a Random Forests Model (2010). Gérard Biau. Referenced 1 time.
Uniform approximation of functions with random bases (2008). Ali Rahimi, Benjamin Recht. Referenced 1 time.
Gradient Descent Learns Linear Dynamical Systems (2016). Moritz Hardt, Tengyu Ma, Benjamin Recht. Referenced 1 time.
Convexified Convolutional Neural Networks (2016). Yuchen Zhang, Percy Liang, Martin J. Wainwright. Referenced 1 time.
On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition (2018). Marco Mondelli, Andrea Montanari. Referenced 1 time.