Konstantin Mishchenko

All published works
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference (2024). Yang Hao, Chen Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
The Road Less Scheduled (2024). Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky
Super-Universal Regularized Newton Method (2024). Nikita Doikov, Konstantin Mishchenko, Yurii Nesterov
Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization (2023). Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik
Regularized Newton Method with Global $\mathcal{O}(1/k^2)$ Convergence (2023). Konstantin Mishchenko
Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy (2023). Blake Woodworth, Konstantin Mishchenko, Francis Bach
Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes (2023). Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik
Learning-Rate-Free Learning by D-Adaptation (2023). Aaron Defazio, Konstantin Mishchenko
DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method (2023). Ahmed Khaled, Konstantin Mishchenko, Chi Jin
Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity (2023). Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth
Prodigy: An Expeditiously Adaptive Parameter-Free Learner (2023). Konstantin Mishchenko, Aaron Defazio
Adaptive Proximal Gradient Method for Convex Optimization (2023). Yura Malitsky, Konstantin Mishchenko
When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement (2023). Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms (2022). Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! (2022). Konstantin Mishchenko, Grigory Malinovsky, Sebastian U. Stich, Peter Richtárik
Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization (2022). Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik
Adaptive Learning Rates for Faster Stochastic Gradient Methods (2022). Samuel Horváth, Konstantin Mishchenko, Peter Richtárik
Super-Universal Regularized Newton Method (2022). Nikita Doikov, Konstantin Mishchenko, Yurii Nesterov
Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays (2022). Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth
On Seven Fundamental Optimization Challenges in Machine Learning (2021). Konstantin Mishchenko
IntSGD: Floatless Compression of Stochastic Gradients (2021). Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik
Proximal and Federated Random Reshuffling (2021). Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
Regularized Newton Method with Global $O(1/k^2)$ Convergence (2021). Konstantin Mishchenko
IntSGD: Adaptive Floatless Compression of Stochastic Gradients (2021). Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik
Tighter Theory for Local SGD on Identical and Heterogeneous Data (2020). Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms (2020). Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik
A Distributed Flexible Delay-Tolerant Proximal Gradient Algorithm (2020). Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick
Random Reshuffling: Simple Analysis with Vast Improvements (2020). Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
Adaptive Gradient Descent without Descent (2019). Yura Malitsky, Konstantin Mishchenko
Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent (2019). Konstantin Mishchenko
First Analysis of Local GD on Heterogeneous Data (2019). Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
Better Communication Complexity for Local SGD (2019). Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
Tighter Theory for Local SGD on Identical and Heterogeneous Data (2019). Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls (2019). Konstantin Mishchenko, Mallory Montgomery, Federico Vaggi
A Stochastic Decoupling Method for Minimizing the Sum of Smooth and Non-Smooth Functions (2019). Konstantin Mishchenko, Peter Richtárik
99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it (2019). Konstantin Mishchenko, Filip Hanzely, Peter Richtárik
Distributed Learning with Compressed Gradient Differences (2019). Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik
Stochastic Distributed Learning with Gradient Quantization and Variance Reduction (2019). Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian U. Stich, Peter Richtárik
Revisiting Stochastic Extragradient (2019). Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky
MISO is Making a Comeback With Better Proofs and Rates (2019). Xun Qian, Alibek Sailanbayev, Konstantin Mishchenko, Peter Richtárik
Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates (2019). Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik
SEGA: Variance Reduction via Gradient Sketching (2018). Filip Hanzely, Konstantin Mishchenko, Peter Richtárik
A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints (2018). Konstantin Mishchenko, Peter Richtárik
Commonly Cited References
Proximal-Proximal-Gradient Method (2017). Ernest K. Ryu, Wotao Yin. Referenced 4 times.
Variance Reduced Stochastic Gradient Descent with Neighbors (2015). Thomas Hofmann, Aurélien Lucchi, Simon Lacoste-Julien, Brian McWilliams. Referenced 4 times.
Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching (2018). Robert M. Gower, Peter Richtárik, Francis Bach. Referenced 4 times.
Federated Learning: Strategies for Improving Communication Efficiency (2016). Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon. Referenced 4 times.
Cubic regularization of Newton method and its global performance (2006). Yurii Nesterov, B. T. Polyak. Referenced 4 times.
SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives (2014). Aaron Defazio, Francis Bach, Simon Lacoste-Julien. Referenced 4 times.
Adding vs. Averaging in Distributed Primal-Dual Optimization (2015). Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč. Referenced 4 times.
Optimization with Sparsity-Inducing Penalties (2011). Francis Bach, Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski. Referenced 3 times.
Parallel Gradient Distribution in Unconstrained Optimization (1995). O. L. Mangasarian. Referenced 3 times.
Gradient methods for minimizing composite functions (2012). Yurii Nesterov. Referenced 3 times.
A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications (2016). Heinz H. Bauschke, Jérôme Bolte, Marc Teboulle. Referenced 3 times.
A Stochastic Decoupling Method for Minimizing the Sum of Smooth and Non-Smooth Functions (2019). Konstantin Mishchenko, Peter Richtárik. Referenced 3 times.
A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints (2018). Konstantin Mishchenko, Peter Richtárik. Referenced 3 times.
Lectures on Convex Optimization (2018). Yurii Nesterov. Referenced 3 times.
On the Convergence Properties of a K-step Averaging Stochastic Gradient Descent Algorithm for Nonconvex Optimization (2018). Fan Zhou, Guojing Cong. Referenced 3 times.
Introductory Lectures on Convex Optimization: A Basic Course (2014). Yurii Nesterov. Referenced 3 times.
Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems (2014). Aaron Defazio, Tibério S. Caetano, Justin Domke. Referenced 3 times.
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets (2012). Nicolas Le Roux, Mark Schmidt, Francis Bach. Referenced 3 times.
Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate (2018). Aryan Mokhtari, Mert Gürbüzbalaban, Alejandro Ribeiro. Referenced 3 times.
Solving variational inequalities with stochastic mirror-prox algorithm (2011). Anatoli Juditsky, Arkadi Nemirovski, Claire Tauvel. Referenced 3 times.
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding (2016). Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović. Referenced 3 times.
Total Generalized Variation (2010). Kristian Bredies, Karl Kunisch, Thomas Pock. Referenced 3 times.
Reducing Noise in GAN Training with Variance Reduced Extragradient (2019). Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien. Referenced 3 times.
signSGD: Compressed Optimisation for Non-Convex Problems (2018). Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Animashree Anandkumar. Referenced 3 times.
Distributed Learning with Compressed Gradient Differences (2019). Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik. Referenced 3 times.
Distributed optimization with arbitrary local solvers (2017). Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, Martin Takáč. Referenced 3 times.
Stochastic Distributed Learning with Gradient Quantization and Variance Reduction (2019). Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian U. Stich, Peter Richtárik. Referenced 3 times.
Unified Optimal Analysis of the (Stochastic) Gradient Method (2019). Sebastian U. Stich. Referenced 3 times.
Adaptive Federated Learning in Resource Constrained Edge Computing Systems (2018). Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan. Referenced 2 times.
Implementable tensor methods in unconstrained convex optimization (2019). Yurii Nesterov. Referenced 2 times.
Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms (2018). Jianyu Wang, Gauri Joshi. Referenced 2 times.
IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate (2018). Aryan Mokhtari, Mark Eisen, Alejandro Ribeiro. Referenced 2 times.
Distributed learning with compressed gradients (2018). Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson. Referenced 2 times.
Sparsified SGD with Memory (2018). Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi. Referenced 2 times.
Stochastic Three-Composite Convex Minimization with a Linear Operator (2018). Renbo Zhao, Volkan Cevher. Referenced 2 times.
SVRG meets SAGA: k-SVRG - A Tale of Limited Memory (2018). Anant Raj, Sebastian U. Stich. Referenced 2 times.
Distributed coordinate descent method for learning with big data (2016). Peter Richtárik, Martin Takáč. Referenced 2 times.
Coordinate descent algorithms (2015). Stephen J. Wright. Referenced 2 times.
First-Order Methods in Optimization (2017). Amir Beck. Referenced 2 times.
Randomized projection methods for convex feasibility problems: conditioning and convergence rates (2018). Ion Necoara, Peter Richtárik, Andrei Pătraşcu. Referenced 2 times.
Local SGD Converges Fast and Communicates Little (2018). Sebastian U. Stich. Referenced 2 times.
A Primal–Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms (2012). Laurent Condat. Referenced 2 times.
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure (2016). Alberto Bietti, Julien Mairal. Referenced 2 times.
Relatively Smooth Convex Optimization by First-Order Methods, and Applications (2018). Haihao Lu, Robert M. Freund, Yurii Nesterov. Referenced 2 times.
A method for the solution of certain non-linear problems in least squares (1944). Kenneth Levenberg. Referenced 2 times.
On the ergodic convergence rates of a first-order primal–dual algorithm (2015). Antonin Chambolle, Thomas Pock. Referenced 2 times.
AIDE: Fast and Communication Efficient Distributed Optimization (2016). Sashank J. Reddi, Jakub Konečný, Peter Richtárik, Barnabás Póczos, Alexander J. Smola. Referenced 2 times.
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm (2015). Deanna Needell, Nathan Srebro, Rachel Ward. Referenced 2 times.
Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results (2009). Coralia Cartis, Nicholas I. M. Gould, Philippe L. Toint. Referenced 2 times.