Sharper exponential convergence rates for Sinkhorn’s algorithm in continuous settings

Abstract

We study the convergence rate of Sinkhorn's algorithm for solving entropy-regularized optimal transport problems when at least one of the probability measures, $\mu$, admits a density over $\mathbb{R}^d$. For a semi-concave cost function bounded by $c_{\infty}$ and a regularization parameter $\lambda > 0$, we obtain exponential convergence guarantees on the dual sub-optimality gap, with contraction rates that are polynomial in $\lambda/c_{\infty}$. This represents an exponential improvement over the known contraction rate $1 - \Theta(\exp(-c_{\infty}/\lambda))$ achievable via Hilbert's projective metric. Specifically, we prove a contraction rate of $1-\Theta(\lambda^2/c_\infty^2)$ when $\mu$ has a bounded log-density. In some cases, such as when $\mu$ is log-concave and the cost function is $c(x,y)=-\langle x, y\rangle$, this rate improves to $1-\Theta(\lambda/c_\infty)$. The latter rate matches the one we derive for the transport between isotropic Gaussian measures, indicating tightness of the dependency on $\lambda/c_\infty$. Our results are fully non-asymptotic and explicit in all the parameters of the problem.

Locations

  • Mathematical Programming
  • arXiv (Cornell University)


Summary

This paper presents a significant advancement in understanding the convergence rates of Sinkhorn’s algorithm, a widely used method for solving entropy-regularized optimal transport problems, particularly in continuous settings where at least one of the probability measures possesses a density.

Optimal Transport (OT) is a powerful mathematical framework with applications across machine learning, statistics, and physics. Sinkhorn’s algorithm provides an efficient way to compute approximations to OT solutions by introducing an entropy regularization term. A long-standing challenge in the field has been to precisely characterize the algorithm’s convergence speed. Previous work typically fell into two categories:
1. Exponential convergence rates: These guaranteed rapid convergence, but the contraction rate carried a severe dependency on problem parameters: for rates of the form \(1 - \Theta(\exp(-c_\infty/\lambda))\), where \(c_\infty\) is the cost function's oscillation and \(\lambda\) the regularization parameter, the guarantee degrades exponentially as \(c_\infty/\lambda\) grows. This made the theoretical guarantees loose in many practical scenarios.
2. Polynomial convergence rates: These offered better, polynomial dependence on \(c_\infty/\lambda\) but exhibited a slower, polynomial decay in the number of iterations.
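For readers unfamiliar with the iteration itself, the discrete matrix-scaling form of Sinkhorn's algorithm is only a few lines. The sketch below is purely illustrative; the toy cost, marginals, and iteration count are our own choices, not taken from the paper:

```python
import numpy as np

def sinkhorn(mu, nu, C, lam, n_iter=1000):
    """Vanilla Sinkhorn iteration for discrete entropic OT.

    mu, nu : marginal probability vectors
    C      : cost matrix, C[i, j] = c(x_i, y_j)
    lam    : entropic regularization parameter (lambda)
    Returns the approximate optimal coupling P.
    """
    K = np.exp(-C / lam)                 # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)               # rescale to match the second marginal
        u = mu / (K @ v)                 # rescale to match the first marginal
    return u[:, None] * K * v[None, :]

# Toy problem: two 3-point distributions on a line, squared Euclidean cost.
x = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(mu, nu, C, lam=1.0)
# P is (approximately) a coupling: its row sums are mu, its column sums nu.
```

The alternating rescalings are exactly the alternating dual updates whose contraction speed the paper quantifies.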

The key innovation of this paper is to bridge this gap, achieving a “best-of-both-worlds” scenario: exponential convergence with robust, polynomial dependence on \(c_\infty/\lambda\), under specific geometric assumptions on the problem’s inputs. This represents a substantial improvement over prior results.

The main innovations are:

  1. Sharper Exponential Rates: The paper provides exponential convergence guarantees for the dual sub-optimality gap with contraction rates of \(1 - \Theta(\lambda^2/c_\infty^2)\) and, in some cases, \(1 - \Theta(\lambda/c_\infty)\). These rates are polynomial in \(\lambda/c_\infty\), a significant improvement over the \(1 - \Theta(\exp(-c_\infty/\lambda))\) rate previously known for exponential convergence. The \(1 - \Theta(\lambda/c_\infty)\) rate is particularly sharp: it matches the rate the authors derive for transport between isotropic Gaussian measures, suggesting its tightness.
  2. Leveraging Geometric Structure: The improved rates are obtained by exploiting additional geometric assumptions on the problem’s components. Specifically, the paper considers cases where the cost function is semi-concave (a common property for many relevant costs like squared Euclidean distance) and one of the marginal probability measures (source or target) has either a bounded log-density or a log-concave density. These assumptions allow for a deeper analysis of the problem’s underlying structure.
  3. Novel Analytical Tools: The enhanced convergence analysis relies on a combination of sophisticated mathematical tools:
    • Bernstein’s Inequality: This concentration inequality is employed to derive a sharper one-step improvement bound for Sinkhorn’s algorithm. Unlike prior analyses that used less refined inequalities like Hoeffding’s, Bernstein’s inequality allows replacing a crude bound dependent on the cost oscillation \(c_\infty\) with a term related to the variance of relevant quantities, leading to tighter estimates.
    • Prékopa-Leindler Inequality: This fundamental inequality from convex geometry and probability theory is crucial for establishing strong-concavity estimates of the semi-dual functional. This strong concavity, in turn, allows linking the variance terms (introduced via Bernstein’s inequality) back to the overall sub-optimality gap, enabling the proof of exponential convergence.
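To see why a polynomial versus exponential dependence on \(c_\infty/\lambda\) matters, it helps to convert contraction rates into iteration counts: an error contracting by a factor \(r\) per step needs roughly \(\log(1/\varepsilon)/\log(1/r)\) iterations to reach tolerance \(\varepsilon\). The back-of-the-envelope comparison below, with all hidden constants set to 1 (our simplification, not the paper's), makes the gap concrete:

```python
import math

def iters_to_eps(rate, eps=1e-6):
    """Smallest t with rate**t <= eps, i.e. ceil(log(1/eps) / log(1/rate))."""
    return math.ceil(math.log(1 / eps) / math.log(1 / rate))

ratio = 20.0                              # c_infty / lambda, moderately large
r_hilbert = 1 - math.exp(-ratio)          # classical rate: 1 - exp(-c_infty/lambda)
r_sharper = 1 - (1 / ratio) ** 2          # new rate: 1 - (lambda/c_infty)^2

print(iters_to_eps(r_hilbert))            # on the order of billions of iterations
print(iters_to_eps(r_sharper))            # on the order of thousands
```

Even at the modest ratio \(c_\infty/\lambda = 20\), the classical worst-case bound is vacuous in practice, while the polynomial rate yields a usable iteration count.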

The main prior ingredients needed for this work are:

  • Entropy-Regularized Optimal Transport: The core problem formulation, including its primal and dual forms, is foundational.
  • Sinkhorn’s Algorithm: The iterative scheme itself, involving alternating “c-transforms” (or their entropic counterparts), is the central object of study.
  • Dual Formulation of OT: The paper analyzes the convergence of the dual sub-optimality gap, a standard approach in optimization.
  • Hilbert’s Projective Metric: Earlier analyses of Sinkhorn’s algorithm utilized this metric to establish contraction properties and exponential convergence, providing a baseline against which the new “sharper” rates are compared.
  • Concepts from Convex Analysis: Notions like semi-concavity/convexity, strong convexity, and properties of convex conjugates (c-transforms) are essential building blocks.
  • Probability Theory and Concentration Inequalities: Beyond the Bernstein and Prékopa-Leindler inequalities themselves, a general familiarity with measure theory and properties of random variables is used implicitly throughout the derivations.
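The entropic counterpart of the c-transform mentioned above replaces the hard minimum \(\min_j (C_{ij} - g_j)\) with a soft minimum at temperature \(\lambda\); each Sinkhorn half-step applies one such transform to a dual potential. A minimal log-domain sketch (our own illustrative discretization, with a log-sum-exp stabilization):

```python
import numpy as np

def soft_c_transform(g, C, nu, lam):
    """Entropic c-transform of the dual potential g against the marginal nu:

        f_i = -lam * log( sum_j nu_j * exp((g_j - C[i, j]) / lam) ),

    a numerically stable soft minimum of C[i, j] - g_j over j.
    As lam -> 0 it recovers the classical c-transform min_j (C[i, j] - g_j).
    """
    M = (g[None, :] - C) / lam + np.log(nu)[None, :]
    m = M.max(axis=1)                              # log-sum-exp stabilization
    return -lam * (m + np.log(np.exp(M - m[:, None]).sum(axis=1)))

# One Sinkhorn step = two alternating soft c-transforms of the dual potentials.
C = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])
nu = np.array([0.2, 0.3, 0.5])
g = np.zeros(3)
f = soft_c_transform(g, C, nu, lam=1e-3)
# With tiny lam, f is close to the hard c-transform: the row-wise min of C - g.
```

Working in the log domain like this is standard practice for small \(\lambda\), since the Gibbs kernel \(\exp(-C/\lambda)\) underflows long before the regime the sharper rates address.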

In essence, this paper significantly pushes the theoretical understanding of Sinkhorn’s algorithm by demonstrating that, under reasonable geometric conditions on the data, one can achieve highly desirable exponential convergence rates that are far more robust and practically relevant than previously understood.

We study the convergence rate of Sinkhorn's algorithm for solving entropy-regularized optimal transport problems when at least one of the probability measures, $\mu$, admits a density over $\mathbb{R}^d$. For a … We study the convergence rate of Sinkhorn's algorithm for solving entropy-regularized optimal transport problems when at least one of the probability measures, $\mu$, admits a density over $\mathbb{R}^d$. For a semi-concave cost function bounded by $c_{\infty}$ and a regularization parameter $\lambda > 0$, we obtain exponential convergence guarantees on the dual sub-optimality gap with contraction rate polynomial in $\lambda/c_{\infty}$. This represents an exponential improvement over the known contraction rate $1 - \Theta(\exp(-c_{\infty}/\lambda))$ achievable via Hilbert's projective metric. Specifically, we prove a contraction rate value of $1-\Theta(\lambda^2/c_\infty^2)$ when $\mu$ has a bounded log-density. In some cases, such as when $\mu$ is log-concave and the cost function is $c(x,y)=-\langle x, y \rangle$, this rate improves to $1-\Theta(\lambda/c_\infty)$. The latter rate matches the one that we derive for the transport between isotropic Gaussian measures, indicating tightness in the dependency in $\lambda/c_\infty$. Our results are fully non-asymptotic and explicit in all the parameters of the problem.
We study Sinkhorn’s algorithm for solving the entropically regularized optimal transport problem. Its iterate [Formula: see text] is shown to satisfy [Formula: see text], where H denotes relative entropy and … We study Sinkhorn’s algorithm for solving the entropically regularized optimal transport problem. Its iterate [Formula: see text] is shown to satisfy [Formula: see text], where H denotes relative entropy and [Formula: see text] denotes the optimal coupling. This holds for a large class of cost functions and marginals, including quadratic cost with sub-Gaussian marginals. We also obtain the rate [Formula: see text] for the dual suboptimality and [Formula: see text] for the marginal entropies. More precisely, we derive nonasymptotic bounds, and in contrast to previous results on linear convergence that are limited to bounded costs, our estimates do not deteriorate exponentially with the regularization parameter. We also obtain a stability result for [Formula: see text] as a function of the marginals quantified in relative entropy. Funding: P. Ghosal was supported by the National Science Foundation [Grant DMS-2153661]. M. Nutz was supported by the National Science Foundation [Grants DMS-1812661, DMS-2106056, and DMS-2407074].
We study Sinkhorn's algorithm for solving the entropically regularized optimal transport problem. Its iterate $\pi_{t}$ is shown to satisfy $H(\pi_{t}|\pi_{*})+H(\pi_{*}|\pi_{t})=O(t^{-1})$ where $H$ denotes relative entropy and $\pi_{*}$ the optimal coupling. … We study Sinkhorn's algorithm for solving the entropically regularized optimal transport problem. Its iterate $\pi_{t}$ is shown to satisfy $H(\pi_{t}|\pi_{*})+H(\pi_{*}|\pi_{t})=O(t^{-1})$ where $H$ denotes relative entropy and $\pi_{*}$ the optimal coupling. This holds for a large class of cost functions and marginals, including quadratic cost with subgaussian marginals. We also obtain the rate $O(t^{-1})$ for the dual suboptimality and $O(t^{-2})$ for the marginal entropies. More precisely, we derive non-asymptotic bounds, and in contrast to previous results on linear convergence that are limited to bounded costs, our estimates do not deteriorate exponentially with the regularization parameter. We also obtain a stability result for $\pi_{*}$ as a function of the marginals, quantified in relative entropy.
The Sinkhorn algorithm is a widely used method for solving the optimal transport problem, and the Greenkhorn algorithm is one of its variants. While there are modified versions of these … The Sinkhorn algorithm is a widely used method for solving the optimal transport problem, and the Greenkhorn algorithm is one of its variants. While there are modified versions of these two algorithms whose computational complexities are $O({n^2\|C\|_\infty^2\log n}/{\varepsilon^2})$ to achieve an $\varepsilon$-accuracy, to the best of our knowledge, the existing complexities for the vanilla versions are $O({n^2\|C\|_\infty^3\log n}/{\varepsilon^3})$. In this paper we fill this gap and show that the complexities of the vanilla Sinkhorn and Greenkhorn algorithms are indeed $O({n^2\|C\|_\infty^2\log n}/{\varepsilon^2})$. The analysis relies on the equicontinuity of the dual variables for the discrete entropic regularized optimal transport problem, which is of independent interest.
We present new complexity results for several algorithms that approximately solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we show … We present new complexity results for several algorithms that approximately solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we show that a greedy variant of the classical Sinkhorn algorithm, known as the \textit{Greenkhorn} algorithm, achieves the complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, which improves the best known bound $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound of the Sinkhorn algorithm and explains the superior performance of the Greenkhorn algorithm in practice. Furthermore, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm with mirror mapping $\phi$ and show that the resulting \textit{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithm achieves the complexity bound of $\widetilde{\mathcal{O}}(n^2\sqrt{\delta}\varepsilon^{-1})$ where $\delta>0$ depends on $\phi$. We point out that an existing complexity bound for the APDAGD algorithm is not valid in general using a simple counterexample and then establish the complexity bound of $\widetilde{\mathcal{O}}(n^{5/2}\varepsilon^{-1})$ by exploiting the connection between the APDAMD and APDAGD algorithms. Moreover, we introduce accelerated Sinkhorn and Greenkhorn algorithms that achieve the complexity bound of $\widetilde{\mathcal{O}}(n^{7/3}\varepsilon^{-1})$, which improves on the complexity bounds $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$ of Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of new algorithms in practice.
The aim of this note is to give an elementary proof of linear convergence of the Sinkhorn algorithm for the entropic regularization of multimarginal optimal transport in the setting of … The aim of this note is to give an elementary proof of linear convergence of the Sinkhorn algorithm for the entropic regularization of multimarginal optimal transport in the setting of general probability spaces. The proof simply relies on (i) the fact that Sinkhorn iterates are bounded, (ii) the strong convexity of the exponential on bounded intervals, and (iii) the convergence analysis of the coordinate descent (Gauss--Seidel) method of Beck and Tetruashvili [SIAM J. Optim, 23 (2013), pp. 2037--2060].
We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based … We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn's algorithm, we prove the complexity bound $\widetilde{O}\left({n^2/\varepsilon^2}\right)$ arithmetic operations. For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{n^{9/4}/\varepsilon, n^{2}/\varepsilon^2 \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left({n^2/\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.
We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based … We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn's algorithm, we prove the complexity bound $\widetilde{O}\left({n^2/\varepsilon^2}\right)$ arithmetic operations. For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{n^{9/4}/\varepsilon, n^{2}/\varepsilon^2 \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left({n^2/\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.
We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based … We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn's algorithm, we prove the complexity bound $\widetilde{O}\left({n^2/\varepsilon^2}\right)$ arithmetic operations. For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{n^{9/4}/\varepsilon, n^{2}/\varepsilon^2 \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left({n^2/\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.
In 2013, Cuturi [Cut13] introduced the Sinkhorn algorithm for matrix scaling as a method to compute solutions to regularized optimal transport problems. In this paper, aiming at a better convergence … In 2013, Cuturi [Cut13] introduced the Sinkhorn algorithm for matrix scaling as a method to compute solutions to regularized optimal transport problems. In this paper, aiming at a better convergence rate for a high accuracy solution, we work on understanding the Sinkhorn algorithm under regularization scheduling, and thus modify it with a mechanism that adaptively doubles the regularization parameter $\eta$ periodically. We prove that such modified version of Sinkhorn has an exponential convergence rate as iteration complexity depending on $\log(1/\varepsilon)$ instead of $\varepsilon^{-O(1)}$ from previous analyses [Cut13][ANWR17] in the optimal transport problems with integral supply and demand. Furthermore, with cost and capacity scaling procedures, the general optimal transport problem can be solved with a logarithmic dependence on $1/\varepsilon$ as well.
We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most … We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most $n$ components. We show that the complexity of the Sinkhorn algorithm for finding an $\varepsilon$-approximate solution to the UOT problem is of order $\widetilde{\mathcal{O}}(n^2/ \varepsilon)$, which is near-linear time. To the best of our knowledge, this complexity is better than the complexity of the Sinkhorn algorithm for solving the Optimal Transport (OT) problem, which is of order $\widetilde{\mathcal{O}}(n^2/\varepsilon^2)$. Our proof technique is based on the geometric convergence of the Sinkhorn updates to the optimal dual solution of the entropic regularized UOT problem and some properties of the primal solution. It is also different from the proof for the complexity of the Sinkhorn algorithm for approximating the OT problem since the UOT solution does not have to meet the marginal constraints.
We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most … We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most $n$ components. We show that the complexity of the Sinkhorn algorithm for finding an $\varepsilon$-approximate solution to the UOT problem is of order $\widetilde{\mathcal{O}}(n^2/ \varepsilon)$, which is near-linear time. To the best of our knowledge, this complexity is better than the complexity of the Sinkhorn algorithm for solving the Optimal Transport (OT) problem, which is of order $\widetilde{\mathcal{O}}(n^2/\varepsilon^2)$. Our proof technique is based on the geometric convergence of the Sinkhorn updates to the optimal dual solution of the entropic regularized UOT problem and some properties of the primal solution. It is also different from the proof for the complexity of the Sinkhorn algorithm for approximating the OT problem since the UOT solution does not have to meet the marginal constraints.
Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this … Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation and introduce a Sinkhorn-type algorithm for such constrained OT problems supported by theoretical guarantees. We first bound the approximation error when solving the problem through entropic regularization, which reduces exponentially with the increase of the regularization parameter. Furthermore, we prove a sublinear first-order convergence rate of the proposed Sinkhorn-type algorithm in the dual space by characterizing the optimization procedure with a Lyapunov function. To achieve fast and higher-order convergence under weak entropy regularization, we augment the Sinkhorn-type algorithm with dynamic regularization scheduling and second-order acceleration. Overall, this work systematically combines recent theoretical and numerical advances in entropic optimal transport with the constrained case, allowing practitioners to derive approximate transport plans in complex scenarios.
We provide theoretical analyses for two algorithms that solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. We show that a greedy … We provide theoretical analyses for two algorithms that solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. We show that a greedy variant of the classical Sinkhorn algorithm, known as the \emph{Greenkhorn algorithm}, can be improved to $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, improving on the best known complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound for the Sinkhorn algorithm and helps explain why the Greenkhorn algorithm can outperform the Sinkhorn algorithm in practice. Our proof technique, which is based on a primal-dual formulation and a novel upper bound for the dual solution, also leads to a new class of algorithms that we refer to as \emph{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithms. We prove that the complexity of these algorithms is $\widetilde{\mathcal{O}}(n^2\sqrt{\delta}\varepsilon^{-1})$, where $\delta > 0$ refers to the inverse of the strong convexity module of Bregman divergence with respect to $\|\cdot\|_\infty$. This implies that the APDAMD algorithm is faster than the Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of the Greenkhorn and APDAMD algorithms in practice.
We provide theoretical analyses for two algorithms that solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. We show that a greedy … We provide theoretical analyses for two algorithms that solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. We show that a greedy variant of the classical Sinkhorn algorithm, known as the \emph{Greenkhorn algorithm}, can be improved to $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, improving on the best known complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound for the Sinkhorn algorithm and helps explain why the Greenkhorn algorithm can outperform the Sinkhorn algorithm in practice. Our proof technique, which is based on a primal-dual formulation and a novel upper bound for the dual solution, also leads to a new class of algorithms that we refer to as \emph{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithms. We prove that the complexity of these algorithms is $\widetilde{\mathcal{O}}(n^2\sqrtδ\varepsilon^{-1})$, where $δ&gt; 0$ refers to the inverse of the strong convexity module of Bregman divergence with respect to $\|\cdot\|_\infty$. This implies that the APDAMD algorithm is faster than the Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of the Greenkhorn and APDAMD algorithms in practice.
We study the convergence of the transport plans $\gamma_\epsilon$ towards $\gamma_0$ as well as the cost of the entropy-regularized optimal transport $(c,\gamma_\epsilon)$ towards $(c,\gamma_0)$ as the regularization parameter $\epsilon$ vanishes … We study the convergence of the transport plans $\gamma_\epsilon$ towards $\gamma_0$ as well as the cost of the entropy-regularized optimal transport $(c,\gamma_\epsilon)$ towards $(c,\gamma_0)$ as the regularization parameter $\epsilon$ vanishes in the setting of finite entropy marginals. We show that under the assumption of infinitesimally twisted cost and compactly supported marginals the distance $W_2(\gamma_\epsilon,\gamma_0)$ is asymptotically greater than $C\sqrt{\epsilon}$ and the suboptimality $(c,\gamma_\epsilon)-(c,\gamma_0)$ is of order $\epsilon$. In the quadratic cost case the compactness assumption is relaxed into a moment of order $2+\delta$ assumption. Moreover, in the case of a Lipschitz transport map for the non-regularized problem, the distance $W_2(\gamma_\epsilon,\gamma_0)$ converges to $0$ at rate $\sqrt{\epsilon}$. Finally, if in addition the marginals have finite Fisher information, we prove $(c,\gamma_\epsilon)-(c,\gamma_0) \sim d\epsilon/2$ and we provide a companion expansion of $H(\gamma_\epsilon)$. These results are achieved by disentangling the role of the cost and the entropy in the regularized problem.
We show non-asymptotic exponential convergence of Sinkhorn iterates to the Schr\"odinger potentials, solutions of the quadratic Entropic Optimal Transport problem on $\mathbb{R}^d$. Our results hold under mild assumptions on the … We show non-asymptotic exponential convergence of Sinkhorn iterates to the Schr\"odinger potentials, solutions of the quadratic Entropic Optimal Transport problem on $\mathbb{R}^d$. Our results hold under mild assumptions on the marginal inputs: in particular, we only assume that they admit an asymptotically positive log-concavity profile, covering as special cases log-concave distributions and bounded smooth perturbations of quadratic potentials. More precisely, we provide exponential $\mathrm{L}^1$ and pointwise convergence of the iterates (resp. their gradient and Hessian) to Schr\"odinger potentials (resp. their gradient and Hessian) for large enough values of the regularization parameter. As a corollary, we establish exponential convergence of Sinkhorn plans with respect to a symmetric relative entropy and in Wasserstein distance of order one. Up to the authors' knowledge, these are the first results which establish exponential convergence of Sinkhorn's algorithm in a general setting without assuming bounded cost functions or compactly supported marginals.
We present several new complexity results for entropy-regularized algorithms that approximately solve the optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we improve the complexity bound of a greedy variant of Sinkhorn, known as \textit{Greenkhorn}, from $\widetilde{O}(n^2\varepsilon^{-3})$ to $\widetilde{O}(n^2\varepsilon^{-2})$. Notably, our result matches the best known complexity bound of Sinkhorn and helps clarify why Greenkhorn significantly outperforms Sinkhorn in practice in terms of row/column updates, as observed by~\citet{Altschuler-2017-Near}. Second, we propose a new algorithm, which we refer to as \textit{APDAMD} and which generalizes an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm~\citep{Dvurechensky-2018-Computational} with a prespecified mirror mapping $\phi$. We prove that APDAMD achieves the complexity bound of $\widetilde{O}(n^2\sqrt{\delta}\varepsilon^{-1})$, in which $\delta>0$ stands for the regularity of $\phi$. In addition, we show by a counterexample that the complexity bound of $\widetilde{O}(\min\{n^{9/4}\varepsilon^{-1}, n^2\varepsilon^{-2}\})$ previously proved for APDAGD is invalid, and give a refined complexity bound of $\widetilde{O}(n^{5/2}\varepsilon^{-1})$. Further, we develop a \textit{deterministic} accelerated variant of Sinkhorn by appealing to the estimate sequence technique and prove a complexity bound of $\widetilde{O}(n^{7/3}\varepsilon^{-4/3})$. As such, the accelerated variant of Sinkhorn outperforms Sinkhorn and Greenkhorn in terms of $1/\varepsilon$, and outperforms APDAGD and accelerated alternating minimization (AAM)~\citep{Guminov-2021-Combination} in terms of $n$. Finally, we conduct experiments on synthetic and real data, and the numerical results show the efficiency of Greenkhorn, APDAMD and accelerated Sinkhorn in practice.
We introduce a new class of convex-regularized Optimal Transport losses, which generalizes the classical entropy regularization of Optimal Transport and Sinkhorn divergences, and propose a generalized Sinkhorn algorithm. Our framework unifies many regularizations and numerical methods that have previously appeared in the literature. We show the existence of a maximizer for the dual problem and derive complementary slackness conditions, providing a complete characterization of solutions for this class of variational problems. As a consequence, we study structural properties of these losses, including continuity and differentiability, and provide explicit formulas for their gradient. Finally, we provide theoretical guarantees of convergence and stability for the generalized Sinkhorn algorithm, even in the continuous setting. The techniques developed here are also directly applicable to the study of Wasserstein barycenters or, more generally, multi-marginal problems.
We consider the problem of steering a linear dynamical system with complete state observation from an initial Gaussian distribution in state-space to a final one with minimum energy control. The system is stochastically driven through the control channels; an example for such a system is that of an inertial particle experiencing random "white noise" forcing. We show that a target probability distribution can always be achieved in finite time. The optimal control is given in state-feedback form and is computed explicitly by solving a pair of differential Lyapunov equations that are nonlinearly coupled through their boundary values. This result, given its attractive algorithmic nature, appears to have several potential applications such as to quality control, control of industrial processes, as well as to active control of nanomechanical systems and molecular cooling. The problem to steer a diffusion process between end-point marginals has a long history (Schrödinger bridges) and the present case of steering a linear stochastic system constitutes such a Schrödinger bridge for possibly degenerate diffusions. Our results provide the first implementable form of the optimal control for a general Gauss-Markov process. Illustrative examples are provided for steering inertial particles and for "cooling" a stochastic oscillator. A final result establishes directly the property of Schrödinger bridges as the most likely random evolution between given marginals to the present context of linear stochastic systems. A second part to this work, that is to appear as part II, addresses the general situation where the stochastic excitation enters through channels that may differ from those used to control.
I. Introduction. 1. If in any case of statistical observation we classify the objects or individuals observed into two classes only, e.g., peas into yellow-seeded and green-seeded, or the members of any group of mankind into male and female, the resulting data are of the simplest possible statistical form. If, for each object or individual, we note two characters instead of one, dividing again into two classes only, the data become slightly more complex. We have four classes resulting from the two successive divisions: the class all the members of which possess both characters, the class all the members of which possess the first character but not the second, the class all the members of which possess the second character but not the first, and finally the class all the members of which possess neither of the two characters noted. The data resulting from any such count may, if space is no great consideration, be conveniently represented in the form of a small table such as the following (Macdonell, 10, Table II), which shows the recoveries and deaths amongst vaccinated and unvaccinated patients during the small-pox epidemic at Sheffield in 1887-88. There were 4,703 cases, of which 4,151 were vaccinated. TABLE I. Sheffield small-pox epidemic, 1887-88: cited from Macdonell (10).
Recent years have witnessed an explosion in the volume and variety of data collected in all scientific disciplines and industrial settings. Such massive data sets present a number of challenges to researchers in statistics and machine learning. This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level. It includes chapters that are focused on core methodology and theory - including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices - as well as chapters devoted to in-depth exploration of particular model classes - including sparse linear models, matrix models with rank constraints, graphical models, and various types of non-parametric models. With hundreds of worked examples and exercises, this text is intended both for courses and for self-study by graduate students and researchers in statistics, machine learning, and related fields who must understand, apply, and adapt modern statistical methods suited to large-scale data.
Monge--Kantorovich optimal mass transport (OMT) provides a blueprint for geometries in the space of positive densities---it quantifies the cost of transporting a mass distribution into another. In particular, it provides natural options for interpolation of distributions (displacement interpolation) and for modeling flows. As such it has been the cornerstone of recent developments in physics, probability theory, image processing, time-series analysis, and several other fields. In spite of extensive work and theoretical developments, the computation of OMT for large-scale problems has remained a challenging task. An alternative framework for interpolating distributions, rooted in statistical mechanics and large deviations, is that of the Schrödinger bridge problem (SBP), which leads to entropic interpolation. SBP may be seen as a stochastic regularization of OMT, and can be cast as the stochastic control problem of steering the probability density of the state-vector of a dynamical system between two marginals. The actual computation of entropic flows, however, has received hardly any attention. In our recent work on Schrödinger bridges for Markov chains and quantum channels, we showed that the solution can be efficiently obtained from the fixed point of a map which is contractive in the Hilbert metric. Thus, the purpose of this paper is to show that a similar approach can be taken in the context of diffusion processes which (i) leads to a new proof of a classical result on SBP and (ii) provides an efficient computational scheme for both SBP and OMT. We illustrate this new computational approach by obtaining interpolation of densities in representative examples such as interpolation of images.
In this work, we study the entropic regularization of the strictly correlated electrons formalism, discussing the implications for density functional theory and establishing a link with earlier works on quantum kinetic energy and classical entropy. We carry out a very preliminary investigation (using simplified models) on the use of the solution of the entropic regularized problem to build approximations for the kinetic correlation functional at large coupling strengths. We also analyze lower and upper bounds to the Hohenberg–Kohn functional using the entropic regularized strictly correlated electrons problem.
Abstract We show that the discrete Sinkhorn algorithm—as applied in the setting of Optimal Transport on a compact manifold—converges to the solution of a fully non-linear parabolic PDE of Monge–Ampère type, in a large-scale limit. The latter evolution equation has previously appeared in different contexts (e.g. on the torus it can be identified with the Ricci flow). This leads to algorithmic approximations of the potential of the Optimal Transport map, as well as the Optimal Transport distance, with explicit bounds on the arithmetic complexity of the construction and the approximation errors. As applications we obtain explicit schemes of nearly linear complexity, at each iteration, for optimal transport on the torus and the two-sphere, as well as the far-field antenna problem. Connections to Quasi-Monte Carlo methods are exploited.
Abstract This paper exploits the equivalence between the Schrödinger Bridge problem (Léonard in J Funct Anal 262:1879–1920, 2012; Nelson in Phys Rev 150:1079, 1966; Schrödinger in Über die umkehrung der naturgesetze. Verlag Akademie der wissenschaften in kommission bei Walter de Gruyter u, Company, 1931) and the entropy penalized optimal transport (Cuturi in: Advances in neural information processing systems, pp 2292–2300, 2013; Galichon and Salanié in: Matching with trade-offs: revealed preferences over competing characteristics. CEPR discussion paper no. DP7858, 2010) in order to find a different approach to the duality, in the spirit of optimal transport. This approach results in a priori estimates which are consistent in the limit when the regularization parameter goes to zero. In particular, we find a new proof of the existence of maximizing entropic-potentials and therefore, the existence of a solution of the Schrödinger system. Our method also extends to the case of more than two marginals: the main new result is the proof that the Sinkhorn algorithm converges even in the continuous multi-marginal case. This also provides an alternative proof of the convergence of the Sinkhorn algorithm in the two-marginal case.
We study the quantitative stability of the quadratic optimal transport map between a fixed probability density $\rho$ and a probability measure $\mu$ on $\mathbb{R}^d$, which we denote $T_\mu$. Assuming that the source density $\rho$ is bounded from above and below on a compact convex set, we prove that the map $\mu \mapsto T_\mu$ is bi-Hölder continuous on large families of probability measures, such as the set of probability measures whose moment of order $p>d$ is bounded by some constant. These stability estimates show that the linearized optimal transport metric $W_{2,\rho}(\mu,\nu)=\|T_\mu-T_\nu\|_{L^2(\rho,\mathbb{R}^d)}$ is bi-Hölder equivalent to the 2-Wasserstein distance on such sets, justifying its use in applications.
The aim of this note is to give an elementary proof of linear convergence of the Sinkhorn algorithm for the entropic regularization of multimarginal optimal transport in the setting of general probability spaces. The proof simply relies on (i) the fact that Sinkhorn iterates are bounded, (ii) the strong convexity of the exponential on bounded intervals, and (iii) the convergence analysis of the coordinate descent (Gauss--Seidel) method of Beck and Tetruashvili [SIAM J. Optim, 23 (2013), pp. 2037--2060].
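In the discrete two-marginal case, the Sinkhorn iteration analyzed in the works above reduces to alternating scaling updates that match the row and column marginals of a Gibbs kernel. The following minimal sketch is our own illustration, not code from any cited paper; the function name `sinkhorn`, the iteration count, and the toy data are assumptions made for the example.

```python
import numpy as np

def sinkhorn(mu, nu, C, lam, n_iter=200):
    """Illustrative discrete Sinkhorn iteration (sketch, not a tuned solver).

    mu, nu : marginal probability vectors; C : cost matrix; lam : regularization.
    Returns the approximate entropic-OT coupling after n_iter sweeps.
    """
    K = np.exp(-C / lam)            # Gibbs kernel exp(-c/lambda)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)          # scale columns toward the marginal nu
        u = mu / (K @ v)            # scale rows toward the marginal mu
    return u[:, None] * K * v[None, :]

# toy example: after the final u-update the row marginal is matched exactly,
# and the column marginal is matched up to the (geometric) convergence error
mu = np.array([0.5, 0.5])
nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(mu, nu, C, lam=0.5)
```

The exponential convergence rates discussed in this paper quantify how fast the column-marginal error above shrinks per sweep.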
We establish that the iterates of the iterative proportional fitting procedure, also known as Sinkhorn's algorithm and commonly used to solve entropy-regularised optimal transport problems, are stable w.r.t. perturbations of the marginals, uniformly in time. Our result is quantitative and stated in terms of the 1-Wasserstein metric. As a corollary we establish a quantitative stability result for Schrödinger bridges.
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools.
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. Introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite block-length approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC Bayes and variational principle, Kolmogorov's metric entropy, strong data processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by a solutions manual for instructors, and additional standalone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.