Abstract

We study the convergence rate of Sinkhorn’s algorithm for solving entropy-regularized optimal transport problems when at least one of the probability measures, \(\mu\), admits a density over \(\mathbb{R}^d\). For a semi-concave cost function bounded by \(c_\infty\) and a regularization parameter \(\lambda > 0\), we obtain exponential convergence guarantees on the dual sub-optimality gap, with contraction rates that are polynomial in \(\lambda/c_\infty\). This represents an exponential improvement over the known contraction rate \(1 - \Theta(\exp(-c_\infty/\lambda))\) achievable via Hilbert’s projective metric. Specifically, we prove a contraction rate of \(1 - \Theta(\lambda^2/c_\infty^2)\) when \(\mu\) has a bounded log-density. In some cases, such as when \(\mu\) is log-concave and the cost function is \(c(x,y) = -\langle x, y\rangle\), this rate improves to \(1 - \Theta(\lambda/c_\infty)\). The latter rate matches the one that we derive for the transport between isotropic Gaussian measures, indicating tightness of the dependence on \(\lambda/c_\infty\). Our results are fully non-asymptotic and explicit in all the parameters of the problem.
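For concreteness, the iteration and the type of guarantee described above can be written schematically in standard entropic-OT notation (this is a paraphrase of the abstract, not the paper’s exact theorem statement). Sinkhorn alternately applies soft \(c\)-transforms to the dual potentials,
\[
f_{n+1}(x) = -\lambda \log \int \exp\!\Big(\tfrac{g_n(y) - c(x,y)}{\lambda}\Big)\, d\nu(y),
\qquad
g_{n+1}(y) = -\lambda \log \int \exp\!\Big(\tfrac{f_{n+1}(x) - c(x,y)}{\lambda}\Big)\, d\mu(x),
\]
and, writing \(\delta_n\) for the dual sub-optimality gap after \(n\) iterations, the guarantees take the form
\[
\delta_{n+1} \le (1 - \kappa)\,\delta_n,
\qquad
\kappa = \Theta\!\big(\lambda^2/c_\infty^2\big) \ \text{in the bounded log-density case},
\quad
\kappa = \Theta\!\big(\lambda/c_\infty\big) \ \text{in the favorable cases above}.
\]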
This paper presents a significant advancement in understanding the convergence rates of Sinkhorn’s algorithm, a widely used method for solving entropy-regularized optimal transport problems, particularly in continuous settings where at least one of the probability measures possesses a density.
Optimal Transport (OT) is a powerful mathematical framework with applications across machine learning, statistics, and physics. Sinkhorn’s algorithm provides an efficient way to compute approximations to OT solutions by introducing an entropy regularization term. A long-standing challenge in the field has been to precisely characterize the algorithm’s convergence speed. Previous work typically fell into two categories:
1. Exponential convergence rates: these guarantee geometric decay of the error in the number of iterations, but with a severe dependence on the problem parameters. Specifically, the contraction factor degrades exponentially in the ratio of the cost function’s oscillation, \(c_\infty\), to the regularization parameter, \(\lambda\), so the guarantees become very loose in the small-regularization regime that arises in many practical scenarios.
2. Polynomial convergence rates: these depend only polynomially on \(c_\infty/\lambda\), but the error decays only polynomially, rather than geometrically, in the number of iterations. (A rough numerical comparison of the two regimes is sketched just after this list.)
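To make the contrast concrete, here is a back-of-the-envelope comparison of the iteration counts implied by the two exponential-regime contraction factors. The hidden \(\Theta(\cdot)\) constants are set to 1; that normalization is an assumption made purely for illustration and is not taken from the paper.

```python
import math

# Back-of-the-envelope comparison of the two contraction factors discussed above.
# The hidden Theta(.) constants are set to 1 purely for illustration; this is an
# assumption, not a value taken from the paper.
def iters_to_accuracy(kappa: float, eps: float = 1e-6) -> float:
    """Iterations n needed so that (1 - kappa)**n <= eps."""
    return math.log(1.0 / eps) / (-math.log1p(-kappa))

for ratio in (5.0, 20.0, 50.0):           # ratio = c_inf / lambda
    kappa_hilbert = math.exp(-ratio)       # classical rate: 1 - Theta(exp(-c_inf/lambda))
    kappa_new = 1.0 / ratio ** 2           # this paper:     1 - Theta(lambda^2 / c_inf^2)
    print(f"c_inf/lambda = {ratio:4.0f}:  "
          f"Hilbert-metric bound ≈ {iters_to_accuracy(kappa_hilbert):.2e} iterations,  "
          f"new bound ≈ {iters_to_accuracy(kappa_new):.2e} iterations")
```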
The key innovation of this paper is to bridge this gap, achieving a “best-of-both-worlds” guarantee: convergence that is exponential (geometric) in the number of iterations, with a contraction rate that depends only polynomially on \(c_\infty/\lambda\), under specific geometric assumptions on the problem’s inputs (a bounded log-density for \(\mu\), or log-concavity in the stronger case). This represents a substantial improvement over prior results.
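As a concrete, if simplified, illustration of the object being analyzed, the sketch below implements a minimal log-domain Sinkhorn iteration for discrete marginals and tracks the dual objective. The paper’s results concern the continuous setting where \(\mu\) has a density, so this discrete version (and the illustrative variable names `C`, `a`, `b`, `lam`) is only a stand-in for observing the geometric decay of the dual sub-optimality gap, not a reproduction of the paper’s bounds.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(C, a, b, lam, n_iter=500):
    """Minimal log-domain Sinkhorn for discrete marginals a, b and cost matrix C.

    Returns the dual potentials (f, g) and the dual objective after each full
    iteration. Sinkhorn is block-coordinate ascent on the entropic dual, so the
    dual value increases monotonically toward the optimum; the gap to that
    optimum is the discrete analogue of the quantity whose contraction is
    studied in the paper.
    """
    f, g = np.zeros(len(a)), np.zeros(len(b))
    log_a, log_b = np.log(a), np.log(b)
    dual_values = []
    for _ in range(n_iter):
        # Soft c-transform updates (discrete analogue of the updates shown earlier).
        f = -lam * logsumexp((g[None, :] - C) / lam + log_b[None, :], axis=1)
        g = -lam * logsumexp((f[:, None] - C) / lam + log_a[:, None], axis=0)
        # With the KL-against-the-product-measure normalization of the dual,
        # the objective reduces to <f, a> + <g, b> right after the g-update,
        # because the y-marginal constraint is then exactly satisfied.
        dual_values.append(f @ a + g @ b)
    return f, g, np.array(dual_values)

# Tiny usage example on synthetic data (illustration only).
rng = np.random.default_rng(0)
x, y = rng.normal(size=(30, 2)), rng.normal(size=(40, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)    # squared Euclidean cost
a, b = np.full(30, 1 / 30), np.full(40, 1 / 40)
_, _, duals = sinkhorn_log(C, a, b, lam=0.5)
gap_proxy = duals[-1] - duals                          # proxy for the dual sub-optimality gap
print("approximate per-iteration contraction factors:", gap_proxy[1:6] / gap_proxy[:5])
```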
The main innovations are:
1. Exponential (geometric) convergence of the dual sub-optimality gap with a contraction rate of \(1-\Theta(\lambda^2/c_\infty^2)\) whenever \(\mu\) has a bounded log-density, so the dependence on \(c_\infty/\lambda\) is only polynomial rather than exponential.
2. An improved contraction rate of \(1-\Theta(\lambda/c_\infty)\) in favorable cases, for instance when \(\mu\) is log-concave and the cost is \(c(x,y)=-\langle x, y\rangle\).
3. A matching rate for the transport between isotropic Gaussian measures, indicating that the dependence on \(\lambda/c_\infty\) is tight.
4. Bounds that are fully non-asymptotic and explicit in all parameters of the problem.
The main prior ingredients needed for this work are:
1. The dual formulation of entropy-regularized optimal transport, on which Sinkhorn’s algorithm performs alternating (block-coordinate) ascent; the dual sub-optimality gap is the quantity shown to contract.
2. The classical analysis of Sinkhorn iterations via Hilbert’s projective metric, which yields the baseline \(1-\Theta(\exp(-c_\infty/\lambda))\) contraction rate that the new bounds improve upon.
3. Earlier analyses giving convergence that is polynomial in the iteration count with only polynomial dependence on \(c_\infty/\lambda\).
In essence, this paper significantly pushes the theoretical understanding of Sinkhorn’s algorithm by demonstrating that, under reasonable geometric conditions on the data, one can achieve highly desirable exponential convergence rates that are far more robust and practically relevant than previously understood.