Constrained approximate optimal transport maps

Abstract

We investigate finding a map $g$ within a function class $G$ that minimises an Optimal Transport (OT) cost between a target measure $\nu$ and the image by $g$ of a source measure $\mu$. This is relevant when an OT map from $\mu$ to $\nu$ does not exist or does not satisfy the desired constraints of $G$. We address existence and uniqueness for generic subclasses of $L$-Lipschitz functions, including gradients of (strongly) convex functions and typical Neural Networks. We explore a variant that approximates a transport plan, showing equivalence to a map problem in some cases. For the squared Euclidean cost, we propose alternating minimisation over a transport plan $\pi$ and map $g$, with the optimisation over $g$ being the $L^2$ projection on $G$ of the barycentric mapping $\overline{\pi}$. In dimension one, this global problem amounts to the $L^2$ projection of $\overline{\pi^*}$ onto $G$ for an OT plan $\pi^*$ between $\mu$ and $\nu$, but this does not extend to higher dimensions. We introduce a simple kernel method to find $g$ within a Reproducing Kernel Hilbert Space in the discrete case. We present numerical methods for $L$-Lipschitz gradients of $\ell$-strongly convex potentials, and study the convergence of SGD methods for Neural Networks.

Locations

  • ESAIM: Control, Optimisation and Calculus of Variations
  • arXiv (Cornell University)

Summary

This work introduces and rigorously analyzes the problem of finding a constrained approximate optimal transport (OT) map. The core challenge addressed is learning a map g from a source measure μ to a target measure ν such that the push-forward g#μ is “close” to ν, while g is restricted to lie within a predefined function class G. This framework is highly significant because traditional unconstrained OT maps may not exist, may lack desirable regularity properties (e.g., Lipschitz continuity), or may not align with the structural constraints imposed by specific modeling choices, particularly in high-dimensional or complex data settings.

The relevance of this problem is particularly pronounced in modern machine learning applications, such as generative modeling (e.g., GANs, VAEs, normalizing flows, diffusion models), where models implicitly learn push-forward mappings. In these contexts, imposing constraints on the learned map g—such as Lipschitz continuity for robustness or architectural limitations (e.g., neural networks)—is crucial for training stability, generalization, and interpretability, even if it means sacrificing perfect alignment (g#μ = ν). The problem also finds utility in scenarios requiring geometric invariances or comparisons between distributions in spaces of different dimensions, such as shape matching or word embeddings.

The key innovations presented are multi-faceted:

  1. Formalization and Existence Guarantees: The work formally defines the constrained approximate OT map problem and, crucially, establishes general conditions for the existence of solutions. These conditions involve a coercive cost function, a Lipschitz-constrained function class G that is closed in the compact-open topology (uniform convergence on compact sets), and finiteness of the problem (some g in G achieves a finite transport cost).
  2. Analysis of Specific Function Classes: The theoretical existence results are applied to two prominent function classes:
    • Gradients of (strongly) convex functions: It is proven that the class of L-Lipschitz gradients of (strongly) convex potentials satisfies the conditions for existence of a solution, providing a rigorous foundation for their use as constrained maps.
    • Neural Networks (NNs): For NNs with Lipschitz activation functions and compact parameter sets, existence of solutions is also established, legitimizing their widespread use in learning generative mappings.
  3. Connection to Transport Plan Approximation: The paper explores a variant where the goal is to approximate a full optimal transport plan (rather than just the target measure). For the squared Euclidean cost, this “plan approximation problem” is shown to be equivalent to the map problem.
  4. Alternating Minimization Scheme: For the squared Euclidean cost, an alternating minimization strategy is proposed. The sub-problem of optimizing g (given a transport plan) is reinterpreted as an L2 projection of the barycentric map of the plan onto the constrained function class G; a discrete sketch of this scheme is given after this list. A critical theoretical finding is that, while minimizing the constrained OT cost is equivalent to projecting the barycentric map of an OT plan onto G in one dimension, the equivalence does not extend to higher dimensions, as demonstrated by a concrete counterexample in 2D.
  5. Practical Numerical Methods with Convergence Guarantees: Concrete algorithms are developed for various function classes:
    • For gradients of convex functions, a method based on Quadratically Constrained Quadratic Programs (QCQPs) within an alternating minimization framework is provided.
    • For maps in Reproducing Kernel Hilbert Spaces (RKHS), a regularized optimization problem is formulated, allowing for reduction to a finite-dimensional problem solvable by gradient descent; a simplified kernel sketch also follows this list.
    • For Neural Networks, a Stochastic Gradient Descent (SGD) approach is detailed, leveraging concepts from non-smooth non-convex optimization (Clarke sub-gradients, semi-algebraicity) to ensure convergence to critical points.
  6. Illustrative Application: The utility of the developed methods is showcased through an application to color transfer, where learned maps are applied to new images to demonstrate robustness to outliers.
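
To make item 4 concrete, the following is a minimal discrete sketch of the alternating scheme for the squared Euclidean cost. It uses the POT library for the plan step and, purely for illustration, takes G to be the set of affine maps so that the L2 projection is an ordinary least-squares fit; the paper's actual classes (Lipschitz gradients of convex potentials, neural networks) require the constrained projections described above, and all names here are illustrative rather than the authors' code.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def fit_constrained_map(X, Y, n_iters=20):
    """Alternating scheme on samples X ~ mu and Y ~ nu (one point per row).
    G is taken to be the affine maps g(x) = A x + c purely for illustration."""
    n, d = X.shape
    m = Y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform weights for mu and nu
    A, c = np.eye(d), np.zeros(d)                    # start from the identity map
    for _ in range(n_iters):
        gX = X @ A.T + c                             # current push-forward samples g(x_i)
        # pi-step: optimal plan between g#mu and nu for the squared Euclidean cost
        M = ot.dist(gX, Y)                           # cost matrix ||g(x_i) - y_j||^2
        pi = ot.emd(a, b, M)
        # g-step: barycentric targets, then L2(mu) projection onto G
        # (here an unconstrained least-squares fit, since G is the set of affine maps)
        bary = (pi @ Y) / pi.sum(axis=1, keepdims=True)
        X1 = np.hstack([X, np.ones((n, 1))])
        sol, *_ = np.linalg.lstsq(X1, bary, rcond=None)
        A, c = sol[:d].T, sol[d]
    return A, c
```

Each iteration cannot increase the objective $\int \|g(x)-y\|^2 \, d\pi(x,y)$, since the $\pi$-step and the $g$-step each minimize it exactly in one of the two variables; for richer classes G, the g-step becomes the constrained L2 projection of the barycentric map.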
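
For the RKHS class of item 5, the simplest version of the same g-step is to fit the barycentric targets with a squared RKHS-norm penalty; by the representer theorem this reduces to kernel ridge regression with a closed-form solution and gives out-of-sample evaluations of g. This is a deliberately simplified stand-in: the paper's kernel method optimizes its regularized transport objective over the finite-dimensional coefficients (e.g., by gradient descent), but the expansion $g(\cdot) = \sum_i \alpha_i k(x_i, \cdot)$ is the same, and the Gaussian kernel and parameter names below are assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(X1, X2, sigma=1.0):
    """Gram matrix k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def rkhs_fit(X, B, sigma=1.0, lam=1e-2):
    """Coefficients alpha of g(.) = sum_i alpha_i k(x_i, .), fitted to targets B
    (e.g. barycentric targets) with an RKHS-norm penalty: kernel ridge regression."""
    K = gaussian_gram(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), B)

def rkhs_eval(X, alpha, Xnew, sigma=1.0):
    """Out-of-sample evaluation of the fitted map at new points."""
    return gaussian_gram(Xnew, X, sigma) @ alpha
```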

The main prior ingredients needed for this work span several areas of mathematics and computer science:

  • Optimal Transport Theory: Fundamental concepts such as push-forward measures, the Monge-Kantorovich duality, Wasserstein distances (especially W2), and the properties of optimal transport plans and barycentric maps are central. Works by Villani, Santambrogio, and Ambrosio form the theoretical backbone.
  • Calculus of Variations: The direct method of calculus of variations is a primary tool for proving the existence of minimizers, relying on properties like coercivity, lower semi-continuity, and compactness arguments (e.g., Arzelà-Ascoli theorem).
  • Functional Analysis: Concepts of L-Lipschitz functions are crucial for defining the constrained function class G. Reproducing Kernel Hilbert Spaces (RKHS) provide a specific function space for one of the numerical methods, utilizing their reproducing property.
  • Convex Analysis: The properties of convex functions, particularly strongly convex ones, and their gradients are extensively used to define and analyze a specific, well-behaved class of constrained maps.
  • Optimization Theory: The work relies on:
    • Alternating Minimization: A standard iterative optimization technique.
    • Gradient Descent and Stochastic Gradient Descent (SGD): Foundational algorithms for minimizing objective functions, adapted here to non-convex and non-smooth settings; a minimal training-loop sketch for neural maps is given after this list.
    • Non-smooth Non-convex Optimization: Advanced theory concerning Clarke sub-gradients and semi-algebraic functions (e.g., works by Bolte, Pauwels) is essential for proving convergence guarantees for SGD in complex settings like neural networks.
    • Quadratic Programming: Used to solve the sub-problems that arise when G is the class of gradients of (strongly) convex potentials.
  • Numerical Optimal Transport: The implementation of the proposed algorithms leverages existing discrete OT solvers (e.g., Sinkhorn algorithm, network simplex method) to compute transport costs and plans efficiently.
  • Probability Theory: Concepts of probability distributions, empirical measures, and their manipulations (e.g., quantile functions, conditional expectation) are fundamental for defining and analyzing the problem.
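
As a complement to the SGD ingredients above, the sketch below shows a minimal PyTorch-style training loop that fits a neural map $g_\theta$ by stochastic gradient descent on an entropic OT cost between minibatches of $g_\theta\#\mu$ and $\nu$, in the spirit of item 5. It is a generic sketch under stated assumptions, not the authors' architecture or exact objective: the Lipschitz constraint on $g_\theta$ is omitted, and the log-domain Sinkhorn routine is a standard differentiable stand-in for the transport cost.

```python
import math
import torch

def sinkhorn_cost(x, y, eps=0.05, n_iters=200):
    """Entropic OT cost between uniform empirical measures on x and y
    (log-domain Sinkhorn iterations; a generic differentiable surrogate)."""
    C = torch.cdist(x, y) ** 2                     # squared Euclidean cost matrix
    n, m = C.shape
    log_a, log_b = -math.log(n), -math.log(m)      # log of uniform weights
    f = torch.zeros(n, device=C.device)
    g = torch.zeros(m, device=C.device)
    for _ in range(n_iters):
        f = eps * (log_a - torch.logsumexp((g[None, :] - C) / eps, dim=1))
        g = eps * (log_b - torch.logsumexp((f[:, None] - C) / eps, dim=0))
    pi = torch.exp((f[:, None] + g[None, :] - C) / eps)  # approximate transport plan
    return (pi * C).sum()

class MapNet(torch.nn.Module):
    """A small MLP map g_theta: R^d -> R^d (generic; no Lipschitz constraint enforced)."""
    def __init__(self, d, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(d, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, d),
        )

    def forward(self, x):
        return self.net(x)

def train_map(X, Y, n_steps=2000, batch=256, lr=1e-3):
    """SGD on minibatches: minimize an entropic OT cost between g_theta#mu and nu.
    X and Y are torch tensors holding samples from mu and nu."""
    g_theta = MapNet(X.shape[1])
    opt = torch.optim.SGD(g_theta.parameters(), lr=lr)
    for _ in range(n_steps):
        xb = X[torch.randint(0, X.shape[0], (batch,))]  # minibatch from mu
        yb = Y[torch.randint(0, Y.shape[0], (batch,))]  # minibatch from nu
        loss = sinkhorn_cost(g_theta(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return g_theta
```

The non-smoothness of the ReLU activations and the non-convexity in $\theta$ are exactly why the Clarke sub-gradient and semi-algebraic machinery cited above is needed to state convergence guarantees for loops of this kind.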