This work introduces and rigorously analyzes the problem of finding a constrained approximate optimal transport (OT) map. The core challenge addressed is learning a map g from a source measure μ to a target measure ν such that the push-forward g#μ is “close” to ν, while g is restricted to lie within a predefined function class G. This framework is highly significant because traditional unconstrained OT maps may not exist, may lack desirable regularity properties (e.g., Lipschitz continuity), or may not align with the structural constraints imposed by specific modeling choices, particularly in high-dimensional or complex data settings.
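Concretely, a minimal formalization consistent with this description reads as follows (the notation T_c for the OT discrepancy is an assumption of this summary, not notation fixed by the original text):

```latex
\min_{g \in \mathcal{G}} \; \mathcal{T}_c\!\left(g_{\#}\mu,\ \nu\right),
\qquad
\mathcal{T}_c(\alpha, \beta) \;=\; \inf_{\pi \in \Pi(\alpha, \beta)} \int c(x, y)\,\mathrm{d}\pi(x, y),
```

where Π(α, β) denotes the set of couplings of α and β; for c(x, y) = ‖x − y‖² this discrepancy is the squared Wasserstein distance W2².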
The relevance of this problem is particularly pronounced in modern machine learning applications, such as generative modeling (e.g., GANs, VAEs, normalizing flows, diffusion models), where models implicitly learn push-forward mappings. In these contexts, imposing constraints on the learned map g—such as Lipschitz continuity for robustness or architectural limitations (e.g., neural networks)—is crucial for training stability, generalization, and interpretability, even if it means sacrificing perfect alignment (g#μ = ν). The problem also finds utility in scenarios requiring geometric invariances or comparisons between distributions in spaces of different dimensions, such as shape matching or word embeddings.
The key innovations presented are multi-faceted:
- Formalization and Existence Guarantees: The work formally defines the constrained approximate OT map problem and, crucially, establishes general conditions for the existence of solutions. These conditions involve a coercive cost function, a Lipschitz-constrained function class G that is closed in the compact-open topology (i.e., under uniform convergence on compact sets), and finiteness of the problem's optimal value.
- Analysis of Specific Function Classes: The theoretical existence results are applied to two prominent function classes:
- Gradients of (strongly) convex functions: It is proven that this class of L-Lipschitz maps satisfies the conditions for existence of a solution, providing a rigorous foundation for its use as a constrained map class.
- Neural Networks (NNs): For NNs with Lipschitz activation functions and compact parameter sets, existence of solutions is also established, legitimizing their widespread use in learning generative mappings.
- Connection to Transport Plan Approximation: The paper explores a variant where the goal is to approximate a full optimal transport plan (rather than just the target measure). For the squared Euclidean cost, this “plan approximation problem” is shown to be equivalent to the map problem.
- Alternating Minimization Scheme: For the squared Euclidean cost, an alternating minimization strategy is proposed (a numerical sketch appears after this list). The sub-problem of optimizing g (given a transport plan) is reinterpreted as an L2 projection of the plan's barycentric map onto the constrained function class G. A critical theoretical finding is that while the equivalence between minimizing the OT cost and projecting onto the barycentric map holds in one dimension, it does not extend to higher dimensions, as demonstrated by a concrete counterexample in 2D.
- Practical Numerical Methods with Convergence Guarantees: Concrete algorithms are developed for various function classes:
- For gradients of convex functions, a method based on Quadratically Constrained Quadratic Programs (QCQPs) within an alternating minimization framework is provided.
- For maps in Reproducing Kernel Hilbert Spaces (RKHS), a regularized optimization problem is formulated, allowing for reduction to a finite-dimensional problem solvable by gradient descent.
- For Neural Networks, a Stochastic Gradient Descent (SGD) approach is detailed, leveraging concepts from non-smooth non-convex optimization (Clarke sub-gradients, semi-algebraicity) to ensure convergence to critical points (a hedged SGD sketch is also given after this list).
- Illustrative Application: The utility of the developed methods is showcased through an application to color transfer, where learned maps are applied to new images to demonstrate robustness to outliers.
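To make the alternating scheme concrete, here is a minimal numerical sketch for discrete samples with the squared Euclidean cost. It assumes the POT library (`ot`) for the plan step and uses a simple affine family as a stand-in for the constrained class G; the helper names (`barycentric_targets`, `fit_affine`, `alternating_fit`) and the exact update order are illustrative choices of this summary, not the paper's implementation.

```python
# Sketch of an alternating scheme for the squared Euclidean cost.
# Assumptions: POT ("ot") for the discrete plan; G is a hypothetical affine family.
import numpy as np
import ot  # POT: Python Optimal Transport

def barycentric_targets(plan, Y):
    """Barycentric map of a discrete plan: row-normalized average of target points."""
    weights = plan / plan.sum(axis=1, keepdims=True)
    return weights @ Y

def fit_affine(X, T):
    """L2 'projection' onto a stand-in affine class G = {x -> A x + b}."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(Xa, T, rcond=None)
    A, b = coef[:-1], coef[-1]
    return lambda Z: Z @ A + b

def alternating_fit(X, Y, n_iter=10):
    wa, wb = ot.unif(len(X)), ot.unif(len(Y))   # uniform empirical weights
    g = lambda Z: Z                              # start from the identity map
    for _ in range(n_iter):
        M = ot.dist(g(X), Y)                     # squared Euclidean cost matrix
        plan = ot.emd(wa, wb, M)                 # plan step: exact discrete OT (network simplex)
        T = barycentric_targets(plan, Y)
        g = fit_affine(X, T)                     # g-step: L2 fit of the barycentric map within G
    return g
```

The g-step here is an ordinary least-squares fit; for the classes analyzed in the paper (gradients of convex functions, RKHS maps, neural networks) it would be replaced by the corresponding constrained projection or regularized fit.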
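For the neural-network case, the following is a hedged sketch of one plausible SGD loop. PyTorch, the minibatch plan computed with a discrete solver, and the fixed-plan loss construction are assumptions of this summary; the paper's convergence analysis via Clarke sub-gradients and semi-algebraicity is not reproduced here.

```python
# Hedged sketch of SGD on an OT-type loss for a neural-network map.
import torch
import ot  # POT: Python Optimal Transport (assumed dependency)

def ot_loss(gx, y):
    """Minibatch loss: the plan is computed on detached points, then the
    transport cost is re-evaluated differentiably with respect to gx."""
    wa, wb = ot.unif(len(gx)), ot.unif(len(y))
    plan_np = ot.emd(wa, wb, ot.dist(gx.detach().numpy(), y.numpy()))
    plan = torch.as_tensor(plan_np, dtype=gx.dtype)
    M = torch.cdist(gx, y) ** 2            # squared Euclidean cost, differentiable in gx
    return torch.sum(plan * M)

# A small fully connected map g: R^2 -> R^2 (architecture is illustrative).
g = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
opt = torch.optim.SGD(g.parameters(), lr=1e-2)

def sgd_step(x_batch, y_batch):
    opt.zero_grad()
    loss = ot_loss(g(x_batch), y_batch)
    loss.backward()
    opt.step()
    return loss.item()
```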
The main prior ingredients needed for this work span several areas of mathematics and computer science:
- Optimal Transport Theory: Fundamental concepts such as push-forward measures, the Monge-Kantorovich duality, Wasserstein distances (especially W2), and the properties of optimal transport plans and barycentric maps are central. Works by Villani, Santambrogio, and Ambrosio form the theoretical backbone.
- Calculus of Variations: The direct method of calculus of variations is a primary tool for proving the existence of minimizers, relying on properties like coercivity, lower semi-continuity, and compactness arguments (e.g., Arzelà-Ascoli theorem).
- Functional Analysis: The notion of L-Lipschitz functions is crucial for defining the constrained function class G. Reproducing Kernel Hilbert Spaces (RKHS) provide a specific function space for one of the numerical methods, utilizing their reproducing property.
- Convex Analysis: The properties of convex functions, particularly strongly convex ones, and their gradients are extensively used to define and analyze a specific, well-behaved class of constrained maps.
- Optimization Theory: The work relies on:
- Alternating Minimization: A standard iterative optimization technique.
- Gradient Descent and Stochastic Gradient Descent (SGD): Foundational algorithms for minimizing objective functions, adapted for non-convex and non-smooth settings.
- Non-smooth Non-convex Optimization: Advanced theory concerning Clarke sub-gradients and semi-algebraic functions (e.g., works by Bolte, Pauwels) is essential for proving convergence guarantees for SGD in complex settings like neural networks.
- Quadratic Programming: Used to solve the sub-problems arising when G is the class of gradients of convex functions.
- Numerical Optimal Transport: The implementation of the proposed algorithms leverages existing discrete OT solvers (e.g., Sinkhorn algorithm, network simplex method) to compute transport costs and plans efficiently (see the brief example after this list).
- Probability Theory: Concepts of probability distributions, empirical measures, and their manipulations (e.g., quantile functions, conditional expectation) are fundamental for defining and analyzing the problem.
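As a brief illustration of the discrete solvers mentioned above, the snippet below computes an exact plan (network simplex) and an entropically regularized plan (Sinkhorn) between two small point clouds. The use of the POT library and the toy data are assumptions of this summary; the original text only names the solvers generically.

```python
# Exact vs. entropically regularized discrete OT between two point clouds (POT library assumed).
import numpy as np
import ot

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                     # source samples
Y = rng.normal(size=(60, 2)) + 1.0               # shifted target samples
a, b = ot.unif(50), ot.unif(60)                  # uniform empirical weights
M = ot.dist(X, Y)                                # squared Euclidean cost matrix

plan_exact = ot.emd(a, b, M)                     # exact plan (network simplex)
plan_entropic = ot.sinkhorn(a, b, M, reg=1e-1)   # regularized plan (Sinkhorn)

cost_exact = np.sum(plan_exact * M)
cost_entropic = np.sum(plan_entropic * M)
```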