In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. Until recently, video denoising with neural networks had been a largely underexplored domain, and existing methods could not compete with the performance of the best patch-based methods. The approach we introduce in this paper, called FastDVDnet, shows similar or better performance than other state-of-the-art competitors with significantly lower computing times. In contrast to other existing neural network denoisers, our algorithm exhibits several desirable properties, such as fast runtimes and the ability to handle a wide range of noise levels with a single network model. The characteristics of its architecture make it possible to avoid using a costly motion compensation stage while achieving excellent performance. The combination of its denoising performance and lower computational load makes this algorithm attractive for practical denoising applications. We compare our method with different state-of-the-art algorithms, both visually and with respect to objective quality metrics.
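The two-stage, motion-compensation-free design described above can be conveyed in a few lines. The following is a minimal sketch, not the authors' code: it assumes FastDVDnet's reported structure of a first denoising block applied with shared weights to three overlapping frame triplets of a five-frame window, followed by a second block that fuses the three intermediate estimates; DenoisingBlock is a simplified stand-in for the paper's U-Net-like blocks.

import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    # Simplified stand-in for a U-Net-like block: takes three frames plus a
    # per-pixel noise map and outputs one estimate of the middle frame.
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels + 1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, f0, f1, f2, noise_map):
        return self.net(torch.cat([f0, f1, f2, noise_map], dim=1))

class TwoStageVideoDenoiser(nn.Module):
    # Stage 1 is applied with shared weights to the three overlapping triplets
    # of a 5-frame window; stage 2 fuses the three intermediate estimates.
    # Temporal coherence comes from this cascade, with no explicit motion
    # compensation stage.
    def __init__(self):
        super().__init__()
        self.stage1 = DenoisingBlock()
        self.stage2 = DenoisingBlock()

    def forward(self, frames, noise_map):
        # frames: list of 5 consecutive noisy frames, each of shape (B, C, H, W);
        # noise_map: (B, 1, H, W) map of the (assumed known) noise level.
        d0 = self.stage1(frames[0], frames[1], frames[2], noise_map)
        d1 = self.stage1(frames[1], frames[2], frames[3], noise_map)
        d2 = self.stage1(frames[2], frames[3], frames[4], noise_map)
        return self.stage2(d0, d1, d2, noise_map)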
In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. Previous neural network based approaches to video denoising have been unsuccessful, as their performance could not compete with that of patch-based methods. However, our approach outperforms other patch-based competitors with significantly lower computing times. In contrast to other existing neural network denoisers, our algorithm exhibits several desirable properties, such as a small memory footprint and the ability to handle a wide range of noise levels with a single network model. The combination of its denoising performance and lower computational load makes this algorithm attractive for practical denoising applications. We compare our method with different state-of-the-art algorithms, both visually and with respect to objective quality metrics. The experiments show that our algorithm compares favorably to other state-of-the-art methods. Video examples, code and models are publicly available at https://github.com/m-tassano/dvdnet.
Since the seminal work of Venkatakrishnan, Bouman, and Wohlberg [Proceedings of the Global Conference on Signal and Information Processing, IEEE, 2013, pp. 945--948] in 2013, Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. In the case of optimization schemes, some recent works guarantee the convergence to a fixed point, albeit not necessarily a maximum a posteriori Bayesian estimate. In the case of Monte Carlo sampling schemes for general Bayesian computation, to the best of our knowledge there is no known proof of convergence. Algorithm convergence issues aside, there are important open questions regarding whether the underlying Bayesian models and estimators are well defined, are well posed, and have the basic regularity properties required to support efficient Bayesian computation schemes. This paper develops theory for Bayesian analysis and computation with PnP priors. We introduce PnP-ULA (Plug & Play unadjusted Langevin algorithm) for Monte Carlo sampling and minimum mean square error estimation. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for this algorithm under realistic assumptions on the denoising operators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well posed and meaningful from a frequentist viewpoint. PnP-ULA is demonstrated on several canonical problems such as image deblurring and inpainting, where it is used for point estimation as well as for uncertainty visualization and quantification.
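As a concrete reading aid, the PnP-ULA update can be written as a Langevin step in which the residual of an MMSE denoiser $D_\varepsilon$ plays the role of the prior score. The following display is a simplified sketch of the scheme (the paper's exact algorithm includes an additional projection term keeping the chain in a compact set, and the precise constants differ):

\[
X_{k+1} = X_k + \delta \, \nabla \log p(y \mid X_k) + \frac{\delta \alpha}{\varepsilon} \bigl( D_\varepsilon(X_k) - X_k \bigr) + \sqrt{2\delta}\, Z_{k+1}, \qquad Z_{k+1} \sim \mathcal{N}(0, I),
\]

where the denoiser term is motivated by Tweedie's identity, $\nabla \log p_\varepsilon(x) = (D_\varepsilon(x) - x)/\varepsilon$ for the MMSE denoiser at noise level $\varepsilon$.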
Recently, impressive denoising results have been achieved by Bayesian approaches which assume Gaussian models for the image patches. This improvement in performance can be attributed to the use of per-patch models. Unfortunately, such an approach is particularly unstable for most inverse problems beyond denoising. In this paper, we propose the use of a hyperprior to model image patches, in order to stabilize the estimation procedure. There are two main advantages to the proposed restoration scheme: first, it is adapted to diagonal degradation matrices, and in particular to missing data problems (e.g., inpainting of missing pixels or zooming); second, it can deal with signal-dependent noise models, particularly suited to digital cameras. As such, the scheme is especially adapted to computational photography. In order to illustrate this point, we provide an application to high dynamic range imaging from a single image taken with a modified sensor, which shows the effectiveness of the proposed scheme.
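For orientation, recall the textbook Gaussian conditioning computation underlying such patch models (a standard fact, not a claim about this paper's exact estimator): if a patch $x$ has prior $\mathcal{N}(\mu, \Sigma)$ and is observed as $y = Ax + n$ with $A$ a diagonal degradation matrix and $n \sim \mathcal{N}(0, \sigma^2 I)$, the posterior mean (and MAP) estimate is the Wiener-type formula

\[
\hat{x} = \mu + \Sigma A^\top \bigl( A \Sigma A^\top + \sigma^2 I \bigr)^{-1} (y - A\mu),
\]

and for signal-dependent noise the term $\sigma^2 I$ is replaced by a diagonal noise covariance. The hyperprior discussed above regularizes the estimation of $(\mu, \Sigma)$ themselves.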
Consider the problem of optimally matching two measures on the circle, or equivalently two periodic measures on $\mathbb{R}$, and suppose that the cost $c(x,y)$ of matching two points x, y satisfies the Monge condition: $c(x_1,y_1)+c(x_2,y_2)<c(x_1,y_2)+c(x_2,y_1)$ whenever $x_1<x_2$ and $y_1<y_2$. We introduce a notion of locally optimal transport plan, motivated by the weak KAM (Aubry–Mather) theory, and show that all locally optimal transport plans are conjugate to shifts and that the cost of a locally optimal transport plan is a convex function of a shift parameter. This theory is applied to a transportation problem arising in image processing: for two sets of point masses on the circle, both of which have the same total mass, find an optimal transport plan with respect to a given cost function c satisfying the Monge condition. In the circular case the sorting strategy fails to provide a unique candidate solution, and a naive approach requires a quadratic number of operations. For the case of N real-valued point masses we present an $O(N|\log\epsilon|)$ algorithm that approximates the optimal cost within $\epsilon$; when all masses are integer multiples of $1/M$, the algorithm gives an exact solution in $O(N\log M)$ operations.
In this paper, we introduce a class of indicators that enable the efficient computation of optimal transport plans associated with arbitrary distributions of N demands and M supplies on the real line, in the case where the cost function is concave. The computational cost of these indicators is small and independent of N. Using them hierarchically yields an efficient algorithm.
In this paper we introduce a Wasserstein-type distance on the set of Gaussian mixture models. This distance is defined by restricting the set of possible coupling measures in the optimal transport problem to Gaussian mixture models. We derive a very simple discrete formulation for this distance, which makes it suitable for high dimensional problems. We also study the corresponding multi-marginal and barycenter formulations. We show some properties of this Wasserstein-type distance, and we illustrate its practical use with some examples in image processing.
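Concretely, for two Gaussian mixtures $\mu_0 = \sum_k \pi_0^k \mu_0^k$ and $\mu_1 = \sum_l \pi_1^l \mu_1^l$, the discrete formulation referred to above takes the form (as we recall it from this line of work; the notation here is ours):

\[
\mathrm{MW}_2^2(\mu_0, \mu_1) \;=\; \min_{w \in \Pi(\pi_0, \pi_1)} \; \sum_{k,l} w_{kl}\, W_2^2\bigl(\mu_0^k, \mu_1^l\bigr),
\]

where $\Pi(\pi_0, \pi_1)$ is the set of couplings of the weight vectors and each pairwise term has the closed Gaussian form

\[
W_2^2\bigl(\mathcal{N}(m_0,\Sigma_0), \mathcal{N}(m_1,\Sigma_1)\bigr) = \|m_0 - m_1\|^2 + \operatorname{tr}\Bigl(\Sigma_0 + \Sigma_1 - 2\bigl(\Sigma_0^{1/2} \Sigma_1 \Sigma_0^{1/2}\bigr)^{1/2}\Bigr),
\]

so the whole problem reduces to a small discrete optimal transport problem between the mixture components.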
This paper presents a unified computational framework for the estimation of distances, geodesics and barycenters of merge trees. We extend recent work on the edit distance [106] and introduce a new metric, called the Wasserstein distance between merge trees, which is purposely designed to enable efficient computations of geodesics and barycenters. Specifically, our new distance is strictly equivalent to the L2-Wasserstein distance between extremum persistence diagrams, but it is restricted to a smaller solution space, namely, the space of rooted partial isomorphisms between branch decomposition trees. This enables a simple extension of existing optimization frameworks [112] for geodesics and barycenters from persistence diagrams to merge trees. We introduce a task-based algorithm which can be generically applied to distance, geodesic, barycenter or cluster computation. The task-based nature of our approach enables further accelerations with shared-memory parallelism. Extensive experiments on public ensembles and SciVis contest benchmarks demonstrate the efficiency of our approach -- with barycenter computations on the order of minutes for the largest examples -- as well as its qualitative ability to generate representative barycenter merge trees, visually summarizing the features of interest found in the ensemble. We show the utility of our contributions with dedicated visualization applications: feature tracking, temporal reduction and ensemble clustering. We provide a lightweight C++ implementation that can be used to reproduce our results.
Gromov–Wasserstein distances were proposed a few years ago to compare distributions which do not lie in the same space. In particular, they offer an interesting alternative to the Wasserstein distances for comparing probability measures living on Euclidean spaces of different dimensions. We focus on the Gromov–Wasserstein distance with a ground cost defined as the squared Euclidean distance, and we study the form of the optimal plan between Gaussian distributions. We show that when the optimal plan is restricted to Gaussian distributions, the problem has a very simple linear solution, which is also a solution of the linear Gromov–Monge problem. We also study the problem without restriction on the optimal plan, and provide lower and upper bounds for the value of the Gromov–Wasserstein distance between Gaussian distributions.
Since the seminal work of Venkatakrishnan et al. [83] in 2013, Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. The PnP algorithms proposed in the literature mainly differ in the iterative schemes they use for optimisation or for sampling. In the case of optimisation schemes, some recent works guarantee the convergence to a fixed point, albeit not necessarily a MAP estimate. In the case of sampling schemes, to the best of our knowledge, there is no known proof of convergence. There also remain important open questions regarding whether the underlying Bayesian models and estimators are well defined, well-posed, and have the basic regularity properties required to support these numerical schemes. To address these limitations, this paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with PnP priors. We introduce two algorithms: 1) PnP-ULA (Plug & Play Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Plug & Play Stochastic Gradient Descent) for MAP inference. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for these two algorithms under realistic assumptions on the denoising operators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well-posed. The proposed algorithms are demonstrated on several canonical problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.
HDR+ is an image processing pipeline presented by Google in 2016. At its core lies a denoising algorithm that uses a burst of raw images to produce a single higher quality image. Since it is designed as a versatile solution for smartphone cameras, it does not necessarily aim for the maximization of standard denoising metrics, but rather for the production of natural, visually pleasing images. In this article, we specifically discuss and analyze the HDR+ burst denoising algorithm architecture and the impact of its various parameters. With this publication, we provide an open source Python implementation of the algorithm, along with an interactive demo.
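At the heart of the burst denoising analyzed here is a pairwise frequency-domain merge of aligned tiles; as a reading aid, we recall its usual form from the original HDR+ publication (Hasinoff et al., 2016), which the implementation above follows up to parameter choices. For each tile, with $T_z(\omega)$ the 2D DFT of the tile in frame $z$ and $T_0$ the reference frame,

\[
\widehat{T}(\omega) = \frac{1}{N} \sum_{z=0}^{N-1} \Bigl[ T_z(\omega) + A_z(\omega)\, D_z(\omega) \Bigr], \qquad A_z(\omega) = \frac{|D_z(\omega)|^2}{|D_z(\omega)|^2 + c\,\sigma^2}, \qquad D_z(\omega) = T_0(\omega) - T_z(\omega),
\]

a Wiener-style shrinkage that falls back to the reference tile wherever alignment failed (large $|D_z|$) and averages wherever the frame difference is consistent with noise of variance $\sigma^2$.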
Many generative models synthesize data by transforming a standard Gaussian random variable using a deterministic neural network. Among these models are the Variational Autoencoders and the Generative Adversarial Networks. In this work, we call them "push-forward" models and study their expressivity. We show that the Lipschitz constant of these generative networks has to be large in order to fit multimodal distributions. More precisely, we show that the total variation distance and the Kullback-Leibler divergence between the generated and the data distribution are bounded from below by a constant depending on the mode separation and the Lipschitz constant. Since constraining the Lipschitz constants of neural networks is a common way to stabilize generative models, there is a provable trade-off between the ability of push-forward models to approximate multimodal distributions and the stability of their training. We validate our findings on one-dimensional and image datasets and empirically show that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models, do not suffer from such limitations.
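The intuition behind the lower bound can be sketched with a standard Gaussian concentration argument (a back-of-the-envelope version; the paper's statement and constants are sharper). Since $\gamma = \mathcal{N}(0, I_d)$ satisfies $\gamma(A_r) \geq 1 - e^{-r^2/2}$ whenever $\gamma(A) \geq 1/2$, and $g$ is $L$-Lipschitz, the push-forward $g_\#\gamma$ inherits concentration at scale $L$:

\[
g_\#\gamma(B) \geq \tfrac{1}{2} \;\Longrightarrow\; g_\#\gamma(B_t) \geq 1 - e^{-t^2/(2L^2)} \quad \text{for all } t > 0,
\]

because $g$ maps the $r$-enlargement of $g^{-1}(B)$ into the $Lr$-enlargement of $B$. If the data distribution splits its mass evenly between two modes at distance $\Delta$, any $g_\#\gamma$ close to it in total variation must give mass about $1/2$ to each mode, which contradicts this concentration unless $e^{-\Delta^2/(2L^2)}$ is non-negligible; hence fitting the modes well forces $L$ to grow with $\Delta$.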
This paper is devoted to the study of the Monge-Kantorovich theory of optimal mass transport and its applications, in the special case of one-dimensional and circular distributions. More precisely, we study the Monge-Kantorovich distances between discrete sets of points on the unit circle, in the case where the ground distance between two points x and y is defined as h(d(x,y)), where d is the geodesic distance on the circle and h a convex and increasing function. We first prove that computing a Monge-Kantorovich distance between two given sets of pairwise different points boils down to cutting the circle at a well-chosen point and computing the same distance on the real line. This result is then used to obtain a metric between 1D and circular discrete histograms, which can be computed in linear time. A particular case of this formula has already been used in [Rabin, Delon and Gousseau, SIAM 09] for the matching of local features between images, involving circular histograms of gradient orientations. In this paper, other applications are investigated, in particular dealing with the hue component of color images. In the last part, a study is conducted to compare the advantages and drawbacks of transportation distances relying on convex or concave cost functions, and of the classical L1 distance.
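A concrete instance worth keeping in mind (a classical special case, consistent with but not a substitute for the general convex-cost result above): when h is the identity, so the ground cost is the geodesic distance itself, the circular Monge-Kantorovich distance between two histograms with cumulative distribution functions F and G on [0,1) reduces to

\[
\mathrm{MK}_1(\mu, \nu) = \min_{\alpha \in \mathbb{R}} \int_0^1 \bigl| F(t) - G(t) - \alpha \bigr| \, dt,
\]

where the optimal shift $\alpha$ is a median of $F - G$; this is what makes a linear-time computation possible once the points are sorted.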
This paper presents a novel method for restoring digital videos via a Deep Plug-and-Play (PnP) approach. Under a Bayesian formalism, the method consists in using a deep convolutional denoising network in place of the proximal operator of the prior in an alternating optimization scheme. We distinguish ourselves from prior PnP work by directly applying that method to restore a digital video from a degraded video observation. This way, a network trained once for denoising can be repurposed for other video restoration tasks. Our experiments in video deblurring, super-resolution, and interpolation of random missing pixels all show a clear benefit to using a network specifically designed for video denoising, as it yields better restoration performance and better temporal stability than a single image network with similar denoising performance using the same PnP formulation. Moreover, our method compares favorably to applying a different state-of-the-art PnP scheme separately on each frame of the sequence. This opens new perspectives in the field of video restoration.
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the SW energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points, as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
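For reference, the quantities studied here are (standard definitions, written in our notation):

\[
\mathrm{SW}_2^2(\mu, \nu) = \int_{S^{d-1}} W_2^2\bigl(P_\theta \# \mu,\; P_\theta \# \nu\bigr) \, d\sigma(\theta), \qquad P_\theta(x) = \langle x, \theta \rangle,
\]

with $\sigma$ the uniform measure on the sphere, and the Monte-Carlo energy obtained by replacing the integral with an empirical average over $p$ directions $\theta_1, \dots, \theta_p$ drawn uniformly on $S^{d-1}$:

\[
\mathcal{E}_p(Y) = \frac{1}{p} \sum_{i=1}^{p} W_2^2\bigl(P_{\theta_i} \# \gamma_Y,\; P_{\theta_i} \# \gamma_Z\bigr).
\]

Each one-dimensional term is cheap to evaluate: for uniform measures with n points it only requires sorting the projections.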
In this paper, we introduce a generalization of the Wasserstein barycenter to a case where the initial probability measures live on different subspaces of R^d. We study the existence and uniqueness of this barycenter, we show how it is related to a larger multi-marginal optimal transport problem, and we propose a dual formulation. Finally, we explain how to compute this generalized barycenter numerically on discrete distributions, and we propose an explicit solution for Gaussian distributions.
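As a reading aid, one natural way to write such a generalization (a sketch consistent with this setting; the paper's precise definition should be consulted for details): given measures $\mu_i$ supported on subspaces with orthogonal projections $P_i$ onto those subspaces, and weights $\lambda_i$, the generalized barycenter is

\[
\nu^\star \in \arg\min_{\nu \in \mathcal{P}(\mathbb{R}^d)} \; \sum_i \lambda_i \, W_2^2\bigl(\mu_i,\; P_i \# \nu\bigr),
\]

which recovers the classical Wasserstein barycenter when every $P_i$ is the identity.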
Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a non-local scale. In a departure from explicit modelling, several recent works have proposed and studied the use of implicit priors defined by an image denoising algorithm. This approach, commonly known as Plug & Play (PnP) regularisation, can deliver remarkably accurate results, particularly when combined with state-of-the-art denoisers based on convolutional neural networks. However, the theoretical analysis of PnP Bayesian models and algorithms is difficult and works on the topic often rely on unrealistic assumptions on the properties of the image denoiser. This paper studies maximum-a-posteriori (MAP) estimation for Bayesian models with PnP priors. We first consider questions related to existence, stability and well-posedness, and then present a convergence proof for MAP computation by PnP stochastic gradient descent (PnP-SGD) under realistic assumptions on the denoiser used. We report a range of imaging experiments demonstrating PnP-SGD as well as comparisons with other PnP schemes.
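A minimal numerical sketch of a PnP-SGD-style iteration, under simplifying assumptions: a Gaussian likelihood y = Ax + n with A stored as a matrix, a hypothetical denoiser callable denoise(x, eps) standing in for the network, and the denoiser residual used as a surrogate prior score. It is meant to convey the shape of the scheme, not the paper's exact algorithm.

import numpy as np

def pnp_sgd(y, A, sigma, denoise, eps=1e-2, alpha=1.0, n_iter=500):
    # Crude initialization from the adjoint of the forward operator.
    x = A.T @ y
    for k in range(n_iter):
        step = 1.0 / (100.0 + k)                          # decreasing step sizes
        grad_lik = A.T @ (y - A @ x) / sigma**2           # gradient of the log-likelihood
        grad_prior = alpha * (denoise(x, eps) - x) / eps  # denoiser residual as prior score
        x = x + step * (grad_lik + grad_prior)
    return x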
This paper deals with the reconstruction of a discrete measure $\gamma_Z$ on $\mathbb{R}^d$ from the knowledge of its pushforward measures $P_i\#\gamma_Z$ by linear applications $P_i: \mathbb{R}^d \rightarrow \mathbb{R}^{d_i}$ (for instance projections onto subspaces). The measure $\gamma_Z$ being fixed, assuming that the rows of the matrices $P_i$ are independent realizations of laws which do not give mass to hyperplanes, we show that if $\sum_i d_i > d$, this reconstruction problem has almost certainly a unique solution. This holds for any number of points in $\gamma_Z$. A direct consequence of this result is an almost-sure separability property on the empirical Sliced Wasserstein distance.
This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [100], [102] of a dictionary of atom diagrams. We introduce a multi-scale gradient descent approach for the efficient resolution of the corresponding minimization problem, which interleaves the optimization of the barycenter weights with the optimization of the atom diagrams. Our approach leverages the analytic expressions for the gradient of both sub-problems to ensure fast iterations and it additionally exploits shared-memory parallelism. Extensive experiments on public ensembles demonstrate the efficiency of our approach, with Wasserstein dictionary computations on the order of minutes for the largest examples. We show the utility of our contributions in two applications. First, we apply Wasserstein dictionaries to data reduction and reliably compress persistence diagrams by concisely representing them with their weights in the dictionary. Second, we present a dimensionality reduction framework based on a Wasserstein dictionary defined with a small number of atoms (typically three) and encode the dictionary as a low dimensional simplex embedded in a visual space (typically in 2D). In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used to reproduce our results.
The Gromov-Wasserstein (GW) distance is frequently used in machine learning to compare distributions across distinct metric spaces. Despite its utility, it remains computationally intensive, especially for large-scale problems. Recently, a novel Wasserstein distance specifically tailored for Gaussian mixture models and known as MW (mixture Wasserstein) has been introduced by several authors. In scenarios where data exhibit clustering, this approach simplifies to a small-scale discrete optimal transport problem, whose complexity depends solely on the number of Gaussian components in the GMMs. This paper aims to extend MW by introducing new Gromov-type distances. These distances are designed to be isometry-invariant in Euclidean spaces and are applicable for comparing GMMs across different dimensional spaces. Our first contribution is the Mixture Gromov Wasserstein distance (MGW), which can be viewed as a Gromovized version of MW. This new distance has a straightforward discrete formulation, making it highly efficient for estimating distances between GMMs in practical applications. To facilitate the derivation of a transport plan between GMMs, we present a second distance, the Embedded Wasserstein distance (EW). This distance turns out to be closely related to several recent alternatives to Gromov-Wasserstein. We show that EW can be adapted to derive a distance as well as optimal transportation plans between GMMs. We demonstrate the efficiency of these newly proposed distances on medium to large-scale problems, including shape matching and hyperspectral image color transfer.
We investigate finding a map $g$ within a function class $G$ that minimises an Optimal Transport (OT) cost between a target measure $\nu$ and the image by $g$ of a source measure $\mu$. This is relevant when an OT map from $\mu$ to $\nu$ does not exist or does not satisfy the desired constraints of $G$. We address existence and uniqueness for generic subclasses of $L$-Lipschitz functions, including gradients of (strongly) convex functions and typical Neural Networks. We explore a variant that approaches a transport plan, showing equivalence to a map problem in some cases. For the squared Euclidean cost, we propose alternating minimisation over a transport plan $\pi$ and map $g$, with the optimisation over $g$ being the $L^2$ projection on $G$ of the barycentric mapping $\overline{\pi}$. In dimension one, this global problem equates the $L^2$ projection of $\overline{\pi^*}$ onto $G$ for an OT plan $\pi^*$ between $\mu$ and $\nu$, but this does not extend to higher dimensions. We introduce a simple kernel method to find $g$ within a Reproducing Kernel Hilbert Space in the discrete case. Finally, we present numerical methods for $L$-Lipschitz gradients of $\ell$-strongly convex potentials.
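A rough numerical sketch of the alternating scheme for the squared Euclidean cost, under strong simplifications: uniform discrete measures with the same number of points, so the plan step is an assignment problem, and G taken to be linear maps so the projection step is a least-squares fit (the paper's classes G, such as gradients of strongly convex potentials, are richer).

import numpy as np
from scipy.optimize import linear_sum_assignment

def fit_map(X, Y, n_iter=20):
    # Alternate between the OT plan for cost |g(x) - y|^2 and the L2 projection
    # of the barycentric mapping onto G (here G = linear maps, g(x) = W x).
    n, d = X.shape
    W = np.eye(d)
    for _ in range(n_iter):
        # Plan step: for uniform measures the optimal plan is an assignment
        # between the current images g(x_i) and the targets y_j.
        C = ((X @ W.T)[:, None, :] - Y[None, :, :]) ** 2
        _, sigma = linear_sum_assignment(C.sum(-1))
        # Map step: the barycentric map sends x_i to its matched point Y[sigma_i];
        # its L2 projection onto linear maps is a least-squares fit.
        W = np.linalg.lstsq(X, Y[sigma], rcond=None)[0].T
    return W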
Wasserstein barycentres represent average distributions between multiple probability measures for the Wasserstein distance. The numerical computation of Wasserstein barycentres is notoriously challenging. A common approach is to use Sinkhorn iterations, where an entropic regularisation term is introduced to make the problem more manageable. Another approach involves using fixed-point methods, akin to those employed for computing Fr\'echet means on manifolds. The convergence of such methods for 2-Wasserstein barycentres, specifically with a quadratic cost function and absolutely continuous measures, was studied by Alvarez-Esteban et al. (2016). In this paper, we delve into the main ideas behind this fixed-point method and explore how it can be generalised to accommodate more diverse transport costs and generic probability measures, thereby extending its applicability to a broader range of problems. We show convergence results for this approach and illustrate its numerical behaviour on several barycentre problems.
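For centered Gaussians the fixed-point map of Alvarez-Esteban et al. has an explicit matrix form, which the generalisation above builds on; a minimal sketch, assuming symmetric positive definite covariance matrices and keeping only the real part of matrix square roots to absorb numerical noise:

import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_barycenter(covs, weights, n_iter=100):
    # Fixed-point iteration for the barycenter covariance S of N(0, C_i):
    # S <- S^{-1/2} ( sum_i w_i (S^{1/2} C_i S^{1/2})^{1/2} )^2 S^{-1/2}
    S = np.eye(covs[0].shape[0])
    for _ in range(n_iter):
        R = np.real(sqrtm(S))                                   # S^{1/2}
        R_inv = np.linalg.inv(R)
        M = sum(w * np.real(sqrtm(R @ C @ R)) for w, C in zip(weights, covs))
        S = R_inv @ M @ M @ R_inv
    return S

# Example: the barycenter of two anisotropic Gaussians with equal weights.
# gaussian_w2_barycenter([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])], [0.5, 0.5])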
This paper serves as a user guide to sampling strategies for sliced optimal transport. We provide reminders and additional regularity results on the Sliced Wasserstein distance. We detail the construction methods, generation time complexity, theoretical guarantees, and conditions for each strategy. Additionally, we provide insights into their suitability for sliced optimal transport in theory. Extensive experiments on both simulated and real-world data offer a representative comparison of the strategies, culminating in practical recommendations for their best usage.
This paper serves as a user guide to sampling strategies for sliced optimal transport. We provide reminders and additional regularity results on the Sliced Wasserstein distance. We detail the construction … This paper serves as a user guide to sampling strategies for sliced optimal transport. We provide reminders and additional regularity results on the Sliced Wasserstein distance. We detail the construction methods, generation time complexity, theoretical guarantees, and conditions for each strategy. Additionally, we provide insights into their suitability for sliced optimal transport in theory. Extensive experiments on both simulated and real-world data offer a representative comparison of the strategies, culminating in practical recommendations for their best usage.
Wasserstein barycentres represent average distributions between multiple probability measures for the Wasserstein distance. The numerical computation of Wasserstein barycentres is notoriously challenging. A common approach is to use Sinkhorn iterations, … Wasserstein barycentres represent average distributions between multiple probability measures for the Wasserstein distance. The numerical computation of Wasserstein barycentres is notoriously challenging. A common approach is to use Sinkhorn iterations, where an entropic regularisation term is introduced to make the problem more manageable. Another approach involves using fixed-point methods, akin to those employed for computing Fr\'echet means on manifolds. The convergence of such methods for 2-Wasserstein barycentres, specifically with a quadratic cost function and absolutely continuous measures, was studied by Alvarez-Esteban et al. (2016). In this paper, we delve into the main ideas behind this fixed-point method and explore how it can be generalised to accommodate more diverse transport costs and generic probability measures, thereby extending its applicability to a broader range of problems. We show convergence results for this approach and illustrate its numerical behaviour on several barycentre problems.
This paper deals with the reconstruction of a discrete measure γ Z on ℝ d from the knowledge of its pushforward measures P i #γ Z by linear applications P … This paper deals with the reconstruction of a discrete measure γ Z on ℝ d from the knowledge of its pushforward measures P i #γ Z by linear applications P i :ℝ d →ℝ d i (for instance projections onto subspaces). The measure γ Z being fixed, assuming that the rows of the matrices P i are independent realizations of laws which do not give mass to hyperplanes, we show that if ∑ i d i >d, this reconstruction problem has almost certainly a unique solution. This holds for any number of points in γ Z . A direct consequence of this result is an almost-sure separability property on the empirical Sliced Wasserstein distance.
We investigate finding a map $g$ within a function class $G$ that minimises an Optimal Transport (OT) cost between a target measure $\nu$ and the image by $g$ of a … We investigate finding a map $g$ within a function class $G$ that minimises an Optimal Transport (OT) cost between a target measure $\nu$ and the image by $g$ of a source measure $\mu$. This is relevant when an OT map from $\mu$ to $\nu$ does not exist or does not satisfy the desired constraints of $G$. We address existence and uniqueness for generic subclasses of $L$-Lipschitz functions, including gradients of (strongly) convex functions and typical Neural Networks. We explore a variant that approaches a transport plan, showing equivalence to a map problem in some cases. For the squared Euclidean cost, we propose alternating minimisation over a transport plan $\pi$ and map $g$, with the optimisation over $g$ being the $L^2$ projection on $G$ of the barycentric mapping $\overline{\pi}$. In dimension one, this global problem equates the $L^2$ projection of $\overline{\pi^*}$ onto $G$ for an OT plan $\pi^*$ between $\mu$ and $\nu$, but this does not extend to higher dimensions. We introduce a simple kernel method to find $g$ within a Reproducing Kernel Hilbert Space in the discrete case. Finally, we present numerical methods for $L$-Lipschitz gradients of $\ell$-strongly convex potentials.
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it … The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the SW energy. In this paper we study the properties of <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E colon upper Y long right-arrow from bar normal upper S normal upper W Subscript 2 Superscript 2 Baseline left-parenthesis gamma Subscript upper Y Baseline comma gamma Subscript upper Z Baseline right-parenthesis"> <mml:semantics> <mml:mrow> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:mo>:</mml:mo> <mml:mi>Y</mml:mi> <mml:mo stretchy="false">⟼</mml:mo> <mml:msubsup> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi mathvariant="normal">S</mml:mi> <mml:mi mathvariant="normal">W</mml:mi> </mml:mrow> <mml:mn>2</mml:mn> <mml:mn>2</mml:mn> </mml:msubsup> <mml:mo stretchy="false">(</mml:mo> <mml:msub> <mml:mi>γ</mml:mi> <mml:mi>Y</mml:mi> </mml:msub> <mml:mo>,</mml:mo> <mml:msub> <mml:mi>γ</mml:mi> <mml:mi>Z</mml:mi> </mml:msub> <mml:mo stretchy="false">)</mml:mo> </mml:mrow> <mml:annotation encoding="application/x-tex">\mathcal {E}: Y \longmapsto \mathrm {SW}_2^2(\gamma _Y, \gamma _Z)</mml:annotation> </mml:semantics> </mml:math> </inline-formula>, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="upper Y element-of double-struck upper R Superscript n times d"> <mml:semantics> <mml:mrow> <mml:mi>Y</mml:mi> <mml:mo>∈</mml:mo> <mml:msup> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi mathvariant="double-struck">R</mml:mi> </mml:mrow> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi>n</mml:mi> <mml:mo>×</mml:mo> <mml:mi>d</mml:mi> </mml:mrow> </mml:msup> </mml:mrow> <mml:annotation encoding="application/x-tex">Y \in \mathbb {R}^{n \times d}</mml:annotation> </mml:semantics> </mml:math> </inline-formula> of one of the measures. 
We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E Subscript p"> <mml:semantics> <mml:msub> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:mi>p</mml:mi> </mml:msub> <mml:annotation encoding="application/x-tex">\mathcal {E}_p</mml:annotation> </mml:semantics> </mml:math> </inline-formula> (estimating the expectation in SW using only <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="p"> <mml:semantics> <mml:mi>p</mml:mi> <mml:annotation encoding="application/x-tex">p</mml:annotation> </mml:semantics> </mml:math> </inline-formula> samples) and show convergence results on the critical points of <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E Subscript p"> <mml:semantics> <mml:msub> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:mi>p</mml:mi> </mml:msub> <mml:annotation encoding="application/x-tex">\mathcal {E}_p</mml:annotation> </mml:semantics> </mml:math> </inline-formula> to those of <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E"> <mml:semantics> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:annotation encoding="application/x-tex">\mathcal {E}</mml:annotation> </mml:semantics> </mml:math> </inline-formula>, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E Subscript p"> <mml:semantics> <mml:msub> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:mi>p</mml:mi> </mml:msub> <mml:annotation encoding="application/x-tex">\mathcal {E}_p</mml:annotation> </mml:semantics> </mml:math> </inline-formula>. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E"> <mml:semantics> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:annotation encoding="application/x-tex">\mathcal {E}</mml:annotation> </mml:semantics> </mml:math> </inline-formula> and <inline-formula content-type="math/mathml"> <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="script upper E Subscript p"> <mml:semantics> <mml:msub> <mml:mrow class="MJX-TeXAtom-ORD"> <mml:mi class="MJX-tex-caligraphic" mathvariant="script">E</mml:mi> </mml:mrow> <mml:mi>p</mml:mi> </mml:msub> <mml:annotation encoding="application/x-tex">\mathcal {E}_p</mml:annotation> </mml:semantics> </mml:math> </inline-formula> converge towards (Clarke) critical points of these energies.
In this paper, we introduce a generalization of the Wasserstein barycenter, to a case where the initial probability measures live on different subspaces of Rd. We study the existence and … In this paper, we introduce a generalization of the Wasserstein barycenter, to a case where the initial probability measures live on different subspaces of Rd. We study the existence and uniqueness of this barycenter, we show how it is related to a larger multimarginal optimal transport problem, and we propose a dual formulation. Finally, we explain how to compute numerically this generalized barycenter on discrete distributions, and we propose an explicit solution for Gaussian distributions.
This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [100], [102] of a dictionary of atom … This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [100], [102] of a dictionary of atom diagrams.We introduce a multi-scale gradient descent approach for the efficient resolution of the corresponding minimization problem, which interleaves the optimization of the barycenter weights with the optimization of the atom diagrams.Our approach leverages the analytic expressions for the gradient of both sub-problems to ensure fast iterations and it additionally exploits shared-memory parallelism.Extensive experiments on public ensembles demonstrate the efficiency of our approach, with Wasserstein dictionary computations in the orders of minutes for the largest examples.We show the utility of our contributions in two applications.First, we apply Wassserstein dictionaries to data reduction and reliably compress persistence diagrams by concisely representing them with their weights in the dictionary.Second, we present a dimensionality reduction framework based on a Wasserstein dictionary defined with a small number of atoms (typically three) and encode the dictionary as a low dimensional simplex embedded in a visual space (typically in 2D).In both applications, quantitative experiments assess the relevance of our framework.Finally, we provide a C++ implementation that can be used to reproduce our results.
This paper deals with the reconstruction of a discrete measure $\gamma_Z$ on $\mathbb{R}^d$ from the knowledge of its pushforward measures $P_i\#\gamma_Z$ by linear maps $P_i: \mathbb{R}^d \rightarrow \mathbb{R}^{d_i}$ (for instance, projections onto subspaces). The measure $\gamma_Z$ being fixed, and assuming that the rows of the matrices $P_i$ are independent realizations of laws which do not give mass to hyperplanes, we show that if $\sum_i d_i > d$, this reconstruction problem almost surely has a unique solution. This holds for any number of points in $\gamma_Z$. A direct consequence of this result is an almost-sure separability property for the empirical Sliced Wasserstein distance.
This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [100], [102] of a dictionary of atom diagrams. We introduce a multi-scale gradient descent approach for the efficient resolution of the corresponding minimization problem, which interleaves the optimization of the barycenter weights with the optimization of the atom diagrams. Our approach leverages the analytic expressions for the gradient of both sub-problems to ensure fast iterations, and it additionally exploits shared-memory parallelism. Extensive experiments on public ensembles demonstrate the efficiency of our approach, with Wasserstein dictionary computations in the order of minutes for the largest examples. We show the utility of our contributions in two applications. First, we apply Wasserstein dictionaries to data reduction and reliably compress persistence diagrams by concisely representing them with their weights in the dictionary. Second, we present a dimensionality reduction framework based on a Wasserstein dictionary defined with a small number of atoms (typically three) and encode the dictionary as a low-dimensional simplex embedded in a visual space (typically in 2D). In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used to reproduce our results.
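Reading the abstract's description, the minimization problem plausibly takes the following form (our reconstruction, with notation that is not taken from the paper): given input diagrams $D_1, \dots, D_N$, find atom diagrams $A_1, \dots, A_m$ and per-diagram barycentric weights $\lambda^{(i)}$ in the simplex solving
\[
\min_{A_1, \dots, A_m, \; \lambda^{(1)}, \dots, \lambda^{(N)}} \; \sum_{i=1}^{N} W_2^2\Big( D_i, \, \mathrm{Bar}\big(\lambda^{(i)}; A_1, \dots, A_m\big) \Big),
\]
where $\mathrm{Bar}$ denotes the weighted Wasserstein barycenter of the atoms; the interleaved descent alternates between the weights and the atoms.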
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points, as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
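As an illustration, here is a minimal NumPy sketch (ours, not the authors' code) of the Monte-Carlo energy $\mathcal{E}_p$ for two uniform discrete measures with the same number of points; it relies on the fact that 1D optimal transport between such measures reduces to sorting.

    import numpy as np

    def sw2_energy(Y, Z, p, seed=None):
        """Monte-Carlo estimate E_p(Y) of SW_2^2(gamma_Y, gamma_Z).

        Y, Z : (n, d) arrays, supports of two uniform discrete measures
               with the same number n of points.
        p    : number of random projection directions.
        """
        rng = np.random.default_rng(seed)
        n, d = Y.shape
        # Draw p directions uniformly on the unit sphere S^{d-1}.
        theta = rng.standard_normal((p, d))
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)
        # Project both point clouds on each direction and sort: shape (n, p).
        proj_y = np.sort(Y @ theta.T, axis=0)
        proj_z = np.sort(Z @ theta.T, axis=0)
        # In 1D, W_2^2 between uniform n-point measures is the mean of
        # squared differences between sorted projections; then average over p.
        return np.mean((proj_y - proj_z) ** 2)

    # Example: two clouds of 100 points in R^3, p = 50 projections.
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((100, 3))
    Z = rng.standard_normal((100, 3)) + 1.0
    print(sw2_energy(Y, Z, p=50, seed=1))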
The Gromov-Wasserstein (GW) distance is frequently used in machine learning to compare distributions across distinct metric spaces. Despite its utility, it remains computationally intensive, especially for large-scale problems. Recently, a novel Wasserstein distance specifically tailored for Gaussian mixture models, known as MW (mixture Wasserstein), has been introduced by several authors. In scenarios where data exhibit clustering, this approach simplifies to a small-scale discrete optimal transport problem, whose complexity depends solely on the number of Gaussian components in the GMMs. This paper aims to extend MW by introducing new Gromov-type distances. These distances are designed to be isometry-invariant in Euclidean spaces and are applicable for comparing GMMs across spaces of different dimensions. Our first contribution is the Mixture Gromov-Wasserstein distance (MGW), which can be viewed as a Gromovized version of MW. This new distance has a straightforward discrete formulation, making it highly efficient for estimating distances between GMMs in practical applications. To facilitate the derivation of a transport plan between GMMs, we present a second distance, the Embedded Wasserstein distance (EW). This distance turns out to be closely related to several recent alternatives to Gromov-Wasserstein. We show that EW can be adapted to derive a distance as well as optimal transportation plans between GMMs. We demonstrate the efficiency of these newly proposed distances on medium- to large-scale problems, including shape matching and hyperspectral image color transfer.
Gromov–Wasserstein distances were proposed a few years ago to compare distributions which do not lie in the same space. In particular, they offer an interesting alternative to the Wasserstein distances for comparing probability measures living on Euclidean spaces of different dimensions. We focus on the Gromov–Wasserstein distance with a ground cost defined as the squared Euclidean distance, and we study the form of the optimal plan between Gaussian distributions. We show that when the optimal plan is restricted to Gaussian distributions, the problem has a very simple linear solution, which is also a solution of the linear Gromov–Monge problem. We also study the problem without restriction on the optimal plan, and provide lower and upper bounds for the value of the Gromov–Wasserstein distance between Gaussian distributions.
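For reference, the Gromov–Wasserstein distance studied here can be written, for $\mu$ on $\mathbb{R}^m$ and $\nu$ on $\mathbb{R}^n$ (a standard definition, recalled here rather than quoted from the paper):
\[
GW_2^2(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)} \iint \Big( \|x - x'\|^2 - \|y - y'\|^2 \Big)^2 \, d\pi(x, y) \, d\pi(x', y'),
\]
where $\Pi(\mu, \nu)$ denotes the set of couplings between $\mu$ and $\nu$; the paper analyses this quantity when both measures are Gaussian.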
Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a non-local scale. In a departure from explicit modelling, several recent works have proposed and studied the use of implicit priors defined by an image denoising algorithm. This approach, commonly known as Plug & Play (PnP) regularisation, can deliver remarkably accurate results, particularly when combined with state-of-the-art denoisers based on convolutional neural networks. However, the theoretical analysis of PnP Bayesian models and algorithms is difficult, and works on the topic often rely on unrealistic assumptions on the properties of the image denoiser. This paper studies maximum-a-posteriori (MAP) estimation for Bayesian models with PnP priors. We first consider questions related to existence, stability and well-posedness, and then present a convergence proof for MAP computation by PnP stochastic gradient descent (PnP-SGD) under realistic assumptions on the denoiser used. We report a range of imaging experiments demonstrating PnP-SGD as well as comparisons with other PnP schemes.
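To make the idea concrete, here is a minimal sketch of a PnP gradient step, under the common assumption (ours, not a claim about the paper's exact scheme) that the prior score is approximated via Tweedie's identity by $(D_\varepsilon(x) - x)/\varepsilon$ for a denoiser $D_\varepsilon$ trained at noise level $\varepsilon$; `grad_loglik`, `denoiser` and the step sizes are placeholders, and the stochastic aspects of PnP-SGD are omitted.

    import numpy as np

    def pnp_map(x0, grad_loglik, denoiser, eps=1e-2, alpha=1.0,
                step=1e-3, n_iter=500):
        """Gradient ascent on an approximate log-posterior with a PnP prior.

        grad_loglik(x) : gradient of log p(y | x) (data-fidelity term).
        denoiser(x)    : denoiser D_eps; (D_eps(x) - x) / eps stands in for
                         the prior score grad log p_eps(x) (Tweedie's identity).
        alpha          : weight of the implicit prior.
        """
        x = np.array(x0, dtype=float)
        for _ in range(n_iter):
            prior_score = alpha * (denoiser(x) - x) / eps
            x = x + step * (grad_loglik(x) + prior_score)
        return x

For Gaussian deblurring with blur matrix A, observation y and noise level sigma (all three assumptions of this sketch), grad_loglik would be lambda x: A.T @ (y - A @ x) / sigma**2.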
This paper presents a novel method for restoring digital videos via a Deep Plug-and-Play (PnP) approach. Under a Bayesian formalism, the method consists in using a deep convolutional denoising network in place of the proximal operator of the prior in an alternating optimization scheme. We distinguish ourselves from prior PnP work by directly applying that method to restore a digital video from a degraded video observation. This way, a network trained once for denoising can be repurposed for other video restoration tasks. Our experiments in video deblurring, super-resolution, and interpolation of random missing pixels all show a clear benefit to using a network specifically designed for video denoising, as it yields better restoration performance and better temporal stability than a single-image network with similar denoising performance using the same PnP formulation. Moreover, our method compares favorably to applying a different state-of-the-art PnP scheme separately on each frame of the sequence. This opens new perspectives in the field of video restoration.
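The alternating scheme mentioned above can be illustrated with a half-quadratic-splitting skeleton (a generic PnP pattern, not necessarily the exact scheme of the paper); `data_prox` and `video_denoiser` are placeholders for the proximal operator of the data-fidelity term and for a pretrained video denoiser.

    import numpy as np

    def pnp_hqs(x0, data_prox, video_denoiser, rho=1.0, n_iter=30):
        """Plug-and-Play half-quadratic splitting skeleton.

        data_prox(z, rho) : proximal operator of the data-fidelity term, i.e.
                            argmin_x ||Ax - y||^2 / (2 s^2) + (rho/2) ||x - z||^2
                            (closed form for deblurring, inpainting, ...).
        video_denoiser(x) : pretrained video denoiser standing in for the
                            proximal operator of the implicit prior.
        """
        z = np.array(x0, dtype=float)
        for _ in range(n_iter):
            x = data_prox(z, rho)      # enforce consistency with the data
            z = video_denoiser(x)      # enforce the implicit prior
        return z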
Many generative models synthesize data by transforming a standard Gaussian random variable using a deterministic neural network. Among these models are the Variational Autoencoders and the Generative Adversarial Networks. In this work, we call them "push-forward" models and study their expressivity. We show that the Lipschitz constant of these generative networks has to be large in order to fit multimodal distributions. More precisely, we show that the total variation distance and the Kullback-Leibler divergence between the generated and the data distribution are bounded from below by a constant depending on the mode separation and the Lipschitz constant. Since constraining the Lipschitz constants of neural networks is a common way to stabilize generative models, there is a provable trade-off between the ability of push-forward models to approximate multimodal distributions and the stability of their training. We validate our findings on one-dimensional and image datasets and empirically show that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models, do not suffer from such limitations.
This paper presents a unified computational framework for the estimation of distances, geodesics and barycenters of merge trees. We extend recent work on the edit distance [106] and introduce a new metric, called the Wasserstein distance between merge trees, which is purposely designed to enable efficient computations of geodesics and barycenters. Specifically, our new distance is strictly equivalent to the L2-Wasserstein distance between extremum persistence diagrams, but it is restricted to a smaller solution space, namely, the space of rooted partial isomorphisms between branch decomposition trees. This enables a simple extension of existing optimization frameworks [112] for geodesics and barycenters from persistence diagrams to merge trees. We introduce a task-based algorithm which can be generically applied to distance, geodesic, barycenter or cluster computation. The task-based nature of our approach enables further accelerations with shared-memory parallelism. Extensive experiments on public ensembles and SciVis contest benchmarks demonstrate the efficiency of our approach (with barycenter computations in the order of minutes for the largest examples), as well as its qualitative ability to generate representative barycenter merge trees, visually summarizing the features of interest found in the ensemble. We show the utility of our contributions with dedicated visualization applications: feature tracking, temporal reduction and ensemble clustering. We provide a lightweight C++ implementation that can be used to reproduce our results.
HDR+ is an image processing pipeline presented by Google in 2016. At its core lies a denoising algorithm that uses a burst of raw images to produce a single higher-quality image. Since it is designed as a versatile solution for smartphone cameras, it does not necessarily aim for the maximization of standard denoising metrics, but rather for the production of natural, visually pleasing images. In this article, we specifically discuss and analyze the architecture of the HDR+ burst denoising algorithm and the impact of its various parameters. With this publication, we provide an open-source Python implementation of the algorithm, along with an interactive demo.
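The heart of that denoising step is a pairwise frequency-domain merge of aligned tiles; the sketch below reproduces its general shape as we understand it from the published pipeline (the shrinkage formula and the constant `c` are our paraphrase, not code from the article's implementation).

    import numpy as np

    def merge_tiles(ref_tile, alt_tiles, sigma2, c=8.0):
        """Pairwise robust merge of aligned burst tiles in the DFT domain.

        ref_tile  : (h, w) tile from the reference frame.
        alt_tiles : list of (h, w) aligned tiles from the other frames.
        sigma2    : estimated noise variance.
        """
        t_ref = np.fft.fft2(ref_tile)
        acc = t_ref.copy()
        for tile in alt_tiles:
            t_alt = np.fft.fft2(tile)
            d = t_ref - t_alt
            # Shrinkage factor per frequency: ~0 where frames agree
            # (average out the noise), ~1 where they differ (reject
            # misaligned content and keep the reference).
            a = np.abs(d) ** 2 / (np.abs(d) ** 2 + c * sigma2)
            acc += t_alt + a * d
        return np.real(np.fft.ifft2(acc / (1 + len(alt_tiles))))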
Since the seminal work of Venkatakrishnan et al. [83] in 2013, Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. The PnP algorithms proposed in the literature mainly differ in the iterative schemes they use for optimisation or for sampling. In the case of optimisation schemes, some recent works guarantee the convergence to a fixed point, albeit not necessarily a MAP estimate. In the case of sampling schemes, to the best of our knowledge, there is no known proof of convergence. There also remain important open questions regarding whether the underlying Bayesian models and estimators are well defined, well-posed, and have the basic regularity properties required to support these numerical schemes. To address these limitations, this paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with PnP priors. We introduce two algorithms: 1) PnP-ULA (Plug & Play Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Plug & Play Stochastic Gradient Descent) for MAP inference. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for these two algorithms under realistic assumptions on the denoising operators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well-posed. The proposed algorithms are demonstrated on several canonical problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.
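For intuition, a PnP-ULA iteration is an unadjusted Langevin step whose drift combines the likelihood gradient with a denoiser-based approximation of the prior score. The sketch below follows that recipe under our own simplifying assumptions: the step size `delta`, the prior weight `alpha` and the placeholder `denoiser` are not taken from the paper, and we omit the projection term the paper uses to keep iterates in a compact set.

    import numpy as np

    def pnp_ula(x0, grad_loglik, denoiser, eps=1e-2, alpha=1.0,
                delta=1e-4, n_iter=10000, seed=None):
        """Simplified Plug & Play unadjusted Langevin sketch.

        Targets (approximately) p(x | y), with the prior score
        grad log p_eps(x) replaced by (denoiser(x) - x) / eps.
        Returns the running mean of the chain, an MMSE estimate.
        """
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        mean = np.zeros_like(x)
        for k in range(1, n_iter + 1):
            drift = grad_loglik(x) + alpha * (denoiser(x) - x) / eps
            x = x + delta * drift + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
            mean += (x - mean) / k   # running average over the samples
        return mean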
The Gromov-Wasserstein distances were proposed a few years ago to compare distributions which do not lie in the same space. In particular, they offer an interesting alternative to the Wasserstein distances for comparing probability measures living on Euclidean spaces of different dimensions. In this paper, we focus on the Gromov-Wasserstein distance with a ground cost defined as the squared Euclidean distance, and we study the form of the optimal plan between Gaussian distributions. We show that when the optimal plan is restricted to Gaussian distributions, the problem has a very simple linear solution, which is also a solution of the linear Gromov-Monge problem. We also study the problem without restriction on the optimal plan, and provide lower and upper bounds for the value of the Gromov-Wasserstein distance between Gaussian distributions.
In this paper we introduce a Wasserstein-type distance on the set of Gaussian mixture models. This distance is defined by restricting the set of possible coupling measures in the optimal transport problem to Gaussian mixture models. We derive a very simple discrete formulation for this distance, which makes it suitable for high-dimensional problems. We also study the corresponding multi-marginal and barycenter formulations. We show some properties of this Wasserstein-type distance, and we illustrate its practical use with some examples in image processing.
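The discrete formulation mentioned above reduces to a small optimal transport problem between the mixture components, with costs given by the closed-form $W_2$ distance between Gaussians. A minimal sketch (assuming the POT library, installable via pip install pot, and with helper names of our own choosing) could look like this:

    import numpy as np
    import ot                      # POT: Python Optimal Transport
    from scipy.linalg import sqrtm

    def w2_gaussian(m0, S0, m1, S1):
        """Closed-form squared W_2 distance between two Gaussians."""
        r0 = sqrtm(S0)
        cross = sqrtm(r0 @ S1 @ r0)
        return float(np.sum((m0 - m1) ** 2)
                     + np.trace(S0 + S1 - 2 * np.real(cross)))

    def mw2(w0, means0, covs0, w1, means1, covs1):
        """Mixture-Wasserstein distance: discrete OT between GMM components."""
        cost = np.array([[w2_gaussian(means0[i], covs0[i], means1[j], covs1[j])
                          for j in range(len(w1))] for i in range(len(w0))])
        plan = ot.emd(np.asarray(w0, float), np.asarray(w1, float), cost)
        return np.sum(plan * cost)

The cost matrix has one entry per pair of components, so the linear program solved by ot.emd is tiny whatever the ambient dimension, which is precisely what makes the distance attractive for high-dimensional data.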
Recently, impressive denoising results have been achieved by Bayesian approaches which assume Gaussian models for the image patches. This improvement in performance can be attributed to the use of per-patch models. Unfortunately, such an approach is particularly unstable for most inverse problems beyond denoising. In this paper, we propose the use of a hyperprior to model image patches, in order to stabilize the estimation procedure. There are two main advantages to the proposed restoration scheme: first, it is adapted to diagonal degradation matrices, and in particular to missing data problems (e.g., inpainting of missing pixels or zooming); second, it can deal with signal-dependent noise models, particularly suited to digital cameras. As such, the scheme is especially adapted to computational photography. In order to illustrate this point, we provide an application to high dynamic range imaging from a single image taken with a modified sensor, which shows the effectiveness of the proposed scheme.
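To see why Gaussian patch models are attractive here, recall that for a Gaussian prior $x \sim \mathcal{N}(\mu, \Sigma)$ on a patch and a linear degradation $y = Ax + n$ with noise $n \sim \mathcal{N}(0, \sigma^2 I)$, the MAP (equivalently MMSE) estimate is available in closed form (a standard fact, stated here for context):
\[
\hat{x} = \mu + \Sigma A^{\top} \left( A \Sigma A^{\top} + \sigma^2 I \right)^{-1} (y - A\mu).
\]
The instability addressed by the hyperprior arises because $\mu$ and $\Sigma$ must themselves be estimated per patch from degraded data.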
In this paper, we introduce a class of indicators that enable the efficient computation of optimal transport plans associated with arbitrary distributions of $N$ demands and $M$ supplies in $\mathbb{R}$, in the case where the cost function is concave. The computational cost of these indicators is small and independent of $N$. A hierarchical use of them yields an efficient algorithm.
Consider the problem of optimally matching two measures on the circle, or equivalently two periodic measures on $\mathbb{R}$, and suppose that the cost $c(x,y)$ of matching two points $x, y$ satisfies the Monge condition: $c(x_1,y_1)+c(x_2,y_2)<c(x_1,y_2)+c(x_2,y_1)$ whenever $x_1<x_2$ and $y_1<y_2$. We introduce a notion of locally optimal transport plan, motivated by the weak KAM (Aubry–Mather) theory, and show that all locally optimal transport plans are conjugate to shifts and that the cost of a locally optimal transport plan is a convex function of a shift parameter. This theory is applied to a transportation problem arising in image processing: for two sets of point masses on the circle, both of which have the same total mass, find an optimal transport plan with respect to a given cost function $c$ satisfying the Monge condition. In the circular case the sorting strategy fails to provide a unique candidate solution, and a naive approach requires a quadratic number of operations. For the case of $N$ real-valued point masses we present an $O(N|\log\epsilon|)$ algorithm that approximates the optimal cost within $\epsilon$; when all masses are integer multiples of $1/M$, the algorithm gives an exact solution in $O(N\log M)$ operations.
This paper is devoted to the study of the Monge-Kantorovich theory of optimal mass transport and its applications, in the special case of one-dimensional and circular distributions. More precisely, we study the Monge-Kantorovich distances between discrete sets of points on the unit circle, in the case where the ground distance between two points $x$ and $y$ is defined as $h(d(x,y))$, where $d$ is the geodesic distance on the circle and $h$ a convex and increasing function. We first prove that computing a Monge-Kantorovich distance between two given sets of pairwise different points boils down to cutting the circle at a well-chosen point and computing the same distance on the real line. This result is then used to obtain a metric between 1D and circular discrete histograms, which can be computed in linear time. A particular case of this formula has already been used in [Rabin, Delon and Gousseau, SIAM 09] for the matching of local features between images, involving circular histograms of gradient orientations. In this paper, other applications are investigated, in particular dealing with the hue component of color images. In a last part, a study is conducted to compare the advantages and drawbacks of transportation distances relying on convex or concave cost functions, and of the classical $L^1$ distance.
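In the special case $h = \mathrm{id}$ (ground cost equal to the geodesic distance), the linear-time formula for equal-mass circular histograms has a particularly simple implementation: the circular distance equals $\min_t \sum_k |F_k - G_k - t|$, where $F$ and $G$ are the cumulative histograms, and the optimal $t$ is the median of their difference. A minimal NumPy sketch (ours, and only for this $W_1$ case):

    import numpy as np

    def circular_w1(p, q):
        """W_1 distance between two circular histograms p and q.

        p, q: nonnegative 1D arrays with the same total mass (bin masses
        on the circle, unit spacing between consecutive bins).
        """
        diff = np.cumsum(p) - np.cumsum(q)
        # Shifting the CDF difference by its median minimises
        # t -> sum |F_k - G_k - t|; the remaining deviations are the cost.
        return np.sum(np.abs(diff - np.median(diff)))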