Weak convergence of adaptive Markov chain Monte Carlo

Type: Article
Publication Date: 2025-04-30
Citations: 0
DOI: https://doi.org/10.1017/jpr.2025.4

Abstract

We develop general conditions for weak convergence of adaptive Markov chain Monte Carlo processes and show that these imply a weak law of large numbers for bounded Lipschitz-continuous functions. This yields an estimation theory for adaptive Markov chain Monte Carlo where previously developed theory in total variation may fail or be difficult to establish. Extensions of weak convergence to general Wasserstein distances are established, along with a weak law of large numbers for possibly unbounded Lipschitz functions. The results are applied to autoregressive processes in various settings, unadjusted Langevin processes, and adaptive Metropolis–Hastings.

Locations

  • Journal of Applied Probability

References

We investigate lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo under any adaptation strategy. In particular, we prove general lower bounds in total variation and on the weak convergence rate under general adaptation plans. If the adaptation diminishes sufficiently fast, we also develop comparable convergence rate upper bounds that are capable of approximately matching the convergence rate in the subgeometric lower bound. These results provide insight into the optimal design of adaptation strategies and also limitations on the convergence behavior of adaptive Markov chain Monte Carlo. Applications to an adaptive unadjusted Langevin algorithm as well as adaptive Metropolis-Hastings with independent proposals and random-walk proposals are explored.

We prove a strong law of large numbers for a class of strongly mixing processes. Our result rests on recent advances in understanding of concentration of measure. It is simple to apply and gives finite-sample (as opposed to asymptotic) bounds, with readily computable rate constants. In particular, this makes it suitable for analysis of inhomogeneous Markov processes. We demonstrate how it can be applied to establish an almost-sure convergence result for a class of models that includes as a special case a class of adaptive Markov chain Monte Carlo algorithms.

Many tools are available to bound the convergence rate of Markov chains in total variation (TV) distance. Such results can be used to establish central limit theorems (CLT) that enable error evaluations of Monte Carlo estimates in practice. However, convergence analysis based on TV distance is often non-scalable to high-dimensional Markov chains (Qin and Hobert (2018); Rajaratnam and Sparks (2015)). Alternatively, robust bounds in Wasserstein distance are often easier to obtain, thanks to a coupling argument. Our work is concerned with the implication of such convergence results: in particular, do they lead to CLTs of the corresponding Markov chains? One indirect and typically non-trivial way is to first convert Wasserstein bounds into total variation bounds. Alternatively, we provide two CLTs that directly depend on (sub-geometric) convergence rates in Wasserstein distance. Our CLTs hold for Lipschitz functions under certain moment conditions. Finally, we apply these CLTs to four sets of Markov chain examples including a class of nonlinear autoregressive processes, an exponential integrator version of the Metropolis-adjusted Langevin algorithm (EI-MALA), an unadjusted Langevin algorithm (ULA), and a special autoregressive model that generates reducible chains.

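As a rough numerical companion to such CLTs, the sketch below (a toy experiment, with all parameters invented here) simulates a linear AR(1) chain, a special case of the autoregressive processes mentioned above, and checks that ergodic averages of a bounded Lipschitz function fluctuate at the √n scale:

```python
# Toy check of sqrt(n)-scale fluctuations of ergodic averages for an AR(1)
# chain X_{k+1} = a X_k + Z_{k+1}; f = tanh is bounded and Lipschitz, and the
# stationary mean of f(X) is 0 by symmetry.
import numpy as np

rng = np.random.default_rng(0)
a, n, reps = 0.8, 5_000, 100
averages = []
for _ in range(reps):
    x, s = 0.0, 0.0
    for _ in range(n):
        x = a * x + rng.standard_normal()
        s += np.tanh(x)
    averages.append(s / n)

# should stabilize near the asymptotic standard deviation as n grows
print("sqrt(n) * std of ergodic averages:", np.sqrt(n) * np.std(averages))
```
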
We consider adaptive increasingly rare Markov chain Monte Carlo (AIR MCMC), which is an adaptive MCMC method, where the adaptation concerning the past happens less and less frequently over time. Under a contraction assumption for a Wasserstein-like function we deduce upper bounds of the convergence rate of Monte Carlo sums taking a renormalisation factor into account that is close to the one that appears in a law of the iterated logarithm. We demonstrate the applicability of our results by considering different settings, among which are those of simultaneous geometric and uniform ergodicity. All proofs are carried out on an augmented state space, including the classical non-augmented setting as a special case. In contrast to other adaptive MCMC limit theory, some technical assumptions, like diminishing adaptation, are not needed.

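A minimal sketch of the "increasingly rare" adaptation pattern, not the paper's algorithm: a random-walk Metropolis chain whose proposal scale is updated only at times with geometrically growing gaps (the standard normal target, the 0.44 acceptance goal, and the growth factor 1.5 are illustrative assumptions):

```python
# Sketch: adaptation happens only at times next_adapt, with gaps growing by
# 1.5x, so the kernel is held fixed for longer and longer stretches.
import numpy as np

rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x**2          # standard normal target (assumed)
x, scale, acc = 0.0, 1.0, []
next_adapt, gap = 100, 100

for n in range(1, 20_001):
    y = x + scale * rng.standard_normal()            # random-walk proposal
    accept = np.log(rng.random()) < log_pi(y) - log_pi(x)
    x = y if accept else x
    acc.append(float(accept))
    if n == next_adapt:                              # increasingly rare updates
        scale *= np.exp(np.mean(acc) - 0.44)         # nudge toward ~0.44 acceptance
        acc, gap = [], int(1.5 * gap)
        next_adapt += gap

print("final proposal scale:", scale)
```
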
Convergence rates of adaptive algorithms for weak approximations of Itô stochastic differential equations are proved for the Monte Carlo Euler method. Two algorithms based either on optimal stochastic time steps or optimal deterministic time steps are studied. The analysis of their computational complexity combines the error expansions with a posteriori leading-order term introduced in Szepessy et al. [Szepessy, A., R. Tempone, and G. Zouraris. 2001. Comm. Pure Appl. Math. 54:1169–1214] and an extension of the convergence results for adaptive algorithms approximating deterministic ordinary differential equations, derived in Moon et al. [Moon, K.-S., A. Szepessy, R. Tempone, and G. Zouraris. 2003. Numer. Math. 93:99–129]. The main step in the extension is the proof of the almost sure convergence of the error density. Both adaptive algorithms are proven to stop with an asymptotically optimal number of steps, up to a problem-independent factor defined in the algorithm. Numerical examples illustrate the behavior of the adaptive algorithms, motivating when stochastic and deterministic adaptive time steps are more efficient than constant time steps and when adaptive stochastic steps are more efficient than adaptive deterministic steps.

The strong approximations of a class of non-homogeneous Markov chains by Wiener processes are considered. As an application, the strong consistency, the law of the iterated logarithm and the weak convergence are established for the Markov chain adaptive designs in clinical trials.

We consider basic ergodicity properties of adaptive Markov chain Monte Carlo algorithms under minimal assumptions, using coupling constructions. We prove convergence in distribution and a weak law of large numbers. We also give counterexamples to demonstrate that the assumptions we make are not redundant.

We study a class of adaptive Markov chain Monte Carlo (MCMC) processes which aim at behaving as an optimal target process via a learning procedure. We show, under appropriate conditions, that the adaptive process and the optimal (nonadaptive) MCMC process share identical asymptotic properties. The special case of adaptive MCMC algorithms governed by stochastic approximation is considered in detail, and we apply our results to the adaptive Metropolis algorithm of Haario et al. (2001). We also propose a new class of adaptive MCMC algorithms, called quasi-perfect adaptive MCMC, which possesses appealing theoretical and practical properties, as demonstrated through numerical simulations.

We introduce an explicit adaptive Milstein method for stochastic differential equations (SDEs) with no commutativity condition. The drift and diffusion are separately locally Lipschitz and together satisfy a monotone condition. This method relies on a class of path-bounded time-stepping strategies which work by reducing the stepsize as solutions approach the boundary of a sphere, invoking a backstop method in the event that the timestep becomes too small. We prove that such schemes are strongly $L_2$ convergent of order one. This order is inherited by an explicit adaptive Euler-Maruyama scheme in the additive noise case. Moreover we show that the probability of using the backstop method at any step can be made arbitrarily small. We compare our method to other fixed-step Milstein variants on a range of test problems.

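A hedged sketch of the flavor of a path-bounded time-stepping strategy with a backstop; the step-selection rule, the radius R, and the test SDE below are invented for illustration and are not the paper's Milstein scheme:

```python
# Adaptive Euler-Maruyama sketch: shrink the step as |X| leaves a ball of
# radius R, and never go below the "backstop" step h_min.
import numpy as np

def adaptive_em(drift, diffusion, x0, T, h_max, h_min, R, rng):
    t, x = 0.0, x0
    while t < T:
        h = h_max / max(1.0, (abs(x) / R) ** 2)   # smaller steps far from origin
        h = min(max(h, h_min), T - t)             # backstop + don't overshoot T
        x += drift(x) * h + diffusion(x) * np.sqrt(h) * rng.standard_normal()
        t += h
    return x

rng = np.random.default_rng(2)
# toy monotone SDE dX = -X^3 dt + X dW
print(adaptive_em(lambda x: -x**3, lambda x: x, 1.0, 1.0, 1e-2, 1e-6, 5.0, rng))
```
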
In the thesis, we study ergodicity of adaptive Markov chain Monte Carlo (MCMC) methods based on two conditions (Diminishing Adaptation and Containment, which together imply ergodicity), explain the advantages of adaptive MCMC, and apply the theoretical results to some applications. First we show several facts: 1. Diminishing Adaptation alone may not guarantee ergodicity; 2. Containment is not necessary for ergodicity; 3. under some additional condition, Containment is necessary for ergodicity. Since Diminishing Adaptation is relatively easy to check and Containment is abstract, we focus on sufficient conditions for Containment. In order to study Containment, we consider quantitative bounds on the distance between samplers and targets in the total variation norm. From early results, the quantitative bounds are connected with nested drift conditions for polynomial rates of convergence. For ergodicity of adaptive MCMC, assuming that all samplers simultaneously satisfy nested polynomial drift conditions, we find that either when the number of nested drift conditions is greater than or equal to two, or when the number of drift conditions with some specific form is one, the adaptive MCMC algorithm is ergodic. For adaptive MCMC algorithms with Markovian adaptation, the algorithm satisfying simultaneous polynomial ergodicity is ergodic without those restrictions. We also discuss some recent results related to this topic. Second we consider ergodicity of certain adaptive MCMC algorithms for multidimensional target distributions, in particular, adaptive Metropolis and adaptive Metropolis-within-Gibbs algorithms. We derive various sufficient conditions to ensure Containment, and connect the convergence rates of algorithms with the tail properties of the corresponding target distributions. We also present a Summable Adaptive Condition which, when satisfied, proves ergodicity more easily. Finally, we propose a simple adaptive Metropolis-within-Gibbs algorithm attempting to study directions along which the Metropolis algorithm can be run flexibly. The algorithm avoids wasted moves in wrong directions by using proposals from the full-dimensional adaptive Metropolis algorithm. We also prove its ergodicity, and test it on a Gaussian Needle example and a real-life Case-Cohort study with competing risks. For the Cohort study, we describe an extended version of the Competing Risks Regression model, define censoring variables for competing risks, and then apply the algorithm to estimate coefficients based on the posterior distribution.

A Metropolis-Hastings step is widely used for gradient-based Markov chain Monte Carlo methods in uncertainty quantification. By calculating acceptance probabilities on batches, a stochastic Metropolis-Hastings step saves computational costs, but reduces the effective sample size. We show that this obstacle can be avoided by a simple correction term. We study statistical properties of the resulting stationary distribution of the chain if the corrected stochastic Metropolis-Hastings approach is applied to sample from a Gibbs posterior distribution in a nonparametric regression setting. Focusing on deep neural network regression, we prove a PAC-Bayes oracle inequality which yields optimal contraction rates and we analyze the diameter and show high coverage probability of the resulting credible sets. With a numerical example in a high-dimensional parameter space, we illustrate that credible sets and contraction rates of the stochastic Metropolis-Hastings algorithm indeed behave similar to those obtained from the classical Metropolis-adjusted Langevin algorithm.

Now in its second edition, this book gives a systematic and self-contained presentation of basic results on stochastic evolution equations in infinite dimensional, typically Hilbert and Banach, spaces. In the first part the authors give a self-contained exposition of the basic properties of probability measures on separable Banach and Hilbert spaces, as required later; they assume a reasonable background in probability theory and finite dimensional stochastic processes. The second part is devoted to the existence and uniqueness of solutions of a general stochastic evolution equation, and the third concerns the qualitative properties of those solutions. Appendices gather together background results from analysis that are otherwise hard to find under one roof. This revised edition includes two brand new chapters surveying recent developments in the area and an even more comprehensive bibliography, making this book an essential and up-to-date resource for all those working in stochastic differential equations.

Contents: Introduction; The Kantorovich duality; Geometry of optimal transportation; Brenier's polar factorization theorem; The Monge–Ampère equation; Displacement interpolation and displacement convexity; Geometric and Gaussian inequalities; The metric side of optimal transportation; A differential point of view on optimal transportation; Entropy production and transportation inequalities; Problems; Bibliography; Table of short statements; Index.

We look at adaptive Markov chain Monte Carlo algorithms that generate stochastic processes based on sequences of transition kernels, where each transition kernel is allowed to depend on the history of the process. We show under certain conditions that the stochastic process generated is ergodic, with appropriate stationary distribution. We use this result to analyse an adaptive version of the random walk Metropolis algorithm where the scale parameter σ is sequentially adapted using a Robbins-Monro type algorithm in order to find the optimal scale parameter σ_opt. We close with a simulation example.

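One standard way to realize such a Robbins–Monro update is sketched below; the target acceptance rate 0.234 and the gain sequence n^{-0.6} are conventional illustrative choices, not taken from the paper:

```python
# Random-walk Metropolis whose log-scale follows a Robbins-Monro recursion
# driven by the acceptance probability of each proposal.
import numpy as np

rng = np.random.default_rng(3)
d = 10
log_pi = lambda x: -0.5 * np.sum(x**2)   # standard normal target (assumed)
x, log_scale = np.zeros(d), 0.0

for n in range(1, 50_001):
    y = x + np.exp(log_scale) * rng.standard_normal(d)
    alpha = np.exp(min(0.0, log_pi(y) - log_pi(x)))   # acceptance probability
    if rng.random() < alpha:
        x = y
    log_scale += n ** -0.6 * (alpha - 0.234)          # Robbins-Monro step

print("adapted scale:", np.exp(log_scale))
```
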
In this paper we consider a continuous-time method of approximating a given distribution π using the Langevin diffusion $\mathrm{d}L_t = \mathrm{d}W_t + \tfrac{1}{2}\nabla\log\pi(L_t)\,\mathrm{d}t$. We find conditions under which this diffusion converges exponentially quickly to π or does not: in one dimension, these are essentially that, for distributions with exponential tails of the form $\pi(x) \propto \exp(-\gamma|x|^{\beta})$, exponential convergence occurs if and only if $\beta \ge 1$. We then consider conditions under which the discrete approximations to the diffusion converge. We first show that even when the diffusion itself converges, naive discretizations need not do so. We then consider a 'Metropolis-adjusted' version of the algorithm, and find conditions under which this also converges at an exponential rate: perhaps surprisingly, even the Metropolized version need not converge exponentially fast even if the diffusion does. We briefly discuss a truncated form of the algorithm which, in practice, should avoid the difficulties of the other forms.

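For orientation, a minimal sketch of one step of the 'Metropolis-adjusted' discretization (MALA); log_pi, grad_log_pi, and the step size h are assumed inputs:

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, h, rng):
    """One Metropolis-adjusted Langevin step, proposal N(x + (h/2) grad, h I)."""
    mean_fwd = x + 0.5 * h * grad_log_pi(x)
    y = mean_fwd + np.sqrt(h) * rng.standard_normal(x.shape)
    mean_bwd = y + 0.5 * h * grad_log_pi(y)
    log_q_fwd = -np.sum((y - mean_fwd) ** 2) / (2 * h)  # log q(x -> y), up to const
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2 * h)  # log q(y -> x), up to const
    log_alpha = log_pi(y) - log_pi(x) + log_q_bwd - log_q_fwd
    return y if np.log(rng.random()) < log_alpha else x
```
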
A proper choice of a proposal distribution for Markov chain Monte Carlo methods, for example for the Metropolis-Hastings algorithm, is well known to be a crucial factor for the convergence of the algorithm. In this paper we introduce an adaptive Metropolis (AM) algorithm, where the Gaussian proposal distribution is updated along the process using the full information cumulated so far. Due to the adaptive nature of the process, the AM algorithm is non-Markovian, but we establish here that it has the correct ergodic properties. We also include the results of our numerical tests, which indicate that the AM algorithm competes well with traditional Metropolis-Hastings algorithms, and demonstrate that the AM algorithm is easy to use in practical computation.

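A compact sketch of the AM recursion, using the commonly cited scaling 2.38²/d and an εI regularizer; this paraphrases the idea rather than reproducing the published pseudocode:

```python
# Adaptive Metropolis sketch: Gaussian proposals whose covariance tracks the
# (scaled, regularized) empirical covariance of the whole chain history.
import numpy as np

def am_chain(log_pi, x0, n_iter, eps=1e-6, rng=None):
    rng = rng or np.random.default_rng()
    d = len(x0)
    sd = 2.38 ** 2 / d                      # usual dimension-dependent scaling
    x = np.array(x0, float)
    mean, cov = x.copy(), np.eye(d)
    chain = [x.copy()]
    for n in range(1, n_iter + 1):
        y = rng.multivariate_normal(x, sd * (cov + eps * np.eye(d)))
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y
        delta = x - mean                    # running mean/covariance updates
        mean = mean + delta / (n + 1)
        cov = cov + (np.outer(delta, x - mean) - cov) / (n + 1)
        chain.append(x.copy())
    return np.array(chain)
```
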
Suppose {P^n(x, A)} denotes the transition law of a general state space Markov chain {X_n}. We find conditions under which weak convergence of {X_n} to a random variable X with law L (essentially defined by ∫ P^n(x, dy) g(y) → ∫ L(dy) g(y) for bounded continuous g) implies that {X_n} tends to X in total variation (in the sense that ∥ P^n(x, ·) − L ∥ → 0), which then shows that L is an invariant measure for {X_n}. The conditions we find involve some irreducibility assumptions on {X_n} and some continuity conditions on the one-step transition law {P(x, A)}.

We review and extend results related to optimal scaling of Metropolis–Hastings algorithms. We present various theoretical results for the high-dimensional limit. We also present simulation studies which confirm the theoretical results in finite-dimensional contexts.

In this paper, we show how the time for convergence to stationarity of a Markov chain can be assessed using the Wasserstein metric, rather than the usual choice of total variation distance. The Wasserstein metric may be more easily applied in some applications, particularly those on continuous state spaces. Bounds on convergence time are established by considering the number of iterations required to approximately couple two realizations of the Markov chain to within ε tolerance. The particular application considered is the use of the Gibbs sampler in the Bayesian restoration of a degraded image, with pixels that are a continuous grey-scale and with pixels that can only take two colours. On finite state spaces, a bound in the Wasserstein metric can be used to find a bound in total variation distance. We use this relationship to get a precise O(N log N) bound on the convergence time of the stochastic Ising model that holds for appropriate values of its parameter, as well as other binary image models. Our method employing convergence in the Wasserstein metric can also be applied to perfect sampling algorithms involving coupling from the past to obtain estimates of their running times.

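In one dimension, Wasserstein distances between empirical distributions are directly computable, e.g. with SciPy; the shifted-Gaussian numbers below are invented purely for illustration:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(4)
target_draws = rng.standard_normal(100_000)        # "target" samples
chain_draws = 0.5 + rng.standard_normal(100_000)   # "chain" samples, shifted by 0.5
# W1 between the two empirical distributions; should be close to the shift 0.5
print(wasserstein_distance(chain_draws, target_draws))
```
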
We consider the optimal scaling problem for proposal distributions in Hastings-Metropolis algorithms derived from Langevin diffusions. We prove an asymptotic diffusion limit theorem and show that the relative efficiency of the algorithm can be characterized by its overall acceptance rate, independently of the target distribution. The asymptotically optimal acceptance rate is 0.574. We show that, as a function of dimension n, the complexity of the algorithm is O(n^{1/3}), which compares favourably with the O(n) complexity of random walk Metropolis algorithms. We illustrate this comparison with some example simulations.

In this paper, we consider the question of which convergence properties of Markov chains are preserved under small perturbations. Properties considered include geometric ergodicity and rates of convergence. Perturbations considered include roundoff error from computer simulation. We are motivated primarily by interest in Markov chain Monte Carlo algorithms.

We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters during a run. Examples include the Adaptive Metropolis (AM) multivariate algorithm of Haario, Saksman, and Tamminen (2001), Metropolis-within-Gibbs algorithms for nonconjugate hierarchical models, regionally adjusted Metropolis algorithms, and logarithmic scalings. Computer simulations indicate that the algorithms perform very well compared to nonadaptive algorithms, even in high dimension.

In this paper we study the ergodicity properties of some adaptive Markov chain Monte Carlo algorithms (MCMC) that have been recently proposed in the literature. We prove that under a set of verifiable conditions, ergodic averages calculated from the output of a so-called adaptive MCMC sampler converge to the required value and can even, under more stringent assumptions, satisfy a central limit theorem. We prove that the conditions required are satisfied for the independent Metropolis–Hastings algorithm and the random walk Metropolis algorithm with symmetric increments. Finally, we propose an application of these results to the case where the proposal distribution of the Metropolis–Hastings update is a mixture of distributions from a curved exponential family.

First an integral representation of a continuous linear functional dominated by a support function in integral form is given (Theorem 1). From this the theorem of Blackwell-Stein-Sherman-Cartier [2], [20], [4] is deduced, as well as a result on capacities alternating of order 2 in the sense of Choquet [5], which includes Satz 4.3 of [23] and a result of Kellerer [10], [12], under somewhat stronger assumptions. Next (Theorem 7), the existence of probability distributions with given marginals is studied under typically weaker assumptions than those which are required by the use of Theorem 1. As applications we derive necessary and sufficient conditions for a sequence of probability measures to be the sequence of distributions of a martingale (Theorem 8), an upper semi-martingale (Theorem 9) or of partial sums of independent random variables (Theorem 10). Moreover an alternative definition of the Lévy–Prokhorov distance between probability measures in a complete separable metric space is obtained (corollary of Theorem 11). Section 6 can be read independently of the former sections.

This short note investigates convergence of adaptive Markov chain Monte Carlo algorithms, i.e. algorithms which modify the Markov chain update probabilities on the fly. We focus on the containment condition introduced by Roberts and Rosenthal (2007). We show that if the containment condition is not satisfied, then the algorithm will perform very poorly. Specifically, with positive probability, the adaptive algorithm will be asymptotically less efficient than any nonadaptive ergodic MCMC algorithm. We call such algorithms AdapFail, and conclude that they should not be used.

Many problems arising in applications result in the need to probe a probability distribution for functions. Examples include Bayesian nonparametric statistics and conditioned diffusion processes. Standard MCMC algorithms typically become arbitrarily slow under the mesh refinement dictated by nonparametric description of the unknown function. We describe an approach to modifying a whole range of MCMC methods, applicable whenever the target measure has density with respect to a Gaussian process or Gaussian random field reference measure, which ensures that their speed of convergence is robust under mesh refinement. Gaussian processes or random fields are fields whose marginal distributions, when evaluated at any finite set of $N$ points, are $\mathbb{R}^{N}$-valued Gaussians. The algorithmic approach that we describe is applicable not only when the desired probability measure has density with respect to a Gaussian process or Gaussian random field reference measure, but also to some useful non-Gaussian reference measures constructed through random truncation. In the applications of interest the data is often sparse and the prior specification is an essential part of the overall modelling strategy. These Gaussian-based reference measures are a very flexible modelling tool, finding wide-ranging application. Examples are shown in density estimation, data assimilation in fluid mechanics, subsurface geophysics and image registration. The key design principle is to formulate the MCMC method so that it is, in principle, applicable for functions; this may be achieved by use of proposals based on carefully chosen time-discretizations of stochastic dynamical systems which exactly preserve the Gaussian reference measure. Taking this approach leads to many new algorithms which can be implemented via minor modification of existing algorithms, yet which show enormous speed-up on a wide range of applied problems.

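A minimal sketch of the best-known proposal of this kind, the preconditioned Crank–Nicolson (pCN) move, which exactly preserves a Gaussian reference measure; here phi (the log change of measure with respect to the reference) and sample_ref (a sampler for the reference) are assumed inputs:

```python
import numpy as np

def pcn_step(x, phi, beta, sample_ref, rng):
    """One pCN step for a target of the form exp(-phi(x)) * N(0, C)(dx)."""
    xi = sample_ref()                             # xi ~ N(0, C), reference draw
    y = np.sqrt(1.0 - beta ** 2) * x + beta * xi  # proposal preserves N(0, C)
    # acceptance depends only on phi because the Gaussian part is preserved
    return y if np.log(rng.random()) < phi(x) - phi(y) else x
```
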
We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density w.r.t. the Lebesgue measure on $\mathbb{R}^{d}$, known up to a normalization constant $x\mapsto\pi(x)=\mathrm{e}^{-U(x)}/\int_{\mathbb{R}^{d}}\mathrm{e}^{-U(y)}\,\mathrm{d}y$. Such a problem naturally occurs, for example, in Bayesian inference and machine learning. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity in Wasserstein distance of order $2$ and total variation distance of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence on the dimension of the state space of these bounds is explicit. The convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality are reported for functions which are measurable and bounded. An illustration to Bayesian inference for binary regression is presented to support our claims.

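A short sketch of the sampler analysed here, the Euler discretization of the Langevin SDE (the unadjusted Langevin algorithm, ULA); the quadratic potential and step size below are illustrative assumptions:

```python
# ULA: X_{k+1} = X_k - h * grad U(X_k) + sqrt(2h) * Z_{k+1}, no accept/reject.
import numpy as np

def ula(grad_U, x0, h, n_iter, rng):
    x, out = np.array(x0, float), []
    for _ in range(n_iter):
        x = x - h * grad_U(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
        out.append(x.copy())
    return np.array(out)

rng = np.random.default_rng(5)
samples = ula(lambda x: x, np.zeros(2), 0.05, 20_000, rng)  # U(x) = |x|^2 / 2
# stationary variance of this discretization is 1/(1 - h/2), slightly above 1,
# illustrating the bias that the unadjusted scheme trades for simplicity
print(samples[1000:].var(axis=0))
```
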
In this paper, we give sufficient conditions for the existence of an invariant probability measure and for establishing subgeometric convergence rates in Wasserstein distance, for Markov chains defined on general, not necessarily irreducible, state spaces. Compared with (Ann. Appl. Probab. 24 (2) (2014) 526–552), our approach is based on a purely probabilistic coupling construction, which allows us to recover the convergence rates previously obtained in total variation in (Bernoulli 13 (3) (2007) 831–848). As an application of these results, we establish subgeometric convergence in Wasserstein distance of nonlinear autoregressive models and of an MCMC algorithm, the preconditioned Crank–Nicolson algorithm on Hilbert spaces, for a certain class of target measures.

Over the last 25 years, techniques based on drift and minorization (d&m) have been mainstays in the convergence analysis of MCMC algorithms. However, results presented herein suggest that d&m may be less useful in the emerging area of convergence complexity analysis, which is the study of how the convergence behavior of Monte Carlo Markov chains scales with sample size, n, and/or number of covariates, p. The problem appears to be that minorization can become a serious liability as dimension increases. Alternative methods of constructing convergence rate bounds (with respect to total variation distance) that do not require minorization are investigated. Based on Wasserstein distances and random mappings, these methods can produce bounds that are substantially more robust to increasing dimension than those based on d&m. The Wasserstein-based bounds are used to develop strong convergence complexity results for MCMC algorithms used in Bayesian probit regression and random effects models in the challenging asymptotic regime where n and p are both large.

We propose a new Monte Carlo method for sampling from multimodal distributions. The idea of this technique is based on splitting the task into two: finding the modes of a target distribution $\pi$ and sampling, given the knowledge of the locations of the modes. The sampling algorithm relies on steps of two types: local ones, preserving the mode; and jumps to regions associated with different modes. Besides, the method learns the optimal parameters of the algorithm, while it runs, without requiring user intervention. Our technique should be considered as a flexible framework, in which the design of moves can follow various strategies known from the broad MCMC literature. In order to design an adaptive scheme that facilitates both local and jump moves, we introduce an auxiliary variable representing each mode, and we define a new target distribution $\tilde{\pi}$ on an augmented state space $\mathcal{X}\times\mathcal{I}$, where $\mathcal{X}$ is the original state space of $\pi$ and $\mathcal{I}$ is the set of the modes. As the algorithm runs and updates its parameters, the target distribution $\tilde{\pi}$ also keeps being modified. This motivates a new class of algorithms, Auxiliary Variable Adaptive MCMC. We prove general ergodic results for the whole class before specialising to the case of our algorithm.

We establish subgeometric bounds on convergence rate of general Markov processes in the Wasserstein metric. In the discrete time setting we prove that the Lyapunov drift condition and the existence of a "good" $d$-small set imply subgeometric convergence to the invariant measure. In the continuous time setting we obtain the same convergence rate provided that there exists a "good" $d$-small set and the Douc-Fort-Guillin supermartingale condition holds. As an application of our results, we prove that the Veretennikov-Khasminskii condition is sufficient for subexponential convergence of strong solutions of stochastic delay differential equations.

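For orientation, the discrete-time drift condition referred to here is usually stated in the following Douc–Fort–Guillin form (standard notation, sketched from the general literature rather than quoted from this paper):

```latex
% Subgeometric drift toward a d-small set C: V >= 1 is a Lyapunov function and
% phi is concave, increasing, with phi(v)/v -> 0 as v -> infinity.
\[
  PV(x) \;\le\; V(x) - \phi\bigl(V(x)\bigr) + b\,\mathbf{1}_C(x),
\]
% the rate of subgeometric convergence is then governed by the growth of phi.
```
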
The Bouncy Particle sampler (BPS) and the Zig-Zag sampler (ZZS) are continuous-time, non-reversible Monte Carlo methods based on piecewise deterministic Markov processes. Experiments show that the speed of convergence of these samplers can be affected by the shape of the target distribution, as for instance in the case of anisotropic targets. We propose an adaptive scheme that iteratively learns all or part of the covariance matrix of the target and takes advantage of the obtained information to modify the underlying process with the aim of increasing the speed of convergence. Moreover, we define an adaptive scheme that automatically tunes the refreshment rate of the BPS or ZZS. We prove ergodicity and a law of large numbers for all the proposed adaptive algorithms. Finally, we show the benefits of the adaptive samplers with several numerical simulations.

Written by one of the best-known probabilists in the world, this text offers a clear and modern presentation of modern probability theory and an exposition of the interplay between the properties of metric spaces and those of probability measures. This text is the first at this level to include discussions of the subadditive ergodic theorems, metrics for convergence in laws and the Borel isomorphism theory. The proofs for the theorems are consistently brief and clear, and each chapter concludes with a set of historical notes and references. This book should be of interest to students taking degree courses in real analysis and/or probability theory.

Over the last three decades, there has been a considerable effort within the applied probability community to develop techniques for bounding the convergence rates of general state space Markov chains. Most of these results assume the existence of drift and minorization (d&m) conditions. It has often been observed that convergence rate bounds based on single-step d&m tend to be overly conservative, especially in high-dimensional situations. This article builds a framework for studying this phenomenon. It is shown that any convergence rate bound based on a set of d&m conditions cannot do better than a certain unknown optimal bound. Strategies are designed to put bounds on the optimal bound itself, and this allows one to quantify the extent to which a d&m-based convergence rate bound can be sharp. The new theory is applied to several examples, including a Gaussian autoregressive process (whose true convergence rate is known), and a Metropolis adjusted Langevin algorithm. The results strongly suggest that convergence rate bounds based on single-step d&m conditions are quite inadequate in high-dimensional settings.