A guiding principle in the efficient estimation of rare-event probabilities by Monte Carlo is that importance sampling based on the change of measure suggested by a large deviations analysis can reduce variance by many orders of magnitude. In a variety of settings, this approach has led to estimators that are optimal in an asymptotic sense. We give examples, however, in which importance sampling estimators based on a large deviations change of measure have provably poor performance. The estimators can have variance that decreases at a slower rate than a naive estimator, variance that increases with the rarity of the event, and even infinite variance. For each example, we provide an alternative estimator with provably efficient performance. A common feature of our examples is that they allow more than one way for a rare event to occur; our alternative estimators give explicit weight to lower probability paths neglected by leading-term asymptotics.
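To make the large-deviations change of measure concrete, here is a minimal sketch (not one of the paper's counterexamples): estimating the Gaussian tail probability P(X > c) by tilting the sampling distribution to mean c and reweighting by the likelihood ratio. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
c, n = 4.0, 100_000  # rare threshold and sample size (illustrative)

# Naive Monte Carlo: average the indicator of {X > c} under N(0, 1).
x = rng.standard_normal(n)
naive = (x > c).astype(float)

# Importance sampling: sample Y ~ N(c, 1) (the large-deviations tilt) and
# reweight by the likelihood ratio dN(0,1)/dN(c,1) = exp(-c*y + c^2/2).
y = rng.normal(loc=c, size=n)
weights = np.exp(-c * y + 0.5 * c**2)
is_est = (y > c) * weights

print("naive estimate:", naive.mean(), "std err:", naive.std(ddof=1) / np.sqrt(n))
print("IS estimate:   ", is_est.mean(), "std err:", is_est.std(ddof=1) / np.sqrt(n))
```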
We analyze the performance of an importance sampling estimator for a rare-event probability in tandem Jackson networks. The rare event we consider corresponds to the network population reaching K before returning to ∅, starting from ∅, with K large. The estimator we study is based on interchanging the arrival rate and the smallest service rate and is therefore a generalization of the asymptotically optimal estimator for an M/M/1 queue. We examine its asymptotic performance for large K, showing that in certain parameter regions the estimator has an asymptotic efficiency property, but that in other regions it does not. The setting we consider is perhaps the simplest case of a rare-event simulation problem in which boundaries on the state space play a significant role.
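A hedged sketch of the M/M/1 special case mentioned above, using the standard embedded-random-walk formulation started one level above empty: the walk is simulated with the arrival and service probabilities interchanged, and each path that reaches K is reweighted by its likelihood ratio. The rates, level K, and sample size are illustrative, and the gambler's-ruin formula appears only as a check.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, mu, K, n = 0.3, 0.7, 20, 10_000   # illustrative M/M/1 rates and target level

p = lam / (lam + mu)      # original up-step probability of the embedded walk
p_is = mu / (lam + mu)    # swapped (importance sampling) up-step probability

est = np.empty(n)
for i in range(n):
    level, log_lr = 1, 0.0              # start just after the first arrival
    while 0 < level < K:
        if rng.random() < p_is:         # simulate under the swapped measure
            level += 1
            log_lr += np.log(p / p_is)
        else:
            level -= 1
            log_lr += np.log((1 - p) / (1 - p_is))
    est[i] = np.exp(log_lr) if level == K else 0.0

exact = (1 - (1 - p) / p) / (1 - ((1 - p) / p) ** K)   # gambler's-ruin check
print("IS estimate:", est.mean(), "+/-", est.std(ddof=1) / np.sqrt(n))
print("exact      :", exact)
```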
We analyze the performance of a multilevel splitting method for rare event simulation related to one recently proposed in the telecommunications literature. This method splits promising paths into subpaths at intermediate levels to increase the number of observations of a rare event. In our previous paper (1997) we gave sufficient conditions, in specific classes of models, for this method to be asymptotically optimal; here we focus on necessary conditions in a general setting. We show, through a variety of results, the importance of choosing the intermediate thresholds in a way consistent with the most likely path to a rare set, both when the number of levels is fixed and when it increases with the rarity of the event. In the latter case, we give very general necessary conditions based on large deviations rate functions. These indicate that even when the intermediate levels are chosen appropriately, the method will frequently fail to be asymptotically optimal. We illustrate the conditions with examples.
An approach to rare event simulation uses the technique of splitting. The basic idea is to split sample paths of the stochastic process into multiple copies when they approach closer to the rare set; this increases the overall number of hits to the rare set for a given amount of simulation time. This paper analyzes the bias and efficiency of some simple cases of this method.
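A minimal fixed-splitting sketch in the spirit of the two abstracts above, for a random walk with negative drift reaching a high level before returning to 0. The thresholds, offspring count, and step probability are illustrative assumptions, not values from the papers.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.4                      # up-step probability (negative drift), illustrative
levels = [5, 10, 15, 20]     # intermediate thresholds; the last level is the rare set
R = 8                        # offspring per split, chosen near 1/(stage probability)
n0 = 1_000                   # initial number of paths

def advance(level, target):
    """Run the walk from `level` until it reaches `target` or is absorbed at 0."""
    while 0 < level < target:
        level += 1 if rng.random() < p else -1
    return level

paths = [1] * n0             # all paths start one step above the boundary
weight = 1.0 / n0            # weight carried by each surviving path
for j, target in enumerate(levels):
    hits = sum(advance(x, target) == target for x in paths)
    if j < len(levels) - 1:
        paths = [target] * (hits * R)   # split each surviving path into R copies
        weight /= R
    else:
        print("splitting estimate of P(hit 20 before 0):", hits * weight)
# For comparison, the gambler's-ruin formula gives about 1.5e-4 for these values.
```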
A gradient-estimation procedure for a general class of stochastic discrete-event systems is developed. In contrast to most previous work, the authors focus on performance measures whose realizations are inherently discontinuous (in fact, piecewise constant) functions of the parameter of differentiation. Two broad classes of finite-horizon discontinuous performance measures arising naturally in applications are considered. Because of their discontinuity, these important classes of performance measures are not susceptible to infinitesimal perturbation analysis (IPA). Instead, the authors apply smoothed perturbation analysis, formalizing it and generalizing it in the process. Smoothed perturbation analysis uses conditional expectations to smooth jumps. The resulting gradient estimator involves two factors: the conditional rate at which jumps occur, and the expected effect of a jump. Among the types of performance measures to which the methods can be applied are transient state probabilities, finite-horizon throughputs, distributions on arrival, and expected terminal cost.
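The toy sketch below illustrates only the core idea of smoothing a discontinuous performance measure by conditioning, not the general discrete-event estimator developed in the paper; the random variables and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, lam, n = 2.0, 1.0, 200_000   # illustrative parameters (theta > 1)

# Goal: d/d(theta) of P(X + Y > theta) with X ~ Exp(lam), Y ~ Uniform(0, 1).
# The sample-path derivative of the indicator 1{X + Y > theta} is 0 almost
# surely, so a pathwise (IPA) estimator fails.  Conditioning on Y smooths
# the jump:
#   E[1{X + Y > theta} | Y] = exp(-lam * (theta - Y))   for theta - Y > 0,
# which is differentiable in theta and can be averaged over samples of Y.
y = rng.random(n)
spa = -lam * np.exp(-lam * (theta - y))
print("conditional-expectation (SPA-style) estimate:", spa.mean())
print("exact derivative:", -np.exp(-lam * theta) * (np.exp(lam) - 1.0))
```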
We give a unified presentation of stability results for stochastic vector difference equations based on various choices of binary operations ⊕ and ⊗, assuming that the coefficient sequences are stationary and ergodic. In the scalar case, under standard addition and multiplication, the key condition for stability is E[log |A0|] < 0. In the generalizations, the condition takes the form γ < 0, where γ is the limit of a subadditive process associated with the recursion. Under this and mild additional conditions, the process has a unique finite stationary distribution to which it converges from all initial conditions. The variants of standard matrix algebra we consider replace the operations + and × with (max, +), (max, ×), (min, +), or (min, ×). In each case, the appropriate stability condition parallels that for the standard recursions, involving certain subadditive limits. Since these limits are difficult to evaluate, we provide bounds, thus giving alternative, computable conditions for stability.
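A small illustration of the scalar stability condition in the standard (+, ×) case, with assumed coefficient distributions: when E[log A] < 0, the recursion forgets its initial condition, consistent with convergence to a unique stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
a = rng.lognormal(mean=-0.5, sigma=0.3, size=n)   # E[log A] = -0.5 < 0 (assumed)
b = rng.standard_normal(n)

x, y = 100.0, -100.0                # two very different initial conditions
for k in range(n):
    x = a[k] * x + b[k]             # scalar recursion X_{k+1} = A_k X_k + B_k
    y = a[k] * y + b[k]
print("gap after", n, "steps:", abs(x - y))   # contracts because E[log A] < 0
```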
This paper proves a convergence result for a discretization scheme for simulating jump–diffusion processes with state–dependent jump intensities. With a bound on the intensity, the point process of jump times can be constructed by thinning a Poisson random measure using state–dependent thinning probabilities. Between the jump epochs of the Poisson random measure, the dynamics of the constructed process are purely diffusive and may be simulated using standard discretization methods. Under conditions on the coefficient functions of the jump–diffusion process, we show that the weak convergence order of this method equals the weak convergence order of the scheme used for the purely diffusive intervals: the construction of jumps does not degrade the convergence of the method.
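A rough sketch of the construction described above, under assumed coefficient functions and an assumed intensity bound: candidate jump epochs come from a Poisson process with the dominating rate, the diffusive intervals are discretized with Euler steps, and each candidate jump is accepted by state-dependent thinning.

```python
import numpy as np

rng = np.random.default_rng(5)
T, dt = 1.0, 0.001
lam_bar = 5.0                          # bound on the jump intensity (assumed)
lam = lambda x: lam_bar / (1 + x**2)   # hypothetical state-dependent intensity
mu = lambda x: -0.5 * x                # hypothetical drift
sigma = lambda x: 0.3                  # hypothetical diffusion coefficient
jump_size = lambda x: 0.5              # hypothetical jump magnitude

# Candidate jump epochs: a Poisson process with the dominating rate lam_bar.
t, candidates = 0.0, []
while True:
    t += rng.exponential(1.0 / lam_bar)
    if t > T:
        break
    candidates.append(t)

x, t = 1.0, 0.0
for tc in candidates + [T]:
    # Euler-Maruyama steps on the purely diffusive interval (t, tc).
    while t < tc:
        h = min(dt, tc - t)
        x += mu(x) * h + sigma(x) * np.sqrt(h) * rng.standard_normal()
        t += h
    # Accept the candidate jump with probability lam(x)/lam_bar (thinning).
    if tc < T and rng.random() < lam(x) / lam_bar:
        x += jump_size(x)
print("X_T =", x)
```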
A general approach to improving simulation accuracy uses information about auxiliary control variables with known expected values to improve the estimation of unknown quantities. We analyze weighted Monte Carlo estimators that implement this idea by applying weights to independent replications. The weights are chosen to constrain the weighted averages of the control variables. We distinguish two cases (unbiased and biased), depending on whether the weighted averages of the controls are constrained to equal their expected values or some other values. In both cases, the number of constraints is usually smaller than the number of replications, so there may be many feasible weights. We select maximally uniform weights by minimizing a separable convex function of the weights subject to the control variable constraints. Estimators of this form arise (sometimes implicitly) in several settings, including at least two in finance: calibrating a model to market data (as in the work of Avellaneda et al. 2001) and calculating conditional expectations to price American options. We analyze properties of these estimators as the number of replications increases. We show that in the unbiased case, weighted Monte Carlo reduces asymptotic variance, and that all convex objective functions within a large class produce estimators that are very close to each other in a strong sense. In contrast, in the biased case the choice of objective function does matter. We show explicitly how the choice of objective determines the limit to which the estimator converges.
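A minimal sketch of a weighted Monte Carlo estimator in the unbiased case with a quadratic objective, for which the constrained weights have a closed form; the target quantity and control variable here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
u = rng.random(n)
y = np.exp(u)              # target: E[e^U] = e - 1
c = u                      # control variable with known mean E[U] = 1/2

# Quadratic objective: minimize sum_i (w_i - 1/n)^2 subject to
#   sum_i w_i = 1   and   sum_i w_i c_i = E[C]   (the "unbiased" case).
A = np.vstack([np.ones(n), c])
b = np.array([1.0, 0.5])
w0 = np.full(n, 1.0 / n)
w = w0 + A.T @ np.linalg.solve(A @ A.T, b - A @ w0)   # minimum-norm correction

print("plain MC:    ", y.mean())
print("weighted MC: ", w @ y)
print("exact:       ", np.e - 1)
```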
Countable-state, continuous-time Markov chains are often analyzed through simulation when simple analytical expressions are unavailable. Simulation is typically used to estimate costs or performance measures associated with the chain and also characteristics like state probabilities and mean passage times. Here we consider the problem of estimating derivatives of these types of quantities with respect to a parameter of the process. In particular, we consider the case where some or all transition rates depend on a parameter. We derive derivative estimates of the infinitesimal perturbation analysis type for Markov chains satisfying a simple condition, and argue that the condition has significant scope. The unbiasedness of these estimates may be surprising: a “naive” estimator would fail in our setting. What makes our estimates work is a special construction of parametric families of Markov chains. In addition to proving unbiasedness, we consider a variance reduction technique and make comparisons with derivative estimates based on likelihood ratios.
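The paper's construction for continuous-time chains is more involved; the toy example below only conveys the flavor of an IPA estimator for a rate parameter, differentiating the time of the first of two competing exponential transitions. The rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, mu, n = 1.5, 2.0, 200_000   # illustrative transition rates

# First transition time out of a state with two competing exponential clocks:
# T = min(T1, T2), T1 ~ Exp(theta), T2 ~ Exp(mu).  Writing T1 = E1/theta with
# E1 ~ Exp(1) makes the sample path differentiable in theta:
#   dT/dtheta = -(E1 / theta**2) * 1{T1 < T2}.
e1, t2 = rng.exponential(size=n), rng.exponential(1.0 / mu, size=n)
t1 = e1 / theta
ipa = -(e1 / theta**2) * (t1 < t2)

print("IPA estimate of dE[T]/dtheta:", ipa.mean())
print("exact derivative:            ", -1.0 / (theta + mu) ** 2)
```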
This paper develops formulas for pricing caps and swaptions in Libor market models with jumps. The arbitrage-free dynamics of this class of models were characterized in Glasserman and Kou (2003) in a framework allowing for very general jump processes. For computational purposes, it is convenient to model jump times as Poisson processes; however, the Poisson property is not preserved under the changes of measure commonly used to derive prices in the Libor market model framework. In particular, jumps cannot be Poisson under both a forward measure and the spot measure, and this complicates pricing. To develop pricing formulas, we approximate the dynamics of a forward rate or swap rate using a scalar jump-diffusion process with time-varying parameters. We develop an exact formula for the price of an option on this jump-diffusion through explicit inversion of a Fourier transform. We then use this formula to price caps and swaptions by choosing the parameters of the scalar diffusion to approximate the arbitrage-free dynamics of the underlying forward or swap rate. We apply this method to two classes of models: one in which the jumps in all forward rates are Poisson under the spot measure, and one in which the jumps in each forward rate are Poisson under its associated forward measure. Numerical examples demonstrate the accuracy of the approximations.
A generalized semi-Markov scheme models the structure of a discrete event system, such as a network of queues. By studying combinatorial and geometric representations of schemes we find conditions for second-order properties—convexity/concavity, sub/supermodularity—of their event epochs and event counting processes. A scheme generates a language of feasible strings of events. We show that monotonicity of the event epochs is equivalent to this language forming an antimatroid with repetition. This connection gives rise to a rich combinatorial structure, and serves as a starting point for other properties. For example, by strengthening the antimatroid condition we give several equivalent characterizations of the convexity of event epochs within a scheme. All of these correspond, in slightly different ways, to making a certain score space a lattice, to closing an ordinary antimatroid under intersections. We also establish second-order properties across schemes tied together through a synchronization mechanism. A geometric view based on the score space facilitates verification of these properties in certain queueing systems.
This paper compares methods for computing the distribution of loss from defaults in a credit portfolio. The methods are applied in the Gaussian copula framework for credit risk and take advantage of the conditional independence of defaults in this framework. As a benchmark we use vanilla Monte Carlo simulation to estimate the tail probabilities of the total losses of the credit portfolio. The first method to be compared is a recursive algorithm to obtain the exact distribution of the total loss of the portfolio, conditional on observed values for the systematic risk factors. Then, we apply the saddlepoint approximation to the distribution of the losses, which has proven to give very accurate approximations in the tail. Finally, the method of numerically inverting the Laplace transform of the tail distribution of the losses of the credit portfolio, conditional on observed systematic risk factors, is combined with Euler summation to obtain an approximation. We compare and rank these methods in terms of mean square errors for a fixed computing time. Perhaps surprisingly, we find that vanilla Monte Carlo is hard to beat.
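A small sketch of the conditional-independence structure and the recursive algorithm mentioned above, assuming a one-factor Gaussian copula, unit losses per default, and illustrative parameter values.

```python
import numpy as np
from scipy.stats import norm

m = 50                         # number of obligors (illustrative portfolio)
p = np.full(m, 0.02)           # unconditional default probabilities
rho = 0.3                      # asset correlation in the one-factor Gaussian copula
z = -2.0                       # a stressed value of the systematic factor Z

# Conditional default probabilities given Z = z (defaults are then independent).
p_z = norm.cdf((norm.ppf(p) - np.sqrt(rho) * z) / np.sqrt(1 - rho))

# Recursive (convolution) algorithm for the exact conditional loss distribution,
# assuming a unit loss per default.
dist = np.zeros(m + 1)
dist[0] = 1.0
for pk in p_z:
    new = dist * (1 - pk)      # obligor survives
    new[1:] += dist[:-1] * pk  # obligor defaults, loss increases by one unit
    dist = new
print("P(loss >= 5 | Z = z) =", dist[5:].sum())
```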
By a filtered Monte Carlo estimator we mean one whose constituent parts—summands or integral increments—are conditioned on an increasing family of σ-fields. Unbiased estimators of this type are suggested by compensator identities. Replacing a point-process integrator with its intensity gives rise to one class of examples; exploiting Lévy's formula gives rise to another. We establish variance inequalities complementing compensator identities. Among estimators that are (Stieltjes) stochastic integrals, we show that filtering reduces variance if the integrand and the increments of the integrator have conditional positive correlation. We also provide more primitive hypotheses that ensure this condition, making use of stochastic monotonicity properties. Our most detailed conditions apply in a Markov setting where monotone, up-down, and convex generators play a central role. We give examples. As an application of our results, we compare certain estimators that do and do not exploit the property that Poisson arrivals see time averages.
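A rough illustration (not from the paper) of the compensator idea: in an M/M/1 queue, the number of arrivals that find the server busy can be estimated either by direct counting or by its compensator, the arrival rate integrated over busy time. The printed standard errors allow the two estimators to be compared; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
lam, mu, T, n = 0.8, 1.0, 100.0, 500   # illustrative M/M/1 rates and horizon

direct, filtered = np.empty(n), np.empty(n)
for i in range(n):
    t, q, count, busy_time = 0.0, 0, 0, 0.0
    while t < T:
        rate = lam + (mu if q > 0 else 0.0)
        dt = min(rng.exponential(1.0 / rate), T - t)
        busy_time += dt if q > 0 else 0.0
        t += dt
        if t >= T:
            break
        if rng.random() < lam / rate:      # arrival
            if q > 0:
                count += 1                 # arrival that finds the server busy
            q += 1
        else:                              # departure
            q -= 1
    direct[i] = count                      # point-process estimator
    filtered[i] = lam * busy_time          # compensator (filtered) estimator

print("direct:  ", direct.mean(), "+/-", direct.std(ddof=1) / np.sqrt(n))
print("filtered:", filtered.mean(), "+/-", filtered.std(ddof=1) / np.sqrt(n))
```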
Gradient estimation for regenerative processes. Paul Glasserman and Peter W. Glynn. WSC '92: Proceedings of the 24th Conference on Winter Simulation, December 1992, pages 280–288. https://doi.org/10.1145/167293.167350
Financial risk measurement relies on models of prices and other market variables, but models inevitably rely on imperfect assumptions and estimates, creating model risk. Moreover, optimization decisions, such as portfolio selection, amplify the effect of model error. In this work, we develop a framework for quantifying the impact of model error and for measuring and minimizing risk in a way that is robust to model error. This robust approach starts from a baseline model and finds the worst-case error in risk measurement that would be incurred through a deviation from the baseline model, given a precise constraint on the plausibility of the deviation. Using relative entropy to constrain model distance leads to an explicit characterization of worst-case model errors; this characterization lends itself to Monte Carlo simulation, allowing straightforward calculation of bounds on model error with very little computational effort beyond that required to evaluate performance under the baseline nominal model. This approach goes well beyond the effect of errors in parameter estimates to consider errors in the underlying stochastic assumptions of the model and to characterize the greatest vulnerabilities to error in a model. We apply this approach to problems of portfolio risk measurement, credit risk, delta hedging, and counterparty risk measured through credit valuation adjustment.
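A minimal sketch of the reweighting idea, with a hypothetical loss function and tilt parameter: the worst-case expectation within a relative-entropy ball of the baseline is estimated by exponentially reweighting baseline samples, and the implied entropy budget is computed from the same weights.

```python
import numpy as np

rng = np.random.default_rng(10)
n, theta = 100_000, 1.0        # theta indexes the size of the model deviation (assumed)

# Baseline (nominal) model: loss V(X) with X ~ N(0, 1); the loss function is hypothetical.
x = rng.standard_normal(n)
v = np.maximum(x - 1.0, 0.0)

# The worst case over models within a relative-entropy ball of the baseline is
# attained by a change of measure proportional to exp(theta * V); it can be
# estimated from the baseline sample by reweighting.
w = np.exp(theta * v)
worst_case = (w * v).mean() / w.mean()
entropy_used = (w * np.log(w / w.mean())).mean() / w.mean()   # implied KL divergence

print("baseline expected loss:  ", v.mean())
print("worst-case expected loss:", worst_case)
print("relative entropy used:   ", entropy_used)
```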
We derive conditions under which the increments of a vector process are associated — i.e. under which all pairs of increasing functions of the increments are positively correlated. The process itself is associated if it is generated by a family of associated and monotone kernels. We show that the increments are associated if the kernels are associated and, in a suitable sense, convex. In the Markov case, we note a connection between associated increments and temporal stochastic convexity. Our analysis is motivated by a question in variance reduction: assuming that a normalized process and its normalized compensator converge to the same value, which is the better estimator of that limit? Under some additional hypotheses we show that, for processes with conditionally associated increments, the compensator has smaller asymptotic variance.
We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM algorithm will often overfit returns to the detriment of the topic model. We obtain better out-of-sample performance through a random search of plain LDA models. A branching procedure that reinforces effective topic assignments often performs best. We test these methods on an archive of over 90,000 news articles about S&P 500 firms.
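A toy sketch of the plain-LDA alternative (fit topics unsupervised, then regress returns on topic weights), using scikit-learn on a made-up corpus with made-up returns; it does not implement the random search or branching procedure from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

# Toy corpus standing in for news articles; the returns are invented for illustration.
docs = [
    "earnings beat guidance raised revenue growth",
    "lawsuit settlement regulatory fine penalty",
    "merger acquisition deal synergy premium",
    "earnings miss guidance cut revenue decline",
]
returns = np.array([0.02, -0.01, 0.03, -0.02])

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
topic_weights = lda.transform(counts)          # document-topic proportions

# Second stage: regress returns on topic weights (plain LDA plus regression,
# rather than supervising the topic model on returns directly).
reg = LinearRegression().fit(topic_weights, returns)
print("in-sample R^2:", reg.score(topic_weights, returns))
```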
Computational methods play an important role in modern finance. Through the theory of arbitrage-free pricing, the price of a derivative security can be expressed as the expected value of its payouts under a particular probability measure. The resulting integral becomes quite complicated if there are several state variables or if payouts are path-dependent. Simulation has proved to be a valuable tool for these calculations. This paper summarizes some of the recent applications and developments of the Monte Carlo method to security pricing problems.
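A standard illustration of the expected-value representation: Monte Carlo pricing of a European call under geometric Brownian motion, with illustrative contract parameters.

```python
import numpy as np

rng = np.random.default_rng(11)
s0, k, r, sigma, T, n = 100.0, 105.0, 0.03, 0.2, 1.0, 200_000   # illustrative contract

# Simulate the risk-neutral terminal price, then discount the average payoff.
z = rng.standard_normal(n)
s_T = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
payoff = np.maximum(s_T - k, 0.0)
price = np.exp(-r * T) * payoff.mean()
stderr = np.exp(-r * T) * payoff.std(ddof=1) / np.sqrt(n)
print(f"call price estimate: {price:.4f} +/- {stderr:.4f}")
```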
An optimal coupling is a bivariate distribution with specified marginals achieving maximal correlation. We show that optimal couplings are totally positive and, in fact, satisfy a strictly stronger condition we call the nonintersection property. For discrete distributions we illustrate the equivalence between optimal coupling and a certain transportation problem. Specifically, the optimal solutions of greedily-solvable transportation problems are totally positive, and even nonintersecting, through a rearrangement of matrix entries that results in a Monge sequence. In coupling continuous random variables or random vectors, we exploit a characterization of optimal couplings in terms of subgradients of a closed convex function to establish a generalization of the nonintersection property. We argue that nonintersection is not only stronger than total positivity, it is the more natural concept for the singular distributions that arise in coupling continuous random variables.
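A small numerical illustration of maximal-correlation coupling via a common uniform (the inverse-CDF construction), with two arbitrary marginals chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 100_000
u = rng.random(n)

# Comonotone (inverse-CDF) coupling: driving both marginals with the same
# uniform achieves maximal correlation among couplings of the two marginals.
x = -np.log(1 - u)                 # Exp(1) via its inverse CDF
y = np.sqrt(u)                     # density 2y on [0, 1] via its inverse CDF
print("optimal-coupling corr:    ", np.corrcoef(x, y)[0, 1])

# Independent coupling for comparison.
y_ind = np.sqrt(rng.random(n))
print("independent-coupling corr:", np.corrcoef(x, y_ind)[0, 1])
```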
Born in Sydney, Australia, on April 20, 1939, Chris Heyde shifted his interest from sport to mathematics thanks to inspiration from a schoolteacher. After earning an M.Sc. degree from the University of Sydney and a Ph.D. from the Australian National University (ANU), he began his academic career in the United States at Michigan State University, and then in the United Kingdom at the University of Sheffield and the University of Manchester. In 1968, Chris moved back to Australia to teach at ANU until 1975, when he joined CSIRO, where he was Acting Chief of the Division of Mathematics and Statistics. From 1983 to 1986, he was a Professor and Chairman of the Department of Statistics at the University of Melbourne. Chris then returned to ANU to become the Head of the Statistics Department, and later the Foundation Dean of the School of Mathematical Sciences (now the Mathematical Sciences Institute). Since 1993, he has also spent one semester each year teaching at the Department of Statistics, Columbia University, and has been the director of the Center for Applied Probability at Columbia University since its creation in 1993. Chris has been honored worldwide for his contributions in probability, statistics and the history of statistics. He is a Fellow of the International Statistical Institute and the Institute of Mathematical Statistics, and he is one of three people to be a member of both the Australian Academy of Science and the Australian Academy of Social Sciences. In 2003, he received the Order of Australia from the Australian government. He has been awarded the Pitman Medal and the Hannan Medal. Chris was conferred a D.Sc. honoris causa by University of Sydney in 1998. Chris has been very active in serving the statistical community, including as the Vice President of the International Statistical Institute, President of the Bernoulli Society and Vice President of the Australian Mathematical Society. He has served on numerous editorial boards, most notably as Editor of Stochastic Processes and Their Applications from 1983 to 1989, and as Editor-in-Chief of Journal of Applied Probability and Advances in Applied Probability since 1990. His research has spanned almost all areas of probability and statistics, ranging from random walks to branching processes, from martingales to quasi-likelihood inference, from genetics to option pricing, from queueing theory to long-range dependence. He has edited twelve books, and authored or co-authored three books: I. J. Bienaymé: Statistical Theory Anticipated (1977), with E. Seneta; Martingale Limit Theory and Its Application (1980), with P. Hall; and Quasi-Likelihood and Its Application (1997). Chris Heyde has been an outstanding citizen and leader of the probability and statistics research community. The interview ranges over his education in Australia, moves to the USA and the UK, return to Australia, appointment at Columbia, major research contributions, and professional society and editorial activities. It ends with a look forward in time and some concerned comments about the future for statistics departments.
We develop two methods for estimating derivatives of expectations from simulation of functions whose realizations are discontinuous in the parameter of differentiation. We take as motivating example the estimation of the sensitivity of expected terminal reward for processes on discrete state spaces. Both our methods use conditional expectations to smooth discontinuities. The first smooths the dependence on the differentiation parameter, while the second smooths dependence on the time parameter. The methods are illustrated through examples, including stochastic networks, networks of queues, and Markov processes.
We analyze covariance matrix estimation from the perspective of market risk management, where the goal is to obtain accurate estimates of portfolio risk across essentially all portfolios—even those with small standard deviations. We propose a simple but effective visualisation tool to assess bias across a wide range of portfolios. We employ a portfolio perspective to determine covariance matrix loss functions particularly suitable for market risk management. Proper regularisation of the covariance matrix estimate significantly improves performance. These methods are applied to credit default swaps, for which covariance matrices are used to set portfolio margin requirements for central clearing. Among the methods we test, the graphical lasso estimator performs particularly well. The graphical lasso and a hierarchical clustering estimator also yield economically meaningful representations of market structure through a graphical model and a hierarchy, respectively.
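A hedged sketch of the portfolio-level comparison, using scikit-learn's GraphicalLasso on simulated returns; the dimensions, penalty, and factor structure are illustrative and not the CDS data of the paper.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso, empirical_covariance

rng = np.random.default_rng(13)
p, n = 30, 120                                     # many assets, few observations (illustrative)
true_cov = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)  # equicorrelated returns (assumed)
returns = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

sample_cov = empirical_covariance(returns)
gl = GraphicalLasso(alpha=0.1, max_iter=200).fit(returns)  # penalized precision matrix

# Portfolio-level check: risk of the equally weighted portfolio under each estimate.
w = np.full(p, 1.0 / p)
for name, cov in [("sample", sample_cov), ("graphical lasso", gl.covariance_), ("true", true_cov)]:
    print(f"{name:16s} portfolio stdev: {np.sqrt(w @ cov @ w):.4f}")
```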
Correlation of Markov chains simulated in parallel. Paul Glasserman and Pirooz Vakili. WSC '92: Proceedings of the 24th Conference on Winter Simulation, December 1992, pages 475–482. https://doi.org/10.1145/167293.167404
Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies, about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
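A minimal sketch of the anonymization step, assuming a hypothetical list of company identifiers; the actual sentiment scoring by an LLM is outside the snippet.

```python
import re

# Hypothetical identifiers for one firm; in practice these would come from a
# security-master mapping of names, tickers, and aliases.
identifiers = ["Acme Corp", "Acme", "ACME", "ACM"]

def anonymize(headline, names):
    """Replace company identifiers with a neutral token before sentiment scoring."""
    for name in sorted(names, key=len, reverse=True):   # longest names first
        headline = re.sub(rf"\b{re.escape(name)}\b", "the company", headline)
    return headline

print(anonymize("Acme Corp (ACM) beats earnings forecasts, Acme shares jump", identifiers))
# -> "the company (the company) beats earnings forecasts, the company shares jump"
```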
Regulatory stress tests have become the primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.
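A small linear-model illustration of the recommendation above, under an assumed data-generating process: estimate bank fixed effects, then discard the centered effects and keep the common slope, rather than pooling and letting bank effects contaminate the slope.

```python
import numpy as np

rng = np.random.default_rng(14)
n_banks, per_bank = 5, 400
bank = np.repeat(np.arange(n_banks), per_bank)
alpha = np.linspace(-1.0, 1.0, n_banks)          # bank-specific effects (hypothetical)
x = rng.standard_normal(n_banks * per_bank) + 0.5 * alpha[bank]   # feature correlated with bank
y = 2.0 * x + alpha[bank] + 0.1 * rng.standard_normal(len(x))     # true common slope = 2.0

# Pooled regression (ignores banks): the bank effect loads onto the slope.
X_pooled = np.column_stack([np.ones_like(x), x])
beta_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0]

# Fixed-effects regression: estimate bank intercepts, then discard the centered
# effects, keeping only the grand-mean intercept and the slope for the shared model.
dummies = (bank[:, None] == np.arange(n_banks)).astype(float)
X_fe = np.column_stack([dummies, x])
coef = np.linalg.lstsq(X_fe, y, rcond=None)[0]
common_intercept, slope = coef[:n_banks].mean(), coef[-1]

print("pooled slope:        ", beta_pooled[1])
print("fixed-effects slope: ", slope)            # closer to the true slope of 2.0
print("common intercept:    ", common_intercept)
```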
An American option grants the holder the right to select the time at which to exercise the option, so pricing an American option entails solving an optimal stopping problem. Difficulties in applying standard numerical methods to complex pricing problems have motivated the development of techniques that combine Monte Carlo simulation with dynamic programming. One class of methods approximates the option value at each time using a linear combination of basis functions, and combines Monte Carlo with backward induction to estimate optimal coefficients in each approximation. We analyze the convergence of such a method as both the number of basis functions and the number of simulated paths increase. We get explicit results when the basis functions are polynomials and the underlying process is either Brownian motion or geometric Brownian motion. We show that the number of paths required for worst-case convergence grows exponentially in the degree of the approximating polynomials in the case of Brownian motion and faster in the case of geometric Brownian motion.
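A regression-based backward-induction sketch of the kind analyzed in the paper, using a quadratic polynomial basis for a Bermudan put under geometric Brownian motion; the contract parameters, basis size, and path count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(15)
s0, k, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
steps, n = 50, 20_000
dt = T / steps
disc = np.exp(-r * dt)

# Simulate geometric Brownian motion paths at the exercise dates.
z = rng.standard_normal((n, steps))
paths = s0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

# Backward induction, regressing the continuation value on a polynomial basis.
cash = np.maximum(k - paths[:, -1], 0.0)
for t in range(steps - 2, -1, -1):
    cash *= disc
    s = paths[:, t]
    itm = k - s > 0                                   # regress on in-the-money paths only
    basis = np.column_stack([np.ones(itm.sum()), s[itm], s[itm] ** 2])
    coef = np.linalg.lstsq(basis, cash[itm], rcond=None)[0]
    continuation = basis @ coef
    exercise = k - s[itm]
    ex_now = exercise > continuation
    cash[np.where(itm)[0][ex_now]] = exercise[ex_now]  # exercise where it beats continuation
print("Bermudan put estimate:", disc * cash.mean())
```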
Understanding Linear Classifiers in the Face of Severe Data Imbalance: In “Linear Classifiers Under Infinite Imbalance,” Paul Glasserman and Mike Li tackle the challenge of binary classification when data are severely imbalanced—a common dilemma in fields like healthcare and finance. They build upon the work of Owen by examining the behavior of logistic regression and extending the analysis to a broader class of linear discriminant functions. Their key contribution is the proof of infinite-imbalance limits for these functions’ coefficient vectors, providing explicit expressions for these limits and distinguishing between classifiers with subexponential and exponential weight functions. This distinction allows for a better understanding of how to adjust classifiers in the context of extreme imbalance, ultimately leading to improved specificity or sensitivity in predictions. The authors also link their findings to the concepts of robustness and conservatism in classification decisions, offering insight into optimal classifier design against the most challenging alternatives. The practical implications of their theoretical work are illustrated through numerical examples and a credit risk case study, offering a new perspective on managing classification tasks in the face of infinite imbalance.
We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an empirical loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite almost sure limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left-tail growth rate of the weight function, for which we distinguish two cases: subexponential and exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the subexponential case, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.
We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an expected loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left tail of the weight function, for which we distinguish three cases: bounded, asymptotically polynomial, and asymptotically exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the bounded and polynomial cases, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.
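A numerical illustration of the limit described in the abstracts above, using scikit-learn's logistic regression as the linear classifier: as the majority class grows with the minority sample fixed, the intercept drifts while the remaining coefficients settle. The data-generating distributions are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(16)
minority = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(200, 2))   # fixed minority sample

for n_major in [1_000, 10_000, 100_000]:
    majority = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_major, 2))
    X = np.vstack([majority, minority])
    y = np.concatenate([np.zeros(n_major), np.ones(len(minority))])
    clf = LogisticRegression(C=1e6, max_iter=5_000).fit(X, y)   # effectively unregularized
    print(f"n_major={n_major:>7d}  intercept={clf.intercept_[0]:8.3f}  coef={clf.coef_[0]}")
```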
An increase in the novelty of news predicts negative stock market returns and negative macroeconomic outcomes over the next year. We quantify news novelty (changes in the distribution of news text) through an entropy measure, calculated using a recurrent neural network applied to a large news corpus. Entropy is a better out-of-sample predictor of market returns than a collection of standard measures. Cross-sectional entropy exposure carries a negative risk premium, suggesting that assets that positively covary with entropy hedge the aggregate risk associated with shifting news language. Entropy risk cannot be explained by existing long-short factors.
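The paper's measure uses a recurrent neural network language model on a large news archive; the toy stand-in below only conveys the idea of scoring text novelty by its cross-entropy under a model fit to past text, here a smoothed unigram model on a made-up corpus.

```python
import numpy as np
from collections import Counter

# Toy historical corpus standing in for past news text.
history = "rates steady inflation cooling earnings beat forecasts markets calm".split()
counts = Counter(history)
total = sum(counts.values())
vocab = len(counts)

def cross_entropy(words, alpha=1.0):
    """Average negative log-probability of `words` under an add-alpha unigram model."""
    probs = [(counts[w] + alpha) / (total + alpha * (vocab + 1)) for w in words]
    return -np.mean(np.log(probs))

print("familiar text:", cross_entropy("earnings beat forecasts".split()))
print("novel text:   ", cross_entropy("pandemic lockdown supply shock".split()))
```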
Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many … Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies--companies about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for … Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We also discuss extensions to nonlinear models. This paper was accepted by Kay Giesecke, finance. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.02060 .
Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for … Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We also discuss extensions to nonlinear models. This paper was accepted by Kay Giesecke, finance. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.02060 .
Understanding Linear Classifiers in the Face of Severe Data Imbalance In “Linear Classifiers Under Infinite Imbalance,” Paul Glasserman and Mike Li tackle the challenge of binary classification when data are … Understanding Linear Classifiers in the Face of Severe Data Imbalance In “Linear Classifiers Under Infinite Imbalance,” Paul Glasserman and Mike Li tackle the challenge of binary classification when data are severely imbalanced—a common dilemma in fields like healthcare and finance. They build upon the work of Owen by examining the behavior of logistic regression and extending the analysis to a broader class of linear discriminant functions. Their key contribution is the proof of infinite-imbalance limits for these functions’ coefficient vectors, providing explicit expressions for these limits and distinguishing between classifiers with subexponential and exponential weight functions. This distinction allows for a better understanding of how to adjust classifiers in the context of extreme imbalance, ultimately leading to improved specificity or sensitivity in predictions. The authors also link their findings to the concepts of robustness and conservatism in classification decisions, offering insight into optimal classifier design against the most challenging alternatives. The practical implications of their theoretical work are illustrated through numerical examples and a credit risk case study, offering a new perspective on managing classification tasks in the face of infinite imbalance.
An increase in the novelty of news predicts negative stock market returns and negative macroeconomic outcomes over the next year. We quantify news novelty - changes in the distribution of … An increase in the novelty of news predicts negative stock market returns and negative macroeconomic outcomes over the next year. We quantify news novelty - changes in the distribution of news text - through an entropy measure, calculated using a recurrent neural network applied to a large news corpus. Entropy is a better out-of-sample predictor of market returns than a collection of standard measures. Cross-sectional entropy exposure carries a negative risk premium, suggesting that assets that positively covary with entropy hedge the aggregate risk associated with shifting news language. Entropy risk cannot be explained by existing long-short factors.
Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many … Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies--companies about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many … Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies --- companies about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
Regulatory stress tests have become the primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios … Regulatory stress tests have become the primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.
Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for … Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.
We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of … We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an empirical loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite almost sure limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left-tail growth rate of the weight function, for which we distinguish two cases: subexponential and exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the subexponential case, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.
We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of … We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an expected loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left tail of the weight function, for which we distinguish three cases: bounded, asymptotically polynomial, and asymptotically exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the bounded and polynomial cases, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.
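A minimal way to see the role of the weight function is to fit a linear classifier under heavy imbalance while up-weighting the minority class. The toy example below uses a plain weighted logistic loss via scikit-learn class weights; it is only a special case of the weight-function framework studied in the paper, and the data are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_major, n_minor = 100_000, 200          # heavy imbalance (illustrative sizes)
X0 = rng.normal(loc=0.0, size=(n_major, 2))
X1 = rng.normal(loc=1.5, size=(n_minor, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n_major), np.ones(n_minor)]

# Up-weighting the minority class is one concrete choice of weight function.
clf = LogisticRegression(class_weight={0: 1.0, 1: n_major / n_minor}, max_iter=1000)
clf.fit(X, y)

# As the majority sample grows, the slope coefficients stabilize while the
# intercept drifts; the slope is what the infinite-imbalance limit describes.
print("slope:", clf.coef_.ravel(), "intercept:", clf.intercept_)
```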
We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling … We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM algorithm will often overfit returns to the detriment of the topic model. We obtain better out-of-sample performance through a random search of plain LDA models. A branching procedure that reinforces effective topic assignments often performs best. We test these methods on an archive of over 90,000 news articles about S&P 500 firms.
We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling … We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM algorithm will often overfit returns to the detriment of the topic model. We obtain better out-of-sample performance through a random search of plain LDA models. A branching procedure that reinforces effective topic assignments often performs best. We test methods on an archive of over 90,000 news articles about S&P 500 firms.
We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling … We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM algorithm will often overfit returns to the detriment of the topic model. We obtain better out-of-sample performance through a random search of plain LDA models. A branching procedure that reinforces effective topic assignments often performs best. We test methods on an archive of over 90,000 news articles about S&P 500 firms.
We analyze covariance matrix estimation from the perspective of market risk management, where the goal is to obtain accurate estimates of portfolio risk across essentially all portfolios—even those with small … We analyze covariance matrix estimation from the perspective of market risk management, where the goal is to obtain accurate estimates of portfolio risk across essentially all portfolios—even those with small standard deviations. We propose a simple but effective visualisation tool to assess bias across a wide range of portfolios. We employ a portfolio perspective to determine covariance matrix loss functions particularly suitable for market risk management. Proper regularisation of the covariance matrix estimate significantly improves performance. These methods are applied to credit default swaps, for which covariance matrices are used to set portfolio margin requirements for central clearing. Among the methods we test, the graphical lasso estimator performs particularly well. The graphical lasso and a hierarchical clustering estimator also yield economically meaningful representations of market structure through a graphical model and a hierarchy, respectively.
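The graphical-lasso step can be sketched with scikit-learn's estimator. The simulated one-factor return matrix and the use of cross-validated regularisation below are assumptions of this sketch, not the configuration used in the paper.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(2)
n_days, n_names = 250, 20
# Simulated daily returns (in percent) with a common market factor.
market = rng.normal(scale=1.0, size=(n_days, 1))
idio = rng.normal(scale=1.0, size=(n_days, n_names))
returns = market + idio

# Regularised covariance estimate; a sparse precision matrix encodes a
# graphical model of conditional dependence among the names.
gl = GraphicalLassoCV().fit(returns)
cov_hat = gl.covariance_

portfolio = np.full(n_names, 1.0 / n_names)       # an example long-only portfolio
risk = np.sqrt(portfolio @ cov_hat @ portfolio)   # portfolio standard deviation
print("estimated portfolio risk:", risk)
```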
Financial risk measurement relies on models of prices and other market variables, but models inevitably rely on imperfect assumptions and estimates, creating model risk. Moreover, optimization decisions, such as portfolio … Financial risk measurement relies on models of prices and other market variables, but models inevitably rely on imperfect assumptions and estimates, creating model risk. Moreover, optimization decisions, such as portfolio selection, amplify the effect of model error. In this work, we develop a framework for quantifying the impact of model error and for measuring and minimizing risk in a way that is robust to model error. This robust approach starts from a baseline model and finds the worst-case error in risk measurement that would be incurred through a deviation from the baseline model, given a precise constraint on the plausibility of the deviation. Using relative entropy to constrain model distance leads to an explicit characterization of worst-case model errors; this characterization lends itself to Monte Carlo simulation, allowing straightforward calculation of bounds on model error with very little computational effort beyond that required to evaluate performance under the baseline nominal model. This approach goes well beyond the effect of errors in parameter estimates to consider errors in the underlying stochastic assumptions of the model and to characterize the greatest vulnerabilities to error in a model. We apply this approach to problems of portfolio risk measurement, credit risk, delta hedging, and counterparty risk measured through credit valuation adjustment.
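The worst-case characterization under a relative-entropy constraint amounts to exponentially reweighting the baseline simulation output: the same scenarios are reused, with weights proportional to exp(theta * loss). The sketch below applies this to a simulated loss sample; the baseline model and the tilt parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
loss = rng.normal(size=100_000)          # losses under the baseline (nominal) model

def worst_case_mean(loss, theta):
    """Worst-case expected loss under an exponential change of measure with
    tilt parameter theta; larger theta corresponds to a larger relative-entropy
    budget for model error."""
    w = np.exp(theta * loss)
    w /= w.mean()                        # normalize so the weights average to 1
    mean = np.mean(w * loss)
    rel_entropy = np.mean(w * np.log(w)) # distance of the worst case from the baseline
    return mean, rel_entropy

for theta in (0.0, 0.25, 0.5):
    print(theta, worst_case_mean(loss, theta))
```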
Born in Sydney, Australia, on April 20, 1939, Chris Heyde shifted his interest from sport to mathematics thanks to inspiration from a schoolteacher. After earning an M.Sc. degree from the University of Sydney and a Ph.D. from the Australian National University (ANU), he began his academic career in the United States at Michigan State University, and then in the United Kingdom at the University of Sheffield and the University of Manchester. In 1968, Chris moved back to Australia to teach at ANU until 1975, when he joined CSIRO, where he was Acting Chief of the Division of Mathematics and Statistics. From 1983 to 1986, he was a Professor and Chairman of the Department of Statistics at the University of Melbourne. Chris then returned to ANU to become the Head of the Statistics Department, and later the Foundation Dean of the School of Mathematical Sciences (now the Mathematical Sciences Institute). Since 1993, he has also spent one semester each year teaching at the Department of Statistics, Columbia University, and has been the director of the Center for Applied Probability at Columbia University since its creation in 1993. Chris has been honored worldwide for his contributions in probability, statistics and the history of statistics. He is a Fellow of the International Statistical Institute and the Institute of Mathematical Statistics, and he is one of three people to be a member of both the Australian Academy of Science and the Australian Academy of Social Sciences. In 2003, he received the Order of Australia from the Australian government. He has been awarded the Pitman Medal and the Hannan Medal. Chris was conferred a D.Sc. honoris causa by the University of Sydney in 1998. Chris has been very active in serving the statistical community, including as the Vice President of the International Statistical Institute, President of the Bernoulli Society and Vice President of the Australian Mathematical Society. He has served on numerous editorial boards, most notably as Editor of Stochastic Processes and Their Applications from 1983 to 1989, and as Editor-in-Chief of Journal of Applied Probability and Advances in Applied Probability since 1990. His research has spanned almost all areas of probability and statistics, ranging from random walks to branching processes, from martingales to quasi-likelihood inference, from genetics to option pricing, from queueing theory to long-range dependence. He has edited twelve books, and authored or co-authored three books: I. J. Bienaymé: Statistical Theory Anticipated (1977), with E. Seneta; Martingale Limit Theory and Its Application (1980), with P. Hall; and Quasi-Likelihood and Its Applications (1997). Chris Heyde has been an outstanding citizen and leader of the probability and statistics research community. The interview ranges over his education in Australia, moves to the USA and the UK, return to Australia, appointment at Columbia, major research contributions, and professional society and editorial activities. It ends with a look forward in time and some concerned comments about the future for statistics departments.
This paper compares methods for computing the distribution of loss from defaults in a credit portfolio. The methods are applied in the Gaussian copula framework for credit risk and take … This paper compares methods for computing the distribution of loss from defaults in a credit portfolio. The methods are applied in the Gaussian copula framework for credit risk and take advantage of the conditional independence of defaults in this framework. As a benchmark we use vanilla Monte Carlo simulation to estimate the tail probabilities of the total losses of the credit portfolio. The first method to be compared is a recursive algorithm to obtain the exact distribution of the total loss of the portfolio, conditional on observed values for the systematic risk factors. Then, we apply the saddlepoint approximation to the distribution of the losses, which has proven to give very accurate approximations in the tail. Finally, the method of numerically inverting the Laplace transform of the tail distribution of the losses of the credit portfolio, conditional on observed systematic risk factors, is combined with Euler summation to obtain an approximation. We compare and rank these methods in terms of mean square errors for a fixed computing time. Perhaps surprisingly, we find that vanilla Monte Carlo is hard to beat.
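The recursive algorithm conditions on the systematic factors, under which defaults are independent, and builds the exact conditional loss distribution one obligor at a time. The sketch below assumes unit loss given default and illustrative conditional default probabilities; averaging the conditional tail over simulated factor draws would give the unconditional tail probability.

```python
import numpy as np

def conditional_loss_pmf(pd_conditional):
    """Exact distribution of the number of defaults given the systematic
    factors, assuming conditionally independent obligors with unit exposure."""
    pmf = np.array([1.0])                   # zero obligors: loss = 0 with probability 1
    for p in pd_conditional:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += (1.0 - p) * pmf         # obligor survives
        new[1:] += p * pmf                  # obligor defaults, loss shifts up by 1
        pmf = new
    return pmf

# Illustrative conditional default probabilities for 50 obligors.
pmf = conditional_loss_pmf(np.full(50, 0.02))
print("P(loss >= 5 | factors):", pmf[5:].sum())
```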
A general approach to improving simulation accuracy uses information about auxiliary control variables with known expected values to improve the estimation of unknown quantities. We analyze weighted Monte Carlo estimators … A general approach to improving simulation accuracy uses information about auxiliary control variables with known expected values to improve the estimation of unknown quantities. We analyze weighted Monte Carlo estimators that implement this idea by applying weights to independent replications. The weights are chosen to constrain the weighted averages of the control variables. We distinguish two cases (unbiased and biased), depending on whether the weighted averages of the controls are constrained to equal their expected values or some other values. In both cases, the number of constraints is usually smaller than the number of replications, so there may be many feasible weights. We select maximally uniform weights by minimizing a separable convex function of the weights subject to the control variable constraints. Estimators of this form arise (sometimes implicitly) in several settings, including at least two in finance: calibrating a model to market data (as in the work of Avellaneda et al. 2001) and calculating conditional expectations to price American options. We analyze properties of these estimators as the number of replications increases. We show that in the unbiased case, weighted Monte Carlo reduces asymptotic variance, and that all convex objective functions within a large class produce estimators that are very close to each other in a strong sense. In contrast, in the biased case the choice of objective function does matter. We show explicitly how the choice of objective determines the limit to which the estimator converges.
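With a relative-entropy (maximum-entropy) objective in the unbiased case, the optimal weights take an exponential form in the controls, and the constraint reduces to a smooth convex problem in the multiplier. A small sketch, assuming a single control with known mean zero; the estimand and sample sizes are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)
y = np.exp(z)            # quantity of interest, E[Y] = exp(1/2)
g = z                    # control variable with known mean 0

# Maximum-entropy weights subject to sum(w) = 1 and sum(w * g) = 0 have the
# form w_i proportional to exp(lam * g_i); lam minimizes the convex dual below.
def dual(lam):
    return np.log(np.mean(np.exp(lam * g)))

lam = minimize_scalar(dual).x
w = np.exp(lam * g)
w /= w.sum()

print("weighted MC:", np.sum(w * y), "plain MC:", y.mean(), "target:", np.exp(0.5))
```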
An American option grants the holder the right to select the time at which to exercise the option, so pricing an American option entails solving an optimal stopping problem. Difficulties … An American option grants the holder the right to select the time at which to exercise the option, so pricing an American option entails solving an optimal stopping problem. Difficulties in applying standard numerical methods to complex pricing problems have motivated the development of techniques that combine Monte Carlo simulation with dynamic programming. One class of methods approximates the option value at each time using a linear combination of basis functions, and combines Monte Carlo with backward induction to estimate optimal coefficients in each approximation. We analyze the convergence of such a method as both the number of basis functions and the number of simulated paths increase. We get explicit results when the basis functions are polynomials and the underlying process is either Brownian motion or geometric Brownian motion. We show that the number of paths required for worst-case convergence grows exponentially in the degree of the approximating polynomials in the case of Brownian motion and faster in the case of geometric Brownian motion.
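The class of methods analyzed can be sketched for a Bermudan put on geometric Brownian motion: simulate paths, then work backward in time, regressing continuation values on polynomial basis functions of the current price and comparing with immediate exercise. This is a generic sketch of the approach, not the specific estimator or convergence analysis in the paper; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths, n_steps, degree = 50_000, 50, 3
dt = T / n_steps
disc = np.exp(-r * dt)

# Simulate geometric Brownian motion paths.
z = rng.normal(size=(n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

payoff = lambda s: np.maximum(K - s, 0.0)
value = payoff(S[:, -1])                      # option value at maturity
for t in range(n_steps - 2, -1, -1):
    value *= disc                             # discount one step back
    itm = payoff(S[:, t]) > 0                 # regress only where exercise is relevant
    coeffs = np.polyfit(S[itm, t], value[itm], degree)
    continuation = np.polyval(coeffs, S[itm, t])
    exercise = payoff(S[itm, t])
    value[itm] = np.where(exercise > continuation, exercise, value[itm])

print("Bermudan put estimate:", disc * value.mean())
```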
This paper proves a convergence result for a discretization scheme for simulating jump–diffusion processes with state–dependent jump intensities. With a bound on the intensity, the point process of jump times … This paper proves a convergence result for a discretization scheme for simulating jump–diffusion processes with state–dependent jump intensities. With a bound on the intensity, the point process of jump times can be constructed by thinning a Poisson random measure using state–dependent thinning probabilities. Between the jump epochs of the Poisson random measure, the dynamics of the constructed process are purely diffusive and may be simulated using standard discretization methods. Under conditions on the coefficient functions of the jump–diffusion process, we show that the weak convergence order of this method equals the weak convergence order of the scheme used for the purely diffusive intervals: the construction of jumps does not degrade the convergence of the method.
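The thinning construction can be sketched for a scalar jump-diffusion with a bounded, state-dependent intensity: candidate jump epochs come from a Poisson process with the bounding rate, the diffusion is discretized between candidates with an Euler scheme, and each candidate is accepted with probability lambda(X)/lambda_max. The coefficient functions and jump size below are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative coefficients: dX = -0.5 X dt + 0.3 dW plus unit jumps arriving
# with state-dependent intensity lam(x), bounded above by LAM_MAX.
mu = lambda x: -0.5 * x
sig = lambda x: 0.3
lam = lambda x: 1.0 / (1.0 + x * x)
LAM_MAX = 1.0

def simulate(x0=0.0, T=5.0, dt=0.01):
    x, t = x0, 0.0
    next_candidate = rng.exponential(1.0 / LAM_MAX)           # candidate jump epoch
    while t < T:
        h = min(dt, next_candidate - t, T - t)
        x += mu(x) * h + sig(x) * np.sqrt(h) * rng.normal()   # Euler step between candidates
        t += h
        if np.isclose(t, next_candidate) and t < T:
            if rng.uniform() < lam(x) / LAM_MAX:              # thinning: accept candidate
                x += 1.0                                      # apply the jump
            next_candidate = t + rng.exponential(1.0 / LAM_MAX)
    return x

print(np.mean([simulate() for _ in range(2000)]))
```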
An optimal coupling is a bivariate distribution with specified marginals achieving maximal correlation. We show that optimal couplings are totally positive and, in fact, satisfy a strictly stronger condition we … An optimal coupling is a bivariate distribution with specified marginals achieving maximal correlation. We show that optimal couplings are totally positive and, in fact, satisfy a strictly stronger condition we call the nonintersection property. For discrete distributions we illustrate the equivalence between optimal coupling and a certain transportation problem. Specifically, the optimal solutions of greedily-solvable transportation problems are totally positive, and even nonintersecting, through a rearrangement of matrix entries that results in a Monge sequence. In coupling continuous random variables or random vectors, we exploit a characterization of optimal couplings in terms of subgradients of a closed convex function to establish a generalization of the nonintersection property. We argue that nonintersection is not only stronger than total positivity, it is the more natural concept for the singular distributions that arise in coupling continuous random variables.
An optimal coupling is a bivariate distribution with specified marginals achieving maximal correlation. We show that optimal couplings are totally positive and, in fact, satisfy a strictly stronger condition we … An optimal coupling is a bivariate distribution with specified marginals achieving maximal correlation. We show that optimal couplings are totally positive and, in fact, satisfy a strictly stronger condition we call the nonintersection property. For discrete distributions we illustrate the equivalence between optimal coupling and a certain transportation problem. Specifically, the optimal solutions of greedily-solvable transportation problems are totally positive, and even nonintersecting, through a rearrangement of matrix entries that results in a Monge sequence. In coupling continuous random variables or random vectors, we exploit a characterization of optimal couplings in terms of subgradients of a closed convex function to establish a generalization of the nonintersection property. We argue that nonintersection is not only stronger than total positivity, it is the more natural concept for the singular distributions that arise in coupling continuous random variables.
This paper develops formulas for pricing caps and swaptions in Libor market models with jumps. The arbitrage-free dynamics of this class of models were characterized in Glasserman and Kou (2003) … This paper develops formulas for pricing caps and swaptions in Libor market models with jumps. The arbitrage-free dynamics of this class of models were characterized in Glasserman and Kou (2003) in a framework allowing for very general jump processes. For computational purposes, it is convenient to model jump times as Poisson processes; however, the Poisson property is not preserved under the changes of measure commonly used to derive prices in the Libor market model framework. In particular, jumps cannot be Poisson under both a forward measure and the spot measure, and this complicates pricing. To develop pricing formulas, we approximate the dynamics of a forward rate or swap rate using a scalar jump-diffusion process with time-varying parameters. We develop an exact formula for the price of an option on this jump-diffusion through explicit inversion of a Fourier transform. We then use this formula to price caps and swaptions by choosing the parameters of the scalar diffusion to approximate the arbitrage-free dynamics of the underlying forward or swap rate. We apply this method to two classes of models: one in which the jumps in all forward rates are Poisson under the spot measure, and one in which the jumps in each forward rate are Poisson under its associated forward measure. Numerical examples demonstrate the accuracy of the approximations.
We analyze the performance of a multilevel splitting method for rare event simulation related to one recently proposed in the telecommunications literature. This method splits promising paths into subpaths at … We analyze the performance of a multilevel splitting method for rare event simulation related to one recently proposed in the telecommunications literature. This method splits promising paths into subpaths at intermediate levels to increase the number of observations of a rare event. In our previous paper (1997) we gave sufficient conditions, in specific classes of models, for this method to be asymptotically optimal; here we focus on necessary conditions in a general setting. We show, through a variety of results, the importance of choosing the intermediate thresholds in a way consistent with the most likely path to a rare set, both when the number of levels is fixed and when it increases with the rarity of the event. In the latter case, we give very general necessary conditions based on large deviations rate functions. These indicate that even when the intermediate levels are chosen appropriately, the method will frequently fail to be asymptotically optimal. We illustrate the conditions with examples.
A guiding principle in the efficient estimation of rare-event probabilities by Monte Carlo is that importance sampling based on the change of measure suggested by a large deviations analysis can … A guiding principle in the efficient estimation of rare-event probabilities by Monte Carlo is that importance sampling based on the change of measure suggested by a large deviations analysis can reduce variance by many orders of magnitude. In a variety of settings, this approach has led to estimators that are optimal in an asymptotic sense. We give examples, however, in which importance sampling estimators based on a large deviations change of measure have provably poor performance. The estimators can have variance that decreases at a slower rate than a naive estimator, variance that increases with the rarity of the event, and even infinite variance. For each example, we provide an alternative estimator with provably efficient performance. A common feature of our examples is that they allow more than one way for a rare event to occur; our alternative estimators give explicit weight to lower probability paths neglected by leading-term asymptotics.
An approach to rare event simulation uses the technique of splitting. The basic idea is to split sample paths of the stochastic process into multiple copies when they approach closer … An approach to rare event simulation uses the technique of splitting. The basic idea is to split sample paths of the stochastic process into multiple copies when they approach closer to the rare set; this increases the overall number of hits to the rare set for a given amount of simulation time. This paper analyzes the bias and efficiency of some simple cases of this method.
We give a unified presentation of stability results for stochastic vector difference equations of the form $X_{n+1} = A_{n+1} \otimes X_n \oplus B_{n+1}$, based on various choices of the binary operations $\oplus$ and $\otimes$, assuming that the coefficient pairs $(A_n, B_n)$ are stationary and ergodic. In the scalar case, under standard addition and multiplication, the key condition for stability is $E[\log|A_0|] < 0$. In the generalizations, the condition takes the form $\gamma < 0$, where $\gamma$ is the limit of a subadditive process associated with the coefficients $A_n$. Under this and mild additional conditions, the process has a unique finite stationary distribution to which it converges from all initial conditions. The variants of standard matrix algebra we consider replace the operations + and × with (max, +), (max, ×), (min, +), or (min, ×). In each case, the appropriate stability condition parallels that for the standard recursions, involving certain subadditive limits. Since these limits are difficult to evaluate, we provide bounds, thus giving alternative, computable conditions for stability.
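For instance, with the (max, +) operations the recursion reads $X_{n+1}(i) = \max_j\{A_{n+1}(i,j) + X_n(j)\} \vee B_{n+1}(i)$, and $\gamma$ can be estimated from the growth rate of the (max, +) products of the $A_n$. A small numerical sketch, using i.i.d. coefficients as a special case of the stationary ergodic assumption; the dimensions and distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 3

def maxplus_matvec(A, x):
    """(max, +) matrix-vector product: (A @ x)_i = max_j (A[i, j] + x[j])."""
    return (A + x[None, :]).max(axis=1)

# Run the recursion X_{n+1} = A_{n+1} (x) X_n (+) B_{n+1} with entries of A
# having negative mean, so the subadditive limit gamma is negative (stable case).
X = np.zeros(d)
for _ in range(20_000):
    A = rng.normal(loc=-1.0, scale=0.5, size=(d, d))
    B = rng.normal(size=d)
    X = np.maximum(maxplus_matvec(A, X), B)

# Crude estimate of gamma from the growth rate of m-fold (max, +) products of A's.
x, m = np.zeros(d), 2000
for _ in range(m):
    A = rng.normal(loc=-1.0, scale=0.5, size=(d, d))
    x = maxplus_matvec(A, x)
print("gamma estimate:", x.max() / m, "stationary sample:", X)
```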
Computational methods play an important role in modern finance. Through the theory of arbitrage-free pricing, the price of a derivative security can be expressed as the expected value of its … Computational methods play an important role in modern finance. Through the theory of arbitrage-free pricing, the price of a derivative security can be expressed as the expected value of its payouts under a particular probability measure. The resulting integral becomes quite complicated if there are several state variables or if payouts are path-dependent. Simulation has proved to be a valuable tool for these calculations. This paper summarizes some of the recent applications and developments of the Monte Carlo method to security pricing problems.
We give a unified presentation of stability results for stochastic vector difference equations of the form $X_{n+1} = A_{n+1} \otimes X_n \oplus B_{n+1}$, based on various choices of the binary operations $\oplus$ and $\otimes$, assuming that the coefficient pairs $(A_n, B_n)$ are stationary and ergodic. In the scalar case, under standard addition and multiplication, the key condition for stability is $E[\log|A_0|] < 0$. In the generalizations, the condition takes the form $\gamma < 0$, where $\gamma$ is the limit of a subadditive process associated with the coefficients $A_n$. Under this and mild additional conditions, the process has a unique finite stationary distribution to which it converges from all initial conditions. The variants of standard matrix algebra we consider replace the operations + and × with (max, +), (max, ×), (min, +), or (min, ×). In each case, the appropriate stability condition parallels that for the standard recursions, involving certain subadditive limits. Since these limits are difficult to evaluate, we provide bounds, thus giving alternative, computable conditions for stability.
We analyze the performance of an importance sampling estimator for a rare-event probability in tandem Jackson networks. The rare event we consider corresponds to the network population reaching K before … We analyze the performance of an importance sampling estimator for a rare-event probability in tandem Jackson networks. The rare event we consider corresponds to the network population reaching K before returning to ø, starting from ø, with K large. The estimator we study is based on interchanging the arrival rate and the smallest service rate and is therefore a generalization of the asymptotically optimal estimator for an M/M/1 queue. We examine its asymptotic performance for large K , showing that in certain parameter regions the estimator has an asymptotic efficiency property, but that in other regions it does not. The setting we consider is perhaps the simplest case of a rare-event simulation problem in which boundaries on the state space play a significant role.
By a filtered Monte Carlo estimator we mean one whose constituent parts—summands or integral increments—are conditioned on an increasing family of σ-fields. Unbiased estimators of this type are suggested by … By a filtered Monte Carlo estimator we mean one whose constituent parts—summands or integral increments—are conditioned on an increasing family of σ-fields. Unbiased estimators of this type are suggested by compensator identities. Replacing a point-process integrator with its intensity gives rise to one class of examples; exploiting Levy's formula gives rise to another. We establish variance inequalities complementing compensator identities. Among estimators that are (Stieltjes) stochastic integrals, we show that filtering reduces variance if the integrand and the increments of the integrator have conditional positive correlation. We also provide more primitive hypotheses that ensure this condition, making use of stochastic monotonicity properties. Our most detailed conditions apply in a Markov setting where monotone, up-down, and convex generators play a central role. We give examples. As an application of our results, we compare certain estimators that do and do not exploit the property that Poisson arrivals see time averages.
We derive conditions under which the increments of a vector process are associated — i.e. under which all pairs of increasing functions of the increments are positively correlated. The process … We derive conditions under which the increments of a vector process are associated — i.e. under which all pairs of increasing functions of the increments are positively correlated. The process itself is associated if it is generated by a family of associated and monotone kernels. We show that the increments are associated if the kernels are associated and, in a suitable sense, convex. In the Markov case, we note a connection between associated increments and temporal stochastic convexity. Our analysis is motivated by a question in variance reduction: assuming that a normalized process and its normalized compensator converge to the same value, which is the better estimator of that limit? Under some additional hypotheses we show that, for processes with conditionally associated increments, the compensator has smaller asymptotic variance.
We derive conditions under which the increments of a vector process are associated — i.e. under which all pairs of increasing functions of the increments are positively correlated. The process … We derive conditions under which the increments of a vector process are associated — i.e. under which all pairs of increasing functions of the increments are positively correlated. The process itself is associated if it is generated by a family of associated and monotone kernels. We show that the increments are associated if the kernels are associated and, in a suitable sense, convex. In the Markov case, we note a connection between associated increments and temporal stochastic convexity. Our analysis is motivated by a question in variance reduction: assuming that a normalized process and its normalized compensator converge to the same value, which is the better estimator of that limit? Under some additional hypotheses we show that, for processes with conditionally associated increments, the compensator has smaller asymptotic variance.
A generalized semi-Markov scheme models the structure of a discrete event system, such as a network of queues. By studying combinatorial and geometric representations of schemes we find conditions for … A generalized semi-Markov scheme models the structure of a discrete event system, such as a network of queues. By studying combinatorial and geometric representations of schemes we find conditions for second-order properties—convexity/concavity, sub/supermodularity—of their event epochs and event counting processes. A scheme generates a language of feasible strings of events. We show that monotonicity of the event epochs is equivalent to this language forming an antimatroid with repetition. This connection gives rise to a rich combinatorial structure, and serves as a starting point for other properties. For example, by strengthening the antimatroid condition we give several equivalent characterizations of the convexity of event epochs within a scheme. All of these correspond, in slightly different ways, to making a certain score space a lattice, to closing an ordinary antimatroid under intersections. We also establish second-order properties across schemes tied together through a synchronization mechanism. A geometric view based on the score space facilitates verification of these properties in certain queueing systems.
Countable-state, continuous-time Markov chains are often analyzed through simulation when simple analytical expressions are unavailable. Simulation is typically used to estimate costs or performance measures associated with the chain and also characteristics like state probabilities and mean passage times. Here we consider the problem of estimating derivatives of these types of quantities with respect to a parameter of the process. In particular, we consider the case where some or all transition rates depend on a parameter. We derive derivative estimates of the infinitesimal perturbation analysis type for Markov chains satisfying a simple condition, and argue that the condition has significant scope. The unbiasedness of these estimates may be surprising—a “naive” estimator would fail in our setting. What makes our estimates work is a careful construction of specially structured parametric families of Markov chains. In addition to proving unbiasedness, we consider a variance reduction technique and make comparisons with derivative estimates based on likelihood ratios.
Paul Glasserman and Pirooz Vakili. Correlation of Markov chains simulated in parallel. In WSC '92: Proceedings of the 24th Conference on Winter Simulation, December 1992, pages 475–482. https://doi.org/10.1145/167293.167404
Paul Glasserman and Peter W. Glynn. Gradient estimation for regenerative processes. In WSC '92: Proceedings of the 24th Conference on Winter Simulation, December 1992, pages 280–288. https://doi.org/10.1145/167293.167350
A gradient-estimation procedure for a general class of stochastic discrete-event systems is developed. In contrast to most previous work, the authors focus on performance measures whose realizations are inherently discontinuous (in fact, piecewise constant) functions of the parameter of differentiation. Two broad classes of finite-horizon discontinuous performance measures arising naturally in applications are considered. Because of their discontinuity, these important classes of performance measures are not susceptible to infinitesimal perturbation analysis (IPA). Instead, the authors apply smoothed perturbation analysis, formalizing it and generalizing it in the process. Smoothed perturbation analysis uses conditional expectations to smooth jumps. The resulting gradient estimator involves two factors: the conditional rate at which jumps occur, and the expected effect of a jump. Among the types of performance measures to which the methods can be applied are transient state probabilities, finite-horizon throughputs, distributions on arrival, and expected terminal cost.
We develop two methods for estimating derivatives of expectations from simulation of functions whose realizations are discontinuous in the parameter of differentiation. We take as motivating example the estimation of the sensitivity of expected terminal reward for processes on discrete state spaces. Both our methods use conditional expectations to smooth discontinuities. The first smooths the dependence on the differentiation parameter, while the second smooths dependence on the time parameter. The methods are illustrated through examples, including stochastic networks, networks of queues, and Markov processes.
A decision-theoretic framework is proposed for evaluating the efficiency of simulation estimators. The framework includes the cost of obtaining the estimate as well as the cost of acting based on … A decision-theoretic framework is proposed for evaluating the efficiency of simulation estimators. The framework includes the cost of obtaining the estimate as well as the cost of acting based on the estimate. The cost of obtaining the estimate and the estimate itself are represented as realizations of jointly distributed stochastic processes. In this context, the efficiency of a simulation estimator based on a given computational budget is defined as the reciprocal of the risk (the overall expected cost). This framework is appealing philosophically, but it is often difficult to apply in practice (e.g., to compare the efficiency of two different estimators) because only rarely can the efficiency associated with a given computational budget be calculated. However, a useful practical framework emerges in a large sample context when we consider the limiting behavior as the computational budget increases. A limit theorem established for this model supports and extends a fairly well known efficiency principle, proposed by J. M. Hammersley and D. C. Handscomb: “The efficiency of a Monte Carlo process may be taken as inversely proportional to the product of the sampling variance and the amount of labour expended in obtaining this estimate.”
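The limiting efficiency principle can be checked numerically: for two estimators of the same quantity, compare the product of sampling variance and average work per replication; the smaller product indicates the more efficient estimator for a large budget. The two estimators below (crude versus antithetic sampling for a smooth integrand) are only an illustration of the comparison, not an example from the paper.

```python
import time
import numpy as np

rng = np.random.default_rng(8)
f = lambda u: np.exp(u)                 # estimate E[f(U)], U ~ Uniform(0, 1)

def crude(n):
    return f(rng.uniform(size=n))

def antithetic(n):
    u = rng.uniform(size=n // 2)
    return 0.5 * (f(u) + f(1.0 - u))    # each replication uses two evaluations

for est in (crude, antithetic):
    start = time.perf_counter()
    x = est(1_000_000)
    work_per_rep = (time.perf_counter() - start) / len(x)
    print(est.__name__, "variance x work =", x.var() * work_per_rep)
```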
In this paper we discuss characterizations, basic properties and applications of a partial ordering, in the set of probabilities on a partially ordered Polish space $E$, defined by $P_1 \prec … In this paper we discuss characterizations, basic properties and applications of a partial ordering, in the set of probabilities on a partially ordered Polish space $E$, defined by $P_1 \prec P_2 \operatorname{iff} \int f dP_1\leqq \int f dP_2$ for all real bounded increasing $f$. A result of Strassen implies that $P_1 \prec P_2$ is equivalent to the existence of $E$-valued random variables $X_1 \leqq X_2$ with distributions $P_1$ and $P_2$. After treating similar characterizations we relate the convergence properties of $P_1 \prec P_2 \prec \cdots$ to those of the associated $X_1 \leqq X_2 \leqq \cdots$. The principal purpose of the paper is to apply the basic characterization to the problem of comparison of stochastic processes and to the question of the computation of the $\bar{d}-$distance (defined by Ornstein) of stationary processes. In particular we get a generalization of the comparison theorem of O'Brien to vector-valued processes. The method also allows us to treat processes with continuous time parameter and with paths in $D\lbrack 0, 1\rbrack$.
The problem of using importance sampling to estimate the average time to buffer overflow in a stable GI/GI/m queue is considered. Using the notion of busy cycles, estimation of the expected time to buffer overflow is reduced to the problem of estimating $p_n = P(\text{buffer overflow during a cycle})$, where $n$ is the buffer size. The probability $p_n$ is a large deviations probability ($p_n$ vanishes exponentially fast as $n \to \infty$). A rigorous analysis of the method is presented. It is demonstrated that the exponentially twisted distribution of S. Parekh and J. Walrand (1989) has the following strong asymptotic-optimality property within the nonparametric class of all GI/GI importance sampling simulation distributions. As $n \to \infty$, the computational cost of the optimal twisted distribution of large deviations theory grows less than exponentially fast, and conversely, all other GI/GI simulation distributions incur a computational cost that grows with strictly positive exponential rate.
A theorem of Harris states that a monotone Markov process on a finite partially ordered set has positive correlations at time $t$ (assuming positive correlations at time 0) if and … A theorem of Harris states that a monotone Markov process on a finite partially ordered set has positive correlations at time $t$ (assuming positive correlations at time 0) if and only if each jump of the process is either up or down. A new proof of the sufficiency of the jump condition is presented.
We show that under the (sufficient) conditions usually given for infinitesimal perturbation analysis (IPA) to apply for derivative estimation, a finite-difference scheme with common random numbers (FDC) has the same order of convergence, namely $O(n^{-1/2})$, provided that the size of the finite-difference interval converges to zero fast enough. This holds for both one- and two-sided FDC. This also holds for different variants of IPA, such as some versions of smoothed perturbation analysis (SPA), which is based on conditional expectation. Finally, this also holds for the estimation of steady-state performance measures by truncated-horizon estimators, under some ergodicity assumptions. Our developments do not involve monotonicity, but are based on continuity and smoothness. We give examples and numerical illustrations which show that the actual difference in mean square error (MSE) between IPA and FDC is typically negligible. We also obtain the order of convergence of that difference, which is faster than the convergence of the MSE to zero.
Let $x_1, x_2,\cdots$ be independent random variables which under $P_\theta$ have probability density function of the form $P_\theta\{x_k \in dx\} = \exp(\theta x - \Psi(\theta)) dH(x)$, where $\Psi$ is normalized … Let $x_1, x_2,\cdots$ be independent random variables which under $P_\theta$ have probability density function of the form $P_\theta\{x_k \in dx\} = \exp(\theta x - \Psi(\theta)) dH(x)$, where $\Psi$ is normalized so that $\Psi(0) = \Psi'(0) = 0.$ Let $a \leqq 0 < b, s_n = \sum^n_1 x_k$, and $T = \inf \{n: s_n \not\in (a, b)\}.$ For $u < 0$, an unbiased Monte Carlo estimate of $P_u(s_T \geqq b)$ is the average of independent $P_\theta$-realizations of $I_{\{s_T \geqq b\}} \exp\{(u - \theta)s_T - T(\Psi(u) - \Psi(\theta))\}$. It is shown that the choice $\theta = w$, where $w > 0$ is defined by $\Psi(w) = \Psi(u)$, is an asymptotically (as $b \rightarrow \infty)$ optimal choice of $\theta$ in a sense to be defined. Implications of this result for Monte Carlo studies in sequential analysis are discussed.
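For normal increments this prescription is explicit: with $H = N(0,1)$ the cumulant is $\Psi(\theta) = \theta^2/2$, so the conjugate root of $\Psi(w) = \Psi(u)$ is $w = -u$, and the importance sampling estimator simulates the walk with the sign of the drift flipped and weights each replication by the likelihood ratio. A sketch under these assumptions, with illustrative boundary values:

```python
import numpy as np

rng = np.random.default_rng(9)
u, a, b = -0.25, -20.0, 20.0   # negative drift and two-sided boundary (illustrative)
w = -u                          # conjugate twisting parameter: Psi(w) = Psi(u) for N(mean, 1)

def one_replication():
    s = 0.0
    while a < s < b:
        s += rng.normal(loc=w)  # simulate under the twisted (positive-drift) measure
    # Likelihood ratio exp((u - w) s_T - T (Psi(u) - Psi(w))) reduces to exp((u - w) s_T)
    # because Psi(u) = Psi(w); the indicator keeps only paths that exit at the upper boundary.
    return (s >= b) * np.exp((u - w) * s)

est = np.mean([one_replication() for _ in range(20_000)])
print("estimate of P_u(s_T >= b):", est)
```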
Preface 1. Monte Carlo methods and Quasi-Monte Carlo methods 2. Quasi-Monte Carlo methods for numerical integration 3. Low-discrepancy point sets and sequences 4. Nets and (t,s)-sequences 5. Lattice rules for numerical integration 6. Quasi-Monte Carlo methods for optimization 7. Random numbers and pseudorandom numbers 8. Nonlinear congruential pseudorandom numbers 9. Shift-register pseudorandom numbers 10. Pseudorandom vector generation Appendix A. Finite fields and linear recurring sequences Appendix B. Continued fractions Bibliography Index.
We analyze the performance of an importance sampling estimator for a rare-event probability in tandem Jackson networks. The rare event we consider corresponds to the network population reaching K before … We analyze the performance of an importance sampling estimator for a rare-event probability in tandem Jackson networks. The rare event we consider corresponds to the network population reaching K before returning to ø, starting from ø, with K large. The estimator we study is based on interchanging the arrival rate and the smallest service rate and is therefore a generalization of the asymptotically optimal estimator for an M/M/1 queue. We examine its asymptotic performance for large K , showing that in certain parameter regions the estimator has an asymptotic efficiency property, but that in other regions it does not. The setting we consider is perhaps the simplest case of a rare-event simulation problem in which boundaries on the state space play a significant role.
We study the links between the likelihood-ratio (LR) gradient-estimation technique (sometimes called the score-function (SF) method), and infinitesimal perturbation analysis (IPA). We show how IPA can be viewed as a … We study the links between the likelihood-ratio (LR) gradient-estimation technique (sometimes called the score-function (SF) method), and infinitesimal perturbation analysis (IPA). We show how IPA can be viewed as a (degenerate) special case of the LR and SF techniques by selecting an appropriate representation of the underlying sample space for a given simulation experiment. We also show how different definitions of the sample space yield different variants of the LR method, some of them mixing IPA with more straightforward LR. We illustrate this by many examples. We also give sufficient conditions under which the gradient estimators are unbiased.
The expectations of certain integrals of functionals of continuous-time Markov chains over a finite horizon, fixed or random, are estimated via simulation. By computing conditional expectations given the sequence of … The expectations of certain integrals of functionals of continuous-time Markov chains over a finite horizon, fixed or random, are estimated via simulation. By computing conditional expectations given the sequence of states visited (and possibly other information), variance is reduced. This is discrete-time conversion. Efficiency is increased further by combining discrete-time conversion with stratification and splitting.
Monte Carlo simulation is one alternative for analyzing options markets when the assumptions of simpler analytical models are violated. We introduce techniques for the sensitivity analysis of option pricing, which … Monte Carlo simulation is one alternative for analyzing options markets when the assumptions of simpler analytical models are violated. We introduce techniques for the sensitivity analysis of option pricing, which can be efficiently carried out in the simulation. In particular, using these techniques, a single run of the simulation would often provide not only an estimate of the option value but also estimates of the sensitivities of the option value to various parameters of the model. Both European and American options are considered, starting with simple analytically tractable models to present the idea and proceeding to more complicated examples. We then propose an approach for the pricing of options with early exercise features by incorporating the gradient estimates in an iterative stochastic approximation algorithm. The procedure is illustrated in a simple example estimating the option value of an American call. Numerical results indicate that the additional computational effort required over that required to estimate a European option is relatively small.
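For a European call under geometric Brownian motion the pathwise sensitivity estimate is explicit: differentiating the discounted payoff along each path gives $e^{-rT}\,1\{S_T > K\}\,S_T/S_0$, so the delta comes from the same simulated paths used for the price. The sketch below shows this standard pathwise estimator as an illustration of the idea; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(10)
S0, K, r, sigma, T, n = 100.0, 105.0, 0.05, 0.2, 1.0, 200_000

Z = rng.normal(size=n)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
disc_payoff = np.exp(-r * T) * np.maximum(ST - K, 0.0)

price = disc_payoff.mean()
# Pathwise derivative of the discounted payoff with respect to S0.
delta = np.mean(np.exp(-r * T) * (ST > K) * ST / S0)
print("price:", price, "delta:", delta)
```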
The Construction, and Other General Results.- Some Basic Tools.- Spin Systems.- Stochastic Ising Models.- The Voter Model.- The Contact Process.- Nearest-Particle Systems.- The Exclusion Process.- Linear Systems with Values in $[0, \infty)^S$.
The backfitting algorithm is an iterative procedure for fitting additive models in which, at each step, one component is estimated keeping the other components fixed, the algorithm proceeding component by component and iterating until convergence. Convergence of the algorithm has been studied by Buja, Hastie, and Tibshirani (1989). We give a simple, but more general, geometric proof of the convergence of the backfitting algorithm when the additive components are estimated by penalized least squares. Our treatment covers spline smoothers and structural time series models, and we give a full discussion of the degenerate case. Our proof is based on Halperin's (1962) generalization of von Neumann's alternating projection theorem.
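A minimal backfitting loop for a two-component additive model is sketched below, using simple polynomial least-squares fits as the (linear) smoothers. The smoother choice, convergence tolerance, and simulated data are illustrative; the paper's setting of penalized least squares and spline smoothers is more general.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2000
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = np.sin(np.pi * x1) + x2**2 + rng.normal(scale=0.1, size=n)

def smooth(x, r, degree=5):
    """A simple linear smoother: polynomial least-squares fit of the residual r on x."""
    return np.polyval(np.polyfit(x, r, degree), x)

f1, f2 = np.zeros(n), np.zeros(n)
alpha = y.mean()
for _ in range(50):                               # iterate component by component
    f1_new = smooth(x1, y - alpha - f2)
    f1_new -= f1_new.mean()                       # center each component
    f2_new = smooth(x2, y - alpha - f1_new)
    f2_new -= f2_new.mean()
    if max(np.abs(f1_new - f1).max(), np.abs(f2_new - f2).max()) < 1e-8:
        f1, f2 = f1_new, f2_new
        break
    f1, f2 = f1_new, f2_new

print("residual standard deviation:", np.std(y - alpha - f1 - f2))
```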
Pathwise probabilistic arguments are used to justify a simple rule of thumb by which buffer allocation can be carried out. The model for the underlying network is the skeleton of an open Jackson network. The problem of how to distribute in the best possible way a fixed number N of available buffer spaces among the nodes of the network is considered. The goal is to optimize some performance criterion associated with the time to buffer overflow, such as its mean or the probability that it exceeds some value. It is argued that for any such performance criterion the assignment should be done roughly in inverse proportion to the logarithms of the effective service rates at the nodes. Effective service means the ratio of the service rate to the stationary arrival rate at the node in the network with infinite buffers.
We empirically compare the accuracy and speed of the low-discrepancy sequence generators of Sobol' and Faure. These generators are useful for multidimensional integration and global optimization. We discuss our implementation of the Sobol' generator.
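A present-day version of this kind of comparison can be run with SciPy's quasi-Monte Carlo module, which provides a Sobol' generator (Faure is not included). The test integrand below is a standard product function with known integral, not the one used in the paper.

```python
import numpy as np
from scipy.stats import qmc

d, m = 5, 14                     # dimension and log2 of the number of points
f = lambda u: np.prod(1.0 + (u - 0.5) / 3.0, axis=1)   # exact integral over [0,1]^d is 1

sobol = qmc.Sobol(d=d, scramble=True, seed=0)
u_qmc = sobol.random_base2(m=m)                         # 2**m Sobol' points
u_mc = np.random.default_rng(0).uniform(size=(2**m, d)) # plain Monte Carlo points

print("Sobol' error:", abs(f(u_qmc).mean() - 1.0))
print("plain MC error:", abs(f(u_mc).mean() - 1.0))
```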
In this paper, we discuss some research issues related to the general topic of optimizing a stochastic system via simulation. In particular, we devote extensive attention to finite-difference estimators of objective function gradients and present a number of new limit theorems. We also discuss a new family of orthogonal function approximations to the global behavior of the objective function. We show that if the objective function is sufficiently smooth, the convergence rate can be made arbitrarily close to $n^{-1/2}$ in the number of observations required. The paper concludes with a brief discussion of how these ideas can be integrated into an optimization algorithm.
A gradient-estimation procedure for a general class of stochastic discrete-event systems is developed. In contrast to most previous work, the authors focus on performance measures whose realizations are inherently discontinuous (in fact, piecewise constant) functions of the parameter of differentiation. Two broad classes of finite-horizon discontinuous performance measures arising naturally in applications are considered. Because of their discontinuity, these important classes of performance measures are not susceptible to infinitesimal perturbation analysis (IPA). Instead, the authors apply smoothed perturbation analysis, formalizing it and generalizing it in the process. Smoothed perturbation analysis uses conditional expectations to smooth jumps. The resulting gradient estimator involves two factors: the conditional rate at which jumps occur, and the expected effect of a jump. Among the types of performance measures to which the methods can be applied are transient state probabilities, finite-horizon throughputs, distributions on arrival, and expected terminal cost.
A generalized semi-Markov scheme models the structure of a discrete event system, such as a network of queues. By studying combinatorial and geometric representations of schemes we find conditions for … A generalized semi-Markov scheme models the structure of a discrete event system, such as a network of queues. By studying combinatorial and geometric representations of schemes we find conditions for second-order properties—convexity/concavity, sub/supermodularity—of their event epochs and event counting processes. A scheme generates a language of feasible strings of events. We show that monotonicity of the event epochs is equivalent to this language forming an antimatroid with repetition. This connection gives rise to a rich combinatorial structure, and serves as a starting point for other properties. For example, by strengthening the antimatroid condition we give several equivalent characterizations of the convexity of event epochs within a scheme. All of these correspond, in slightly different ways, to making a certain score space a lattice, to closing an ordinary antimatroid under intersections. We also establish second-order properties across schemes tied together through a synchronization mechanism. A geometric view based on the score space facilitates verification of these properties in certain queueing systems.
Estimation of the large deviations probability $p_n = P(S_n \ge \gamma n)$ via importance sampling is considered, where $S_n$ is a sum of $n$ i.i.d. random variables. It has been previously shown that within the nonparametric candidate family of all i.i.d. (or, more generally, Markov) distributions, the optimized exponentially twisted distribution is the unique asymptotically optimal sampling distribution. As $n \to \infty$, the sampling cost required to stabilize the normalized variance grows with strictly positive exponential rate for any suboptimal sampling distribution, while this sampling cost for the optimal exponentially twisted distribution is only $O(n^{1/2})$. Here, it is established that the optimality is actually much stronger. As $n \to \infty$, this solution simultaneously stabilizes all error moments of both the sample mean and the sample variance estimators with sampling cost $O(n^{1/2})$. In addition, it is shown that the embedded parametric family of exponentially twisted distributions has a certain uniform asymptotic stability property. The technique is stable even if the optimal twisting parameter(s) cannot be precisely determined.
In this paper we consider the multivariate equation $X_{n+1} = A_{n+1}X_n + B_{n+1}$ with i.i.d. coefficients which have only a logarithmic moment. We give a necessary and sufficient condition for existence of a strictly stationary solution independent of the future. As an application we characterize the multivariate ARMA equations with general noise which have such a solution.
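A scalar sketch of the phenomenon, under the illustrative assumptions that $A$ is lognormal with $E[\log A] < 0$ and $B$ is Gaussian: in one dimension a stationary solution exists under a negative log-moment drift and a logarithmic moment on $B$, and its law can be sampled by truncating the almost surely convergent backward series $B_1 + A_1 B_2 + A_1 A_2 B_3 + \cdots$.

```python
# Scalar illustration of the stationary solution of X_{n+1} = A_{n+1} X_n + B_{n+1}.
# Here A is lognormal with E[log A] = -0.5 < 0 and B is standard normal; draws from the
# stationary law are obtained by truncating the series X = B_1 + A_1 B_2 + A_1 A_2 B_3 + ...
import math, random, statistics

random.seed(3)

def stationary_draw(terms=200):
    x, prod = 0.0, 1.0
    for _ in range(terms):
        a = math.exp(random.gauss(-0.5, 0.3))   # E[log A] = -0.5 < 0
        b = random.gauss(0.0, 1.0)
        x += prod * b                            # add A_1 ... A_{k-1} * B_k
        prod *= a
    return x

sample = [stationary_draw() for _ in range(5_000)]
print("stationary mean ~", round(statistics.mean(sample), 3),
      " stdev ~", round(statistics.stdev(sample), 3))
```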
A constrained Monte Carlo problem arises when one computes an expectation in the presence of a priori computable constraints on the expectations of quantities that are correlated with the estimand. This paper discusses different application settings in which such constrained Monte Carlo computations arise, and establishes a close connection with the method of control variates when the constraints are of equality form.
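A minimal sketch of the equality-constraint case reduced to a classical control variate, assuming the toy estimand $E[e^U]$ with $U$ uniform on $(0,1)$ and the constraint $E[U] = 1/2$; the adjustment coefficient is estimated from the same sample.

```python
# Control-variate sketch for an equality constraint E[Y] = known value.
# Estimand: E[exp(U)] with U ~ Uniform(0,1) (true value e - 1); control: Y = U with
# E[Y] = 1/2.  The adjusted estimator is mean(f) - b*(mean(U) - 1/2), with b the sample
# covariance of f and U divided by the sample variance of U.
import math, random

random.seed(4)
n = 100_000
u = [random.random() for _ in range(n)]
f = [math.exp(x) for x in u]

mean_u = sum(u) / n
mean_f = sum(f) / n
cov_fu = sum((fi - mean_f) * (ui - mean_u) for fi, ui in zip(f, u)) / (n - 1)
var_u = sum((ui - mean_u) ** 2 for ui in u) / (n - 1)
b = cov_fu / var_u                      # estimated optimal control-variate coefficient

naive = mean_f
cv = naive - b * (mean_u - 0.5)         # exploit the equality constraint E[U] = 1/2
print(f"naive {naive:.5f}   control-variate {cv:.5f}   exact {math.e - 1:.5f}")
```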
A guiding principle in the efficient estimation of rare-event probabilities by Monte Carlo is that importance sampling based on the change of measure suggested by a large deviations analysis can reduce variance by many orders of magnitude. In a variety of settings, this approach has led to estimators that are optimal in an asymptotic sense. We give examples, however, in which importance sampling estimators based on a large deviations change of measure have provably poor performance. The estimators can have variance that decreases at a slower rate than a naive estimator, variance that increases with the rarity of the event, and even infinite variance. For each example, we provide an alternative estimator with provably efficient performance. A common feature of our examples is that they allow more than one way for a rare event to occur; our alternative estimators give explicit weight to lower probability paths neglected by leading-term asymptotics.
We develop two methods for estimating derivatives of expectations from simulation of functions whose realizations are discontinuous in the parameter of differentiation. We take as motivating example the estimation of the sensitivity of expected terminal reward for processes on discrete state spaces. Both our methods use conditional expectations to smooth discontinuities. The first smooths the dependence on the differentiation parameter, while the second smooths dependence on the time parameter. The methods are illustrated through examples, including stochastic networks, networks of queues, and Markov processes.
An approach to rare event simulation uses the technique of splitting. The basic idea is to split sample paths of the stochastic process into multiple copies when they approach closer to the rare set; this increases the overall number of hits to the rare set for a given amount of simulation time. This paper analyzes the bias and efficiency of some simple cases of this method.
This article describes an efficient technique for estimating, via simulation, the probability of buffer overflows in a queueing model that arises in the analysis of ATM (Asynchronous Transfer Mode) communication switches. There are multiple streams of (autocorrelated) traffic feeding the switch that has a buffer of finite capacity. Each stream is designated as being of either high or low priority. When the queue length reaches a certain threshold, only high priority packets are admitted to the switch's buffer. The problem is to estimate the loss rate of high priority packets. An asymptotically optimal importance sampling approach is developed for this rare event simulation problem. In this approach, the importance sampling is done in two distinct phases. In the first phase, an importance sampling change of measure is used to bring the queue length up to the threshold at which low priority packets get rejected. In the second phase a different importance sampling change of measure is used to move the queue length from the threshold to the buffer capacity.
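The two-phase idea can be sketched on a much simpler model than the ATM switch: a single-class birth–death chain in which phase 1 (below a threshold) and phase 2 (above it) each carry their own change of measure, with the likelihood ratio accumulated across both phases. All parameters below are illustrative, and in this toy case both phases happen to use the same interchange of the up and down probabilities.

```python
# Two-phase importance sampling sketch on a birth-death chain (a toy stand-in for the
# ATM model): estimate P(reach B before 0 | start at 1) when the up-probability p < 1/2.
# Phase 1 (state < T) and phase 2 (state >= T) each use their own sampling up-probability;
# the likelihood ratio is accumulated step by step across both phases.
import random

random.seed(5)
p, B, T = 0.3, 25, 12
r1, r2 = 1.0 - p, 1.0 - p          # sampling up-probabilities for phases 1 and 2
reps = 20_000

est = 0.0
for _ in range(reps):
    x, w = 1, 1.0
    while 0 < x < B:
        r = r1 if x < T else r2     # phase depends on the current level
        if random.random() < r:
            w *= p / r              # up-step: original prob p, sampling prob r
            x += 1
        else:
            w *= (1.0 - p) / (1.0 - r)
            x -= 1
    if x == B:
        est += w
est /= reps

rho = (1.0 - p) / p                 # gambler's ruin: exact hitting probability from state 1
exact = (rho - 1.0) / (rho ** B - 1.0)
print(f"IS estimate {est:.3e}   exact {exact:.3e}")
```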
If $B_n$ denotes the time of the first birth in the $n$th generation of an age-dependent branching process of Crump-Mode type, then under a weak condition there is a constant $\gamma$ such that $B_n/n \rightarrow \gamma$ as $n \rightarrow \infty$, almost surely on the event of ultimate survival. This strengthens a result of Hammersley, who proved convergence in probability for the more special Bellman-Harris process. The proof depends on a class of martingales which arise from a `collective marks' argument.
In this note we deal with the stochastic difference equation of the form $Y_{n+1} = A_n Y_n + B_n$, $n \in \mathbb{Z}$, where the sequence of coefficients is assumed to be strictly stationary and ergodic. By means of simple arguments a unique stationary solution of this equation is constructed. The stability of the stationary solution is the second subject of investigation. It is shown that under some additional assumptions
In regression analysis the response variable $Y$ and the predictor variables $X_1, \ldots, X_p$ are often replaced by functions $\theta(Y)$ and $\phi_1(X_1), \ldots, \phi_p(X_p)$. We discuss a procedure for estimating those functions $\theta^*$ and $\phi_1^*, \ldots, \phi_p^*$ that minimize $e^2 = E\{[\theta(Y) - \sum_j \phi_j(X_j)]^2\}/\operatorname{var}[\theta(Y)]$, given only a sample $\{(y_k, x_{k1}, \ldots, x_{kp}),\ 1 \leq k \leq N\}$ and making minimal assumptions concerning the data distribution or the form of the solution functions. For the bivariate case, $p = 1$, $\theta^*$ and $\phi^*$ satisfy $\rho^* = \rho(\theta^*, \phi^*) = \max_{\theta,\phi} \rho[\theta(Y), \phi(X)]$, where $\rho$ is the product moment correlation coefficient and $\rho^*$ is the maximal correlation between $X$ and $Y$. Our procedure thus also provides a method for estimating the maximal correlation between two variables.
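A crude, binned sketch of the alternating-conditional-expectations idea (not the paper's full procedure), assuming quantile bins and the toy relationship $Y \approx X^2$, for which the Pearson correlation is near zero while the maximal correlation is close to one.

```python
# Binned sketch of alternating conditional expectations for maximal correlation.
# Iterate phi(X) <- E[theta(Y) | X-bin] and theta(Y) <- E[phi(X) | Y-bin], standardizing
# theta each sweep; corr(theta(Y), phi(X)) then estimates the maximal correlation.
import numpy as np

rng = np.random.default_rng(6)
n, n_bins = 20_000, 50
x = rng.standard_normal(n)
y = x ** 2 + 0.1 * rng.standard_normal(n)      # Pearson corr ~ 0, maximal corr ~ 1

def bin_ids(v):
    edges = np.quantile(v, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(v, edges)

def cond_mean(values, ids):
    sums = np.bincount(ids, weights=values, minlength=n_bins)
    counts = np.bincount(ids, minlength=n_bins)
    return sums / np.maximum(counts, 1)

bx, by = bin_ids(x), bin_ids(y)
theta = (y - y.mean()) / y.std()               # initial theta(Y)
for _ in range(20):
    phi = cond_mean(theta, bx)[bx]             # phi(X) = E[theta(Y) | X]
    theta = cond_mean(phi, by)[by]             # theta(Y) = E[phi(X) | Y]
    theta = (theta - theta.mean()) / theta.std()
phi = cond_mean(theta, bx)[bx]                 # final phi consistent with final theta

print("Pearson corr(X, Y):", round(np.corrcoef(x, y)[0, 1], 3))
print("estimated maximal correlation:", round(np.corrcoef(theta, phi)[0, 1], 3))
```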
In this paper, we introduce a notion of barycenter in the Wasserstein space which generalizes McCann's interpolation to the case of more than two measures. We provide existence, uniqueness, characterizations, and regularity of the barycenter and relate it to the multimarginal optimal transport problem considered by Gangbo and Święch in [Comm. Pure Appl. Math., 51 (1998), pp. 23–45]. We also consider some examples and, in particular, rigorously solve the Gaussian case. We finally discuss convexity of functionals in the Wasserstein space.
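For one-dimensional inputs there is a standard shortcut that makes the barycenter easy to compute and check: its quantile function is the weighted average of the inputs' quantile functions, which for Gaussian inputs reproduces the Gaussian case mentioned above. The weights and parameters in the sketch below are illustrative.

```python
# 1-D Wasserstein-2 barycenter via quantile averaging: the barycenter's quantile function
# is the weighted average of the inputs' quantile functions.  For Gaussian inputs this
# gives N(sum w_i m_i, (sum w_i s_i)^2), verified here on a grid of quantile levels.
from statistics import NormalDist

inputs = [(NormalDist(0.0, 1.0), 0.5), (NormalDist(3.0, 2.0), 0.3), (NormalDist(-1.0, 0.5), 0.2)]
levels = [k / 200 for k in range(1, 200)]

bary_quantiles = [sum(w * d.inv_cdf(u) for d, w in inputs) for u in levels]

m = sum(w * d.mean for d, w in inputs)       # weighted mean of the means
s = sum(w * d.stdev for d, w in inputs)      # weighted mean of the standard deviations
closed_form = [NormalDist(m, s).inv_cdf(u) for u in levels]

max_err = max(abs(a - b) for a, b in zip(bary_quantiles, closed_form))
print(f"barycenter ~ N({m}, {s}^2); max quantile discrepancy {max_err:.2e}")
```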
The likelihood ratio method for gradient estimation is briefly surveyed. Two application settings are described, namely Monte Carlo optimization and statistical analysis of complex stochastic systems. Steady-state gradient estimation is emphasized, and both regenerative and non-regenerative approaches are given. The paper also indicates how these methods apply to general discrete-event simulations; the idea is to view such systems as general state space Markov chains.
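A minimal sketch of the likelihood ratio (score function) estimator in its simplest form, assuming $X$ exponential with rate $\theta$ and the toy performance function $f(X) = X$; the true derivative $-1/\theta^2$ is available for comparison.

```python
# Likelihood ratio (score function) gradient estimator: d/dtheta E_theta[f(X)] is
# estimated by f(X) * d/dtheta log f_theta(X).  Here X ~ Exponential(rate theta) with
# log-density log(theta) - theta*x, so the score is 1/theta - X.  With f(X) = X the
# true derivative is d/dtheta (1/theta) = -1/theta^2.
import random

random.seed(7)
theta, n = 2.0, 500_000

grad = 0.0
for _ in range(n):
    x = random.expovariate(theta)
    grad += x * (1.0 / theta - x)      # f(X) * score
grad /= n

print(f"LR estimate {grad:.4f}   exact {-1.0 / theta**2:.4f}")
```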
Algorithm 647: Implementation and Relative Efficiency of Quasirandom Sequence Generators. Bennett L. Fox, Computer Science Department, University of Montreal. ACM Transactions on Mathematical Software, Volume 12, Issue 4, pp. 362–376, December 1986. https://doi.org/10.1145/22721.356187
Let $X_1, X_2, \cdots$ be a strictly stationary second order sequence which is "associated"; i.e., is such that any two coordinatewise nondecreasing functions of the $X_i$'s (of finite variance) are … Let $X_1, X_2, \cdots$ be a strictly stationary second order sequence which is "associated"; i.e., is such that any two coordinatewise nondecreasing functions of the $X_i$'s (of finite variance) are nonnegatively correlated. If $\sum_j \operatorname{Cov}(X_1, X_j) < \infty$, then the partial sum processes, $W_n(t)$, defined in the usual way so that $W_n(m/n) = (X_1 + \cdots + X_m - mE(X_1))/\sqrt n$ for $m = 1, 2, \cdots$, converge in distribution on $C\lbrack 0, T\rbrack$ to a Wiener process. This result is based on two general theorems concerning associated random variables which are of independent interest.
Bivariate distributions with minimum and maximum correlations for given marginal distributions are characterized. Such extremal distributions were first introduced by Hoeffding (1940) and Fréchet (1951). Several proofs are outlined including ones based on rearrangement theorems. The effect of convolution on correlation is also studied. Convolution makes arbitrary correlations less extreme while convolution of identical measures on $R^2$ makes extreme correlations more extreme. Extreme correlations have applications in data analysis and variance reduction in Monte Carlo studies, especially in the technique of antithetic variates.
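A quick empirical illustration of the extremal couplings, assuming exponential and lognormal marginals (chosen arbitrarily): pairing the two samples in the same sorted order attains the maximum correlation and pairing them in opposite order attains the minimum, the same mechanism exploited by antithetic variates.

```python
# Empirical Hoeffding-Frechet extremes: for fixed marginals, pairing the samples in the
# same order maximizes correlation and pairing them in opposite order minimizes it.
import math, random

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = math.sqrt(sum((y - mb) ** 2 for y in b) / n)
    return cov / (sa * sb)

random.seed(8)
n = 50_000
x = [random.expovariate(1.0) for _ in range(n)]              # Exponential(1) marginal
y = [math.exp(random.gauss(0.0, 1.0)) for _ in range(n)]     # Lognormal(0,1) marginal

xs, ys = sorted(x), sorted(y)
print("independent pairing :", round(corr(x, y), 3))
print("comonotone (max)    :", round(corr(xs, ys), 3))
print("antimonotone (min)  :", round(corr(xs, ys[::-1]), 3))
```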
Let $E$ be a finite partially ordered set and $M_p$ the set of probability measures in $E$ giving a positive correlation to each pair of increasing functions on $E$. Given a Markov process with state space $E$ whose transition operator (on functions) maps increasing functions into increasing functions, let $U_t$ be the transition operator on measures. In order that $U_tM_p \subset M_p$ for each $t \geqq 0$, it is necessary and sufficient that every jump of the sample paths is up or down.
This paper introduces and illustrates a new version of the Monte Carlo method that has attractive properties for the numerical valuation of derivatives. The traditional Monte Carlo method has proven to be a powerful and flexible tool for many types of derivatives calculations. Under the conventional approach, pseudo-random numbers are used to evaluate the expression of interest. Unfortunately, the use of pseudo-random numbers yields an error bound that is probabilistic, which can be a disadvantage. Another drawback of the standard approach is that many simulations may be required to obtain a high level of accuracy. There are several ways to improve the convergence of the standard method. This paper suggests a new approach which promises to be very useful for applications in finance. Quasi-Monte Carlo methods use sequences that are deterministic instead of random. These sequences improve convergence and give rise to deterministic error bounds. The method is explained and illustrated with several examples. These examples include complex derivatives such as basket options, Asian options, and energy swaps.
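A minimal one-dimensional sketch of the quasi-Monte Carlo approach, assuming a European call under Black–Scholes with illustrative parameters: a base-2 van der Corput (Halton) sequence is mapped to normal draws through the inverse CDF and the resulting price is compared with the closed-form value.

```python
# Quasi-Monte Carlo sketch: price a European call under Black-Scholes using the
# one-dimensional Halton (van der Corput, base 2) sequence mapped to normal draws via the
# inverse CDF, and compare with the closed-form Black-Scholes price.
import math
from statistics import NormalDist

def van_der_corput(i, base=2):
    v, denom = 0.0, 1.0
    while i > 0:
        i, rem = divmod(i, base)
        denom *= base
        v += rem / denom
    return v

S0, K, r, sigma, T = 100.0, 105.0, 0.03, 0.2, 1.0   # illustrative market parameters
N = 2 ** 14
norm = NormalDist()
disc = math.exp(-r * T)

total = 0.0
for i in range(1, N + 1):
    z = norm.inv_cdf(van_der_corput(i))                     # quasirandom normal draw
    ST = S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
    total += max(ST - K, 0.0)
qmc_price = disc * total / N

d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_price = S0 * norm.cdf(d1) - K * disc * norm.cdf(d2)
print(f"QMC price {qmc_price:.4f}   Black-Scholes {bs_price:.4f}")
```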