
We provide an example of a Hamilton-Jacobi equation in which stochastic homogenization does not occur. The Hamiltonian involved in this example satisfies the standard assumptions of the literature, except that it is not convex.
We prove a Tauberian theorem for nonexpansive operators and apply it to the model of zero-sum stochastic game. Under mild assumptions, we prove that the value of the λ-discounted game converges uniformly when λ goes to zero if and only if the value of the n-stage game converges uniformly when n goes to infinity. This generalizes the Tauberian theorem of Lehrer and Sorin [Lehrer E, Sorin S (1992) A uniform Tauberian theorem in dynamic programming. Math. Oper. Res. 17(2):303–307] to the two-player zero-sum case. We also provide the first example of a stochastic game with public signals on the state and perfect observation of actions, with finite state space, signal sets, and action sets, in which for some initial state known by both players, the value of the λ-discounted game and the value of the n-stage game starting at that initial state converge to distinct limits.
In the classic prophet inequality, a problem in optimal stopping theory, samples from independent random variables (possibly differently distributed) arrive online. A gambler who knows the distributions, but cannot see the future, must decide at each point in time whether to stop and pick the current sample or to continue and lose that sample forever. The goal of the gambler is to maximize the expected value of what she picks, and the performance measure is the worst-case ratio between the expected value the gambler gets and what a prophet, who sees all the realizations in advance, gets. In the late seventies, Krengel and Sucheston, and Garling [16], established that this worst-case ratio is a constant and that 1/2 is the best possible such constant. In the last decade the theory of prophet inequalities has resurged as an important problem due to its connections to posted price mechanisms, frequently used in online sales. A particularly interesting variant is the so-called Prophet Secretary problem, in which the only difference is that the samples arrive in a uniformly random order. For this variant several algorithms are known to achieve a constant of 1 − 1/e, and very recently this barrier was slightly improved by Azar et al. [3]. In this paper we derive a way of analyzing multi-threshold strategies, which set a nonincreasing sequence of thresholds to be applied at different times; the gambler stops the first time a sample surpasses the corresponding threshold. Specifically, we consider a class of very robust strategies that we call blind quantile strategies. These generalize single-threshold strategies and consist in fixing a function which is used to define a sequence of thresholds once the instance is revealed. Our main result shows that these strategies can achieve a constant of 0.669 in the Prophet Secretary problem, improving upon the best known result of Azar et al. [3], and even that of Beyhaghi et al. [4], which applies when the gambler can select the order of the samples. The crux is a very precise analysis of the underlying stopping time distribution for the gambler's strategy, inspired by the theory of Schur-convex functions. We further prove that our family of blind strategies cannot lead to a constant better than 0.675. Finally, we prove that no nonadaptive algorithm for the gambler can achieve a constant better than 0.732, which also improves upon a recent result of Azar et al. [3]. Here, a nonadaptive algorithm is one whose decision to stop can depend on the index of the random variable being sampled, on the value sampled, and on the time, but not on the history that has been observed.
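To make the strategy class concrete, here is a minimal Python sketch in the spirit of a blind quantile strategy. It assumes, as an illustration rather than the paper's exact construction, that the threshold at relative time t is the α(t)-quantile of the distribution of the maximum, for a fixed nonincreasing function α; all names and the toy instance are hypothetical.

```python
# Illustrative sketch of a blind quantile strategy (not the paper's exact
# construction): thresholds are quantiles of the distribution of the maximum,
# taken along a fixed nonincreasing function alpha, so thresholds decrease
# over time.
import math
import random

def threshold_schedule(dists, alpha, n, trials=20000):
    """Monte Carlo estimate of the alpha(i/n)-quantiles of max_j X_j."""
    maxima = sorted(max(d() for d in dists) for _ in range(trials))
    def quantile(p):
        return maxima[min(int(p * (trials - 1)), trials - 1)]
    return [quantile(alpha((i + 1) / n)) for i in range(n)]

def run_gambler(dists, thresholds):
    """One play: samples arrive in uniformly random order (Prophet Secretary);
    the gambler stops at the first sample exceeding the current threshold."""
    order = random.sample(range(len(dists)), len(dists))
    for i, j in enumerate(order):
        x = dists[j]()
        if x >= thresholds[i]:
            return x
    return 0.0

random.seed(1)
dists = [lambda lam=k + 1.0: random.expovariate(lam) for k in range(5)]
alpha = lambda t: math.exp(-t)   # nonincreasing => nonincreasing thresholds
thresholds = threshold_schedule(dists, alpha, len(dists))
print(sum(run_gambler(dists, thresholds) for _ in range(5000)) / 5000)
```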
We study the limit of equilibrium payoffs, as the discount factor goes to one, in non-zero-sum stochastic games. We first show that the set of stationary equilibrium payoffs always converges. We then provide two-player examples in which the whole set of equilibrium payoffs diverges. The construction is robust to perturbations of the payoffs and to the introduction of normal-form correlation.
We give an example of the failure of homogenization for a viscous Hamilton-Jacobi equation with non-convex Hamiltonian.
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.
In a zero-sum stochastic game, at each stage, two opposing players take decisions and receive a stage payoff determined by these decisions, as well as by a controlled random variable representing the state of nature. The total payoff is the normalized discounted sum of the stage payoffs. In this article, we solve the "constant payoff" conjecture formulated by Sorin, Venel and Vigeral (Sankhya A 72 (1) (2010) 237–245): if both players play optimal strategies, then for every α>0, the expected discounted payoff between stages 1 and α/λ tends to the limit of the discounted value of the game as the discount factor λ tends to 0.
In a zero-sum stochastic game, at each stage, two adversary players take decisions and receive a stage payoff determined by them and by a random variable representing the state of nature. The total payoff is the discounted sum of the stage payoffs. Assume that the players are very patient and use optimal strategies. We then prove that, at any point in the game, players get essentially the same expected payoff: the payoff is constant. This solves a conjecture by Sorin, Venel and Vigeral (2010). The proof relies on the semi-algebraic approach for discounted stochastic games introduced by Bewley and Kohlberg (1976), on the theory of Markov chains with rare transitions, initiated by Freidlin and Wentzell (1984), and on some variational inequalities for value functions inspired by the recent work of Davini, Fathi, Iturriaga and Zavidovique (2016).
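As a reading aid, one standard way to formalize the constant payoff property stated in the two abstracts above (a hedged reconstruction; v* denotes the limit of the discounted values and g_m the stage payoffs) is the following:

```latex
% Hedged formalization: under optimal strategies (\sigma_\lambda, \tau_\lambda),
% for every \alpha > 0 the normalized discounted payoff accumulated up to
% stage \alpha/\lambda tends to the limit value as \lambda \to 0:
\[
  \lim_{\lambda \to 0}\;
  \mathbb{E}_{\sigma_\lambda, \tau_\lambda}\!\left[
    \frac{\sum_{m=1}^{\lfloor \alpha/\lambda \rfloor} \lambda (1-\lambda)^{m-1} g_m}
         {\sum_{m=1}^{\lfloor \alpha/\lambda \rfloor} \lambda (1-\lambda)^{m-1}}
  \right]
  \;=\; v^{*} \;=\; \lim_{\lambda \to 0} v_\lambda .
\]
```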
We study the problem of approximation of optimal values in partially observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives, rewards are associated with every transition of the POMDP, and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and the related decision problem is recursively enumerable complete.
Bewley and Kohlberg (1976) and Mertens and Neyman (1981) have proved, respectively, the existence of the asymptotic value and the uniform value in zero-sum stochastic games with finite state space and finite action sets. In their work, the total payoff in a stochastic game is defined either as a Cesàro mean or an Abel mean of the stage payoffs. This paper presents two findings: first, we generalize the result of Bewley and Kohlberg to a more general class of payoff evaluations, and we prove with a counterexample that this result is tight. We also investigate the particular case of absorbing games. Second, for the uniform approach of Mertens and Neyman, we provide another counterexample to demonstrate that there is no natural way to generalize the result of Mertens and Neyman to a wider class of payoff evaluations.
A prophet inequality states, for some α ∈ [0, 1], that the expected value achievable by a gambler who sequentially observes random variables X1, ..., Xn and selects one of them is at least an α fraction of the maximum value in the sequence. We obtain three distinct improvements for a setting that was first studied by Correa et al. (EC, 2019) and is particularly relevant to modern applications in algorithmic pricing. In this setting, the random variables are i.i.d. from an unknown distribution and the gambler has access to an additional βn samples for some β ≥ 0. We first give improved lower bounds on α for a wide range of values of β; specifically, α ≥ (1 + β)/e when β ≤ 1/(e − 1), which is tight, and α ≥ 0.648 when β = 1, which improves on a bound of around 0.635 due to Correa et al. (SODA, 2020). Adding to their practical appeal, specifically in the context of algorithmic pricing, we then show that the new bounds can be obtained even in a streaming model of computation, and thus in situations where the use of relevant data is complicated by the sheer amount of data available. We finally establish that the upper bound of 1/e for the case without samples is robust to additional information about the distribution, and applies also to sequences of i.i.d. random variables whose distribution is itself drawn, according to a known distribution, from a finite set of known candidate distributions. This implies a tight prophet inequality for exchangeable sequences of random variables, answering a question of Hill and Kertz (Contemporary Mathematics, 1992), but leaves open the possibility of better guarantees when the number of candidate distributions is small, a setting we believe is of strong interest to applications.
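For intuition about the sample-based setting, the simplest baseline sets a single threshold at the maximum of the βn extra samples and accepts the first value exceeding it. The sketch below is only this baseline, not the rule achieving the bounds above; all names are illustrative.

```python
# Baseline sample-based rule for i.i.d. prophet inequalities (illustration
# only; NOT the algorithm achieving the (1+beta)/e or 0.648 bounds above):
# threshold at the maximum of the beta*n extra samples.
import random

def max_of_samples_rule(dist, n, beta):
    samples = [dist() for _ in range(int(beta * n))]
    threshold = max(samples, default=0.0)  # no samples => accept anything positive
    for _ in range(n):
        x = dist()                          # i.i.d. draw from the unknown distribution
        if x > threshold:
            return x                        # stop at the first exceedance
    return 0.0                              # the gambler may end with nothing

random.seed(0)
dist = lambda: random.paretovariate(2.5)
reward = sum(max_of_samples_rule(dist, 50, 1.0) for _ in range(4000)) / 4000
print(reward)
```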
In a prophet inequality problem, n independent random variables are presented to a gambler one by one. The gambler decides when to stop the sequence and obtains the most recent value as reward. We evaluate a stopping rule by the worst-case ratio between its expected reward and the expectation of the maximum variable. In the classic setting, the order is fixed, and the optimal ratio is known to be 1/2. Three variants of this problem have been extensively studied: the prophet-secretary model, where variables arrive in uniformly random order; the free-order model, where the gambler chooses the arrival order; and the i.i.d. model, where the distributions are all the same, rendering the arrival order irrelevant. Most of the literature assumes that distributions are known to the gambler. Recent work has considered the question of what is achievable when the gambler has access only to a few samples per distribution. Surprisingly, in the fixed-order case, a single sample from each distribution is enough to approximate the optimal ratio, but this is not the case in any of the three variants. We provide a unified proof that for all three variants of the problem, a constant number of samples (independent of n) for each distribution is good enough to approximate the optimal ratios. Prior to our work, this was known to be the case only in the i.i.d. variant. Previous works relied on explicitly constructing sample-based algorithms that match the best possible ratio. Remarkably, the optimal ratios for the prophet-secretary and the free-order variants with full information are still unknown. Consequently, our result requires a significantly different approach than for the classic problem and the i.i.d. variant, where the optimal ratios and the algorithms that achieve them are known. We complement our result by showing that our algorithms can be implemented in polynomial time. A key ingredient in our proof is an existential result based on a minimax argument, which states that there must exist an algorithm that attains the optimal ratio and does not rely on the knowledge of the upper tail of the distributions. A second key ingredient is a refined sample-based version of a decomposition of the instance into "small" and "large" variables, first introduced by Liu et al. [EC'21]. The universality of our approach opens avenues for generalization to other sample-based models. Furthermore, we uncover structural properties that might help pinpoint the optimal ratios in the full-information cases.
We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all epsilon>0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as the decision-maker being patient enough. This unifies and generalizes several results of the literature, and applies notably to POMDPs with limsup payoffs.
This paper introduces a discrete-time stochastic game class on $\mathbb{Z}^d$, which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton–Jacobi equations. Conditions are provided under which the n-stage game value converges as n tends to infinity, and connections with homogenization theory are discussed. Funding: The second author acknowledges the support of the French Agence Nationale de la Recherche (ANR) [Grant ANR-21-CE40-0020] (CONVERGENCE project).
This paper proves several Tauberian theorems for general iterations of operators, and provides two applications to zero-sum stochastic games where the total payoff is a weighted sum of the stage payoffs. The first application provides conditions under which the existence of the asymptotic value implies the convergence of the values of the weighted game, as players get more and more patient. The second application concerns stochastic games with finite state space and action sets: this paper builds a simple class of asymptotically optimal strategies in the weighted game, which at each stage play optimally in a discounted game with a discount factor corresponding to the relative weight of the current stage.
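To illustrate the second application, here is the natural "relative weight" discount factor, written as a sketch under the assumption that the evaluation assigns weight θ_m to stage m with the weights summing to one:

```latex
% Sketch, assuming the weighted game assigns weight \theta_m to stage m:
% the discount factor corresponding to the relative weight of stage m is
\[
  \lambda_m \;=\; \frac{\theta_m}{\sum_{m' \geq m} \theta_{m'}},
\]
% and the strategy plays, at stage m, optimally in the \lambda_m-discounted
% game. Sanity check: for \theta_m = \lambda (1-\lambda)^{m-1} one gets
% \lambda_m = \lambda for every m, recovering the discounted evaluation.
```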
This paper provides a counterexample concerning the asymptotic behavior of the solutions of a discounted Hamilton-Jacobi equation as the discount factor vanishes. The Hamiltonian of the equation is 1-dimensional, continuous, and coercive.
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that, under the long-run average payoff criterion (the expectation of the liminf of the average payoffs), the decision-maker gets more than lim v(n) - epsilon.
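In the notation of the abstract, the second claim says that the same pure strategy performs well under the pathwise long-run average criterion; written out (with v_n the n-stage value and g_m the stage payoffs):

```latex
% The long-run average payoff criterion from the abstract, written out:
% a pure strategy \sigma satisfies the second claim if
\[
  \mathbb{E}_{\sigma}\!\left[ \liminf_{n \to \infty}
    \frac{1}{n} \sum_{m=1}^{n} g_m \right]
  \;\geq\; \lim_{n \to \infty} v_n \;-\; \varepsilon .
\]
```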
In the classic prophet inequality, samples from independent random variables arrive online. A gambler who knows the distributions must decide at each point in time whether to stop and pick the current sample or to continue and lose that sample forever. The goal of the gambler is to maximize the expected value of what she picks, and the performance measure is the worst-case ratio between the expected value the gambler gets and what a prophet, who sees all the realizations in advance, gets. In the late seventies, Krengel and Sucheston, and Garling (1977) established that this worst-case ratio is a universal constant equal to 1/2. In the last decade, the theory of prophet inequalities has resurged as an important problem due to its connections to posted price mechanisms, frequently used in online sales. A very interesting variant is the Prophet Secretary problem, in which the only difference is that the samples arrive in a uniformly random order. For this variant several algorithms achieve a constant of 1-1/e, and very recently this barrier was slightly improved. This paper analyzes strategies that set a nonincreasing sequence of thresholds to be applied at different times; the gambler stops the first time a sample surpasses the corresponding threshold. Specifically, we consider a class of strategies called blind quantile strategies. They consist in fixing a function which is used to define a sequence of thresholds once the instance is revealed. Our main result shows that they can achieve a constant of 0.665, improving upon the best known result of Azar et al. (2018), and on Beyhaghi et al. (2018) (order selection). Our proof analyzes precisely the underlying stopping time distribution, relying on Schur-convexity theory. We further prove that blind strategies cannot achieve better than 0.675. Finally we prove that no algorithm for the gambler can achieve better than 0.732.
Blackwell's approachability (Blackwell, 1954, 1956) is a very general online learning framework in which a decision maker obtains vector-valued outcomes and aims at the convergence of the average outcome to a given "target" set. Blackwell gave a sufficient condition for the decision maker to have a strategy guaranteeing such convergence against an adversarial environment, as well as what is now called Blackwell's algorithm, which ensures convergence under that condition. Blackwell's approachability has since been applied to numerous problems, in regret minimization and game theory in particular. We extend this framework by allowing the outcome function and the inner product to be time-dependent. We establish a general guarantee for the natural extension of Blackwell's algorithm to this framework. In the case where the target set is an orthant, we present a family of time-dependent inner products which yields different convergence speeds for each coordinate of the average outcome. We apply this framework to absorbing games (an important class of stochastic games), for which we construct $\varepsilon$-uniformly optimal strategies using Blackwell's algorithm in a well-chosen auxiliary approachability problem, thereby giving a novel illustration of the relevance of online learning tools for solving games.
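As a concrete instance of Blackwell's algorithm for an orthant target, the classical special case is regret matching: approaching the nonpositive orthant with the regret vector as outcome yields no-regret play. The toy sketch below uses the classical time-independent inner product, not the paper's time-dependent extension; the payoff matrix and names are illustrative.

```python
# Toy instance of Blackwell's algorithm for an orthant target: approaching
# the nonpositive orthant with the regret vector as outcome is the classical
# regret-matching scheme (Hart & Mas-Colell).
import random

def regret_matching(payoff_rows, T=20000):
    k = len(payoff_rows)
    regret = [0.0] * k                 # cumulative regret (the vector outcome)
    total = 0.0
    for _ in range(T):
        pos = [max(r, 0.0) for r in regret]
        s = sum(pos)
        # Blackwell step: if the average outcome is outside the orthant,
        # play proportionally to its positive part; otherwise play uniformly.
        probs = [p / s for p in pos] if s > 0 else [1.0 / k] * k
        a = random.choices(range(k), probs)[0]
        col = random.randrange(len(payoff_rows[0]))   # environment's move
        gained = payoff_rows[a][col]
        total += gained
        for i in range(k):
            regret[i] += payoff_rows[i][col] - gained
    return total / T, [r / T for r in regret]

random.seed(0)
avg_payoff, avg_regret = regret_matching([[1, -1], [-1, 1]])  # matching pennies
print(avg_payoff, avg_regret)   # each average-regret coordinate tends to <= 0
```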
This paper considers a class of two-player zero-sum games on directed graphs whose vertices are equipped with random payoffs of bounded support known by both players. Starting from a fixed vertex, players take turns to move a token along the edges of the graph. On the one hand, for acyclic directed graphs of bounded degree and sub-exponential expansion, we show that the value of the game converges almost surely to a constant at an exponential rate dominated in terms of the expansion. On the other hand, for the infinite $d$-ary tree, which does not fall into the previous class of graphs, we show convergence at a double-exponential rate in terms of the expansion.
We consider a mean-field game model where the cost functions depend on a fixed parameter, called the state, which is unknown to players. Players learn about the state from a stream of private signals they receive throughout the game. We derive a mean-field system satisfied by the equilibrium payoff of the game and prove existence of a solution under standard regularity assumptions. Additionally, we establish the uniqueness of the solution when the cost function satisfies the monotonicity assumption of Lasry and Lions at each state.
Unobservable Markov decision processes (UMDPs) serve as a prominent mathematical framework for modeling sequential decision-making problems. A key aspect in computational analysis is the consideration of decidability, which concerns the existence of algorithms. In general, the computation of the exact and approximated values is undecidable for UMDPs with the long-run average objective. Building on matrix product theory and ergodic properties, we introduce a novel subclass of UMDPs, termed ergodic UMDPs. Our main result demonstrates that approximating the value within this subclass is decidable. However, we show that the exact problem remains undecidable. Finally, we discuss the primary challenges of extending these results to partially observable Markov decision processes.
We prove stochastic homogenization for a class of non-convex and non-coercive first-order Hamilton-Jacobi equations in a finite range of dependence environment, for Hamiltonians that can be expressed by a max-min formula. We make use of the representation of the solution as the value function of a differential game to implement a game-theoretic approach to the homogenization problem.
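For concreteness, here is a hedged example of the max-min structure assumed for the Hamiltonian, in the generic form arising from a two-player differential-game representation; the sets and functions below are illustrative, not the paper's specific assumptions.

```latex
% Hedged illustration of a Hamiltonian given by a max-min formula, the
% generic form arising from a zero-sum differential game with dynamics f
% and running cost c (sets A, B and functions f, c are illustrative):
\[
  H(x, p, \omega) \;=\; \max_{a \in A} \, \min_{b \in B}
  \Big( f(x, a, b, \omega) \cdot p \;+\; c(x, a, b, \omega) \Big),
\]
% so that the solution of the Hamilton-Jacobi equation can be represented
% as the value function of the associated differential game.
```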
We characterize the critical parameter of oriented percolation on $\mathbb{Z}^2$ through the value of a zero-sum game. Specifically, we define a zero-sum game on a percolation configuration of $\mathbb{Z}^2$, where two players move a token along the non-oriented edges of $\mathbb{Z}^2$, collecting a cost of 1 for each edge that is open, and 0 otherwise. The total cost is given by the limit superior of the average cost. We demonstrate that the value of this game is deterministic and equals 1 if and only if the percolation parameter exceeds $p_c$, the critical parameter of oriented percolation. Additionally, we establish that the value of the game is continuous at $p_c$. Finally, we show that for $p$ close to 0, the value of the game is equal to 0.
Unobservable Markov decision processes (UMDPs) serve as a prominent mathematical framework for modeling sequential decision-making problems. A key aspect in computational analysis is the consideration of decidability, which concerns the … Unobservable Markov decision processes (UMDPs) serve as a prominent mathematical framework for modeling sequential decision-making problems. A key aspect in computational analysis is the consideration of decidability, which concerns the existence of algorithms. In general, the computation of the exact and approximated values is undecidable for UMDPs with the long-run average objective. Building on matrix product theory and ergodic properties, we introduce a novel subclass of UMDPs, termed ergodic UMDPs. Our main result demonstrates that approximating the value within this subclass is decidable. However, we show that the exact problem remains undecidable. Finally, we discuss the primary challenges of extending these results to partially observable Markov decision processes.
We characterize the critical parameter of oriented percolation on $\mathbb{Z}^2$ through the value of a zero-sum game. Specifically, we define a zero-sum game on a percolation configuration of $\mathbb{Z}^2$, where … We characterize the critical parameter of oriented percolation on $\mathbb{Z}^2$ through the value of a zero-sum game. Specifically, we define a zero-sum game on a percolation configuration of $\mathbb{Z}^2$, where two players move a token along the non-oriented edges of $\mathbb{Z}^2$, collecting a cost of 1 for each edge that is open, and 0 otherwise. The total cost is given by the limit superior of the average cost. We demonstrate that the value of this game is deterministic and equals 1 if and only if the percolation parameter exceeds $p_c$, the critical exponent of oriented percolation. Additionally, we establish that the value of the game is continuous at $p_c$. Finally, we show that for $p$ close to 0, the value of the game is equal to 0.
In a prophet inequality problem, n independent random variables are presented to a gambler one by one. The gambler decides when to stop the sequence and obtains the most recent … In a prophet inequality problem, n independent random variables are presented to a gambler one by one. The gambler decides when to stop the sequence and obtains the most recent value as reward. We evaluate a stopping rule by the worst-case ratio between its expected reward and the expectation of the maximum variable. In the classic setting, the order is fixed, and the optimal ratio is known to be 1/2. Three variants of this problem have been extensively studied: the prophet-secretary model, where variables arrive in uniformly random order; the free-order model, where the gambler chooses the arrival order; and the i.i.d. model, where the distributions are all the same, rendering the arrival order irrelevant. Most of the literature assumes that distributions are known to the gambler. Recent work has considered the question of what is achievable when the gambler has access only to a few samples per distribution. Surprisingly, in the fixed-order case, a single sample from each distribution is enough to approximate the optimal ratio, but this is not the case in any of the three variants. We provide a unified proof that for all three variants of the problem, a constant number of samples (independent of n) for each distribution is good enough to approximate the optimal ratios. Prior to our work, this was known to be the case only in the i.i.d. variant. Previous works relied on explicitly constructing sample-based algorithms that match the best possible ratio. Remarkably, the optimal ratios for the prophet-secretary and the free-order variants with full information are still unknown. Consequently, our result requires a significantly different approach than for the classic problem and the i.i.d. variant, where the optimal ratios and the algorithms that achieve them are known. We complement our result showing that our algorithms can be implemented in polynomial time. A key ingredient in our proof is an existential result based on a minimax argument, which states that there must exist an algorithm that attains the optimal ratio and does not rely on the knowledge of the upper tail of the distributions. A second key ingredient is a refined sample-based version of a decomposition of the instance into "small" and "large" variables, first introduced by Liu et al. [EC'21]. The universality of our approach opens avenues for generalization to other sample-based models. Furthermore, we uncover structural properties that might help pinpoint the optimal ratios in the full-information cases.
We prove stochastic homogenization for a class of non-convex and non-coercive first-order Hamilton-Jacobi equations in a finite-range of dependence environment for Hamiltonians that can be expressed by a max-min formula. … We prove stochastic homogenization for a class of non-convex and non-coercive first-order Hamilton-Jacobi equations in a finite-range of dependence environment for Hamiltonians that can be expressed by a max-min formula. We make use of the representation of the solution as a value function of a differential game to implement a game-theoretic approach to the homogenization problem.
This paper considers a class of two-player zero-sum games on directed graphs whose vertices are equipped with random payoffs of bounded support known by both players. Starting from a fixed … This paper considers a class of two-player zero-sum games on directed graphs whose vertices are equipped with random payoffs of bounded support known by both players. Starting from a fixed vertex, players take turns to move a token along the edges of the graph. On the one hand, for acyclic directed graphs of bounded degree and sub-exponential expansion, we show that the value of the game converges almost surely to a constant at an exponential rate dominated in terms of the expansion. On the other hand, for the infinite $d$-ary tree that does not fall into the previous class of graphs, we show convergence at a double-exponential rate in terms of the expansion.
We consider a mean-field game model where the cost functions depend on a fixed parameter, called \textit{state}, which is unknown to players. Players learn about the state from a a … We consider a mean-field game model where the cost functions depend on a fixed parameter, called \textit{state}, which is unknown to players. Players learn about the state from a a stream of private signals they receive throughout the game. We derive a mean field system satisfied by the equilibrium payoff of the game and prove existence of a solution under standard regularity assumptions. Additionally, we establish the uniqueness of the solution when the cost function satisfies the monotonicity assumption of Lasry and Lions at each state.
Blackwell's approachability (Blackwell, 1954, 1956) is a very general online learning framework where a Decision Maker obtains vector-valued outcomes, and aims at the convergence of the average outcome to a … Blackwell's approachability (Blackwell, 1954, 1956) is a very general online learning framework where a Decision Maker obtains vector-valued outcomes, and aims at the convergence of the average outcome to a given ``target'' set. Blackwell gave a sufficient condition for the decision maker having a strategy guaranteeing such a convergence against an adversarial environment, as well as what we now call the Blackwell's algorithm, which then ensures convergence. Blackwell's approachability has since been applied to numerous problems, in regret minimization and game theory, in particular. We extend this framework by allowing the outcome function and the inner product to be time-dependent. We establish a general guarantee for the natural extension to this framework of Blackwell's algorithm. In the case where the target set is an orthant, we present a family of time-dependent inner products which yields different convergence speeds for each coordinate of the average outcome. We apply this framework to absorbing games (an important class of stochastic games) for which we construct $\varepsilon$-uniformly optimal strategies using Blackwell's algorithm in a well-chosen auxiliary approachability problem, thereby giving a novel illustration of the relevance of online learning tools for solving games.
In a prophet inequality problem, $n$ independent random variables are presented to a gambler one by one. The gambler decides when to stop the sequence and obtains the most recent … In a prophet inequality problem, $n$ independent random variables are presented to a gambler one by one. The gambler decides when to stop the sequence and obtains the most recent value as reward. We evaluate a stopping rule by the worst-case ratio between its expected reward and the expectation of the maximum variable. In the classic setting, the order is fixed, and the optimal ratio is known to be 1/2. Three variants of this problem have been extensively studied: the prophet-secretary model, where variables arrive in uniformly random order; the free-order model, where the gambler chooses the arrival order; and the i.i.d. model, where the distributions are all the same, rendering the arrival order irrelevant. Most of the literature assumes that distributions are known to the gambler. Recent work has considered the question of what is achievable when the gambler has access only to a few samples per distribution. Surprisingly, in the fixed-order case, a single sample from each distribution is enough to approximate the optimal ratio, but this is not the case in any of the three variants. We provide a unified proof that for all three variants of the problem, a constant number of samples (independent of n) for each distribution is good enough to approximate the optimal ratios. Prior to our work, this was known to be the case only in the i.i.d. variant. We complement our result showing that our algorithms can be implemented in polynomial time. A key ingredient in our proof is an existential result based on a minimax argument, which states that there must exist an algorithm that attains the optimal ratio and does not rely on the knowledge of the upper tail of the distributions. A second key ingredient is a refined sample-based version of a decomposition of the instance into "small" and "large" variables, first introduced by Liu et al. [EC'21].
This paper introduces a discrete-time stochastic game class on [Formula: see text], which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton–Jacobi equations. … This paper introduces a discrete-time stochastic game class on [Formula: see text], which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton–Jacobi equations. Conditions are provided under which the n-stage game value converges as n tends to infinity, and connections with homogenization theory are discussed. Funding: The second author acknowledges the support of the French Agence Nationale de la Recherche (ANR) [Grant ANR-21-CE40-0020] (CONVERGENCE project).
This paper introduces a discrete-time stochastic game class on $\mathbb{Z}^d$, which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton-Jacobi equations. Conditions are … This paper introduces a discrete-time stochastic game class on $\mathbb{Z}^d$, which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton-Jacobi equations. Conditions are provided under which the $n$-stage game value converges as $n$ tends to infinity, and connections with homogenization theory is discussed.
Dans un jeu stochastique à somme nulle, à chaque étape, deux joueurs adversaires prennent des décisions et reçoivent un paiement d’étape déterminé par ces décisions, ainsi que par une variable … Dans un jeu stochastique à somme nulle, à chaque étape, deux joueurs adversaires prennent des décisions et reçoivent un paiement d’étape déterminé par ces décisions, ainsi que par une variable aléatoire contrôlée qui représente l’état de la nature. Le paiement total est la somme escomptée et normalisée des paiements d’étape. Dans cet article, nous résolvons la conjecture du “paiement constant”, formulée par Sorin, Venel et Vigeral (Sankhya A 72 (1) (2010) 237–245) : si les deux joueurs jouent des stratégies optimales, alors pour tout α>0, l’espérance du paiement escompté entre les étapes 1 et α/λ tend vers la limite de la valeur escomptée du jeu, lorsque le facteur d’escompte λ tend vers 0.
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the … Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.
Related DatabasesWeb of Science You must be logged in with an active subscription to view this.Article DataHistorySubmitted: 20 April 2020Accepted: 03 February 2021Published online: 29 April 2021KeywordsMarkov decision process, partial … Related DatabasesWeb of Science You must be logged in with an active subscription to view this.Article DataHistorySubmitted: 20 April 2020Accepted: 03 February 2021Published online: 29 April 2021KeywordsMarkov decision process, partial observation, long-run average payoffAMS Subject Headings90C39, 90C40, 37A50, 60J20Publication DataISSN (print): 0363-0129ISSN (online): 1095-7138Publisher: Society for Industrial and Applied MathematicsCODEN: sjcodc
This paper introduces a discrete-time stochastic game class on $\mathbb{Z}^d$, which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton-Jacobi equations. Conditions are … This paper introduces a discrete-time stochastic game class on $\mathbb{Z}^d$, which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton-Jacobi equations. Conditions are provided under which the $n$-stage game value converges as $n$ tends to infinity, and connections with homogenization theory is discussed.
A prophet inequality states, for some α ∈ [0, 1], that the expected value achievable by a gambler who sequentially observes random variables X1, . . . , Xn and … A prophet inequality states, for some α ∈ [0, 1], that the expected value achievable by a gambler who sequentially observes random variables X1, . . . , Xn and selects one of them is at least an α fraction of the maximum value in the sequence. We obtain three distinct improvements for a setting that was first studied by Correa et al. (EC, 2019) and is particularly relevant to modern applications in algorithmic pricing. In this setting, the random variables are i.i.d. from an unknown distribution and the gambler has access to an additional βn samples for some β ≥ 0. We first give improved lower bounds on α for a wide range of values of β; specifically, α ≥ (1 + β)/e when β ≤ 1/(e − 1), which is tight, and α ≥ 0.648 when β = 1, which improves on a bound of around 0.635 due to Correa et al. (SODA, 2020). Adding to their practical appeal, specifically in the context of algorithmic pricing, we then show that the new bounds can be obtained even in a streaming model of computation and thus in situations where the use of relevant data is complicated by the sheer amount of data available. We finally establish that the upper bound of 1/e for the case without samples is robust to additional information about the distribution, and applies also to sequences of i.i.d. random variables whose distribution is itself drawn, according to a known distribution, from a finite set of known candidate distributions. This implies a tight prophet inequality for exchangeable sequences of random variables, answering a question of Hill and Kertz (Contemporary Mathematics, 1992), but leaves open the possibility of better guarantees when the number of candidate distributions is small, a setting we believe is of strong interest to applications.
We consider POMDPs in which weight of stage payoff depends on past sequence of signals and actions occurring in infinitely repeated problem. We prove that for all epsilon>0, there exists … We consider POMDPs in which weight of stage payoff depends on past sequence of signals and actions occurring in infinitely repeated problem. We prove that for all epsilon>0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that interprets as the decision-maker is patient enough. This unifies and generalizes several results of literature, and applies notably to POMDPs with limsup payoffs.
We study the limit of equilibrium payoffs, as the discount factor goes to one, in non-zero-sum stochastic games. We first show that the set of stationary equilibrium payoffs always converges. We then provide two-player examples in which the whole set of equilibrium payoffs diverges. The construction is robust to perturbations of the payoffs and to the introduction of normal-form correlation.
A prophet inequality states, for some $\alpha\in[0,1]$, that the expected value achievable by a gambler who sequentially observes random variables $X_1,\dots,X_n$ and selects one of them is at least an $\alpha$ fraction of the maximum value in the sequence. We obtain three distinct improvements for a setting that was first studied by Correa et al. (EC, 2019) and is particularly relevant to modern applications in algorithmic pricing. In this setting, the random variables are i.i.d. from an unknown distribution and the gambler has access to an additional $\beta n$ samples for some $\beta\geq 0$. We first give improved lower bounds on $\alpha$ for a wide range of values of $\beta$; specifically, $\alpha\geq(1+\beta)/e$ when $\beta\leq 1/(e-1)$, which is tight, and $\alpha\geq 0.648$ when $\beta=1$, which improves on a bound of around $0.635$ due to Correa et al. (SODA, 2020). Adding to their practical appeal, specifically in the context of algorithmic pricing, we then show that the new bounds can be obtained even in a streaming model of computation and thus in situations where the use of relevant data is complicated by the sheer amount of data available. We finally establish that the upper bound of $1/e$ for the case without samples is robust to additional information about the distribution, and applies also to sequences of i.i.d. random variables whose distribution is itself drawn, according to a known distribution, from a finite set of known candidate distributions. This implies a tight prophet inequality for exchangeable sequences of random variables, answering a question of Hill and Kertz (Contemporary Mathematics, 1992), but leaves open the possibility of better guarantees when the number of candidate distributions is small, a setting we believe is of strong interest to applications.
We give an example of the failure of homogenization for a viscous Hamilton-Jacobi equation with non-convex Hamiltonian.
We study the problem of approximation of optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives, rewards are associated with every transition of the POMDP, and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and the related decision problem is recursively enumerable complete.
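As background for the belief-state view that underlies such partial-observation models, here is a minimal sketch of the standard Bayesian belief update that turns a finite POMDP into a fully observed MDP on beliefs (a generic textbook construction, not this paper's; the tensor layouts T and O below are illustrative assumptions):

import numpy as np

def belief_update(b, a, o, T, O):
    # b: belief over states, shape (S,); a: action index; o: signal index
    # Assumed layouts: T[a][s, s2] = P(s2 | s, a) and O[a][s2, o] = P(o | s2, a)
    predicted = b @ T[a]                   # push the belief through the transition kernel
    unnormalized = predicted * O[a][:, o]  # reweight by the likelihood of the signal
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("signal has probability zero under the current belief")
    return unnormalized / total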
In a zero-sum stochastic game, at each stage, two adversarial players take decisions and receive a stage payoff determined by them and by a random variable representing the state of nature. The total payoff is the discounted sum of the stage payoffs. Assume that the players are very patient and use optimal strategies. We then prove that, at any point in the game, players get essentially the same expected payoff: the payoff is constant. This solves a conjecture by Sorin, Venel and Vigeral (2010). The proof relies on the semi-algebraic approach for discounted stochastic games introduced by Bewley and Kohlberg (1976), on the theory of Markov chains with rare transitions, initiated by Freidlin and Wentzell (1984), and on some variational inequalities for value functions inspired by the recent work of Davini, Fathi, Iturriaga and Zavidovique (2016).
This paper provides a counterexample concerning the asymptotic behavior of the solutions of a discounted Hamilton-Jacobi equation as the discount factor vanishes. The Hamiltonian of the equation is a 1-dimensional continuous and coercive Hamiltonian.
In the classic prophet inequality, samples from independent random variables arrive online. A gambler that knows the distributions must decide at each point in time whether to stop and pick the current sample or to continue and lose that sample forever. The goal of the gambler is to maximize the expected value of what she picks, and the performance measure is the worst case ratio between the expected value the gambler gets and what a prophet, that sees all the realizations in advance, gets. In the late seventies, Krengel and Sucheston, and Garling (1977) established that this worst case ratio is a universal constant equal to 1/2. In the last decade the theory of prophet inequalities has resurged as an important problem due to its connections to posted price mechanisms, frequently used in online sales. A very interesting variant is the Prophet Secretary problem, in which the only difference is that the samples arrive in a uniformly random order. For this variant several algorithms achieve a constant of 1-1/e, and very recently this barrier was slightly improved. This paper analyzes strategies that set a nonincreasing sequence of thresholds to be applied at different times. The gambler stops the first time a sample surpasses the corresponding threshold. Specifically, we consider a class of strategies called blind quantile strategies. They consist in fixing a function which is used to define a sequence of thresholds once the instance is revealed. Our main result shows that they can achieve a constant of 0.665, improving upon the best known result of Azar et al. (2018), and on Beyhaghi et al. (2018) (order selection). Our proof analyzes precisely the underlying stopping time distribution, relying on Schur-convexity theory. We further prove that blind strategies cannot achieve better than 0.675. Finally we prove that no algorithm for the gambler can achieve better than 0.732.
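As a numerical illustration of the flavor of such nonincreasing-threshold rules, here is a minimal sketch on an i.i.d. Exp(1) instance with an arbitrary, untuned quantile curve q (both are assumptions for the demo; this is not the paper's blind quantile strategy):

import numpy as np

rng = np.random.default_rng(0)
n = 50

def max_quantile(p):
    # inverse CDF of max(X_1, ..., X_n) for i.i.d. Exp(1): G(x) = (1 - exp(-x))^n
    return -np.log(1.0 - p ** (1.0 / n))

def q(s):
    # an arbitrary nonincreasing quantile curve on [0, 1] (illustrative, untuned)
    return 0.95 - 0.7 * s

thresholds = np.array([max_quantile(q(t / n)) for t in range(n)])  # nonincreasing

def gambler(samples):
    # stop at the first sample that clears the current threshold
    for x, tau in zip(samples, thresholds):
        if x >= tau:
            return x
    return 0.0

trials = rng.exponential(size=(20000, n))
ratio = np.mean([gambler(row) for row in trials]) / trials.max(axis=1).mean()
print(f"empirical gambler-to-prophet ratio: {ratio:.3f}")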
This paper proves several Tauberian theorems for general iterations of operators, and provides two applications to zero-sum stochastic games where the total payoff is a weighted sum of the stage payoffs. The first application is to provide conditions under which the existence of the asymptotic value implies the convergence of the values of the weighted game, as players get more and more patient. The second application concerns stochastic games with finite state space and action sets. This paper builds a simple class of asymptotically optimal strategies in the weighted game, that at each stage play optimally in a discounted game with a discount factor corresponding to the relative weight of the current stage.
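In the notation suggested by this abstract, the total payoff is $\sum_{t\geq 1} \theta_t r_t$ for weights $(\theta_t)_{t\geq 1}$, and one standard way to read "the relative weight of the current stage" (an interpretation for orientation, not a quotation from the paper) is that the strategy plays, at stage $t$, an optimal strategy of the $\lambda_t$-discounted game with $\lambda_t = \theta_t / \sum_{s\geq t} \theta_s$.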
We prove a Tauberian theorem for nonexpansive operators, and apply it to the model of zero-sum stochastic game. Under mild assumptions, we prove that the value of the lambda-discounted game v_{lambda} converges uniformly when lambda goes to 0 if and only if the value of the n-stage game v_n converges uniformly when n goes to infinity. This generalizes the Tauberian theorem of Lehrer and Sorin (1992) to the two-player zero-sum case. We also provide the first example of a stochastic game with public signals on the state and perfect observation of actions, with finite state space, signal sets and action sets, in which for some initial state k_1 known by both players, (v_{lambda}(k_1)) and (v_n(k_1)) converge to distinct limits.
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, this shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that, under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker guarantees more than lim v(n) - epsilon.
Bewley and Kohlberg (1976) and Mertens and Neyman (1981) have proved, respectively, the existence of the asymptotic value and the uniform value in zero-sum stochastic games with finite state space and finite action sets. In their work, the total payoff in a stochastic game is defined either as a Cesàro mean or an Abel mean of the stage payoffs. This paper presents two findings: first, we generalize the result of Bewley and Kohlberg to a more general class of payoff evaluations, and we prove with a counterexample that this result is tight. We also investigate the particular case of absorbing games. Second, for the uniform approach of Mertens and Neyman, we provide another counterexample to demonstrate that there is no natural way to generalize the result of Mertens and Neyman to a wider class of payoff evaluations.
Mertens [In Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986) (1987) 1528–1577 Amer. Math. Soc.] proposed two general conjectures about repeated games: the first one is that, in any two-person zero-sum repeated game, the asymptotic value exists, and the second one is that, when Player 1 is more informed than Player 2, in the long run Player 1 is able to guarantee the asymptotic value. We disprove these two long-standing conjectures by providing an example of a zero-sum repeated game with public signals and perfect observation of the actions, where the value of the $\lambda$-discounted game does not converge when $\lambda$ goes to 0. The aforementioned example involves seven states, two actions and two signals for each player. Remarkably, players observe the payoffs, and play in turn.
We prove that, in the dynamic programming framework, uniform convergence of $v_\lambda$ implies uniform convergence of $v_n$ and vice versa. Moreover, both have the same limit.
We consider dynamic programming problems with a large time horizon, and give sufficient conditions for the existence of the uniform value. As a consequence, we obtain an existence result when the state space is precompact, payoffs are uniformly continuous, and the transition correspondence is nonexpansive. In the same spirit, we give an existence result for the limit value. We also apply our results to Markov decision processes and obtain a few generalizations of existing results.
Implicitly defined (and easily approximated) universal constants $1.1 < a_n < 1.6$, $n = 2, 3, \cdots$, are found so that if $X_1, X_2, \cdots$ are i.i.d. non-negative random variables and if $T_n$ is the set of stop rules for $X_1, \cdots, X_n$, then $E(\max\{X_1, \cdots, X_n\}) \leq a_n \sup\{EX_t: t \in T_n\}$, and the bound $a_n$ is best possible. Similar universal constants $0 < b_n < \frac{1}{4}$ are found so that if the $\{X_i\}$ are i.i.d. random variables taking values only in $[a, b]$, then $E(\max\{X_1, \cdots, X_n\}) \leq \sup\{EX_t: t \in T_n\} + b_n(b - a)$, where again the bound $b_n$ is best possible. In both situations, extremal distributions for which equality is attained (or nearly attained) are given in implicit form.
We show that a broad class of fully nonlinear, second-order parabolic or elliptic PDEs can be realized as the Hamilton-Jacobi-Bellman equations of deterministic two-person games. More precisely: given the PDE, we identify a deterministic, discrete-time, two-person game whose value function converges in the continuous-time limit to the viscosity solution of the desired equation. Our game is, roughly speaking, a deterministic analogue of the stochastic representation recently introduced by Cheridito, Soner, Touzi, and Victoir. In the parabolic setting with no $u$-dependence, it amounts to a semidiscrete numerical scheme whose timestep is a min-max. Our result is interesting, because the usual control-based interpretations of second-order PDEs involve stochastic rather than deterministic control.
We study random homogenization of second-order, degenerate and quasilinear Hamilton–Jacobi equations which are positively homogeneous in the gradient. Included are the equations of forced mean curvature motion and others describing geometric motions of level sets as well as a large class of viscous, non-convex Hamilton–Jacobi equations. The main results include the first proof of qualitative stochastic homogenization for such equations. We also present quantitative error estimates which give an algebraic rate of homogenization.
The secretary and the prophet inequality problems are central to the field of Stopping Theory. Recently, there has been a lot of work in generalizing these models to multiple items because of their applications in mechanism design. The most important of these generalizations are to matroids and to combinatorial auctions (which extend bipartite matching). Kleinberg-Weinberg [33] and Feldman et al. [17] show that for adversarial arrival order of random variables the optimal prophet inequalities give a 1/2-approximation. For many settings, however, it is conceivable that the arrival order is chosen uniformly at random, akin to the secretary problem. For such a random arrival model, we improve upon the 1/2-approximation and obtain (1 - 1/e)-approximation prophet inequalities for both matroids and combinatorial auctions. This also gives improvements to the results of Yan [45] and Esfandiari et al. [15], who worked in the special cases where we can fully control the arrival order or when there is only a single item. Our techniques are threshold-based. We convert our discrete problem into a continuous setting and then give a generic template on how to dynamically adjust these thresholds to lower bound the expected total welfare.
We study the homogenization of “viscous” Hamilton–Jacobi equations in stationary ergodic media. The “viscosity” and the spatial oscillations are assumed to be of the same order. We identify the asymptotic (effective) equation, which is a first-order deterministic Hamilton–Jacobi equation. We also provide examples that show that the associated macroscopic problem does not admit suitable solutions (correctors). Finally, we present as applications results about large deviations of diffusion processes and front propagation (asymptotics of reaction-diffusion equations) in random environments.
We study the homogenization of some Hamilton-Jacobi-Bellman equations with a vanishing second-order term in a stationary ergodic random medium under the hyperbolic scaling of time and space. Imposing certain convexity, growth, and regularity assumptions on the Hamiltonian, we show the locally uniform convergence of solutions of such equations to the solution of a deterministic "effective" first-order Hamilton-Jacobi equation. The effective Hamiltonian is obtained from the original stochastic Hamiltonian by a minimax formula. Our homogenization results have a large-deviations interpretation for a diffusion in a random environment.
The level-set formulation of motion by mean curvature is a degenerate parabolic equation. We show that its solution can be interpreted as the value function of a deterministic two-person game. More precisely, we give a family of discrete-time, two-person games whose value functions converge in the continuous-time limit to the solution of the motion-by-curvature PDE. For a convex domain, the boundary's “first arrival time” solves a degenerate elliptic equation; this corresponds, in our game-theoretic setting, to a minimum-exit-time problem. For a nonconvex domain the two-person game still makes sense; we draw a connection between its minimum exit time and the evolution of curves with velocity equal to the “positive part of the curvature.” These results are unexpected, because the value function of a deterministic control problem is normally the solution of a first-order Hamilton-Jacobi equation. Our situation is different because the usual first-order calculation is singular.
In an optimal control framework, we consider the value $V_T(x)$ of the problem starting from state $x$ with finite horizon $T$, as well as the value $W_\lambda(x)$ of the $\lambda$-discounted problem starting from $x$. We prove that uniform convergence (on the set of states) of the values $V_T(\cdot)$ as $T$ tends to infinity is equivalent to uniform convergence of the values $W_\lambda(\cdot)$ as $\lambda$ tends to 0, and that the limits are identical. An example is also provided to show that the result does not hold for pointwise convergence. This work is an extension, using similar techniques, of a related result by Lehrer and Sorin in a discrete-time framework.
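For orientation, and in generic optimal-control notation rather than the paper's own (the running payoff $g$ and control $u(\cdot)$ here are placeholders), the two evaluations being compared are $V_T(x) = \sup_{u(\cdot)} \frac{1}{T}\int_0^T g(x(t), u(t))\,dt$ and $W_\lambda(x) = \sup_{u(\cdot)} \lambda \int_0^\infty e^{-\lambda t} g(x(t), u(t))\,dt$, both with $x(0) = x$; the result says that $V_T(\cdot)$ converges uniformly as $T \to \infty$ if and only if $W_\lambda(\cdot)$ converges uniformly as $\lambda \to 0$, in which case the limits coincide.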
Homogenization-type results for the Cauchy problem for first-order PDE (Hamilton–Jacobi equations) are presented. The main assumption is that the Hamiltonian is superlinear and convex with respect to the gradient, and stationary and ergodic with respect to the spatial variable. Some applications to related problems, as well as to the asymptotics of reaction–diffusion equations and turbulent combustion, are also presented.
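The generic statement behind such results, in standard notation and stated here only for orientation: the solutions $u^\varepsilon$ of $u^\varepsilon_t + H\left(Du^\varepsilon, \tfrac{x}{\varepsilon}, \omega\right) = 0$ converge locally uniformly, as $\varepsilon \to 0$ and almost surely in $\omega$, to the solution $\bar{u}$ of an effective equation $\bar{u}_t + \overline{H}(D\bar{u}) = 0$, where the effective Hamiltonian $\overline{H}$ is deterministic.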
We prove that games with absorbing states with compact action sets have a value.
The main purpose of this paper is to approximate several non-local evolution equations by zero-sum repeated games, in the spirit of the previous works of Kohn and the second author (2006 and 2009): general fully non-linear parabolic integro-differential equations on the one hand, and the integral curvature flow of an interface on the other hand. In order to do so, we start by constructing such a game for eikonal equations whose speed has a non-constant sign. This provides a (discrete) deterministic control interpretation of these evolution equations. In all our games, two players choose positions successively, and their final payoff is determined by their positions and additional parameters of choice. Because of the non-locality of the problems approximated, by contrast with local problems, their choices have to 'collect' information far from their current position. For parabolic integro-differential equations, players choose smooth functions on the whole space. For integral curvature flows, players choose hypersurfaces in the whole space and positions on these hypersurfaces.
We study long-term Markov decision processes (MDPs) and gambling houses, with applications to any partial observation MDPs with finitely many states and zero-sum repeated games with an informed controller. We consider a decision maker who is maximizing the weighted sum $\sum_{t \geq 1} \theta_t r_t$, where $r_t$ is the expected reward of the $t$-th stage. We prove the existence of a very strong notion of long-term value called general uniform value, representing the fact that the decision maker can play well independently of the evaluations $(\theta_t)_{t \geq 1}$ over stages, provided the total variation (or impatience) $I(\theta) = \sum_{t \geq 1} |\theta_{t+1} - \theta_t|$ is small enough. This result generalizes previous results of the literature that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling or the Aumann–Maschler cav(u) formula for repeated games with incomplete information. Apart from the introduction of appropriate invariant measures, the main innovation in our proofs is the introduction of a new metric, $d_*$, such that partial observation MDPs and repeated games with an informed controller may be associated with auxiliary problems that are nonexpansive with respect to $d_*$. Given two Borel probabilities $u$ and $v$ over a compact subset $X$ of a normed vector space, we define $d_*(u, v) = \sup_{f \in D_1} \left( \int_X f \, du - \int_X f \, dv \right)$, where $D_1$ is the set of functions satisfying $\forall x, y \in X$, $\forall a, b \geq 0$, $af(x) - bf(y) \leq \|ax - by\|$. The particular case where $X$ is a simplex endowed with the $L^1$-norm is particularly interesting: $d_*$ is the largest distance over the probabilities with finite support over $X$ which makes every disintegration nonexpansive. Moreover, we obtain a Kantorovich–Rubinstein-type duality formula for $d_*(u, v)$, involving couples of measures $(\alpha, \beta)$ over $X \times X$ such that the first marginal of $\alpha$ is $u$ and the second marginal of $\beta$ is $v$.
We present stochastic homogenization results for viscous Hamilton-Jacobi equations using a new argument that is based only on the subadditive structure of maximal subsolutions (i.e., solutions of the "metric problem"). This permits us to give qualitative homogenization results under very general hypotheses: in particular, we treat nonuniformly coercive Hamiltonians that satisfy instead a weaker averaging condition. As an application, we derive a general quenched large deviation principle for diffusions in random environments and with absorbing random potentials.
Consider a gambler who observes a sequence of independent, non-negative random numbers and is allowed to stop the sequence at any time, claiming a reward equal to the most recent observation. The famous prophet inequality of Krengel, Sucheston, and Garling asserts that a gambler who knows the distribution of each random variable can achieve at least half as much reward, in expectation, as a "prophet" who knows the sampled values of each random variable and can choose the largest one. We generalize this result to the setting in which the gambler and the prophet are allowed to make more than one selection, subject to a matroid constraint. We show that the gambler can still achieve at least half as much reward as the prophet; this result is the best possible, since it is known that the ratio cannot be improved even in the original prophet inequality, which corresponds to the special case of rank-one matroids. Generalizing the result still further, we show that under an intersection of $p$ matroid constraints, the prophet's reward exceeds the gambler's by a factor of at most $O(p)$, and this factor is also tight.
Let $X_i \geq 0$ be independent, $i = 1, \cdots, n$, and $X^\ast_n = \max(X_1, \cdots, X_n)$. Let $t(c)$ (respectively $s(c)$) be the threshold stopping rule for $X_1, \cdots, X_n$, defined as the smallest $i$ for which $X_i \geq c$ (respectively $X_i > c$), and $n$ if no such $i$ exists. Let $m$ be a median of the distribution of $X^\ast_n$. It is shown that for every $n$ and $\underline{X}$, either $EX^\ast_n \leq 2EX_{t(m)}$ or $EX^\ast_n \leq 2EX_{s(m)}$. This improves previously known results, [1], [4]. Some results for i.i.d. $X_i$ are also included.
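A quick Monte Carlo illustration of this median-threshold guarantee on a toy instance (the distributions, sizes, and seed are assumptions for the demo, not from the paper):

import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 100000

means = np.array([1.0, 2.0, 0.5, 0.7, 1.3])     # arbitrary independent X_i
X = rng.exponential(means, size=(trials, n))

M = X.max(axis=1)
m = np.median(M)            # empirical median of X*_n = max(X_1, ..., X_n)

def reward(row, c, strict):
    # stop at the smallest i with X_i >= c (or > c); take X_n if none qualifies
    for x in row:
        if (x > c) if strict else (x >= c):
            return x
    return row[-1]

r_geq = np.mean([reward(row, m, False) for row in X])
r_gt = np.mean([reward(row, m, True) for row in X])
print(f"E[max] = {M.mean():.3f}  vs  2 * threshold-rule reward = {2 * max(r_geq, r_gt):.3f}")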
Scott N. Armstrong and Panagiotis E. Souganidis, "Stochastic Homogenization of Level-Set Convex Hamilton–Jacobi Equations," International Mathematics Research Notices, Volume 2013, Issue 15, 2013, Pages 3420–3449, https://doi.org/10.1093/imrn/rns155.
We are interested in the convergence of the value of $n$-stage games as $n$ goes to infinity and the existence of the uniform value in stochastic games with a general set of states and finite sets of actions where the transition is commutative. This means that playing an action profile $a_1$ followed by an action profile $a_2$ leads to the same distribution on states as playing first the action profile $a_2$ and then $a_1$. For example, absorbing games can be reformulated as commutative stochastic games. When there is only one player and the transition function is deterministic, we show that the existence of a uniform value in pure strategies implies the existence of 0-optimal strategies. In the framework of two-player stochastic games, we study a class of games where the set of states is $\mathbb{R}^m$ and the transition is deterministic and 1-Lipschitz for the $L^1$-norm, and prove that these games have a uniform value. A similar proof shows the existence of an equilibrium in the non-zero-sum case. These results remain true if one considers a general model of finite repeated games, where the transition is commutative and the players observe the past actions but not the state.
Pablo D. Azar, Robert Kleinberg, and S. Matthew Weinberg, "Prophet Inequalities with Limited Information," Proceedings of the 2014 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1358–1377, https://doi.org/10.1137/1.9781611973402.100. In the classical prophet inequality, a gambler observes a sequence of stochastic rewards $V_1, \ldots, V_n$ and must decide, for each reward $V_i$, whether to keep it and stop the game or to forfeit the reward forever and reveal the next value $V_{i+1}$. The gambler's goal is to obtain a constant fraction of the expected reward that the optimal offline algorithm would get. Recently, prophet inequalities have been generalized to settings where the gambler can choose $k$ items, and, more generally, where he can choose any independent set in a matroid. However, all the existing algorithms require the gambler to know the distribution from which the rewards $V_1, \ldots, V_n$ are drawn. The assumption that the gambler knows the distribution from which $V_1, \ldots, V_n$ are drawn is very strong. Instead, we work with the much simpler assumption that the gambler only knows a few samples from this distribution. We construct the first single-sample prophet inequalities for many settings of interest, whose guarantees all match the best possible asymptotically, even with full knowledge of the distribution. Specifically, we provide a novel single-sample algorithm when the gambler can choose any $k$ elements, whose analysis is based on random walks with limited correlation. In addition, we provide a black-box method for converting specific types of solutions to the related secretary problem to single-sample prophet inequalities, and apply it to several existing algorithms. Finally, we provide a constant-sample prophet inequality for constant-degree bipartite matchings. In addition, we apply these results to design the first posted-price and multi-dimensional auction mechanisms with limited information in settings with asymmetric bidders. Connections between prophet inequalities and posted-price mechanisms are already known, but applying the existing framework requires knowledge of the underlying distributions, as well as the so-called "virtual values," even when the underlying prophet inequalities do not. We therefore provide an extension of this framework that bypasses virtual values altogether, allowing our mechanisms to take full advantage of the limited information required by our new prophet inequalities.
A central object in optimal stopping theory is the single-choice prophet inequality for independent, identically distributed random variables: given a sequence of random variables $X_1, \ldots, X_n$ drawn independently from a distribution $F$, the goal is to choose a stopping time $\tau$ so as to maximize $\alpha$ such that for all distributions $F$ we have $E[X_\tau] \geq \alpha \cdot E[\max_t X_t]$. What makes this problem challenging is that the decision whether $\tau = t$ may only depend on the values of the random variables $X_1, \ldots, X_t$ and on the distribution $F$. For a long time the best known bound for the problem had been $\alpha \geq 1 - 1/e \approx 0.632$, but quite recently a tight bound of $\alpha \approx 0.745$ was obtained. The case where $F$ is unknown, such that the decision whether $\tau = t$ may depend only on the values of the random variables $X_1, \ldots, X_t$, is equally well motivated but has received much less attention. A straightforward guarantee for this case of $\alpha \geq 1/e \approx 0.368$ can be derived from the solution to the secretary problem, where an arbitrary set of values arrive in random order and the goal is to maximize the probability of selecting the largest value. We show that this bound is in fact tight. We then investigate the case where the stopping time may additionally depend on a limited number of samples from $F$, and show that even with $o(n)$ samples $\alpha \leq 1/e$. On the other hand, $n$ samples allow for a significant improvement, while $O(n^2)$ samples are equivalent to knowledge of the distribution: specifically, with $n$ samples $\alpha \geq 1 - 1/e \approx 0.632$ and $\alpha \leq \ln(2) \approx 0.693$, and with $O(n^2)$ samples $\alpha \geq 0.745 - \varepsilon$ for any $\varepsilon > 0$.
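The classical secretary rule alluded to above, as a self-contained sketch (the standard algorithm, not the paper's sample-based ones):

import math

def secretary_rule(values):
    # observe the first ~n/e values without stopping, then accept the first
    # value beating everything seen so far; fall back to the last value
    n = len(values)
    k = max(1, math.floor(n / math.e))
    benchmark = max(values[:k])
    for x in values[k:]:
        if x > benchmark:
            return x
    return values[-1]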
We prove explicit estimates for the error in random homogenization of degenerate, second-order Hamilton-Jacobi equations, assuming the coefficients satisfy a finite range of dependence. In particular, we obtain an algebraic rate of convergence with overwhelming probability under certain structural conditions on the Hamiltonian.
Given a finite set $K$, we denote by $X=\Delta(K)$ the set of probabilities on $K$ and by $Z=\Delta_f(X)$ the set of Borel probabilities on $X$ with finite support. Studying a Markov Decision Process with partial information on $K$ naturally leads to a Markov Decision Process with full information on $X$. We introduce a new metric $d_*$ on $Z$ such that the transitions become 1-Lipschitz from $(X, \|\cdot\|_1)$ to $(Z, d_*)$. In the first part of the article, we define and prove several properties of the metric $d_*$. Especially, $d_*$ satisfies a Kantorovich-Rubinstein-type duality formula and can be characterized by using disintegrations. In the second part, we characterize the limit values in several classes of "compact nonexpansive" Markov Decision Processes. In particular, we use the metric $d_*$ to characterize the limit value in partial observation MDPs with finitely many states and in repeated games with an informed controller with finite sets of states and actions. Moreover, in each case we can prove the existence of a generalized notion of uniform value where we consider not only the Cesàro mean when the number of stages is large enough but any evaluation function $\theta \in \Delta(\mathbb{N}^*)$ when the impatience $I(\theta)=\sum_{t\geq 1} |\theta_{t+1}-\theta_t|$ is small enough.
We consider the general model of zero-sum repeated games (or stochastic games with signals), and assume that one of the players is fully informed and controls the transitions of the state variable. We prove the existence of the uniform value, generalizing several results of the literature. A preliminary existence result is obtained for a certain class of stochastic games played with pure strategies.
The purpose of this paper is to discuss the asymptotic behavior of the sequence $(f_n(i))$ generated by a nonlinear recurrence relation. This problem arises in connection with an equipment replacement problem; cf. S. Dreyfus, A Note on an Industrial Replacement Process.
This paper considers the problem of homogenization of a class of convex Hamilton-Jacobi equations in spatio-temporal stationary ergodic environments. Special attention is placed on the interplay between the use of the Subadditive Ergodic Theorem and continuity estimates for the solutions that are independent of the oscillations in the equation. Moreover, an inf-sup formula for the effective Hamiltonian is provided.
Hill and Kertz studied the prophet inequality on i.i.d. distributions [The Annals of Probability, 1982]. They proved a theoretical bound of $1 - 1/e$ on the approximation factor of their algorithm. They conjectured that the best approximation factor for arbitrarily large $n$ is $1/(1+1/e) \approx 0.731$. This conjecture remained open for over 30 years prior to this paper. In this paper we present a threshold-based algorithm for the prophet inequality with $n$ i.i.d. distributions. Using a nontrivial and novel approach, we show that our algorithm is a 0.738-approximation algorithm. By beating the bound of $1/(1+1/e)$, this refutes the conjecture of Hill and Kertz. Moreover, we generalize our results to non-uniform distributions and discuss their applications in mechanism design.
We consider two-person zero-sum stochastic games with limit superior payoff function and Borel measurable state and action spaces. The games are shown to have a value and the value function is calculated by transfinite iteration of an operator and proved to be upper analytic. The paper extends results of our earlier article [17] in which the same class of games was considered for countable state spaces and finite action sets.
We present exponential error estimates and demonstrate an algebraic convergence rate for the homogenization of level-set convex Hamilton-Jacobi equations in i.i.d. random environments, the first quantitative homogenization results for these equations in the stochastic setting. By taking advantage of a connection between the metric approach to homogenization and the theory of first-passage percolation, we obtain estimates on the fluctuations of the solutions to the approximate cell problem in the ballistic regime (away from the flat spot of the effective Hamiltonian). In the sub-ballistic regime (on the flat spot), we show that the fluctuations are governed by an entirely different mechanism and the homogenization may proceed, without further assumptions, at an arbitrarily slow rate. We identify a necessary and sufficient condition on the law of the Hamiltonian for an algebraic rate of convergence to hold in the sub-ballistic regime and show, under this hypothesis, that the two rates may be merged to yield comprehensive error estimates and an algebraic rate of convergence for homogenization. Our methods are novel and quite different from the techniques employed in the periodic setting, although we benefit from previous works in both first-passage percolation and homogenization. The link between the rate of homogenization and the flat spot of the effective Hamiltonian, which is related to the nonexistence of correctors, is a purely random phenomenon observed here for the first time.
We prove that every bounded Lipschitz function F on a subset Y of a length space X admits a tautest extension to X, i.e., a unique Lipschitz extension u for which Lip_U u = Lip_{∂U} u for all open subsets U of X that do not intersect Y. This was previously known only for bounded domains in R^n, in which case u is infinity harmonic, that is, a viscosity solution to Δ_∞ u = 0. We also prove the first general uniqueness results for Δ_∞ u = g on bounded subsets of R^n (when g is uniformly continuous and bounded away from zero), and analogous results for bounded length spaces. The proofs rely on a new game-theoretic description of u. Let u^ε(x) be the value of the following two-player zero-sum game, called tug-of-war: fix x_0 = x ∈ X ∖ Y. At the kth turn, the players toss a coin and the winner chooses an x_k with d(x_k, x_{k-1}) < ε. The game ends when x_k ∈ Y, and player I's payoff is F(x_k) − (ε²/2) Σ_{i=0}^{k−1} g(x_i). We show that the u^ε converge uniformly to u as ε tends to zero. Even for bounded domains in R^n, the game-theoretic description of infinity-harmonic functions yields new intuition and estimates; for instance, we prove power-law bounds for infinity-harmonic functions in the unit disk with boundary values supported in a δ-neighborhood of a Cantor set on the unit circle.
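To make the game concrete, the following is a minimal simulation sketch (our illustration, not code from the paper) of tug-of-war on the interval X = [0, 1] with Y = {0, 1}, payoff F(0) = 0, F(1) = 1, and running cost g ≡ 0. Under the natural greedy strategies, where each player pulls straight toward the endpoint they prefer, the game is a symmetric ±ε walk, so the simulated value should approach the infinity-harmonic extension u(x) = x. The function name tug_of_war_value and parameter values are our own choices.

```python
import random

def tug_of_war_value(x, eps=0.05, trials=10_000):
    """Monte Carlo value of tug-of-war on [0, 1] with F(0)=0, F(1)=1, g=0.

    Both players play greedily: the maximizer steps +eps toward 1, the
    minimizer steps -eps toward 0.  (Formally moves must have length < eps;
    exact eps-steps on an integer lattice keep this sketch simple and
    avoid floating-point drift.)
    """
    n, k0 = round(1 / eps), round(x / eps)   # lattice size and start cell
    total = 0.0
    for _ in range(trials):
        k = k0
        while 0 < k < n:
            # fair coin toss decides which player moves this turn
            k += 1 if random.random() < 0.5 else -1
        total += 1.0 if k == n else 0.0      # payoff F at the absorbed endpoint
    return total / trials

for x in (0.25, 0.5, 0.75):
    print(x, round(tug_of_war_value(x), 3))  # should be close to u(x) = x
```

With g ≡ 0 the running-cost sum vanishes, so the value is just the probability of being absorbed at 1, which is why the linear (infinity-harmonic) profile emerges.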
The paper is concerned with two-person games with a saddle point. We investigate the limits of value functions for the long-time-average payoff, the discounted average payoff, and the payoff that follows a probability density on ℝ≥0.
In the design and analysis of revenue-maximizing auctions, auction performance is typically measured with respect to a prior distribution over inputs. The most obvious source for such a distribution is past data. The goal of this paper is to understand how much data is necessary and sufficient to guarantee near-optimal expected revenue.
Fix a bounded domain Ω ⊂ R^d, a continuous function F: ∂Ω → R, and constants ε > 0 and 1 < p, q < ∞ with 1/p + 1/q = 1. For each x ∈ Ω, let u^ε(x) be the value for player I of the following two-player, zero-sum game. The initial game position is x. At each stage, a fair coin is tossed, and the player who wins the toss chooses a vector v in the closed ball B̄(0, ε) to add to the game position, after which a random noise vector with mean zero and variance (q/p)|v|² in each orthogonal direction is also added. The game ends when the game position reaches some y ∈ ∂Ω, and player I's payoff is F(y). We show that (for sufficiently regular Ω) as ε tends to zero, the functions u^ε converge uniformly to the unique p-harmonic extension of F. Using a modified game (in which ε gets smaller as the game position approaches ∂Ω), we prove similar statements for general bounded domains Ω and resolutive functions F. These games and their variants interpolate between the tug-of-war games studied by Peres, Schramm, Sheffield, and Wilson [15], [16] (p = ∞) and the motion-by-curvature games introduced by Spencer [17] and studied by Kohn and Serfaty [9] (p = 1). They generalize the relationship between Brownian motion and the ordinary Laplacian and yield new results about p-capacity and p-harmonic measure.
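For orientation, the coin toss between a controlled move and diffusive noise is mirrored analytically by the asymptotic mean value characterization of p-harmonic functions due to Manfredi, Parviainen, and Rossi; this is a related known formula, not a statement taken from the paper itself:

```latex
% Asymptotic mean value property characterizing p-harmonic functions u
% on a domain in R^n (Manfredi-Parviainen-Rossi, in the viscosity sense),
% where alpha = (p-2)/(p+n) weights the tug-of-war (coin-toss) part and
% beta = (n+2)/(p+n) weights the diffusive (noise) part, alpha + beta = 1:
u(x) = \frac{\alpha}{2}\Bigl(\max_{\overline{B}_\varepsilon(x)} u
         + \min_{\overline{B}_\varepsilon(x)} u\Bigr)
     + \frac{\beta}{|B_\varepsilon(x)|}\int_{B_\varepsilon(x)} u(y)\,dy
     + o(\varepsilon^2), \qquad \varepsilon \to 0.
```

Setting p = 2 kills the max/min term and recovers the classical mean value property of harmonic functions, while p → ∞ suppresses the noise term and leaves pure tug-of-war, matching the interpolation described in the abstract.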
We give a proof of asymptotic Lipschitz continuity of p-harmonious functions, which are tug-of-war game analogues of ordinary p-harmonic functions. This result is used to obtain a new proof of Lipschitz continuity and Harnack's inequality for p-harmonic functions in the case p > 2. The proof avoids classical techniques like Moser iteration, relying instead on suitable choices of strategies in the stochastic tug-of-war game.
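For a concrete feel for p-harmonious functions, here is a small value-iteration sketch (our own illustration, using the discrete mean value recursion of Manfredi, Parviainen, and Rossi with α = (p − 2)/(p + n) and β = (n + 2)/(p + n)) on a one-dimensional grid with an ε-wide boundary strip. In 1D the p-harmonic extension of u(0) = 0, u(1) = 1 is linear, so the fixed point should approach u(x) = x; all grid parameters below are our own assumptions.

```python
import numpy as np

# p-harmonious value iteration on (0, 1) with boundary strip of width eps,
# following the discrete mean value recursion
#   u(x) = (alpha/2) * (sup_{B_eps(x)} u + inf_{B_eps(x)} u)
#        + beta * (mean of u over B_eps(x)),
# alpha = (p - 2)/(p + n), beta = (n + 2)/(p + n).  Illustrative sketch only.

p, n = 4.0, 1
alpha, beta = (p - 2) / (p + n), (n + 2) / (p + n)

h, r = 0.01, 5                              # grid spacing; eps = r * h
x = np.linspace(-r * h, 1.0 + r * h, int(round(1.0 / h)) + 2 * r + 1)
inside = (x > h / 2) & (x < 1.0 - h / 2)    # interior points of (0, 1)
u = np.where(x > 1.0 - h / 2, 1.0, 0.0)     # strip data: 0 for x <= 0, 1 for x >= 1

for _ in range(3000):
    new = u.copy()
    for i in np.flatnonzero(inside):
        ball = u[i - r:i + r + 1]           # full eps-ball, never clipped
        new[i] = alpha / 2 * (ball.max() + ball.min()) + beta * ball.mean()
    u = new

for xi, ui in zip(x[inside][::20], u[inside][::20]):
    print(round(xi, 2), round(ui, 2))       # approx. the linear profile u(x) = x
```

The α-weighted max/min term is the tug-of-war half of the recursion and the β-weighted average is the noise half, which is exactly the split the abstract exploits when choosing strategies in the stochastic game.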