We propose a new algorithm that finds an $\varepsilon$-approximate fixed point of a smooth function from the $n$-dimensional $\ell_2$ unit ball to itself. We use the general framework of finding …
We propose a new algorithm that finds an $\varepsilon$-approximate fixed point of a smooth function from the $n$-dimensional $\ell_2$ unit ball to itself. We use the general framework of finding approximate solutions to a variational inequality, a problem that subsumes fixed point computation and the computation of a Nash Equilibrium. The algorithm's runtime is bounded by $e^{O(n)}/\varepsilon$, under the smoothed-analysis framework. This is the first known algorithm in such a generality whose runtime is faster than $(1/\varepsilon)^{O(n)}$, which is a time that suffices for an exhaustive search. We complement this result with a lower bound of $e^{\Omega(n)}$ on the query complexity for finding an $O(1)$-approximate fixed point on the unit ball, which holds even in the smoothed-analysis model, yet without the assumption that the function is smooth. Existing lower bounds are only known for the hypercube, and adapting them to the ball does not give non-trivial results even for finding $O(1/\sqrt{n})$-approximate fixed points.
We propose efficient no-regret learning dynamics and ellipsoid-based methods for computing linear correlated equilibria$\unicode{x2014}$a relaxation of correlated equilibria and a strengthening of coarse correlated equilibria$\unicode{x2014}$in general convex games. These are …
We propose efficient no-regret learning dynamics and ellipsoid-based methods for computing linear correlated equilibria$\unicode{x2014}$a relaxation of correlated equilibria and a strengthening of coarse correlated equilibria$\unicode{x2014}$in general convex games. These are games where the number of pure strategies is potentially exponential in the natural representation of the game, such as extensive-form games. Our work identifies linear correlated equilibria as the tightest known notion of equilibrium that is computable in polynomial time and is efficiently learnable for general convex games. Our results are enabled by a generalization of the seminal framework of of Gordon et al. [2008] for $\Phi$-regret minimization, providing extensions to this framework that can be used even when the set of deviations $\Phi$ is intractable to separate/optimize over. Our polynomial-time algorithms are similarly enabled by extending the Ellipsoid-Against-Hope approach of Papadimitriou and Roughgarden [2008] and its generalization to games of non-polynomial type proposed by Farina and Pipis [2024a]. We provide an extension to these approaches when we do not have access to the separation oracles required by these works for the dual player.
Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories …
Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories and insights from structure. Historically, these tools have been restricted to one dimensional charts and dynamic social networks; however, modern AI offers the possibility of identifying more fully the plot structure, character incentives, and, importantly, counterfactual plot lines that the story could have taken but did not take. In this work, we use AI to model the structure of stories as game-theoretic objects, amenable to quantitative analysis. This allows us to not only interrogate each character's decision making, but also possibly peer into the original author's conception of the characters' world. We demonstrate our proposed technique on Shakespeare's famous Romeo and Juliet. We conclude with a discussion of how our analysis could be replicated in broader contexts, including real-life scenarios.
A fundamental problem in network experiments is selecting an appropriate experimental design in order to precisely estimate a given causal effect of interest. In fact, optimal rates of estimation remain …
A fundamental problem in network experiments is selecting an appropriate experimental design in order to precisely estimate a given causal effect of interest. In fact, optimal rates of estimation remain unknown for essentially all causal effects in network experiments. In this work, we propose a general approach for constructing experiment designs under network interference with the goal of precisely estimating a pre-specified causal effect. A central aspect of our approach is the notion of a conflict graph, which captures the fundamental unobservability associated with the casual effect and the underlying network. We refer to our experimental design as the Conflict Graph Design. In order to estimate effects, we propose a modified Horvitz--Thompson estimator. We show that its variance under the Conflict Graph Design is bounded as $O(\lambda(H) / n )$, where $\lambda(H)$ is the largest eigenvalue of the adjacency matrix of the conflict graph. These rates depend on both the underlying network and the particular causal effect under investigation. Not only does this yield the best known rates of estimation for several well-studied causal effects (e.g. the global and direct effects) but it also provides new methods for effects which have received less attention from the perspective of experiment design (e.g. spill-over effects). Our results corroborate two implicitly understood points in the literature: (1) that in order to increase precision, experiment designs should be tailored to specific causal effects of interest and (2) that "more local" effects are easier to estimate than "more global" effects. In addition to point estimation, we construct conservative variance estimators which facilitate the construction of asymptotically valid confidence intervals for the casual effect of interest.
The quality of generative models depends on the quality of the data they are trained on. Creating large-scale, high-quality datasets is often expensive and sometimes impossible, e.g. in certain scientific …
The quality of generative models depends on the quality of the data they are trained on. Creating large-scale, high-quality datasets is often expensive and sometimes impossible, e.g. in certain scientific applications where there is no access to clean data due to physical or instrumentation constraints. Ambient Diffusion and related frameworks train diffusion models with solely corrupted data (which are usually cheaper to acquire) but ambient models significantly underperform models trained on clean data. We study this phenomenon at scale by training more than $80$ models on data with different corruption levels across three datasets ranging from $30,000$ to $\approx 1.3$M samples. We show that it is impossible, at these sample sizes, to match the performance of models trained on clean data when only training on noisy data. Yet, a combination of a small set of clean data (e.g.~$10\%$ of the total dataset) and a large set of highly noisy data suffices to reach the performance of models trained solely on similar-size datasets of clean data, and in particular to achieve near state-of-the-art performance. We provide theoretical evidence for our findings by developing novel sample complexity bounds for learning from Gaussian Mixtures with heterogeneous variances. Our theoretical model suggests that, for large enough datasets, the effective marginal utility of a noisy sample is exponentially worse than that of a clean sample. Providing a small set of clean samples can significantly reduce the sample size requirements for noisy data, as we also observe in our experiments.
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was made. …
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of timesteps where that prediction was made. We study the fundamental problem of online calibrated forecasting of binary sequences, which was initially studied by Foster & Vohra (1998). They derived an algorithm with $O(T^{2/3})$ calibration error after $T$ time steps, and showed a lower bound of $\Omega(T^{1/2})$. These bounds remained stagnant for two decades, until Qiao & Valiant (2021) improved the lower bound to $\Omega(T^{0.528})$ by introducing a combinatorial game called sign preservation and showing that lower bounds for this game imply lower bounds for calibration. We introduce a strengthening of Qiao & Valiant's game that we call sign preservation with reuse (SPR). We prove that the relationship between SPR and calibrated forecasting is bidirectional: not only do lower bounds for SPR translate into lower bounds for calibration, but algorithms for SPR also translate into new algorithms for calibrated forecasting. In particular, any strategy that improves the trivial upper bound for the value of the SPR game would imply a forecasting algorithm with calibration error exponent less than 2/3, improving Foster & Vohra's upper bound for the first time. Using similar ideas, we then prove a slightly stronger lower bound than that of Qiao & Valiant, namely $\Omega(T^{0.54389})$. Our lower bound is obtained by an oblivious adversary, marking the first $\omega(T^{1/2})$ calibration lower bound for oblivious adversaries.
Recent simultaneous works by Peng and Rubinstein [2024] and Dagan et al. [2024] have demonstrated the existence of a no-swap-regret learning algorithm that can reach $\epsilon$ average swap regret against …
Recent simultaneous works by Peng and Rubinstein [2024] and Dagan et al. [2024] have demonstrated the existence of a no-swap-regret learning algorithm that can reach $\epsilon$ average swap regret against an adversary in any extensive-form game within $m^{\tilde{\mathcal O}(1/\epsilon)}$ rounds, where $m$ is the number of nodes in the game tree. However, the question of whether a $\mathrm{poly}(m, 1/\epsilon)$-round algorithm could exist remained open. In this paper, we show a lower bound that precludes the existence of such an algorithm. In particular, we show that achieving average swap regret $\epsilon$ against an oblivious adversary in general extensive-form games requires at least $\mathrm{exp}\left(\Omega\left(\min\left\{m^{1/14}, \epsilon^{-1/6}\right\}\right)\right)$ rounds.
The empirical risk minimization (ERM) principle has been highly impactful in machine learning, leading both to near-optimal theoretical guarantees for ERM-based learning algorithms as well as driving many of the …
The empirical risk minimization (ERM) principle has been highly impactful in machine learning, leading both to near-optimal theoretical guarantees for ERM-based learning algorithms as well as driving many of the recent empirical successes in deep learning. In this paper, we investigate the question of whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning: in particular, is there a weaker oracle than ERM which can nevertheless enable learnability? We answer this question affirmatively, showing that in the realizable setting of PAC learning for binary classification, a concept class can be learned using an oracle which only returns a single bit indicating whether a given dataset is realizable by some concept in the class. The sample complexity and oracle complexity of our algorithm depend polynomially on the VC dimension of the hypothesis class, thus showing that there is only a polynomial price to pay for use of our weaker oracle. Our results extend to the agnostic learning setting with a slight strengthening of the oracle, as well as to the partial concept, multiclass and real-valued learning settings. In the setting of partial concept classes, prior to our work no oracle-efficient algorithms were known, even with a standard ERM oracle. Thus, our results address a question of Alon et al. (2021) who asked whether there are algorithmic principles which enable efficient learnability in this setting.
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from …
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of separation of the constituent MDPs. The main thrust of this paper is in establishing a nearly-sharp *statistical threshold* for the horizon length necessary for efficient learning. On the computational side, we show that under a weaker assumption of separability under the optimal policy, there is a quasi-polynomial algorithm with time complexity scaling in terms of the statistical threshold. We further show a near-matching time complexity lower bound under the exponential time hypothesis.
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to …
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to approximations which deteriorate performance. We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in this space. Our key technical contribution is a method that uses a double application of Tweedie's formula and a consistency loss function that allows us to extend sampling at noise levels below the observed data noise. We also provide further evidence that diffusion models memorize from their training sets by identifying extremely corrupted images that are almost perfectly reconstructed, raising copyright and privacy concerns. Our method for training using corrupted samples can be used to mitigate this problem. We demonstrate this by fine-tuning Stable Diffusion XL to generate samples from a distribution using only noisy samples. Our framework reduces the amount of memorization of the fine-tuning dataset, while maintaining competitive performance.
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, …
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when the utilities are non-concave, a situation that is common in machine learning applications where the agents' strategies are parameterized by deep neural networks, or the agents' utilities are computed by a neural network, or both. Indeed, non-concave games present a host of game-theoretic and optimization challenges: (i) Nash equilibria may fail to exist; (ii) local Nash equilibria exist but are intractable; and (iii) mixed Nash, correlated, and coarse correlated equilibria have infinite support in general, and are intractable. To sidestep these challenges we propose a new solution concept, termed $(\varepsilon, \Phi(\delta))$-local equilibrium, which generalizes local Nash equilibrium in non-concave games, as well as (coarse) correlated equilibrium in concave games. Importantly, we show that two instantiations of this solution concept capture the convergence guarantees of Online Gradient Descent and no-regret learning, which we show efficiently converge to this type of equilibrium in non-concave games with smooth utilities.
In the classical setting of self-selection, the goal is to learn k models simultaneously, from observations (x(i), y(i)) where y(i) is the output of one of k underlying models on …
In the classical setting of self-selection, the goal is to learn k models simultaneously, from observations (x(i), y(i)) where y(i) is the output of one of k underlying models on input x(i). In contrast to mixture models, where we observe the output of a randomly selected model (and therefore the selection of which model is observed is exogenous), in self-selection models the observed model depends on the realized outputs of the underlying models themselves, as determined by some known selection criterion (e.g., we might observe the highest output, the smallest output, or the median output of the k models), and is thus endogenous. In known-index self-selection, the identity of the observed model output is observable; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics (going back to the works of Roy, Gronau, Lewis, Heckman, and others) and many applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium.
We introduce adversarial learning methods for data-driven generative modeling of the dynamics of $n^{th}$-order stochastic systems. Our approach builds on Generative Adversarial Networks (GANs) with generative model classes based on …
We introduce adversarial learning methods for data-driven generative modeling of the dynamics of $n^{th}$-order stochastic systems. Our approach builds on Generative Adversarial Networks (GANs) with generative model classes based on stable $m$-step stochastic numerical integrators. We introduce different formulations and training methods for learning models of stochastic dynamics based on observation of trajectory samples. We develop approaches using discriminators based on Maximum Mean Discrepancy (MMD), training protocols using conditional and marginal distributions, and methods for learning dynamic responses over different time-scales. We show how our approaches can be used for modeling physical systems to learn force-laws, damping coefficients, and noise-related parameters. The adversarial learning approaches provide methods for obtaining stable generative models for dynamic tasks including long-time prediction and developing simulations for stochastic systems.
Imperfect score-matching leads to a shift between the training and the sampling distribution of diffusion models. Due to the recursive nature of the generation process, errors in previous steps yield …
Imperfect score-matching leads to a shift between the training and the sampling distribution of diffusion models. Due to the recursive nature of the generation process, errors in previous steps yield sampling iterates that drift away from the training distribution. Yet, the standard training objective via Denoising Score Matching (DSM) is only designed to optimize over non-drifted data. To train on drifted data, we propose to enforce a \emph{consistency} property which states that predictions of the model on its own generated data are consistent across time. Theoretically, we show that if the score is learned perfectly on some non-drifted points (via DSM) and if the consistency property is enforced everywhere, then the score is learned accurately everywhere. Empirically we show that our novel training objective yields state-of-the-art results for conditional and unconditional generation in CIFAR-10 and baseline improvements in AFHQ and FFHQ. We open-source our code and models: https://github.com/giannisdaras/cdm
While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general …
While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed …
A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called $\sigma$-smooth Nash equilibrium, for a smoothness parameter $\sigma$. In a $\sigma$-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a $\sigma$-smooth strategy, which is a distribution that does not put too much mass (as parametrized by $\sigma$) on any fixed action. We distinguish two variants of $\sigma$-smooth Nash equilibria: strong $\sigma$-smooth Nash equilibria, in which players are required to play $\sigma$-smooth strategies under equilibrium play, and weak $\sigma$-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing $\epsilon$-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time. We complement our upper bounds by showing that when either $\sigma$ or $\epsilon$ is an inverse polynomial, finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibria becomes computationally intractable.
We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness …
We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by {\epsilon} after $\log(N)^{O(1/\epsilon)}$ rounds and with $O(N)$ per iteration complexity, where $N$ is the number of experts, while the classical reductions of Blum-Mansour and Stolz-Lugosi require $O(N/\epsilon^2)$ rounds and at least $\Omega(N^2)$ per iteration complexity. Our result comes with an associated lower bound, which -- in contrast to that in [BM07] -- holds for oblivious and $\ell_1$-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be $\tilde\Omega(N/\epsilon^2)$ or exponential in $1/\epsilon$. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria, of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of correlated equilibrium which vastly extends the requirement that the action set is finite, thus answering a question left open by [DG22; Ass+23]. Moreover, it answers several outstanding questions about equilibrium computation and learning in games.
We present a general framework for designing approximately revenue-optimal mechanisms for multi-item additive auctions, which applies to both truthful and nontruthful auctions. Given a (not necessarily truthful) single-item auction format …
We present a general framework for designing approximately revenue-optimal mechanisms for multi-item additive auctions, which applies to both truthful and nontruthful auctions. Given a (not necessarily truthful) single-item auction format satisfying certain technical conditions, we run simultaneous item auctions augmented with a personalized entry fee for each bidder that must be paid before the auction can be accessed. These entry fees depend only on the prior distribution of bidder types and in particular are independent of realized bids. We bound the revenue of the resulting two-part tariff mechanism using a novel geometric technique that enables revenue guarantees for many common nontruthful auctions that previously had none. Our approach adapts and extends the duality framework of Cai, Devanur, and Weinberg [SIAM J. Comput., 50 (2021), pp. STOC16-160–STOC16-200] beyond truthful auctions. Our framework can be used with many common auction formats, such as simultaneous first-price, simultaneous second-price, and simultaneous all-pay auctions. Our results for first-price and all-pay are the first revenue guarantees of nontruthful mechanisms in multidimensional environments, addressing an open question in the literature [T. Roughgarden, V. Syrgkanis, and E. Tardos, J. Artificial Intelligence Res., 59 (2017), pp. 59–101]. If all-pay auctions are used, we prove that the resulting mechanism is also credible in the sense that the auctioneer cannot benefit by deviating from the stated mechanism after observing agent bids. This is the first static credible mechanism for multi-item additive auctions that achieves a constant factor of the optimal revenue. If second-price auctions are used, we obtain a truthful -approximate mechanism with fixed entry fees that are amenable to tuning via online learning techniques.
Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, …
Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers.
We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the …
We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the winner and the price they paid in a sequence of identical auctions, we provide algorithms for non-parametrically estimating the bid distribution of each bidder, as well as their value distributions under equilibrium assumptions. We provide finite-sample estimation bounds which are uniform in that their error rates do not depend on the bid/value distributions being estimated. Our estimation guarantees advance a body of work in Econometrics wherein only identification results have been obtained, unless the setting is symmetric, parametric, or all bids are observable. Our guarantees also provide computationally and statistically effective alternatives to classical techniques from reliability theory. Finally, our results are immediately applicable to Dutch and English auctions.
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In …
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback–Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player …
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves over the prior best rate of convergence for correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS`20), and it is optimal -- within the no-regret framework -- up to polylogarithmic factors in $T$. To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn`05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (much more well-behaved) linear transformation on an exponential-sized set, enabling us to leverage similar techniques as DFG to near-optimally bound the internal regret. Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR`07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding clarity on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques by DFG can be extended and leveraged in the analysis of more involved learning algorithms.
We study fast rates of convergence in the setting of nonparametric online regression, namely where regret is defined with respect to an arbitrary function class which has bounded complexity. Our …
We study fast rates of convergence in the setting of nonparametric online regression, namely where regret is defined with respect to an arbitrary function class which has bounded complexity. Our contributions are two-fold:
Generative Ensemble Regression: Learning Particle Dynamics from Observations of Ensembles with Physics-informed Deep Generative Models
Generative Ensemble Regression: Learning Particle Dynamics from Observations of Ensembles with Physics-informed Deep Generative Models
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount …
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies is an absolute constant. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL) where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.
We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the …
We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the winner and the price they paid in a sequence of identical auctions, we provide algorithms for non-parametrically estimating the bid distribution of each bidder, as well as their value distributions under equilibrium assumptions. We provide finite-sample estimation bounds which are uniform in that their error rates do not depend on the bid/value distributions being estimated. Our estimation guarantees advance a body of work in Econometrics wherein only identification results have been obtained, unless the setting is symmetric, parametric, or all bids are observable. Our guarantees also provide computationally and statistically effective alternatives to classical techniques from reliability theory. Finally, our results are immediately applicable to Dutch and English auctions.
In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the output of one of $k$ underlying models on …
In the classical setting of self-selection, the goal is to learn $k$ models, simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the output of one of $k$ underlying models on input $x^{(i)}$. In contrast to mixture models, where we observe the output of a randomly selected model, here the observed model depends on the outputs themselves, and is determined by some known selection criterion. For example, we might observe the highest output, the smallest output, or the median output of the $k$ models. In known-index self-selection, the identity of the observed model output is observable; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics and applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium. In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear. In the known-index case, we require poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate quite general selection criteria. In the more challenging unknown-index case, even the identifiability of the linear models (from infinitely many samples) was not known. We show three results in this case for the commonly studied $\max$ self-selection criterion: (1) we show that the linear models are indeed identifiable, (2) for general $k$ we provide an algorithm with poly$(d) \exp(\text{poly}(k))$ sample and time complexity to estimate the regression parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and time complexity.
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. This result extends the work of Hand and Voroninski from efficient …
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. This result extends the work of Hand and Voroninski from efficient inversion to efficient posterior sampling. In practice, to allow for increased expressivity, we propose to do posterior sampling in the latent space of a pre-trained generative model. To achieve that, we train a score-based model in the latent space of a StyleGAN-2 and we use it to solve inverse problems. Our framework, Score-Guided Intermediate Layer Optimization (SGILO), extends prior work by replacing the sparsity regularization with a generative prior in the intermediate layer. Experimentally, we obtain significant improvements over the previous state-of-the-art, especially in the low measurement regime.
Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if …
Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of~\citet{tobin1958estimation,amemiya1973regression}. When the distribution of the error is normal with known variance, recent work of~\citet{daskalakis2019truncatedregression} provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.
We study the optimization landscape of the log-likelihood function and the convergence of the Expectation-Maximization (EM) algorithm in latent Gaussian tree models, i.e. tree-structured Gaussian graphical models whose leaf nodes …
We study the optimization landscape of the log-likelihood function and the convergence of the Expectation-Maximization (EM) algorithm in latent Gaussian tree models, i.e. tree-structured Gaussian graphical models whose leaf nodes are observable and non-leaf nodes are unobservable. We show that the unique non-trivial stationary point of the population log-likelihood is its global maximum, and establish that the expectation-maximization algorithm is guaranteed to converge to it in the single latent variable case. Our results for the landscape of the log-likelihood function in general latent tree models provide support for the extensive practical use of maximum likelihood based-methods in this setting. Our results for the EM algorithm extend an emerging line of work on obtaining global convergence guarantees for this celebrated algorithm. We show our results for the non-trivial stationary points of the log-likelihood by arguing that a certain system of polynomial equations obtained from the EM updates has a unique non-trivial solution. The global convergence of the EM algorithm follows by arguing that all trivial fixed points are higher-order saddle points.
We provide time- and sample-efficient algorithms for learning and testing latent-tree Ising models, i.e. Ising models that may only be observed at their leaf nodes. On the learning side, we …
We provide time- and sample-efficient algorithms for learning and testing latent-tree Ising models, i.e. Ising models that may only be observed at their leaf nodes. On the learning side, we obtain efficient algorithms for learning a tree-structured Ising model whose leaf node distribution is close in Total Variation Distance, improving on the results of prior work. On the testing side, we provide an efficient algorithm with fewer samples for testing whether two latent-tree Ising models have leaf-node distributions that are close or far in Total Variation distance. We obtain our algorithms by showing novel localization results for the total variation distance between the leaf-node distributions of tree-structured Ising models, in terms of their marginals on pairs of leaves.
Min-max optimization problems involving nonconvex-nonconcave objectives have found important applications in adversarial training and other multi-agent learning settings. Yet, no known gradient descent-based method is guaranteed to converge to (even …
Min-max optimization problems involving nonconvex-nonconcave objectives have found important applications in adversarial training and other multi-agent learning settings. Yet, no known gradient descent-based method is guaranteed to converge to (even local notions of) min-max equilibrium in the nonconvex-nonconcave setting. For all known methods, there exist relatively simple objectives for which they cycle or exhibit other undesirable behavior different from converging to a point, let alone to some game-theoretically meaningful one~\cite{flokas2019poincare,hsieh2021limits}. The only known convergence guarantees hold under the strong assumption that the initialization is very close to a local min-max equilibrium~\cite{wang2019solving}. Moreover, the afore-described challenges are not just theoretical curiosities. All known methods are unstable in practice, even in simple settings. We propose the first method that is guaranteed to converge to a local min-max equilibrium for smooth nonconvex-nonconcave objectives. Our method is second-order and provably escapes limit cycles as long as it is initialized at an easy-to-find initial point. Both the definition of our method and its convergence analysis are motivated by the topological nature of the problem. In particular, our method is not designed to decrease some potential function, such as the distance of its iterate from the set of local min-max equilibria or the projected gradient of the objective, but is designed to satisfy a topological property that guarantees the avoidance of cycles and implies its convergence.
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of …
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Pen, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac 1T\right)$.
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player …
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves over the prior best rate of convergence for correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS`20), and it is optimal -- within the no-regret framework -- up to polylogarithmic factors in $T$. To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn`05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (much more well-behaved) linear transformation on an exponential-sized set, enabling us to leverage similar techniques as DGF to near-optimally bound the internal regret. Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR`07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding clarity on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques by DFG can be extended and leveraged in the analysis of more involved learning algorithms.
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of …
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Pen, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac 1T\right)$.
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are …
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e. external fields and interaction strengths) of Ising models from a {\em single} sample. {We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer and Pubmed.}
We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in Statistics [Woodroofe 1985, Stute 1993]. …
We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in Statistics [Woodroofe 1985, Stute 1993]. The single-dimensional version of our theorem has the following implication: For any distribution P on [0, 1] with a smooth log-density function, given samples from the conditional distribution of P on [a, a + \varepsilon] \subset [0, 1], we can efficiently identify an approximation to P over the whole interval [0, 1], with quality of approximation that improves with the smoothness of P.
To the best of knowledge, our result is the first in the area of non-parametric density estimation from truncated samples, which works under the hard truncation model, where the samples outside some survival set S are never observed, and applies to multiple dimensions. In contrast, previous works assume single dimensional data where each sample has a different survival set $S$ so that samples from the whole support will ultimately be collected.
Despite its important applications in Machine Learning, min-max optimization of objective functions that are nonconvex-nonconcave remains elusive. Not only are there no known first-order methods converging to even approximate local …
Despite its important applications in Machine Learning, min-max optimization of objective functions that are nonconvex-nonconcave remains elusive. Not only are there no known first-order methods converging to even approximate local min-max equilibria (a.k.a. approximate saddle points), but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity as well as of the limitations of first-order methods in this problem. Specifically, we show that in linearly constrained min-max optimization problems with nonconvex-nonconcave objectives an approximate local min-max equilibrium of large enough approximation is guaranteed to exist, but computing such a point is PPAD-complete. The same is true of computing an approximate fixed point of the (Projected) Gradient Descent/Ascent update dynamics, which is computationally equivalent to computing approximate local min-max equilibria. An important byproduct of our proof is to establish an unconditional hardness result in the Nemirovsky-Yudin 1983 oracle optimization model, where we are given oracle access to the values of some function f : P → [−1, 1] and its gradient ∇ f, where P ⊆ [0, 1]d is a known convex polytope. We show that any algorithm that uses such first-order oracle access to f and finds an ε-approximate local min-max equilibrium needs to make a number of oracle queries that is exponential in at least one of 1/ε, L, G, or d, where L and G are respectively the smoothness and Lipschitzness of f. This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using O(L/ε) many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems in the oracle model.
We show that n-variable tree-structured Ising models can be learned computationally-efficiently to within total variation distance є from an optimal O(n lnn/є2) samples, where O(·) hides an absolute constant which, …
We show that n-variable tree-structured Ising models can be learned computationally-efficiently to within total variation distance є from an optimal O(n lnn/є2) samples, where O(·) hides an absolute constant which, importantly, does not depend on the model being learned—neither its tree nor the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu algorithm [1968], using the plug-in estimator for estimating mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is є-close to the true one in total variation distance, a guarantee called "proper learning."
There have been two main lines of work on estimating Ising models: (1) estimating them from multiple independent samples under minimal assumptions about the model's interaction matrix ; and (2) …
There have been two main lines of work on estimating Ising models: (1) estimating them from multiple independent samples under minimal assumptions about the model's interaction matrix ; and (2) estimating them from one sample in restrictive settings. We propose a unified framework that smoothly interpolates between these two settings, enabling significantly richer estimation guarantees from one, a few, or many samples.
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each …
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in Statistics \cite{woodroofe1985estimating, stute1993almost}. The single-dimensional …
We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in Statistics \cite{woodroofe1985estimating, stute1993almost}. The single-dimensional version of our theorem has the following implication: "For any distribution $P$ on $[0, 1]$ with a smooth log-density function, given samples from the conditional distribution of $P$ on $[a, a + \varepsilon] \subset [0, 1]$, we can efficiently identify an approximation to $P$ over the \emph{whole} interval $[0, 1]$, with quality of approximation that improves with the smoothness of $P$." To the best of knowledge, our result is the first in the area of non-parametric density estimation from truncated samples, which works under the hard truncation model, where the samples outside some survival set $S$ are never observed, and applies to multiple dimensions. In contrast, previous works assume single dimensional data where each sample has a different survival set $S$ so that samples from the whole support will ultimately be collected.
We study fast rates of convergence in the setting of nonparametric online regression, namely where regret is defined with respect to an arbitrary function class which has bounded complexity. Our …
We study fast rates of convergence in the setting of nonparametric online regression, namely where regret is defined with respect to an arbitrary function class which has bounded complexity. Our contributions are two-fold: - In the realizable setting of nonparametric online regression with the absolute loss, we propose a randomized proper learning algorithm which gets a near-optimal cumulative loss in terms of the sequential fat-shattering dimension of the hypothesis class. In the setting of online classification with a class of Littlestone dimension $d$, our bound reduces to $d \cdot {\rm poly} \log T$. This result answers a question as to whether proper learners could achieve near-optimal cumulative loss; previously, even for online classification, the best known cumulative loss was $\tilde O( \sqrt{dT})$. Further, for the real-valued (regression) setting, a cumulative loss bound with near-optimal scaling on sequential fat-shattering dimension was not even known for improper learners, prior to this work. - Using the above result, we exhibit an independent learning algorithm for general-sum binary games of Littlestone dimension $d$, for which each player achieves regret $\tilde O(d^{3/4} \cdot T^{1/4})$. This result generalizes analogous results of Syrgkanis et al. (2015) who showed that in finite games the optimal regret can be accelerated from $O(\sqrt{T})$ in the adversarial setting to $O(T^{1/4})$ in the game setting. To establish the above results, we introduce several new techniques, including: a hierarchical aggregation rule to achieve the optimal cumulative loss for real-valued classes, a multi-scale extension of the proper online realizable learner of Hanneke et al. (2021), an approach to show that the output of such nonparametric learning algorithms is stable, and a proof that the minimax theorem holds in all online learnable games.
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are …
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e. external fields and interaction strengths) of Ising models from a {\em single} sample. {We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer and Pubmed.}
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each …
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, …
Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers. We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders' value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of …
We show that Optimistic Hedge -- a common variant of multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log T)$ regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of two-player games (Chen & Pen, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac 1T\right)$.
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In …
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise …
The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. Unfortunately, recent results have established that even approximate first-order stationary points of such objectives are intractable, even under smoothness conditions, motivating the study of min-max objectives with additional structure. We introduce a new class of structured nonconvex-nonconcave min-max optimization problems, proposing a generalization of the extragradient algorithm which provably converges to a stationary point. The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional real vector spaces. We also discuss its stability under stochastic oracles and provide bounds on its sample complexity. Our iteration complexity and sample complexity bounds either match or improve the best known bounds for the same or less general nonconvex-nonconcave settings, such as those that satisfy variational coherence or in which a weak solution to the associated variational inequality problem is assumed to exist.
Algorithmic pricing is the computational problem that sellers (e.g.,in supermarkets) face when trying to set prices for their items to maximize their profit in the presence of a known demand. …
Algorithmic pricing is the computational problem that sellers (e.g.,in supermarkets) face when trying to set prices for their items to maximize their profit in the presence of a known demand. Guruswami etal. (SODA, 2005) proposed this problem and gave logarithmic approximations (in the number of consumers) for the unit-demand and single-parameter cases where there is a specific set of consumers and their valuations for bundles are known precisely. Subsequently several versions of the problem have been shown to have poly-logarithmic in approximability. This problem has direct ties to the important open question of better understanding the Bayesian optimal mechanism in multi-parameter agent settings; however, for this purpose approximation factors logarithmic in the number of agents are inadequate. It is therefore of vital interest to consider special cases where constant approximations are possible. We consider the unit-demand variant of this pricing problem. Here a consumer has a valuation for each different item and their value for aset of items is simply the maximum value they have for any item in the set. Instead of considering a set of consumers with precisely known preferences, like the prior algorithmic pricing literature, we assume that the preferences of the consumers are drawn from a distribution. This is the standard assumption in economics; furthermore, the setting of a specific set of customers with specific preferences, which is employed in all of the prior work in algorithmic pricing, is a special case of this general Bayesian pricing problem, where there is a discrete Bayesian distribution for preferences specified by picking one consumer uniformly from the given set of consumers. Notice that the distribution over the valuations for the individual items that this generates is obviously correlated. Our work complements these existing works by considering the case where the consumer's valuations for the different items are independent random variables. Our main result is a constant approximation algorithm for this problem that makes use of an interesting connection between this problem and the concept of virtual valuations from the single-parameter Bayesian optimal mechanism design literature.
For Bayesian combinatorial auctions, we present a general framework for approximately reducing the mechanism design problem for multiple buyers to single buyer sub-problems. Our framework can be applied to any …
For Bayesian combinatorial auctions, we present a general framework for approximately reducing the mechanism design problem for multiple buyers to single buyer sub-problems. Our framework can be applied to any setting which roughly satisfies the following assumptions: (i) buyers' types must be distributed independently (not necessarily identically), (ii) objective function must be linearly separable over the buyers, and (iii) except for the supply constraints, there should be no other inter-buyer constraints. Our framework is general in the sense that it makes no explicit assumption about buyers' valuations, type distributions, and single buyer constraints (e.g., budget, incentive compatibility, etc).
We present two generic multi buyer mechanisms which use single buyer mechanisms as black boxes; if an $\alpha$-approximate single buyer mechanism can be constructed for each buyer, and if no buyer requires more than $\frac{1}{k}$ of all units of each item, then our generic multi buyer mechanisms are $\gamma_k\alpha$-approximation of the optimal multi buyer mechanism, where $\gamma_k$ is a constant which is at least $1-\frac{1}{\sqrt{k+3}}$. Observe that $\gamma_k$ is at least 1/2 (for $k=1$) and approaches 1 as $k \to \infty$. As a byproduct of our construction, we present a generalization of prophet inequalities. Furthermore, as applications of our framework, we present multi buyer mechanisms with improved approximation factor for several settings from the literature.
In this paper, we present the first approximation algorithms for the problem of designing revenue optimal Bayesian incentive compatible auctions when there are multiple (heterogeneous) items and when bidders have …
In this paper, we present the first approximation algorithms for the problem of designing revenue optimal Bayesian incentive compatible auctions when there are multiple (heterogeneous) items and when bidders have arbitrary demand and budget constraints (and additive valuations). Our mechanisms are surprisingly simple: We show that a sequential all-pay mechanism is a 4 approximation to the revenue of the optimal ex-interim truthful mechanism with a discrete type space for each bidder, where her valuations for different items can be correlated. We also show that a sequential posted price mechanism is a O(1) approximation to the revenue of the optimal ex-post truthful mechanism when the type space of each bidder is a product distribution that satisfies the standard hazard rate condition. We further show a logarithmic approximation when the hazard rate condition is removed, and complete the picture by showing that achieving a sub-logarithmic approximation, even for regular distributions and one bidder, requires pricing bundles of items. Our results are based on formulating novel LP relaxations for these problems, and developing generic rounding schemes from first principles.
We give a simple, multiplicative-weight update algorithm for learning undirected graphical models or Markov random fields (MRFs). The approach is new, and for the well-studied case of Ising models or …
We give a simple, multiplicative-weight update algorithm for learning undirected graphical models or Markov random fields (MRFs). The approach is new, and for the well-studied case of Ising models or Boltzmann machines we obtain an algorithm that uses a nearly optimal number of samples and has running time Õ(n <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ) (where n is the dimension), subsuming and improving on all prior work. Additionally, we give the first efficient algorithm for learning Ising models over non-binary alphabets. Our main application is an algorithm for learning the structure of t-wise MRFs with nearly-optimal sample complexity (up to polynomial losses in necessary terms that depend on the weights) and running time that is n <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">O(t)</sup> . In addition, given n <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">O(t)</sup> samples, we can also learn the parameters of the model and generate a hypothesis that is close in statistical distance to the true MRF. All prior work runs in time n <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Ω(d)</sup> for graphs of bounded degree d and does not generate a hypothesis close in statistical distance even for t = 3. We observe that our runtime has the correct dependence on n and t assuming the hardness of learning sparse parities with noise. Our algorithm- the Sparsitron- is easy to implement (has only one parameter) and holds in the on-line setting. Its analysis applies a regret bound from Freund and Schapires classic Hedge algorithm. It also gives the first solution to the problem of learning sparse Generalized Linear Models (GLMs).
A major task of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. The evolutionary model is given by a Markov chain on a tree. Given samples from …
A major task of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. The evolutionary model is given by a Markov chain on a tree. Given samples from the leaves of the Markov chain, the goal is to reconstruct the leaf-labelled tree. It is well known that in order to reconstruct a tree on n leaves, sample sequences of length Ω(log n) are needed. It was conjectured by Steel that for the CFN/Ising evolutionary model, if the mutation probability on all edges of the tree is less than $${p^{\ast} = (\sqrt{2}-1)/2^{3/2}}$$ , then the tree can be recovered from sequences of length O(log n). The value p* is given by the transition point for the extremality of the free Gibbs measure for the Ising model on the binary tree. Steel's conjecture was proven by the second author in the special case where the tree is "balanced." The second author also proved that if all edges have mutation probability larger than p* then the length needed is n Ω(1). Here we show that Steel's conjecture holds true for general trees by giving a reconstruction algorithm that recovers the tree from O(log n)-length sequences when the mutation probabilities are discretized and less than p*. Our proof and results demonstrate that extremality of the free Gibbs measure on the infinite binary tree, which has been studied before in probability, statistical physics and computer science, determines how distinguishable are Gibbs measures on finite binary trees.
We study an abstract optimal auction problem for selecting a subset of self-interested agents to whom to provide a service. A feasibility constraint governs which subsets can be simultaneously served; …
We study an abstract optimal auction problem for selecting a subset of self-interested agents to whom to provide a service. A feasibility constraint governs which subsets can be simultaneously served; however, the mechanism may additionally choose to bundle unconstrained attributes such as payments or add-ons with the service. An agent's preference over service and attributes is given by her private type and may be multi-dimensional and non-linear. A single-agent problem is to optimizes a menu to offer an agent subject to constraints on the probabilities with which each of the agent's types is served. We give computationally tractable reductions from multi-agent auction problems to these single-agent problems. Our discussion focuses on maximizing revenue, but our results can be applied to other objectives (e.g., welfare).
The principal problem in algorithmic mechanism design is in merging the incentive constraints imposed by selfish behavior with the algorithmic constraints imposed by computational intractability. This field is motivated by …
The principal problem in algorithmic mechanism design is in merging the incentive constraints imposed by selfish behavior with the algorithmic constraints imposed by computational intractability. This field is motivated by the observation that the preeminent approach for designing incentive compatible mechanisms, namely that of Vickrey, Clarke, and Groves; and the central approach for circumventing computational obstacles, that of approximation algorithms, are fundamentally incompatible: natural applications of the VCG approach to an approximation algorithm fails to yield an incentive compatible mechanism. We consider relaxing the desideratum of (ex post) incentive compatibility (IC) to Bayesian incentive compatibility (BIC), where truthtelling is a Bayes-Nash equilibrium (the standard notion of incentive compatibility in economics). For welfare maximization in single-parameter agent settings, we give a general black-box reduction that turns any approximation algorithm into a Bayesian incentive compatible mechanism with essentially the same approximation factor.
The problem of graphical model selection is to estimate the graph structure of a Markov random field given samples from it. We analyze the information-theoretic limitations of the problem of …
The problem of graphical model selection is to estimate the graph structure of a Markov random field given samples from it. We analyze the information-theoretic limitations of the problem of graph selection for binary Markov random fields under high-dimensional scaling, in which the graph size and the number of edges k, and/or the maximal node degree d, are allowed to increase to infinity as a function of the sample size n. For pair-wise binary Markov random fields, we derive both necessary and sufficient conditions for correct graph selection over the class G <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p,k</sub> of graphs on vertices with at most k edges, and over the class G <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p,d</sub> of graphs on p vertices with maximum degree at most d. For the class G <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p,k</sub> , we establish the existence of constants c and c' such that if n <; ck log p, any method has error probability at least 1/2 uniformly over the family, and we demonstrate a graph decoder that succeeds with high probability uniformly over the family for sample sizes n >; c' k <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> log p. Similarly, for the class G <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p,d</sub> , we exhibit constants c and c' such that for n <; cd <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> log p, any method fails with probability at least 1/2, and we demonstrate a graph decoder that succeeds with high probability for n >; c' d <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">3</sup> log p.
The Ising spin glass is a one-parameter exponential family model for binary data with quadratic sufficient statistic. In this paper, we show that given a single realization from this model, …
The Ising spin glass is a one-parameter exponential family model for binary data with quadratic sufficient statistic. In this paper, we show that given a single realization from this model, the maximum pseudolikelihood estimate (MPLE) of the natural parameter is $\sqrt{a_{N}}$-consistent at a point whenever the log-partition function has order $a_{N}$ in a neighborhood of that point. This gives consistency rates of the MPLE for ferromagnetic Ising models on general weighted graphs in all regimes, extending the results of Chatterjee (Ann. Statist. 35 (2007) 1931–1946) where only $\sqrt{N}$-consistency of the MPLE was shown. It is also shown that consistent testing, and hence estimation, is impossible in the high temperature phase in ferromagnetic Ising models on a converging sequence of simple graphs, which include the Curie–Weiss model. In this regime, the sufficient statistic is distributed as a weighted sum of independent $\chi^{2}_{1}$ random variables, and the asymptotic power of the most powerful test is determined. We also illustrate applications of our results on synthetic and real-world network data.
Journal Article Crime and Social Interactions Get access Edward L. Glaeser, Edward L. Glaeser Harvard University and National Bureau of Economic Research Search for other works by this author on: …
Journal Article Crime and Social Interactions Get access Edward L. Glaeser, Edward L. Glaeser Harvard University and National Bureau of Economic Research Search for other works by this author on: Oxford Academic Google Scholar Bruce Sacerdote, Bruce Sacerdote Harvard University Search for other works by this author on: Oxford Academic Google Scholar José A. Scheinkman José A. Scheinkman University of Chicago Search for other works by this author on: Oxford Academic Google Scholar The Quarterly Journal of Economics, Volume 111, Issue 2, May 1996, Pages 507–548, https://doi.org/10.2307/2946686 Published: 01 May 1996
We provide a reduction from revenue maximization to welfare maximization in multi-dimensional Bayesian auctions with arbitrary (possibly combinatorial) feasibility constraints and independent bidders with arbitrary (possibly combinatorial) demand constraints, appropriately …
We provide a reduction from revenue maximization to welfare maximization in multi-dimensional Bayesian auctions with arbitrary (possibly combinatorial) feasibility constraints and independent bidders with arbitrary (possibly combinatorial) demand constraints, appropriately extending Myerson's result to this setting. We also show that every feasible Bayesian auction can be implemented as a distribution over virtual VCG allocation rules. A virtual VCG allocation rule has the following simple form: Every bidder's type t_i is transformed into a virtual type f_i(t_i), via a bidder-specific function. Then, the allocation maximizing virtual welfare is chosen. Using this characterization, we show how to find and run the revenue-optimal auction given only black box access to an implementation of the VCG allocation rule. We generalize this result to arbitrarily correlated bidders, introducing the notion of a second-order VCG allocation rule. We obtain our reduction from revenue to welfare optimization via two algorithmic results on reduced forms in settings with arbitrary feasibility and demand constraints. First, we provide a separation oracle for determining feasibility of a reduced form. Second, we provide a geometric algorithm to decompose any feasible reduced form into a distribution over virtual VCG allocation rules. In addition, we show how to execute both algorithms given only black box access to an implementation of the VCG allocation rule. Our results are computationally efficient for all multi-dimensional settings where the bidders are additive. In this case, our mechanisms run in time polynomial in the total number of bidder types, but not type profiles. For generic correlated distributions, this is the natural description complexity of the problem. The runtime can be further improved to poly(#items, #bidders) in item-symmetric settings by making use of recent techniques.
Consider a gambler who observes a sequence of independent, non-negative random numbers and is allowed to stop the sequence at any time, claiming a reward equal to the most recent …
Consider a gambler who observes a sequence of independent, non-negative random numbers and is allowed to stop the sequence at any time, claiming a reward equal to the most recent observation. The famous prophet inequality of Krengel, Sucheston, and Garling asserts that a gambler who knows the distribution of each random variable can achieve at least half as much reward, in expectation, as a "prophet" who knows the sampled values of each random variable and can choose the largest one. We generalize this result to the setting in which the gambler and the prophet are allowed to make more than one selection, subject to a matroid constraint. We show that the gambler can still achieve at least half as much reward as the prophet; this result is the best possible, since it is known that the ratio cannot be improved even in the original prophet inequality, which corresponds to the special case of rank-one matroids. Generalizing the result still further, we show that under an intersection of $p$ matroid constraints, the prophet's reward exceeds the gambler's by a factor of at most $O(p)$, and this factor is also tight.
Regularized learning is a fundamental technique in online optimization, machine learning, and many other fields of computer science. A natural question that arises in this context is how regularized learning …
Regularized learning is a fundamental technique in online optimization, machine learning, and many other fields of computer science. A natural question that arises in this context is how regularized learning algorithms behave when faced against each other. We study a natural formulation of this problem by coupling regularized learning dynamics in zero-sum games. We show that the system's behavior is Poincare recurrent, implying that almost every trajectory revisits any (arbitrarily small) neighborhood of its starting point infinitely often. This cycling behavior is robust to the agents' choice of regularization mechanism (each agent could be using a different regularizer), to positive-affine transformations of the agents' utilities, and it also persists in the case of networked competition (zero-sum polymatrix games).
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of …
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n=Ω(d3log p) with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n=Ω(d2log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.
How many independent samples <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</i> do we need from a distribution <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</i> to decide that <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</i> is epsiv-distant from uniform in an L <sub …
How many independent samples <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</i> do we need from a distribution <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</i> to decide that <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</i> is epsiv-distant from uniform in an L <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> sense, Sigma <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">i=1</sub> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">m</sup> | <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</i> ( <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">i</i> ) - 1/ <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">m</i> | > epsiv? (Here <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">m</i> is the number of bins on which the distribution is supported, and is assumed known <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">a</i> <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">priori</i> .) Somewhat surprisingly, we only need <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</i> epsiv <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> Gt <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">m</i> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1/2</sup> to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed ldquocoincidencesrdquo (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be epsiv-distant from uniform. Some connections to the classical birthday problem are noted.
This paper is devoted, in the main, to proving the asymptotic minimax character of the sample distribution function (d.f.) for estimating an unknown d.f. in $\mathscr{F}$ or $\mathscr{F}_c$ (defined in …
This paper is devoted, in the main, to proving the asymptotic minimax character of the sample distribution function (d.f.) for estimating an unknown d.f. in $\mathscr{F}$ or $\mathscr{F}_c$ (defined in Section 1) for a wide variety of weight functions. Section 1 contains definitions and a discussion of measurability considerations. Lemma 2 of Section 2 is an essential tool in our proofs and seems to be of interest per se; for example, it implies the convergence of the moment generating function of $G_n$ to that of $G$ (definitions in (2.1)). In Section 3 the asymptotic minimax character is proved for a fundamental class of weight functions which are functions of the maximum deviation between estimating and true d.f. In Section 4 a device (of more general applicability in decision theory) is employed which yields the asymptotic minimax result for a wide class of weight functions of this character as a consequence of the results of Section 3 for weight functions of the fundamental class. In Section 5 the asymptotic minimax character is proved for a class of integrated weight functions. A more general class of weight functions for which the asymptotic minimax character holds is discussed in Section 6. This includes weight functions for which the risk function of the sample d.f. is not a constant over $\mathscr{F}_c.$ Most weight functions of practical interest are included in the considerations of Sections 3 to 6. Section 6 also includes a discussion of multinomial estimation problems for which the asymptotic minimax character of the classical estimator is contained in our results. Finally, Section 7 includes a general discussion of minimization of symmetric convex or monotone functionals of symmetric random elements, with special consideration of the "tied-down" Wiener process, and with a heuristic proof of the results of Sections 3, 4, 5, and much of Section 6.
If a class of games is known to have a Nash equilibrium with probability values that are either zero or Ω(1) -- and thus with support of bounded size -- …
If a class of games is known to have a Nash equilibrium with probability values that are either zero or Ω(1) -- and thus with support of bounded size -- then obviously this equilibrium can be found exhaustively in polynomial time. Somewhat surprisingly, we show that there is a PTAS for the class of games whose equilibria are guaranteed to have small --- O(1/n) -- values, and therefore large -- Ω(n) -- supports. We also point out that there is a PTAS for games with sparse payoff matrices, a family for which the exact problem is known to be PPAD-complete [Chen, Deng, Teng 2006]. Both algorithms are of a special kind that we call oblivious: The algorithm just samples a fixed distribution on pairs of mixed strategies, and the game is only used to determine whether the sampled strategies comprise an ε-Nash equilibrium; the answer is "yes" with inverse polynomial probability (in the second case, the algorithm is actually deterministic). These results bring about the question: Is there an oblivious PTAS for finding a Nash equilibrium in general games? We answer this question in the negative; our lower bound comes close to the quasi-polynomial upper bound of [Lipton, Markakis, Mehta 2003]. Another recent PTAS for anonymous games [Daskalakis, Papadimitriou 2007 and 2008, Daskalakis 2008] is also oblivious in a weaker sense appropriate for this class of games (it samples from a fixed distribution on unordered collections of mixed strategies), but its running time is exponential in 1/ε. We prove that any oblivious PTAS for anonymous games with two strategies and three player types must have 1/εα in the exponent of the running time for some α ≥ 1/3, rendering the algorithm in [Daskalakis 2008] (which works with any bounded number of player types) essentially optimal within oblivious algorithms. In contrast, we devise a poly n • (1/ε)O(\log2(1/ε)) non-oblivious PTAS for anonymous games with two strategies and any bounded number of player types. The key idea of our algorithm is to search not over unordered sets of mixed strategies, but over a carefully crafted set of collections of the first O(log 1/ε) moments of the distribution of the number of players playing strategy 1 at equilibrium. The algorithm works because of a probabilistic result of more general interest that we prove: the total variation distance between two sums of independent indicator random variables decreases exponentially with the number of moments of the two sums that are equal, independent of the number of indicators.
We prove that Bimatrix, the problem of finding a Nash equilibrium in a two-player game, is complete for the complexity class PPAD (Polynomial Parity Argument, Directed version) introduced by Papadimitriou …
We prove that Bimatrix, the problem of finding a Nash equilibrium in a two-player game, is complete for the complexity class PPAD (Polynomial Parity Argument, Directed version) introduced by Papadimitriou in 1991. Our result, building upon the work of Daskalakis et al. [2006a] on the complexity of four-player Nash equilibria, settles a long standing open problem in algorithmic game theory. It also serves as a starting point for a series of results concerning the complexity of two-player Nash equilibria. In particular, we prove the following theorems: —Bimatrix does not have a fully polynomial-time approximation scheme unless every problem in PPAD is solvable in polynomial time. —The smoothed complexity of the classic Lemke-Howson algorithm and, in fact, of any algorithm for Bimatrix is not polynomial unless every problem in PPAD is solvable in randomized polynomial time. Our results also have a complexity implication in mathematical economics: —Arrow-Debreu market equilibria are PPAD -hard to compute.
We show that there is a polynomial-time approximation scheme for computing Nash equilibria in anonymous games with any fixed number of strategies (a very broad and important class of games), …
We show that there is a polynomial-time approximation scheme for computing Nash equilibria in anonymous games with any fixed number of strategies (a very broad and important class of games), extending the two-strategy result of Daskalakis and Papadimitriou 2007. The approximation guarantee follows from a probabilistic result of more general interest: The distribution of the sum of n independent unit vectors with values ranging over {e1, e2, ...,ek}, where ei is the unit vector along dimension i of the k-dimensional Euclidean space, can be approximated by the distribution of the sum of another set of independent unit vectors whose probabilities of obtaining each value are multiples of 1/z for some integer z, and so that the variational distance of the two distributions is at most eps, where eps is bounded by an inverse polynomial in z and a function of k, but with no dependence on n. Our probabilistic result specifies the construction of a surprisingly sparse eps-cover -- under the total variation distance -- of the set of distributions of sums of independent unit vectors, which is of interest on its own right.
We consider a monopolist seller with n heterogeneous items, facing a single buyer. The buyer hasa value for each item drawn independently according to(non-identical) distributions, and his value for a …
We consider a monopolist seller with n heterogeneous items, facing a single buyer. The buyer hasa value for each item drawn independently according to(non-identical) distributions, and his value for a set ofitems is additive. The seller aims to maximize his revenue.It is known that an optimal mechanism in this setting maybe quite complex, requiring randomization [19] and menusof infinite size [15]. Hart and Nisan [17] have initiated astudy of two very simple pricing schemes for this setting:item pricing, in which each item is priced at its monopolyreserve; and bundle pricing, in which the entire set ofitems is priced and sold as one bundle. Hart and Nisan [17]have shown that neither scheme can guarantee more thana vanishingly small fraction of the optimal revenue. Insharp contrast, we show that for any distributions, thebetter of item and bundle pricing is a constant-factorapproximation to the optimal revenue. We further discussextensions to multiple buyers and to valuations that arecorrelated across items.
We study the revenue maximization problem of a seller with n heterogeneous items for sale to a single buyer whose valuation function for sets of items is unknown and drawn …
We study the revenue maximization problem of a seller with n heterogeneous items for sale to a single buyer whose valuation function for sets of items is unknown and drawn from some distribution D. We show that if D is a distribution over subadditive valuations with independent items, then the better of pricing each item separately or pricing only the grand bundle achieves a constant-factor approximation to the revenue of the optimal mechanism. This includes buyers who are k-demand, additive up to a matroid constraint, or additive up to constraints of any downwards-closed set system (and whose values for the individual items are sampled independently), as well as buyers who are fractionally subadditive with item multipliers drawn independently. Our proof makes use of the core-tail decomposition framework developed in prior work showing similar results for the significantly simpler class of additive buyers [Li and Yao 2013; Babaioff et al.2014].
It is shown that the method of exchangeable pairs introduced by Stein [Approximate Computation of Expectations (1986) IMS, Hayward, CA] for normal approximation can effectively be used for translated Poisson …
It is shown that the method of exchangeable pairs introduced by Stein [Approximate Computation of Expectations (1986) IMS, Hayward, CA] for normal approximation can effectively be used for translated Poisson approximation. Introducing an additional smoothness condition, one can obtain approximation results in total variation and also in a local limit metric. The result is applied, in particular, to the anti-voter model on finite graphs as analyzed by Rinott and Rotar [Ann. Appl. Probab. 7 (1997) 1080--1105], obtaining the same rate of convergence, but now for a stronger metric.
We provide a reduction from revenue maximization to welfare maximization in multidimensional Bayesian auctions with arbitrary - possibly combinatorial - feasibility constraints and independent bidders with arbitrary - possibly combinatorial-demand …
We provide a reduction from revenue maximization to welfare maximization in multidimensional Bayesian auctions with arbitrary - possibly combinatorial - feasibility constraints and independent bidders with arbitrary - possibly combinatorial-demand constraints, appropriately extending Myerson's single-dimensional result [21] to this setting. We also show that every feasible Bayesian auction - including in particular the revenue-optimal one - can be implemented as a distribution over virtual VCG allocation rules. A virtual VCG allocation rule has the following simple form: Every bidder's type ti is transformed into a virtual type fi(ti), via a bidder-specific function. Then, the allocation maximizing virtual welfare is chosen. Using this characterization, we show how to find and run the revenue-optimal auction given only black-box access to an implementation of the VCG allocation rule. We generalize this result to arbitrarily correlated bidders, introducing the notion of a second-order VCG allocation rule. Our results are computationally efficient for all multidimensional settings where the bidders are additive, or can be efficiently mapped to be additive, albeit the feasibility and demand constraints may still remain arbitrary combinatorial. In this case, our mechanisms run in time polynomial in the number of items and the total number of bidder types, but not type profiles. This is polynomial in the number of items, the number of bidders, and the cardinality of the support of each bidder's value distribution. For generic correlated distributions, this is the natural description complexity of the problem. The runtime can be further improved to polynomial in only the number of items and the number of bidders in itemsymmetric settings by making use of techniques from [15].
graph underlying an Ising model from i.i.d. samples. Over the last fifteen years this problem has been of significant interest in the statistics, machine learning, and statistical physics communities, and …
graph underlying an Ising model from i.i.d. samples. Over the last fifteen years this problem has been of significant interest in the statistics, machine learning, and statistical physics communities, and much of the effort has been directed towards finding algorithms with low computational cost for various restricted classes of models. Nevertheless, for learning Ising models on general graphs with p nodes of degree at most d, it is not known whether or not it is possible to improve upon the pd computation needed to exhaustively search over all possible neighborhoods for each node.
We investigate the power of randomness in the context of a fundamental Bayesian optimal mechanism design problem - a single seller aims to maximize expected revenue by allocating multiple kinds …
We investigate the power of randomness in the context of a fundamental Bayesian optimal mechanism design problem - a single seller aims to maximize expected revenue by allocating multiple kinds of resources to "unit-demand" agents with preferences drawn from a known distribution. When the agents' preferences are single-dimensional Myerson's seminal work [14] shows that randomness offers no benefit - the optimal mechanism is always deterministic. In the multi-dimensional case, where each agent's preferences are given by different values for each of the available services, Briest et al.[6] recently showed that the gap between the expected revenue obtained by an optimal randomized mechanism and an optimal deterministic mechanism can be unbounded even when a single agent is offered only 4 services. However, this large gap is attained through unnatural instances where values of the agent for different services are correlated in a specific way. We show that when the agent's values involve no correlation or a specific kind of positive correlation, the benefit of randomness is only a small constant factor (4 and 8 respectively). Our model of positively correlated values (that we call the common base value model) is a natural model for unit-demand agents and items that are substitutes. Our results extend to multiple agent settings as well.
We provide a Polynomial Time Approximation Scheme for the multi-dimensional unit-demand pricing problem, when the buyer's values are independent (but not necessarily identically distributed.) For all ϵ >; 0, we …
We provide a Polynomial Time Approximation Scheme for the multi-dimensional unit-demand pricing problem, when the buyer's values are independent (but not necessarily identically distributed.) For all ϵ >; 0, we obtain a (1 + ϵ)-factor approximation to the optimal revenue in time polynomial, when the values are sampled from Monotone Hazard Rate (MHR) distributions, quasi-polynomial, when sampled from regular distributions, and polynomial in n <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">poly(log r)</sup> when sampled from general distributions supported on a set [u <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">min</sub> ,ru <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">min</sub> ]. We also provide an additive PTAS for all bounded distributions. Our algorithms are based on novel extreme value theorems for MHR and regular distributions, and apply probabilistic techniques to understand the statistical properties of revenue distributions, as well as to reduce the size of the search space of the algorithm. As a byproduct of our techniques, we establish structural properties of optimal solutions. We show that, for all ϵ >; 0, g(1/ϵ) distinct prices suffice to obtain a (1 + ϵ)-factor approximation to the optimal revenue for MHR distributions, where g(1/ϵ) is a quasi-linear function of 1/ϵ that does not depend on the number of items. Similarly, for all ϵ >; 0 and n >; 0, g(1/ϵ · log n) distinct prices suffice for regular distributions, where n is the number of items and g(·) is a polynomial function. Finally, in the i.i.d. MHR case, we show that, as long as the number of items is a sufficiently large function of 1/ϵ, a single price suffices to achieve a (1 + ϵ)-factor approximation. Our results represent significant progress to the single-bidder case of the multidimensional optimal mechanism design problem, following Myerson's celebrated work on optimal mechanism design [Myerson 1981].
The standard linear and logistic regression models assume that the response variables are independent, but share the same linear relationship to their corresponding vectors of covariates. The assumption that the …
The standard linear and logistic regression models assume that the response variables are independent, but share the same linear relationship to their corresponding vectors of covariates. The assumption that the response variables are independent is, however, too strong. In many applications, these responses are collected on nodes of a network, or some spatial or temporal domain, and are dependent. Examples abound in financial and meteorological applications, and dependencies naturally arise in social networks through peer effects. Regression with dependent responses has thus received a lot of attention in the Statistics and Economics literature, but there are no strong consistency results unless multiple independent samples of the vectors of dependent responses can be collected from these models. We present computationally and statistically efficient methods for linear and logistic regression models when the response variables are dependent on a network. Given one sample from a networked linear or logistic regression model and under mild assumptions, we prove strong consistency results for recovering the vector of coefficients and the strength of the dependencies, recovering the rates of standard regression under independent observations. We use projected gradient descent on the negative log-likelihood, or negative log-pseudolikelihood, and establish their strong convexity and consistency using concentration of measure for dependent random variables.
While classic work in convex-concave min-max optimization relies on average-iterate convergence results, the emergence of nonconvex applications such as training Generative Adversarial Networks has led to renewed interest in last-iterate …
While classic work in convex-concave min-max optimization relies on average-iterate convergence results, the emergence of nonconvex applications such as training Generative Adversarial Networks has led to renewed interest in last-iterate convergence guarantees. Proving last-iterate convergence is challenging because many natural algorithms, such as Simultaneous Gradient Descent/Ascent, provably diverge or cycle even in simple convex-concave min-max settings, and previous work on global last-iterate convergence rates has been limited to the bilinear and convex-strongly concave settings. In this work, we show that the Hamiltonian Gradient Descent (HGD) algorithm achieves linear convergence in a variety of more general settings, including convex-concave problems that satisfy a "sufficiently bilinear" condition. We also prove similar convergence rates for the Consensus Optimization (CO) algorithm of [MNG17] for some parameter settings of CO.
The Sherrington–Kirkpatrick model of spin glasses, the Hopfield model of neural networks and the Ising spin glass are all models of binary data belonging to the one-parameter exponential family with …
The Sherrington–Kirkpatrick model of spin glasses, the Hopfield model of neural networks and the Ising spin glass are all models of binary data belonging to the one-parameter exponential family with quadratic sufficient statistic. Under bare minimal conditions, we establish the $\sqrt{N}$-consistency of the maximum pseudolikelihood estimate of the natural parameter in this family, even at critical temperatures. Since very little is known about the low and critical temperature regimes of these extremely difficult models, the proof requires several new ideas. The author's version of Stein's method is a particularly useful tool. We aim to introduce these techniques into the realm of mathematical statistics through an example and present some open questions.
Let $\hat{F}_n$ denote the empirical distribution function for a sample of $n$ i.i.d. random variables with distribution function $F$. In 1956 Dvoretzky, Kiefer and Wolfowitz proved that $P\big(\sqrt{n} \sup_x(\hat{F}_n(x) - …
Let $\hat{F}_n$ denote the empirical distribution function for a sample of $n$ i.i.d. random variables with distribution function $F$. In 1956 Dvoretzky, Kiefer and Wolfowitz proved that $P\big(\sqrt{n} \sup_x(\hat{F}_n(x) - F(x)) > \lambda\big) \leq C \exp(-2\lambda^2),$ where $C$ is some unspecified constant. We show that $C$ can be taken as 1 (as conjectured by Birnbaum and McCarty in 1958), provided that $\exp(-2\lambda^2) \leq \frac{1}{2}$. In particular, the two-sided inequality $P\big(\sqrt{n} \sup_x|\hat{F}_n(x) - F(x)| > \lambda\big) \leq 2 \exp(-2\lambda^2)$ holds without any restriction on $\lambda$. In the one-sided as well as in the two-sided case, the constants cannot be further improved.
We consider the problem of designing a revenue-maximizing auction for a single item, when the values of the bidders are drawn from a correlated distribution. We observe that there exists …
We consider the problem of designing a revenue-maximizing auction for a single item, when the values of the bidders are drawn from a correlated distribution. We observe that there exists an algorithm that finds the optimal randomized mechanism that runs in time polynomial in the size of the support. We leverage this result to show that in the oracle model introduced by Ronen and Saberi [FOCS'02], there exists a polynomial time truthful in expectation mechanism that provides a (1.5+ε)-approximation to the revenue achievable by an optimal truthful-in-expectation mechanism, and a polynomial time deterministic truthful mechanism that guarantees 5/3 approximation to the revenue achievable by an optimal deterministic truthful mechanism.
We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing any objective under arbitrary …
We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing any objective under arbitrary feasibility constraints with arbitrary bidder types to (not necessarily truthfully) maximizing the same objective plus virtual welfare (under the same feasibility constraints). Our reduction is based on a fundamentally new approach: we describe a mechanism's behavior indirectly only in terms of the expected value it awards bidders for certain behavior, and never directly access the allocation rule at all. Applying our new approach to revenue, we exhibit settings where our reduction holds both ways. That is, we also provide an approximation-sensitive reduction from (non-truthfully) maximizing virtual welfare to (truthfully) maximizing revenue, and therefore the two problems are computationally equivalent. With this equivalence in hand, we show that both problems are NP-hard to approximate within any polynomial factor, even for a single monotone sub modular bidder. We further demonstrate the applicability of our reduction by providing a truthful mechanism maximizing fractional max-min fairness.
Markov random fields are a popular model for high-dimensional probability distributions. Over the years, many mathematical, statistical and algorithmic problems on them have been studied. Until recently, the only known …
Markov random fields are a popular model for high-dimensional probability distributions. Over the years, many mathematical, statistical and algorithmic problems on them have been studied. Until recently, the only known algorithms for provably learning them relied on exhaustive search, correlation decay or various incoherence assumptions. Bresler gave an algorithm for learning general Ising models on bounded degree graphs. His approach was based on a structural result about mutual information in Ising models. Here we take a more conceptual approach to proving lower bounds on the mutual information. Our proof generalizes well beyond Ising models, to arbitrary Markov random fields with higher order interactions. As an application, we obtain algorithms for learning Markov random fields on bounded degree graphs on $n$ nodes with $r$-order interactions in $n^r$ time and $\log n$ sample complexity. Our algorithms also extend to various partial observation models.
We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree T on n leaves with a path metric d, consider the pairwise distances …
We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree T on n leaves with a path metric d, consider the pairwise distances {d(u,v)} between leaves. It is well known that these determine the tree and the d length of all edges. Here, we consider distortions [symbol: see text] of d such that, for all leaves u and v, it holds that absolute value(d(u,v) - [symbol: see text](u,v)) < f/2 if either d(u,v) < M + f/2 or [symbol: see text](u,v) < M + f/2, where d satisfies f < or = d(e) < or = g for all edges e. Given such distortions, we show how to reconstruct in polynomial time a forest T1, ..., Talpha such that the true tree T may be obtained from that forest by adding alpha - 1 edges and alpha - 1 < or = 2(-omega(M/g)) n. Our distorted metric result implies a reconstruction algorithm of phylogenetic forests with a small number of trees from sequences of length logarithmic in the number of species. The reconstruction algorithm is applicable for the general Markov model. Both the distorted metric result and its applications to phylogeny are almost tight.
Let us consider the class of all unimodal densities defined on some interval of length $L$ and bounded by $H$; we shall study the minimax risk over this class, when …
Let us consider the class of all unimodal densities defined on some interval of length $L$ and bounded by $H$; we shall study the minimax risk over this class, when we estimate using $n$ i.i.d. observations, the loss being measured by the $\mathbb{L}^1$ distance between the estimator and the true density. We shall prove that if $S = \operatorname{Log}(HL + 1)$, upper and lower bounds for the risk are of the form $C(S/n)^{1/3}$ and the ratio between those bounds is smaller than 44 when $S/n$ is smaller than 220$^{-1}$.
Given samples from an unknown multivariate distribution p, is it possible to distinguish whether p is the product of its marginals versus p being far from every product distribution? Similarly, …
Given samples from an unknown multivariate distribution p, is it possible to distinguish whether p is the product of its marginals versus p being far from every product distribution? Similarly, is it possible to distinguish whether p equals a given distribution q versus p and q being far from each other? These problems of testing independence and goodness-of-fit have received enormous attention in statistics, information theory, and theoretical computer science, with sample-optimal algorithms known in several interesting regimes of parameters [BFF+01, Pan08, VV17, ADK15, DK16]. Unfortunately, it has also been understood that these problems become intractable in large dimensions, necessitating exponential sample complexity. Motivated by the exponential lower bounds for general distributions as well as the ubiquity of Markov Random Fields (MRFs) in the modeling of high-dimensional distributions, we initiate the study of distribution testing on structured multivariate distributions, and in particular the prototypical example of MRFs: the Ising Model. We demonstrate that, in this structured setting, we can avoid the curse of dimensionality, obtaining sample and time efficient testers for independence and goodness-of-fit. One of the key technical challenges we face along the way is bounding the variance of functions of the Ising model.
For Bayesian combinatorial auctions, we present a general framework for approximately reducing the mechanism design problem for multiple buyers to the mechanism design problem for each individual buyer. Our frame- …
For Bayesian combinatorial auctions, we present a general framework for approximately reducing the mechanism design problem for multiple buyers to the mechanism design problem for each individual buyer. Our frame- work can be applied to any setting which roughly satisfies the following assumptions: (i) the buyer's types must be distributed independently (not necessarily identically), (ii) the objective function must be linearly separable over the set of buyers, and (iii) the supply constraints must be the only constraints involving more than one buyer. Our framework is general in the sense that it makes no explicit assumption about any of the following: (i) the buyer's valuations (e.g., submodular, additive, etc), (ii) The distribution of types for each buyer, and (iii) the other constraints involving individual buyers (e.g., budget constraints, etc). We present two generic ra-buyer mechanisms that use 1- buyer mechanisms as black boxes. Assuming that we have an α-approximate 1-buyer mechanism for each buyer and assuming that no buyer ever needs more than 1/k of all copies of each item for some integer k ≥ 1, then our generic n- buyer mechanisms are γ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub> · α-approximation of the optimal n-buyer mechanism, in which γ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub> is a constant which is at least 1 - 1/√(k+3). Observe that γ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub> is at least1/2 (for k = 1) and approaches 1 as k increases. As a byproduct of our construction, we improve a generalization of prophet inequalities. Furthermore, as applications of our main theorem, we improve several results from the literature.
We address the issue of limit cycling behavior in training Generative Adversarial Networks and propose the use of Optimistic Mirror Decent (OMD) for training Wasserstein GANs. Recent theoretical results have …
We address the issue of limit cycling behavior in training Generative Adversarial Networks and propose the use of Optimistic Mirror Decent (OMD) for training Wasserstein GANs. Recent theoretical results have shown that optimistic mirror decent (OMD) can enjoy faster regret rates in the context of zero-sum games. WGANs is exactly a context of solving a zero-sum game with simultaneous no-regret dynamics. Moreover, we show that optimistic mirror decent addresses the limit cycling problem in training WGANs. We formally show that in the case of bi-linear zero-sum games the last iterate of OMD dynamics converges to an equilibrium, in contrast to GD dynamics which are bound to cycle. We also portray the huge qualitative difference between GD and OMD dynamics with toy examples, even when GD is modified with many adaptations proposed in the recent literature, such as gradient penalty or momentum. We apply OMD WGAN training to a bioinformatics problem of generating DNA sequences. We observe that models trained with OMD achieve consistently smaller KL divergence with respect to the true underlying distribution, than models trained with GD variants. Finally, we introduce a new algorithm, Optimistic Adam, which is an optimistic variant of Adam. We apply it to WGAN training on CIFAR10 and observe improved performance in terms of inception score as compared to Adam.
Abstract The effect of stratification and clustering on the asymptotic distributions of standard Pearson chi-squared test statistics for goodness of fit (simple hypothesis) and independence in a two-way contingency table, …
Abstract The effect of stratification and clustering on the asymptotic distributions of standard Pearson chi-squared test statistics for goodness of fit (simple hypothesis) and independence in a two-way contingency table, denoted as X 2 and XI 2, respectively, is investigated. It is shown that both X 2 and XI 2 are asymptotically distributed as weighted sums of independent χ1 2 random variables. The weights are then related to the familiar design effects (deffs) used by survey samplers. A simple correction to X 2, which requires only the knowledge of variance estimates (or deffs) for individual cells in the goodness-of-fit problem, is proposed and empirical results on the performance of corrected X 2 provided. Empirical work on XI 2 indicated that the distortion of nominal significance level is substantially smaller with XI 2 than with X 2. Some results under simple models for clustering are also given.
In rather formal terms, the situation with which this paper is concerned may be described as follows. We are given a fixed system of n sites, labelled by the first …
In rather formal terms, the situation with which this paper is concerned may be described as follows. We are given a fixed system of n sites, labelled by the first n positive integers, and an associated vector x of observations, Xi, . . ., Xn, which, in turn, is presumed to be a realization of a vector X of (dependent) random variables, Xi, . . ., X.. In practice, the sites may represent points or regions in space and the random variables may be either continuous or discrete. The main statistical objectives are the following: firstly, to provide a means of using the available concomitant information, particularly the configuration of the sites, to attach a plausible probability distribution to the random vector X; secondly, to estimate any unknown parameters in the distribution from the realization x; thirdly, where possible, to quantify the extent of disagreement between hypothesis and observation.
We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing \emph{any} objective under \emph{arbitrary} …
We provide a computationally efficient black-box reduction from mechanism design to algorithm design in very general settings. Specifically, we give an approximation-preserving reduction from truthfully maximizing \emph{any} objective under \emph{arbitrary} feasibility constraints with \emph{arbitrary} bidder types to (not necessarily truthfully) maximizing the same objective plus virtual welfare (under the same feasibility constraints). Our reduction is based on a fundamentally new approach: we describe a mechanism's behavior indirectly only in terms of the expected value it awards bidders for certain behavior, and never directly access the allocation rule at all. Applying our new approach to revenue, we exhibit settings where our reduction holds \emph{both ways}. That is, we also provide an approximation-sensitive reduction from (non-truthfully) maximizing virtual welfare to (truthfully) maximizing revenue, and therefore the two problems are computationally equivalent. With this equivalence in hand, we show that both problems are NP-hard to approximate within any polynomial factor, even for a single monotone submodular bidder. We further demonstrate the applicability of our reduction by providing a truthful mechanism maximizing fractional max-min fairness. This is the first instance of a truthful mechanism that optimizes a non-linear objective.
Let $(\mathscr{X, A})$ be a space with a $\sigma$-field, $M = \{P_s; s \in \Theta\}$ be a family of probability measures on $\mathscr{A}$ with $\Theta$ arbitrary, $X_1, \cdots, X_n$ i.i.d. …
Let $(\mathscr{X, A})$ be a space with a $\sigma$-field, $M = \{P_s; s \in \Theta\}$ be a family of probability measures on $\mathscr{A}$ with $\Theta$ arbitrary, $X_1, \cdots, X_n$ i.i.d. observations on $P_\theta.$ Define $\mu_n(A) = (1/n) \sum^n_{i = 1} I_A(X_i),$ the empirical measure indexed by $A \in \mathscr{A}.$ Assume $\Theta$ is totally bounded when metrized by the $L_1$ distance between measures. Robust minimum distance estimators $\hat{\theta}_n$ are constructed for $\theta$ and the resulting rate of convergence is shown naturally to depend on an entropy function for $\Theta$.
We introduce a method to select a smoothing factor for kernel density estimation such that, for all densities in all dimensions, the $L_1$ error of the corresponding kernel estimate is …
We introduce a method to select a smoothing factor for kernel density estimation such that, for all densities in all dimensions, the $L_1$ error of the corresponding kernel estimate is not larger than three times the error of the estimate with the optimal smoothing factor plus a constant times $\sqrt{\log n/n}$, where n is the sample size, and the constant depends only on the complexity of the kernel used in the estimate. The result is nonasymptotic, that is, the bound is valid for each n. The estimate uses ideas from the minimum distance estimation work of Yatracos. As the inequality is uniform with respect to all densities, the estimate is asymptotically minimax optimal (modulo a constant) over many function classes.
Myerson's classic result provides a full description of how a seller can maximize revenue when selling a single item. We address the question of revenue maximization in the simplest possible …
Myerson's classic result provides a full description of how a seller can maximize revenue when selling a single item. We address the question of revenue maximization in the simplest possible multi-item setting: two items and a single buyer who has independently distributed values for the items, and an additive valuation. In general, the revenue achievable from selling two independent items may be strictly higher than the sum of the revenues obtainable by selling each of them separately. In fact, the structure of optimal (i.e., revenue-maximizing) mechanisms for two items even in this simple setting is not understood.
We consider the problem of maximizing revenue for a monopolist offering multiple items to multiple heterogeneous buyers. We develop a simple mechanism that obtains a constant factor approximation under the …
We consider the problem of maximizing revenue for a monopolist offering multiple items to multiple heterogeneous buyers. We develop a simple mechanism that obtains a constant factor approximation under the assumption that the buyers' values are additive subject to a matroid feasibility constraint and independent across items. Importantly, different buyers in our setting can have different constraints on the sets of items they desire. Our mechanism is a sequential variant of two-part tariffs. Prior to our work, simple approximation mechanisms for such multi-buyer problems were known only for the special cases of all unit-demand or all additive value buyers.
We show that the multiplicative weight update method provides a simple recipe for designing and analyzing optimal Bayesian Incentive Compatible (BIC) auctions, and reduces the time complexity of the problem …
We show that the multiplicative weight update method provides a simple recipe for designing and analyzing optimal Bayesian Incentive Compatible (BIC) auctions, and reduces the time complexity of the problem to pseudo-polynomial in parameters that depend on single agent instead of depending on the size of the joint type space. We use this framework to design computationally efficient optimal auctions that satisfy ex-post Individual Rationality in the presence of constraints such as (hard, private) budgets and envy-freeness. We also design optimal auctions when buyers and a seller's utility functions are non-linear. Scenarios with such functions include (a) auctions with "quitting rights", (b) cost to borrow money beyond budget, (c) a seller's and buyers' risk aversion. Finally, we show how our framework also yields optimal auctions for variety of auction settings considered in Alaei et al and Cai et al, albeit with pseudo-polynomial running times.