Author Description

Login to generate an author description

Ask a Question About This Mathematician

All published works (63)

Problem definition: Motivated by carbon emissions trading schemes (ETSs), Treasury auctions, procurement auctions, and wholesale electricity markets, which all involve the auctioning of homogeneous multiple units, we consider the problem … Problem definition: Motivated by carbon emissions trading schemes (ETSs), Treasury auctions, procurement auctions, and wholesale electricity markets, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multiunit pay-as-bid (PAB) auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. In this work, we study the problem of optimizing bidding strategies from the perspective of a single bidder. Methodology/results: Effective bidding in PAB auctions is complex due to the combinatorial nature of the action space. We show that a utility decoupling trick enables a polynomial time algorithm to solve the offline problem where competing bids are known in advance. Leveraging this structure, we design efficient algorithms for the online problem under both full information and bandit feedback settings that achieve an upper bound on regret of [Formula: see text] and [Formula: see text], respectively, where M is the number of units demanded by the bidder, and T is the total number of auctions. We accompany these results with a regret lower bound of [Formula: see text] for the full information setting and [Formula: see text] for the bandit setting. We also present additional findings on the characterization of PAB equilibria. Managerial implications: Although the Nash equilibria of PAB auctions possess nice properties such as winning bid uniformity and high welfare and revenue, they are not guaranteed under no-regret learning dynamics. Nevertheless, our simulations suggest that these properties hold anyways, regardless of Nash equilibrium existence. Compared with its uniform price counterpart, the PAB dynamics converge faster and achieve higher revenue, making PAB appealing whenever revenue holds significant social value—for example, ETSs and Treasury auctions. Funding: R. Galgana and N. Golrezaei were supported in part by the Young Investigator Program Award from the Office of Naval Research [Grant N00014-21-1-2776] and the Massachusetts Institute of Technology Research Support Award. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.0403 .
We study the problem of bidding in uniform price auctions widely used in practice. Although these auctions are non-truthful for bidders with quasilinear utility functions, several empirical findings suggest that … We study the problem of bidding in uniform price auctions widely used in practice. Although these auctions are non-truthful for bidders with quasilinear utility functions, several empirical findings suggest that the auction format induces truthful bidding from the bidders. We attribute this difference in theory and practice to the assumption of the behavioral model of the bidders. In this pursuit, we study uniform price auctions in a repeated setting from the perspective of a value-maximizing buyer who aims to maximize their acquired cumulative value across $T$ rounds, subject to per-round return-on-investment (RoI) constraints. For a RoI-constrained, value-maximizing buyer, we study a generalized version of the uniform bidding format, commonly used in practice, which we term as $m$-uniform bidding. To characterize the optimal $m$-uniform bid, we introduce and study the notion of universally feasible (UF) bidding policies, which are robust, meaning that RoI feasibility is obtained regardless of the competitors' bids. We show that the optimal class of UF bidding policies is essentially a generalization of truthful bidding policies, which depends only on the valuation curve of the bidder and target RoI. To measure the performance of UF bidding policies against the optimal bidding policy that is not necessarily UF, we introduce a metric called the Price of Universal Feasibility (PoUF) and establish that PoUF is at most 2, irrespective of $m$ and the upper bound is tight. We further compare the generalized $m$-uniform bidding interface against the classical uniform bidding format under which $m=1$, showing the total value under $m$-uniform bidding increases at most by a factor of $m$. Numerical simulations on semi-synthetic data demonstrate that UF bidding policies perform significantly better than the derived theoretical bounds.
Online advertising channels commonly focus on maximizing total advertiser welfare to enhance channel health, and previous literature has studied augmenting ad auctions with machine learning predictions on advertiser values (also … Online advertising channels commonly focus on maximizing total advertiser welfare to enhance channel health, and previous literature has studied augmenting ad auctions with machine learning predictions on advertiser values (also known asmachine-learned advice ) to improve total welfare. Yet, such improvements could come at the cost of individual bidders' welfare and do not shed light on how particular advertiser bidding strategies impact welfare. Motivated by this, we present an analysis on an individual bidder's welfare loss in the autobidding world for auctions with and without machine-learned advice, and also uncover how advertiser strategies relate to such losses. In particular, we demonstrate how ad platforms can utilize ML advice to improve welfare guarantee on the aggregate and individual bidder level by setting ML advice as personalized reserve prices when the platform consists ofautobidders who maximize value while respecting a return on ad spend (ROAS) constraint. Under parallel VCG auctions with such ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for an individual autobidder, and show that the lower-bound guarantee is positively correlated with ML advice quality as well as the scale of bids induced by the autobidder's bidding strategies. Further, we show that no truthful, and possibly randomized mechanism with anonymous allocations can achieve universally better individual welfare guarantees than VCG, in the presence of personalized reserves based on ML-advice of equal quality. Moreover, we extend our individual welfare guarantee results to generalized first price (GFP) and generalized second price (GSP) auctions. Finally, we present numerical studies using semi-synthetic data derived from ad auction logs of a search ad platform to showcase improvements in individual welfare when setting personalized reserve prices with ML-advice.
In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad … In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channels and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.
We consider repeated multi-unit auctions with uniform pricing, which are widely used in practice for allocating goods such as carbon licenses. In each round, $K$ identical units of a good … We consider repeated multi-unit auctions with uniform pricing, which are widely used in practice for allocating goods such as carbon licenses. In each round, $K$ identical units of a good are sold to a group of buyers that have valuations with diminishing marginal returns. The buyers submit bids for the units, and then a price $p$ is set per unit so that all the units are sold. We consider two variants of the auction, where the price is set to the $K$-th highest bid and $(K+1)$-st highest bid, respectively. We analyze the properties of this auction in both the offline and online settings. In the offline setting, we consider the problem that one player $i$ is facing: given access to a data set that contains the bids submitted by competitors in past auctions, find a bid vector that maximizes player $i$'s cumulative utility on the data set. We design a polynomial time algorithm for this problem, by showing it is equivalent to finding a maximum-weight path on a carefully constructed directed acyclic graph. In the online setting, the players run learning algorithms to update their bids as they participate in the auction over time. Based on our offline algorithm, we design efficient online learning algorithms for bidding. The algorithms have sublinear regret, under both full information and bandit feedback structures. We complement our online learning algorithms with regret lower bounds. Finally, we analyze the quality of the equilibria in the worst case through the lens of the core solution concept in the game among the bidders. We show that the $(K+1)$-st price format is susceptible to collusion among the bidders; meanwhile, the $K$-th price format does not have this issue.
In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that … In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that may or may not be incentive-compatible under the presence of budget constraints. Our strategy maximizes the expected total utility across auctions while satisfying the advertiser's budget constraints in expectation. Additionally, we investigate the online setting where the advertiser must submit bids across platforms while learning about other bidders' bids over time. Our algorithm has $O(T^{3/4})$ regret under the full-information setting. Finally, we demonstrate that our algorithms have superior cumulative regret on both synthetic and real-world datasets of ad placement auctions, compared to existing adaptive pacing algorithms.
In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that … In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that may or may not be incentive-compatible under the presence of budget constraints. Our strategy maximizes the expected total utility across auctions while satisfying the advertiser's budget constraints in expectation. Additionally, we investigate the online setting where the advertiser must submit bids across platforms while learning about other bidders' bids over time. Our algorithm has $O(T^{3/4})$ regret under the full-information setting. Finally, we demonstrate that our algorithms have superior cumulative regret on both synthetic and real-world datasets of ad placement auctions, compared to existing adaptive pacing algorithms.
Today's online platforms rely heavily on algorithmic recommendations to bolster user engagement and drive revenue. However, such algorithmic recommendations can impact diverse stakeholders involved, namely the platform, items (seller), and … Today's online platforms rely heavily on algorithmic recommendations to bolster user engagement and drive revenue. However, such algorithmic recommendations can impact diverse stakeholders involved, namely the platform, items (seller), and users (customers), each with their unique objectives. In such multi-sided platforms, finding an appropriate middle ground becomes a complex operational challenge. Motivated by this, we formulate a novel fair recommendation framework, called Problem (FAIR), that not only maximizes the platform's revenue, but also accommodates varying fairness considerations from the perspectives of items and users. Our framework's distinguishing trait lies in its flexibility -- it allows the platform to specify any definitions of item/user fairness that are deemed appropriate, as well as decide the "price of fairness" it is willing to pay to ensure fairness for other stakeholders. We further examine Problem (FAIR) in a dynamic online setting, where the platform needs to learn user data and generate fair recommendations simultaneously in real time, which are two tasks that are often at odds. In face of this additional challenge, we devise a low-regret online recommendation algorithm, called FORM, that effectively balances the act of learning and performing fair recommendation. Our theoretical analysis confirms that FORM proficiently maintains the platform's revenue, while ensuring desired levels of fairness for both items and users. Finally, we demonstrate the efficacy of our framework and method via several case studies on real-world data.
Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice … Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice poses challenges due to its potential inaccuracy. To address this issue, we propose a framework that enhances online resource allocation decisions with potentially unreliable machine-learned (ML) advice. We assume here that this advice is represented by a general convex uncertainty set for the demand vector. We introduce a parameterized class of Pareto optimal online resource allocation algorithms that strike a balance between consistent and robust ratios. The consistent ratio measures the algorithm's performance (compared to the optimal hindsight solution) when the ML advice is accurate, while the robust ratio captures performance under an adversarial demand process when the advice is inaccurate. Specifically, in a C-Pareto optimal setting, we maximize the robust ratio while ensuring that the consistent ratio is at least C. Our proposed C-Pareto optimal algorithm is an adaptive protection level algorithm, which extends the classical fixed protection level algorithm introduced in Littlewood (2005) and Ball and Queyranne (2009). Solving a complex non-convex continuous optimization problem characterizes the adaptive protection level algorithm. To complement our algorithms, we present a simple method for computing the maximum achievable consistent ratio, which serves as an estimate for the maximum value of the ML advice. Additionally, we present numerical studies to evaluate the performance of our algorithm in comparison to benchmark algorithms. The results demonstrate that by adjusting the parameter C, our algorithms effectively strike a balance between worst-case and average performance, outperforming the benchmark algorithms.
Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice … Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice poses challenges due to its potential inaccuracy. To address this issue, we propose a framework that enhances online resource allocation decisions with potentially unreliable machine-learned (ML) advice. We assume here that this advice is represented by a general convex uncertainty set for the demand vector.We introduce a parameterized class of Pareto optimal online resource allocation algorithms that strike a balance between consistent and robust ratios. The consistent ratio measures the algorithm’s performance (compared to the optimal hindsight solution) when the ML advice is accurate, while the robust ratio captures performance under an adversarial demand process when the advice is inaccurate. Specifically, in a C-Pareto optimal setting, we maximize the robust ratio while ensuring that the consistent ratio is at least C. Our proposed C-Pareto optimal algorithm is an adaptive protection level algorithm, which extends the classical fixed protection level algorithm introduced in Littlewood (2005) and Ball and Queyranne (2009). Solving a complex non-convex continuous optimization problem characterizes the adaptive protection level algorithm. To complement our algorithms, we present a simple method for computing the maximum achievable consistent ratio, which serves as an estimate for the maximum value of the ML advice. Additionally, we present numerical studies to evaluate the performance of our algorithm in comparison to benchmark algorithms. The results demonstrate that by adjusting the parameter C, our algorithms effectively strike a balance between worst-case and average performance, outperforming the benchmark algorithms.
Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in … Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. The problem of learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the past submitted bids by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under full information and bandit feedback settings. We achieve an upper bound on regret of $O(M\sqrt{T\log |\mathcal{B}|})$ and $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$ respectively, where $M$ is the number of units demanded by the bidder, $T$ is the total number of auctions, and $|\mathcal{B}|$ is the size of the discretized bid space. We accompany these results with a regret lower bound, which match the linear dependency in $M$. Our numerical results suggest that when all agents behave according to our proposed no regret learning algorithms, the resulting market dynamics mainly converge to a welfare maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue compared to its popular alternative, the uniform price auction.
Motivated by online decision making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable … Motivated by online decision making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have [Formula: see text] (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a [Formula: see text] (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first-order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. We complement our theoretical studies by conducting numerical simulations for two of our applications, in both of which we observe that the numerical performance of our transformations outperforms the theoretical guarantees in practical instances. This paper was accepted by George Shanthikumar, data science. Funding: R. Niazadeh was supported by research funding from University of Chicago Booth School of Business. N. Golrezaei was supported in part by the Young Investigator Program Award from the Office of Naval Research [N00014-21-1-2776] and the MIT Research Support Award. Supplemental Material: The Online Appendix is available at https://doi.org/10.1287/mnsc.2022.4558
Analytics in the Face of Fraudulent Data This article presents a novel online learning algorithm for identifying optimal product rankings in the presence of fake users and corrupted data. In … Analytics in the Face of Fraudulent Data This article presents a novel online learning algorithm for identifying optimal product rankings in the presence of fake users and corrupted data. In recent years, e-commerce platforms, such as Amazon, have witnessed a growing number of fake users and click farms. These fraudulent actors seek to boost the position of certain products in the display ordering (i.e., product ranking). Further, platforms’ reliance on data analytics exacerbates the effect of these fake users as machine learning algorithms leverage user feedback to determine product rankings. In the face of these challenges, the present article departs from the status quo that is based on detecting fake users and instead proposes a robust learning methodology. More specifically, the article presents a robust online learning algorithm that converges to the optimal product ranking even when it is impossible to distinguish between real and fake users in the data.
We study an online resource allocation problem under uncertainty about demand and about the reward of each type of demand (agents) for the resource. Even though dealing with demand uncertainty … We study an online resource allocation problem under uncertainty about demand and about the reward of each type of demand (agents) for the resource. Even though dealing with demand uncertainty in resource allocation problems has been the topic of many papers in the literature, the challenge of not knowing rewards has been barely explored. The lack of knowledge about agents' rewards is inspired by the problem of allocating units of a new resource (e.g., newly developed vaccines or drugs) with unknown effectiveness/value. For such settings, we assume that we can \emph{test} the market before the allocation period starts. During the test period, we sample each agent in the market with probability $p$. We study how to optimally exploit the \emph{sample information} in our online resource allocation problem under adversarial arrival processes. We present an asymptotically optimal algorithm that achieves $1-\Theta(1/(p\sqrt{m}))$ competitive ratio, where $m$ is the number of available units of the resource. By characterizing an upper bound on the competitive ratio of any randomized and deterministic algorithm, we show that our competitive ratio of $1-\Theta(1/(p\sqrt{m}))$ is tight for any $p =\omega(1/\sqrt{m})$. That asymptotic optimality is possible with sample information highlights the significant advantage of running a test period for new resources. We demonstrate the efficacy of our proposed algorithm using a dataset that contains the number of COVID-19 related hospitalized patients across different age groups.
Assortment planning decisions of online marketplaces can have a significant impact on customers' choices. This incentivizes marketplaces to employ algorithms that prioritize marketplaces' short-term goals by featuring products with the … Assortment planning decisions of online marketplaces can have a significant impact on customers' choices. This incentivizes marketplaces to employ algorithms that prioritize marketplaces' short-term goals by featuring products with the highest popularity. However, these algorithms can lead to too little visibility (customer exposure) for the rest of the products, making them leave the marketplace, and in turn hurting the marketplace's long-term goals. Motivated by that, we introduce and study a fair assortment planning problem, which requires any two products with similar merits (popularities) to be offered similar visibility. We show that the problem can be formulated as a linear program (LP), called (FAIR), that optimizes over the distribution of all feasible assortments. To find a near-optimal solution to (FAIR), we propose a framework based on the Ellipsoid method, which requires a polynomial-time separation oracle to the dual of the LP. We show that finding an optimal separation oracle to the dual problem is an NP-complete problem, and hence we propose two approximate separation oracles: a 1/2-approx. algorithm and an FPTAS. The approximate separation oracles result in a polynomial-time 1/2-approx. algorithm and an FPTAS for the original problem (FAIR) using the Ellipsoid method. Further, they are designed by (i) showing the separation oracle to the dual of the LP is equivalent to solving an infinite series of parameterized knapsack problems, and (ii) taking advantage of the structure of the parameterized knapsack problems. Finally, we conduct a case study using the MovieLens dataset, demonstrating the efficacy of our 1/2-approx. and FPTAS fair Ellipsoid-based assortment planning algorithms.
We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To … We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To overcome the identifiability problem, we introduce a directed acyclic graph (DAG) representation of the choice model, which in a sense captures as much information about the choice model as could information-theoretically be identified. We then consider the problem of learning an approximation to this DAG representation in an active-learning setting. We design an efficient active-learning algorithm to estimate the DAG representation of the non-parametric choice model, which runs in polynomial time when the set of frequent rankings is drawn uniformly at random. Our algorithm learns the distribution over the most popular items of frequent preferences by actively and repeatedly offering assortments of items and observing the item chosen. We show that our algorithm can better recover a set of frequent preferences on both a synthetic and publicly available dataset on consumers' preferences, compared to the corresponding non-active learning estimation algorithms. This demonstrates the value of our algorithm and active-learning approaches more generally.
Many online platforms, ranging from online retail stores to social media platforms, employ algorithms to optimize their offered assortment of items (e.g., products and contents). These algorithms often focus exclusively … Many online platforms, ranging from online retail stores to social media platforms, employ algorithms to optimize their offered assortment of items (e.g., products and contents). These algorithms often focus exclusively on achieving the platforms' objectives, highlighting items with the highest popularity or revenue. This approach, however, can compromise the equality of opportunities for the rest of the items, in turn leading to less content diversity and increased regulatory scrutiny for the platform. Motivated by that, we introduce and study a fair assortment planning problem, which requires any two items with similar quality/merits to be offered similar outcomes. We show that the problem can be formulated as a linear program (LP), called (FAIR), that optimizes over the distribution of all feasible assortments. To find a near-optimal solution to (FAIR), we propose a framework based on the Ellipsoid method, which requires a polynomial-time separation oracle to the dual of the LP. We show that finding an optimal separation oracle to the dual problem is an NP-complete problem, and hence we propose a series of approximate separation oracles, which then result in a 1/2-approx. algorithm and an FPTAS for the original Problem (FAIR). The approximate separation oracles are designed by (i) showing the separation oracle to the dual of the LP is equivalent to solving an infinite series of parameterized knapsack problems, and (ii) leveraging the structure of the parameterized knapsack problems. Finally, we conduct a case study using the MovieLens dataset, which demonstrates the efficacy of our algorithms and further sheds light on the price of fairness.
Online advertising channels have commonly focused on maximizing total advertiser value (or welfare) to enhance long-run retention and channel healthiness. Previous literature has studied auction design by incorporating machine learning … Online advertising channels have commonly focused on maximizing total advertiser value (or welfare) to enhance long-run retention and channel healthiness. Previous literature has studied auction design by incorporating machine learning predictions on advertiser values (also known as machine-learned advice) through various forms to improve total welfare. Yet, such improvements could come at the cost of individual bidders' welfare and do not shed light on how particular advertiser bidding strategies impact welfare. Motivated by this, we present an analysis on an individual bidder's welfare loss in the autobidding world for auctions with and without machine-learned advice, and also uncover how advertiser strategies relate to such losses. In particular, we demonstrate how ad platforms can utilize ML advice to improve welfare guarantee on the aggregate and individual bidder level by setting ML advice as personalized reserve prices when the platform consists of autobidders who maximize value while respecting a return-on-ad spent (ROAS) constraint. Under parallel VCG auctions with such ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for an individual autobidder, and show that the lower-bound guarantee is positively correlated with ML advice quality as well the scale of bids induced by the autobidder's bidding strategies. Further, we prove an impossibility result showing that no truthful, and possibly randomized mechanism with anonymous allocations can achieve universally better individual welfare guarantees than VCG, in presence of personalized reserves based on ML-advice of equal quality. Moreover, we extend our individual welfare guarantee results to generalized first price (GFP) and generalized second price (GSP) auctions.
We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To … We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To overcome the identifiability problem, we introduce a directed acyclic graph (DAG) representation of the choice model, which in a sense captures as much information about the choice model as could information-theoretically be identified. We then consider the problem of learning an approximation to this DAG representation in an active-learning setting. We design an efficient active-learning algorithm to estimate the DAG representation of the non-parametric choice model, which runs in polynomial time when the set of frequent rankings is drawn uniformly at random. Our algorithm learns the distribution over the most popular items of frequent preferences by actively and repeatedly offering assortments of items and observing the item chosen. We show that our algorithm can better recover a set of frequent preferences on both a synthetic and publicly available dataset on consumers' preferences, compared to the corresponding non-active learning estimation algorithms. This demonstrates the value of our algorithm and active-learning approaches more generally.
How to optimize posted price mechanisms? The sequential posted-price (SPP) mechanism is one of the widely used selling mechanisms in practice. In this mechanism, the seller presents each buyer with … How to optimize posted price mechanisms? The sequential posted-price (SPP) mechanism is one of the widely used selling mechanisms in practice. In this mechanism, the seller presents each buyer with a price sequentially and the buyer can either accept or reject the mechanism's offer. Despite the widespread use of the SPP mechanism, the problem of optimizing prices in this mechanism has not been fully addressed. In a paper entitled, “Improved Revenue Bounds for Posted-Price and Second-Price Mechanisms,” H. Beyhaghi, N. Golrezaei, R. Paes Leme, M. Pal, and B. Sivan construct SPP mechanisms by considering the best of two simple pricing rules: one that imitates the optimal mechanism and the other that posts a uniform price (same price for every buyer). Their simple pricing rules can be easily generalized to the setting with multiple units and yield the first improvement over long-established approximation factors.
Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to … Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability.
In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding … In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding to customers' actions to learn how these products must be ranked or ordered. These interactions in the underlying learning process, however, may incentivize sellers to artificially inflate their position by employing fake users, as exemplified by the emergence of click farms. Motivated by such fraudulent behavior, we study the ranking problem of a platform that faces a mixture of real and fake users who are indistinguishable from one another. We first show that existing learning algorithms---that are optimal in the absence of fake users---may converge to highly sub-optimal rankings under manipulation by fake users. To overcome this deficiency, we develop efficient learning algorithms under two informational environments: in the first setting, the platform is aware of the number of fake users, and in the second setting, it is agnostic to the number of fake users. For both these environments, we prove that our algorithms converge to the optimal ranking, while being robust to the aforementioned fraudulent behavior; we also present worst-case performance guarantees for our methods, and show that they significantly outperform existing algorithms. At a high level, our work employs several novel approaches to guarantee robustness such as: (i) constructing product-ordering graphs that encode the pairwise relationships between products inferred from the customers' actions; and (ii) implementing multiple levels of learning with a judicious amount of bi-directional cross-learning between levels. Overall, our results indicate that online platforms can effectively combat fraudulent users without incurring large costs by designing new learning algorithms that guarantee efficient convergence even when the platform is completely oblivious to the number and identity of the fake users.
In online advertising markets, setting budget and return on investment (ROI) constraints are two prevalent ways to help advertisers (i.e. buyers) utilize limited monetary resources efficiently. In this work, we … In online advertising markets, setting budget and return on investment (ROI) constraints are two prevalent ways to help advertisers (i.e. buyers) utilize limited monetary resources efficiently. In this work, we provide a holistic view of ROI and budget constrained markets. We first tackle the buyer's bidding problem subject to both budget and ROI constraints in repeated second-price auctions. We show that the optimal buyer hindsight policy admits a structure that suggests the buyer win all auctions during which her valuation-to-expenditure ratio is greater than some threshold. We further propose a threshold-based bidding framework that aims to mimic the hindsight bidding policy by learning its threshold. We show that when facing stochastic competition, our algorithm guarantees the satisfaction of both budget and ROI constraints and achieves sublinear regret compared to the optimal hindsight policy. Next, we study the seller's pricing problem against an ROI and budget constrained buyer. We establish that the seller's revenue function admits a bell-shaped structure, and then further propose a pricing algorithm that utilizes an episodic binary-search procedure to identify a revenue-optimal selling price. During each binary search episode, our pricing algorithm explores a particular price, allowing the buyer's learning algorithm to adapt and stabilize quickly. This, in turn, allows our seller algorithm to achieve sublinear regret against adaptive buyer algorithms that quickly react to price changes.
In many on-demand online platforms such as ride-sharing, grocery delivery, or shipping, some arriving agents are patient and willing to wait a short amount of time for the resource or … In many on-demand online platforms such as ride-sharing, grocery delivery, or shipping, some arriving agents are patient and willing to wait a short amount of time for the resource or service as long as there is an upfront guarantee that service will be ultimately provided within a certain delay. Motivated by this, we present a setting with patient and impatient agents who seek a resource or service that replenishes periodically. Impatient agents demand the resource immediately upon arrival while patient agents are willing to wait a short period conditioned on an upfront commitment to receive the resource. We study this setting under adversarial arrival models using a relaxed notion of competitive ratio. We present a class of POLYtope-based Resource Allocation (POLYRA) algorithms that achieve optimal or near-optimal competitive ratios. Such POLYRA algorithms work by consulting a particular polytope and only making decisions that guarantee the algorithm's state remains feasible in this polytope. When the number of agent types is either two or three, POLYRA algorithms can obtain the optimal competitive ratio. To design these polytopes, we construct an upper bound on the competitive ratio of any algorithm, which is characterized via a linear program (LP) that considers a collection of overlapping worst-case input sequences. Our designed POLYRA algorithms then mimic the optimal solution of this upper bound LP via its polytope's definition, obtaining the optimal competitive ratio. When there are more than three types, our overlapping worst-case input sequences do not necessarily result in an attainable competitive ratio, and so we present a class of simple and interpretable POLYRA algorithm which achieves at least 80% of the optimal competitive ratio. We complement our theoretical studies with numerical analysis which shows the efficiency of our algorithms beyond adversarial arrivals
Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to … Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. We complement our theoretical studies by conducting numerical simulations for two of our applications, in both of which we observe that the numerical performance of our transformations outperforms the theoretical guarantees in practical instances.
Internet advertisers (buyers) repeatedly procure ad impressions from ad platforms (sellers) with the aim to maximize total conversion (i.e. ad value) while respecting both budget and return-on-investment (ROI) constraints for … Internet advertisers (buyers) repeatedly procure ad impressions from ad platforms (sellers) with the aim to maximize total conversion (i.e. ad value) while respecting both budget and return-on-investment (ROI) constraints for efficient utilization of limited monetary resources. Facing such a constrained buyer who aims to learn her optimal strategy to acquire impressions, we study from a seller's perspective how to learn and price ad impressions through repeated posted price mechanisms to maximize revenue. For this two-sided learning setup, we propose a learning algorithm for the seller that utilizes an episodic binary-search procedure to identify a revenue-optimal selling price. We show that such a simple learning algorithm enjoys low seller regret when within each episode, the budget and ROI constrained buyer approximately best responds to the posted price. We present simple yet natural buyer's bidding algorithms under which the buyer approximately best responds while satisfying budget and ROI constraints, leading to a low regret for our proposed seller pricing algorithm. The design of our seller algorithm is motivated by the fact that the seller's revenue function admits a bell-shaped structure when the buyer best responds to prices under budget and ROI constraints, enabling our seller algorithm to identify revenue-optimal selling prices efficiently.
We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices … We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functions and how reference prices are formed but have access to an oracle that provides a measure of consumers' responsiveness to the current posted prices. We show that if the firms run no-regret algorithms, in particular, online mirror descent(OMD), with decreasing step sizes, the market stabilizes in the sense that firms' prices and reference prices converge to a stable Nash Equilibrium (SNE). Interestingly, we also show that there exist constant step sizesunder which the market stabilizes. We further characterize the rate of convergence to the SNE for both decreasing and constant OMD step sizes.
In many practical settings, the decision makers have to learn their best actions by experimenting with possible options and collecting feedback (data) over time. It is often assumed that the … In many practical settings, the decision makers have to learn their best actions by experimenting with possible options and collecting feedback (data) over time. It is often assumed that the collected data can be trusted as they reflect the ground truth. But this assumption is violated when the data are generated by strategic players. Consider online advertising market in which the ad exchange (decision maker) aims at learning the best reserve prices in the repeated auctions. In this setting, the data are advertisers’ submitted bids. Such data can be strategically corrupted by advertisers to trick the learning algorithm of the ad exchange to offer them lower reserve prices in the future auctions. In “Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions,” N. Golrezaei, A. Javanmard, and V. Mirrokni design effective learning algorithms with sublinear regret in such environments that are robust to the strategic behavior of the players.
Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item … Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is to minimize her regret for revenue, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have the incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on the future reserve prices. When the market noise distribution is known to the seller, we propose a policy called Contextual Robust Pricing (CORP) that achieves a T-period regret of $O(d\log(Td) \log (T))$, where $d$ is the dimension of {the} contextual information. When the market noise distribution is unknown to the seller, we propose two policies whose regrets are sublinear in $T$.
We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices … We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functions and how reference prices are formed but have access to an oracle that provides a measure of consumers' responsiveness to the current posted prices. We show that if the firms run no-regret algorithms, in particular, online mirror descent(OMD), with decreasing step sizes, the market stabilizes in the sense that firms' prices and reference prices converge to a stable Nash Equilibrium (SNE). Interestingly, we also show that there exist constant step sizesunder which the market stabilizes. We further characterize the rate of convergence to the SNE for both decreasing and constant OMD step sizes.
In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding … In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding to customers' actions to learn how these products must be ranked or ordered. These interactions in the underlying learning process, however, may incentivize sellers to artificially inflate their position by employing fake users, as exemplified by the emergence of click farms. Motivated by such fraudulent behavior, we study the ranking problem of a platform that faces a mixture of real and fake users who are indistinguishable from one another. We first show that existing learning algorithms---that are optimal in the absence of fake users---may converge to highly sub-optimal rankings under manipulation by fake users. To overcome this deficiency, we develop efficient learning algorithms under two informational environments: in the first setting, the platform is aware of the number of fake users, and in the second setting, it is agnostic to the number of fake users. For both these environments, we prove that our algorithms converge to the optimal ranking, while being robust to the aforementioned fraudulent behavior; we also present worst-case performance guarantees for our methods, and show that they significantly outperform existing algorithms. At a high level, our work employs several novel approaches to guarantee robustness such as: (i) constructing product-ordering graphs that encode the pairwise relationships between products inferred from the customers' actions; and (ii) implementing multiple levels of learning with a judicious amount of bi-directional cross-learning between levels. Overall, our results indicate that online platforms can effectively combat fraudulent users without incurring large costs by designing new learning algorithms that guarantee efficient convergence even when the platform is completely oblivious to the number and identity of the fake users.
We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with n buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering … We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with n buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering the best of two simple pricing rules: one that imitates the optimal/Myersonian mechanism via the taxation principle and the other that posts a uniform price. Our pricing rules are rather generalizable and yield the first improvement over long-established approximation factors in several settings. We design factor-revealing mathematical programs that crisply capture the approximation factor of our SPP mechanism. In the single-unit setting, our SPP mechanism yields a better approximation factor than the state of the art prior to our work (Azar et al. 2018). In the multi-unit setting, our SPP mechanism yields the first improved approximation factor over the state of the art after over nine years (Yan (2011) and Chakraborty et al. (2010)). Our results on SPP mechanisms immediately imply improved performance guarantees for the equivalent free-order prophet inequality problem. In the position auction setting, our SPP mechanism yields the first higher-than 1 − 1/e approximation factor. In eager second-price (ESP) auctions, our two simple pricing rules lead to the first improved approximation factor that is strictly greater than what is obtained by the SPP mechanism in the single-unit setting
We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices … We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functions and how reference prices are formed but have access to an oracle that provides a measure of consumers' responsiveness to the current posted prices. We show that if the firms run no-regret algorithms, in particular, online mirror descent(OMD), with decreasing step sizes, the market stabilizes in the sense that firms' prices and reference prices converge to a stable Nash Equilibrium (SNE). Interestingly, we also show that there exist constant step sizesunder which the market stabilizes. We further characterize the rate of convergence to the SNE for both decreasing and constant OMD step sizes.
In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding … In many online platforms, customers' decisions are substantially influenced by product rankings as most customers only examine a few top-ranked products. Concurrently, such platforms also use the same data corresponding to customers' actions to learn how these products must be ranked or ordered. These interactions in the underlying learning process, however, may incentivize sellers to artificially inflate their position by employing fake users, as exemplified by the emergence of click farms. Motivated by such fraudulent behavior, we study the ranking problem of a platform that faces a mixture of real and fake users who are indistinguishable from one another. We first show that existing learning algorithms---that are optimal in the absence of fake users---may converge to highly sub-optimal rankings under manipulation by fake users. To overcome this deficiency, we develop efficient learning algorithms under two informational environments: in the first setting, the platform is aware of the number of fake users, and in the second setting, it is agnostic to the number of fake users. For both these environments, we prove that our algorithms converge to the optimal ranking, while being robust to the aforementioned fraudulent behavior; we also present worst-case performance guarantees for our methods, and show that they significantly outperform existing algorithms. At a high level, our work employs several novel approaches to guarantee robustness such as: (i) constructing product-ordering graphs that encode the pairwise relationships between products inferred from the customers' actions; and (ii) implementing multiple levels of learning with a judicious amount of bi-directional cross-learning between levels.
Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item … Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is to minimize her regret for revenue, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have the incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on the future reserve prices. When the market noise distribution is known to the seller, we propose a policy called Contextual Robust Pricing (CORP) that achieves a T-period regret of $O(d\log(Td) \log (T))$, where $d$ is the dimension of {the} contextual information. When the market noise distribution is unknown to the seller, we propose two policies whose regrets are sublinear in $T$.
We consider a dynamic pricing problem for repeated contextual second-price auctions with strategic buyers whose goals are to maximize their long-term time discounted utility. The seller has very limited information … We consider a dynamic pricing problem for repeated contextual second-price auctions with strategic buyers whose goals are to maximize their long-term time discounted utility. The seller has very limited information about buyers' overall demand curves, which depends on $d$-dimensional context vectors characterizing auctioned items, and a non-parametric market noise distribution that captures buyers' idiosyncratic tastes. The noise distribution and the relationship between the context vectors and buyers' demand curves are both unknown to the seller. We focus on designing the seller's learning policy to set contextual reserve prices where the seller's goal is to minimize his regret for revenue. We first propose a pricing policy when buyers are truthful and show that it achieves a $T$-period regret bound of $\tilde{\mathcal{O}}(\sqrt{dT})$ against a clairvoyant policy that has full information of the buyers' demand. Next, under the setting where buyers bid strategically to maximize their long-term discounted utility, we develop a variant of our first policy that is robust to strategic (corrupted) bids. This policy incorporates randomized isolation periods, during which a buyer is randomly chosen to solely participate in the auction. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. Because of this nice property, our robust policy enjoys a $T$-period regret of $\tilde{\mathcal{O}}(\sqrt{dT})$, matching that under the truthful setting up to a constant factor that depends on the utility discount factor.
We study the problem of computing personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a dataset that contains the … We study the problem of computing personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a dataset that contains the submitted bids of n buyers in a set of auctions and the goal is to return personalized reserve prices r that maximize the revenue earned on these auctions by running eager second price auctions with reserve r. We present a novel LP formulation to this problem and a rounding procedure which achieves a (1+2(√2-1)e√2-2)-1≅0.684-approximation. This improves over the 1/2-approximation Algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which bounds the performance of any algorithm based on this LP.
We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains … We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains the submitted bids of $n$ buyers in a set of auctions and the problem is to return personalized reserve prices $\textbf r$ that maximize the revenue earned on these auctions by running eager second price auctions with reserve $\textbf r$. For this problem, which is known to be APX-hard, we present a novel LP formulation and a rounding procedure which achieves a $(1+2(\sqrt{2}-1)e^{\sqrt{2}-2})^{-1} \approx 0.684$-approximation. This improves over the $\frac{1}{2}$-approximation algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which shows that it is impossible to design an algorithm that yields an approximation factor larger than $0.828$ with respect to this LP.
We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains … We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains the submitted bids of $n$ buyers in a set of auctions and the problem is to return personalized reserve prices $\textbf r$ that maximize the revenue earned on these auctions by running eager second price auctions with reserve $\textbf r$. For this problem, which is known to be NP-hard, we present a novel LP formulation and a rounding procedure which achieves a $(1+2(\sqrt{2}-1)e^{\sqrt{2}-2})^{-1} \approx 0.684$-approximation. This improves over the $\frac{1}{2}$-approximation algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which shows that it is impossible to design an algorithm that yields an approximation factor larger than $0.828$ with respect to this LP.
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the … In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic). We simulate our algorithms on real auction data from an ad exchange running first-price auctions (showing that they outperform traditional contextual bandit algorithms).
We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains … We study the problem of computing data-driven personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a data-set that contains the submitted bids of $n$ buyers in a set of auctions and the problem is to return personalized reserve prices $\textbf r$ that maximize the revenue earned on these auctions by running eager second price auctions with reserve $\textbf r$. For this problem, which is known to be APX-hard, we present a novel LP formulation and a rounding procedure which achieves a $(1+2(\sqrt{2}-1)e^{\sqrt{2}-2})^{-1} \approx 0.684$-approximation. This improves over the $\frac{1}{2}$-approximation algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which shows that it is impossible to design an algorithm that yields an approximation factor larger than $0.828$ with respect to this LP.
We consider a dynamic pricing problem for repeated contextual second-price auctions with multiple strategic buyers who aim to maximize their long-term time discounted utility. The seller has limited information on … We consider a dynamic pricing problem for repeated contextual second-price auctions with multiple strategic buyers who aim to maximize their long-term time discounted utility. The seller has limited information on buyers' overall demand curves which depends on a non-parametric market-noise distribution, and buyers may potentially submit corrupted bids (relative to true valuations) to manipulate the seller's pricing policy for more favorable reserve prices in the future. We focus on designing the seller's learning policy to set contextual reserve prices where the seller's goal is to minimize regret compared to the revenue of a benchmark clairvoyant policy that has full information of buyers' demand. We propose a policy with a phased-structure that incorporates randomized "isolation" periods, during which a buyer is randomly chosen to solely participate in the auction. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. We then prove that our policy enjoys a $T$-period regret of $\widetilde{\mathcal{O}}(\sqrt{T})$ facing strategic buyers. Finally, we conduct numerical simulations to compare our proposed algorithm to standard pricing policies. Our numerical results show that our algorithm outperforms these policies under various buyer bidding behavior.
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the … In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic).
We study the fundamental problem of selling a single indivisible item to one of $n$ buyers with independent and potentially nonidentical value distributions. We focus on two simple and widely … We study the fundamental problem of selling a single indivisible item to one of $n$ buyers with independent and potentially nonidentical value distributions. We focus on two simple and widely used selling mechanisms: the second price auction with \emph{eager} personalized reserve prices and the sequential posted price mechanism. Using a new approach, we improve the best-known performance guarantees for these mechanisms. We show that for every value of the number of buyers $n$, the eager second price (ESP) auction and sequential posted price mechanisms respectively earn at least $0.6620$ and $0.6543$ fractions of the optimal revenue. We also provide improved performance guarantees for these mechanisms when the number of buyers is small, which is the more relevant regime for many applications of interest. This in particular implies an improved bound of $0.6543$ for free-order prophet inequalities. Motivated by our improved revenue bounds, we further study the problem of optimizing reserve prices in the ESP auctions when the sorted order of personalized reserve prices among bidders is exogenous. We show that this problem can be solved polynomially. In addition, by analyzing a real auction dataset from Google's advertising exchange, we demonstrate the effectiveness of order-based pricing.
We study revenue maximization through sequential posted price mechanisms (SPMs) in single-dimensional settings with n buyers, and independent but not necessarily identical value distributions. We construct our SPMs by considering … We study revenue maximization through sequential posted price mechanisms (SPMs) in single-dimensional settings with n buyers, and independent but not necessarily identical value distributions. We construct our SPMs by considering two simple pricing schemes: one that imitates Myerson's mechanism via taxation principle, and the other that posts a uniform price. We design a factor revealing LP that crisply captures the approximation factor of this SPM. In the H-units setting, our SPM yields the first improvement in approximation over the correlation-gap-based approximation factor from over eight years ago (Yan 2011). In the position auction setting, our SPM yields the first higher-than 1-1/e approximation factor. In the 1-unit setting, our SPM yields better than the current best known approximation factor for small values of the number of buyers n. Our results on SPMs immediately imply improved performance guarantees for the equivalent free-order prophet inequality problem. In the recent surge of results that show an improvement over 1-1/e for SPMs, ours is the only technique that generalizes beyond the 1-unit setting, and yields the first improvement over long established approximation factors in several settings.
We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering … We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering the best of two simple pricing rules: one that imitates the revenue optimal mchanism, namely the Myersonian mechanism, via the taxation principle and the other that posts a uniform price. Our pricing rules are rather generalizable and yield the first improvement over long-established approximation factors in several settings. We design factor-revealing mathematical programs that crisply capture the approximation factor of our SPP mechanism. In the single-unit setting, our SPP mechanism yields a better approximation factor than the state of the art prior to our work (Azar, Chiplunkar & Kaplan, 2018). In the multi-unit setting, our SPP mechanism yields the first improved approximation factor over the state of the art after over nine years (Yan, 2011 and Chakraborty et al., 2010). Our results on SPP mechanisms immediately imply improved performance guarantees for the equivalent free-order prophet inequality problem. In the position auction setting, our SPP mechanism yields the first higher-than $1-1/e$ approximation factor. In eager second-price (ESP) auctions, our two simple pricing rules lead to the first improved approximation factor that is strictly greater than what is obtained by the SPP mechanism in the single-unit setting.
Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item … Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is to minimize her regret for revenue, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have the incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on the future reserve prices. When the market noise distribution is known to the seller, we propose a policy called Contextual Robust Pricing (CORP) that achieves a T-period regret of O(d log(Td)log(T)), where d is the dimension of the contextual information. When the market noise distribution is unknown to the seller, we propose two policies whose regrets are sublinear in T.
We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering … We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering the best of two simple pricing rules: one that imitates the revenue optimal mchanism, namely the Myersonian mechanism, via the taxation principle and the other that posts a uniform price. Our pricing rules are rather generalizable and yield the first improvement over long-established approximation factors in several settings. We design factor-revealing mathematical programs that crisply capture the approximation factor of our SPP mechanism. In the single-unit setting, our SPP mechanism yields a better approximation factor than the state of the art prior to our work (Azar, Chiplunkar & Kaplan, 2018). In the multi-unit setting, our SPP mechanism yields the first improved approximation factor over the state of the art after over nine years (Yan, 2011 and Chakraborty et al., 2010). Our results on SPP mechanisms immediately imply improved performance guarantees for the equivalent free-order prophet inequality problem. In the position auction setting, our SPP mechanism yields the first higher-than $1-1/e$ approximation factor. In eager second-price (ESP) auctions, our two simple pricing rules lead to the first improved approximation factor that is strictly greater than what is obtained by the SPP mechanism in the single-unit setting.
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the … In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{i,t}(c)$, the learner also learns the values of $r_{i,t}(c')$ for some other contexts $c'$ in set $\mathcal{O}_i(c)$; i.e., the rewards that would have been achieved by performing that action under different contexts $c'\in \mathcal{O}_i(c)$. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We design and analyze new algorithms for the contextual bandits problem with cross-learning and show that their regret has better dependence on the number of contexts. Under complete cross-learning where the rewards for all contexts are learned when choosing an action, i.e., set $\mathcal{O}_i(c)$ contains all contexts, we show that our algorithms achieve regret $\tilde{O}(\sqrt{KT})$, removing the dependence on $C$. For any other cases, i.e., under partial cross-learning where $|\mathcal{O}_i(c)|< C$ for some context-action pair of $(i,c)$, the regret bounds depend on how the sets $\mathcal O_i(c)$ impact the degree to which cross-learning between contexts is possible. We simulate our algorithms on real auction data from an ad exchange running first-price auctions and show that they outperform traditional contextual bandit algorithms.
We propose a new scheme for increasing the throughput of video files in cellular communications systems. This scheme exploits (1) the redundancy of user requests as well as (2) the … We propose a new scheme for increasing the throughput of video files in cellular communications systems. This scheme exploits (1) the redundancy of user requests as well as (2) the considerable storage capacity of smartphones and tablets. Users cache popular video files and-after receiving requests from other users-serve these requests via device-to-device localized transmissions. The file placement is optimal when a central control knows a priori the locations of wireless devices when file requests occur. However, even a purely random caching scheme shows only a minor performance loss compared to such a "genie-aided" scheme. We then analyze the optimal collaboration distance, trading off frequency reuse with the probability of finding a requested file within the collaboration distance. We show that an improvement of spectral efficiency of one to two orders of magnitude is possible, even if there is not very high redundancy in video requests.

Commonly Cited References

The surest way to increase the system capacity of a wireless link is by getting the transmitter and receiver closer to each other, which creates the dual benefits of higher-quality … The surest way to increase the system capacity of a wireless link is by getting the transmitter and receiver closer to each other, which creates the dual benefits of higher-quality links and more spatial reuse. In a network with nomadic users, this inevitably involves deploying more infrastructure, typically in the form of microcells, hot spots, distributed antennas, or relays. A less expensive alternative is the recent concept of femtocells - also called home base stations - which are data access points installed by home users to get better indoor voice and data coverage. In this article we overview the technical and business arguments for femtocells and describe the state of the art on each front. We also describe the technical challenges facing femtocell networks and give some preliminary ideas for how to overcome them.
We suggest a novel approach to handle the ongoing explosive increase in the demand for video content in wireless/mobile devices. We envision femtocell-like base stations, which we call helpers, with … We suggest a novel approach to handle the ongoing explosive increase in the demand for video content in wireless/mobile devices. We envision femtocell-like base stations, which we call helpers, with weak backhaul links but large storage capacity. These helpers form a wireless distributed caching network that assists the macro base station by handling requests of popular files that have been cached. Due to the short distances between helpers and requesting devices, the transmission of cached files can be done very efficiently. A key question for such a system is the wireless distributed caching problem, i.e., which files should be cached by which helpers. If every mobile device has only access to a exactly one helper, then clearly each helper should cache the same files, namely the most popular ones. However, for the case that each mobile device can access multiple caches, the assignment of files to helpers becomes nontrivial. The theoretical contribution of our paper lies in (i) formalizing the distributed caching problem, (ii) showing that this problem is NP-hard, and (iii) presenting approximation algorithms that lie within a constant factor of the theoretical optimum. On the practical side, we present a detailed simulation of a university campus scenario covered by a single 3GPP LTE R8 cell and several helpers using a simplified 802.11n protocol. We use a real campus trace of video requests and show how distributed caching can increase the number served users by as much as 400 - 500%.
In many practical settings, the decision makers have to learn their best actions by experimenting with possible options and collecting feedback (data) over time. It is often assumed that the … In many practical settings, the decision makers have to learn their best actions by experimenting with possible options and collecting feedback (data) over time. It is often assumed that the collected data can be trusted as they reflect the ground truth. But this assumption is violated when the data are generated by strategic players. Consider online advertising market in which the ad exchange (decision maker) aims at learning the best reserve prices in the repeated auctions. In this setting, the data are advertisers’ submitted bids. Such data can be strategically corrupted by advertisers to trick the learning algorithm of the ad exchange to offer them lower reserve prices in the future auctions. In “Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions,” N. Golrezaei, A. Javanmard, and V. Mirrokni design effective learning algorithms with sublinear regret in such environments that are robust to the strategic behavior of the players.
We study the question of setting and testing reserve prices in single item auctions when the bidders are not identical. At a high level, there are two generalizations of the … We study the question of setting and testing reserve prices in single item auctions when the bidders are not identical. At a high level, there are two generalizations of the standard second price auction: in the lazy version we first determine the winner, and then apply reserve prices; in the eager version we first discard the bidders not meeting their reserves, and then determine the winner among the rest. We show that the two versions have dramatically different properties: lazy reserves are easy to optimize, and A/B test in production, whereas eager reserves always lead to higher welfare, but their optimization is NP-complete, and naive A/B testing will lead to incorrect conclusions. Despite their different characteristics, we show that the overall revenue for the two scenarios is always within a factor of 2 of each other, even in the presence of correlated bids. Moreover, we prove that the eager auction dominates the lazy auction on revenue whenever the bidders are independent or symmetric. We complement our theoretical results with simulations on real world data that show that even suboptimally set eager reserve prices are preferred from a revenue standpoint.
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the … In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic). We simulate our algorithms on real auction data from an ad exchange running first-price auctions (showing that they outperform traditional contextual bandit algorithms).
We study the problem of contextual search, a multidimensional generalization of binary search that captures many problems in contextual decision-making. In contextual search, a learner is trying to learn the … We study the problem of contextual search, a multidimensional generalization of binary search that captures many problems in contextual decision-making. In contextual search, a learner is trying to learn the value of a hidden vector. Every round the learner is provided an adversarially-chosen context vector, submits a guess for the inner product of the context vector with the hidden vector, learns whether their guess was too high or too low, and incurs some loss based on the quality of their guess. The learner's goal is to minimize their total loss over the course of some number of rounds. We present an algorithm for the contextual search problem for the symmetric loss function that achieves constant total loss. We present a new algorithm for the dynamic pricing problem (which can be realized as a special case of the contextual search problem) that achieves doubly-logarithmic total loss, improving exponentially on the previous best known upper bounds and matching the known lower bounds (up to a polynomial dependence on dimension). Both algorithms make significant use of ideas from the field of integral geometry, most notably the notion of intrinsic volumes of a convex set. To the best of our knowledge this is the first application of intrinsic volumes to algorithm design.
A large fraction of online advertisements are sold via repeated second-price auctions. In these auctions, the reserve price is the main tool for the auctioneer to boost revenues. In this … A large fraction of online advertisements are sold via repeated second-price auctions. In these auctions, the reserve price is the main tool for the auctioneer to boost revenues. In this work, we investigate the following question: How can the auctioneer optimize reserve prices by learning from the previous bids, while accounting for the long-term incentives and strategic behavior of the bidders? To this end, we consider a seller who repeatedly sells ex ante identical items via a second-price auction. Buyers' valuations for each item are drawn i.i.d. from a distribution F that is unknown to the seller. We find that if the seller attempts to dynamically update a common reserve price based on the bidding history, this creates an incentive for buyers to shade their bids, which can hurt revenue. When there is more than one buyer, incentive compatibility can be restored by using *personalized* reserve prices, where the personal reserve price for each buyer is set using the historical bids of *other* buyers. Such a mechanism asymptotically achieves the expected revenue obtained under the static Myerson optimal auction for F. Further, if valuation distributions differ across bidders, the loss relative to the Myerson benchmark is only quadratic in the size of such differences. We extend our results to a contextual setting where the valuations of the buyers depend on observed features of the items. When up-front fees are permitted, we show how the seller can determine such payments based on the bids of others to obtain an approximately incentive-compatible mechanism that extracts nearly all the surplus.Our earlier conference paper in WINE 2014 "Dynamic Reserve Prices for Repeated Auctions: Learning from Bids" featured a different model. The full version of that paper is posted at https://arxiv.org/abs/2002.07331
We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We … We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-δ$ while incurring regret at most $O(\sqrt{KT\ln(N/δ)})$ over $T$ time steps. The new algorithm is tested empirically in a large-scale, real-world dataset. Second, we give a new algorithm called VE that competes with a possibly infinite set of policies of VC-dimension $d$ while incurring regret at most $O(\sqrt{T(d\ln(T) + \ln (1/δ))})$ with probability $1-δ$. These guarantees improve on those of all previous algorithms, whether in a stochastic or adversarial environment, and bring us closer to providing supervised learning type guarantees for the contextual bandit setting.
In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n … In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
We study the problem of computing personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a dataset that contains the … We study the problem of computing personalized reserve prices in eager second price auctions without having any assumption on valuation distributions. Here, the input is a dataset that contains the submitted bids of n buyers in a set of auctions and the goal is to return personalized reserve prices r that maximize the revenue earned on these auctions by running eager second price auctions with reserve r. We present a novel LP formulation to this problem and a rounding procedure which achieves a (1+2(√2-1)e√2-2)-1≅0.684-approximation. This improves over the 1/2-approximation Algorithm due to Roughgarden and Wang. We show that our analysis is tight for this rounding procedure. We also bound the integrality gap of the LP, which bounds the performance of any algorithm based on this LP.
We propose a new scheme for increasing the throughput of video files in cellular communications systems. This scheme exploits (i) the redundancy of user requests as well as (ii) the … We propose a new scheme for increasing the throughput of video files in cellular communications systems. This scheme exploits (i) the redundancy of user requests as well as (ii) the considerable storage capacity of smartphones and tablets. Users cache popular video files and - after receiving requests from other users - serve these requests via device-to-device localized transmissions. We investigate what is the optimal collaboration distance, trading off frequency reuse with the probability of finding a requested file within the collaboration distance. We show that an improvement of spectral efficiency of one to two orders of magnitude is possible, even if there is not very high redundancy in video requests.
We present a new architecture to handle the ongoing explosive increase in the demand for video content in wireless networks. It is based on distributed caching of the content in … We present a new architecture to handle the ongoing explosive increase in the demand for video content in wireless networks. It is based on distributed caching of the content in femtobasestations with small or non-existing backhaul capacity but with considerable storage space, called helper nodes. We also consider using the wireless terminals themselves as caching helpers, which can distribute video through device-to-device communications. This approach allows an improvement in the video throughput without deployment of any additional infrastructure. The new architecture can improve video throughput by one to two orders-of-magnitude.
We study the two-stage vertex-weighted online bipartite matching problem of Feng, Niazadeh, and Saberi (SODA 2021) in a setting where the algorithm has access to a suggested matching that is … We study the two-stage vertex-weighted online bipartite matching problem of Feng, Niazadeh, and Saberi (SODA 2021) in a setting where the algorithm has access to a suggested matching that is recommended in the first stage. We evaluate an algorithm by its robustness $R$, which is its performance relative to that of the optimal offline matching, and its consistency $C$, which is its performance when the advice or the prediction given is correct. We characterize for this problem the Pareto-efficient frontier between robustness and consistency, which is rare in the literature on advice-augmented algorithms, yet necessary for quantifying such an algorithm to be optimal. Specifically, we propose an algorithm that is $R$-robust and $C$-consistent for any $(R,C)$ with $0 \leq R \leq \frac{3}{4}$ and $\sqrt{1-R} + \sqrt{1-C} = 1$, and prove that no other algorithm can achieve a better tradeoff.
We consider a non-stationary variant of a sequential stochastic optimization problem, in which the underlying cost functions may change along the horizon. We propose a measure, termed variation budget, that … We consider a non-stationary variant of a sequential stochastic optimization problem, in which the underlying cost functions may change along the horizon. We propose a measure, termed variation budget, that controls the extent of said change, and study how restrictions on this budget impact achievable performance. We identify sharp conditions under which it is possible to achieve long-run-average optimality and more refined performance measures such as rate optimality that fully characterize the complexity of such problems. In doing so, we also establish a strong connection between two rather disparate strands of literature: adversarial online convex optimization; and the more traditional stochastic approximation paradigm (couched in a non-stationary setting). This connection is the key to deriving well performing policies in the latter, by leveraging structure of optimal policies in the former. Finally, tight bounds on the minimax regret allow us to quantify the "price of non-stationarity," which mathematically captures the added complexity embedded in a temporally changing environment versus a stationary one.
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the … In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{i,t}(c)$, the learner also learns the values of $r_{i,t}(c')$ for some other contexts $c'$ in set $\mathcal{O}_i(c)$; i.e., the rewards that would have been achieved by performing that action under different contexts $c'\in \mathcal{O}_i(c)$. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We design and analyze new algorithms for the contextual bandits problem with cross-learning and show that their regret has better dependence on the number of contexts. Under complete cross-learning where the rewards for all contexts are learned when choosing an action, i.e., set $\mathcal{O}_i(c)$ contains all contexts, we show that our algorithms achieve regret $\tilde{O}(\sqrt{KT})$, removing the dependence on $C$. For any other cases, i.e., under partial cross-learning where $|\mathcal{O}_i(c)|< C$ for some context-action pair of $(i,c)$, the regret bounds depend on how the sets $\mathcal O_i(c)$ impact the degree to which cross-learning between contexts is possible. We simulate our algorithms on real auction data from an ad exchange running first-price auctions and show that they outperform traditional contextual bandit algorithms.
Dynamic Pricing with Limited Demand Information Dynamic Pricing with Limited Demand Information
Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences … Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on the n-step regret of these algorithms. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match the lower bound up to a logarithmic factor. Finally, we evaluate their performance on a real-world web search click dataset.
We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering … We study revenue maximization through sequential posted-price (SPP) mechanisms in single-dimensional settings with $n$ buyers and independent but not necessarily identical value distributions. We construct the SPP mechanisms by considering the best of two simple pricing rules: one that imitates the revenue optimal mchanism, namely the Myersonian mechanism, via the taxation principle and the other that posts a uniform price. Our pricing rules are rather generalizable and yield the first improvement over long-established approximation factors in several settings. We design factor-revealing mathematical programs that crisply capture the approximation factor of our SPP mechanism. In the single-unit setting, our SPP mechanism yields a better approximation factor than the state of the art prior to our work (Azar, Chiplunkar & Kaplan, 2018). In the multi-unit setting, our SPP mechanism yields the first improved approximation factor over the state of the art after over nine years (Yan, 2011 and Chakraborty et al., 2010). Our results on SPP mechanisms immediately imply improved performance guarantees for the equivalent free-order prophet inequality problem. In the position auction setting, our SPP mechanism yields the first higher-than $1-1/e$ approximation factor. In eager second-price (ESP) auctions, our two simple pricing rules lead to the first improved approximation factor that is strictly greater than what is obtained by the SPP mechanism in the single-unit setting.
We consider a dynamic pricing problem for repeated contextual second-price auctions with multiple strategic buyers who aim to maximize their long-term time discounted utility. The seller has limited information on … We consider a dynamic pricing problem for repeated contextual second-price auctions with multiple strategic buyers who aim to maximize their long-term time discounted utility. The seller has limited information on buyers' overall demand curves which depends on a non-parametric market-noise distribution, and buyers may potentially submit corrupted bids (relative to true valuations) to manipulate the seller's pricing policy for more favorable reserve prices in the future. We focus on designing the seller's learning policy to set contextual reserve prices where the seller's goal is to minimize regret compared to the revenue of a benchmark clairvoyant policy that has full information of buyers' demand. We propose a policy with a phased-structure that incorporates randomized "isolation" periods, during which a buyer is randomly chosen to solely participate in the auction. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. We then prove that our policy enjoys a $T$-period regret of $\widetilde{\mathcal{O}}(\sqrt{T})$ facing strategic buyers. Finally, we conduct numerical simulations to compare our proposed algorithm to standard pricing policies. Our numerical results show that our algorithm outperforms these policies under various buyer bidding behavior.
This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory. The reader will learn several tools for the analysis of the extreme singular values of … This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory. The reader will learn several tools for the analysis of the extreme singular values of random matrices with independent rows or columns. Many of these methods sprung off from the development of geometric functional analysis since the 1970s. They have applications in several fields, most notably in theoretical computer science, statistics and signal processing. A few basic applications are covered in this text, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing. This tutorial is written particularly for graduate students and beginning researchers in different areas, including functional analysts, probabilists, theoretical statisticians, electrical engineers, and theoretical computer scientists.
We provide algorithms that learn simple auctions whose revenue is approximately optimal in multi-item multi-bidder settings, for a wide range of bidder valuations including unit-demand, additive, constrained additive, XOS, and … We provide algorithms that learn simple auctions whose revenue is approximately optimal in multi-item multi-bidder settings, for a wide range of bidder valuations including unit-demand, additive, constrained additive, XOS, and subadditive. We obtain our learning results in two settings. The first is the commonly studied setting where sample access to the bidders' distributions over valuations is given, for both regular distributions and arbitrary distributions with bounded support. Here, our algorithms require polynomially many samples in the number of items and bidders. The second is a more general max-min learning setting that we introduce, where we are given "approximate distributions," and we seek to compute a mechanism whose revenue is approximately optimal simultaneously for all "true distributions" that are close to the ones we were given. These results are more general in that they imply the sample-based results, and are also applicable in settings where we have no sample access to the underlying distributions but have estimated them indirectly via market research or by observation of bidder behavior in previously run, potentially non-truthful auctions. All our results hold for valuation distributions satisfying the standard (and necessary) independence-across-items property. They also generalize and improve upon recent works of Goldner and Karlin [25] and Morgenstern and Roughgarden [32], which have provided algorithms that learn approximately optimal multi-item mechanisms in more restricted settings with additive, subadditive and unit-demand valuations using sample access to distributions. We generalize these results to the complete unit-demand, additive, and XOS setting, to i.i.d. subadditive bidders, and to the max-min setting. Our results are enabled by new uniform convergence bounds for hypotheses classes under product measures. Our bounds result in exponential savings in sample complexity compared to bounds derived by bounding the VC dimension and are of independent interest.
Let $X_i \geq 0$ be independent, $i = 1, \cdots, n$, and $X^\ast_n = \max(X_1, \cdots, X_n)$. Let $t(c) (s(c))$ be the threshold stopping rule for $X_1, \cdots, X_n$, defined … Let $X_i \geq 0$ be independent, $i = 1, \cdots, n$, and $X^\ast_n = \max(X_1, \cdots, X_n)$. Let $t(c) (s(c))$ be the threshold stopping rule for $X_1, \cdots, X_n$, defined by $t(c) = \text{smallest} i$ for which $X_i \geq c(s(c) = \text{smallest} i$ for which $X_i > c), = n$ otherwise. Let $m$ be a median of the distribution of $X^\ast_n$. It is shown that for every $n$ and $\underline{X}$ either $EX^\ast_n \leq 2EX_{t(m)}$ or $EX^\ast_n \leq 2EX_{s(m)}$. This improves previously known results, [1], [4]. Some results for i.i.d. $X_i$ are also included.
We introduce a novel wireless device-to-device (D2D) collaboration architecture that exploits distributed storage of popular content to enable frequency reuse. We identify a fundamental conflict between collaboration distance and interference … We introduce a novel wireless device-to-device (D2D) collaboration architecture that exploits distributed storage of popular content to enable frequency reuse. We identify a fundamental conflict between collaboration distance and interference and show how to optimize the transmission power to maximize frequency reuse. Our analysis depends on the user content request statistics which are modeled by a Zipf distribution. Our main result is a closed form expression of the optimal collaboration distance as a function of the content reuse distribution parameters. We show that if the Zipf exponent of the content reuse distribution is greater than 1, it is possible to have a number of D2D interference-free collaboration pairs that scales linearly in the number of nodes. If the Zipf exponent is smaller than 1, we identify the best possible scaling in the number of D2D collaborating links. Surprisingly, a very simple distributed caching policy achieves the optimal scaling behavior and therefore there is no need to centrally coordinate what each node is caching.
Real-Time Resource Allocation: Beyond Stochastic Demand Modeling Real-Time Resource Allocation: Beyond Stochastic Demand Modeling
Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to … Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability.
How to optimize posted price mechanisms? The sequential posted-price (SPP) mechanism is one of the widely used selling mechanisms in practice. In this mechanism, the seller presents each buyer with … How to optimize posted price mechanisms? The sequential posted-price (SPP) mechanism is one of the widely used selling mechanisms in practice. In this mechanism, the seller presents each buyer with a price sequentially and the buyer can either accept or reject the mechanism's offer. Despite the widespread use of the SPP mechanism, the problem of optimizing prices in this mechanism has not been fully addressed. In a paper entitled, “Improved Revenue Bounds for Posted-Price and Second-Price Mechanisms,” H. Beyhaghi, N. Golrezaei, R. Paes Leme, M. Pal, and B. Sivan construct SPP mechanisms by considering the best of two simple pricing rules: one that imitates the optimal mechanism and the other that posts a uniform price (same price for every buyer). Their simple pricing rules can be easily generalized to the setting with multiple units and yield the first improvement over long-established approximation factors.
We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle. We present an algorithm called … We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle. We present an algorithm called Generalized Followthe- Perturbed-Leader and provide conditions under which it is oracle-efficient while achieving vanishing regret. Our results make significant progress on an open problem raised by Hazan and Koren [1], who showed that oracle-efficient algorithms do not exist in full generality and asked whether one can identify conditions under which oracle-efficient online learning may be possible. Our auction-design framework considers an auctioneer learning an optimal auction for a sequence of adversarially selected valuations with the goal of achieving revenue that is almost as good as the optimal auction in hindsight, among a class of auctions. We give oracle-efficient learning results for: (1) VCG auctions with bidder-specific reserves in singleparameter settings, (2) envy-free item-pricing auctions in multiitem settings, and (3) the level auctions of Morgenstern and Roughgarden [2] for single-item settings. The last result leads to an approximation of the overall optimal Myerson auction when bidders' valuations are drawn according to a fast-mixing Markov process, extending prior work that only gave such guarantees for the i.i.d. setting.We also derive various extensions, including: (1) oracleefficient algorithms for the contextual learning setting in which the learner has access to side information (such as bidder demographics), (2) learning with approximate oracles such as those based on Maximal-in-Range algorithms, and (3) no-regret bidding algorithms in simultaneous auctions, which resolve an open problem of Daskalakis and Syrgkanis [3].
Video on-demand streaming from Internet-based servers is becoming one of the most important services offered by wireless networks today. In order to improve the area spectral efficiency of video transmission … Video on-demand streaming from Internet-based servers is becoming one of the most important services offered by wireless networks today. In order to improve the area spectral efficiency of video transmission in cellular systems, small cells heterogeneous architectures (e.g., femtocells, WiFi off-loading) are being proposed, such that video traffic to nomadic users can be handled by short-range links to the nearest small cell access points (referred to as helpers). As the helper deployment density increases, the backhaul capacity becomes the system bottleneck. In order to alleviate such bottleneck we propose a system where helpers with low-rate backhaul but high storage capacity cache popular video files. Files not available from helpers are transmitted by the cellular base station. We analyze the optimum way of assigning files to the helpers, in order to minimize the expected downloading time for files. We distinguish between the uncoded case (where only complete files are stored) and the coded case, where segments of Fountain-encoded versions of the video files are stored at helpers. We show that the uncoded optimum file assignment is NP-hard, and develop a greedy strategy that is provably within a factor 2 of the optimum. Further, for a special case we provide an efficient algorithm achieving a provably better approximation ratio of $1-(1-1/d)^d$, where $d$ is the maximum number of helpers a user can be connected to. We also show that the coded optimum cache assignment problem is convex that can be further reduced to a linear program. We present numerical results comparing the proposed schemes.
We introduce data-driven decision-making algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in … We introduce data-driven decision-making algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown \emph{a priori} and possibly adversarial) non-stationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Our main contribution is a general algorithmic recipe for a wide variety of non-stationary bandit problems. Specifically, we design and analyze the sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound for each of the settings when we know the respective underlying \emph{variation budget}, which quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the "forgetting principle" in the learning processes, which is vital in changing environments. Our extensive numerical experiments on both synthetic and real world online auto-loan datasets show that our proposed algorithms achieve superior empirical performance compared to existing algorithms.
Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer's value distribution for a good when the buyer is repeatedly interacting with a … Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer's value distribution for a good when the buyer is repeatedly interacting with a seller through a posted-price mechanism. We model the buyer as a strategic agent, whose goal is to maximize her long-term surplus, and we are interested in mechanisms that maximize the seller's long-term revenue. We define the natural notion of strategic regret --- the lost revenue as measured against a truthful (non-strategic) buyer. We present seller algorithms that are no-(strategic)-regret when the buyer discounts her future surplus --- i.e. the buyer prefers showing advertisements to users sooner rather than later. We also give a lower bound on strategic regret that increases as the buyer's discounting weakens and shows, in particular, that any seller algorithm will suffer linear strategic regret if there is no discounting.
We consider a dynamic assortment selection problem where in every round the retailer offers a subset (assortment) of N substitutable products to a consumer, who selects one of these products … We consider a dynamic assortment selection problem where in every round the retailer offers a subset (assortment) of N substitutable products to a consumer, who selects one of these products according to a multinomial logit (MNL) choice model. The retailer observes this choice, and the objective is to dynamically learn the model parameters while optimizing cumulative revenues over a selling horizon of length T. We refer to this exploration–exploitation formulation as the MNL-Bandit problem. Existing methods for this problem follow an explore-then-exploit approach, which estimates parameters to a desired accuracy and then, treating these estimates as if they are the correct parameter values, offers the optimal assortment based on these estimates. These approaches require certain a priori knowledge of “separability,” determined by the true parameters of the underlying MNL model, and this in turn is critical in determining the length of the exploration period. (Separability refers to the distinguishability of the true optimal assortment from the other suboptimal alternatives.) In this paper, we give an efficient algorithm that simultaneously explores and exploits, without a priori knowledge of any problem parameters. Furthermore, the algorithm is adaptive in the sense that its performance is near optimal in the “well-separated” case as well as the general parameter setting where this separation need not hold.
Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual … Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\mathcal{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\mathcal{O}(Δ^{1/3}T^{2/3})$ where $Δ$ is some non-stationarity measure. These results almost match the optimal guarantees achieved by an inefficient baseline that is a variant of the classic Exp4 algorithm. The dynamic regret result is also the first one for efficient and fully adversarial contextual bandit. Furthermore, while the results above require tuning a parameter based on the unknown quantity $S$ or $Δ$, we also develop a parameter free algorithm achieving regret $\min\{S^{1/4}T^{3/4}, Δ^{1/5}T^{4/5}\}$. This improves and generalizes the best existing result $Δ^{0.18}T^{0.82}$ by Karnin and Anava (2016) which only holds for the two-armed bandit problem.
Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting. However, a major concern in this context is … Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting. However, a major concern in this context is when the system cannot decide whether the user feedback for each item is actually exploitable. Indeed, much of the content may have been simply ignored by the user. The present work proposes to exploit available information regarding the display position bias under the so-called Position-based click model (PBM). We first discuss how this model differs from the Cascade model and its variants considered in several recent works on multiple-play bandits. We then provide a novel regret lower bound for this model as well as computationally efficient algorithms that display good empirical and theoretical performance.
Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution as compared to an offline optimum. … Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution as compared to an offline optimum. On the other hand, machine learning algorithms are in the business of extrapolating patterns found in the data to predict the future, and usually come with strong guarantees on the expected generalization error. In this work we develop a framework for augmenting online algorithms with a machine learned oracle to achieve competitive ratios that provably improve upon unconditional worst case lower bounds when the oracle has low error. Our approach treats the oracle as a complete black box, and is not dependent on its inner workings, or the exact distribution of its errors. We apply this framework to the traditional caching problem -- creating an eviction strategy for a cache of size $k$. We demonstrate that naively following the oracle's recommendations may lead to very poor performance, even when the average error is quite low. Instead we show how to modify the Marker algorithm to take into account the oracle's predictions, and prove that this combined approach achieves a competitive ratio that both (i) decreases as the oracle's error decreases, and (ii) is always capped by $O(\log k)$, which can be achieved without any oracle input. We complement our results with an empirical evaluation of our algorithm on real world datasets, and show that it performs well empirically even using simple off-the-shelf predictions.
The <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">index coding</i> problem has recently attracted a significant attention from the research community due to its theoretical significance and applications in wireless ad hoc networks. An instance … The <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">index coding</i> problem has recently attracted a significant attention from the research community due to its theoretical significance and applications in wireless ad hoc networks. An instance of the index coding problem includes a sender that holds a set of information messages <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X={x</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">,...,x</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub> <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">}</i> and a set of receivers <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">R</i> . Each receiver <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">(x,H)</i> in <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">R</i> needs to obtain a message <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x X</i> and has prior <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">side information</i> consisting of a subset <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H</i> of <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> . The sender uses a noiseless communication channel to broadcast encoding of messages in <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> to all clients. The objective is to find an encoding scheme that minimizes the number of transmissions required to satisfy the demands of all the receivers. In this paper, we analyze the relation between the index coding problem, the more general network coding problem, and the problem of finding a linear representation of a matroid. In particular, we show that any instance of the network coding and matroid representation problems can be efficiently reduced to an instance of the index coding problem. Our reduction implies that many important properties of the network coding and matroid representation problems carry over to the index coding problem. Specifically, we show that <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">vector linear codes</i> outperform scalar linear index codes and that vector linear codes are insufficient for achieving the optimum number of transmissions.
We present a multi-channel P2P Video-on-Demand (VoD) system using "plug-and-play" helpers. Helpers are heterogenous "micro-servers" with limited storage, bandwidth and number of users they can serve simultaneously. Our proposed system … We present a multi-channel P2P Video-on-Demand (VoD) system using "plug-and-play" helpers. Helpers are heterogenous "micro-servers" with limited storage, bandwidth and number of users they can serve simultaneously. Our proposed system has the following salient features: (1) it minimizes the server load; (2) it is distributed, and requires little or no maintenance overhead and which can easily adapt to system dynamics; and (3) it is adaptable to varying supply and demand patterns across multiple video channels irrespective of video popularity. Our proposed solution jointly optimizes over helper-user topology, video storage allocation and bandwidth allocation. The combinatorial nature of the problem and the system demand for distributed algorithms makes the problem uniquely challenging. By utilizing Lagrangian decomposition and Markov chain approximation based arguments, we address this challenge by designing two distributed algorithms running in tandem: a primal-dual storage and bandwidth allocation algorithm and a "soft-worst-neighbor-choking" topology-building algorithm. Our scheme provably converges to a near-optimal solution, and is easy to implement in practice. Simulation results validate that the proposed scheme achieves minimum sever load under highly heterogeneous combinations of supply and demand patterns, and is robust to system dynamics of user/helper churn, user/helper asynchrony, and random delays in the network.
For revenue and welfare maximization in single-dimensional Bayesian settings, Chawla et al. (STOC10) recently showed that sequential posted-price mechanisms (SPMs), though simple in form, can perform surprisingly well compared to … For revenue and welfare maximization in single-dimensional Bayesian settings, Chawla et al. (STOC10) recently showed that sequential posted-price mechanisms (SPMs), though simple in form, can perform surprisingly well compared to the optimal mechanisms. In this paper, we give a theoretical explanation of this fact, based on a connection to the notion of correlation gap. Loosely speaking, for auction environments with matroid constraints, we can relate the performance of a mechanism to the expectation of a monotone submodular function over a random set. This random set corresponds to the winner set for the optimal mechanism, which is highly correlated, and corresponds to certain demand set for SPMs, which is independent. The notion of correlation gap of Agrawal et al.\ (SODA10) quantifies how much we {}"lose" in the expectation of the function by ignoring correlation in the random set, and hence bounds our loss in using certain SPM instead of the optimal mechanism. Furthermore, the correlation gap of a monotone and submodular function is known to be small, and it follows that certain SPM can approximate the optimal mechanism by a good constant factor. Exploiting this connection, we give tight analysis of a greedy-based SPM of Chawla et al.\ for several environments. In particular, we show that it gives an $e/(e-1)$-approximation for matroid environments, gives asymptotically a $1/(1-1/\sqrt{2\pi k})$-approximation for the important sub-case of $k$-unit auctions, and gives a $(p+1)$-approximation for environments with $p$-independent set system constraints.
In this paper we study the fundamental problems of maximizing a continuous non-monotone submodular function over the hypercube, both with and without coordinate-wise concavity. This family of optimization problems has … In this paper we study the fundamental problems of maximizing a continuous non-monotone submodular function over the hypercube, both with and without coordinate-wise concavity. This family of optimization problems has several applications in machine learning, economics, and communication systems. Our main result is the first $\frac{1}{2}$-approximation algorithm for continuous submodular function maximization; this approximation factor of $\frac{1}{2}$ is the best possible for algorithms that only query the objective function at polynomially many points. For the special case of DR-submodular maximization, i.e. when the submodular functions is also coordinate wise concave along all coordinates, we provide a different $\frac{1}{2}$-approximation algorithm that runs in quasilinear time. Both of these results improve upon prior work [Bian et al, 2017, Soma and Yoshida, 2017]. Our first algorithm uses novel ideas such as reducing the guaranteed approximation problem to analyzing a zero-sum game for each coordinate, and incorporates the geometry of this zero-sum game to fix the value at this coordinate. Our second algorithm exploits coordinate-wise concavity to identify a monotone equilibrium condition sufficient for getting the required approximation guarantee, and hunts for the equilibrium point using binary search. We further run experiments to verify the performance of our proposed algorithms in related machine learning applications.
A natural optimization model that formulates many online resource allocation problems is the online linear programming (LP) problem in which the constraint matrix is revealed column by column along with … A natural optimization model that formulates many online resource allocation problems is the online linear programming (LP) problem in which the constraint matrix is revealed column by column along with the corresponding objective coefficient. In such a model, a decision variable has to be set each time a column is revealed without observing the future inputs, and the goal is to maximize the overall objective function. In this paper, we propose a near-optimal algorithm for this general class of online problems under the assumptions of random order of arrival and some mild conditions on the size of the LP right-hand-side input. Specifically, our learning-based algorithm works by dynamically updating a threshold price vector at geometric time intervals, where the dual prices learned from the revealed columns in the previous period are used to determine the sequential decisions in the current period. Through dynamic learning, the competitiveness of our algorithm improves over the past study of the same problem. We also present a worst case example showing that the performance of our algorithm is near optimal.
Consider the high-dimensional linear regression model <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</i> = <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> β <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">*</sup> + <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">w</i> , where <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</i> ∈ \BBR <i xmlns:mml="http://www.w3.org/1998/Math/MathML" … Consider the high-dimensional linear regression model <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</i> = <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> β <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">*</sup> + <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">w</i> , where <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</i> ∈ \BBR <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> is an observation vector, <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> ∈ \BBR <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> × <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">d</i> is a design matrix with <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">d</i> >; <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> , β <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">*</sup> ∈ \BBR <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">d</i> is an unknown regression vector, and <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">w</i> ~ <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</i> (0, σ <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">I</i> ) is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating β <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">*</sup> in either <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> -loss and <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> -prediction loss, assuming that β <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">*</sup> belongs to an <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">lq</i> -ball \BBB <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">q</i> ( <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Rq</i> ) for some <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">q</i> ∈ [0,1]. It is shown that under suitable regularity conditions on the design matrix <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> , the minimax optimal rate in <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> -loss and <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> -prediction loss scales as Θ( <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Rq</i> ([(log <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">d</i> )/( <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> )]) <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1-q</sup> / <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> ). The analysis in this paper reveals that conditions on the design matrix <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> enter into the rates for <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> -error and <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sub> -prediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls \BBB <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">q</i> ( <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Rq</i> ), whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares over <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">lq</i> -balls. For the special case <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">q</i> =0, corresponding to models with an exact sparsity constraint, our results show that although computationally efficient <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> -based methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">X</i> than optimal algorithms involving least-squares over the <i xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">l</i> <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> -ball.
Internet search companies sell advertisement slots based on users' search queries via an auction. While there has been previous work onthe auction process and its game-theoretic aspects, most of it … Internet search companies sell advertisement slots based on users' search queries via an auction. While there has been previous work onthe auction process and its game-theoretic aspects, most of it focuses on the Internet company. In this work, we focus on the advertisers, who must solve a complex optimization problem to decide how to place bids on keywords to maximize their return (the number of user clicks on their ads) for a given budget. We model the entire process and study this budget optimization problem. While most variants are NP-hard, we show, perhaps surprisingly, that simply randomizing between two uniform strategies that bid equally on all the keywordsworks well. More precisely, this strategy gets at least a 1-1/e fraction of the maximum clicks possible. As our preliminary experiments show, such uniform strategies are likely to be practical. We also present inapproximability results, and optimal algorithms for variants of the budget optimization problem.
Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper … Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.
For online resource allocation problems, we propose a new demand arrival model where the sequence of arrivals contains both an adversarial component and a stochastic one. Our model requires no … For online resource allocation problems, we propose a new demand arrival model where the sequence of arrivals contains both an adversarial component and a stochastic one. Our model requires no demand forecasting; however, due to the presence of the stochastic component, we can partially predict future demand as the sequence of arrivals unfolds. Under the proposed model, we study the problem of the online allocation of a single resource to two types of customers, and design online algorithms that outperform existing ones. Our algorithms are adjustable to the relative size of the stochastic component, and our analysis reveals that as the portion of the stochastic component grows, the loss due to making online decisions decreases. This highlights the value of (even partial) predictability in online resource allocation. We impose no conditions on how the resource capacity scales with the maximum number of customers. However, we show that using an adaptive algorithm---which makes online decisions based on observed data---is particularly beneficial when capacity scales linearly with the number of customers. Our work serves as a first step in bridging the long-standing gap between the two well-studied approaches to the design and analysis of online algorithms based on (1) adversarial models and (2) stochastic ones. Using novel algorithm design, we demonstrate that even if the arrival sequence contains an adversarial component, we can take advantage of the limited information that the data reveals to improve allocation decisions. We also study the classical secretary problem under our proposed arrival model, and we show that randomizing over multiple stopping rules may increase the probability of success.
We study online learning in repeated first-price auctions where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize … We study online learning in repeated first-price auctions where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces censored feedback: if she wins the bid, then she is not able to observe the highest bid of the other bidders, which we assume is \textit{iid} drawn from an unknown distribution. In this paper, we develop the first learning algorithm that achieves a near-optimal $\widetilde{O}(\sqrt{T})$ regret bound, by exploiting two structural properties of first-price auctions, i.e. the specific feedback structure and payoff function. We first formulate the feedback structure in first-price auctions as partially ordered contextual bandits, a combination of the graph feedback across actions (bids), the cross learning across contexts (private values), and a partial order over the contexts. We establish both strengths and weaknesses of this framework, by showing a curious separation that a regret nearly independent of the action/context sizes is possible under stochastic contexts, but is impossible under adversarial contexts. In particular, this framework leads to an $O(\sqrt{T}\log^{2.5}T)$ regret for first-price auctions when the bidder's private values are \emph{iid}. Despite the limitation of the above framework, we further exploit the special payoff function of first-price auctions to develop a sample-efficient algorithm even in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for first-price auctions.
We consider a dynamic pricing problem for repeated contextual second-price auctions with strategic buyers whose goals are to maximize their long-term time discounted utility. The seller has very limited information … We consider a dynamic pricing problem for repeated contextual second-price auctions with strategic buyers whose goals are to maximize their long-term time discounted utility. The seller has very limited information about buyers' overall demand curves, which depends on $d$-dimensional context vectors characterizing auctioned items, and a non-parametric market noise distribution that captures buyers' idiosyncratic tastes. The noise distribution and the relationship between the context vectors and buyers' demand curves are both unknown to the seller. We focus on designing the seller's learning policy to set contextual reserve prices where the seller's goal is to minimize his regret for revenue. We first propose a pricing policy when buyers are truthful and show that it achieves a $T$-period regret bound of $\tilde{\mathcal{O}}(\sqrt{dT})$ against a clairvoyant policy that has full information of the buyers' demand. Next, under the setting where buyers bid strategically to maximize their long-term discounted utility, we develop a variant of our first policy that is robust to strategic (corrupted) bids. This policy incorporates randomized isolation periods, during which a buyer is randomly chosen to solely participate in the auction. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. Because of this nice property, our robust policy enjoys a $T$-period regret of $\tilde{\mathcal{O}}(\sqrt{dT})$, matching that under the truthful setting up to a constant factor that depends on the utility discount factor.
The classical analysis of online algorithms, due to its worst-case nature, can be quite pessimistic when the input instance at hand is far from worst-case. Often this is not an … The classical analysis of online algorithms, due to its worst-case nature, can be quite pessimistic when the input instance at hand is far from worst-case. Often this is not an issue with machine learning approaches, which shine in exploiting patterns in past inputs in order to predict the future. However, such predictions, although usually accurate, can be arbitrarily poor. Inspired by a recent line of work, we augment three well-known online settings with machine learned predictions about the future, and develop algorithms that take them into account. In particular, we study the following online selection problems: (i) the classical secretary problem, (ii) online bipartite matching and (iii) the graphic matroid secretary problem. Our algorithms still come with a worst-case performance guarantee in the case that predictions are subpar while obtaining an improved competitive ratio (over the best-known classical online algorithm for each problem) when the predictions are sufficiently accurate. For each algorithm, we establish a trade-off between the competitive ratios obtained in the two respective cases.