
We study the problem of allocating a set of indivisible goods among a set of agents in a fair and efficient manner. An allocation is said to be fair if it is envy-free up to one good (EF1), which means that each agent prefers its own bundle over the bundle of any other agent up to the removal of one good. In addition, an allocation is deemed efficient if it satisfies Pareto efficiency. While each of these well-studied properties is easy to achieve separately, achieving them together is far from obvious. Recently, Caragiannis et al. (2016) established the surprising result that when agents have additive valuations for the goods, there always exists an allocation that simultaneously satisfies these two seemingly incompatible properties. Specifically, they showed that an allocation that maximizes the Nash social welfare objective is both EF1 and Pareto efficient. However, the problem of maximizing Nash social welfare is NP-hard. As a result, this approach does not provide an efficient algorithm for finding a fair and efficient allocation. In this paper, we bypass this barrier and develop a pseudopolynomial-time algorithm for finding allocations that are EF1 and Pareto efficient; in particular, when the valuations are bounded, our algorithm finds such an allocation in polynomial time. Furthermore, we establish a stronger existence result than that of Caragiannis et al. (2016): for additive valuations, there always exists an allocation that is EF1 and fractionally Pareto efficient. Another key contribution of our work is to show that our algorithm provides a polynomial-time 1.45-approximation to the Nash social welfare objective. This improves upon the best known approximation ratio for this problem (namely, the 2-approximation algorithm of Cole et al., 2017) and also matches the lower bound on the integrality gap of the convex program of Cole et al. (2017). Unlike many of the existing approaches, our algorithm is completely combinatorial, and relies on constructing integral Fisher markets wherein specific equilibria are not only efficient, but also fair.
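To make the EF1 condition concrete, here is a minimal Python sketch (illustrative only, not from the paper) that checks whether a fixed allocation is EF1 under additive valuations; the valuation matrix and allocation are made-up examples.

```python
def is_ef1(valuations, allocation):
    """Check envy-freeness up to one good (EF1) under additive valuations.

    valuations[i][g] is agent i's value for good g;
    allocation[i] is the list of goods assigned to agent i.
    """
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j:
                continue
            other = sum(valuations[i][g] for g in allocation[j])
            # Agent i must not envy j once i's most-valued good is
            # removed from j's bundle (an empty bundle causes no envy).
            best = max((valuations[i][g] for g in allocation[j]), default=0)
            if own < other - best:
                return False
    return True

# Agent 0 envies the bundle {0, 1}, but not after good 0 is removed,
# so this allocation is EF1 without being fully envy-free.
vals = [[5, 1, 3], [2, 4, 4]]
print(is_ef1(vals, [[2], [0, 1]]))  # True
```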
We consider the problem of allocating indivisible goods fairly among n agents who have additive and submodular valuations for the goods. Our fairness guarantees are in terms of the maximin share, which is defined to be the maximum value that an agent can ensure for herself if she were to partition the goods into n bundles and then receive a minimum-valued bundle. Since maximin fair allocations (i.e., allocations in which each agent gets at least her maximin share) do not always exist, prior work has focused on approximation results that aim to find allocations in which the value of the bundle allocated to each agent is (multiplicatively) as close to her maximin share as possible. In particular, Procaccia and Wang (2014), along with Amanatidis et al. (2015), have shown that under additive valuations a 2/3-approximate maximin fair allocation always exists and can be found in polynomial time. We complement these results by developing a simple and efficient algorithm that achieves the same approximation guarantee. Furthermore, we initiate the study of approximate maximin fair division under submodular valuations. Specifically, we show that when the valuations of the agents are nonnegative, monotone, and submodular, a 0.21-approximate maximin fair allocation is guaranteed to exist. In fact, we show that such an allocation can be efficiently found by using a simple round-robin algorithm. A technical contribution of the article is to analyze the performance of this combinatorial algorithm by employing the concept of multilinear extensions.
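For a single agent with additive values, the maximin share can be computed directly from this definition when the instance is tiny. The brute-force sketch below (illustrative, with made-up numbers) enumerates all ways of assigning goods to bundles.

```python
from itertools import product

def maximin_share(values, n):
    """Brute-force maximin share for one agent with additive values.

    values[g] is the agent's value for good g; n is the number of
    bundles. Enumerates every assignment of goods to bundles, so it
    is only practical for small instances.
    """
    m = len(values)
    best = 0
    for assignment in product(range(n), repeat=m):
        bundles = [0] * n
        for g, b in enumerate(assignment):
            bundles[b] += values[g]
        # The agent is guaranteed only the minimum-valued bundle.
        best = max(best, min(bundles))
    return best

# Four goods split into two bundles: the best worst-bundle value is 5,
# achieved by the partition {4, 1} and {3, 2}.
print(maximin_share([4, 3, 2, 1], 2))  # 5
```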
We study the problem of allocating indivisible goods among n agents in a fair manner. For this problem, maximin share (MMS) is a well-studied solution concept which provides a fairness threshold. Specifically, the maximin share is defined as the minimum utility that an agent can guarantee for herself when asked to partition the set of goods into n bundles such that the remaining (n-1) agents pick their bundles adversarially. An allocation is deemed to be fair if every agent gets a bundle whose valuation is at least her maximin share. Even though maximin shares provide a natural benchmark for fairness, they have their own drawbacks and, in particular, are not sufficient to rule out unsatisfactory allocations. Motivated by these considerations, in this work we define a stronger notion of fairness, called the groupwise maximin share guarantee (GMMS). In GMMS, we require that the maximin share guarantee is achieved not just with respect to the grand bundle, but also among all subgroups of agents. Hence, this solution concept strengthens MMS and provides an ex-post fairness guarantee. We show that in specific settings GMMS allocations always exist. We also establish the existence of approximate GMMS allocations under additive valuations, and develop a polynomial-time algorithm to find such allocations. Moreover, we establish a scale of fairness wherein we show that GMMS implies approximate envy-freeness. Finally, we empirically demonstrate the existence of GMMS allocations in a large set of randomly generated instances. For the same set of instances, we additionally show that our algorithm achieves an approximation factor better than the established worst-case bound.
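Reading the definition literally, the groupwise condition can be tested by pooling the goods held by every subgroup and comparing each member's bundle value to her maximin share within that pool. The sketch below is a direct, exponential-time reading of that definition (not the paper's polynomial-time algorithm), reusing the brute-force `maximin_share` routine from the previous block.

```python
from itertools import combinations

def is_gmms(valuations, allocation, mms=maximin_share):
    """Check the groupwise maximin share guarantee by brute force."""
    agents = range(len(allocation))
    for size in range(1, len(allocation) + 1):
        for group in combinations(agents, size):
            # Pool the goods held by this subgroup and re-derive each
            # member's maximin share over the pooled goods alone.
            pooled = [g for i in group for g in allocation[i]]
            for i in group:
                share = mms([valuations[i][g] for g in pooled], size)
                value = sum(valuations[i][g] for g in allocation[i])
                if value < share:
                    return False
    return True
```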
We study Fisher markets that admit equilibria wherein each good is integrally assigned to some agent. While strong existence and computational guarantees are known for equilibria of Fisher markets with additive valuations (Eisenberg and Gale 1959; Orlin 2010), such equilibria, in general, assign goods fractionally to agents. Hence, Fisher markets are not directly applicable in the context of indivisible goods. In this work we show that one can always bypass this hurdle and, up to a bounded change in agents' budgets, obtain markets that admit an integral equilibrium. We refer to such markets as pure markets and show that, for any given Fisher market (with additive valuations), one can efficiently compute a "near-by" pure market with an accompanying integral equilibrium. Our work on pure markets leads to novel algorithmic results for fair division of indivisible goods. Prior work in discrete fair division has shown that, under additive valuations, there always exist allocations that simultaneously achieve the seemingly incompatible properties of fairness and efficiency (Caragiannis et al. 2016); here fairness refers to envy-freeness up to one good (EF1) and efficiency corresponds to Pareto efficiency. However, polynomial-time algorithms are not known for finding such allocations. Considering relaxations of proportionality and EF1, respectively, as our notions of fairness, we show that fair and Pareto efficient allocations can be computed in strongly polynomial time.
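For concreteness, here is a small Python sketch (of the standard textbook conditions, not the paper's construction) that certifies an integral equilibrium in a Fisher market with additive valuations: the market clears, every agent exhausts her budget, and every agent buys only goods that maximize her value-per-price ("bang per buck") ratio.

```python
def is_integral_equilibrium(valuations, budgets, prices, allocation, tol=1e-9):
    """Verify an integral Fisher-market equilibrium with linear utilities."""
    m = len(prices)
    owner = {}
    for i, bundle in enumerate(allocation):
        for g in bundle:
            if g in owner:
                return False  # a good is assigned twice
            owner[g] = i
    if len(owner) != m:
        return False  # some good is unsold: market does not clear
    for i, bundle in enumerate(allocation):
        spent = sum(prices[g] for g in bundle)
        if abs(spent - budgets[i]) > tol:
            return False  # budget not exactly exhausted
        mbb = max(valuations[i][g] / prices[g] for g in range(m))
        if any(valuations[i][g] / prices[g] < mbb - tol for g in bundle):
            return False  # bought a good below maximum bang per buck
    return True

# Two agents, two goods (made-up numbers): each agent spends her whole
# budget on her own bang-per-buck-maximizing good, and the market clears.
vals = [[3.0, 1.0], [1.0, 2.0]]
print(is_integral_equilibrium(vals, [1.0, 1.0], [1.0, 1.0], [[0], [1]]))  # True
```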
We study the problem of fairly allocating a set of indivisible goods among agents with additive valuations. The extent of fairness of an allocation is measured by its Nash social welfare, which is the geometric mean of the valuations of the agents for their bundles. While the problem of maximizing Nash social welfare is known to be APX-hard in general, we study the effectiveness of simple, greedy algorithms in solving this problem in two interesting special cases. First, we show that a simple, greedy algorithm provides a 1.061-approximation guarantee when agents have identical valuations, even though the problem of maximizing Nash social welfare remains NP-hard for this setting. Second, we show that when agents have binary valuations over the goods, an exact solution (i.e., a Nash optimal allocation) can be found in polynomial time via a greedy algorithm. Our results in the binary setting extend to provide novel, exact algorithms for optimizing Nash social welfare under concave valuations. Notably, for the above-mentioned scenarios, our techniques provide a simple alternative to several of the existing, more sophisticated techniques for this problem, such as constructing equilibria of Fisher markets or using real stable polynomials.
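As a reference point, the sketch below computes the Nash social welfare of an allocation and runs one natural greedy pass that hands each good to the agent with the largest multiplicative gain. This is an illustrative heuristic under made-up valuations, not the specific greedy algorithms analyzed in the paper.

```python
from math import prod

def nash_social_welfare(valuations, allocation):
    """Geometric mean of the agents' bundle values (additive valuations)."""
    utils = [sum(valuations[i][g] for g in bundle)
             for i, bundle in enumerate(allocation)]
    return prod(utils) ** (1.0 / len(utils))

def greedy_nsw(valuations, num_goods):
    """Hand each good to the agent whose multiplicative gain is largest.

    Illustrative only: an agent with an empty bundle and positive value
    gets priority, since any positive value multiplies a zero utility.
    """
    n = len(valuations)
    allocation = [[] for _ in range(n)]
    utils = [0.0] * n
    for g in range(num_goods):
        def gain(i):
            v = valuations[i][g]
            if utils[i] == 0:
                return float('inf') if v > 0 else 1.0
            return (utils[i] + v) / utils[i]
        winner = max(range(n), key=gain)
        allocation[winner].append(g)
        utils[winner] += valuations[winner][g]
    return allocation

vals = [[1, 1, 0], [0, 1, 1]]  # binary valuations, two agents
alloc = greedy_nsw(vals, 3)
print(alloc, nash_social_welfare(vals, alloc))  # [[0], [1, 2]] 1.414...
```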
We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user. In a number of applications (like healthcare), collecting features from users can be costly. To address this issue, we propose algorithms that avoid needless feature collection while maintaining strong regret guarantees.
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted treatment assignment policy, where the goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation. The design balances two competing objectives: optimizing the outcomes for the subjects in the experiment ("cumulative regret minimization") and gathering data that will be most useful for policy learning, that is, for learning an assignment rule that will maximize welfare if used after the experiment ("simple regret minimization"). We evaluate alternative experimental designs by collecting pilot data and then conducting a simulation study. Next, we implement our selected algorithm. Finally, we perform a second simulation study anchored to the collected data that evaluates the benefits of the algorithm we chose. Our first result is that the value of a learned policy in this setting is higher when data is collected via uniform randomization rather than adaptively using standard cumulative regret minimization or policy learning algorithms. We propose a simple heuristic for adaptive experimentation that improves upon uniform randomization from the perspective of policy learning, at the expense of increasing cumulative regret relative to alternative bandit algorithms. The heuristic modifies an existing contextual bandit algorithm by (i) imposing a lower bound on assignment probabilities that decays slowly, so that no arm is discarded too quickly, and (ii) after adaptively collecting data, restricting policy learning to select from arms for which sufficient data has been gathered; a sketch of modification (i) follows.
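Modification (i) is easy to picture in code. The sketch below floors a base algorithm's assignment probabilities at a slowly decaying rate; the constants `c` and `alpha` are illustrative assumptions, not values from the paper.

```python
def floor_probabilities(probs, t, c=0.05, alpha=0.25):
    """Impose a slowly decaying lower bound on assignment probabilities.

    Every arm keeps probability at least c / t**alpha, and the
    remaining mass is spread in proportion to the input probabilities,
    so the output still sums to one.
    """
    floor = c / t ** alpha
    k = len(probs)
    assert floor * k < 1.0, "floor too large for this many arms"
    return [floor + (1.0 - k * floor) * p for p in probs]

# A base algorithm that has nearly discarded arm 0 keeps exploring it.
print(floor_probabilities([0.001, 0.999], t=100))
```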
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do not rely on a master algorithm and do not require more robust oracles such as online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.
Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, and building on recent work on oracle-based bandit algorithms, we provide the first reduction of contextual bandits to general-purpose heterogeneous treatment effect estimation, and we design a simple and computationally efficient algorithm based on this reduction. Our theoretical and experimental results demonstrate that heterogeneous treatment effect estimation in contextual bandits offers practical advantages over reward estimation, including more efficient model estimation and greater robustness to model misspecification.
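One standard way to estimate only the reward differences is transformed-outcome regression: an inverse-propensity-weighted pseudo-outcome whose conditional mean is exactly the treatment effect. The sketch below illustrates that idea for two arms; it is a common estimator given here for intuition, since the paper's reduction accepts any heterogeneous treatment effect estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_treatment_effect(X, actions, rewards, propensities):
    """Regress an IPW-transformed outcome on contexts.

    actions are in {0, 1} and propensities[i] is the probability with
    which the logged action actions[i] was taken. The pseudo-outcome
    has conditional mean E[r | x, a=1] - E[r | x, a=0], so any
    regressor fit to it targets the treatment effect directly.
    """
    signs = np.where(actions == 1, 1.0, -1.0)
    pseudo = signs * rewards / propensities
    return LinearRegression().fit(X, pseudo)
```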
Tractable contextual bandit algorithms often rely on the realizability assumption, i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this paper, we propose the first reduction of model selection in contextual bandits to offline model selection oracles, allowing for flexible general-purpose algorithms with computational requirements no worse than those for model selection for regression. Our main result is a new model selection guarantee for stochastic contextual bandits. When one of the classes in our set is realizable, up to a logarithmic dependency on the number of classes, our algorithm attains optimal realizability-based regret bounds for that class under one of two conditions: if the time horizon is large enough, or if an assumption that helps with detecting misspecification holds. Hence our algorithm adapts to the complexity of this unknown class. Even when this realizable class is known, we prove improved regret guarantees in early rounds by relying on simpler model classes for those rounds, and hence further establish the importance of model selection in contextual bandits.
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment, that is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter determines the weight placed on cumulative regret minimization (where we establish near-optimal minimax guarantees) versus simple regret minimization (where we establish state-of-the-art guarantees). Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings. This flexibility comes from constructing and relying on "conformal arm sets" (CASs). CASs provide a set of arms for every context, encompassing the context-specific optimal arm with a certain probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted with a negative result, which shows that no algorithm can achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
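A hedged sketch of the idea behind a conformal arm set: keep every arm whose predicted reward is within some margin of the best prediction for the context. The paper calibrates that margin from data so the set covers the context-specific optimal arm with the desired probability; here `width` is simply a user-supplied assumption, and the reward model is a toy.

```python
def conformal_arm_set(reward_model, context, arms, width):
    """Arms whose predicted reward is within `width` of the best."""
    preds = {a: reward_model(context, a) for a in arms}
    best = max(preds.values())
    return [a for a in arms if preds[a] >= best - width]

# Toy reward model, linear in a scalar context.
model = lambda x, a: a * x
print(conformal_arm_set(model, 0.5, arms=[0.0, 1.0, 2.0], width=0.6))
# [1.0, 2.0]: the arms within 0.6 of the best predicted reward
```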
Model selection in supervised learning provides costless guarantees, as if the model that best balances bias and variance were known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic contextual bandit setting. Recent work [Marinov and Zimmert, 2021] identifies instances where no algorithm can guarantee costless regret bounds. Nevertheless, we identify benign conditions where costless model selection is feasible: gradually increasing class complexity, and diminishing marginal returns for best-in-class policy value with increasing class complexity. Our algorithm is based on a novel misspecification test, and our analysis demonstrates the benefits of using model selection for reward estimation. Unlike prior work on model selection in contextual bandits, our algorithm carefully adapts to the evolving bias-variance trade-off as more data is collected. In particular, our algorithm and analysis go beyond adapting to the complexity of the simplest realizable class and instead adapt to the complexity of the simplest class whose estimation variance dominates the bias. For short horizons, this provides improved regret guarantees that depend on the complexity of simpler classes.
We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms. To learn this, it is sufficient to evaluate the treatment effect of deviating from the behavioral policy at step h after having optimized the policy for all future steps. Since the policy at any step can affect next-state distributions, the related distributional shift challenges can make this problem far more statistically hard than estimating such treatment effects in the stochastic contextual bandit setting. However, the hardness of many real-world RL instances lies between the two regimes. We develop a flexible and general method called selective uncertainty propagation for confidence interval construction that adapts to the hardness of the associated distribution shift challenges. We show the benefits of our approach on toy environments and demonstrate the value of these techniques for offline policy learning.
We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high-probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class of estimate-estimand-error tuples, nontrivial high-probability upper bounds on the maximum error often require class complexity as input -- limiting the practicality of such methods and often resulting in loose bounds. Rather than deriving theoretical class complexity-based bounds, we propose a completely data-driven approach to estimate an upper bound on the maximum error. The simple and general nature of our solution to this fundamental challenge lends itself to several applications including: multiple CI construction, multiple hypothesis testing, estimating excess risk bounds (a fundamental measure of uncertainty in machine learning) for any training/fine-tuning algorithm, and enabling the development of a contextual bandit pipeline that can leverage any reward model estimation procedure as input (without additional mathematical analysis).
We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and … We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class of estimate-estimand-error tuples, nontrivial high probability upper bounds on the maximum error often require class complexity as input -- limiting the practicality of such methods and often resulting in loose bounds. Rather than deriving theoretical class complexity-based bounds, we propose a completely data-driven approach to estimate an upper bound on the maximum error. The simple and general nature of our solution to this fundamental challenge lends itself to several applications including: multiple CI construction, multiple hypothesis testing, estimating excess risk bounds (a fundamental measure of uncertainty in machine learning) for any training/fine-tuning algorithm, and enabling the development of a contextual bandit pipeline that can leverage any reward model estimation procedure as input (without additional mathematical analysis).
We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms. To learn … We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms. To learn this, it is sufficient to evaluate the treatment effect of deviating from the behavioral policy at step h after having optimized the policy for all future steps. Since the policy at any step can affect next-state distributions, the related distributional shift challenges can make this problem far more statistically hard than estimating such treatment effects in the stochastic contextual bandit setting. However, the hardness of many real-world RL instances lies between the two regimes. We develop a flexible and general method called selective uncertainty propagation for confidence interval construction that adapts to the hardness of the associated distribution shift challenges. We show benefits of our approach on toy environments and demonstrate the benefits of these techniques for offline policy learning.
In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That … In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter determines the weight placed on cumulative regret minimization (where we establish near-optimal minimax guarantees) versus simple regret minimization (where we establish state-of-the-art guarantees). Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings. This flexibility comes from constructing and relying on "conformal arm sets" (CASs). CASs provide a set of arms for every context, encompassing the context-specific optimal arm with a certain probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted with a negative result, which shows that no algorithm can achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to … Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, building on recent work on oracle-based bandit algorithms, we provide the first reduction of contextual bandits to general-purpose heterogeneous treatment effect estimation, and we design a simple and computationally efficient algorithm based on this reduction. Our theoretical and experimental results demonstrate that heterogeneous treatment effect estimation in contextual bandits offers practical advantages over reward estimation, including more efficient model estimation and greater flexibility to model misspecification.
We design and implement an adaptive experiment (a ``contextual bandit'') to learn a targeted treatment assignment policy, where the goal is to use a participant's survey responses to determine which … We design and implement an adaptive experiment (a ``contextual bandit'') to learn a targeted treatment assignment policy, where the goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation. The design balances two competing objectives: optimizing the outcomes for the subjects in the experiment (``cumulative regret minimization'') and gathering data that will be most useful for policy learning, that is, for learning an assignment rule that will maximize welfare if used after the experiment (``simple regret minimization''). We evaluate alternative experimental designs by collecting pilot data and then conducting a simulation study. Next, we implement our selected algorithm. Finally, we perform a second simulation study anchored to the collected data that evaluates the benefits of the algorithm we chose. Our first result is that the value of a learned policy in this setting is higher when data is collected via a uniform randomization rather than collected adaptively using standard cumulative regret minimization or policy learning algorithms. We propose a simple heuristic for adaptive experimentation that improves upon uniform randomization from the perspective of policy learning at the expense of increasing cumulative regret relative to alternative bandit algorithms. The heuristic modifies an existing contextual bandit algorithm by (i) imposing a lower bound on assignment probabilities that decay slowly so that no arm is discarded too quickly, and (ii) after adaptively collecting data, restricting policy learning to select from arms where sufficient data has been gathered.
We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this … We study the problem of model selection for contextual bandits, in which the algorithm must balance the bias-variance trade-off for model estimation while also balancing the exploration-exploitation trade-off. In this paper, we propose the first reduction of model selection in contextual bandits to offline model selection oracles, allowing for flexible general purpose algorithms with computational requirements no worse than those for model selection for regression. Our main result is a new model selection guarantee for stochastic contextual bandits. When one of the classes in our set is realizable, up to a logarithmic dependency on the number of classes, our algorithm attains optimal realizability-based regret bounds for that class under one of two conditions: if the time-horizon is large enough, or if an assumption that helps with detecting misspecification holds. Hence our algorithm adapts to the complexity of this unknown class. Even when this realizable class is known, we prove improved regret guarantees in early rounds by relying on simpler model classes for those rounds and hence further establish the importance of model selection in contextual bandits.
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the … Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do no rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the … Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do no rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.
Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for … Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic contextual bandit setting. Recent work [Marinov and Zimmert, 2021] identifies instances where no algorithm can guarantee costless regret bounds. Nevertheless, we identify benign conditions where costless model selection is feasible: gradually increasing class complexity, and diminishing marginal returns for best-in-class policy value with increasing class complexity. Our algorithm is based on a novel misspecification test, and our analysis demonstrates the benefits of using model selection for reward estimation. Unlike prior work on model selection in contextual bandits, our algorithm carefully adapts to the evolving bias-variance trade-off as more data is collected. In particular, our algorithm and analysis go beyond adapting to the complexity of the simplest realizable class and instead adapt to the complexity of the simplest class whose estimation variance dominates the bias. For short horizons, this provides improved regret guarantees that depend on the complexity of simpler classes.
We consider the problem of allocating indivisible goods fairly among n agents who have additive and submodular valuations for the goods. Our fairness guarantees are in terms of the maximin … We consider the problem of allocating indivisible goods fairly among n agents who have additive and submodular valuations for the goods. Our fairness guarantees are in terms of the maximin share , which is defined to be the maximum value that an agent can ensure for herself, if she were to partition the goods into n bundles, and then receive a minimum valued bundle. Since maximin fair allocations (i.e., allocations in which each agent gets at least her maximin share) do not always exist, prior work has focused on approximation results that aim to find allocations in which the value of the bundle allocated to each agent is (multiplicatively) as close to her maximin share as possible. In particular, Procaccia and Wang (2014) along with Amanatidis et al. (2015) have shown that under additive valuations, a 2/3-approximate maximin fair allocation always exists and can be found in polynomial time. We complement these results by developing a simple and efficient algorithm that achieves the same approximation guarantee. Furthermore, we initiate the study of approximate maximin fair division under submodular valuations . Specifically, we show that when the valuations of the agents are nonnegative , monotone , and submodular, then a 0.21-approximate maximin fair allocation is guaranteed to exist. In fact, we show that such an allocation can be efficiently found by using a simple round-robin algorithm. A technical contribution of the article is to analyze the performance of this combinatorial algorithm by employing the concept of multilinear extensions .
We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) … We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user. In a number of applications (like healthcare), collecting features from users can be costly. To address this issue, we propose algorithms that avoid needless feature collection while maintaining strong regret guarantees.
Tractable contextual bandit algorithms often rely on the realizability assumption - i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, … Tractable contextual bandit algorithms often rely on the realizability assumption - i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) … We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user. In a number of applications (like healthcare), collecting features from users can be costly. To address this issue, we propose algorithms that avoid needless feature collection while maintaining strong regret guarantees.
We study Fisher markets that admit equilibria wherein each good is integrally assigned to some agent. While strong existence and computational guarantees are known for equilibria of Fisher markets with … We study Fisher markets that admit equilibria wherein each good is integrally assigned to some agent. While strong existence and computational guarantees are known for equilibria of Fisher markets with additive valuations (Eisenberg and Gale 1959; Orlin 2010), such equilibria, in general, assign goods fractionally to agents. Hence, Fisher markets are not directly applicable in the context of indivisible goods. In this work we show that one can always bypass this hurdle and, up to a bounded change in agents’ budgets, obtain markets that admit an integral equilibrium. We refer to such markets as pure markets and show that, for any given Fisher market (with additive valuations), one can efficiently compute a “near-by,” pure market with an accompanying integral equilibrium.Our work on pure markets leads to novel algorithmic results for fair division of indivisible goods. Prior work in discrete fair division has shown that, under additive valuations, there always exist allocations that simultaneously achieve the seemingly incompatible properties of fairness and efficiency (Caragiannis et al. 2016); here fairness refers to envyfreeness up to one good (EF1) and efficiency corresponds to Pareto efficiency. However, polynomial-time algorithms are not known for finding such allocations. Considering relaxations of proportionality and EF1, respectively, as our notions of fairness, we show that fair and Pareto efficient allocations can be computed in strongly polynomial time.
We study Fisher markets that admit equilibria wherein each good is integrally assigned to some agent. While strong existence and computational guarantees are known for equilibria of Fisher markets with … We study Fisher markets that admit equilibria wherein each good is integrally assigned to some agent. While strong existence and computational guarantees are known for equilibria of Fisher markets with additive valuations, such equilibria, in general, assign goods fractionally to agents. Hence, Fisher markets are not directly applicable in the context of indivisible goods. In this work we show that one can always bypass this hurdle and, up to a bounded change in agents' budgets, obtain markets that admit an integral equilibrium. We refer to such markets as pure markets and show that, for any given Fisher market (with additive valuations), one can efficiently compute a near-by, pure market with an accompanying integral equilibrium. Our work on pure markets leads to novel algorithmic results for fair division of indivisible goods. Prior work in discrete fair division has shown that, under additive valuations, there always exist allocations that simultaneously achieve the seemingly incompatible properties of fairness and efficiency; here fairness refers to envy-freeness up to one good (EF1) and efficiency corresponds to Pareto efficiency. However, polynomial-time algorithms are not known for finding such allocations. Considering relaxations of proportionality and EF1, respectively, as our notions of fairness, we show that fair and Pareto efficient allocations can be computed in strongly polynomial time.
We study the problem of fairly allocating a set of indivisible goods among agents with additive valuations. The extent of fairness of an allocation is measured by its Nash social welfare, which is the geometric mean of the valuations of the agents for their bundles. While the problem of maximizing Nash social welfare is known to be APX-hard in general, we study the effectiveness of simple, greedy algorithms in solving this problem in two interesting special cases. First, we show that a simple, greedy algorithm provides a 1.061-approximation guarantee when agents have identical valuations, even though the problem of maximizing Nash social welfare remains NP-hard for this setting. Second, we show that when agents have binary valuations over the goods, an exact solution (i.e., a Nash optimal allocation) can be found in polynomial time via a greedy algorithm. Our results in the binary setting extend to provide novel, exact algorithms for optimizing Nash social welfare under concave valuations. Notably, for the above-mentioned scenarios, our techniques provide a simple alternative to several of the existing, more sophisticated techniques for this problem, such as constructing equilibria of Fisher markets or using real stable polynomials.
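To make the flavor of such greedy procedures concrete, here is a minimal sketch for the identical-valuations case, assuming the natural rule of handing the next most valuable good to a currently poorest agent; this is an illustration in the spirit of the abstract, not the paper's exact algorithm.

```python
# Sketch of a natural greedy heuristic for Nash social welfare with
# identical additive valuations: sort goods by value and repeatedly give
# the next good to a currently poorest agent. Illustrative only.
import heapq

def greedy_identical(values, n):
    """values: common value of each good; n: number of agents."""
    heap = [(0.0, i) for i in range(n)]   # (bundle value, agent)
    heapq.heapify(heap)
    bundles = [[] for _ in range(n)]
    for g in sorted(range(len(values)), key=lambda g: -values[g]):
        v, i = heapq.heappop(heap)        # poorest agent so far
        bundles[i].append(g)
        heapq.heappush(heap, (v + values[g], i))
    return bundles

print(greedy_identical([5, 4, 3, 2, 1], 2))  # [[0, 3, 4], [1, 2]]
```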
We study the problem of allocating indivisible goods among n agents in a fair manner. For this problem, maximin share (MMS) is a well-studied solution concept which provides a fairness threshold. Specifically, maximin share is defined as the minimum utility that an agent can guarantee for herself when asked to partition the set of goods into n bundles such that the remaining (n-1) agents pick their bundles adversarially. An allocation is deemed to be fair if every agent gets a bundle whose valuation is at least her maximin share. Even though maximin shares provide a natural benchmark for fairness, the notion has its own drawbacks and, in particular, it is not sufficient to rule out unsatisfactory allocations. Motivated by these considerations, in this work we define a stronger notion of fairness, called groupwise maximin share guarantee (GMMS). In GMMS, we require that the maximin share guarantee is achieved not just with respect to the grand bundle, but also among all the subgroups of agents. Hence, this solution concept strengthens MMS and provides an ex-post fairness guarantee. We show that in specific settings, GMMS allocations always exist. We also establish the existence of approximate GMMS allocations under additive valuations, and develop a polynomial-time algorithm to find such allocations. Moreover, we establish a scale of fairness wherein we show that GMMS implies approximate envy-freeness. Finally, we empirically demonstrate the existence of GMMS allocations in a large set of randomly generated instances. For the same set of instances, we additionally show that our algorithm achieves an approximation factor better than the established, worst-case bound.
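For intuition, an agent's maximin share can be computed exactly by brute force on small instances; the sketch below (illustrative, exponential-time) enumerates all partitions of the goods into n bundles.

```python
# Sketch: compute an agent's maximin share exactly by brute force over
# all partitions into n bundles (feasible only for small instances).
from itertools import product

def maximin_share(values, n):
    """values: the agent's additive values for the goods."""
    m = len(values)
    best = float("-inf")
    # Each assignment maps every good to one of n bundles.
    for assign in product(range(n), repeat=m):
        bundle_vals = [0] * n
        for g, b in enumerate(assign):
            bundle_vals[b] += values[g]
        best = max(best, min(bundle_vals))
    return best

print(maximin_share([7, 5, 4, 3, 1], n=2))  # 10: {7, 3} vs {5, 4, 1}
```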
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets scarcer.
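A minimal sketch of the disjoint-models LinUCB selection rule described in this line of work, assuming one ridge-regression model per arm; parameter names are illustrative.

```python
# Sketch of disjoint LinUCB: per-arm ridge regression, with score equal
# to the reward estimate plus an exploration bonus (alpha * width).
import numpy as np

class LinUCB:
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]      # d x d designs
        self.b = [np.zeros(d) for _ in range(n_arms)]    # response vectors

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                 # ridge estimate
            width = np.sqrt(x @ A_inv @ x)    # confidence width at x
            scores.append(theta @ x + self.alpha * width)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

bandit = LinUCB(n_arms=3, d=5)
x = np.random.rand(5)
arm = bandit.select(x)
bandit.update(arm, x, reward=1.0)
```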
We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with optimal regret. Our algorithm uses a cost-sensitive classification learner as an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.
We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class $\mathcal{F}$. We design a fast and simple algorithm that achieves the statistically optimal regret with only $O(\log T)$ calls to an offline regression oracle across all $T$ rounds. The number of oracle calls can be further reduced to $O(\log \log T)$ if $T$ is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.
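The core sampling rule behind such regression-oracle reductions is inverse gap weighting; the sketch below shows the rule under the assumption that the oracle supplies per-action reward estimates and that gamma is a tuning parameter.

```python
# Sketch of the inverse-gap-weighting rule: actions with larger
# predicted gaps to the greedy action receive proportionally less
# probability. gamma is an illustrative learning-rate parameter.
import numpy as np

def igw_distribution(predictions, gamma):
    """predictions: oracle's reward estimates, one per action."""
    K = len(predictions)
    best = int(np.argmax(predictions))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (predictions[best] - predictions[a]))
    p[best] = 1.0 - p.sum()   # remaining mass on the greedy action
    return p

p = igw_distribution(np.array([0.9, 0.5, 0.2]), gamma=10.0)
print(p, p.sum())  # a valid probability distribution summing to 1
```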
We consider Max-min Share (MmS) allocations of items both in the case where items are goods (positive utility) and when they are chores (negative utility). We show that fair allocations of goods and chores have some fundamental connections but differences as well. We prove that, as in the case of goods, an MmS allocation does not need to exist for chores and computing an MmS allocation, if it exists, is strongly NP-hard. In view of these non-existence and complexity results, we present a polynomial-time 2-approximation algorithm for MmS fairness for chores. We then introduce a new fairness concept called optimal MmS that represents the best possible allocation in terms of MmS that is guaranteed to exist. For both goods and chores, we use connections to parallel machine scheduling to give (1) an exponential-time exact algorithm and (2) a polynomial-time approximation scheme for computing an optimal MmS allocation when the number of agents is fixed.
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function class, a more realistic and reasonable oracle than the classification oracles over policies typically assumed by agnostic methods. Our approach generalizes both UCB and LinUCB to far more expressive model classes and achieves low regret under certain distributional assumptions. In an extensive empirical evaluation, compared to both realizability-based and agnostic baselines, we find that our approach typically gives comparable or superior results.
Under the pari-mutuel system of betting on horse races, the track's final odds are in some sense a consensus of the 'subjective odds' of the individual bettors weighted by the amounts of their bets. We formulate the properties which this consensus must possess and prove that there always exists a unique set of odds having the required properties.
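A worked numeric example of the consensus odds (illustrative figures, ignoring the track take): each horse's implied probability is its share of the total pool, and the gross payout odds are the reciprocal.

```python
# Worked example: under pari-mutuel betting the implied "consensus"
# probability of each horse is its share of the total pool, and the
# payout odds are the reciprocal (track take ignored for simplicity).
bets = {"A": 600.0, "B": 300.0, "C": 100.0}
pool = sum(bets.values())

for horse, amount in bets.items():
    prob = amount / pool           # consensus probability
    odds = pool / amount           # gross payout per unit bet
    print(horse, round(prob, 3), round(odds, 2))
# A 0.6 1.67, B 0.3 3.33, C 0.1 10.0
```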
We study model selection in stochastic bandit problems. Our approach relies on a master algorithm that selects its actions among candidate base algorithms. While this problem has been studied for specific classes of stochastic base algorithms, our objective is to provide a method that can work with more general classes of stochastic base algorithms. We propose a master algorithm inspired by CORRAL (Agarwal et al., 2017) and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that permits us to obtain $O(\sqrt{T})$ regret guarantees for a wide class of base algorithms when working along with our master. We exhibit a lower bound showing that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. We apply our algorithm to choose among different values of $\epsilon$ for the $\epsilon$-greedy algorithm, and to choose between the $k$-armed UCB and linear UCB algorithms. Our empirical studies further confirm the effectiveness of our model-selection method.
We consider the problem of online learning in misspecified linear stochastic multi-armed bandit problems. Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features. It is, however, of interest to investigate the impact of potential misspecification in linear bandit models, where the expected rewards are perturbed away from the linear subspace determined by the arms' features. Although OFUL has recently been shown to be robust to relatively small deviations from linearity, we show that any linear bandit algorithm that enjoys optimal regret performance in the perfectly linear setting (e.g., OFUL) must suffer linear regret under a sparse additive perturbation of the linear model. In an attempt to overcome this negative result, we define a natural class of bandit models characterized by a non-sparse deviation from linearity. We argue that the OFUL algorithm can fail to achieve sublinear regret even under models that have non-sparse deviation. We finally develop a novel bandit algorithm, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm. For perfectly linear bandit models, the algorithm provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property, the algorithm avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling. Numerical experiments on synthetic data, and on recommendation data from the public Yahoo! Learning to Rank Challenge dataset, empirically support our findings.
We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for linear contextual bandits. We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3} d_{m^\star}^{1/3})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$. The algorithm also achieves regret $\tilde{O}(T^{3/4} + \sqrt{T d_{m^\star}})$, which is optimal for $d_{m^\star} \geq \sqrt{T}$. This is the first model selection result for contextual bandits with non-vacuous regret for all values of $d_{m^\star}$, and to the best of our knowledge is the first positive result of this type for any online learning setting with partial information. The core of the algorithm is a new estimator for the gap in the best loss achievable by two linear policy classes, which we show admits a convergence rate faster than the rate required to learn the parameters for either class.
A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Reinforcement-based learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice is fraught with technical debt, and no general system exists that supports them completely. We address this and create the first general system for contextual learning, called the Decision Service. Existing systems often suffer from technical debt that arises from issues like incorrect data collection and weak debuggability, issues we systematically address through our ML methodology and system abstractions. The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy. Notably, our new explore and log abstractions ensure the system produces correct, unbiased data, which our learner uses for online learning and to enable real-time safeguards, all in a fully reproducible manner. The Decision Service has a simple user interface and works with a variety of applications: we present two live production deployments for content recommendation that achieved click-through improvements of 25-30%, another that achieved an 18% revenue lift on the landing page, and ongoing applications in tech support and machine failure handling. The service makes real-time decisions and learns continuously and scalably, while significantly lowering technical debt.
The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning. However, existing optimistic algorithms (primarily UCB and its variants) often struggle to deal with general function classes and large context spaces. In this paper, we study general contextual bandits with an offline regression oracle and propose a simple, generic principle to design optimistic algorithms, dubbed "Upper Counterfactual Confidence Bounds" (UCCB). The key innovation of UCCB is building confidence bounds in policy space, rather than in action space as is done in UCB. We demonstrate that these algorithms are provably optimal and computationally efficient in handling general function classes and large context spaces. Furthermore, we illustrate that the UCCB principle can be seamlessly extended to infinite-action general contextual bandits, providing the first solutions to these settings when employing an offline regression oracle.
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. Finally, we provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and active learning and elucidate their role in determining the optimal instance-dependent regret. In a large-scale empirical evaluation, we find that our approach often gives superior results for challenging exploration problems. Turning our focus to reinforcement learning with function approximation, we develop new oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity.
We present the first constant-factor approximation algorithm for maximizing the Nash social welfare when allocating indivisible items to agents with budget-additive valuation functions. Budget-additive valuations represent an important class of submodular functions. They have attracted a lot of research interest in recent years due to many interesting applications. For every $\varepsilon > 0$, our algorithm obtains a $(2.404 + \varepsilon)$-approximation in time polynomial in the input size and $1/\varepsilon$. Our algorithm relies on rounding an approximate equilibrium in a linear Fisher market where sellers have earning limits (upper bounds on the amount of money they want to earn) and buyers have utility limits (upper bounds on the amount of utility they want to achieve). In contrast to markets with either earning or utility limits, these markets have not been studied before. They turn out to have fundamentally different properties. Although the existence of equilibria is not guaranteed, we show that the market instances arising from the Nash social welfare problem always have an equilibrium. Further, we show that the set of equilibria is not convex, answering a question of [Cole et al, EC 2017]. We design an FPTAS to compute an approximate equilibrium, a result that may be of independent interest.
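A small illustration of budget-additive valuations, the class treated above: additive value truncated at the agent's utility cap.

```python
# Budget-additive valuation: v_i(S) = min(c_i, sum_{j in S} v_ij),
# i.e., additive value capped at the agent's utility limit c_i.
def budget_additive(values, cap, bundle):
    return min(cap, sum(values[j] for j in bundle))

values = [4.0, 3.0, 2.0]
print(budget_additive(values, cap=6.0, bundle={0, 1}))  # min(6, 7) = 6.0
print(budget_additive(values, cap=6.0, bundle={1, 2}))  # min(6, 5) = 5.0
```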
We consider fair allocation of indivisible items under an additional constraint: there is an undirected graph describing the relationship between the items, and each agent's share must form a connected subgraph of this graph. This framework captures, e.g., fair allocation of land plots, where the graph describes the accessibility relation among the plots. We focus on agents that have additive utilities for the items, and consider several common fair division solution concepts, such as proportionality, envy-freeness and maximin share guarantee. While finding good allocations according to these solution concepts is computationally hard in general, we design efficient algorithms for special cases where the underlying graph has simple structure, and/or the number of agents (or, less restrictively, the number of agent types) is small. In particular, despite non-existence results in the general case, we prove that for acyclic graphs a maximin share allocation always exists and can be found efficiently.
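A connectivity check of the kind this model requires can be written directly; the sketch below verifies, via breadth-first search, that a bundle induces a connected subgraph of the item graph (names illustrative).

```python
# Sketch: check that an agent's bundle induces a connected subgraph of
# the item graph, via breadth-first search restricted to the bundle.
from collections import deque

def bundle_connected(adj, bundle):
    """adj: adjacency lists of the item graph; bundle: set of items."""
    if not bundle:
        return True
    seen, queue = set(), deque([next(iter(bundle))])
    while queue:
        u = queue.popleft()
        if u in seen:
            continue
        seen.add(u)
        queue.extend(v for v in adj[u] if v in bundle and v not in seen)
    return seen == set(bundle)

# Path graph 0-1-2-3: {0, 1} is connected, {0, 2} is not.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bundle_connected(adj, {0, 1}), bundle_connected(adj, {0, 2}))
```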
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
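A minimal Bernoulli Thompson sampling loop, the simplest of the examples mentioned above; the Beta(1, 1) priors and horizon are illustrative choices.

```python
# Minimal Bernoulli Thompson sampling: Beta posteriors per arm, sample a
# mean from each posterior, play the argmax, update with the 0/1 reward.
import random

def thompson(arms_true_p, horizon=1000):
    n = len(arms_true_p)
    alpha, beta = [1] * n, [1] * n          # uniform Beta(1, 1) priors
    total = 0
    for _ in range(horizon):
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(n)]
        a = max(range(n), key=samples.__getitem__)
        reward = 1 if random.random() < arms_true_p[a] else 0
        alpha[a] += reward
        beta[a] += 1 - reward
        total += reward
    return total

print(thompson([0.3, 0.5, 0.7]))  # tends toward 0.7 * horizon
```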
In fair division of indivisible goods, using sequences of sincere choices (or picking sequences) is a natural way to allocate the objects. The idea is the following: at each stage, a designated agent picks one object among those that remain. This paper, restricted to the case where the agents have numerical additive preferences over objects, revisits to some extent the seminal paper by Brams and King [9], which was specific to ordinal and linear order preferences over items. We point out similarities and differences with this latter context. In particular, we show that any Pareto-optimal allocation (under additive preferences) is sequenceable, but that the converse is not true anymore. This asymmetry leads naturally to the definition of a "scale of efficiency" having three steps: Pareto-optimality, sequenceability without Pareto-optimality, and non-sequenceability. Finally, we investigate the links between these efficiency properties and the "scale of fairness" we have described in an earlier work [7]: we first show that an allocation can be envy-free and non-sequenceable, but that every competitive equilibrium with equal incomes is sequenceable. Then we experimentally explore the links between the scales of efficiency and fairness.
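A picking sequence is straightforward to simulate; the sketch below executes a given sequence with sincere picks under additive preferences (names illustrative).

```python
# Sketch: execute a picking sequence. At each turn the designated agent
# sincerely picks her highest-valued remaining object.
def run_picking_sequence(sequence, values):
    """sequence: agent index per turn; values[i][g]: agent i's value."""
    remaining = set(range(len(values[0])))
    bundles = {i: [] for i in set(sequence)}
    for i in sequence:
        g = max(remaining, key=lambda g: values[i][g])
        bundles[i].append(g)
        remaining.remove(g)
    return bundles

values = [[9, 7, 4, 1], [8, 2, 6, 3]]
print(run_picking_sequence([0, 1, 1, 0], values))
# {0: [0, 1], 1: [2, 3]}
```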
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted context setting, and propose two novel algorithms, called the Thompson Sampling with Restricted Context (TSRC) and the Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate advantages of the proposed approaches on several real-life datasets.
The goal of fair division is to distribute resources among competing players in a fair way. Envy-freeness is the most extensively studied fairness notion in fair division. Envy-free allocations do not always exist with indivisible goods, motivating the study of relaxed versions of envy-freeness. We study the envy-freeness up to any good (EFX) property, which states that no player prefers the bundle of another player following the removal of any single good, and prove the first general results about this property. We use the leximin solution to show existence of EFX allocations in several contexts, sometimes in conjunction with Pareto optimality. For two players with valuations obeying a mild assumption, one of these results provides stronger guarantees than the currently deployed algorithm on Spliddit, a popular fair division website. Unfortunately, finding the leximin solution can require exponential time. We show that this is necessary by proving an exponential lower bound on the number of value queries needed to identify an EFX allocation, even for two players with identical valuations. We consider both additive and more general valuations, and our work suggests that there is a rich landscape of problems to explore in the fair division of indivisible goods with different classes of player valuations.
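The EFX condition is easy to verify for additive valuations; the following sketch checks it directly (illustrative, not the paper's code).

```python
# Sketch: verify envy-freeness up to any good (EFX) under additive
# valuations: i must not envy j after removing *any* single good from
# j's bundle.
def is_efx(values, alloc):
    n = len(values)
    val = lambda i, B: sum(values[i][g] for g in B)
    for i in range(n):
        for j in range(n):
            if i == j or not alloc[j]:
                continue
            for g in alloc[j]:
                if val(i, alloc[i]) < val(i, alloc[j] - {g}):
                    return False
    return True

values = [[5, 3, 2], [4, 4, 2]]
print(is_efx(values, [{0}, {1, 2}]))  # True: 5 >= max(3, 2) for agent 0
```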
We study a non-parametric multi-armed bandit problem with stochastic covariates, where a key complexity driver is the smoothness of payoff functions with respect to covariates. Previous studies have focused on deriving minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, the smoothness of payoff functions is typically not known in advance, and misspecification of smoothness may severely deteriorate the performance of existing methods. In this work, we consider a framework where the smoothness of payoff functions is not known, and study when and how algorithms may adapt to unknown smoothness. First, we establish that designing algorithms that adapt to unknown smoothness of payoff functions is, in general, impossible. However, under a self-similarity condition (which does not reduce the minimax complexity of the dynamic optimization problem at hand), we establish that adapting to unknown smoothness is possible, and further devise a general policy for achieving smoothness-adaptive performance. Our policy infers the smoothness of payoffs throughout the decision-making process, while leveraging the structure of non-adaptive off-the-shelf policies. We establish that for problem settings with either differentiable or non-differentiable payoff functions this policy matches (up to a logarithmic scale) the regret rate that is achievable when the smoothness of payoffs is known a priori.
We consider a stochastic contextual bandit problem where the dimension $d$ of the feature vectors is potentially large; however, only a sparse subset of features of cardinality $s_0 \ll d$ affect the reward function. Essentially all existing algorithms for sparse bandits require a priori knowledge of the value of the sparsity index $s_0$. This knowledge is almost never available in practice, and misspecification of this parameter can lead to severe deterioration in the performance of existing methods. The main contribution of this paper is to propose an algorithm that does not require prior knowledge of the sparsity index $s_0$ and establish tight regret bounds on its performance under mild conditions. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms existing methods, even when the correct sparsity index is revealed to them but is kept hidden from our algorithm.
The contextual bandit literature has traditionally focused on algorithms that address the exploration–exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be suboptimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate optimal (achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate optimal with positive probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or upper confidence bound.
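A minimal sketch of an exploration-free greedy contextual bandit of the kind studied here, assuming per-arm regularized least squares; under covariate diversity, the randomness in the contexts themselves does the exploring.

```python
# Sketch: exploration-free greedy contextual bandit. Fit ridge-regularized
# least squares per arm and always play the argmax estimate (no bonus).
import numpy as np

class GreedyBandit:
    def __init__(self, n_arms, d, reg=1.0):
        self.A = [reg * np.eye(d) for _ in range(n_arms)]   # Gram matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]       # responses

    def select(self, x):
        estimates = [np.linalg.solve(A, b) @ x
                     for A, b in zip(self.A, self.b)]
        return int(np.argmax(estimates))    # purely greedy choice

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```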
Tractable contextual bandit algorithms often rely on the realizability assumption, i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0 \ll d$, of these features only. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., significant feature elements, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected on its support. The algorithm does not require prior knowledge of the sparsity index $s_0$ and can be parameter-free under some symmetric assumptions. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition (a probabilistic condition on the separation of the arm rewards). The regret of previous algorithms scales as $\mathcal{O}(\log d + \sqrt{T \log (d T)})$ and $\mathcal{O}(\log T \log d)$ in the two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.
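A sketch of one round of the estimate-threshold-refit-then-greedy pattern described above, using scikit-learn; the regularization and threshold values are illustrative, not the paper's tuned choices.

```python
# Sketch of one Thresholded Lasso bandit round: Lasso-estimate the reward
# vector, threshold small coefficients to estimate the support, refit on
# the support, and pick an arm greedily.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def select_arm(X, y, arms, lam=0.05, threshold=0.05):
    """X, y: past (context, reward) data; arms: candidate feature vectors
    (one row per arm). Names and constants are illustrative."""
    lasso = Lasso(alpha=lam).fit(X, y)
    support = np.flatnonzero(np.abs(lasso.coef_) > threshold)
    if support.size == 0:
        return 0                      # fallback when nothing survives
    ols = LinearRegression().fit(X[:, support], y)   # refit on support
    scores = arms[:, support] @ ols.coef_ + ols.intercept_
    return int(np.argmax(scores))     # greedy pick on the refit estimate

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
theta = np.zeros(20); theta[[3, 7]] = 1.0       # sparse true reward vector
y = X @ theta + 0.1 * rng.standard_normal(100)
arms = rng.standard_normal((5, 20))
print(select_arm(X, y, arms))
```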
We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with a prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann. Statist. 40 (2012) 1878–1905] for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the $Q$-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques somewhat depart from traditional proofs by making the most of the standard arguments on the Laplace transform of the empirical process to be controlled.
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably "localized" static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (SE) policy. Our results include sharper regret bounds for the SE policy in a static bandit problem and minimax optimal regret bounds for the ABSE policy in the dynamic problem.
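A minimal sketch of the Successive Elimination idea in the static case, assuming rewards in [0, 1] and a simple Hoeffding-style confidence radius (constants are an illustrative choice).

```python
# Sketch of Successive Elimination for a static K-armed bandit: sample
# all active arms in rounds, drop an arm once its upper confidence bound
# falls below the best lower confidence bound.
import math, random

def successive_elimination(pull, K, rounds=2000, delta=0.05):
    """pull(a): stochastic reward in [0, 1] for arm a (illustrative)."""
    active = set(range(K))
    means, counts = [0.0] * K, [0] * K
    for _ in range(rounds):
        for a in list(active):
            r = pull(a)
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]   # running average
        # Hoeffding-style radius; a simple illustrative constant choice.
        rad = lambda a: math.sqrt(math.log(2 * K * rounds / delta)
                                  / (2 * counts[a]))
        best_lcb = max(means[a] - rad(a) for a in active)
        active = {a for a in active if means[a] + rad(a) >= best_lcb}
        if len(active) == 1:
            break
    return active

print(successive_elimination(
    lambda a: float(random.random() < [0.2, 0.8][a]), K=2))  # likely {1}
```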
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performance of cross-validation procedures. This survey relates these results to the most recent advances in model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
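For concreteness, the basic risk-estimation use of cross-validation surveyed here can be sketched in a few lines; the generic `fit`/`loss` interface below is an assumption for illustration.

```python
import numpy as np

def kfold_cv_risk(fit, loss, X, y, k=5, seed=0):
    """fit(X, y) -> predictor; loss(predictor, X, y) -> average loss.
    Returns the k-fold cross-validation estimate of the risk."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    risks = []
    for j in range(k):
        test = folds[j]
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        risks.append(loss(fit(X[train], y[train]), X[test], y[test]))
    return float(np.mean(risks))
```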
Because many illnesses show heterogeneous response to treatment, there is increasing interest in individualizing treatment to patients [Arch. Gen. Psychiatry 66 (2009) 128–133]. An individualized treatment rule is a decision rule that recommends treatment according to patient characteristics. We consider the use of clinical trial data in the construction of an individualized treatment rule leading to the highest mean response. This is a difficult computational problem because the objective function is the expectation of a weighted indicator function that is nonconcave in the parameters. Furthermore, there are frequently many pretreatment variables that may or may not be useful in constructing an optimal individualized treatment rule, yet cost and interpretability considerations imply that only a few variables should be used by the individualized treatment rule. To address these challenges, we consider estimation based on $\ell_1$-penalized least squares. This approach is justified via a finite sample upper bound on the difference between the mean response due to the estimated individualized treatment rule and the mean response due to the optimal individualized treatment rule.
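A minimal sketch of the $\ell_1$-penalized least-squares approach, using treatment-by-covariate interactions as the working model; the feature construction and the fixed penalty level are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_itr(X, a, y, lam=0.1):
    """X: (n, p) pretreatment covariates; a: (n,) binary treatments;
    y: (n,) responses. Returns a rule x -> recommended treatment."""
    A = a[:, None]
    Z = np.hstack([X, A, A * X])        # main effects + interactions
    model = Lasso(alpha=lam).fit(Z, y)  # l1-penalized least squares

    def rule(x):
        z0 = np.concatenate([x, [0.0], 0.0 * x])   # features under a = 0
        z1 = np.concatenate([x, [1.0], 1.0 * x])   # features under a = 1
        # recommend the treatment with the higher predicted response
        return int(model.predict(z1[None])[0] > model.predict(z0[None])[0])
    return rule
```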
We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base algorithm comes with a candidate regret bound that may or may not hold during all rounds. In each round, our approach plays a base algorithm to keep the candidate regret bounds of all remaining base algorithms balanced, and eliminates algorithms that violate their candidate bound. We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor. This factor is reasonably small in several applications, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and LinUCB applied to linear bandits with different confidence parameters. We further show that, under a suitable gap-assumption, this factor only scales with the number of base algorithms and not their complexity when the number of rounds is large enough. Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.
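The balancing-and-elimination loop sketched below illustrates the idea stated above; the confidence width and the exact form of the elimination test are simplified placeholders for the quantities analyzed in the paper.

```python
import numpy as np

def balance_and_eliminate(bases, bounds, T, c=1.0):
    """bases[i].act() plays one round of base algorithm i and returns the
    reward; bounds[i](n) is its candidate regret bound after n plays."""
    active = set(range(len(bases)))
    n, cum = np.zeros(len(bases)), np.zeros(len(bases))
    for _ in range(T):
        # Balancing: play the base whose candidate bound is currently smallest.
        i = min(active, key=lambda j: bounds[j](n[j] + 1))
        cum[i] += bases[i].act(); n[i] += 1
        # Elimination: a base whose reward-plus-bound estimate falls below
        # the best empirical lower estimate cannot satisfy its candidate bound.
        played = [j for j in active if n[j] > 0]
        best_low = max(cum[j] / n[j] - c / np.sqrt(n[j]) for j in played)
        for j in played:
            if cum[j] / n[j] + bounds[j](n[j]) / n[j] + c / np.sqrt(n[j]) < best_low:
                active.discard(j)
    return active
```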
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do not rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms for more general function approximation classes.
We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapt to the unknown problem complexity. In the first setting, we consider the $K$-armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i + \langle \alpha_{i,t}, \theta^* \rangle$, with $\alpha_{i,t} \in \mathbb{R}^d$ being the known context vector and $\mu_i \in [-1,1]$ and $\theta^*$ being unknown parameters. We define $\|\theta^*\|$ as the problem complexity and consider a sequence of nested hypothesis classes, each positing a different upper bound on $\|\theta^*\|$. Exploiting this, we propose Adaptive Linear Bandit (ALB), a novel phase-based algorithm that adapts to the true problem complexity, $\|\theta^*\|$. We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown. As a corollary, when $\theta^* = 0$, ALB recovers the minimax regret of the simple bandit algorithm without such knowledge of $\theta^*$. ALB is the first algorithm that uses the parameter norm as a model selection criterion for linear bandits. Prior state-of-the-art algorithms \cite{osom} achieve a regret of $O(L\sqrt{T})$, where $L$ is the upper bound on $\|\theta^*\|$ fed as an input to the problem. In the second setting, we consider the standard linear bandit problem (with possibly an infinite number of arms) where the sparsity of $\theta^*$, denoted by $d^* \leq d$, is unknown to the algorithm. Defining $d^*$ as the problem complexity, we show that ALB achieves $O(d^*\sqrt{T})$ regret, matching that of an oracle who knew the true sparsity level. This methodology is then extended to the case of finitely many arms, and similar results are proven. This is the first algorithm that achieves such model selection guarantees. We further verify our results via synthetic and real-data experiments.
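Hedging on details, the phased structure can be skeletonized as follows; `base_algorithm` (e.g., an OFUL-style routine that accepts a norm bound) and `estimate_norm` are assumed interfaces, and the doubling schedule is illustrative.

```python
def alb_skeleton(base_algorithm, estimate_norm, num_phases, T0=100):
    """base_algorithm(norm_bound, T) runs T rounds with the given bound on
    ||theta*|| and returns its (context, reward) history as a list;
    estimate_norm re-estimates ||theta*|| from all data collected so far."""
    norm_bound, history = 1.0, []       # initial guess for ||theta*||
    for phase in range(num_phases):
        T_phase = T0 * 2 ** phase       # doubling phase lengths
        history += base_algorithm(norm_bound, T_phase)
        norm_bound = estimate_norm(history)   # adapt to the true complexity
    return norm_bound
```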
Competitive equilibrium from equal incomes (CEEI) is a classic solution to the problem of fair and efficient allocation of goods [Foley'67, Varian'74]. Every agent receives an equal budget of artificial currency with which to purchase goods, and prices match demand and supply. However, a CEEI is not guaranteed to exist when the goods are indivisible, even in the simple two-agent, single-item market. Yet, it is easy to see that once the two budgets are slightly perturbed (made generic), a competitive equilibrium does exist. In this paper we aim to extend this approach beyond the single-item case, and study the existence of equilibria in markets with two agents and additive preferences over multiple items. We show that for agents with equal budgets, making the budgets generic -- by adding vanishingly small random perturbations -- ensures the existence of an equilibrium. We further consider agents with arbitrary non-equal budgets, representing non-equal entitlements for goods. We show that competitive equilibrium guarantees a new notion of fairness among non-equal agents, and that it exists in cases of interest (like when the agents have identical preferences) if budgets are perturbed. Our results open opportunities for future research on generic equilibrium existence and fair treatment of non-equals.
We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance achieves the order of said lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably "localized" bandit problems. Proofs blend ideas from nonparametric statistics and traditional methods used in the bandit literature.
A number of recent results on optimization problems involving submodular functions have made use of the multilinear relaxation of the problem. These results hold typically in the value oracle model, where the objective function is accessible via a black box returning $f(S)$ for a given $S$. We present a general approach to deriving inapproximability results in the value oracle model, based on the notion of symmetry gap. Our main result is that for any fixed instance that exhibits a certain symmetry gap in its multilinear relaxation, there is a naturally related class of instances for which a better approximation factor than the symmetry gap would require exponentially many oracle queries. This unifies several known hardness results for submodular maximization, e.g., the optimality of $(1-1/e)$-approximation for monotone submodular maximization under a cardinality constraint and the impossibility of $(\frac12+\epsilon)$-approximation for unconstrained (nonmonotone) submodular maximization. As a new application, we consider the problem of maximizing a nonmonotone submodular function over the bases of a matroid. A $(\frac16-o(1))$-approximation has been developed for this problem, assuming that the matroid contains two disjoint bases. We show that the best approximation one can achieve is indeed related to packings of bases in the matroid. Specifically, for any $k \geq 2$, there is a class of matroids of fractional base packing number $\nu = \frac{k}{k-1}$ such that any algorithm achieving a better than $(1-\frac{1}{\nu}) = \frac{1}{k}$-approximation for this class would require exponentially many value queries. In particular, there is no constant factor approximation for maximizing a nonmonotone submodular function over the bases of a general matroid. On the positive side, we present a $\frac12 (1-\frac{1}{\nu}-o(1))$-approximation algorithm assuming fractional base packing number at least $\nu$, where $\nu \in (1,2]$. We also present an improved $0.309$-approximation for maximization of a nonmonotone submodular function subject to a matroid independence constraint, improving the previously known factor of $\frac14-\epsilon$. For this problem, we obtain a hardness of $(\frac12 + \epsilon)$-approximation for any fixed $\epsilon>0$.
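For reference, the multilinear relaxation in question replaces $f$ by its multilinear extension $F$: writing $R(x)$ for the random set that contains each element $i \in N$ independently with probability $x_i$, one has, for $x \in [0,1]^N$, $$F(x) = \mathbb{E}\left[f(R(x))\right] = \sum_{S \subseteq N} f(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j).$$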
Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and the tails of the losses used. Starting with examples from the three basic estimation problems, namely regression, classification, and density estimation, we formulate risk bounds for empirical risk minimization and prove them at a very general level, for general margin and power tail behavior of the excess losses. We then apply these bounds to typical examples.
A well-studied nonlinear extension of the minimum-cost flow problem is to minimize the objective $\sum_{ij \in E} C_{ij}(f_{ij})$ over feasible flows $f$, where on every arc $ij$ of the network, $C_{ij}$ is a convex function. We give a strongly polynomial algorithm for finding an exact optimal solution for a broad class of such problems. The key characteristic of this class is that an optimal solution can be computed exactly provided its support. This includes separable convex quadratic objectives and also certain market equilibrium problems: Fisher's market with linear and with spending constraint utilities. We thereby give the first strongly polynomial algorithms for separable quadratic minimum-cost flows and for Fisher's market with spending constraint utilities, settling open questions posed e.g. in [15] and in [35], respectively. The running time is $O(m^4 \log m)$ for quadratic costs, $O(n^4 + n^2(m + n\log n)\log n)$ for Fisher's markets with linear utilities, and $O(mn^3 + m^2(m + n\log n)\log m)$ for spending constraint utilities.
We consider the problem of maximizing a nonnegative submodular set function $f:2^N \rightarrow {\mathbb R}_+$ over a ground set $N$ subject to a variety of packing-type constraints including (multiple) matroid constraints, knapsack constraints, and their intersections. In this paper we develop a general framework that allows us to derive a number of new results, in particular, when $f$ may be a nonmonotone function. Our algorithms are based on (approximately) maximizing the multilinear extension $F$ of $f$ over a polytope $P$ that represents the constraints, and then effectively rounding the fractional solution. Although this approach has been used quite successfully, it has been limited in some important ways. We overcome these limitations as follows. First, we give constant factor approximation algorithms to maximize $F$ over a downward-closed polytope $P$ described by an efficient separation oracle. Previously this was known only for monotone functions. For nonmonotone functions, a constant factor was known only when the polytope was either the intersection of a fixed number of knapsack constraints or a matroid polytope. Second, we show that contention resolution schemes are an effective way to round a fractional solution, even when $f$ is nonmonotone. In particular, contention resolution schemes for different polytopes can be combined to handle the intersection of different constraints. Via linear programming duality we show that a contention resolution scheme for a constraint is related to the correlation gap of weighted rank functions of the constraint. This leads to an optimal contention resolution scheme for the matroid polytope. Our results provide a broadly applicable framework for maximizing linear and submodular functions subject to independence constraints. We give several illustrative examples. Contention resolution schemes may find other applications.
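In the value oracle model, the multilinear extension $F$ that this framework maximizes is typically accessed through sampling. A minimal Monte Carlo sketch (the sample size needed for the concentration arguments is omitted here):

```python
import numpy as np

def multilinear_extension(f, x, samples=1000, seed=0):
    """Estimate F(x) = E[f(R(x))], where R(x) contains each element i
    independently with probability x[i]; f takes a boolean mask over N."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        mask = rng.random(len(x)) < x     # sample the random set R(x)
        total += f(mask)
    return total / samples
```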
We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-\delta$ while incurring regret at most $O(\sqrt{KT\ln(N/\delta)})$ over $T$ time steps. The new algorithm is tested empirically on a large-scale, real-world dataset. Second, we give a new algorithm called VE that competes with a possibly infinite set of policies of VC-dimension $d$ while incurring regret at most $O(\sqrt{T(d\ln(T) + \ln (1/\delta))})$ with probability $1-\delta$. These guarantees improve on those of all previous algorithms, whether in a stochastic or adversarial environment, and bring us closer to providing supervised learning type guarantees for the contextual bandit setting.
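For intuition, one round of the exponential-weights core behind Exp4.P can be sketched as follows; the learning rate and the exact confidence bonus are simplified stand-ins for the constants in the paper's analysis.

```python
import numpy as np

def exp4p_round(w, advice, pull, gamma, bonus, rng):
    """w: (N,) expert weights; advice: (N, K) rows are distributions over
    the K actions; pull(a) returns the observed reward in [0, 1]."""
    N, K = advice.shape
    q = w / w.sum()                             # distribution over experts
    p = (1 - gamma) * q @ advice + gamma / K    # mixed action distribution
    a = rng.choice(K, p=p)
    r = pull(a)
    r_hat = np.zeros(K)
    r_hat[a] = r / p[a]                         # importance-weighted reward
    y_hat = advice @ r_hat                      # estimated expert gains
    v_hat = (advice / p).sum(axis=1)            # confidence correction term
    return w * np.exp((gamma / K) * (y_hat + bonus * v_hat)), a, r
```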
We consider the Max-Min Allocation problem: given a set $\mathcal{A}$ of $m$ agents and a set $\mathcal{I}$ of $n$ items, where agent $A \in \mathcal{A}$ has utility $u_{A,i}$ for item $i \in \mathcal{I}$, our goal is to allocate items to agents so as to maximize fairness. Specifically, the utility of an agent is the sum of its utilities for the items it receives, and we seek to maximize the minimum utility of any agent. While this problem has received much attention recently, its approximability has not been well understood thus far: the best known approximation algorithm achieves an $\tilde{O}(\sqrt{m})$-approximation, and in contrast, the best known hardness of approximation stands at 2. Our main result is an algorithm that achieves an $\tilde{O}(n^{\epsilon})$-approximation for any $\epsilon = \Omega((\log \log n)/(\log n))$ in time $n^{O(1/\epsilon)}$. In particular, we obtain poly-logarithmic approximation in quasipolynomial time, and for every constant $\epsilon > 0$, we obtain an $\tilde{O}(n^{\epsilon})$-approximation in polynomial time. An interesting technical aspect of our algorithm is that we use as a building block a linear program whose integrality gap is $\Omega(\sqrt{m})$. We bypass this obstacle by iteratively using the solutions produced by the LP to construct new instances with significantly smaller integrality gaps, eventually obtaining the desired approximation. As a corollary of our main result, we also show that for any constant $\epsilon > 0$, an $O(m^{\epsilon})$-approximation can be achieved in quasi-polynomial time. We also investigate the special case of the problem where every item has non-zero utility for at most two agents. This problem is hard to approximate to within any factor better than 2. We give a factor-2 approximation algorithm.