Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m\times n$ matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let $A'=CUR$. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-k approximation provided by truncating the SVD of A, and where $\|X\|_F$ is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple and they take time of the order needed to approximately compute the top k singular vectors of A. The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-A” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
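As a hedged illustration of the first algorithm's shape (not the paper's exact procedure or constants), the following numpy sketch samples columns with probabilities taken from the squared row norms of the top-k right singular vectors and forms A' = CC^+A; the number of sampled columns c is a free parameter here, whereas the theorem prescribes a specific poly(k, 1/eps, log(1/delta)) value.

```python
import numpy as np

def column_subspace_sample(A, k, c, rng=None):
    """Sketch of the CC^+A column-based approximation with subspace sampling.

    Samples c columns of A with probabilities proportional to the squared
    row norms of the top-k right singular vectors, then projects A onto the
    span of the sampled columns.  Illustrative only: the paper's guarantee
    needs a specific c = poly(k, 1/eps, log(1/delta)).
    """
    rng = np.random.default_rng(rng)
    # Top-k right singular vectors of A (rows of Vt).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                      # k x n
    # Subspace-sampling probabilities: weight of each column within V_k.
    p = (Vk ** 2).sum(axis=0)
    p = p / p.sum()
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, cols]
    # A' = C C^+ A, the projection of A onto the column space of C.
    A_prime = C @ np.linalg.pinv(C) @ A
    return A_prime, cols

# Tiny usage check: the error should be close to the best rank-k error.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 40))
    A += 0.01 * rng.standard_normal(A.shape)
    A_prime, _ = column_subspace_sample(A, k=8, c=20, rng=1)
    print(np.linalg.norm(A - A_prime, "fro"))
```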
We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani and Vazirani gives an approximation ratio of 1 - 1/e ≈ 0.632, a very familiar bound that holds for many online problems; further, the bound is tight in this case. In the online, stochastic case when nodes are drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio, but still, no algorithm is known that beats the 1 - 1/e bound. Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking this 1 - 1/e barrier. Furthermore, we show that no online algorithm can produce a 1 - ε approximation for an arbitrarily small ε for this problem. Our algorithms are based on computing an optimal offline solution to the expected instance, and using this solution as a guideline in the process of online allocation. We employ a novel application of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the expected instance, and use both of them in the online algorithm in a prescribed preference order. To identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then carefully decompose this maximum flow to two edge-disjoint (near-)matchings. In addition to guiding the online decision making, these two offline solutions are used to characterize an upper bound for the optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival distribution. At the end, we discuss extensions of our results to more general bipartite allocations that are important in a display ad application.
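The online phase described above can be sketched in a few lines. The function below assumes the two edge-disjoint (near-)matchings of the expected instance have already been computed (passed in as hypothetical dictionaries blue and red) and simply tries each arriving impression's blue partner before its red partner; computing the two solutions via the boosted max flow is not shown.

```python
import random

def online_two_choice_matching(arrivals, blue, red, capacity):
    """Hedged sketch of the "two offline solutions as a preference order" idea.

    `blue` and `red` map an impression type to the ad it is matched to in two
    edge-disjoint offline (near-)matchings of the expected instance; `capacity`
    maps each ad to how many impressions it can still take.  On each arrival
    we try the blue partner first, then the red partner, else drop it.
    """
    remaining = dict(capacity)
    matched = []
    for t in arrivals:
        for choice in (blue.get(t), red.get(t)):
            if choice is not None and remaining.get(choice, 0) > 0:
                remaining[choice] -= 1
                matched.append((t, choice))
                break
    return matched

# Hypothetical toy instance: two impression types, two ads, i.i.d. arrivals.
blue = {"t1": "ad_A", "t2": "ad_B"}
red = {"t1": "ad_B", "t2": "ad_A"}
arrivals = [random.choice(["t1", "t2"]) for _ in range(10)]
print(len(online_two_choice_matching(arrivals, blue, red, {"ad_A": 3, "ad_B": 3})))
```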
Internet search companies sell advertisement slots based on users' search queries via an auction. While there has been previous work on the auction process and its game-theoretic aspects, most of it focuses on the Internet company. In this work, we focus on the advertisers, who must solve a complex optimization problem to decide how to place bids on keywords to maximize their return (the number of user clicks on their ads) for a given budget. We model the entire process and study this budget optimization problem. While most variants are NP-hard, we show, perhaps surprisingly, that simply randomizing between two uniform strategies that bid equally on all the keywords works well. More precisely, this strategy gets at least a 1-1/e fraction of the maximum clicks possible. As our preliminary experiments show, such uniform strategies are likely to be practical. We also present inapproximability results, and optimal algorithms for variants of the budget optimization problem.
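A rough sketch of the randomized uniform strategy, under a simplified landscape model in which bidding a single value on every keyword yields a known (cost, clicks) total per keyword; the `landscapes` structure and the candidate bid grid are assumptions for illustration, not the paper's formulation.

```python
def uniform_bid_totals(landscapes, b):
    """Total (cost, clicks) if we bid the single value b on every keyword.

    `landscapes` is a hypothetical simplification: for each keyword, a list of
    (bid_threshold, cost, clicks) rows sorted by threshold, where the row with
    the largest threshold <= b applies.
    """
    cost = clicks = 0.0
    for rows in landscapes:
        best = (0.0, 0.0)
        for threshold, c, k in rows:
            if threshold <= b:
                best = (c, k)
        cost += best[0]
        clicks += best[1]
    return cost, clicks

def two_bid_mixture(landscapes, budget, candidate_bids):
    """Pick the highest affordable uniform bid b_lo, the next bid b_hi, and the
    probability of playing b_hi so that expected spend meets the budget.
    A sketch of the randomized uniform strategy, not the paper's exact algorithm."""
    bids = sorted(candidate_bids)
    affordable = [b for b in bids if uniform_bid_totals(landscapes, b)[0] <= budget]
    if not affordable:
        return None
    b_lo = affordable[-1]
    higher = [b for b in bids if b > b_lo]
    if not higher:
        return (b_lo, b_lo, 0.0)
    b_hi = higher[0]
    c_lo = uniform_bid_totals(landscapes, b_lo)[0]
    c_hi = uniform_bid_totals(landscapes, b_hi)[0]
    prob_hi = 0.0 if c_hi == c_lo else (budget - c_lo) / (c_hi - c_lo)
    return b_lo, b_hi, min(max(prob_hi, 0.0), 1.0)
```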
Recent advances in personalized recommendation have sparked great interest in the exploitation of rich structured information provided by knowledge graphs. Unlike most existing approaches that only focus on leveraging knowledge graphs for more accurate recommendation, we aim to conduct explicit reasoning with knowledge for decision making so that the recommendations are generated and supported by an interpretable causal inference procedure. To this end, we propose a method called Policy-Guided Path Reasoning (PGPR), which couples recommendation and interpretability by providing actual paths in a knowledge graph. Our contributions include four aspects. We first highlight the significance of incorporating knowledge graphs into recommendation to formally define and interpret the reasoning process. Second, we propose a reinforcement learning (RL) approach featured by an innovative soft reward strategy, user-conditional action pruning and a multi-hop scoring function. Third, we design a policy-guided graph search algorithm to efficiently and effectively sample reasoning paths for recommendation. Finally, we extensively evaluate our method on several large-scale real-world benchmark datasets, obtaining favorable results compared with state-of-the-art methods.
We present and analyze a sampling algorithm for the basic linear-algebraic problem of l2 regression. The l2 regression (or least-squares fit) problem takes as input a matrix A ∈ R^{n×d} (where we assume n > d) and a target vector b ∈ R^n, and it returns as output Z = min_{x ∈ R^d} ||b - Ax||_2. Also of interest is x_opt = A^+ b, where A^+ is the Moore-Penrose generalized inverse, which is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and vector b to construct an induced l2 regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative error approximations for both Z and x_opt. Applications of this sampling methodology are briefly discussed.
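A minimal numpy sketch of the row-sampling idea: sample rows with probabilities proportional to the squared row norms of the left singular vectors of A (leverage scores), rescale, and solve the small induced problem exactly. The paper's probabilities also account for how b sits outside the column space of A, which this sketch omits.

```python
import numpy as np

def sampled_least_squares(A, b, r, rng=None):
    """Hedged sketch of row sampling for l2 regression.

    Rows are sampled with probabilities proportional to their leverage scores,
    the sampled rows of A and b are rescaled by 1/sqrt(r * p_i), and the small
    induced regression problem is solved exactly.
    """
    rng = np.random.default_rng(rng)
    U, _, _ = np.linalg.svd(A, full_matrices=False)   # n x d
    lev = (U ** 2).sum(axis=1)
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])
    A_s = A[idx] * scale[:, None]
    b_s = b[idx] * scale
    x_tilde, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)
    return x_tilde

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5000, 10))
    b = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(5000)
    print(np.linalg.norm(A @ sampled_least_squares(A, b, r=400, rng=1) - b))
```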
It is clear from the growing role of ad exchanges in the real-time sale of advertising slots that Web publishers are considering a new alternative to their more traditional reservation-based ad contracts. To make this choice, the publisher must trade off, in real-time, the short-term revenue from ad exchange with the long-term benefits of delivering good spots to the reservation ads. In this paper we formalize this combined optimization problem as a multiobjective stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange prices. We prove the asymptotic optimality of this policy in terms of any arbitrary trade-off between the quality of delivered reservation ads and revenue from the exchange, and we show that our policy approximates any Pareto-optimal point on the quality-versus-revenue curve. Experimental results on data derived from real publisher inventory confirm that there are significant benefits for publishers if they jointly optimize over both channels. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2014.2017 . This paper was accepted by Dimitris Bertsimas, optimization.
We study the emerging phenomenon of ad hoc, sensor-based communication networks. The communication is modeled by the geometric random graph model G(n, r, ℓ), where n points randomly placed within [0, ℓ]^d form the nodes, and any two nodes that correspond to points at most distance r away from each other are connected. We study fundamental properties of G(n, r, ℓ) of interest: connectivity, coverage, and routing-stretch. Our main contribution is a simple analysis technique we call bin-covering that we apply uniformly to get the first known, (asymptotically) tight thresholds for each of these properties. Typically, in the past, geometric random graph analyses involved sophisticated methods from continuum percolation theory; in contrast, our bin-covering approach is discrete and very simple, yet it gives us tight threshold bounds. The technique also yields algorithmic benefits as illustrated by a simple local routing algorithm for finding paths with low stretch. Our specific results should also prove interesting to the networking community that has seen a recent increase in the study of geometric random graphs motivated by engineering ad hoc networks.
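A small empirical companion, assuming the standard setup of n uniform points in a square with connection radius r; it brute-forces the graph, so it is only meant to eyeball the connectivity threshold on small instances, not to reproduce the bin-covering analysis.

```python
import numpy as np

def random_geometric_connected(n, r, side=1.0, d=2, rng=None):
    """Generate a geometric random graph on n uniform points in [0, side]^d
    with connection radius r, and test whether it is connected."""
    rng = np.random.default_rng(rng)
    pts = rng.uniform(0.0, side, size=(n, d))
    # Adjacency: points within Euclidean distance r are connected.
    diff = pts[:, None, :] - pts[None, :, :]
    adj = (diff ** 2).sum(-1) <= r * r
    # DFS from node 0 over the adjacency matrix.
    seen = np.zeros(n, dtype=bool)
    stack = [0]
    seen[0] = True
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(adj[u] & ~seen):
            seen[v] = True
            stack.append(v)
    return seen.all()

# In the unit square, connectivity kicks in around r on the order of
# sqrt(log n / n); count how often 20 samples above that radius are connected.
print(sum(random_geometric_connected(500, 0.08, rng=i) for i in range(20)))
```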
A range counting problem is specified by a set P of size |P| = n of points in R^d, an integer weight x_p associated to each point p ∈ P, and a range space R ⊆ 2^P. Given a query range R ∈ R, the output is R(x) = ∑_{p ∈ R} x_p. The average squared error of an algorithm A is (1/|R|) ∑_{R ∈ R} (A(R, x) - R(x))^2. Range counting for different range spaces is a central problem in Computational Geometry. We study (ε, δ)-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an (ε, δ)-differentially private algorithm for halfspace counting in d dimensions which is O(n^{1-1/d}) approximate for average squared error. This contrasts with the Ω(n) lower bound established by the classical result of Dinur and Nissim on approximation for arbitrary subset counting queries. We also show a matching lower bound of Ω(n^{1-1/d}) approximation for any (ε, δ)-differentially private algorithm for halfspace counting.
Over the past decade, advertising has emerged as the primary source of revenue for many web sites and apps. In this paper we report a first-of-its-kind study that seeks to broadly understand the features, mechanisms and dynamics of display advertising on the web - i.e., the Adscape. Our study takes the perspective of users who are the targets of display ads shown on web sites. We develop a scalable crawling capability that enables us to gather the details of display ads including creatives and landing pages. Our crawling strategy is focused on maximizing the number of unique ads harvested. Of critical importance to our study is the recognition that a user's profile (i.e. browser profile and cookies) can have a significant impact on which ads are shown. We deploy our crawler over a variety of websites and profiles and this yields over 175K distinct display ads. We find that while targeting is widely used, there remain many instances in which delivered ads do not depend on user profile; further, ads vary more over user profiles than over websites. We also assess the population of advertisers seen and identify over 3.7K distinct entities from a variety of business segments. Finally, we find that when targeting is used, the specific types of ads delivered generally correspond with the details of user profiles and with users' visit patterns.
Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use models of user interaction with the list of items, the so-called click models, to construct estimators that learn statistically efficiently. We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds. We evaluate our estimators in a series of experiments on a real-world dataset and show that they consistently outperform prior estimators.
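One concrete instance of the idea, assuming the simplest click model, a position-based model with known examination probabilities: item attraction probabilities are estimated from the logs and then recombined to score a new ranked list. The estimators in the paper are more general; all names below are hypothetical.

```python
import numpy as np

def pbm_list_value(logged, new_list, exam_prob, n_items):
    """Hedged sketch of click-model-based offline evaluation.

    Assumes a position-based model (PBM) with known examination probabilities
    exam_prob[k]: a click on item i at position k happens with probability
    exam_prob[k] * alpha[i].  We estimate each item's attraction alpha[i] from
    the logs (clicks divided by expected examinations), then predict the
    expected number of clicks on a new ranked list.
    """
    clicks = np.zeros(n_items)
    exams = np.zeros(n_items)
    for ranked_list, click_vector in logged:
        for k, item in enumerate(ranked_list):
            exams[item] += exam_prob[k]
            clicks[item] += click_vector[k]
    alpha = np.divide(clicks, exams, out=np.zeros(n_items), where=exams > 0)
    return sum(exam_prob[k] * alpha[item] for k, item in enumerate(new_list))
```

The point of the structure is that each logged impression of an item contributes to that item's estimate regardless of which list it appeared in, which is what makes the estimator efficient when the number of possible lists is huge.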
We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani and Vazirani gives an approximation ratio of $1-1/e$. In the online, stochastic case when nodes are drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio, but still, no algorithm is known that beats the $1 - 1/e$ bound. Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking this $1 - {1/e}$ barrier. Furthermore, we show that no online algorithm can produce a $1-ε$ approximation for an arbitrarily small $ε$ for this problem. We employ a novel application of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the expected instance, and use both of them in the online algorithm in a prescribed preference order. To identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then carefully decompose this maximum flow to two edge-disjoint (near-)matchings. These two offline solutions are used to characterize an upper bound for the optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival distribution.
Display advertisements on the web are sold via ad exchanges that use real-time auctions. We describe the challenges of designing a suitable auction, and present a simple auction called the Optional Second Price (OSP) auction that is currently used in Doubleclick Ad Exchange.
First and second order diffusive methods for rapid, coarse, distributed load balancing (extended abstract). Bhaskar Ghosh (Informix Software, Inc.), S. Muthukrishnan (U. Warwick), Martin H. Schultz (Yale U.). In SPAA '96: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, June 1996, pages 72–81. https://doi.org/10.1145/237502.237509
In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and quality of allocating reservation ads. In this paper, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids. Ad Exchanges like RightMedia, AdECN or DoubleClick are an emerging market for the real-time sale of display advertising slots in publishing sites on the Internet. While exchanges differ in their implementations, in a generic Ad Exchange (AdX), publishers post an ad slot with a reservation price, advertisers post bids, and an auction is run; this happens between the time a user visits a page and the ad is displayed.
Recent research explores incorporating knowledge graphs (KG) into e-commerce recommender systems, not only to achieve better recommendation performance, but more importantly to generate explanations of why particular decisions are made. This can be achieved by explicit KG reasoning, where a model starts from a user node, sequentially determines the next step, and walks towards an item node of potential interest to the user. However, this is challenging due to the huge search space, unknown destination, and sparse signals over the KG, so informative and effective guidance is needed to achieve a satisfactory recommendation quality. To this end, we propose a CoArse-to-FinE neural symbolic reasoning approach (CAFE). It first generates user profiles as coarse sketches of user behaviors, which subsequently guide a path-finding process to derive reasoning paths for recommendations as fine-grained predictions. User profiles can capture prominent user behaviors from the history, and provide valuable signals about which kinds of path patterns are more likely to lead to potential items of interest for the user. To better exploit the user profiles, an improved path-finding algorithm called Profile-guided Path Reasoning (PPR) is also developed, which leverages an inventory of neural symbolic reasoning modules to effectively and efficiently find a batch of paths over a large-scale KG. We extensively experiment on four real-world benchmarks and observe substantial gains in the recommendation performance compared with state-of-the-art methods.
Many advertisers buy advertisements (ads) on the Internet or on traditional media and seek simple, online mechanisms to reserve ad slots in advance. Media publishers represent a vast and varying inventory, and they too seek automatic, online mechanisms for pricing and allocating such reservations. In this paper, we present and study a simple model for auctioning such ad slots in advance. Bidders arrive sequentially and report which slots they are interested in. The seller must decide immediately whether or not to grant a reservation. Our model allows a seller to accept reservations, but possibly cancel the allocations later and pay the bidder a cancellation compensation (bump payment). Our main result is an online mechanism to derive prices and bump payments that is efficient to implement. This mechanism has many desirable properties. It is individually rational; winners have an incentive to be honest and bidding one's true value dominates any lower bid. Our mechanism's efficiency is within a constant fraction of the a posteriori optimally efficient solution. Its revenue is within a constant fraction of the a posteriori revenue of the Vickrey-Clarke-Groves mechanism. Our results make no assumptions about the order of arrival of bids or the value distribution of bidders and still hold if the items for sale are elements of a matroid, a more general setting than slot allocation.
Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive a $\DeclareMathOperator{\poly}{poly} O((K + L) \poly(d) Δ^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $Δ$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend on $K L$. To the best of our knowledge, this is the first such result in the literature.
A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, and a range space ${\cal R} \subseteq 2^{P}$. Given a query range $R \in {\cal R}$, the target output is $R(\vec{x}) = \sum_{p \in R}{x_p}$. Range counting for different range spaces is a central problem in Computational Geometry. We study $(\epsilon, \delta)$-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an $(\epsilon, \delta)$-differentially private algorithm for halfspace counting in $d$ dimensions which achieves $O(n^{1-1/d})$ average squared error. This contrasts with the $\Omega(n)$ lower bound established by the classical result of Dinur and Nissim [PODS 2003] for arbitrary subset counting queries. We also show a matching lower bound on average squared error for any $(\epsilon, \delta)$-differentially private algorithm for halfspace counting. Both bounds are obtained using discrepancy theory. For the lower bound, we use a modified discrepancy measure and bound approximation of $(\epsilon, \delta)$-differentially private algorithms for range counting queries in terms of this discrepancy. We also relate the modified discrepancy measure to classical combinatorial discrepancy, which allows us to exploit known discrepancy lower bounds. This approach also yields a lower bound of $\Omega((\log n)^{d-1})$ for $(\epsilon, \delta)$-differentially private orthogonal range counting in $d$ dimensions, the first known superconstant lower bound for this problem. For the upper bound, we use an approach inspired by partial coloring methods for proving discrepancy upper bounds, and obtain $(\epsilon, \delta)$-differentially private algorithms for range counting with polynomially bounded shatter function range spaces.
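For contrast with the discrepancy-based bounds above, the following is the standard Gaussian-mechanism baseline for answering a single counting query under $(\epsilon, \delta)$-differential privacy, assuming one record changes the count by at most 1 and $0 < \epsilon < 1$; it is not the paper's algorithm, only the generic noise-addition it is compared against.

```python
import numpy as np

def gaussian_counting_query(true_count, eps, delta, sensitivity=1.0, rng=None):
    """Standard Gaussian mechanism for one counting query under (eps, delta)-DP.

    Assumes the query has sensitivity `sensitivity` (one record changes the
    count by at most that much) and 0 < eps < 1.  Not the discrepancy-based
    algorithm of the paper.
    """
    rng = np.random.default_rng(rng)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return true_count + rng.normal(0.0, sigma)

print(gaussian_counting_query(1234, eps=0.5, delta=1e-6, rng=0))
```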
Estimating the size of the maximum matching is a canonical problem in graph algorithms, and one that has attracted extensive study over a range of different computational models. We present improved streaming algorithms for approximating the size of maximum matching with sparse (bounded arboricity) graphs. * Insert-Only Streams: We present a one-pass algorithm that takes O(c log^2 n) space and approximates the size of the maximum matching in graphs with arboricity c within a factor of O(c). This improves significantly on the state-of-the-art O~(cn^{2/3})-space streaming algorithms. * Dynamic Streams: Given a dynamic graph stream (i.e., inserts and deletes) of edges of an underlying c-bounded arboricity graph, we present a one-pass algorithm that uses space O~(c^{10/3}n^{2/3}) and returns an O(c)-estimator for the size of the maximum matching. This algorithm improves on the state-of-the-art O~(cn^{4/5})-space algorithms, where the O~(.) notation hides dependencies logarithmic in $n$. In contrast to the previous works, our results take more advantage of the streaming access to the input and characterize the matching size based on the ordering of the edges in the stream in addition to the degree distributions and structural properties of the sparse graphs.
We consider the "Offline Ad Slot Scheduling" problem, where advertisers must be scheduled to "sponsored search" slots during a given period of time. Advertisers specify a budget constraint, as well as a maximum cost per click, and may not be assigned to more than one slot for a particular search. We give a truthful mechanism under the utility model where bidders try to maximize their clicks, subject to their personal constraints. In addition, we show that the revenue-maximizing mechanism is not truthful, but has a Nash equilibrium whose outcome is identical to our mechanism. As far as we can tell, this is the first treatment of sponsored search that directly incorporates both multiple slots and budget constraints into an analysis of incentives. Our mechanism employs a descending-price auction that maintains a solution to a certain machine scheduling problem whose job lengths depend on the price, and hence vary over the course of the auction. The price descent stops when the set of bidders that can afford that price packs exactly into a block of ad slots, at which point the mechanism allocates that block and continues on the remaining slots. To prove our result on the equilibrium of the revenue-maximizing mechanism, we first show that a greedy algorithm suffices to solve the revenue-maximizing linear program; we then use this insight to prove that bidders allocated in the same block of our mechanism have no incentive to deviate from bidding the fixed price of that block.
We study which property testing and sublinear time algorithms can be transformed into graph streaming algorithms for random order streams. Our main result is that for bounded degree graphs, any property that is constant-query testable in the adjacency list model can be tested with constant space in a single-pass in random order streams. Our result is obtained by estimating the distribution of local neighborhoods of the vertices on a random order graph stream using constant space. We then show that our approach can also be applied to constant time approximation algorithms for bounded degree graphs in the adjacency list model: As an example, we obtain a constant-space single-pass random order streaming algorithm for approximating the size of a maximum matching with additive error $εn$ ($n$ is the number of nodes). Our result establishes for the first time that a large class of sublinear algorithms can be simulated in random order streams, while $Ω(n)$ space is needed for many graph streaming problems for adversarial orders.
Ad auctions in sponsored search support ``broad match'' that allows an advertiser to target a large number of queries while bidding only on a limited number. While giving more expressiveness to advertisers, this feature makes it challenging to optimize bids to maximize their returns: choosing to bid on a query as a broad match because it provides high profit results in bidding on related queries, which may yield low or even negative profits. We abstract and study the complexity of the {\em bid optimization problem} which is to determine an advertiser's bids on a subset of keywords (possibly using broad match) so that her profit is maximized. In the query language model when the advertiser is allowed to bid on all queries as broad match, we present a linear programming (LP)-based polynomial-time algorithm that gets the optimal profit. In the model in which an advertiser can only bid on keywords, i.e., a subset of keywords as an exact or broad match, we show that this problem is not approximable within any reasonable approximation factor unless P=NP. To deal with this hardness result, we present a constant-factor approximation when the optimal profit significantly exceeds the cost. This algorithm is based on rounding a natural LP formulation of the problem. Finally, we study a budgeted variant of the problem, and show that in the query language model, one can find two budget constrained ad campaigns in polynomial time that implement the optimal bidding strategy. Our results are the first to address bid optimization under the broad match feature which is common in ad auctions.
Recent work on recommender systems has considered external knowledge graphs as valuable sources of information, not only to produce better recommendations but also to provide explanations of why the recommended items were chosen. Pure rule-based symbolic methods provide a transparent reasoning process over the knowledge graph but lack generalization ability to unseen examples, while deep learning models offer powerful feature representation ability but are hard to interpret. Moreover, direct reasoning over a large-scale knowledge graph can be costly due to the huge search space of pathfinding. We approach the problem through a novel coarse-to-fine neural symbolic reasoning method called NSER. It first generates a coarse-grained explanation to capture abstract user behavioral patterns, followed by a fine-grained explanation accompanied by explicit reasoning paths and recommendations inferred from the knowledge graph. We extensively experiment on four real-world datasets and observe substantial gains in recommendation performance compared with state-of-the-art methods, as well as more diversified explanations at different granularities.
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of ``components.'' Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m \times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-δ$: $$ ||A-A'||_F \leq (1+ε) ||A-A_k||_F, $$ where $A_k$ is the ``best'' rank-$k$ approximation provided by truncating the singular value decomposition (SVD) of $A$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/ε$, and $\log(1/δ)$. Our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple, they take time of the order needed to approximately compute the top $k$ singular vectors of $A$, and they use a novel, intuitive sampling method called ``subspace sampling.''
Ads on the Internet are increasingly sold via ad exchanges such as RightMedia, AdECN and Doubleclick Ad Exchange. These exchanges allow real-time bidding, that is, each time the publisher contacts the exchange, the exchange ``calls out'' to solicit bids from ad networks. This solicitation of bids introduces a novel aspect, in contrast to the existing literature, and suggests a joint optimization framework which optimizes over the allocation as well as the solicitation. We model this selective call out as an online recurrent Bayesian decision framework with bandwidth-type constraints. We obtain natural algorithms with bounded performance guarantees for several natural optimization criteria. We show that these results hold under different call out constraint models and different arrival processes. Interestingly, the paper shows that under MHR assumptions, the expected revenue of the generalized second price auction with reserve is a constant factor of the expected welfare. The analysis herein also allows us to prove adaptivity-gap-type results for the adwords problem.
We study a generalization of the classical median finding problem to batched query case: given an array of unsorted $n$ items and $k$ (not necessarily disjoint) intervals in the array, the goal is to determine the median in {\em each} of the intervals in the array. We give an algorithm that uses $O(n\log n + k\log k \log n)$ comparisons and show a lower bound of $\Omega(n\log k)$ comparisons for this problem. This is optimal for $k=O(n/\log n)$.
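For reference, the baseline that the stated bound improves on: answer each query independently with a selection routine, at linear expected cost per interval. The sketch below (numpy's partition as the selection routine) is only this baseline, not the paper's algorithm.

```python
import numpy as np

def interval_medians(a, intervals):
    """Answer batched median queries directly with per-interval selection.

    Each interval [lo, hi) costs O(hi - lo) expected comparisons via
    quickselect (np.partition); the paper's algorithm answers all k queries
    in O(n log n + k log k log n) comparisons overall, which is better when
    the intervals are long and overlapping.
    """
    out = []
    for lo, hi in intervals:          # inclusive-exclusive [lo, hi)
        block = np.asarray(a[lo:hi])
        mid = (len(block) - 1) // 2   # lower median
        out.append(np.partition(block, mid)[mid])
    return out

print(interval_medians([5, 1, 4, 2, 8, 7, 3], [(0, 4), (2, 7)]))  # -> [2, 4]
```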
We design algorithms for computing approximately revenue-maximizing {\em sequential posted-pricing mechanisms (SPM)} in $K$-unit auctions, in a standard Bayesian model. A seller has $K$ copies of an item to sell, and there are $n$ buyers, each interested in only one copy, who have some value for the item. The seller must post a price for each buyer, the buyers arrive in a sequence enforced by the seller, and a buyer buys the item if its value exceeds the price posted to it. The seller does not know the values of the buyers, but has Bayesian information about them. An SPM specifies the ordering of buyers and the posted prices, and may be {\em adaptive} or {\em non-adaptive} in its behavior. The goal is to design SPM in polynomial time to maximize expected revenue. We compare against the expected revenue of optimal SPM, and provide a polynomial time approximation scheme (PTAS) for both non-adaptive and adaptive SPMs. This is achieved by two algorithms: an efficient algorithm that gives a $(1-\frac{1}{\sqrt{2\pi K}})$-approximation (and hence a PTAS for sufficiently large $K$), and another that is a PTAS for constant $K$. The first algorithm yields a non-adaptive SPM that yields its approximation guarantees against an optimal adaptive SPM -- this implies that the {\em adaptivity gap} in SPMs vanishes as $K$ becomes larger.
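The mechanics of a non-adaptive SPM are easy to simulate, which is useful for sanity-checking a candidate ordering and price vector against sampled buyer values; the value distributions and parameters below are hypothetical stand-ins, and the paper's PTAS for choosing the prices and ordering is not reproduced here.

```python
import numpy as np

def run_spm(prices, order, value_sampler, K, rng=None):
    """Simulate one run of a non-adaptive sequential posted-pricing mechanism.

    Buyers are approached in `order`; buyer i accepts if a fresh sample of its
    value is at least prices[i], and selling stops after K copies are sold.
    `value_sampler(i, rng)` stands in for buyer i's known value distribution.
    """
    rng = np.random.default_rng(rng)
    revenue, sold = 0.0, 0
    for i in order:
        if sold >= K:
            break
        if value_sampler(i, rng) >= prices[i]:
            revenue += prices[i]
            sold += 1
    return revenue

# Hypothetical instance: 5 buyers with exponential values, 2 copies for sale.
sampler = lambda i, rng: rng.exponential(1.0 + 0.1 * i)
print(np.mean([run_spm([0.8] * 5, range(5), sampler, K=2, rng=s)
               for s in range(1000)]))
```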
Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers, namely the LaTeX source, which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, as well as highlighting key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography.
In many cases, recommendations are consumed by groups of users rather than individuals. In this paper, we present a system which recommends social events to groups. The system helps groups to organize a joint activity and collectively select which activity to perform among several possible options. We also facilitate consensus making, following the principles of group consensus decision making. Our system allows users to asynchronously vote, add and comment on alternatives. We observe social influence within groups through post-recommendation feedback during the group decision making process. We propose a decision cascading model and estimate such social influence, which can be used to improve the performance of group recommendation. We conduct experiments to measure the prediction performance of our model. The results show that the model performs better than an independent decision-making model.
A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then subsequently solve the optimization problem for the given demand model. This incurs linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.
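The object being optimized can be sketched as follows: given an ordering of networks and posted prices, each impression goes to the first network whose bid meets its price. The demand model (`bid_sampler`) is a hypothetical stand-in for what the paper's algorithm learns online by experimentation.

```python
import numpy as np

def waterfall_revenue(order, prices, bid_sampler, n_impressions=10000, rng=None):
    """Estimate the expected revenue of one waterfall design by simulation.

    For each impression, networks are offered the inventory in `order` at
    their posted prices; the first network whose (random) bid meets its price
    buys at that price.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_impressions):
        for net in order:
            if bid_sampler(net, rng) >= prices[net]:
                total += prices[net]
                break
    return total / n_impressions

# Hypothetical two-network example.
sampler = lambda net, rng: rng.uniform(0, 1.0 if net == "A" else 2.0)
print(waterfall_revenue(["B", "A"], {"A": 0.5, "B": 1.2}, sampler, rng=0))
```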
In sponsored search, a number of advertising slots are available on a search results page and have to be allocated among a set of advertisers competing to display an ad on the page. This gives rise to a bipartite matching market that is typically cleared by way of an automated auction. Several auction mechanisms have been proposed, with variants of the Generalized Second Price (GSP) being widely used in practice. A rich body of work on bipartite matching markets builds upon the stable marriage model of Gale and Shapley and the assignment model of Shapley and Shubik. We apply insights from this line of research into the structure of stable outcomes and their incentive properties to advertising auctions. We model advertising auctions in terms of an assignment model with linear utilities, extended with bidder and item specific maximum and minimum prices. Auction mechanisms like the commonly used GSP or the well-known Vickrey-Clarke-Groves (VCG) are interpreted as simply computing a \emph{bidder-optimal stable matching} in this model, for a suitably defined set of bidder preferences. In our model, the existence of a stable matching is guaranteed, and under a non-degeneracy assumption a bidder-optimal stable matching exists as well. We give an algorithm to find such a matching in polynomial time, and use it to design a truthful mechanism that generalizes GSP, is truthful for profit-maximizing bidders, implements features like bidder-specific minimum prices and position-specific bids, and works for rich mixtures of bidders and preferences.
In this work, we aim to understand the mechanisms driving academic collaboration. We begin by building a model, which we call the h-Reinvestment model, for how researchers split their effort between multiple papers and how collaboration affects the number of citations a paper receives, supported by observations from a large real-world publication and citation dataset. Using tools from the field of Game Theory, we study researchers' collaborative behavior over time under this model, with the premise that each researcher wants to maximize his or her academic success. We find analytically that there is a strong incentive to collaborate rather than work in isolation, and that studying collaborative behavior through a game-theoretic lens is a promising approach to help us better understand the nature and dynamics of academic collaboration.
Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author.
A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely-used example which aims to improve over simple metrics such as raw counts of papers or citations. However, a limitation of this measure is that it considers authors in isolation and does not account for contributions through a collaborative team. To address this, we propose a natural variant that we dub the Social h-index. The idea is to redistribute the h-index score to reflect an individual's impact on the research community. In addition to describing this new measure, we provide examples, discuss its properties, and contrast with other measures.
Sponsored search involves running an auction among advertisers who bid in order to have their ad shown next to search results for specific keywords. Currently, the most popular auction for sponsored search is the "Generalized Second Price" (GSP) auction in which advertisers are assigned to slots in the decreasing order of their "score," which is defined as the product of their bid and click-through rate. In the past few years, there has been significant research on the game-theoretic issues that arise in an advertiser's interaction with the mechanism as well as possible redesigns of the mechanism, but this ranking order has remained standard. From a search engine's perspective, the fundamental question is: what is the best assignment of advertisers to slots? Here "best" could mean "maximizing user satisfaction," "most efficient," "revenue-maximizing," "simplest to interact with," or a combination of these. To answer this question we need to understand the behavior of a search engine user when she sees the displayed ads, since that defines the commodity the advertisers are bidding on, and its value. Most prior work has assumed that the probability of a user clicking on an ad is independent of the other ads shown on the page. We propose a simple Markovian user model that does not make this assumption. We then present an algorithm to determine the most efficient assignment under this model, which turns out to be different from that of GSP. A truthful auction then follows from an application of the Vickrey-Clarke-Groves (VCG) mechanism. Further, we show that our assignment has many of the desirable properties of GSP that makes bidding intuitive. At the technical core of our result are a number of insights about the structure of the optimal assignment.
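Under one reading of such a Markovian user model, the user scans slots top-down, clicks the ad in the current slot with that ad's click probability, and continues past it with an ad-specific continuation probability; the expected value of an ordering then factorizes as below. This is a hedged sketch for comparing orderings by brute force, not the paper's assignment algorithm.

```python
from itertools import permutations

def expected_value(ordering, click_prob, cont_prob, value_per_click):
    """Expected value of showing ads in `ordering` under a simple Markovian
    cascade: the user examines slots top-down, clicks ad a with probability
    click_prob[a], and continues past it with probability cont_prob[a]."""
    reach = 1.0          # probability the user reaches the current slot
    total = 0.0
    for a in ordering:
        total += reach * click_prob[a] * value_per_click[a]
        reach *= cont_prob[a]
    return total

# Hypothetical three-ad example: brute-force the best ordering, which need
# not coincide with sorting by click_prob * value (the GSP-style score).
ads = ["x", "y", "z"]
c = {"x": 0.2, "y": 0.1, "z": 0.05}
q = {"x": 0.3, "y": 0.9, "z": 0.8}
v = {"x": 1.0, "y": 2.0, "z": 4.0}
print(max(permutations(ads), key=lambda o: expected_value(o, c, q, v)))
```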
Recent research explores incorporating knowledge graphs (KG) into e-commerce recommender systems, not only to achieve better recommendation performance, but more importantly to generate explanations of why particular decisions are made. … Recent research explores incorporating knowledge graphs (KG) into e-commerce recommender systems, not only to achieve better recommendation performance, but more importantly to generate explanations of why particular decisions are made. This can be achieved by explicit KG reasoning, where a model starts from a user node, sequentially determines the next step, and walks towards an item node of potential interest to the user. However, this is challenging due to the huge search space, unknown destination, and sparse signals over the KG, so informative and effective guidance is needed to achieve a satisfactory recommendation quality. To this end, we propose a CoArse-to-FinE neural symbolic reasoning approach (CAFE). It first generates user profiles as coarse sketches of user behaviors, which subsequently guide a path-finding process to derive reasoning paths for recommendations as fine-grained predictions. User profiles can capture prominent user behaviors from the history, and provide valuable signals about which kinds of path patterns are more likely to lead to potential items of interest for the user. To better exploit the user profiles, an improved path-finding algorithm called Profile-guided Path Reasoning (PPR) is also developed, which leverages an inventory of neural symbolic reasoning modules to effectively and efficiently find a batch of paths over a large-scale KG. We extensively experiment on four real-world benchmarks and observe substantial gains in the recommendation performance compared with state-of-the-art methods.
Recent work on recommender systems has considered external knowledge graphs as valuable sources of information, not only to produce better recommendations but also to provide explanations of why the recommended … Recent work on recommender systems has considered external knowledge graphs as valuable sources of information, not only to produce better recommendations but also to provide explanations of why the recommended items were chosen. Pure rule-based symbolic methods provide a transparent reasoning process over knowledge graph but lack generalization ability to unseen examples, while deep learning models enhance powerful feature representation ability but are hard to interpret. Moreover, direct reasoning over large-scale knowledge graph can be costly due to the huge search space of pathfinding. We approach the problem through a novel coarse-to-fine neural symbolic reasoning method called NSER. It first generates a coarse-grained explanation to capture abstract user behavioral pattern, followed by a fined-grained explanation accompanying with explicit reasoning paths and recommendations inferred from knowledge graph. We extensively experiment on four real-world datasets and observe substantial gains of recommendation performance compared with state-of-the-art methods as well as more diversified explanations in different granularity.
Recent advances in personalized recommendation have sparked great interest in the exploitation of rich structured information provided by knowledge graphs. Unlike most existing approaches that only focus on leveraging knowledge … Recent advances in personalized recommendation have sparked great interest in the exploitation of rich structured information provided by knowledge graphs. Unlike most existing approaches that only focus on leveraging knowledge graphs for more accurate recommendation, we aim to conduct explicit reasoning with knowledge for decision making so that the recommendations are generated and supported by an interpretable causal inference procedure. To this end, we propose a method called Policy-Guided Path Reasoning (PGPR), which couples recommendation and interpretability by providing actual paths in a knowledge graph. Our contributions include four aspects. We first highlight the significance of incorporating knowledge graphs into recommendation to formally define and interpret the reasoning process. Second, we propose a reinforcement learning (RL) approach featured by an innovative soft reward strategy, user-conditional action pruning and a multi-hop scoring function. Third, we design a policy-guided graph search algorithm to efficiently and effectively sample reasoning paths for recommendation. Finally, we extensively evaluate our method on several large-scale real-world benchmark datasets, obtaining favorable results compared with state-of-the-art methods.
A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that … A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then subsequently solve the optimization problem for the given demand model. This will incur a linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.
Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies … Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use models of user interaction with the list of items, the so-called click models, to construct estimators that learn statistically efficiently. We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds. We evaluate our estimators in a series of experiments on a real-world dataset and show that they consistently outperform prior estimators.
We study which property testing and sublinear time algorithms can be transformed into graph streaming algorithms for random order streams. Our main result is that for bounded degree graphs, any … We study which property testing and sublinear time algorithms can be transformed into graph streaming algorithms for random order streams. Our main result is that for bounded degree graphs, any property that is constant-query testable in the adjacency list model can be tested with constant space in a single pass over random order streams. Our result is obtained by estimating the distribution of local neighborhoods of the vertices on a random order graph stream using constant space. We then show that our approach can also be applied to constant time approximation algorithms for bounded degree graphs in the adjacency list model: As an example, we obtain a constant-space single-pass random order streaming algorithm for approximating the size of a maximum matching with additive error $\epsilon n$ ($n$ is the number of nodes). Our result establishes for the first time that a large class of sublinear algorithms can be simulated in random order streams, while $\Omega(n)$ space is needed for many graph streaming problems for adversarial orders.
Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential … Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive an $O((K + L)\,\mathrm{poly}(d)\,\Delta^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $\Delta$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend on $K L$. To the best of our knowledge, this is the first such result in the literature.
Estimating the size of the maximum matching is a canonical problem in graph algorithms, and one that has attracted extensive study over a range of different computational models. We present … Estimating the size of the maximum matching is a canonical problem in graph algorithms, and one that has attracted extensive study over a range of different computational models. We present improved streaming algorithms for approximating the size of maximum matching with sparse (bounded arboricity) graphs. * Insert-Only Streams: We present a one-pass algorithm that takes O(c log^2 n) space and approximates the size of the maximum matching in graphs with arboricity c within a factor of O(c). This improves significantly on the state-of-the-art O~(cn^{2/3})-space streaming algorithms. * Dynamic Streams: Given a dynamic graph stream (i.e., inserts and deletes) of edges of an underlying c-bounded arboricity graph, we present a one-pass algorithm that uses space O~(c^{10/3}n^{2/3}) and returns an O(c)-estimator for the size of the maximum matching. This algorithm improves the state-of-the-art O~(cn^{4/5})-space algorithms, where the O~(.) notation hides logarithmic in $n$ dependencies. In contrast to the previous works, our results take more advantage of the streaming access to the input and characterize the matching size based on the ordering of the edges in the stream in addition to the degree distributions and structural properties of the sparse graphs.
It is clear from the growing role of ad exchanges in the real-time sale of advertising slots that Web publishers are considering a new alternative to their more traditional reservation-based … It is clear from the growing role of ad exchanges in the real-time sale of advertising slots that Web publishers are considering a new alternative to their more traditional reservation-based ad contracts. To make this choice, the publisher must trade off, in real-time, the short-term revenue from ad exchange with the long-term benefits of delivering good spots to the reservation ads. In this paper we formalize this combined optimization problem as a multiobjective stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange prices. We prove the asymptotic optimality of this policy in terms of any arbitrary trade-off between the quality of delivered reservation ads and revenue from the exchange, and we show that our policy approximates any Pareto-optimal point on the quality-versus-revenue curve. Experimental results on data derived from real publisher inventory confirm that there are significant benefits for publishers if they jointly optimize over both channels. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2014.2017 . This paper was accepted by Dimitris Bertsimas, optimization.
We present the problem of finding comparable researchers for any given researcher. This problem has many motivations. Firstly, know thyself. The answers to where we stand in the research community and … We present the problem of finding comparable researchers for any given researcher. This problem has many motivations. Firstly, know thyself. The answers to where we stand in the research community, and whom we are most alike, may not be easily found through existing evaluations of one's research, which are based mainly on citation counts. Secondly, there are many situations where one needs to find comparable researchers, e.g., for reviewing peers, constructing program committees, or compiling teams for grants. This is often done on an ad hoc and informal basis. Utilizing the large-scale scholarly data accessible on the web, we address the problem of automatically finding comparable researchers. We propose a standard to quantify the quality of research output, via the quality of publishing venues. We represent a researcher as a sequence of her publication records, and develop a framework for comparing researchers by sequence matching. Several variations of comparison are considered, including matching by quality of publication venue and research topics, and performing prefix matching. We evaluate our methods on a large corpus and demonstrate their effectiveness through examples. In the end, we identify several promising directions for further work.
Over the past decade, advertising has emerged as the primary source of revenue for many web sites and apps. In this paper we report a first-of-its-kind study that seeks to … Over the past decade, advertising has emerged as the primary source of revenue for many web sites and apps. In this paper we report a first-of-its-kind study that seeks to broadly understand the features, mechanisms and dynamics of display advertising on the web - i.e., the Adscape. Our study takes the perspective of users who are the targets of display ads shown on web sites. We develop a scalable crawling capability that enables us to gather the details of display ads including creatives and landing pages. Our crawling strategy is focused on maximizing the number of unique ads harvested. Of critical importance to our study is the recognition that a user's profile (i.e., browser profile and cookies) can have a significant impact on which ads are shown. We deploy our crawler over a variety of websites and profiles, and this yields over 175K distinct display ads. We find that while targeting is widely used, there remain many instances in which delivered ads do not depend on user profile; further, ads vary more over user profiles than over websites. We also assess the population of advertisers seen and identify over 3.7K distinct entities from a variety of business segments. Finally, we find that when targeting is used, the specific types of ads delivered generally correspond with the details of user profiles and with users' patterns of visits.
In this work, we aim to understand the mechanisms driving academic collaboration. We begin by building a model for how researchers split their effort between multiple papers, and how collaboration … In this work, we aim to understand the mechanisms driving academic collaboration. We begin by building a model, which we call the h-Reinvestment model, for how researchers split their effort between multiple papers and how collaboration affects the number of citations a paper receives; the model is supported by observations from a large real-world publication and citation dataset. Using tools from the field of Game Theory, we study researchers' collaborative behavior over time under this model, with the premise that each researcher wants to maximize his or her academic success. We find analytically that there is a strong incentive to collaborate rather than work in isolation, and that studying collaborative behavior through a game-theoretic lens is a promising approach to help us better understand the nature and dynamics of academic collaboration.
Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related … Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author.
In many cases, recommendations are consumed by groups of users rather than individuals. In this paper, we present a system which recommends social events to groups. The system helps groups … In many cases, recommendations are consumed by groups of users rather than individuals. In this paper, we present a system which recommends social events to groups. The system helps groups to organize a joint activity and collectively select which activity to perform among several possible options. We also facilitate consensus making, following the principle of group consensus decision making. Our system allows users to asynchronously vote, add and comment on alternatives. We observe social influence within groups through post-recommendation feedback during the group decision making process. We propose a decision cascading model and estimate such social influence, which can be used to improve the performance of group recommendation. We conduct experiments to measure the prediction performance of our model. The results show that our model achieves better results than the independent decision making model.
A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely-used example which aims to improve over … A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely-used example which aims to improve over simple metrics such as raw counts of papers or citations. However, a limitation of this measure is that it considers authors in isolation and does not account for contributions through a collaborative team. To address this, we propose a natural variant that we dub the Social h-index. The idea is to redistribute the h-index score to reflect an individual's impact on the research community. In addition to describing this new measure, we provide examples, discuss its properties, and contrast with other measures.
A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, and a range space $\mathcal{R} \subseteq 2^{P}$. Given a query range $R \in \mathcal{R}$, the output is $R(x) = \sum_{p \in R} x_p$. The average squared error of an algorithm $A$ is $\frac{1}{|\mathcal{R}|}\sum_{R \in \mathcal{R}}(A(R, x) - R(x))^2$. Range counting for different range spaces is a central problem in Computational Geometry. We study $(\epsilon, \delta)$-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an $(\epsilon, \delta)$-differentially private algorithm for halfspace counting in $d$ dimensions which is $O(n^{1-1/d})$ approximate for average squared error. This contrasts with the $\Omega(n)$ lower bound established by the classical result of Dinur and Nissim on approximation for arbitrary subset counting queries. We also show a matching lower bound of $\Omega(n^{1-1/d})$ approximation for any $(\epsilon, \delta)$-differentially private algorithm for halfspace counting.
Display advertisements on the web are sold via ad exchanges that use real-time auctions. We describe the challenges of designing a suitable auction, and present a simple auction called … Display advertisements on the web are sold via ad exchanges that use real-time auctions. We describe the challenges of designing a suitable auction, and present a simple auction called the Optional Second Price (OSP) auction that is currently used in the Doubleclick Ad Exchange.
We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal … We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve $O(\log T)$ regret after $T$ time steps. However, proposed methods either have a computation complexity per iteration that scales linearly with $T$ or achieve regrets that grow linearly with the number of contexts $|\mathcal{X}|$. We propose an $\epsilon$-greedy type of algorithm that solves both limitations. In particular, when contexts are variables in $\mathbb{R}^d$, we prove that our algorithm has a constant computation complexity per iteration of $O(\mathrm{poly}(d))$ and can achieve a regret of $O(\mathrm{poly}(d) \log T)$ even when $|\mathcal{X}| = \Omega(2^d)$. In addition, unlike previous algorithms, its space complexity scales like $O(Kd^2)$ and does not grow with $T$.
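To make the above concrete, here is a minimal Python sketch of an $\epsilon$-greedy linear contextual bandit: per-arm regularized least-squares estimates are maintained, and the exploration probability decays over time. The class name, the regularization, and the schedule $\epsilon_t = \min(1, c/t)$ are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal epsilon-greedy linear contextual bandit sketch (illustrative only).
# Assumes K arms, d-dimensional contexts, and per-arm linear payoffs
# r_t = <x_t, theta_a> + noise. The exploration schedule eps_t = min(1, c/t)
# is a common heuristic choice, not taken from the paper.
class EpsGreedyLinBandit:
    def __init__(self, n_arms, dim, reg=1.0, c=5.0):
        self.c = c
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]   # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]       # per-arm response sums
        self.t = 0

    def choose(self, x, rng):
        self.t += 1
        eps = min(1.0, self.c / self.t)                       # decaying exploration
        if rng.random() < eps:
            return int(rng.integers(len(self.A)))
        theta = [np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax([x @ th for th in theta]))       # exploit current estimates

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)                         # rank-one update, O(d^2) per step
        self.b[arm] += reward * x
```

The per-step cost depends only on the context dimension and the number of arms, not on the horizon, which is the property emphasized in the abstract.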
A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, … A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, and a range space ${\cal R} \subseteq 2^{P}$. Given a query range $R \in {\cal R}$, the target output is $R(\vec{x}) = \sum_{p \in R}{x_p}$. Range counting for different range spaces is a central problem in Computational Geometry. We study $(\epsilon, \delta)$-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an $(\epsilon, \delta)$-differentially private algorithm for halfspace counting in $d$ dimensions which achieves $O(n^{1-1/d})$ average squared error. This contrasts with the $\Omega(n)$ lower bound established by the classical result of Dinur and Nissim [PODS 2003] for arbitrary subset counting queries. We also show a matching lower bound on average squared error for any $(\epsilon, \delta)$-differentially private algorithm for halfspace counting. Both bounds are obtained using discrepancy theory. For the lower bound, we use a modified discrepancy measure and bound approximation of $(\epsilon, \delta)$-differentially private algorithms for range counting queries in terms of this discrepancy. We also relate the modified discrepancy measure to classical combinatorial discrepancy, which allows us to exploit known discrepancy lower bounds. This approach also yields a lower bound of $\Omega((\log n)^{d-1})$ for $(\epsilon, \delta)$-differentially private orthogonal range counting in $d$ dimensions, the first known superconstant lower bound for this problem. For the upper bound, we use an approach inspired by partial coloring methods for proving discrepancy upper bounds, and obtain $(\epsilon, \delta)$-differentially private algorithms for range counting with polynomially bounded shatter function range spaces.
We study cardinal auctions for selling multiple copies of a good, in which bidders specify not only their bid or how much they are ready to pay for the good, … We study cardinal auctions for selling multiple copies of a good, in which bidders specify not only their bid or how much they are ready to pay for the good, but also a cardinality constraint on the number of copies that will be sold via the auction. We perform the first known Price of Anarchy type analyses, with a detailed comparison of the classical Vickrey-Clarke-Groves (VCG) auction and one based on the minimum pay property (MPP), which is similar to the Generalized Second Price auction commonly used in sponsored search. Without cardinality constraints, MPP has the same efficiency (total value to bidders) and at least as much revenue (total income to the auctioneer) as VCG; this also holds for certain other generalizations of MPP (e.g., prefix constrained auctions, as we show here). In contrast, our main results are that, with cardinality constraints, (a) the equilibrium efficiency of MPP is 1/2 of that of VCG and this factor is tight, and (b) in equilibrium MPP may collect as little as 1/2 the revenue of VCG. These aspects arise because, in the presence of cardinality constraints, more strategies are available to bidders in MPP, including bidding above their value, and this makes the analyses nontrivial.
Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas … Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large-scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers, the LaTeX source, which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, as well as highlighting key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography.
In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and … In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and quality of allocating reservation ads. In this paper, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids. Ad Exchanges like RightMedia, AdECN or DoubleClick are an emerging market for the real-time sale of display advertising slots in publishing sites on the Internet. While exchanges differ in their implementations, in a generic Ad Exchange (AdX), publishers post an ad slot with a reservation price, advertisers post bids, and an auction is run; this happens between the time a user visits a page and the ad is displayed.
Consider an input text string T[1,N] drawn from an unbounded alphabet. We study partial computation in suffix-based problems for Data Compression and Text Indexing such as (I) retrieve any segment … Consider an input text string T[1,N] drawn from an unbounded alphabet. We study partial computation in suffix-based problems for Data Compression and Text Indexing such as (I) retrieve any segment of K<=N consecutive symbols from the Burrows-Wheeler transform of T, and (II) retrieve any chunk of K<=N consecutive entries of the Suffix Array or the Suffix Tree. Prior literature would take O(N log N) comparisons (and time) to solve these problems by solving the total problem of building the entire Burrows-Wheeler transform or Text Index for T, and performing a post-processing to single out the wanted portion. We introduce a novel adaptive approach to partial computational problems above, and solve both the partial problems in O(K log K + N) comparisons and time, improving the best known running times of O(N log N) for K=o(N). These partial-computation problems are intimately related since they share a common bottleneck: the suffix multi-selection problem, which is to output the suffixes of rank r_1,r_2,...,r_K under the lexicographic order, where r_1
In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and … In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and quality of allocating reservation ads. In this paper, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids. We prove asymptotic optimality of this policy in terms of any trade-off between quality of delivered reservation ads and revenue from the exchange, and provide a rigorous bound for its convergence rate to the optimal policy. We also give experimental results on data derived from real publisher inventory, showing that our policy can achieve any pareto-optimal point on the quality vs. revenue curve. Finally, we study a parametric training-based algorithm in which instead of learning the dual variables from a sample data (as is done in non-parametric training-based algorithms), we learn the parameters of the distribution and construct those dual variables from the learned parameter values. We compare parametric and non-parametric ways to estimate from data both analytically and experimentally in the special case without the ad exchange, and show that though both methods converge to the optimal policy as the sample size grows, our parametric method converges faster, and thus performs better on smaller samples.
We design algorithms for computing approximately revenue-maximizing {\em sequential posted-pricing mechanisms (SPM)} in $K$-unit auctions, in a standard Bayesian model. A seller has $K$ copies of an item to sell, … We design algorithms for computing approximately revenue-maximizing {\em sequential posted-pricing mechanisms (SPM)} in $K$-unit auctions, in a standard Bayesian model. A seller has $K$ copies of an item to sell, and there are $n$ buyers, each interested in only one copy, who have some value for the item. The seller must post a price for each buyer, the buyers arrive in a sequence enforced by the seller, and a buyer buys the item if its value exceeds the price posted to it. The seller does not know the values of the buyers, but has Bayesian information about them. An SPM specifies the ordering of buyers and the posted prices, and may be {\em adaptive} or {\em non-adaptive} in its behavior. The goal is to design an SPM in polynomial time to maximize expected revenue. We compare against the expected revenue of the optimal SPM, and provide a polynomial time approximation scheme (PTAS) for both non-adaptive and adaptive SPMs. This is achieved by two algorithms: an efficient algorithm that gives a $(1-\frac{1}{\sqrt{2\pi K}})$-approximation (and hence a PTAS for sufficiently large $K$), and another that is a PTAS for constant $K$. The first algorithm yields a non-adaptive SPM that achieves its approximation guarantees against an optimal adaptive SPM -- this implies that the {\em adaptivity gap} in SPMs vanishes as $K$ becomes larger.
Ads on the Internet are increasingly sold via ad exchanges such as RightMedia, AdECN and Doubleclick Ad Exchange. These exchanges allow real-time bidding, that is, each time the publisher contacts … Ads on the Internet are increasingly sold via ad exchanges such as RightMedia, AdECN and Doubleclick Ad Exchange. These exchanges allow real-time bidding, that is, each time the publisher contacts the exchange, the exchange ``calls out'' to solicit bids from ad networks. This need to solicit bids introduces a novel aspect, in contrast to the existing literature, and suggests a joint optimization framework which optimizes over the allocation as well as the solicitation. We model this selective call out as an online recurrent Bayesian decision framework with bandwidth type constraints. We obtain natural algorithms with bounded performance guarantees for several natural optimization criteria. We show that these results hold under different call out constraint models, and different arrival processes. Interestingly, the paper shows that under MHR assumptions, the expected revenue of the generalized second price auction with reserve is a constant factor of the expected welfare. The analysis herein also allows us to prove adaptivity gap type results for the adwords problem.
We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, … We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani and Vazirani gives an approximation ratio of $1-1/e$. In the online, stochastic case when nodes are drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio, but still, no algorithm is known that beats the $1 - 1/e$ bound. Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking this $1 - {1/e}$ barrier. Furthermore, we show that no online algorithm can produce a $1-ε$ approximation for an arbitrarily small $ε$ for this problem. We employ a novel application of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the expected instance, and use both of them in the online algorithm in a prescribed preference order. To identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then carefully decompose this maximum flow to two edge-disjoint (near-)matchings. These two offline solutions are used to characterize an upper bound for the optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival distribution.
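The online phase of the two-choice idea above can be sketched in a few lines: assuming the two edge-disjoint (near-)matchings computed on the expected instance are already available as lookup tables, each arriving node tries its first-choice offline neighbor and falls back to its second choice. The construction of those tables (the boosted max-flow and its decomposition) is not shown, and the dictionary-based interface is our own simplification, not the authors' code.

```python
# Sketch of the online phase only: two precomputed edge-disjoint (near-)matchings
# on the expected instance serve as a preference order for arriving nodes.
# first_choice / second_choice map an online node *type* to an offline node;
# building them (max flow on the boosted graph + decomposition) is not shown here.
def online_match(arrivals, first_choice, second_choice):
    used = set()           # offline nodes already matched
    matching = []
    for t in arrivals:     # each arrival is a type drawn from the known distribution
        for pref in (first_choice.get(t), second_choice.get(t)):
            if pref is not None and pref not in used:
                used.add(pref)
                matching.append((t, pref))
                break      # node matched; otherwise it is dropped
    return matching
```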
Given string $S[1..N]$ and integer $k$, the {\em suffix selection} problem is to determine the $k$th lexicographically smallest amongst the suffixes $S[i... N]$, $1 \leq i \leq N$. We study … Given string $S[1..N]$ and integer $k$, the {\em suffix selection} problem is to determine the $k$th lexicographically smallest amongst the suffixes $S[i... N]$, $1 \leq i \leq N$. We study the suffix selection problem in the cache-aware model that captures two-level memory inherent in computing systems, for a \emph{cache} of limited size $M$ and block size $B$. The complexity of interest is the number of block transfers. We present an optimal suffix selection algorithm in the cache-aware model, requiring $\Theta(N/B)$ block transfers, for any string $S$ over an unbounded alphabet (where characters can only be compared), under the common tall-cache assumption (i.e. $M=\Omega(B^{1+\epsilon})$, where $\epsilon<1$). Our algorithm beats the bottleneck bound for permuting an input array to the desired output array, which holds for nearly any nontrivial problem in hierarchical memory models.
Ad auctions in sponsored search support ``broad match'' that allows an advertiser to target a large number of queries while bidding only on a limited number. While giving more expressiveness … Ad auctions in sponsored search support ``broad match'' that allows an advertiser to target a large number of queries while bidding only on a limited number. While giving more expressiveness to advertisers, this feature makes it challenging to optimize bids to maximize their returns: choosing to bid on a query as a broad match because it provides high profit results in one bidding for related queries which may yield low or even negative profits. We abstract and study the complexity of the {\em bid optimization problem} which is to determine an advertiser's bids on a subset of keywords (possibly using broad match) so that her profit is maximized. In the query language model when the advertiser is allowed to bid on all queries as broad match, we present a linear programming (LP)-based polynomial-time algorithm that gets the optimal profit. In the model in which an advertiser can only bid on keywords, i.e., a subset of keywords as an exact or broad match, we show that this problem is not approximable within any reasonable approximation factor unless P=NP. To deal with this hardness result, we present a constant-factor approximation when the optimal profit significantly exceeds the cost. This algorithm is based on rounding a natural LP formulation of the problem. Finally, we study a budgeted variant of the problem, and show that in the query language model, one can find two budget constrained ad campaigns in polynomial time that implement the optimal bidding strategy. Our results are the first to address bid optimization under the broad match feature which is common in ad auctions.
Inspired by Internet ad auction applications, we study the problem of allocating a single item via an auction when bidders place very different values on the item. We formulate this … Inspired by Internet ad auction applications, we study the problem of allocating a single item via an auction when bidders place very different values on the item. We formulate this as a prior-free auction problem and focus on designing a simple mechanism that always allocates the item. Rather than designing sophisticated pricing methods like prior literature, we design better allocation methods. In particular, we propose quasi-proportional allocation methods in which the probability that an item is allocated to a bidder depends (quasi-proportionally) on the bids. We prove that the corresponding games for both all-pay and winners-pay quasi-proportional mechanisms admit a pure Nash equilibrium, and that this equilibrium is unique. We also give an algorithm to compute this equilibrium in polynomial time. Further, we show that the revenue of the auctioneer is promisingly high compared to the ultimate, i.e., the highest value of any of the bidders, and show bounds on the revenue of equilibria both analytically, as well as using experiments for specific quasi-proportional functions. This is the first known revenue analysis for these natural mechanisms (including the special case of the proportional mechanism which is common in network resource allocation problems).
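As a rough illustration of a quasi-proportional mechanism, the sketch below numerically searches for a pure equilibrium of a winners-pay auction with weight function $w(b)=\sqrt{b}$ by naive best-response iteration over a bid grid. This is not the polynomial-time equilibrium algorithm referred to in the abstract; the weight function, grid resolution, and iteration count are arbitrary choices made for illustration.

```python
import numpy as np

# Naive best-response iteration for a winners-pay quasi-proportional auction
# with weight w(b) = sqrt(b): bidder i wins with probability w(b_i)/sum_j w(b_j)
# and pays her bid if she wins. This numerical sketch is illustrative only.
def best_response_equilibrium(values, w=np.sqrt, grid=2000, iters=200):
    values = np.asarray(values, dtype=float)
    bids = values / 2.0                                   # arbitrary starting point
    grid_bids = np.linspace(1e-6, values.max(), grid)
    for _ in range(iters):
        for i, v in enumerate(values):
            others = w(np.delete(bids, i)).sum()
            # expected utility of bidding b: (v - b) * w(b) / (w(b) + others)
            util = (v - grid_bids) * w(grid_bids) / (w(grid_bids) + others)
            bids[i] = grid_bids[np.argmax(util)]
    return bids

# Example: two bidders with very different values.
print(best_response_equilibrium([100.0, 1.0]))
```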
We study a generalization of the classical median finding problem to batched query case: given an array of unsorted $n$ items and $k$ (not necessarily disjoint) intervals in the array, … We study a generalization of the classical median finding problem to batched query case: given an array of unsorted $n$ items and $k$ (not necessarily disjoint) intervals in the array, the goal is to determine the median in {\em each} of the intervals in the array. We give an algorithm that uses $O(n\log n + k\log k \log n)$ comparisons and show a lower bound of $\Omega(n\log k)$ comparisons for this problem. This is optimal for $k=O(n/\log n)$.
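To fix the problem statement, a naive baseline simply answers each of the $k$ interval queries independently with an expected-linear-time selection. The sketch below does exactly that; it makes no attempt to reach the comparison bounds above and is not the paper's algorithm.

```python
import numpy as np

# Naive baseline for the batched interval-median problem: answer each query
# independently with a selection (np.partition). Only meant to make the
# problem concrete; it does not attain the O(n log n + k log k log n) bound.
def interval_medians(a, queries):
    a = np.asarray(a)
    out = []
    for lo, hi in queries:              # half-open interval [lo, hi)
        seg = a[lo:hi]
        m = (len(seg) - 1) // 2         # lower median
        out.append(np.partition(seg, m)[m])
    return out
```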
Many advertisers buy advertisements (ads) on the Internet or on traditional media and seek simple, online mechanisms to reserve ad slots in advance. Media publishers represent a vast and varying … Many advertisers buy advertisements (ads) on the Internet or on traditional media and seek simple, online mechanisms to reserve ad slots in advance. Media publishers represent a vast and varying inventory, and they too seek automatic, online mechanisms for pricing and allocating such reservations. In this paper, we present and study a simple model for auctioning such ad slots in advance. Bidders arrive sequentially and report which slots they are interested in. The seller must decide immediately whether or not to grant a reservation. Our model allows a seller to accept reservations, but possibly cancel the allocations later and pay the bidder a cancellation compensation (bump payment). Our main result is an online mechanism to derive prices and bump payments that is efficient to implement. This mechanism has many desirable properties. It is individually rational; winners have an incentive to be honest and bidding one's true value dominates any lower bid. Our mechanism's efficiency is within a constant fraction of the a posteriori optimally efficient solution. Its revenue is within a constant fraction of the a posteriori revenue of the Vickrey-Clarke-Groves mechanism. Our results make no assumptions about the order of arrival of bids or the value distribution of bidders and still hold if the items for sale are elements of a matroid, a more general setting than slot allocation.
Sponsored search involves running an auction among advertisers who bid in order to have their ad shown next to search results for specific keywords. Currently, the most popular auction for … Sponsored search involves running an auction among advertisers who bid in order to have their ad shown next to search results for specific keywords. Currently, the most popular auction for sponsored search is the "Generalized Second Price" (GSP) auction in which advertisers are assigned to slots in the decreasing order of their "score," which is defined as the product of their bid and click-through rate. In the past few years, there has been significant research on the game-theoretic issues that arise in an advertiser's interaction with the mechanism as well as possible redesigns of the mechanism, but this ranking order has remained standard. From a search engine's perspective, the fundamental question is: what is the best assignment of advertisers to slots? Here "best" could mean "maximizing user satisfaction," "most efficient," "revenue-maximizing," "simplest to interact with," or a combination of these. To answer this question we need to understand the behavior of a search engine user when she sees the displayed ads, since that defines the commodity the advertisers are bidding on, and its value. Most prior work has assumed that the probability of a user clicking on an ad is independent of the other ads shown on the page. We propose a simple Markovian user model that does not make this assumption. We then present an algorithm to determine the most efficient assignment under this model, which turns out to be different than that of GSP. A truthful auction then follows from an application of the Vickrey-Clarke-Groves (VCG) mechanism. Further, we show that our assignment has many of the desirable properties of GSP that makes bidding intuitive. At the technical core of our result are a number of insights about the structure of the optimal assignment.
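The following sketch evaluates the expected number of clicks of a given assignment under a simple Markovian (cascade-style) user model in which the user scans slots top-down, clicks ad a with probability ctr[a], and continues past it with an ad-dependent probability cont[a]. The exact model and the optimal-assignment algorithm in the paper may differ; this is only meant to make the notion of a position-externality user model concrete.

```python
# Expected clicks of an assignment under an assumed Markovian user model:
# the user reaches the top slot, clicks the displayed ad with probability
# ctr[ad], and continues to the next slot with probability cont[ad].
def expected_clicks(assignment, ctr, cont):
    reach = 1.0                      # probability the user reaches the current slot
    total = 0.0
    for ad in assignment:            # ads listed from top slot to bottom
        total += reach * ctr[ad]
        reach *= cont[ad]            # user moves on with ad-dependent probability
    return total
```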
We consider the "Offline Ad Slot Scheduling" problem, where advertisers must be scheduled to "sponsored search" slots during a given period of time. Advertisers specify a budget constraint, as well … We consider the "Offline Ad Slot Scheduling" problem, where advertisers must be scheduled to "sponsored search" slots during a given period of time. Advertisers specify a budget constraint, as well as a maximum cost per click, and may not be assigned to more than one slot for a particular search. We give a truthful mechanism under the utility model where bidders try to maximize their clicks, subject to their personal constraints. In addition, we show that the revenue-maximizing mechanism is not truthful, but has a Nash equilibrium whose outcome is identical to our mechanism. As far as we can tell, this is the first treatment of sponsored search that directly incorporates both multiple slots and budget constraints into an analysis of incentives. Our mechanism employs a descending-price auction that maintains a solution to a certain machine scheduling problem whose job lengths depend on the price, and hence is variable over the auction. The price stops when the set of bidders that can afford that price pack exactly into a block of ad slots, at which point the mechanism allocates that block and continues on the remaining slots. To prove our result on the equilibrium of the revenue-maximizing mechanism, we first show that a greedy algorithm suffices to solve the revenue-maximizing linear program; we then use this insight to prove that bidders allocated in the same block of our mechanism have no incentive to deviate from bidding the fixed price of that block.
Internet search companies sell advertisement slots based on users' search queries via an auction. While there has been previous work on the auction process and its game-theoretic aspects, most of it … Internet search companies sell advertisement slots based on users' search queries via an auction. While there has been previous work on the auction process and its game-theoretic aspects, most of it focuses on the Internet company. In this work, we focus on the advertisers, who must solve a complex optimization problem to decide how to place bids on keywords to maximize their return (the number of user clicks on their ads) for a given budget. We model the entire process and study this budget optimization problem. While most variants are NP-hard, we show, perhaps surprisingly, that simply randomizing between two uniform strategies that bid equally on all the keywords works well. More precisely, this strategy gets at least a 1-1/e fraction of the maximum clicks possible. As our preliminary experiments show, such uniform strategies are likely to be practical. We also present inapproximability results, and optimal algorithms for variants of the budget optimization problem.
Motivated by applications in which the data may be formulated as a matrix, we consider algorithms for several common linear algebra problems. These algorithms make more efficient use of computational … Motivated by applications in which the data may be formulated as a matrix, we consider algorithms for several common linear algebra problems. These algorithms make more efficient use of computational resources, such as the computation time, random access memory (RAM), and the number of passes over the data, than do previously known algorithms for these problems. In this paper, we devise two algorithms for the matrix multiplication problem. Suppose A and B (which are $m\times n$ and $n\times p$, respectively) are the two input matrices. In our main algorithm, we perform c independent trials, where in each trial we randomly sample an element of $\{ 1,2,\ldots, n\}$ with an appropriate probability distribution ${\cal P}$ on $\{ 1,2,\ldots, n\}$. We form an $m\times c$ matrix C consisting of the sampled columns of A, each scaled appropriately, and we form a $c\times n$ matrix R using the corresponding rows of B, again scaled appropriately. The choice of ${\cal P}$ and the column and row scaling are crucial features of the algorithm. When these are chosen judiciously, we show that $CR$ is a good approximation to $AB$. More precisely, we show that $$ \left\|AB-CR\right\|_F = O(\left\|A\right\|_F \left\|B\right\|_F /\sqrt c) , $$ where $\|\cdot\|_F$ denotes the Frobenius norm, i.e., $\|A\|^2_F=\sum_{i,j}A_{ij}^2$. This algorithm can be implemented without storing the matrices A and B in RAM, provided it can make two passes over the matrices stored in external memory and use $O(c(m+n+p))$ additional RAM to construct C and R. We then present a second matrix multiplication algorithm which is similar in spirit to our main algorithm. In addition, we present a model (the pass-efficient model) in which the efficiency of these and other approximate matrix algorithms may be studied and which we argue is well suited to many applications involving massive data sets. In this model, the scarce computational resources are the number of passes over the data and the additional space and time required by the algorithm. The input matrices may be presented in any order of the entries (and not just row or column order), as is the case in many applications where, e.g., the data has been written in by multiple agents. In addition, the input matrices may be presented in a sparse representation, where only the nonzero entries are written.
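A compact sketch of the main sampling idea above: columns of A and the corresponding rows of B are sampled with probabilities proportional to the product of their Euclidean norms and rescaled so that CR is an unbiased estimator of AB. Function and variable names are ours, and the pass-efficient, external-memory implementation is omitted.

```python
import numpy as np

# Randomized approximate matrix multiplication by column/row sampling.
# Sampling index i with probability p_i proportional to |A_(:,i)| * |B_(i,:)|
# and rescaling by 1/sqrt(c * p_i) makes C @ R an unbiased estimator of A @ B.
def approx_matmul(A, B, c, rng=np.random.default_rng(0)):
    n = A.shape[1]
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()                               # "judicious" sampling distribution
    idx = rng.choice(n, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    C = A[:, idx] * scale                         # m x c sampled, rescaled columns of A
    R = B[idx, :] * scale[:, None]                # c x p sampled, rescaled rows of B
    return C @ R

# Small demonstration of the relative error on random dense matrices.
A = np.random.randn(60, 500)
B = np.random.randn(500, 40)
approx = approx_matmul(A, B, c=120)
rel_err = np.linalg.norm(A @ B - approx, 'fro') / np.linalg.norm(A @ B, 'fro')
```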
I propose the index h, defined as the number of papers with citation number $\geq h$, as a useful index to characterize the scientific output of a researcher.
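The definition translates directly into code; the following is a straightforward implementation with two small worked examples.

```python
# h is the largest number such that at least h papers have at least h citations each.
def h_index(citations):
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i           # the i most-cited papers all have at least i citations
        else:
            break
    return h

assert h_index([10, 8, 5, 4, 3]) == 4
assert h_index([25, 8, 5, 3, 3]) == 3
```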
In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix A. It is often of interest to find a low-rank approximation to … In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix A. It is often of interest to find a low-rank approximation to A, i.e., an approximation D to the matrix A of rank not greater than a specified rank k, where k is much smaller than m and n. Methods such as the singular value decomposition (SVD) may be used to find an approximation to A which is the best in a well-defined sense. These methods require memory and time which are superlinear in m and n; for many applications in which the data sets are very large this is prohibitive. Two simple and intuitive algorithms are presented which, when given an $m \times n$ matrix A, compute a description of a low-rank approximation $D^{*}$ to A, and which are qualitatively faster than the SVD. Both algorithms have provable bounds for the error matrix $A-D^{*}$. For any matrix X, let $\|{X}\|_F$ and $\|{X}\|_2$ denote its Frobenius norm and its spectral norm, respectively. In the first algorithm, c columns of A are randomly chosen. If the $m \times c$ matrix C consists of those c columns of A (after appropriate rescaling), then it is shown that from $C^TC$ approximations to the top singular values and corresponding singular vectors may be computed. From the computed singular vectors a description $D^{*}$ of the matrix A may be computed such that $\mathrm{rank}(D^{*}) \le k$ and such that $$ \left\|A-D^{*}\right\|_{\xi}^{2} \le \min_{D:\mathrm{rank}(D)\le k} \left\|A-D\right\|_{\xi}^{2} + poly(k,1/c) \left\|{A}\right\|^2_F $$ holds with high probability for both $\xi = 2,F$. This algorithm may be implemented without storing the matrix A in random access memory (RAM), provided it can make two passes over the matrix stored in external memory and use $O(cm+c^2)$ additional RAM. The second algorithm is similar except that it further approximates the matrix C by randomly sampling r rows of C to form a $r \times c$ matrix W. Thus, it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM. To achieve an additional error (beyond the best rank k approximation) that is at most $\epsilon\|{A}\|^2_F$, both algorithms take time which is polynomial in k, $1/\epsilon$, and $\log(1/\delta)$, where $\delta>0$ is a failure probability; the first takes time linear in $\mbox{max}(m,n)$ and the second takes time independent of m and n. Our bounds improve previously published results with respect to the rank parameter k for both the Frobenius and spectral norms. In addition, the proofs for the error bounds use a novel method that makes important use of matrix perturbation theory. The probability distribution over columns of A and the rescaling are crucial features of the algorithms which must be chosen judiciously.
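A simplified sketch of the first algorithm described above: sample c columns with probabilities proportional to their squared Euclidean norms, rescale them to form C, and project A onto the span of the top-k left singular vectors of C. For brevity the sketch calls an SVD on C directly rather than working with $C^TC$ in two passes, so it illustrates the construction rather than the pass-efficient implementation; names are ours.

```python
import numpy as np

# Column-sampling low-rank approximation sketch: squared-norm column sampling,
# rescaling, and projection of A onto the top-k left singular vectors of C.
def sampled_low_rank(A, k, c, rng=np.random.default_rng(0)):
    p = np.linalg.norm(A, axis=0) ** 2
    p = p / p.sum()                                   # squared-norm column probabilities
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])               # rescaled sampled columns
    H, _, _ = np.linalg.svd(C, full_matrices=False)   # left singular vectors of C
    Hk = H[:, :k]
    return Hk @ (Hk.T @ A)                            # rank-<=k description of A
```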
In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix A which may be stored on disk but which is too large … In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix A which may be stored on disk but which is too large to be read into random access memory (RAM) or to practically perform superlinear polynomial time computations on it. Two algorithms are presented which, when given an $m \times n$ matrix A, compute approximations to A which are the product of three smaller matrices, C, U, and R, each of which may be computed rapidly. Let $A' = CUR$ be the computed approximate decomposition; both algorithms have provable bounds for the error matrix $A-A'$. In the first algorithm, c columns of A and r rows of A are randomly chosen. If the $m \times c$ matrix C consists of those c columns of A (after appropriate rescaling) and the $r \times n$ matrix R consists of those r rows of A (also after appropriate rescaling), then the $c \times r$ matrix U may be calculated from C and R. For any matrix X, let $\|X\|_F$ and $\|X\|_2$ denote its Frobenius norm and its spectral norm, respectively. It is proven that $$ \left\|A-A'\right\|_\xi \le \min_{D:\mathrm{rank}(D)\le k} \left\|A-D\right\|_\xi + poly(k,1/c) \left\|A\right\|_F $$ holds in expectation and with high probability for both $\xi = 2,F$ and for all $k=1,\ldots,\mbox{rank}(A)$; thus by appropriate choice of k $$ \left\|A-A'\right\|_2 \le \epsilon \left\|A\right\|_F $$ also holds in expectation and with high probability. This algorithm may be implemented without storing the matrix A in RAM, provided it can make two passes over the matrix stored in external memory and use $O(m+n)$ additional RAM (assuming that c and r are constants, independent of the size of the input). The second algorithm is similar except that it approximates the matrix C by randomly sampling a constant number of rows of C. Thus, it has additional error but it can be implemented in three passes over the matrix using only constant additional RAM. To achieve an additional error (beyond the best rank-k approximation) that is at most $\epsilon \|A\|_F$, both algorithms take time which is a low-degree polynomial in k, $1/\epsilon$, and $1/\delta$, where $\delta>0$ is a failure probability; the first takes time linear in $\mbox{max}(m,n)$ and the second takes time independent of m and n. The proofs for the error bounds make important use of matrix perturbation theory and previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.
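A rough sketch of a CUR construction in this spirit: sample and rescale c columns and r rows, then choose U so that CUR projects A onto the selected column and row spaces. Computing $U = C^+ A R^+$ below touches all of A and is a simplification made for brevity; the algorithm in the abstract computes U from the sampled intersection alone, and the sampling distributions here are the usual squared-norm choice rather than necessarily the paper's.

```python
import numpy as np

# Sampling-based CUR sketch: squared-norm sampling of columns and rows,
# rescaling, and a pseudoinverse-based choice of the small linking matrix U.
def cur_approx(A, c, r, rng=np.random.default_rng(0)):
    pc = np.linalg.norm(A, axis=0) ** 2; pc = pc / pc.sum()   # column probabilities
    pr = np.linalg.norm(A, axis=1) ** 2; pr = pr / pr.sum()   # row probabilities
    ci = rng.choice(A.shape[1], size=c, replace=True, p=pc)
    ri = rng.choice(A.shape[0], size=r, replace=True, p=pr)
    C = A[:, ci] / np.sqrt(c * pc[ci])
    R = A[ri, :] / np.sqrt(r * pr[ri])[:, None]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)             # simplification (uses all of A)
    return C, U, R
```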
We present and analyze a sampling algorithm for the basic linear-algebraic problem of $\ell_2$ regression. The $\ell_2$ regression (or least-squares fit) problem takes as input a matrix $A \in \mathbb{R}^{n\times d}$ … We present and analyze a sampling algorithm for the basic linear-algebraic problem of $\ell_2$ regression. The $\ell_2$ regression (or least-squares fit) problem takes as input a matrix $A \in \mathbb{R}^{n\times d}$ (where we assume $n > d$) and a target vector $b \in \mathbb{R}^n$, and it returns as output $Z = \min_{x\in\mathbb{R}^d} \|b - Ax\|_2$. Also of interest is $x_{\mathrm{opt}} = A^+ b$, where $A^+$ is the Moore-Penrose generalized inverse, which is the minimum-length vector achieving the minimum. Our algorithm randomly samples $r$ rows from the matrix $A$ and vector $b$ to construct an induced $\ell_2$ regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of $A$ and the manner in which $b$ lies in the complement of the column space of $A$. Under appropriate assumptions, we show relative error approximations for both $Z$ and $x_{\mathrm{opt}}$. Applications of this sampling methodology are briefly discussed.
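A minimal sketch of the row-sampling idea, assuming the sampling probabilities are taken proportional to the leverage scores (the squared row norms of the left singular factor of A); the probabilities in the abstract also account for how b sits outside the column space of A, which is omitted here, and the choice of r is illustrative.

```python
import numpy as np

# Row-sampled least squares with leverage-score probabilities: sample and
# rescale r rows of (A, b), then solve the small induced problem exactly.
def sampled_least_squares(A, b, r, rng=np.random.default_rng(0)):
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(U ** 2, axis=1)                 # leverage scores, summing to d
    p = lev / lev.sum()
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    s = 1.0 / np.sqrt(r * p[idx])                # rescaling of sampled rows
    x_tilde, *_ = np.linalg.lstsq(A[idx] * s[:, None], b[idx] * s, rcond=None)
    return x_tilde
```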
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where $n$ is the number of training examples. We develop … A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where $n$ is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an $n \times n$ Gram matrix $G$ such that computations of interest may be performed more rapidly. The approximation is of the form $\tilde{G}_k = C W_k^+ C^T$, where $C$ is a matrix consisting of a small number $c$ of columns of $G$ and $W_k$ is the best rank-$k$ approximation to $W$, the matrix formed by the intersection between those $c$ columns of $G$ and the corresponding $c$ rows of $G$. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let $G_k$ be the best rank-$k$ approximation to $G$. We prove that by choosing $O(k/\epsilon^4)$ columns, $$\left\|G - C W_k^+ C^T\right\|_\xi \leq \left\|G - G_k\right\|_\xi + \epsilon \sum_{i=1}^{n} G_{ii}^2,$$ both in expectation and with high probability, for both $\xi = 2, F$, and for all $k$: $0 \leq k \leq \mathrm{rank}(W)$. This approximation can be computed using $O(n)$ additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nystrom method from integral equation theory are discussed.
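A short sketch of the sampled Gram-matrix approximation $\tilde{G}_k = C W_k^+ C^T$, using column-sampling probabilities proportional to $G_{ii}^2$ (one natural data-dependent choice) and the rank-k pseudoinverse of the sampled intersection block W. Scaling conventions and names are ours and may differ from the paper's in details.

```python
import numpy as np

# Nystrom-style rank-k approximation of a PSD Gram matrix G: sample c columns
# with probabilities proportional to G_ii^2, rescale, and use the pseudoinverse
# of the best rank-k part of the sampled c x c intersection block W.
def nystrom_rank_k(G, c, k, rng=np.random.default_rng(0)):
    n = G.shape[0]
    p = np.diag(G) ** 2
    p = p / p.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale                          # n x c rescaled sampled columns
    W = (G[np.ix_(idx, idx)] * scale) * scale[:, None]
    evals, evecs = np.linalg.eigh(W)               # W is symmetric (PSD up to sampling)
    top = np.argsort(evals)[::-1][:k]
    keep = evals[top] > 1e-12                      # drop numerically zero directions
    Vk = evecs[:, top][:, keep]
    Wk_pinv = (Vk / evals[top][keep]) @ Vk.T       # pseudoinverse of the rank-k part of W
    return C @ Wk_pinv @ C.T
```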
Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? We show that the number of statistical queries necessary and sufficient for this task is---up to polynomial factors---equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern.
We demonstrate that, ignoring computational constraints, it is possible to release privacy-preserving databases that are useful for all queries over a discretized domain from any given concept class with polynomial VC-dimension. We show a new lower bound for releasing databases that are useful for halfspace queries over a continuous domain. Despite this, we give a privacy-preserving polynomial time algorithm that releases information useful for all halfspace queries, for a slightly relaxed definition of usefulness. Inspired by learning theory, we introduce a new notion of data privacy, which we call distributional privacy, and show that it is strictly stronger than the prevailing privacy notion, differential privacy.
Building a successful recommender system depends on understanding both the dimensions of people's preferences as well as their dynamics. In certain domains, such as fashion, modeling such preferences can be incredibly difficult, due to the need to simultaneously model the visual appearance of products as well as their evolution over time. The subtle semantics and non-linear dynamics of fashion evolution raise unique challenges especially considering the sparsity and large scale of the underlying datasets. In this paper we build novel models for the One-Class Collaborative Filtering setting, where our goal is to estimate users' fashion-aware personalized ranking functions based on their past feedback. To uncover the complex and evolving visual factors that people consider when evaluating products, our method combines high-level visual features extracted from a deep convolutional neural network, users' past feedback, as well as evolving trends within the community. Experimentally we evaluate our method on two large real-world datasets from Amazon.com, where we show it to outperform state-of-the-art personalized ranking measures, and also use it to visualize the high-level fashion trends across the 11-year span of our dataset.
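As a hint of the building blocks such models extend, here is a hedged sketch of a visually-aware preference score: a conventional latent-factor term plus a term that embeds deep CNN features of the product image into a low-dimensional visual space. The temporal fashion-dynamics terms that are the paper's focus are omitted, and all names below are hypothetical.

import numpy as np

def visual_preference_score(gamma_u, gamma_i, theta_u, E, f_i, beta_visual):
    """Score of item i for user u: latent-factor term plus visual term.

    gamma_u, gamma_i : k-dim latent factors for user and item
    theta_u          : k'-dim visual preference vector of the user
    E                : (k', d) projection of d-dim CNN features into visual space
    f_i              : d-dim deep CNN feature vector of the item image
    beta_visual      : d-dim global visual bias
    """
    return gamma_u @ gamma_i + theta_u @ (E @ f_i) + beta_visual @ f_i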
State-of-the-art recommendation algorithms -- especially the collaborative filtering (CF) based approaches with shallow or deep models -- usually work with various unstructured information sources for recommendation, such as textual reviews, visual images, and various implicit or explicit feedback. Though structured knowledge bases were considered in content-based approaches, they have been largely neglected recently due to the availability of vast amounts of data and the learning power of many complex models. However, structured knowledge bases exhibit unique advantages in personalized recommendation systems. When the explicit knowledge about users and items is considered for recommendation, the system can provide highly customized recommendations based on users' historical behaviors. A great challenge for using knowledge bases for recommendation is how to integrate large-scale structured and unstructured data while taking advantage of collaborative filtering for highly accurate performance. Recent achievements on knowledge base embedding shed light on this problem, making it possible to learn user and item representations while preserving the structure of their relationship with external knowledge. In this work, we propose to reason over knowledge base embeddings for personalized recommendation. Specifically, we propose a knowledge base representation learning approach to embed heterogeneous entities for recommendation. Experimental results on a real-world dataset verify the superior performance of our approach compared with state-of-the-art baselines.
We explore the properties of a congestion game in which users of a congested resource anticipate the effect of their actions on the price of the resource. When users are sharing a single resource, we establish that the aggregate utility received by the users is at least 3/4 of the maximum possible aggregate utility. We also consider extensions to a network context, where users submit individual payments for each link in the network they may wish to use. In this network model, we again show that the selfish behavior of the users leads to an aggregate utility that is no worse than 3/4 of the maximum possible aggregate utility. We also show that the same analysis extends to a wide class of resource allocation systems where end users simultaneously require multiple scarce resources. These results form part of a growing literature on the “price of anarchy,” i.e., the extent to which selfish behavior affects system efficiency.
In this paper, we present the first approximation algorithms for the problem of designing revenue-optimal Bayesian incentive compatible auctions when there are multiple (heterogeneous) items and when bidders have arbitrary demand and budget constraints (and additive valuations). Our mechanisms are surprisingly simple: We show that a sequential all-pay mechanism is a 4-approximation to the revenue of the optimal ex-interim truthful mechanism with a discrete type space for each bidder, where her valuations for different items can be correlated. We also show that a sequential posted-price mechanism is an O(1)-approximation to the revenue of the optimal ex-post truthful mechanism when the type space of each bidder is a product distribution that satisfies the standard hazard rate condition. We further show a logarithmic approximation when the hazard rate condition is removed, and complete the picture by showing that achieving a sub-logarithmic approximation, even for regular distributions and one bidder, requires pricing bundles of items. Our results are based on formulating novel LP relaxations for these problems, and developing generic rounding schemes from first principles.
Explainable recommendation attempts to develop models that generate not only high-quality recommendations but also intuitive explanations. The explanations may either be post-hoc or directly come from an explainable model (also called interpretable or transparent model in some contexts). Explainable recommendation tries to address the problem of why: by providing explanations to users or system designers, it helps humans to understand why certain items are recommended by the algorithm, where the human can either be users or system designers. Explainable recommendation helps to improve the transparency, persuasiveness, effectiveness, trustworthiness, and satisfaction of recommendation systems. It also facilitates system designers for better system debugging. In recent years, a large number of explainable recommendation approaches -- especially model-based methods -- have been proposed and applied in real-world systems. In this survey, we provide a comprehensive review of the explainable recommendation research. We first highlight the position of explainable recommendation in recommender system research by categorizing recommendation problems into the 5W, i.e., what, when, who, where, and why.
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experiment results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.
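The following sketch conveys the flavor of a CombLinUCB-style round under a deliberately simple setup: each ground item has a feature vector (the linear generalization), a ridge-regression estimate with an optimistic bonus scores the items, and a trivial top-K oracle stands in for the combinatorial optimization. The function names and the choice of oracle are ours, not the paper's.

import numpy as np

def comblinucb_select(X, V, f, alpha, K):
    """Pick a feasible subset (here: top-K items) by optimistic linear scores.

    X : (L, d) item features; V, f : ridge statistics; alpha : confidence width.
    """
    theta_hat = np.linalg.solve(V, f)
    V_inv = np.linalg.inv(V)
    width = np.sqrt(np.einsum('ld,dk,lk->l', X, V_inv, X))   # per-item x^T V^{-1} x
    ucb = X @ theta_hat + alpha * width
    return np.argsort(-ucb)[:K]

def comblinucb_update(V, f, X_chosen, w_observed):
    """Fold in semi-bandit feedback: the observed weight of every chosen item."""
    return V + X_chosen.T @ X_chosen, f + X_chosen.T @ w_observed

# Typical loop (sketch): V = lam * np.eye(d); f = np.zeros(d); then per round
# S = comblinucb_select(X, V, f, alpha, K); V, f = comblinucb_update(V, f, X[S], w[S]).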
Let the $n \times p$ $(n \geq p)$ matrix X have the QR factorization $X = QR$, where R is an upper triangular matrix of order p and Q is orthonormal. This widely used decomposition has the drawback that Q is not generally sparse even when X is. One cure is to discard Q, retaining only X and R. Products like $a = Q^T y = R^{-T} X^T y$ can then be formed by computing $b = X^T y$ and solving the system $R^T a = b$. This approach can be used to modify the Gram--Schmidt algorithm for computing Q and R to compute R without forming Q or altering X. Unfortunately, this quasi-Gram--Schmidt algorithm can produce inaccurate results. In this paper it is shown that with reorthogonalization the inaccuracies are bounded under certain natural conditions.
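The "discard Q" trick is easy to demonstrate. In the sketch below, R is obtained from a standard QR routine (standing in for the quasi-Gram--Schmidt R the paper studies), and the product $a = Q^T y$ is formed without Q by computing $b = X^T y$ and solving the triangular system $R^T a = b$.

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = rng.standard_normal(200)

# R only; Q is never formed.
R = np.linalg.qr(X, mode='r')

# a = Q^T y = R^{-T} X^T y: compute b = X^T y, then solve R^T a = b.
b = X.T @ y
a = solve_triangular(R.T, b, lower=True)

# Sanity check against the explicit Q for this small example.
Q, _ = np.linalg.qr(X)
assert np.allclose(a, Q.T @ y)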
In this paper, we give some upper (lower) bounds on [an expression not reproduced in this extract], where the $X_i$'s are real-valued random variables. Some applications are given.
Existence and uniqueness of equilibrium points for concave n-person games - dynamic model for nonequilibrium situations
Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.
Convex programming involves a convex set $F \subseteq \mathbb{R}^n$ and a convex cost function $c : F \to \mathbb{R}$. The goal of convex programming is to find a point in F which minimizes c. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain. We also apply this algorithm to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
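A minimal sketch of the greedy-projection style of algorithm this line of work analyzes: play a point, observe the cost function, take a gradient step, and project back onto F. The step-size schedule and the tiny example below are our own choices for illustration, not taken from the paper.

import numpy as np

def online_gradient_descent(grads, project, x0, eta=lambda t: 1.0 / np.sqrt(t)):
    """Projected online gradient descent for online convex programming.

    grads   : iterable of gradient callables, one per round (revealed after playing x_t)
    project : Euclidean projection onto the feasible convex set F
    x0      : starting point in F
    eta     : step-size schedule (O(1/sqrt(t)) is the standard choice)
    """
    x = x0
    plays = []
    for t, grad_t in enumerate(grads, start=1):
        plays.append(x)
        x = project(x - eta(t) * grad_t(x))   # step against the just-revealed gradient
    return plays

# Example: F is the unit ball and the round-t cost is c_t(x) = ||x - z_t||^2 / 2.
rng = np.random.default_rng(0)
targets = rng.standard_normal((50, 3)) * 0.3
grads = [lambda x, z=z: x - z for z in targets]
project = lambda x: x / max(1.0, np.linalg.norm(x))
plays = online_gradient_descent(grads, project, x0=np.zeros(3))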
We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries $f : \mathbb{R}^n \to \mathbb{R}$ non-adaptively. Here, the database is represented by a vector in $\mathbb{R}^n$ and proximity between databases is measured in the $\ell_1$-metric. We show that the noise complexity is determined by two geometric parameters associated with the set of queries. We use this connection to give tight upper and lower bounds on the noise complexity for any $d \leq n$. We show that for d random linear queries of sensitivity 1, it is necessary and sufficient to add $\ell_2$-error $\Theta(\min\{d\sqrt{d}/\epsilon,\ d\sqrt{\log(n/d)}/\epsilon\})$ to achieve $\epsilon$-differential privacy. Assuming the truth of a deep conjecture from convex geometry, known as the Hyperplane Conjecture, we can extend our results to arbitrary linear queries, giving nearly matching upper and lower bounds.
Item recommendation is the task of predicting a personalized ranking on a set of items (e.g. websites, movies, products). In this paper, we investigate the most common scenario with implicit feedback (e.g. clicks, purchases). There are many methods for item recommendation from implicit feedback like matrix factorization (MF) or adaptive k-nearest-neighbor (kNN). Even though these methods are designed for the item prediction task of personalized ranking, none of them is directly optimized for ranking. In this paper we present a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem. We also provide a generic learning algorithm for optimizing models with respect to BPR-Opt. The learning method is based on stochastic gradient descent with bootstrap sampling. We show how to apply our method to two state-of-the-art recommender models: matrix factorization and adaptive kNN. Our experiments indicate that for the task of personalized ranking our optimization method outperforms the standard learning techniques for MF and kNN. The results show the importance of optimizing models for the right criterion.
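For concreteness, a single stochastic gradient step of the BPR-Opt criterion for the matrix factorization model looks roughly like the sketch below. The triple (u, i, j) is assumed to have been drawn by bootstrap sampling as in the learning algorithm; hyperparameter names and values are illustrative.

import numpy as np

def bpr_sgd_step(W, H, u, i, j, lr=0.05, reg=0.01):
    """One BPR-Opt gradient-ascent step for matrix factorization.

    W : (num_users, k) user factors;  H : (num_items, k) item factors
    (u, i, j) : user u prefers observed item i over unobserved item j
    """
    wu, hi, hj = W[u].copy(), H[i].copy(), H[j].copy()
    x_uij = wu @ (hi - hj)                 # score difference
    g = 1.0 / (1.0 + np.exp(x_uij))        # derivative of log sigmoid(x_uij)

    W[u] += lr * (g * (hi - hj) - reg * wu)
    H[i] += lr * (g * wu - reg * hi)
    H[j] += lr * (-g * wu - reg * hj)
    return W, H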
In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of such a system (as captured by some objective function over the parameters) is significantly improved if some of these parameters can be probed or observed. In a resource constrained situation, deciding which parameters to observe in order to optimize system performance, itself becomes an interesting and important optimization problem. This general problem is the focus of this article. One of the most important considerations in this framework is whether adaptivity is required for the observations. Adaptive observations introduce blocking or sequential operations in the system whereas nonadaptive observations can be performed in parallel. One of the important questions in this regard is to characterize the benefit of adaptivity for probes and observation. We present general techniques for designing constant factor approximations to the optimal observation schemes for several widely used scheduling and metric objective functions. We show a unifying technique that relates this optimization problem to the outlier version of the corresponding deterministic optimization. By making this connection, our technique shows constant factor upper bounds for the benefit of adaptivity of the observation schemes. We show that while probing yields significant improvement in the objective function, being adaptive about the probing is not beneficial beyond constant factors.
Given an $m \times n$ matrix M with $m \geq n$, it is shown that there exists a permutation $\Pi$ and an integer k such that the QR factorization \[ M\Pi = Q \begin{pmatrix} A_k & B_k \\ & C_k \end{pmatrix} \] reveals the numerical rank of M: the $k \times k$ upper-triangular matrix $A_k$ is well conditioned, $\|C_k\|_2$ is small, and $B_k$ is linearly dependent on $A_k$ with coefficients bounded by a low-degree polynomial in n. Existing rank-revealing QR (RRQR) algorithms are related to such factorizations and two algorithms are presented for computing them. The new algorithms are nearly as efficient as QR with column pivoting for most problems and take $O(mn^2)$ floating-point operations in the worst case.
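In practice one often approximates such factorizations with plain QR with column pivoting, which SciPy exposes directly; the sketch below estimates numerical rank from the decay of the diagonal of R. This is not the paper's algorithm (which adds guarantees that column pivoting alone lacks), only a quick illustration of the rank-revealing idea.

import numpy as np
from scipy.linalg import qr

def numerical_rank_via_pivoted_qr(M, tol=1e-10):
    """Estimate the numerical rank of M from a column-pivoted QR factorization."""
    Q, R, piv = qr(M, pivoting=True)
    d = np.abs(np.diag(R))
    return int(np.sum(d > tol * d[0])), piv

# Example: a 100 x 30 matrix of numerical rank 12, plus tiny noise.
rng = np.random.default_rng(0)
M = rng.standard_normal((100, 12)) @ rng.standard_normal((12, 30))
M += 1e-12 * rng.standard_normal(M.shape)
k, piv = numerical_rank_via_pivoted_qr(M)
print(k)   # typically 12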
We define a new interactive differentially private mechanism --- the median mechanism --- for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for non-interactive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting.
In this article, we will formalize the method of dual fitting and the idea of factor-revealing LP. This combination is used to design and analyze two greedy algorithms for the metric uncapacitated facility location problem. Their approximation factors are 1.861 and 1.61, with running times of $O(m \log m)$ and $O(n^3)$, respectively, where n is the total number of vertices and m is the number of edges in the underlying complete bipartite graph between cities and facilities. The algorithms are used to improve recent results for several variants of the problem.
Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decomposition. The tensor-CUR decomposition is most relevant as a data analysis tool when the data consist of one mode that is qualitatively different than the others. In this case, the tensor-CUR decomposition approximately expresses the original data tensor in terms of a basis consisting of underlying subtensors that are actual data elements and thus that have natural interpretation in terms of the processes generating the data. In order to demonstrate the general applicability of this tensor decomposition, we apply it to problems in two diverse domains of data analysis: hyperspectral medical image analysis and consumer recommendation system analysis. In the hyperspectral data application, the tensor-CUR decomposition is used to compress the data, and we show that classification quality is not substantially reduced even after substantial data compression. In the recommendation system application, the tensor-CUR decomposition is used to reconstruct missing entries in a user-product-product preference tensor, and we show that high quality recommendations can be made on the basis of a small number of basis users and a small number of product-product comparisons from a new user.
Let $X_1, \ldots, X_n$ be independent Bernoulli random variables, and let $p_i = P[X_i = 1]$, $\lambda = \sum_{i=1}^n p_i$ and $W = \sum_{i=1}^n X_i$. Successively improved estimates of the total variation distance between the distribution $\mathcal{L}(W)$ of W and a Poisson distribution $P_\lambda$ with mean $\lambda$ have been obtained by Prohorov [5], Le Cam [4], Kerstan [3], Vervaat [8], Chen [2], Serfling [7] and Romanowska [6]. Prohorov, Vervaat and Romanowska discussed only the case of identically distributed $X_i$'s, whereas Chen and Serfling were primarily interested in more general, dependent sequences. Under the present hypotheses, inequalities of this kind, expressed in terms of the total variation distance and labelled (1.1), were established respectively by Le Cam, Kerstan and Chen; the displayed bounds are not reproduced in this extract. (Kerstan's published estimate ([3], p. 174, equation (1)) contains a misprint, the constant 2.1 appearing twice on p. 175 of his paper.) Here, we use Chen's [2] elegant adaptation of Stein's method to improve the estimates given in (1.1), and we complement these estimates with a reverse inequality expressed in similar terms. Second order estimates, and the case of more general non-negative integer valued X's, are also discussed.
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strengths and overcome the weaknesses of the two approaches by applying the doubly robust technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice.
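For a deterministic target policy, the doubly robust value estimate combines a reward-model prediction with an importance-weighted correction, as in the sketch below; the argument names are ours, and the reward model and logged propensities are assumed to be supplied by the user.

import numpy as np

def doubly_robust_value(contexts, actions, rewards, propensities, reward_model, policy):
    """Doubly robust estimate of the value of `policy` from logged bandit data.

    contexts     : logged contexts x_t
    actions      : logged actions a_t
    rewards      : observed rewards r_t
    propensities : logging probabilities p(a_t | x_t) (model of the past policy)
    reward_model : callable (x, a) -> estimated reward (model of rewards)
    policy       : callable x -> action chosen by the new (deterministic) policy
    """
    values = []
    for x, a, r, p in zip(contexts, actions, rewards, propensities):
        a_new = policy(x)
        dm = reward_model(x, a_new)                              # direct-method term
        correction = (r - reward_model(x, a)) / p if a == a_new else 0.0
        values.append(dm + correction)
    return float(np.mean(values))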
Solutions to rank deficient least squares problems are conveniently expressed in terms of the singular value decomposition (SVD) of the coefficient matrix. When the matrix is nearly rank deficient, a common procedure is to neglect its smallest singular values, which leads to the truncated SVD (TSVD) solution. In this paper, an efficient method is presented for computing the TSVD solution via a QR-factorization, without the need for computing a complete SVD. The numerical rank of the matrix is determined by means of a rank revealing QR-factorization, which provides upper and lower bounds on the small singular values and approximations to the corresponding singular vectors, which are then refined by inverse subspace iteration and used in conjunction with the QR factors to compute the TSVD solution.
Under the pari-mutuel system of betting on horse races, the final track odds are in some sense a consensus of the 'subjective odds' of the individual bettors weighted by the amounts of their bets. We formulate the properties which this consensus must possess and prove that there always exists a unique set of odds having the required properties.
We present an incentive-compatible polynomial-time approximation scheme for multi-unit auctions with general k-minded player valuations. The mechanism fully optimizes over an appropriately chosen sub-range of possible allocations and then uses VCG payments over this sub-range. We show that obtaining a fully polynomial-time incentive-compatible approximation scheme, at least using VCG payments, is NP-hard. For the case of valuations given by black boxes, we give a polynomial-time incentive-compatible 2-approximation mechanism and show that no better is possible, at least using VCG payments.
We consider the problem of approximating a given m × n matrix A by another matrix of specified rank k, which is smaller than m and n. The Singular Value Decomposition (SVD) can be used to find the "best" such approximation. However, it takes time polynomial in m, n which is prohibitive for some modern applications. In this article, we develop an algorithm that is qualitatively faster, provided we may sample the entries of the matrix in accordance with a natural probability distribution. In many applications, such sampling can be done efficiently. Our main result is a randomized algorithm to find the description of a matrix $D^*$ of rank at most k so that $\|A - D^*\|_F^2 \le \min_{D:\mathrm{rank}(D)\le k} \|A - D\|_F^2 + \epsilon \|A\|_F^2$ holds with probability at least $1 - \delta$ (where $\|\cdot\|_F$ is the Frobenius norm). The algorithm takes time polynomial in k, $1/\epsilon$, $\log(1/\delta)$ only and is independent of m and n. In particular, this implies that in constant time, it can be determined if a given matrix of arbitrary size has a good low-rank approximation.
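A simplified sketch of the sampling idea: draw a few rows with probability proportional to their squared norms, rescale, and use the top-k right singular vectors of the small sample as the approximation subspace. Unlike the constant-time algorithm described above, this sketch forms the projection of A explicitly, and the function name is ours.

import numpy as np

def sampled_low_rank(A, k, s, seed=0):
    """Row-sampling low-rank approximation sketch (not constant-time)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = (A ** 2).sum(axis=1) / (A ** 2).sum()
    idx = rng.choice(m, size=s, replace=True, p=p)
    S = A[idx] / np.sqrt(s * p[idx])[:, None]      # s x n rescaled row sample

    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    V_k = Vt[:k].T                                  # n x k approximate top subspace
    return (A @ V_k) @ V_k.T                        # rank-k approximation of A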
A natural optimization model that formulates many online resource allocation problems is the online linear programming (LP) problem in which the constraint matrix is revealed column by column along with the corresponding objective coefficient. In such a model, a decision variable has to be set each time a column is revealed without observing the future inputs, and the goal is to maximize the overall objective function. In this paper, we propose a near-optimal algorithm for this general class of online problems under the assumptions of random order of arrival and some mild conditions on the size of the LP right-hand-side input. Specifically, our learning-based algorithm works by dynamically updating a threshold price vector at geometric time intervals, where the dual prices learned from the revealed columns in the previous period are used to determine the sequential decisions in the current period. Through dynamic learning, the competitiveness of our algorithm improves over the past study of the same problem. We also present a worst case example showing that the performance of our algorithm is near optimal.
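A stripped-down, one-time-learning variant of this idea (learn dual prices once from the first fraction of columns, rather than updating them at geometric intervals as the algorithm described above does) can be sketched with SciPy's LP solver. All function names are ours, and the sign convention for the dual prices assumes the HiGHS backend's reported marginals.

import numpy as np
from scipy.optimize import linprog

def learn_threshold_prices(pi, A, b, eps, s):
    """Solve the offline LP on the first s columns (capacities scaled by eps)
    and return the shadow prices of the resource constraints."""
    res = linprog(c=-pi[:s], A_ub=A[:, :s], b_ub=eps * b,
                  bounds=(0, 1), method='highs')
    return -res.ineqlin.marginals          # prices per unit of each resource

def online_allocate(pi, A, b, eps=0.1):
    """Accept a later column if its value exceeds its priced resource use and it fits."""
    pi = np.asarray(pi, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(pi)
    s = max(1, int(eps * n))
    prices = learn_threshold_prices(pi, A, b, eps, s)

    remaining = b.copy()
    x = np.zeros(n)
    for t in range(s, n):                  # decisions only after the learning phase
        col = A[:, t]
        if pi[t] > prices @ col and np.all(col <= remaining):
            x[t] = 1.0
            remaining -= col
    return x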