All published works (188)

We investigate the role of inaccurate priors for the classical Pandora's box problem. In the classical Pandora's box problem we are given a set of boxes each with a known cost and an unknown value sampled from a known distribution. We investigate how inaccuracies in the beliefs can affect existing algorithms. Specifically, we assume that the knowledge of the underlying distribution has a small error in the Kolmogorov distance, and study how this affects the utility obtained by the optimal algorithm.
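The optimal algorithm for the accurate-prior problem is Weitzman's classical index policy, against which the robustness question above is measured. Below is a minimal Python sketch of that baseline rule, assuming discrete value distributions given as value/probability lists; it illustrates the classical policy, not the paper's robust analysis.

```python
import random

def reservation_value(values, probs, cost, lo=0.0, hi=1e9, iters=60):
    """Bisect for the index sigma solving E[max(v - sigma, 0)] = cost
    (assumes a discrete distribution given as parallel lists)."""
    def surplus(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if surplus(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pandora(boxes):
    """boxes: list of (values, probs, cost) triples. Open boxes in
    decreasing index order; stop once the best value seen so far beats
    every remaining index. Returns the realized utility."""
    order = sorted(boxes, key=lambda b: -reservation_value(*b))
    best = spent = 0.0
    for values, probs, cost in order:
        if best >= reservation_value(values, probs, cost):
            break
        spent += cost
        best = max(best, random.choices(values, probs)[0])
    return best - spent
```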
One of the classic problems in online decision-making is the *secretary problem*, where the goal is to maximize the probability of choosing the largest number from a randomly ordered sequence. A natural extension allows selecting multiple values under a combinatorial constraint. Babaioff, Immorlica, Kempe, and Kleinberg (SODA'07, JACM'18) introduced the *matroid secretary conjecture*, suggesting an $O(1)$-competitive algorithm exists for matroids. Many works since have attempted to obtain algorithms for both general matroids and specific classes of matroids. The ultimate goal is to obtain an $e$-competitive algorithm, and the *strong matroid secretary conjecture* states that this is possible for general matroids. A key class of matroids is the *graphic matroid*, where a set of graph edges is independent if it contains no cycle. The rich combinatorial structure of graphs makes them a natural first step towards solving the problem for general matroids. Babaioff et al. (SODA'07, JACM'18) first studied the graphic matroid setting, achieving a $16$-competitive algorithm. Subsequent works have improved the competitive ratio, most recently to $4$ by Soto, Turkieltaub, and Verdugo (SODA'18). We break this $4$-competitive barrier, presenting a new algorithm with a competitive ratio of $3.95$. For simple graphs, we further improve this to $3.77$. Intuitively, solving the problem for simple graphs is easier since they lack length-two cycles. A natural question is whether a ratio arbitrarily close to $e$ can be achieved by assuming sufficiently large girth. We answer this affirmatively, showing a competitive ratio arbitrarily close to $e$ even for constant girth values, supporting the strong matroid secretary conjecture. We also prove this bound is tight: for any constant $g$, no algorithm can achieve a ratio better than $e$ even when the graph has girth at least $g$.
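For readers unfamiliar with the baseline, the classical single-choice secretary rule referenced above fits in a few lines; the sketch below (the standard 1/e stopping rule) is background, not the paper's graphic-matroid algorithm.

```python
import math

def classical_secretary(stream):
    """Classical 1/e rule: observe the first n/e values without
    committing, then accept the first value that beats everything seen
    in the observation phase. Picks the maximum with probability ~1/e."""
    n = len(stream)
    cutoff = int(n / math.e)
    benchmark = max(stream[:cutoff], default=float("-inf"))
    for x in stream[cutoff:]:
        if x > benchmark:
            return x
    return stream[-1]  # forced to accept the final value
```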
Selecting representatives based on voters' preferences is a fundamental problem in social choice theory. While cardinal utility functions offer a detailed representation of preferences, ordinal rankings are often the only available information due to their simplicity and practical constraints. The metric distortion framework addresses this issue by modeling voters and candidates as points in a metric space, with distortion quantifying the efficiency loss from relying solely on ordinal rankings. Existing works define the cost of a voter with respect to a candidate as their distance and set the overall cost as either the sum (utilitarian) or maximum (egalitarian) of these costs across all voters. They show that deterministic algorithms achieve a best-possible distortion of 3 for any metric when considering a single candidate. This paper explores whether one can obtain a better approximation compared to an optimal candidate by relying on a committee of $k$ candidates ($k \ge 1$), where the cost of a voter is defined as its distance to the closest candidate in the committee. We answer this affirmatively in the case of line metrics, demonstrating that with $O(1)$ candidates, it is possible to achieve optimal cost. Our results extend to both utilitarian and egalitarian objectives, providing new upper bounds for the problem. We complement our results with lower bounds for both the line and 2-D Euclidean metrics.
The secretary and the prophet inequality problems are central to the field of Stopping Theory. Recently, there has been a lot of work in generalizing these models to multiple items because of their applications in mechanism design. The most important of these generalizations are to matroids and to combinatorial auctions (extending bipartite matching). Kleinberg-Weinberg \cite{KW-STOC12} and Feldman et al. \cite{feldman2015combinatorial} show that for adversarial arrival order of random variables the optimal prophet inequalities give a $1/2$-approximation. For many settings, however, it's conceivable that the arrival order is chosen uniformly at random, akin to the secretary problem. For such a random arrival model, we improve upon the $1/2$-approximation and obtain $(1-1/e)$-approximation prophet inequalities for both matroids and combinatorial auctions. This also gives improvements to the results of Yan \cite{yan2011mechanism} and Esfandiari et al. \cite{esfandiari2015prophet} who worked in the special cases where we can fully control the arrival order or when there is only a single item. Our techniques are threshold based. We convert our discrete problem into a continuous setting and then give a generic template on how to dynamically adjust these thresholds to lower bound the expected total welfare.
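As background for the threshold-based techniques mentioned in the last sentence, here is a sketch of the classical single-item prophet inequality with a fixed threshold of half the expected maximum, a known 1/2-guarantee; the sampler-list interface is an assumption of this sketch.

```python
import random

def make_threshold_rule(dists, trials=10_000):
    """Single-item prophet inequality: with threshold T = E[max X_i]/2,
    accepting the first value >= T earns at least E[max X_i]/2 in
    expectation. `dists` is a list of zero-argument samplers."""
    e_max = sum(max(d() for d in dists) for _ in range(trials)) / trials
    threshold = e_max / 2

    def run():
        for d in dists:       # fixed (adversarial) arrival order
            x = d()
            if x >= threshold:
                return x      # stop and take this value
        return 0.0
    return run

# example: rule = make_threshold_rule([random.random for _ in range(5)])
```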
The online bipartite matching problem, extensively studied in the literature, deals with the allocation of online arriving vertices (items) to a predetermined set of offline vertices (agents). However, little attention has been given to the concept of class fairness, where agents are categorized into different classes, and the matching algorithm must ensure equitable distribution across these classes. We here focus on randomized algorithms for the fair matching of indivisible items, subject to various definitions of fairness. Our main contribution is the first (randomized) non-wasteful algorithm that simultaneously achieves a $1/2$ approximation to class envy-freeness (CEF) while ensuring an equivalent approximation to the class proportionality (CPROP) and utilitarian social welfare (USW) objectives. We supplement this result by demonstrating that no non-wasteful algorithm can achieve an $\alpha$-CEF guarantee for $\alpha > 0.761$. In a similar vein, we provide a novel input instance for deterministic divisible matching that demonstrates a nearly tight CEF approximation. Lastly, we define the ``price of fairness,'' which represents the trade-off between optimal and fair matching. We demonstrate that increasing the level of fairness in the approximation of the solution leads to a decrease in the objective of maximizing USW, following an inverse proportionality relationship.
We initiate the study of the submodular cover problem in the dynamic setting, where the elements of the ground set are inserted and deleted. In the classical submodular cover problem, we are given a monotone submodular function $f : 2^{V} \to \mathbb{R}^{\ge 0}$ and the goal is to obtain a set $S \subseteq V$ that minimizes the cost subject to the constraint $f(S) = f(V)$. This is a classical problem in computer science and generalizes the Set Cover, 2-Set Cover, and Dominating Set problems, among others. We consider this problem in a dynamic setting where there are updates to our set $V$, in the form of insertions and deletions of elements from a ground set $\mathcal{V}$, and the goal is to maintain an approximately optimal solution with low query complexity per update. For this problem, we propose a randomized algorithm that, in expectation, obtains a $(1-O(\epsilon), O(\epsilon^{-1}))$-bicriteria approximation using polylogarithmic query complexity per update.
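The static baseline behind this problem is Wolsey's greedy algorithm for submodular cover; the dynamic algorithm in the abstract maintains a comparable bicriteria solution under updates. A minimal static sketch (assuming a set-function oracle `f` and a `cost` map, both assumptions of this sketch) follows.

```python
def greedy_submodular_cover(V, f, cost):
    """Classical static greedy for submodular cover (Wolsey 1982):
    repeatedly pick the element with the best marginal-gain-per-cost
    ratio until f(S) = f(V). Monotonicity guarantees progress while
    f(S) < f(V)."""
    S, target = set(), f(set(V))
    while f(S) < target:
        best = max((e for e in V if e not in S),
                   key=lambda e: (f(S | {e}) - f(S)) / cost[e])
        S.add(best)
    return S
```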
We give the first non-trivial decremental dynamic embedding of a weighted, undirected graph $G$ into $\ell_p$ space. Given a weighted graph $G$ undergoing a sequence of edge weight increases, the goal of this problem is to maintain a (randomized) mapping $\phi: (G,d) \to (X,\ell_p)$ from the set of vertices of the graph to the $\ell_p$ space such that for every pair of vertices $u$ and $v$, the expected distance between $\phi(u)$ and $\phi(v)$ in the $\ell_p$ metric is within a small multiplicative factor, referred to as the distortion, of their distance in $G$. Our main result is a dynamic algorithm with expected distortion $O(\log^2 n)$ and total update time $O\left((m^{1+o(1)} \log^2 W + Q)\log(nW) \right)$, where $W$ is the maximum weight of the edges, $Q$ is the total number of updates and $n, m$ denote the number of vertices and edges in $G$ respectively. This is the first result of its kind, extending the seminal result of Bourgain to the expanding field of dynamic algorithms. Moreover, we demonstrate that in the fully dynamic regime, where we tolerate edge insertions as well as deletions, no algorithm can explicitly maintain an embedding into $\ell_p$ space that has a low distortion with high probability.
Prize-Collecting Steiner Tree (PCST) is a generalization of the Steiner Tree problem, a fundamental problem in computer science. In the classic Steiner Tree problem, we aim to connect a set of vertices known as terminals using the minimum-weight tree in a given weighted graph. In this generalized version, each vertex has a penalty, and there is flexibility to decide whether to connect each vertex or pay its associated penalty, making the problem more realistic and practical.
Prize-Collecting Steiner Tree (PCST) is a generalization of the Steiner Tree problem, a fundamental problem in computer science. In the classic Steiner Tree problem, we aim to connect a set of vertices known as terminals using the minimum-weight tree in a given weighted graph. In this generalized version, each vertex has a penalty, and there is flexibility to decide whether to connect each vertex or pay its associated penalty, making the problem more realistic and practical. Both the Steiner Tree problem and its Prize-Collecting version had long-standing $2$-approximation algorithms, matching the integrality gap of the natural LP formulations for both. This barrier for both problems has been surpassed, with algorithms achieving approximation factors below $2$. While research on the Steiner Tree problem has led to a series of reductions in the approximation ratio below $2$, culminating in a $\ln(4)+\epsilon$ approximation by Byrka, Grandoni, Rothvoß, and Sanità, the Prize-Collecting version has not seen improvements in the past 15 years since the work of Archer, Bateni, Hajiaghayi, and Karloff, which reduced the approximation factor for this problem from $2$ to $1.9672$. Interestingly, even the Prize-Collecting TSP approximation, which was first improved below $2$ in the same paper, has seen several advancements since then. In this paper, we reduce the approximation factor for the PCST problem substantially to $1.7994$ via a novel iterative approach.
We here address the problem of fairly allocating indivisible goods or chores to n agents with weights that define their entitlement to the set of indivisible resources. Stemming from well-studied fairness concepts such as envy-freeness up to one good (EF1) and envy-freeness up to any good (EFX) for agents with equal entitlements, we present, in this study, the first set of impossibility results alongside algorithmic guarantees for fairness among agents with unequal entitlements. Within this paper, we expand the concept of envy-freeness up to any good or chore to the weighted context (WEFX and XWEF respectively), demonstrating that these allocations are not guaranteed to exist for two or three agents. Despite these negative results, we develop a WEFX procedure for two agents with integer weights, and furthermore, we devise an approximate WEFX procedure for two agents with normalized weights. We further present a polynomial-time algorithm that guarantees a weighted envy-free allocation up to one chore (1WEF) for any number of agents with additive cost functions. Our work underscores the heightened complexity of the weighted fair division problem when compared to its unweighted counterpart.
We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995). We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob's preferences with increasing precision, thereby securing a disproportionate share of the resource over time. We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing they themselves get at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability. Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$.
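The binary-search exploitation against an almost-myopic Bob can be made concrete: each observed choice halves the interval known to contain Bob's indifference point. A small sketch follows, where `bob_prefers_left` is a hypothetical oracle standing in for Bob's observed choices.

```python
def locate_indifference_point(bob_prefers_left, rounds=30):
    """Binary search over Alice's cut point against a myopic Bob: each
    round Alice cuts at the midpoint of her current interval, and
    observing whether Bob takes the left piece reveals on which side of
    the cut his indifference point lies. bob_prefers_left(c) returns
    True iff a myopic Bob takes the piece [0, c]."""
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        c = (lo + hi) / 2
        if bob_prefers_left(c):
            hi = c   # Bob's indifference point is at most c
        else:
            lo = c   # it is above c
    return (lo + hi) / 2  # approximately Bob's indifference point
```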
Approximation algorithms for the prize-collecting Steiner forest problem (PCSF) have been a subject of research for over three decades, starting with the seminal works of Agrawal, Klein, and Ravi [1, 2] and Goemans and Williamson [14, 15] on Steiner forest and prize-collecting problems. In this paper, we propose and analyze a natural deterministic algorithm for PCSF that achieves a 2-approximate solution in polynomial time. This represents a significant improvement compared to the previously best known algorithm with a 2.54-approximation factor developed by Hajiaghayi and Jain [19] in 2006. Furthermore, Könemann, Olver, Pashkovich, Ravi, Swamy, and Vygen [24] have established an integrality gap of at least 9/4 for the natural LP relaxation for PCSF. However, we surpass this gap through the utilization of a combinatorial algorithm and a novel analysis technique. Since 2 is the best known approximation guarantee for the Steiner forest problem [2] (see also [15]), which is a special case of PCSF, our result matches this factor and closes the gap between the Steiner forest problem and its generalized version, PCSF.
Submodular maximization under matroid and cardinality constraints are classical problems with a wide range of applications in machine learning, auction theory, and combinatorial optimization. In this paper, we consider these problems in the dynamic setting where (1) we have oracle access to a monotone submodular function $f : 2^{V} \to \mathbb{R}^+$ and (2) we are given a sequence $S$ of insertions and deletions of elements of an underlying ground set $V$.
The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, we still have a limited understanding of maximal matching, which is one of the central problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time, which is considered inefficient, or require a strictly super-linear space of $n^{1+\Omega(1)}$ per machine. In this work, we close this gap by providing a novel analysis of an extremely simple algorithm, which is a variant of an algorithm conjectured to work by Czumaj, Lacki, Madry, Mitrovic, Onak, and Sankowski [15]. The algorithm edge-samples the graph, randomly partitions the vertices, and finds a random greedy maximal matching within each partition. We show that this algorithm drastically reduces the vertex degrees. This, among other results, leads to an $O(\log\log \Delta)$ round algorithm for maximal matching with $O(n)$ space (or even mildly sublinear in $n$ using standard techniques). As an immediate corollary, we get a $2$-approximate minimum vertex cover in essentially the same rounds and space, which is the optimal approximation factor under standard assumptions. We also get an improved $O(\log\log \Delta)$ round algorithm for $(1+\epsilon)$-approximate matching. All these results can also be implemented in the congested clique model in the same number of rounds.
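The one-round core of the analyzed algorithm is simple enough to state directly. The following sequential simulation of that round (sample edges, randomly partition vertices, greedy matching per part) mirrors the description above; it is a sketch of the round's logic, not an MPC implementation.

```python
import random

def partitioned_greedy_matching(vertices, edges, machines, p):
    """One round of the analyzed algorithm, simulated sequentially:
    sample each edge with probability p, randomly partition vertices
    into `machines` groups, and run a random greedy maximal matching
    inside each group's induced subgraph."""
    part = {v: random.randrange(machines) for v in vertices}
    sampled = [e for e in edges if random.random() < p]
    random.shuffle(sampled)  # random greedy order
    matched, matching = set(), []
    for u, v in sampled:
        if part[u] == part[v] and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```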
We consider a multi-agent delegation mechanism without money. In our model, given a set of agents, each agent has a fixed number of solutions which is exogenous to the mechanism, and privately sends a signal, e.g., a subset of solutions, to the principal. Then, the principal selects a final solution based on the agents' signals. In stark contrast to the single-agent setting of Kleinberg and Kleinberg [2018] with an approximate Bayesian mechanism, we show that there exist efficient approximate prior-independent mechanisms with both information and performance gain, thanks to the competitive tension between the agents. Interestingly, however, the amount of such compelling power significantly varies with respect to the information available to the agents, and the degree of correlation between the principal's and the agent's utility. Technically, we conduct a comprehensive study on the multi-agent delegation problem and derive several results on the approximation factors of Bayesian/prior-independent mechanisms in complete/incomplete information settings. As a special case of independent interest, we obtain comparative statics regarding the number of agents, which imply the dominance of the multi-agent setting ($n \ge 2$) over the single-agent setting ($n = 1$) in terms of the principal's utility. We further extend our problem by considering an examination cost of the mechanism and derive some analogous results in the complete information setting.
Given two strings of length n over alphabet Σ, and an upper bound k on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) from almost forty years ago computes the unweighted string edit distance in $O(n+k^2)$ time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Time Hypothesis (Backurs and Indyk; STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar $\tilde{O}(n+\mathrm{poly}(k))$-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances.
Given two strings of length $n$ over alphabet $\Sigma$, and an upper bound $k$ on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) computes the unweighted string edit distance in $\mathcal{O}(n+k^2)$ time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Time Hypothesis (STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar $\tilde{\mathcal{O}}(n+\mathrm{poly}(k))$-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances. While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations and may vary with the characters being edited. Given a weight function $w: \Sigma \cup \{\varepsilon \}\times \Sigma \cup \{\varepsilon \} \rightarrow \mathbb{R}_{\ge 0}$ (such that $w(a,a)=0$ and $w(a,b)\ge 1$ for all $a,b\in \Sigma \cup \{\varepsilon\}$ with $a\ne b$), the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla $\mathcal{O}(n^2)$-time dynamic-programming algorithm and its almost trivial $\mathcal{O}(nk)$-time implementation, none of the aforementioned developments on the unweighted edit distance apply to the weighted variant. In this paper, we propose the first $\mathcal{O}(n+\mathrm{poly}(k))$-time algorithm that computes weighted string edit distance exactly, thus bridging a fundamental gap between our understanding of unweighted and weighted edit distance. We then generalize this result to weighted tree and Dyck edit distances, which leads to a deterministic algorithm that improves upon the previous work for unweighted tree edit distance.
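The unweighted $\mathcal{O}(n+k^2)$ algorithm referenced above is the Landau-Vishkin diagonal ("wave") dynamic program. The sketch below uses naive character-by-character extension and therefore runs in $O(nk)$; the stated bound additionally needs constant-time longest-common-prefix queries (e.g., from a suffix tree).

```python
def bounded_edit_distance(a, b, k):
    """Landau-Vishkin style diagonal sweep: fur[d] holds the furthest
    row i reachable on diagonal d = j - i with the current edit budget
    e. Returns the edit distance if it is at most k, otherwise None."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    fur = {}
    for e in range(k + 1):
        new = {}
        for d in range(-e, e + 1):
            if e == 0:
                i = 0
            else:
                i = max(fur.get(d, -1) + 1,      # substitution
                        fur.get(d - 1, -1),      # insertion
                        fur.get(d + 1, -1) + 1)  # deletion
            i = min(i, n, m - d)                 # stay inside both strings
            while i < n and i + d < m and a[i] == b[i + d]:
                i += 1                           # free extension on matches
            new[d] = i
            if d == m - n and i >= n:
                return e
        fur = new
    return None
```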
We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark learning failures for any such behavior, and provide matching positive results. As a special case, we obtain the first general results on the failure of the greedy algorithm in bandits, thus providing a theoretical foundation for why bandit algorithms should explore.
Decision trees are widely used for their low computational cost, good predictive performance, and ability to assess the importance of features. Though often used in practice for feature selection, the theoretical guarantees of these methods are not well understood. We here obtain a tight finite sample bound for the feature selection problem in linear regression using single-depth decision trees. We examine the statistical properties of these "decision stumps" for the recovery of the $s$ active features from $p$ total features, where $s \ll p$. Our analysis provides tight sample performance guarantees on high-dimensional sparse systems which align with the finite sample bound of $O(s \log p)$ as obtained by Lasso, improving upon previous bounds for both the median and optimal splitting criteria. Our results extend to the non-linear regime as well as arbitrary sub-Gaussian distributions, demonstrating that tree based methods attain strong feature selection properties under a wide variety of settings and further shedding light on the success of these methods in practice. As a byproduct of our analysis, we show that we can provably guarantee recovery even when the number of active features $s$ is unknown. We further validate our theoretical results and proof methodology using computational experiments.
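A minimal version of the selection procedure analyzed here scores each coordinate with a median-split decision stump and keeps the top $s$; the sketch below (numpy, with simplified thresholds and tie-breaking) illustrates that pipeline.

```python
import numpy as np

def stump_feature_selection(X, y, s):
    """Score each feature by the variance reduction of a single-depth
    (median-split) decision stump fit on it alone, then return the
    indices of the top s features."""
    n, p = X.shape
    scores = np.empty(p)
    total = ((y - y.mean()) ** 2).sum()
    for j in range(p):
        t = np.median(X[:, j])
        left, right = y[X[:, j] <= t], y[X[:, j] > t]
        sse = sum(((part - part.mean()) ** 2).sum()
                  for part in (left, right) if len(part) > 0)
        scores[j] = total - sse  # impurity reduction of the stump
    return np.argsort(scores)[-s:][::-1]
```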
We consider a multi-agent delegation mechanism without money. In our model, given a set of agents, each agent has a fixed number of solutions which is exogenous to the mechanism, … We consider a multi-agent delegation mechanism without money. In our model, given a set of agents, each agent has a fixed number of solutions which is exogenous to the mechanism, and privately sends a signal, e.g., a subset of solutions, to the principal. Then, the principal selects a final solution based on the agents' signals. In stark contrast to single-agent setting by Kleinberg and Kleinberg (EC'18) with an approximate Bayesian mechanism, we show that there exists efficient approximate prior-independent mechanisms with both information and performance gain, thanks to the competitive tension between the agents. Interestingly, however, the amount of such a compelling power significantly varies with respect to the information available to the agents, and the degree of correlation between the principal's and the agent's utility. Technically, we conduct a comprehensive study on the multi-agent delegation problem and derive several results on the approximation factors of Bayesian/prior-independent mechanisms in complete/incomplete information settings. As a special case of independent interest, we obtain comparative statics regarding the number of agents which implies the dominance of the multi-agent setting ($n \ge 2$) over the single-agent setting ($n=1$) in terms of the principal's utility. We further extend our problem by considering an examination cost of the mechanism and derive some analogous results in the complete information setting.
Fault-tolerant consensus is about reaching agreement on some of the input values in a limited time by non-faulty autonomous processes, despite failures of processes or the communication medium. This problem is particularly challenging and costly against an adaptive adversary with full information. Bar-Joseph and Ben-Or (PODC'98) were the first to prove an absolute lower bound $\Omega(\sqrt{n/\log n})$ on the expected time complexity of consensus in any classic (i.e., randomized or deterministic) message-passing network with $n$ processes succeeding with probability $1$ against such a strong adaptive adversary crashing processes. The seminal work of Ben-Or and Hassidim (STOC'05) broke the $\Omega(\sqrt{n/\log n})$ barrier for consensus in classic (deterministic and randomized) networks by employing quantum computing. They showed an (expected) constant-time quantum algorithm for a linear number of crashes $t<n/3$. In this paper, we improve upon that seminal work by reducing the number of quantum and communication bits to an arbitrarily small polynomial, and even further, to a polylogarithmic number -- though the latter comes at the cost of a slightly larger polylogarithmic time (still exponentially smaller than the time lower bound $\Omega(\sqrt{n/\log n})$ for classic computation).
Maximizing a monotone submodular function under cardinality constraint $k$ is a core problem in machine learning and databases with many basic applications, including video and data summarization, recommendation systems, feature extraction, exemplar clustering, and coverage problems. We study this classic problem in the fully dynamic model, where a stream of insertions and deletions of elements of an underlying ground set is given and the goal is to maintain an approximate solution with a fast update time. A recent paper at NeurIPS'20 by Lattanzi, Mitrovic, Norouzi-Fard, Tarnawski, and Zadimoghaddam claims to obtain a dynamic algorithm for this problem with a $\frac{1}{2} -\epsilon$ approximation ratio and a query complexity bounded by $\mathrm{poly}(\log(n),\log(k),\epsilon^{-1})$. However, as we explain in this paper, the analysis has some important gaps. Having a dynamic algorithm for the problem with polylogarithmic update time is even more important in light of a recent result by Chen and Peng at STOC'22, who show a matching lower bound for the problem -- any randomized algorithm with a $\frac{1}{2}+\epsilon$ approximation ratio must have an amortized query complexity that is polynomial in $n$. In this paper, we develop a simpler algorithm for the problem that maintains a $(\frac{1}{2}-\epsilon)$-approximate solution for submodular maximization under cardinality constraint $k$ using a polylogarithmic amortized update time.
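For contrast with the dynamic setting, the static problem is solved to a $(1-1/e)$ factor by greedy; the lazy-greedy variant below (assuming a set-function oracle `f`) is the standard static baseline against which the dynamic algorithms are measured.

```python
import heapq

def lazy_greedy(V, f, k):
    """Static (1 - 1/e) lazy greedy for monotone submodular maximization
    under a cardinality constraint k. Stale upper bounds on marginal
    gains are kept in a max-heap and re-evaluated only when popped."""
    S, fS = set(), f(set())
    heap = [(-(f({e}) - fS), i, e) for i, e in enumerate(V)]
    heapq.heapify(heap)
    while len(S) < k and heap:
        _, i, e = heapq.heappop(heap)
        gain = f(S | {e}) - fS  # re-evaluate: the stored bound may be stale
        if not heap or gain >= -heap[0][0]:
            S.add(e)
            fS += gain
        else:
            heapq.heappush(heap, (-gain, i, e))
    return S
```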
We here address the problem of fairly allocating indivisible goods or chores to $n$ agents with weights that define their entitlement to the set of indivisible resources. Stemming from well-studied fairness concepts such as envy-freeness up to one good (EF1) and envy-freeness up to any good (EFX) for agents with equal entitlements, we present, in this study, the first set of impossibility results alongside algorithmic guarantees for fairness among agents with unequal entitlements. Within this paper, we expand the concept of envy-freeness up to any good or chore to the weighted context (WEFX and XWEF respectively), demonstrating that these allocations are not guaranteed to exist for two or three agents. Despite these negative results, we develop a WEFX procedure for two agents with integer weights, and furthermore, we devise an approximate WEFX procedure for two agents with normalized weights. We further present a polynomial-time algorithm that guarantees a weighted envy-free allocation up to one chore (1WEF) for any number of agents with additive cost functions. Our work underscores the heightened complexity of the weighted fair division problem when compared to its unweighted counterpart.
Submodular maximization under matroid and cardinality constraints are classical problems with a wide range of applications in machine learning, auction theory, and combinatorial optimization. In this paper, we consider these … Submodular maximization under matroid and cardinality constraints are classical problems with a wide range of applications in machine learning, auction theory, and combinatorial optimization. In this paper, we consider these problems in the dynamic setting, where (1) we have oracle access to a monotone submodular function $f: 2^{V} \rightarrow \mathbb{R}^+$ and (2) we are given a sequence $\mathcal{S}$ of insertions and deletions of elements of an underlying ground set $V$. We develop the first fully dynamic $(4+\epsilon)$-approximation algorithm for the submodular maximization problem under the matroid constraint using an expected worst-case $O(k\log(k)\log^3{(k/\epsilon)})$ query complexity where $0 < \epsilon \le 1$. This resolves an open problem of Chen and Peng (STOC'22) and Lattanzi et al. (NeurIPS'20). As a byproduct, for the submodular maximization under the cardinality constraint $k$, we propose a parameterized (by the cardinality constraint $k$) dynamic algorithm that maintains a $(2+\epsilon)$-approximate solution of the sequence $\mathcal{S}$ at any time $t$ using an expected worst-case query complexity $O(k\epsilon^{-1}\log^2(k))$. This is the first dynamic algorithm for the problem that has a query complexity independent of the size of ground set $V$.
Approximation algorithms for the prize-collecting Steiner forest problem (PCSF) have been a subject of research for over three decades, starting with the seminal works of Agrawal, Klein, and Ravi and of Goemans and Williamson on Steiner forest and prize-collecting problems. In this paper, we propose and analyze a natural deterministic algorithm for PCSF that achieves a $2$-approximate solution in polynomial time. This represents a significant improvement compared to the previously best known algorithm with a $2.54$-approximation factor developed by Hajiaghayi and Jain in 2006. Furthermore, Könemann, Olver, Pashkovich, Ravi, Swamy, and Vygen have established an integrality gap of at least $9/4$ for the natural LP relaxation for PCSF. However, we surpass this gap through the utilization of a combinatorial algorithm and a novel analysis technique. Since $2$ is the best known approximation guarantee for the Steiner forest problem, which is a special case of PCSF, our result matches this factor and closes the gap between the Steiner forest problem and its generalized version, PCSF.
We present a study on a repeated delegated choice problem, which is the first to consider an online learning variant of Kleinberg and Kleinberg (EC'18). In this model, a principal interacts repeatedly with an agent who possesses an exogenous set of solutions to search for efficient ones. Each solution can yield varying utility for both the principal and the agent, and the agent may propose a solution to maximize its own utility in a selfish manner. To mitigate this behavior, the principal announces an eligible set which screens out a certain set of solutions. The principal, however, does not have any information on the distribution of solutions in advance. Therefore, the principal dynamically announces various eligible sets to efficiently learn the distribution. The principal's objective is to minimize cumulative regret compared to the optimal eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across the rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis mainly characterizes some regimes under which the principal can recover sublinear regret, thereby shedding light on the rise and fall of the repeated delegation procedure in various regimes.
We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d. from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of $O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls per round to an offline optimization oracle, where $K$ denotes the number of actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of policies. This is the first result to improve the prior best bound of $O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007, which was obtained for the stochastic case.
Maximizing submodular functions has been increasingly used in many applications of machine learning, such as data summarization, recommendation systems, and feature selection. Moreover, there has been a growing interest in both submodular maximization and dynamic algorithms. In 2020, Monemizadeh and Lattanzi, Mitrovic, Norouzi-Fard, Tarnawski, and Zadimoghaddam initiated the development of dynamic algorithms for the monotone submodular maximization problem under the cardinality constraint $k$. Recently, there have been some improvements on the topic made by Banihashem, Biabani, Goudarzi, Hajiaghayi, Jabbarzade, and Monemizadeh. In 2022, Chen and Peng studied the complexity of this problem and raised an important open question: "Can we extend [fully dynamic] results (algorithm or hardness) to non-monotone submodular maximization?". We affirmatively answer their question by demonstrating a reduction from maximizing a non-monotone submodular function under the cardinality constraint $k$ to maximizing a monotone submodular function under the same constraint. Through this reduction, we obtain the first dynamic algorithms to solve the non-monotone submodular maximization problem under the cardinality constraint $k$. Our algorithms maintain an $(8+\epsilon)$-approximate solution and use expected amortized $O(\epsilon^{-3}k^3\log^3(n)\log(k))$ or $O(\epsilon^{-1}k^2\log^3(k))$ oracle queries per update, respectively. Furthermore, we showcase the benefits of our dynamic algorithm for video summarization and max-cut problems on several real-world data sets.
This paper explores the potential for leveraging Large Language Models (LLM) in the realm of online advertising systems. We delve into essential requirements that such a system must fulfill, including privacy, latency, and reliability, as well as the satisfaction of users and advertisers. We further introduce a general framework for LLM advertisement, consisting of modification, bidding, prediction, and auction modules. Different design considerations for each module are presented, with an in-depth examination of their practicality and the technical challenges inherent to their implementation. Finally, we explore the prospect of LLM-based dynamic creative optimization as a means to significantly enhance the appeal of advertisements to users and discuss its additional challenges.
Research in fair machine learning, and particularly clustering, has been crucial in recent years given the many ethical controversies that modern intelligent systems have posed. Ahmadian et al. [2020] initiated the study of fairness in *hierarchical* clustering, a stronger, more structured variant of its well-known flat counterpart, though their proposed algorithm that optimizes for Dasgupta's [2016] famous cost function was highly theoretical. Knittel et al. [2023] then proposed the first practical fair approximation for cost; however, they were unable to break the polynomial-approximate barrier they posed as a hurdle of interest. We break this barrier, proposing the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, thus greatly bridging the gap between the best fair and vanilla hierarchical clustering approximations.
We study a problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms to maximize their payoff. We consider Bayesian agents who are unaware of the ex-post realization of their own arms' mean rewards; ours is the first work to study a Bayesian extension of Shin et al. (2022). This extension presents significant challenges in analyzing equilibrium, in contrast to the fully-informed setting of Shin et al. (2022), under which the problem simply reduces to a case where each agent only has a single arm. With Bayesian agents, even in a single-agent setting, analyzing the replication-proofness of an algorithm becomes complicated. Remarkably, we first show that the algorithm proposed by Shin et al. (2022), denoted H-UCB, is no longer replication-proof for any exploration parameters. We then provide sufficient and necessary conditions for an algorithm to be replication-proof in the single-agent setting. These results center around several analytical results comparing the expected regret of multiple bandit instances, which might be of independent interest. We further prove that the exploration-then-commit (ETC) algorithm satisfies these properties, whereas UCB does not, which in fact leads to its failure to be replication-proof. We expand this result to the multi-agent setting, and provide a replication-proof algorithm for any problem instance. The proof mainly relies on the single-agent result, as well as some structural properties of ETC and the novel introduction of a restarting round, which largely simplifies the analysis while leaving the regret unchanged (up to a polylogarithmic factor). We finalize our result by proving its sublinear regret upper bound, which matches that of H-UCB.
We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16], which is an abstraction of famous practical frameworks such as MapReduce, Hadoop, Flume, and Spark, has been at the forefront of this research. While great strides have been taken to create highly efficient MPC algorithms for a range of problems, recent progress has been limited by the 1-vs-2 Cycle Conjecture [20], which postulates that the simple problem of distinguishing between one and two cycles requires $\Omega(\log n)$ MPC rounds. In the AMPC model, each machine has adaptive read access to a distributed hash table even when communication is restricted (i.e., in the middle of a round). While remaining practical [4], this gives algorithms the power to bypass limitations like the 1-vs-2 Cycle Conjecture.
In this paper, we generalize the recently studied stochastic matching problem to more accurately model a significant medical process, kidney exchange, and several other applications. Up until now the stochastic matching problem that has been studied was as follows: given a graph $G=(V,E)$, each edge is included in the realized sub-graph of $G$ independently with probability $p_e$, and the goal is to find a degree-bounded sub-graph $Q$ of $G$ that has an expected maximum matching that approximates the expected maximum matching of $G$. This model does not account for possibilities of vertex dropouts, which can be found in several applications, e.g. in kidney exchange when donors or patients opt out of the exchange process as well as in online freelancing and online dating when online profiles are found to be faked. Thus, we will study a more generalized model of stochastic matching in which vertices and edges are both realized independently with some probabilities $p_v$, $p_e$, respectively, which more accurately fits important applications than the previously studied model. We will discuss the first algorithms and analysis for this generalization of the stochastic matching model and prove that they achieve good approximation ratios. In particular, we show that the approximation factor of a natural algorithm for this problem is at least $0.6568$ in unweighted graphs, and $1/2+\epsilon$ in weighted graphs for some constant $\epsilon > 0$. We further improve our result for unweighted graphs to $2/3$ using edge degree constrained sub-graphs (EDCS).
Consensus is one of the most thoroughly studied problems in distributed computing, yet there are still complexity gaps that have not been bridged for decades. In particular, in the classical message-passing setting with processes' crashes, since the seminal works of Bar-Joseph and Ben-Or [1998] \cite{Bar-JosephB98} and Aspnes and Waarts [1996, 1998] \cite{AspnesW-SICOMP-96,Aspnes-JACM-98} in the previous century, there is still a fundamental unresolved question about the communication complexity of fast randomized Consensus against a (strong) adaptive adversary crashing processes arbitrarily online. The best known upper bound on the number of communication bits is $\Theta(\frac{n^{3/2}}{\sqrt{\log{n}}})$ per process, while the best lower bound is $\Omega(1)$. This is in contrast to randomized Consensus against a (weak) oblivious adversary, for which time-almost-optimal algorithms guarantee amortized $O(1)$ communication bits per process \cite{GK-SODA-10}. We design an algorithm against an adaptive adversary that reduces the communication gap by a nearly linear factor, to $O(\sqrt{n}\cdot\text{polylog } n)$ bits per process, while keeping almost-optimal (up to a factor of $O(\log^3 n)$) time complexity $O(\sqrt{n}\cdot\log^{5/2} n)$. More surprisingly, we show this complexity can indeed be lowered further, but at the expense of increasing time complexity, i.e., there is a {\em trade-off} between communication complexity and time complexity. More specifically, our main Consensus algorithm allows reducing communication complexity per process to any value from $\text{polylog } n$ to $O(\sqrt{n}\cdot\text{polylog } n)$, as long as Time $\times$ Communication $= O(n\cdot \text{polylog } n)$. Similarly, reducing time complexity requires more random bits per process, i.e., Time $\times$ Randomness $=O(n\cdot \text{polylog } n)$.
Clustering is a fundamental building block of modern statistical analysis pipelines. Fair clustering has seen much attention from the machine learning community in recent years. We are some of the first to study fairness in the context of hierarchical clustering, after the results of Ahmadian et al. from NeurIPS in 2020. We evaluate our results using Dasgupta's cost function, perhaps one of the most prevalent theoretical metrics for hierarchical clustering evaluation. Our work vastly improves the previous $O(n^{5/6}\mathrm{poly}\log(n))$ fair approximation for cost to a near polylogarithmic $O(n^\delta \mathrm{poly}\log(n))$ fair approximation for any constant $\delta\in(0,1)$. This result establishes a cost-fairness tradeoff and extends to broader fairness constraints than the previous work. We also show how to alter existing hierarchical clusterings to guarantee fairness and cluster balance across any level in the hierarchy.
In this paper, we analyze a natural learning algorithm for uniform pacing of advertising budgets, equipped to adapt to varying ad sale platform conditions. On the demand side, advertisers face a fundamental technical challenge in automating bidding in a way that spreads their allotted budget across a given campaign subject to hidden, and potentially dynamic, cost functions. This automation and calculation must be done at runtime, implying a necessarily low computational cost given the high frequency of auctions. Advertisers are additionally expected to exhaust nearly all of their sub-interval (by the hour or minute) budgets to maintain budgeting quotas in the long run. To resolve this challenge, our study analyzes a simple learning algorithm that adapts to the latent cost function of the market and learns the optimal average bidding value for a period of auctions in a small fraction of the total campaign time, allowing for smooth budget pacing in real-time. We prove our algorithm is robust to changes in the auction mechanism, and exhibits fast convergence to a stable average bidding strategy. The algorithm not only guarantees that budgets are nearly spent in their entirety, but also smoothly paces bidding to prevent early exit from the campaign and a loss of the opportunity to bid on potentially lucrative impressions later in the period. In addition to the theoretical guarantees, we validate our algorithm with experimental results from open source data on real advertising campaigns to further demonstrate the effectiveness of our proposed approach.
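A toy version of such a pacing loop makes the feedback structure explicit: compare realized spend in each sub-interval to the uniform target and nudge a bid multiplier. The sketch below is illustrative only; `spend_at` is a hypothetical stand-in for the hidden market cost function, and the paper's algorithm and its guarantees are more refined.

```python
def pace_budget(budget, intervals, spend_at, alpha=0.1):
    """Minimal multiplicative pacing sketch: maintain a bid multiplier
    and adjust it after each sub-interval by comparing realized spend
    to the uniform per-interval target. spend_at(multiplier) returns
    the spend realized in one sub-interval at that bid level."""
    target = budget / intervals
    multiplier, spent = 1.0, 0.0
    for _ in range(intervals):
        s = spend_at(multiplier)
        spent += s
        # raise bids when under-spending, lower them when over-spending
        multiplier *= 1 + alpha * (target - s) / target
        multiplier = max(multiplier, 0.0)
    return multiplier, spent
```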
In this paper, we generalize the recently studied Stochastic Matching problem to more accurately model a significant medical process, kidney exchange, and several other applications. Up until now the Stochastic Matching problem that has been studied was as follows: given a graph G = (V, E), each edge is included in the realized sub-graph of G mutually independently with probability p_e, and the goal is to find a degree-bounded sub-graph Q of G that has an expected maximum matching that approximates the expected maximum matching of the realized sub-graph. This model does not account for possibilities of vertex dropouts, which can be found in several applications, e.g. in kidney exchange when donors or patients opt out of the exchange process as well as in online freelancing and online dating when online profiles are found to be faked. Thus, we will study a more generalized model of Stochastic Matching in which vertices and edges are both realized independently with some probabilities p_v, p_e, respectively, which more accurately fits important applications than the previously studied model. We will discuss the first algorithms and analysis for this generalization of the Stochastic Matching model and prove that they achieve good approximation ratios. In particular, we show that the approximation factor of a natural algorithm for this problem is at least $0.6568$ in unweighted graphs, and $1/2 + \epsilon$ in weighted graphs for some constant $\epsilon > 0$. We further improve our result for unweighted graphs to $2/3$ using edge degree constrained subgraphs (EDCS).
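The benchmark in this model, the expected maximum matching of the realized subgraph, is naturally estimated by Monte Carlo simulation. The sketch below does exactly that, substituting greedy maximal matching (a 1/2-approximation) for exact maximum matching to stay short.

```python
import random

def expected_matching_size(V, E, p_v, p_e, trials=1000):
    """Monte Carlo estimate of the expected matching size under
    independent vertex realizations (probabilities p_v) and edge
    realizations (probabilities p_e), as in the generalized model."""
    total = 0
    for _ in range(trials):
        live_v = {v for v in V if random.random() < p_v[v]}
        live_e = [(u, v) for (u, v) in E
                  if u in live_v and v in live_v
                  and random.random() < p_e[(u, v)]]
        matched = set()
        for u, v in live_e:  # greedy maximal matching on the realization
            if u not in matched and v not in matched:
                matched.update((u, v))
        total += len(matched) // 2
    return total / trials
```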
Computing the edit distance of two strings is one of the most basic problems in computer science and combinatorial optimization. Tree edit distance is a natural generalization of edit distance in which the task is to compute a measure of dissimilarity between two (unweighted) rooted trees with node labels. Perhaps the most notable recent application of tree edit distance is in NoSQL big databases, such as MongoDB, where each row of the database is a JSON document represented as a labeled rooted tree, and finding dissimilarity between two rows is a basic operation. Until recently, the fastest algorithm for tree edit distance ran in cubic time (Demaine, Mozes, Rossman, Weimann; TALG'10); however, Mao (FOCS'21) broke the cubic barrier for the tree edit distance problem using fast matrix multiplication. Given a parameter $k$ as an upper bound on the distance, an $O(n+k^2)$-time algorithm for edit distance has been known since the 1980s due to the works of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88). The existence of an $\tilde{O}(n+\mathrm{poly}(k))$-time algorithm for tree edit distance has been posed as an open question, e.g., by Akmal and Jin (ICALP'21), who gave a state-of-the-art $\tilde{O}(nk^2)$-time algorithm. In this paper, we answer this question positively.
The Santa Claus problem is a fundamental problem in fair division: the goal is to partition a set of heterogeneous items among heterogeneous agents so as to maximize the minimum value of items received by any agent. In this paper, we study the online version of this problem where the items are not known in advance and have to be assigned to agents as they arrive over time. If the arrival order of items is arbitrary, then no good assignment rule exists in the worst case. However, we show that, if the arrival order is random, then for $n$ agents and any $\varepsilon > 0$, we can obtain a competitive ratio of $1-\varepsilon$ when the optimal assignment gives value at least $\Omega(\log n / \varepsilon^2)$ to every agent (assuming each item has at most unit value). We also show that this result is almost tight: namely, if the optimal solution has value at most $C \ln n / \varepsilon$ for some constant $C$, then there is no $(1-\varepsilon)$-competitive algorithm even for random arrival order.
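A natural online rule to keep in mind for this setting is "give each arriving item to the currently poorest agent." The sketch below implements that baseline to illustrate the model; it is not claimed to be the paper's $(1-\varepsilon)$-competitive algorithm.

```python
def poorest_first(agents, items, value):
    """Illustrative online baseline for the random-arrival Santa Claus
    setting: assign each arriving item to the agent whose current bundle
    value is lowest, breaking ties toward the agent valuing it most.
    value(agent, item) gives an agent's value for an item."""
    load = {a: 0.0 for a in agents}
    for item in items:  # items arrive online, here in the given order
        best = min(agents, key=lambda ag: (load[ag], -value(ag, item)))
        load[best] += value(best, item)
    return load  # final bundle values; the objective is min(load.values())
```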
We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16], which is an abstraction of famous practical frameworks such as MapReduce, Hadoop, Flume, and Spark, has been at the forefront of this research. While great strides have been taken to create highly efficient MPC algorithms for a range of problems, recent progress has been limited by the 1-vs-2 Cycle Conjecture [20], which postulates that the simple problem of distinguishing between one and two cycles requires $\Omega(\log n)$ MPC rounds. In the AMPC model, each machine has adaptive read access to a distributed hash table even when communication is restricted (i.e., in the middle of a round). While remaining practical [4], this gives algorithms the power to bypass limitations like the 1-vs-2 Cycle Conjecture. We give the first sublogarithmic AMPC algorithm, requiring $O(\log\log n)$ rounds, for $(2+\epsilon)$-approximate weighted Min Cut. Our algorithm is inspired by the divide and conquer approach of Ghaffari and Nowicki [11], which solves the $(2+\epsilon)$-approximate weighted Min Cut problem in $O(\log n\log\log n)$ rounds of MPC using the classic result of Karger and Stein [15]. Our work is fully-scalable in the sense that the local memory of each machine is $O(n^\epsilon)$ for any constant $0 < \epsilon < 1$. There are no $o(\log n)$-round MPC algorithms for Min Cut in this memory regime assuming the 1-vs-2 Cycle Conjecture holds. The exponential speedup in AMPC is the result of decoupling the different layers of the divide and conquer algorithm and solving all layers in $O(1)$ rounds.
SIAM Journal on Computing (Society for Industrial and Applied Mathematics). Submitted: 27 February 2020; accepted: 3 June 2021; published online: 30 September 2021. Keywords: external memory, integer sorting, network coding, Li and Li conjecture. AMS Subject Headings: 94A24, 68P10.
We study distributed algorithms for the string matching problem in the presence of wildcard characters. Given a string T (a text), we look for all occurrences of another string P (a pattern) as a substring of T. Each wildcard character in the pattern matches a specific class of strings based on its type. String matching is one of the most fundamental problems in computer science, especially in the fields of bioinformatics and machine learning. Persistent effort has led to a variety of algorithms for the problem since the 1960s. With the rise of big data and the inevitable demand to solve problems on huge data sets, there have been many attempts to adapt classic algorithms to the MPC framework to obtain further efficiency. MPC is a recent framework for parallel computation of big data, which is designed to capture MapReduce-like algorithms. In this paper, we study the string matching problem using a set of tools translated to the MPC model. We consider three types of wildcards in string matching:
- '?' wildcard: In this setting, the pattern is allowed to contain special '?' characters, or don't cares, that match any character of the text. String matching with don't cares can be solved by fast convolutions, and we give a constant-round MPC algorithm for it by utilizing FFT in a constant number of MPC rounds.
- '+' wildcard: '+' is a special character that allows for arbitrary repetitions of a character. When the pattern contains '+' wildcard characters, our algorithm runs in a constant number of MPC rounds by a reduction from the subset matching problem.
- '*' wildcard: '*' is a special character that matches any substring of the text. When '*' is allowed in the pattern, we solve two special cases of the problem in a logarithmic number of rounds.
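For intuition on the '?' case, below is a minimal single-machine sketch of the convolution test that the FFT approach builds on: encoding wildcards as $0$, shift $i$ matches iff $\sum_j p_j t_{i+j}(p_j - t_{i+j})^2 = 0$, and the sum expands into three convolutions. Function names are illustrative, and the MPC round structure from the paper is not modeled here; for very long strings an exact (non-floating-point) convolution would be preferable.

```python
import numpy as np

def fft_conv(a, b):
    # Convolution via FFT; rounding back to integers is safe for
    # short inputs, where floating-point error stays well below 0.5.
    n = len(a) + len(b) - 1
    N = 1 << (n - 1).bit_length()
    return np.rint(np.fft.irfft(np.fft.rfft(a, N) * np.fft.rfft(b, N), N)[:n]).astype(np.int64)

def wildcard_match(text, pattern):
    """All shifts where pattern (with '?' don't cares) occurs in text."""
    t = np.array([ord(c) for c in text], dtype=np.float64)
    p = np.array([0 if c == '?' else ord(c) for c in pattern], dtype=np.float64)
    pr = p[::-1]  # reverse the pattern so convolution aligns with shifts
    # sum_j p_j t_{i+j} (p_j - t_{i+j})^2 expands into three terms:
    scores = fft_conv(pr ** 3, t) - 2 * fft_conv(pr ** 2, t ** 2) + fft_conv(pr, t ** 3)
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if scores[i + m - 1] == 0]

print(wildcard_match("abcabd", "a?c"))  # [0]
```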
The edit distance between two strings is defined as the smallest number of insertions, deletions, and substitutions that need to be made to transform one of the strings into the other. Approximating edit distance in subquadratic time is "one of the biggest unsolved problems in the field of combinatorial pattern matching" [37]. Our main result is a quantum constant approximation algorithm for computing the edit distance in truly subquadratic time. More precisely, we give an $O(n^{1.810})$ quantum algorithm that approximates the edit distance within a factor of 3. We further extend this result to an $O(n^{1.708})$ quantum algorithm that approximates the edit distance within a larger constant factor. Our solutions are based on a framework for approximating edit distance in parallel settings. This framework requires as a black box an algorithm that computes the distances of several smaller strings all at once. For a quantum algorithm, we reduce the black box to metric estimation and provide efficient algorithms for approximating it. We further show that this framework enables us to approximate edit distance in distributed settings. To this end, we provide a MapReduce algorithm to approximate edit distance within a factor of $1+\epsilon$, with sublinearly many machines and sublinear memory. Also, our algorithm runs in a logarithmic number of rounds.
Hierarchical clustering is a stronger extension of one of today's most influential unsupervised learning methods: clustering. The goal of this method is to create a hierarchy of clusters, thus constructing cluster evolutionary history and simultaneously finding clusterings at all resolutions. We propose four traits of interest for hierarchical clustering algorithms: (1) empirical performance, (2) theoretical guarantees, (3) cluster balance, and (4) scalability. While a number of algorithms are designed to achieve one or two of these traits at a time, there exist none that achieve all four. Inspired by Bateni et al.'s scalable and empirically successful Affinity Clustering [NeurIPS 2017], we introduce Affinity Clustering's successor, Matching Affinity Clustering. Like its predecessor, Matching Affinity Clustering maintains strong empirical performance and uses Massively Parallel Communication as its distributed model. Designed to maintain provably balanced clusters, we show that our algorithm achieves good, constant factor approximations for Moseley and Wang's revenue and Cohen-Addad et al.'s value. We show Affinity Clustering cannot approximate either function. Along the way, we also introduce an efficient $k$-sized maximum matching algorithm in the MPC model.
Given a graph, the densest subgraph problem asks for a set of vertices such that the average degree among these vertices is maximized. Densest subgraph has numerous applications in learning, e.g., community detection in social networks, link spam detection, correlation mining, bioinformatics, and so on. Although there are efficient algorithms that output either exact or approximate solutions to the densest subgraph problem, existing algorithms may violate the privacy of the individuals in the network, e.g., leaking the existence/non-existence of edges. In this paper, we study the densest subgraph problem in the framework of differential privacy, and we derive the first upper and lower bounds for this problem. We show that there exists a linear-time $\epsilon$-differentially private algorithm that finds a $2$-approximation of the densest subgraph with an extra poly-logarithmic additive error. Our algorithm not only reports the approximate density of the densest subgraph, but also reports the vertices that form the dense subgraph. Our upper bound almost matches the famous $2$-approximation by Charikar both in performance and in approximation ratio, but we additionally achieve differential privacy. In comparison with Charikar's algorithm, our algorithm has an extra poly-logarithmic additive error. We partly justify the additive error with a new lower bound, showing that for any differentially private algorithm that provides a constant-factor approximation, a sub-logarithmic additive error is inherent. We also empirically study our differentially private algorithm on real-world graphs, and we show that in practice the algorithm finds a solution which is very close to the optimal one.
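For reference, the Charikar baseline mentioned above is a simple greedy peeling procedure. The sketch below shows the classical, non-private 2-approximation, not the differentially private algorithm of the paper; names are illustrative.

```python
import heapq

def densest_subgraph_peel(adj):
    """Charikar's greedy peeling: repeatedly delete a minimum-degree
    vertex and remember the densest intermediate graph. The returned
    set has density (edges / vertices) at least half the optimum.
    `adj` maps vertex -> set of neighbors in a simple undirected graph."""
    deg = {v: len(ns) for v, ns in adj.items()}
    edges = sum(deg.values()) // 2
    alive = set(adj)
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    best_density, best_set = edges / max(len(alive), 1), set(alive)
    while alive:
        d, v = heapq.heappop(heap)
        if v not in alive or d != deg[v]:
            continue  # stale heap entry; the vertex degree changed
        alive.remove(v)
        edges -= deg[v]
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
        if alive and edges / len(alive) > best_density:
            best_density, best_set = edges / len(alive), set(alive)
    return best_set, best_density

# A 4-clique plus a pendant vertex: the clique (density 6/4) wins.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4}}
print(densest_subgraph_peel(adj))  # ({1, 2, 3, 4}, 1.5)
```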
Miller and Reif's FOCS'85 classic and fundamental tree contraction algorithm is a broadly applicable technique for the parallel solution of a large number of tree problems. It is also used as an algorithmic design technique for a large number of parallel graph algorithms. In all previously explored models of computation, however, tree contractions have only been achieved in $\Omega(\log n)$ rounds of parallel run time. In this work, we not only introduce a generalized tree contraction method but also show it can be computed highly efficiently in $O(1/\epsilon^3)$ rounds in the Adaptive Massively Parallel Computing (AMPC) setting, where each machine has $O(n^\epsilon)$ local memory for some $0 < \epsilon < 1$. AMPC is a practical extension of Massively Parallel Computing (MPC) which utilizes distributed hash tables. In general, MPC is an abstract model for MapReduce, Hadoop, Spark, and Flume, which are currently widely used across industry, and it has been studied extensively in the theory community in recent years. Last but not least, we show that our results extend to multiple problems on trees, including but not limited to maximum and maximal matching, maximum and maximal independent set, tree isomorphism testing, and more.
Envy-freeness up to one good (EF1) and envy-freeness up to any good (EFX) are two well-known extensions of envy-freeness for the case of indivisible items. It is known that EF1 can always be guaranteed for agents with subadditive valuations. In sharp contrast, it is unknown whether or not an EFX allocation always exists, even for four agents and additive valuations. In addition, the best approximation guarantee for EFX is $(\phi - 1) \simeq 0.61$, by Amanatidis et al. In order to find a middle ground to bridge this gap, in this paper we suggest another fairness criterion, namely envy-freeness up to a random good, or EFR, which is weaker than EFX yet stronger than EF1. For this notion, we provide a polynomial-time $0.73$-approximation allocation algorithm. For our algorithm, we introduce Nash Social Welfare Matching, which makes a connection between Nash Social Welfare and envy-freeness. We believe Nash Social Welfare Matching will find applications in future work.
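For context on how weak the EF1 guarantee is to achieve: for additive valuations, plain round-robin picking already produces an EF1 allocation (the subadditive case requires the more involved envy-cycle elimination procedure). A minimal sketch, with illustrative names:

```python
def round_robin(values, n_agents):
    """Round-robin picking: agents take turns grabbing their favorite
    remaining item. For additive valuations the result is EF1: any
    envy an agent feels toward another bundle disappears after removing
    the single best item from it. `values[i][g]` is agent i's (additive)
    value for item g."""
    remaining = set(range(len(values[0])))
    bundles = [[] for _ in range(n_agents)]
    turn = 0
    while remaining:
        i = turn % n_agents
        best = max(remaining, key=lambda g: values[i][g])
        bundles[i].append(best)
        remaining.remove(best)
        turn += 1
    return bundles

# Two agents, four items: each agent ends up with two items she likes.
print(round_robin([[5, 3, 2, 1], [1, 2, 3, 5]], 2))  # [[0, 1], [3, 2]]
```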
Content Distribution and End-to-End Monitoring with Set Cover by Pairs
This paper proposes inverse feature learning as a novel supervised feature learning technique that learns a set of high-level features for classification based on an error representation approach. The key contribution of this method is to learn the representation of error as high-level features, while current representation learning methods interpret error by loss functions which are obtained as a function of differences between the true labels and the predicted ones. One advantage of such a learning method is that the learned features for each class are independent of the learned features for other classes; therefore, the method can learn incrementally, meaning that it can learn new classes without retraining. Error representation learning can also help with generalization and reduce the chance of over-fitting by adding a set of impactful features to the original data set which capture the relationships between each instance and different classes through an error generation and analysis process. This method can be particularly effective for data sets where the instances of each class have diverse feature representations, or for data sets with imbalanced classes. The experimental results show that the proposed method results in significantly better performance compared to state-of-the-art classification techniques for several popular data sets. We hope this paper can open a new path toward utilizing the proposed perspective of error representation learning in different feature learning domains.

Commonly Cited References

We give algorithms for geometric graph problems in the modern parallel models such as MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a (1 + ε)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem [9], despite drawing significant attention in recent years.
Consider a gambler who observes a sequence of independent, non-negative random numbers and is allowed to stop the sequence at any time, claiming a reward equal to the most recent observation. The famous prophet inequality of Krengel, Sucheston, and Garling asserts that a gambler who knows the distribution of each random variable can achieve at least half as much reward, in expectation, as a "prophet" who knows the sampled values of each random variable and can choose the largest one. We generalize this result to the setting in which the gambler and the prophet are allowed to make more than one selection, subject to a matroid constraint. We show that the gambler can still achieve at least half as much reward as the prophet; this result is the best possible, since it is known that the ratio cannot be improved even in the original prophet inequality, which corresponds to the special case of rank-one matroids. Generalizing the result still further, we show that under an intersection of $p$ matroid constraints, the prophet's reward exceeds the gambler's by a factor of at most $O(p)$, and this factor is also tight.
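The factor-$1/2$ guarantee for a single selection has a famously short constructive proof via a median threshold (due to Samuel-Cahn). The following sketch estimates the threshold by Monte Carlo, which is an implementation shortcut rather than part of the original argument; names are illustrative.

```python
import random

def prophet_play(dists, trials=20000):
    """Samuel-Cahn median threshold: pick tau with P(max_i X_i >= tau)
    roughly 1/2 and accept the first observation clearing it. In
    expectation this earns at least half the prophet's reward E[max].
    `dists` is a list of zero-argument samplers, one per random variable."""
    maxima = sorted(max(d() for d in dists) for _ in range(trials))
    tau = maxima[trials // 2]  # empirical median of the maximum

    def play():
        for d in dists:
            x = d()
            if x >= tau:
                return x  # stop: claim the most recent observation
        return 0.0  # the threshold was never cleared

    gambler = sum(play() for _ in range(trials)) / trials
    prophet = sum(maxima) / trials  # estimate of E[max]
    return gambler, prophet

random.seed(0)
boxes = [lambda: random.uniform(0, 1) for _ in range(5)]
gambler, prophet = prophet_play(boxes)
print(gambler, prophet, gambler >= prophet / 2)  # last value: True
```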
The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classical computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time.
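The well-known dynamic program referred to here is the textbook recurrence; a compact, space-efficient sketch:

```python
def edit_distance(a, b):
    """Textbook dynamic program: after processing row i, prev[j] holds
    the edit distance between a[:i] and b[:j]. Quadratic time, linear
    space."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete a[i-1]
                           cur[j - 1] + 1,               # insert b[j-1]
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```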
We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length $n$ and every fixed $\varepsilon > 0$, the algorithm computes a $(\log n)^{O(1/\varepsilon)}$ approximation in $n^{1+\varepsilon}$ time. This is an exponential improvement over the previously known approximation factor, $2^{\tilde{O}(\sqrt{\log n})}$, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007; Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new asymmetric query model. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner, while being charged for querying every symbol of x. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.
Given an undirected graph $G$, a collection $\{(s_1,t_1), \ldots, (s_\ell,t_\ell)\}$ of pairs of vertices, and an integer $p$, the Edge Multicut problem asks if there is a set $S$ of at most $p$ edges such that the removal of $S$ disconnects every $s_i$ from the corresponding $t_i$. Vertex Multicut is the analogous problem where $S$ is a set of at most $p$ vertices. Our main result is that both problems can be solved in time $2^{O(p^3)} \cdot n^{O(1)}$, i.e., they are fixed-parameter tractable parameterized by the size $p$ of the cutset in the solution. By contrast, it is unlikely that an algorithm with running time of the form $f(p) \cdot n^{O(1)}$ exists for the directed version of the problem, as we show it to be W[1]-hard parameterized by the size of the cutset.
For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context, though, is: can we leverage this additional power to obtain even faster parallel algorithms?
Hill and Kertz studied the prophet inequality on iid distributions [The Annals of Probability 1982]. They proved a theoretical bound of $1 - 1/e$ on the approximation factor of their algorithm. They conjectured that the best approximation factor for arbitrarily large $n$ is $1/(1+1/e) \simeq 0.731$. This conjecture remained open prior to this paper for over 30 years. In this paper we present a threshold-based algorithm for the prophet inequality with $n$ iid distributions. Using a nontrivial and novel approach we show that our algorithm is a $0.738$-approximation algorithm. By beating the bound of $1/(1+1/e)$, this refutes the conjecture of Hill and Kertz. Moreover, we generalize our results to non-uniform distributions and discuss applications in mechanism design.
For revenue and welfare maximization in single-dimensional Bayesian settings, Chawla et al. (STOC'10) recently showed that sequential posted-price mechanisms (SPMs), though simple in form, can perform surprisingly well compared to the optimal mechanisms. In this paper, we give a theoretical explanation of this fact, based on a connection to the notion of correlation gap. Loosely speaking, for auction environments with matroid constraints, we can relate the performance of a mechanism to the expectation of a monotone submodular function over a random set. This random set corresponds to the winner set for the optimal mechanism, which is highly correlated, and corresponds to a certain demand set for SPMs, which is independent. The notion of correlation gap of Agrawal et al. (SODA'10) quantifies how much we "lose" in the expectation of the function by ignoring correlation in the random set, and hence bounds our loss in using a certain SPM instead of the optimal mechanism. Furthermore, the correlation gap of a monotone submodular function is known to be small, and it follows that a certain SPM can approximate the optimal mechanism by a good constant factor. Exploiting this connection, we give a tight analysis of a greedy-based SPM of Chawla et al. for several environments. In particular, we show that it gives an $e/(e-1)$-approximation for matroid environments, gives asymptotically a $1/(1-1/\sqrt{2\pi k})$-approximation for the important sub-case of $k$-unit auctions, and gives a $(p+1)$-approximation for environments with $p$-independent set system constraints.
Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, November 1995. https://doi.org/10.1145/227683.227684
We present $O(\log\log n)$-round algorithms in the Massively Parallel Computation (MPC) model, with $\tilde{O}(n)$ memory per machine, that compute a maximal independent set, a $(1+\varepsilon)$-approximation of maximum matching, and a $(2+\varepsilon)$-approximation of minimum vertex cover, for any $n$-vertex graph and any constant $\varepsilon > 0$. These improve the state of the art as follows:
We study the maximum weight matching problem in the semi-streaming model, and improve on the currently best one-pass algorithm due to Zelke (Proc. of STACS 2008, pages 669-680) by devising a deterministic approach whose performance guarantee is $4.91+\varepsilon$. In addition, we study preemptive online algorithms, a sub-class of one-pass algorithms where we are only allowed to maintain a feasible matching in memory at any point in time. All known results prior to Zelke's belong to this sub-class. We provide a lower bound of $4.967$ on the competitive ratio of any such deterministic algorithm, and hence show that future improvements will have to store in memory a set of edges which is not necessarily a feasible matching.
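A flavor of the preemptive sub-class: the classical one-pass rule of Feigenbaum et al. keeps a feasible matching and lets a new edge evict its conflicting matched edges only if it outweighs twice their total weight, giving a 6-approximation. This is a simpler relative of the algorithms discussed above, not the $4.91+\varepsilon$ algorithm itself; names are illustrative.

```python
def one_pass_matching(stream, w):
    """Preemptive one-pass rule (classical 6-approximation): keep a
    feasible matching; a new edge may evict the (at most two) matched
    edges it conflicts with, but only if it weighs more than twice
    their combined weight. `stream` yields edges (u, v); `w` maps an
    edge to its weight."""
    matched = {}  # endpoint -> matched edge covering it
    for u, v in stream:
        conflicts = {matched[x] for x in (u, v) if x in matched}
        if w[(u, v)] > 2 * sum(w[e] for e in conflicts):
            for cu, cv in conflicts:   # evict the lighter edges
                del matched[cu], matched[cv]
            matched[u] = matched[v] = (u, v)
    return set(matched.values())

w = {(1, 2): 10, (2, 3): 4, (3, 4): 30}
print(one_pass_matching(list(w), w))  # {(1, 2), (3, 4)}
```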
In the Colonel Blotto game, which was initially introduced by Borel in 1921, two colonels simultaneously distribute their troops across different battlefields. The winner of each battlefield is determined independently by a winner-take-all rule. The ultimate payoff of each colonel is the number of battlefields he wins. This game is commonly used for analyzing a wide range of applications such as the U.S. presidential election, innovative technology competitions, advertisements, etc. There have been persistent efforts for finding the optimal strategies for the Colonel Blotto game. After almost a century, Ahmadinejad, Dehghani, Hajiaghayi, Lucier, Mahini, and Seddighin provided a poly-time algorithm for finding the optimal strategies. They first model the problem by a Linear Program (LP) and use the Ellipsoid method to solve it. However, despite the theoretical importance of their algorithm, it is highly impractical. In general, even the Simplex method (despite its exponential running time) performs better than the Ellipsoid method in practice. In this paper, we provide the first polynomial-size LP formulation of the optimal strategies for the Colonel Blotto game, using linear extension techniques. Roughly speaking, we project the strategy space polytope to a higher dimensional space, which results in a lower number of facets for the polytope. We use this polynomial-size LP to provide a novel, simpler, and significantly faster algorithm for finding the optimal strategies for the Colonel Blotto game. We further show this representation is asymptotically tight in terms of the number of constraints. We also extend our approach to multi-dimensional Colonel Blotto games, and implement our algorithm to observe interesting properties of Colonel Blotto; for example, we observe that the behavior of players in the discrete model is very similar to the previously studied continuous model.
One of the biggest open problems in external memory data structures is the priority queue problem with DecreaseKey operations. If only Insert and ExtractMin operations need to be supported, one can design a comparison-based priority queue performing $O((N/B)\log_{M/B} N)$ I/Os over a sequence of $N$ operations, where $B$ is the disk block size in number of words and $M$ is the main memory size in number of words. This matches the lower bound for comparison-based sorting and is hence optimal for comparison-based priority queues. However, if we also need to support DecreaseKeys, the performance of the best known priority queue is only $O((N/B)\log^2 N)$ I/Os. The big open question is whether a degradation in performance really is necessary. We answer this question affirmatively by proving a lower bound of $\Omega((N/B)\log\log_N B)$ I/Os for processing a sequence of $N$ intermixed Insert, ExtractMin and DecreaseKey operations. Our lower bound is proved in the cell probe model and thus holds also for non-comparison-based priority queues.
Given a directed graph $G$, a set of $k$ terminals, and an integer $p$, the Directed Vertex Multiway Cut problem asks if there is a set $S$ of at most $p$ (nonterminal) vertices whose removal disconnects each terminal from all other terminals. Directed Edge Multiway Cut is the analogous problem where $S$ is a set of at most $p$ edges. These two problems are indeed known to be equivalent. A natural generalization of the multiway cut is the multicut problem, in which we want to disconnect only a set of $k$ given pairs instead of all pairs. Marx (Theor. Comp. Sci. 2006) showed that in undirected graphs multiway cut is fixed-parameter tractable (FPT) parameterized by $p$. Marx and Razgon (STOC 2011) showed that undirected multicut is FPT and directed multicut is W[1]-hard parameterized by $p$. We complete the picture here by our main result, which is that both Directed Vertex Multiway Cut and Directed Edge Multiway Cut can be solved in time $2^{2^{O(p)}} n^{O(1)}$, i.e., they are FPT parameterized by the size $p$ of the cutset of the solution. This answers an open question raised by Marx (Theor. Comp. Sci. 2006) and Marx and Razgon (STOC 2011). It follows from our result that Directed Multicut is FPT for the case of $k = 2$ terminal pairs, which answers another open problem raised in Marx and Razgon (STOC 2011).
For all uniformly bounded sequences of independent random variables $X_1, X_2, \ldots$, a complete comparison is made between the optimal value $V(X_1, X_2, \ldots) = \sup\{EX_t : t \text{ is an (a.e.) finite stop rule for } X_1, X_2, \ldots\}$ and , where $M_i(X_1, X_2, \ldots)$ is the $i$th largest order statistic for $X_1, X_2, \ldots$. In particular, for $k > 1$, the set of ordered pairs $\{(x, y) : x = V(X_1, X_2, \ldots)$ and for some independent random variables $X_1, X_2, \ldots$ taking values in $[0, 1]\}$ is precisely the set , where $B_k(0) = 0$, $B_k(1) = 1$, and for . The result yields sharp, universal inequalities for independent random variables comparing two choice mechanisms, the mortal's value of the game $V(X_1, X_2, \ldots)$ and the prophet's constrained maxima expectation of the game. Techniques of proof include probability- and convexity-based reductions; calculus-based, multivariate, extremal problem analysis; and limit theorems of Poisson-approximation type. Precise results are also given for finite sequences of independent random variables.
We consider the problem of inserting one item into a list of N-1 ordered items. We previously showed that no quantum algorithm could solve this problem in fewer than log N/(2 log log N) queries, for N large. We transform the problem into a "translationally invariant" problem and restrict attention to invariant algorithms. We construct the "greedy" invariant algorithm and show numerically that it outperforms the best classical algorithm for various N. We also find invariant algorithms that succeed exactly in fewer queries than is classically possible, and iterating one of them shows that the insertion problem can be solved in fewer than 0.53 log N quantum queries for large N (where log N is the classical lower bound). We don't know whether a o(log N) algorithm exists.
We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1 - 1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in $O(1)$ rounds of communication.
In this paper we present a quantum algorithm solving the triangle finding problem in unweighted graphs with query complexity $\tilde{O}(n^{5/4})$, where $n$ denotes the number of vertices in the graph. This improves the previous upper bound $O(n^{9/7}) = O(n^{1.285})$ recently obtained by Lee, Magniez and Santha. Our result shows, for the first time, that in the quantum query complexity setting unweighted triangle finding is easier than its edge-weighted version, since for finding an edge-weighted triangle Belovs and Rosmanis proved that any quantum algorithm requires $\Omega(n^{9/7}/\sqrt{\log n})$ queries. Our result also illustrates some limitations of the non-adaptive learning graph approach used to obtain the previous $O(n^{9/7})$ upper bound since, even over unweighted graphs, any quantum algorithm for triangle finding obtained using this approach requires $\Omega(n^{9/7}/\sqrt{\log n})$ queries as well. To bypass the obstacles characterized by these lower bounds, our quantum algorithm uses combinatorial ideas exploiting the graph-theoretic properties of triangle finding, which cannot be used when considering edge-weighted graphs or the non-adaptive learning graph approach.
In this article, we will formalize the method of dual fitting and the idea of factor-revealing LP. This combination is used to design and analyze two greedy algorithms for the metric uncapacitated facility location problem. Their approximation factors are 1.861 and 1.61, with running times of $O(m \log m)$ and $O(n^3)$, respectively, where $n$ is the total number of vertices and $m$ is the number of edges in the underlying complete bipartite graph between cities and facilities. The algorithms are used to improve recent results for several variants of the problem.
We study the problem of finding small trees. Classical network design problems are considered with the additional constraint that only a specified number $k$ of nodes are required to be connected in the solution. A prototypical example is the kMST problem in which we require a tree of minimum weight spanning at least $k$ nodes in an edge-weighted graph. We show that the kMST problem is NP-hard even for points in the Euclidean plane. We provide approximation algorithms with performance ratio $2\sqrt{k}$ for the general edge-weighted case and $O(k^{1/4})$ for the case of points in the plane. Polynomial-time exact solutions are also presented for the class of treewidth-bounded graphs, which includes trees, series-parallel graphs, and bounded bandwidth graphs, and for points on the boundary of a convex region in the Euclidean plane. We also investigate the problem of finding short trees and, more generally, that of finding networks with minimum diameter. A simple technique is used to provide a polynomial-time solution for finding $k$-trees of minimum diameter. We identify easy and hard problems arising in finding short networks using a framework due to T. C. Hu.
We present a quantum algorithm solving the $k$-distinctness problem using fewer queries than the previous algorithm by Ambainis. The construction uses a modified learning graph approach. Compared to the recent paper by Belovs and Lee, the algorithm doesn't require any prior information on the input, and the complexity analysis is much simpler.
We present a fast algorithm for the subset convolution problem: given functions $f$ and $g$ defined on the lattice of subsets of an $n$-element set $N$, compute their subset convolution $f * g$, defined for $S \subseteq N$ by $(f * g)(S) = \sum_{T \subseteq S} f(T)\, g(S \setminus T)$, where addition and multiplication are carried out in an arbitrary ring. Via Möbius transform and inversion, our algorithm evaluates the subset convolution in $O(n^2 2^n)$ additions and multiplications, substantially improving upon the straightforward $O(3^n)$ algorithm. Specifically, if the input functions have an integer range $[-M, -M+1, \ldots, M]$, their subset convolution over the ordinary sum-product ring can be computed in $\tilde{O}(2^n \log M)$ time; the notation $\tilde{O}$ suppresses polylogarithmic factors. Furthermore, using a standard embedding technique we can compute the subset convolution over the max-sum or min-sum semiring in $\tilde{O}(2^n M)$ time.
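A direct implementation of the ranked Möbius-transform idea is short enough to sketch; the version below works over the ordinary sum-product ring and checks itself against the naive $O(3^n)$ definition. Names are illustrative.

```python
def subset_convolution(f, g, n):
    """Fast subset convolution: rank the zeta transforms by subset
    size, multiply pointwise as polynomials in the rank, then apply
    the Moebius (inverse) transform and read off rank |S|. Uses
    O(2^n * n^2) ring operations instead of the naive O(3^n).
    `f` and `g` are lists of length 2^n indexed by subset bitmask."""
    size = 1 << n

    def ranked_zeta(h):
        # hat[k][S] = sum of h[T] over T subseteq S with |T| = k.
        hat = [[h[S] if bin(S).count("1") == k else 0 for S in range(size)]
               for k in range(n + 1)]
        for k in range(n + 1):
            for i in range(n):
                for S in range(size):
                    if S >> i & 1:
                        hat[k][S] += hat[k][S ^ (1 << i)]
        return hat

    fhat, ghat = ranked_zeta(f), ranked_zeta(g)
    # Pointwise multiplication as polynomials in the rank variable.
    conv = [[sum(fhat[j][S] * ghat[k - j][S] for j in range(k + 1))
             for S in range(size)] for k in range(n + 1)]
    # Moebius transform of each rank undoes the zeta transform.
    for k in range(n + 1):
        for i in range(n):
            for S in range(size):
                if S >> i & 1:
                    conv[k][S] -= conv[k][S ^ (1 << i)]
    return [conv[bin(S).count("1")][S] for S in range(size)]

# Sanity check against the naive O(3^n) definition for n = 3.
import random
random.seed(1)
n = 3
f = [random.randint(0, 9) for _ in range(1 << n)]
g = [random.randint(0, 9) for _ in range(1 << n)]
naive = [sum(f[T] * g[S ^ T] for T in range(1 << n) if T & S == T)
         for S in range(1 << n)]
assert subset_convolution(f, g, n) == naive
```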
General criteria are given to ensure that in a family of discrete random processes, given parameters exhibit convergence to the solution of a system of differential equations. As one application we consider random graph processes in which the maximum degree is bounded and show that the numbers of vertices of given degree exhibit this convergence as the total number of vertices tends to infinity. Two other applications are to random processes which generate independent sets of vertices in random $r$-regular graphs. In these cases, we deduce almost sure lower bounds on the size of independent sets of vertices in random $r$-regular graphs.
We show how to compute the edit distance between two strings of length $n$ up to a factor of $2^{\tilde{O}(\sqrt{\log n})}$ in $n^{1+o(1)}$ time. This is the first subpolynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art $n^{1/3+o(1)}$ approximation. Previously, approximation of $2^{\tilde{O}(\sqrt{\log n})}$ was known only for embedding edit distance into $\ell_1$, and it is not known if that embedding can be computed in less than quadratic time.
We present a collection of new results on problems related to 3SUM, including: the first truly subquadratic algorithm for computing the (min,+) convolution for monotone increasing sequences with integer values bounded by $O(n)$, solving 3SUM for monotone sets in 2D with integer coordinates bounded by $O(n)$, and preprocessing a binary string for histogram indexing (also called jumbled indexing).
Consider a function $f$ which is defined on the integers from 1 to $N$ and takes the values $-1$ and $+1$. The parity of $f$ is the product over all $x$ from 1 to $N$ of $f(x)$. With no further information about $f$, to classically determine the parity of $f$ requires $N$ calls of the function $f$. We show that any quantum algorithm capable of determining the parity of $f$ contains at least $N/2$ applications of the unitary operator which evaluates $f$. Thus, for this problem, quantum computers cannot outperform classical computers.
We present a method for reducing the treewidth of a graph while preserving all of its minimal $s$-$t$ separators up to a certain fixed size $k$. This technique allows us to solve $s$-$t$ Cut and Multicut problems with various additional restrictions (e.g., the vertices being removed from the graph form an independent set or induce a connected graph) in linear time for every fixed number $k$ of removed vertices. Our results have applications for problems that are not directly defined by separators, but where the known solution methods depend on some variant of separation. For example, we can solve similarly restricted generalizations of Bipartization (delete at most $k$ vertices from $G$ to make it bipartite) in almost linear time for every fixed number $k$ of removed vertices. These results answer a number of open questions in the area of parameterized complexity. Furthermore, our technique turns out to be relevant for $(H, C, K)$- and $(H, C, \leq K)$-coloring problems as well, which are cardinality constrained variants of the classical $H$-coloring problem. We make progress in the classification of the parameterized complexity of these problems by identifying new cases that can be solved in almost linear time for every fixed cardinality bound.
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any $\varepsilon > 0$, our algorithms make $O(\log_{1+\varepsilon} n)$ passes over the input and find a subgraph whose density is guaranteed to be within a factor $2(1+\varepsilon)$ of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive real-world graphs showing the performance and scalability of our algorithms in practice.
We present a novel approximation algorithm for $k$-median that achieves an approximation guarantee of $1+\sqrt{3}+\varepsilon$, improving upon the decade-old ratio of $3+\varepsilon$. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an $\alpha$-approximation algorithm for $k$-median, it is sufficient to give a pseudo-approximation algorithm that finds an $\alpha$-approximate solution by opening $k+O(1)$ facilities. This is a rather surprising result, as there exist instances for which opening $k+1$ facilities may lead to a significantly smaller cost than if only $k$ facilities were opened.
We consider the random 2-satisfiability (2-SAT) problem, in which each instance is a formula that is the conjunction of $m$ clauses of the form $x \vee y$, chosen uniformly at random from among all 2-clauses on $n$ Boolean variables and their negations. As $m$ and $n$ tend to infinity in the ratio $m/n \to \alpha$, the problem is known to have a phase transition at $\alpha_c = 1$, below which the probability that the formula is satisfiable tends to one and above which it tends to zero. We determine the finite-size scaling about this transition, namely the scaling of the maximal window $W(n, \delta) = (\alpha_-(n,\delta), \alpha_+(n,\delta))$ such that the probability of satisfiability is greater than $1-\delta$ for $\alpha < \alpha_-$ and is less than $\delta$ for $\alpha > \alpha_+$. We show that $W(n,\delta) = (1-\Theta(n^{-1/3}), 1+\Theta(n^{-1/3}))$, where the constants implicit in $\Theta$ depend on $\delta$. We also determine the rates at which the probability of satisfiability approaches one and zero at the boundaries of the window. Namely, for $m = (1+\varepsilon)n$, where $\varepsilon$ may depend on $n$ as long as $|\varepsilon|$ is sufficiently small and $|\varepsilon| n^{1/3}$ is sufficiently large, we show that the probability of satisfiability decays like $\exp(-\Theta(n\varepsilon^3))$ above the window, and goes to one like $1-\Theta(n^{-1}|\varepsilon|^{-3})$ below the window. We prove these results by defining an order parameter for the transition and establishing its scaling behavior in $n$ both inside and outside the window. Using this order parameter, we prove that the 2-SAT phase transition is continuous with an order parameter critical exponent of 1. We also determine the values of two other critical exponents, showing that the exponents of 2-SAT are identical to those of the random graph.
We introduce a concept of parameterizing a problem above the optimum solution of its natural linear programming relaxation and prove that the node multiway cut problem is fixed-parameter tractable (FPT) in this setting. As a consequence we prove that node multiway cut is FPT when parameterized above the maximum separating cut, resolving an open problem of Razgon. Our results imply $O^*(4^k)$ algorithms for vertex cover above maximum matching and almost 2-SAT, as well as an $O^*(2^k)$ algorithm for node multiway cut with a standard parameterization by the solution size, improving previous bounds for these problems.
In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation problems. This study is motivated by a goal of ultimately putting the MapReduce framework on an equal theoretical footing with the well-known PRAM and BSP parallel models, which would benefit both the theory and practice of MapReduce algorithms. We describe efficient MapReduce algorithms for sorting, multi-searching, and simulations of parallel algorithms specified in the BSP and CRCW PRAM models. We also provide some applications of these results to problems in parallel computational geometry for the MapReduce framework, which result in efficient MapReduce algorithms for sorting, 2- and 3-dimensional convex hulls, and fixed-dimensional linear programming. For the case when mappers and reducers have a memory/message-I/O size of $M=\Theta(N^\epsilon)$, for a small constant $\epsilon>0$, all of our MapReduce algorithms for these applications run in a constant number of rounds.