Recent years have seen great progress in the approximability of fundamental clustering and facility location problems on high-dimensional Euclidean spaces, including $k$-Means and $k$-Median. While they admit strictly better approximation ratios than their general metric versions, their approximation ratios are still higher than the hardness ratios for general metrics, leaving the possibility that the ultimate optimal approximation ratios will be the same between Euclidean and general metrics. Moreover, such an improved algorithm for Euclidean spaces is not known for Uncapacitated Facility Location (UFL), another fundamental problem in the area. In this paper, we prove that for any $\gamma \geq 1.6774$ there exists $\varepsilon > 0$ such that Euclidean UFL admits a $(\gamma, 1 + 2e^{-\gamma} - \varepsilon)$-bifactor approximation algorithm, improving the result of Byrka and Aardal. Together with the $(\gamma, 1 + 2e^{-\gamma})$ NP-hardness in general metrics, this shows the first separation between general and Euclidean metrics for the aforementioned basic problems. We also present an $(\alpha_{Li} - \varepsilon)$-(unifactor) approximation algorithm for UFL for some $\varepsilon > 0$ in Euclidean spaces, where $\alpha_{Li} \approx 1.488$ is the best-known approximation ratio for UFL by Li.
The $\ell_2^2$ min-sum $k$-clustering problem is to partition an input set into clusters $C_1,\ldots,C_k$ to minimize $\sum_{i=1}^k\sum_{p,q\in C_i}\|p-q\|_2^2$. Although $\ell_2^2$ min-sum $k$-clustering is NP-hard, it is not known whether it is NP-hard to approximate $\ell_2^2$ min-sum $k$-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the $\ell_2^2$ min-sum $k$-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than $1.056$ and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness result by giving the first $(1+\varepsilon)$-coreset construction for $\ell_2^2$ min-sum $k$-clustering. Our coreset uses $\mathcal{O}\left(k^{\varepsilon^{-4}}\right)$ space and can be leveraged to achieve a polynomial-time approximation scheme with runtime $nd\cdot f(k,\varepsilon^{-1})$, where $d$ is the underlying dimension of the input dataset and $f$ is a fixed function. Finally, we consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label $i\in[k]$ for each input point, thereby implicitly partitioning the input dataset into $k$ clusters that induce an approximately optimal solution, up to some amount of adversarial error $\alpha\in\left[0,\frac{1}{2}\right)$. We give a polynomial-time algorithm that outputs a $\frac{1+\gamma\alpha}{(1-\alpha)^2}$-approximation to $\ell_2^2$ min-sum $k$-clustering, for a fixed constant $\gamma>0$.
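As a concrete illustration (not from the paper), here is a minimal Python/NumPy sketch that evaluates the $\ell_2^2$ min-sum $k$-clustering objective defined above for a candidate partition; all names are hypothetical.

```python
import numpy as np

def min_sum_cost(points, labels, k):
    """Evaluate the l2^2 min-sum k-clustering objective:
    sum over clusters C_i of sum over p, q in C_i of ||p - q||_2^2.
    The double sum ranges over ordered pairs, so each unordered pair
    is counted twice, matching the formula in the abstract."""
    cost = 0.0
    for i in range(k):
        C = points[labels == i]
        diffs = C[:, None, :] - C[None, :, :]  # all pairwise differences
        cost += np.sum(diffs ** 2)
    return cost

# Toy usage: 6 points in R^2 split into k = 2 well-separated clusters.
pts = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
print(min_sum_cost(pts, np.array([0, 0, 0, 1, 1, 1]), k=2))
```

A useful identity behind algorithms for this objective: each cluster's contribution equals $2|C_i|$ times its $k$-means cost $\sum_{p \in C_i} \|p - \mu_i\|_2^2$, where $\mu_i$ is the cluster centroid.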
Given a fixed arity $k \geq 2$, Min-$k$-CSP on complete instances involves a set of $n$ variables $V$ and one nontrivial constraint for every $k$-subset of variables (so there are $\binom{n}{k}$ constraints). The goal is to find an assignment that minimizes the number of unsatisfied constraints. Unlike Max-$k$-CSP, which admits a PTAS on dense or expanding instances, the approximability of Min-$k$-CSP is less understood. For some CSPs like Min-$k$-SAT, there is an approximation-preserving reduction from general to dense instances, making complete instances essentially the only remaining structured setting that may admit new techniques. This paper initiates a study of Min-$k$-CSPs on complete instances. We present an $O(1)$-approximation algorithm for Min-2-SAT on complete instances, the minimization version of Max-2-SAT. Since an $O(1)$-approximation on dense or expanding instances would refute the Unique Games Conjecture, this shows a strict separation between complete and dense/expanding instances. Then we study the decision versions of CSPs, aiming to satisfy all constraints, which is necessary for any nontrivial approximation. Our second main result is a quasi-polynomial time algorithm for every Boolean $k$-CSP on complete instances, including $k$-SAT. We provide additional algorithmic and hardness results for CSPs with larger alphabets, characterizing the (arity, alphabet size) pairs that admit a quasi-polynomial time algorithm on complete instances.
We establish strong inapproximability for finding the sparsest nonzero vector in a real subspace. We show that it is NP-hard (under randomized reductions) to approximate the sparsest vector in a subspace within any constant factor (or almost polynomial factors in quasipolynomial time). As a corollary, we recover the state-of-the-art inapproximability for the Shortest Vector Problem (SVP), a foundational problem in lattice-based cryptography. Our proof is surprisingly simple, bypassing even the PCP theorem. We are inspired by the homogenization framework from the inapproximability theory of minimum distance problems (MDC) in integer lattices and error-correcting codes. We use a combination of (a) \emph{product testing via tensor codes} and (b) \emph{encoding an assignment as a coset of a random code in higher-dimensional space} in order to embed non-homogeneous quadratic equations into the sparsest vector problem. Step (a) is inspired by Austrin and Khot's simplified proof of hardness of MDC over finite fields, and step (b) is inspired by Micciancio's semi-derandomization of hardness of SVP. Our reduction involves the challenge of performing (a) over the reals. We prove that tensoring of the kernel of a random $\pm 1$ matrix furnishes an adequate product test (while still allowing (b)). The proof exposes a connection to Littlewood-Offord theory and relies on a powerful anticoncentration result of Rudelson and Vershynin. Our main motivation in this work is the development of inapproximability theory for problems over the reals. Analytic variants of the sparsest vector problem have connections to small-set expansion, quantum separability, and polynomial maximization over convex sets, all of which face similar barriers to inapproximability. The approach we develop could lead to progress on the hardness of some of these problems.
For any $\varepsilon > 0$, we prove that $k$-Dimensional Matching is hard to approximate within a factor of $k/(12 + \varepsilon)$ for large $k$ unless $\textsf{NP} \subseteq \textsf{BPP}$. Listed in Karp's 21 $\textsf{NP}$-complete problems, $k$-Dimensional Matching is a benchmark computational complexity problem which appears as a special case of many constrained optimization problems over independence systems, including $k$-Set Packing, $k$-Matroid Intersection, and Matroid $k$-Parity. For all the aforementioned problems, the best known lower bound was an $\Omega(k/\log k)$-hardness by Hazan, Safra, and Schwartz. In contrast, state-of-the-art algorithms achieve an approximation of $O(k)$. Our result narrows this gap down to a constant and thus provides a rationale for the observed algorithmic difficulties. The crux of our result hinges on a novel approximation-preserving gadget from $R$-degree bounded $k$-CSPs over alphabet size $R$ to $kR$-Dimensional Matching. Along the way, we prove that $R$-degree bounded $k$-CSPs over alphabet size $R$ are hard to approximate within a factor $\Omega_k(R)$ using known randomized sparsification methods for CSPs.
Recovering the underlying clustering of a set $U$ of $n$ points by asking pairwise same-cluster queries has garnered significant interest in the last decade. Given a query $S \subset U$ with $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pairwise queries, the number of required queries is known to be $\Theta(nk)$, where $k$ is the number of clusters. However, non-adaptive schemes require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a generalization of this problem to subset queries for $|S|>2$, where the oracle returns the number of clusters intersecting $S$. Allowing for subset queries of unbounded size, $O(n)$ queries are possible with an adaptive scheme (Chakrabarty-Liao, 2024). However, for non-adaptive algorithms with subset queries, essentially nothing was known. In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making $O(n \log k \cdot (\log k + \log\log n)^2)$ queries, which improves to $O(n \log \log n)$ when $k$ is a constant. We also consider algorithms with a restricted query size of at most $s$. In this setting we prove that $\Omega(\max(n^2/s^2,n))$ queries are necessary and obtain algorithms making $\tilde{O}(n^2k/s^2)$ queries for any $s \leq \sqrt{n}$ and $\tilde{O}(n^2/s)$ queries for any $s \leq n$. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make $O(n \log k) + \tilde{O}(k)$ and $O(n\log^2 k)$ queries. Finally, allowing two rounds of adaptivity, we give an algorithm making $O(n \log k)$ queries in the general case and $O(n \log \log k)$ queries when the clusters are balanced.
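To make the query model concrete, here is a small Python simulation (hypothetical names) of the subset-query oracle described above, which generalizes pairwise same-cluster queries.

```python
def make_oracle(cluster_of):
    """cluster_of: dict mapping each point to its (hidden) cluster id.
    Returns an oracle that, given a subset S, reports how many distinct
    clusters S intersects. For |S| = 2 the answer is 1 exactly when the
    pair lies in the same cluster (the classic same-cluster query)."""
    def oracle(S):
        return len({cluster_of[p] for p in S})
    return oracle

# Toy usage: 6 points in 3 hidden clusters.
hidden = {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c', 5: 'c'}
oracle = make_oracle(hidden)
print(oracle({0, 1}))      # 1 -> same cluster
print(oracle({0, 2, 4}))   # 3 -> intersects three clusters
```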
Unbreakable decomposition, introduced by Cygan et al. (SICOMP'19) and Cygan et al. (TALG'20), has proven to be one of the most powerful tools for parameterized graph cut problems in recent years. Unfortunately, all known constructions require at least $\Omega_k\left(mn^2\right)$ time, given an undirected graph with $n$ vertices, $m$ edges, and cut-size parameter $k$. In this work, we show the first close-to-linear time parameterized algorithm that computes an unbreakable decomposition. More precisely, for any $0<\epsilon\leq 1$, our algorithm runs in time $2^{O(\frac{k}{\epsilon} \log \frac{k}{\epsilon})}m^{1 + \epsilon}$ and computes an $(O(k/\epsilon), k)$-unbreakable tree decomposition of $G$, where each bag has adhesion at most $O(k/\epsilon)$. This immediately opens up possibilities for obtaining close-to-linear time algorithms for numerous problems whose only known solution is based on unbreakable decomposition.
The Parameterized Inapproximability Hypothesis (PIH) is a central question in the field of parameterized complexity. PIH asserts that, given as input a 2-CSP on $k$ variables and alphabet size $n$, it is W[1]-hard parameterized by $k$ to distinguish whether the input is perfectly satisfiable or every assignment to the input violates at least 1% of the constraints. An important implication of PIH is that it yields the tight parameterized inapproximability of the $k$-maxcoverage problem. In the $k$-maxcoverage problem, we are given as input a set system, a threshold $\tau>0$, and a parameter $k$, and the goal is to determine if there exist $k$ sets in the input whose union is at least a $\tau$ fraction of the entire universe. PIH is known to imply that it is W[1]-hard parameterized by $k$ to distinguish whether there are $k$ input sets whose union is at least a $\tau$ fraction of the universe or the union of every $k$ input sets is not much larger than a $\tau\cdot (1-\frac{1}{e})$ fraction of the universe. In this work we present a gap-preserving FPT reduction (in the reverse direction) from the $k$-maxcoverage problem to the aforementioned 2-CSP problem, thus showing that the assertion that approximating the $k$-maxcoverage problem to some constant factor is W[1]-hard implies PIH. In addition, we present a gap-preserving FPT reduction from the $k$-median problem (in general metrics) to the $k$-maxcoverage problem, further highlighting the power of gap-preserving FPT reductions over classical gap-preserving polynomial-time reductions.
In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla (FOCS 2002), the input is a complete graph where edges are labeled either $+$ or $-$, and the goal is to find a partition of the vertices that minimizes the sum of the $+$edges across parts plus the sum of the $-$edges within parts. In recent years, Chawla, Makarychev, Schramm and Yaroslavtsev (STOC 2015) gave a 2.06-approximation by providing a near-optimal rounding of the standard LP, and Cohen-Addad, Lee, Li, and Newman (FOCS 2022, 2023) finally bypassed the integrality gap of 2 for this LP, giving a $1.73$-approximation for the problem. In order to create a simple and unified framework for Correlation Clustering similar to those for {\em typical} approximate optimization tasks, we propose the {\em cluster LP} as a strong linear program that might tightly capture the approximability of Correlation Clustering. It unifies all the previous relaxations for the problem. We demonstrate the power of the cluster LP by presenting a simple rounding algorithm and providing two analyses, one analytically proving a 1.49-approximation and the other solving a factor-revealing SDP to show a 1.437-approximation. Both proofs introduce principled methods by which to analyze the performance of the algorithm, resulting in a significantly improved approximation guarantee. Finally, we prove an integrality gap of $4/3$ for the cluster LP, showing that our 1.437-upper bound cannot be drastically improved. Our gap instance directly inspires an improved NP-hardness of approximation with a ratio $24/23 \approx 1.042$; no explicit hardness ratio was known before.
We study polynomial-time approximation algorithms for edge and vertex Sparsest Cut and Small Set Expansion in terms of $k$, the number of edges or vertices cut in the optimal solution. Our main results are $O(\mathrm{polylog}\, k)$-approximation algorithms for various versions in this setting. Our techniques involve an extension of the notion of sample sets (Feige and Mahdian, STOC'06), originally developed for small balanced cuts, to sparse cuts in general. We then show how to combine this notion of sample sets with two algorithms, one based on an existing framework of LP rounding and another new algorithm based on the cut-matching game, to get such approximation algorithms. Our cut-matching game algorithm can be viewed as a local version of the cut-matching game by Khandekar, Khot, Orecchia and Vishnoi and certifies an expansion of every vertex set of size $s$ in $O(\log s)$ rounds. These techniques may be of independent interest. As corollaries of our results, we also obtain an $O(\log \mathrm{opt})$ approximation for min-max graph partitioning, where $\mathrm{opt}$ is the min-max value of the optimal cut, and improve the bound on the size of multicut mimicking networks computable in polynomial time.
We study the approximability of the MaxCut problem in the presence of predictions. Specifically, we consider two models: in the noisy predictions model, for each vertex we are given its correct label in $\{-1,+1\}$ with some unknown probability $1/2 + \epsilon$, and the other (incorrect) label otherwise. In the more-informative partial predictions model, for each vertex we are given its correct label with probability $\epsilon$ and no label otherwise. We assume only pairwise independence between vertices in both models. We show how these predictions can be used to improve on the worst-case approximation ratios for this problem. Specifically, we give an algorithm that achieves an $\alpha + \widetilde{\Omega}(\epsilon^4)$-approximation for the noisy predictions model, where $\alpha \approx 0.878$ is the MaxCut threshold. While this result also holds for the partial predictions model, we can also give a $\beta + \Omega(\epsilon)$-approximation, where $\beta \approx 0.858$ is the approximation ratio for MaxBisection given by Raghavendra and Tan. This answers a question posed by Ola Svensson in his plenary session talk at SODA'23.
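As an illustration of the noisy predictions model (a toy simulation, not the paper's algorithm), the sketch below generates noisy labels and evaluates the cut induced by simply following the predictions; all names are hypothetical.

```python
import random

def noisy_predictions(true_label, eps):
    """Noisy predictions model from the abstract: each vertex's correct
    label in {-1, +1} is revealed with probability 1/2 + eps and
    flipped otherwise (here sampled fully independently)."""
    return {v: l if random.random() < 0.5 + eps else -l
            for v, l in true_label.items()}

def cut_value(edges, side):
    """Number of edges crossing the bipartition side: V -> {-1, +1}."""
    return sum(1 for u, v in edges if side[u] != side[v])

# Toy usage on a 4-cycle whose maximum cut contains all 4 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
truth = {0: +1, 1: -1, 2: +1, 3: -1}
pred = noisy_predictions(truth, eps=0.3)
print(cut_value(edges, pred), "vs optimum", cut_value(edges, truth))
```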
We consider the $\ell_0$-Low Rank Approximation problem, where the input consists of a matrix $A \in \mathbb{R}^{n_R \times n_C}$ and an integer $k$, and the goal is to find a matrix $B$ of rank at most $k$ that minimizes $\|A - B\|_0$, which is the number of entries where $A$ and $B$ differ. For any constant $k$ and $\varepsilon > 0$, we present a polynomial-time $(1 + \varepsilon)$-approximation algorithm for this problem, which significantly improves the previous best $poly(k)$-approximation.
A $\mu$-constrained Boolean Max-CSP($\psi$) instance is a Boolean Max-CSP instance on predicate $\psi:\{0,1\}^{r} \rightarrow \{0,1\}$ where the objective is to find a labeling of relative weight exactly $\mu$ that maximizes the fraction of satisfied constraints. In this work, we study the approximability of constrained Boolean Max-CSPs via SDP hierarchies by relating the integrality gap of Max-CSP($\psi$) to its $\mu$-dependent approximation curve. Formally, assuming the Small-Set Expansion Hypothesis, we show that it is NP-hard to approximate $\mu$-constrained instances of Max-CSP($\psi$) up to factor $\mathrm{Gap}_{\ell, \mu}(\psi) / \log(1/\mu)^{2}$ (ignoring factors depending on $r$) for any $\ell \geq \ell(\mu, r)$. Here, $\mathrm{Gap}_{\ell, \mu}(\psi)$ is the optimal integrality gap of the $\ell$-round Lasserre relaxation for $\mu$-constrained Max-CSP($\psi$) instances. Our results are derived by combining the framework of Raghavendra [STOC 2008] along with more recent advances in rounding Lasserre relaxations and reductions from the Small-Set Expansion (SSE) problem. A crucial component of our reduction is a novel way of composing generic bias-dependent dictatorship tests with SSE, which could be of independent interest.
We consider the classic correlation clustering problem: Given a complete graph where edges are labelled either $+$ or $-$, the goal is to find a partition of the vertices that minimizes the sum of the $+$edges across parts plus the sum of the $-$edges within parts. Recently, Cohen-Addad, Lee and Newman [CLN22] gave a 1.995-approximation for the problem using the Sherali-Adams hierarchy, hence beating the integrality gap of 2 of the classic linear program. We significantly improve upon this result by providing a 1.73-approximation for the problem. Our approach brings together a new preprocessing of correlation clustering instances that enables a new LP formulation, which, combined with the algorithm from [CLN22], yields the improved bound.
We study the problem of computing the $p \rightarrow q$ norm of a matrix $A \in \mathbb{R}^{m \times n}$, defined as $\|A\|_{p\rightarrow q} := \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_{q}}{\|x\|_{p}}$. This problem generalizes the spectral norm of a matrix ($p=q=2$) and the Grothendieck problem ($p=\infty$, $q=1$) and has been widely studied in various regimes. When $p \geq q$, the problem exhibits a dichotomy: constant-factor approximation algorithms are known if $2 \in [q,p]$, and the problem is hard to approximate within almost polynomial factors when $2 \notin [q,p]$. The regime when $p < q$, known as hypercontractive norms, is particularly significant for various applications but much less well understood. The case with $p=2$ and $q > 2$ was studied by Barak et al. [Proceedings of the 44th Annual ACM Symposium on Theory of Computing, 2012, pp. 307-326], who gave subexponential algorithms for a promise version of the problem (which captures small-set expansion) and also proved hardness of approximation results based on the Exponential Time Hypothesis. However, no NP-hardness of approximation is known for these problems for any $p < q$. We prove the first NP-hardness result (under randomized reductions) for approximating hypercontractive norms. We show that for any $1 < p < q < \infty$ with $2 \notin [p,q]$, $\|A\|_{p\rightarrow q}$ is hard to approximate within $2^{O((\log n)^{1-\epsilon})}$ assuming $\mathrm{NP} \not\subseteq \mathrm{BPTIME}(2^{(\log n)^{O(1)}})$. En route to the above result, we also prove almost tight results for the case when $p \geq q$ with $2 \in [q,p]$.
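The definition is easy to probe numerically: any nonzero $x$ certifies the lower bound $\|Ax\|_q / \|x\|_p \leq \|A\|_{p \to q}$, and for $p = q = 2$ the norm is the top singular value. Below is a crude NumPy sketch (random search, hypothetical names) that produces lower bounds only; certifying a good upper bound for hypercontractive norms is exactly what the hardness result above concerns.

```python
import numpy as np

def ratio(A, x, p, q):
    """The certified lower bound ||Ax||_q / ||x||_p for one direction x."""
    return np.linalg.norm(A @ x, q) / np.linalg.norm(x, p)

def lower_bound(A, p, q, trials=20000, seed=0):
    """Crude random-search lower bound on ||A||_{p->q}; no guarantee
    of being close to the true value (that is the hard part)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    return max(ratio(A, rng.standard_normal(n), p, q) for _ in range(trials))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# Sanity check: for p = q = 2 the norm equals the top singular value.
print(lower_bound(A, 2, 2), np.linalg.svd(A, compute_uv=False)[0])
```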
A Local Search-Based Approach for Set Covering. Anupam Gupta, Euiwoong Lee, and Jason Li. Symposium on Simplicity in Algorithms (SOSA), 2023, pp. 1-11. https://doi.org/10.1137/1.9781611977585.ch1. In the Set Cover problem, we are given a set system with each set having a weight, and we want to find a collection of sets that cover the universe, whilst having low total weight. There are several approaches known (based on greedy approaches, relax-and-round, and dual-fitting) that achieve an $H_k \approx \ln k + O(1)$ approximation for this problem, where the size of each set is bounded by $k$. Moreover, getting a $\ln k - O(\ln\ln k)$ approximation is hard. Where does the truth lie? Can we close the gap between the upper and lower bounds? An improvement would be particularly interesting for small values of $k$, which are often used in reductions between Set Cover and other combinatorial optimization problems. We consider a non-oblivious local-search approach: to the best of our knowledge this gives the first $H_k$-approximation for Set Cover using an approach based on local search. Our proof fits in one page, and gives an integrality gap result as well. Refining our approach by considering larger moves and an optimized potential function gives an $(H_k - \Omega(\log^2 k)/k)$-approximation, improving on the previous bound of $(H_k - \Omega(1/k^8))$ (R. Hassin and A. Levin, SICOMP '05) based on a modified greedy algorithm.
The Uncapacitated Facility Location (UFL) problem is one of the most fundamental clustering problems: Given a set of clients $C$ and a set of facilities $F$ in a metric space $(C \cup F, dist)$ with facility costs $open : F \to \mathbb{R}^+$, the goal is to find a set of facilities $S \subseteq F$ to minimize the sum of the opening cost $open(S)$ and the connection cost $d(S) := \sum_{p \in C} \min_{c \in S} dist(p, c)$. An algorithm for UFL is called a Lagrangian Multiplier Preserving (LMP) $\alpha$ approximation if it outputs a solution $S \subseteq F$ satisfying $open(S) + d(S) \leq open(S^*) + \alpha d(S^*)$ for any $S^* \subseteq F$. The best-known LMP approximation ratio for UFL is at most 2 by the JMS algorithm of Jain, Mahdian, and Saberi [STOC'02, J.ACM'03] based on the Dual-Fitting technique. The lack of progress on improving the upper bound on $\alpha_{LMP}$ in the last two decades raised the natural question whether $\alpha_{LMP} = 2$. We answer this question negatively by presenting a (slightly) improved LMP approximation algorithm for UFL. This is achieved by combining the Dual-Fitting technique with Local Search, another popular technique to address clustering problems. In more detail, we use the LMP solution $S$ produced by JMS to seed a local search algorithm. We show that local search substantially improves $S$ unless a big fraction of the connection cost of $S$ is associated with facilities of relatively small opening costs. In the latter case, however, the analysis of Jain, Mahdian, and Saberi can be improved (i.e., $S$ is cheaper than expected). To summarize: either $S$ is close enough to the optimum, or it must belong to the local neighborhood of a good enough local optimum. From a conceptual viewpoint, our result gives theoretical evidence that local search can be enhanced so as to avoid bad local optima by choosing the initial feasible solution with LP-based techniques. Our result directly implies a (slightly) improved approximation for the related $k$-Median problem, another fundamental clustering problem: Given $(C \cup F, dist)$ as in a UFL instance and an integer $k \in \mathbb{N}$, find $S \subseteq F$ with $|S| = k$ that minimizes $d(S)$. The current best approximation algorithms for $k$-Median are based on the following framework: use an LMP $\alpha$ approximation algorithm for UFL to build an $\alpha$ approximate bipoint solution for $k$-Median, and then round it with a $\rho_{BR}$ approximate bipoint rounding algorithm. This implies an $\alpha \cdot \rho_{BR}$ approximation. The current-best value of $\rho_{BR}$ is 1.338 by Byrka, Pensyl, Rybicki, Srinivasan, and Trinh [SODA'15, TALG'17], which yields a 2.6742-approximation. Combining their algorithm with our refined LMP algorithm for UFL (replacing JMS) gives a 2.67059-approximation. The full version of the paper can be accessed at https://arxiv.org/abs/2207.05150
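For concreteness, here is a minimal Python sketch (hypothetical names, not the JMS or local-search algorithm) that evaluates the UFL objective and checks the LMP inequality against one candidate comparison solution $S^*$.

```python
def ufl_cost(S, open_cost, dist, clients):
    """Return (open(S), d(S)): the opening cost of S and the connection
    cost d(S) = sum over clients of the distance to the nearest open
    facility, as defined above."""
    opening = sum(open_cost[f] for f in S)
    connection = sum(min(dist[p][f] for f in S) for p in clients)
    return opening, connection

def satisfies_lmp(S, S_star, alpha, open_cost, dist, clients):
    """Check open(S) + d(S) <= open(S*) + alpha * d(S*) for one candidate
    S* (the LMP definition requires the inequality for every S*)."""
    o, d = ufl_cost(S, open_cost, dist, clients)
    o_star, d_star = ufl_cost(S_star, open_cost, dist, clients)
    return o + d <= o_star + alpha * d_star

# Toy instance: 2 facilities, 3 clients; distances stored as nested dicts.
open_cost = {'f1': 3.0, 'f2': 1.0}
dist = {'a': {'f1': 1, 'f2': 4}, 'b': {'f1': 1, 'f2': 5}, 'c': {'f1': 2, 'f2': 1}}
print(satisfies_lmp({'f1', 'f2'}, {'f2'}, 2.0, open_cost, dist, ['a', 'b', 'c']))
```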
A $\mu$-constrained Boolean Max-CSP$(\psi)$ instance is a Boolean Max-CSP instance on predicate $\psi:\{0,1\}^r \to \{0,1\}$ where the objective is to find a labeling of relative weight exactly $\mu$ that maximizes the fraction of satisfied constraints. In this work, we study the approximability of constrained Boolean Max-CSPs via SDP hierarchies by relating the integrality gap of Max-CSP $(\psi)$ to its $\mu$-dependent approximation curve. Formally, assuming the Small-Set Expansion Hypothesis, we show that it is NP-hard to approximate $\mu$-constrained instances of Max-CSP($\psi$) up to factor ${\sf Gap}_{\ell,\mu}(\psi)/\log(1/\mu)^2$ (ignoring factors depending on $r$) for any $\ell \geq \ell(\mu,r)$. Here, ${\sf Gap}_{\ell,\mu}(\psi)$ is the optimal integrality gap of $\ell$-round Lasserre relaxation for $\mu$-constrained Max-CSP($\psi$) instances. Our results are derived by combining the framework of Raghavendra [STOC 2008] along with more recent advances in rounding Lasserre relaxations and reductions from the Small-Set Expansion (SSE) problem. A crucial component of our reduction is a novel way of composing generic bias-dependent dictatorship tests with SSE, which could be of independent interest.
We consider the question of approximating Max 2-CSP where each variable appears in at most $d$ constraints (but with possibly arbitrarily large alphabet). There is a simple $(\frac{d+1}{2})$-approximation algorithm for the problem. We prove the following results for any sufficiently large $d$: - Assuming the Unique Games Conjecture (UGC), it is NP-hard (under randomized reduction) to approximate this problem to within a factor of $\left(\frac{d}{2} - o(d)\right)$. - It is NP-hard (under randomized reduction) to approximate the problem to within a factor of $\left(\frac{d}{3} - o(d)\right)$. Thanks to a known connection [Dvorak et al., Algorithmica 2023], we establish the following hardness results for approximating Maximum Independent Set on $k$-claw-free graphs: - Assuming the Unique Games Conjecture (UGC), it is NP-hard (under randomized reduction) to approximate this problem to within a factor of $\left(\frac{k}{4} - o(k)\right)$. - It is NP-hard (under randomized reduction) to approximate the problem to within a factor of $\left(\frac{k}{3 + 2\sqrt{2}} - o(k)\right) \geq \left(\frac{k}{5.829} - o(k)\right)$. In comparison, known approximation algorithms achieve $\left(\frac{k}{2} - o(k)\right)$-approximation in polynomial time [Neuwohner, STACS 2021; Thiery and Ward, SODA 2023] and $(\frac{k}{3} + o(k))$-approximation in quasi-polynomial time [Cygan et al., SODA 2013].
We consider the classic Correlation Clustering problem: Given a complete graph where edges are labelled either $+$ or $-$, the goal is to find a partition of the vertices that minimizes the number of the $+$edges across parts plus the number of the $-$edges within parts. Recently, Cohen-Addad, Lee and Newman [CLN22] presented a 1.994-approximation algorithm for the problem using the Sherali-Adams hierarchy, hence breaking through the integrality gap of 2 for the classic linear program and improving upon the 2.06-approximation of Chawla, Makarychev, Schramm and Yaroslavtsev [CMSY15]. We significantly improve the state-of-the-art by providing a 1.73-approximation for the problem. Our approach introduces a preclustering of Correlation Clustering instances that allows us to essentially ignore the error arising from the {\em correlated rounding} used by [CLN22]. This additional power simplifies the previous algorithm and analysis. More importantly, it enables a new {\em set-based rounding} that complements the previous roundings. A combination of these two rounding algorithms yields the improved bound.
We consider the Low Rank Approximation problem, where the input consists of a matrix $A \in \mathbb{R}^{n_R \times n_C}$ and an integer $k$, and the goal is to find a matrix $B$ of rank at most $k$ that minimizes $\| A - B \|_0$, which is the number of entries where $A$ and $B$ differ. For any constant $k$ and $\varepsilon > 0$, we present a polynomial-time $(1 + \varepsilon)$-approximation algorithm for this problem, which significantly improves the previous best $poly(k)$-approximation. Our algorithm is obtained by viewing the problem as a Constraint Satisfaction Problem (CSP) where each row and column becomes a variable that can have a value from $\mathbb{R}^k$. In this view, we have a constraint between each row and column, which results in a {\em dense} CSP, a well-studied topic in approximation algorithms. While most previous algorithms focus on finite-size (or constant-size) domains and involve an exhaustive enumeration over the entire domain, we present a new framework that bypasses such an enumeration in $\mathbb{R}^k$. We also use tools from the rich literature of Low Rank Approximation in different objectives (e.g., $\ell_p$ with $p \in (0, \infty)$) or domains (e.g., finite fields/generalized Boolean). We believe that our techniques might be useful to study other real-valued CSPs and matrix optimization problems. On the hardness side, when $k$ is part of the input, we prove that Low Rank Approximation is NP-hard to approximate within a factor of $\Omega(\log n)$. This is the first superconstant NP-hardness of approximation for any $p \in [0, \infty]$ that does not rely on stronger conjectures (e.g., the Small Set Expansion Hypothesis).
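A tiny sketch making the objective concrete (evaluation only; the paper's algorithm is the CSP-based scheme described above). It also illustrates that the Frobenius-optimal low-rank approximation obtained from the SVD is generally not optimal for $\ell_0$; the matrices here are hypothetical examples.

```python
import numpy as np

def l0_error(A, B, tol=1e-9):
    """||A - B||_0: the number of entries where A and B differ
    (up to a numerical tolerance, since the entries are reals)."""
    return int(np.sum(np.abs(A - B) > tol))

# Toy usage: compare A against the best rank-1 approximation in the
# *Frobenius* norm (via SVD); note this is generally NOT optimal for l0.
A = np.array([[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 5.0, 0.0]])
U, s, Vt = np.linalg.svd(A)
B = s[0] * np.outer(U[:, 0], Vt[0])
print(l0_error(A, B), "entries differ; rank(B) =", np.linalg.matrix_rank(B))
```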
Given a complete graph $G=(V, E)$ where each edge is labeled $+$ or $-$, the Correlation Clustering problem asks to partition $V$ into clusters to minimize the number of $+$edges between different clusters plus the number of $-$edges within the same cluster. Correlation Clustering has been used to model a large number of clustering problems in practice, making it one of the most widely studied clustering formulations. The approximability of Correlation Clustering has been actively investigated [BBC04], [CGW05], [ACN08], culminating in a 2.06-approximation algorithm [CMSY15], based on rounding the standard LP relaxation. Since the integrality gap for this formulation is 2, it has remained a major open question to determine if the approximation factor of 2 can be reached, or even breached. In this paper, we answer this question affirmatively by showing that there exists a $(1.994+\varepsilon)$-approximation algorithm based on $O(1/\varepsilon^{2})$ rounds of the Sherali-Adams hierarchy. In order to round a solution to the Sherali-Adams relaxation, we adapt the correlated rounding originally developed for CSPs [BRS11], [GS11], [RT12]. With this tool, we reach an approximation ratio of $2+\varepsilon$ for Correlation Clustering. To breach this ratio, we go beyond the traditional triangle-based analysis by employing a global charging scheme that amortizes the total cost of the rounding across different triangles.
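For concreteness, here is a minimal Python sketch (hypothetical names) that evaluates the Correlation Clustering objective, i.e., the number of disagreements of a candidate partition.

```python
from itertools import combinations

def correlation_cost(labels, cluster):
    """Disagreements of a clustering on a +/- labeled complete graph:
    +edges between different clusters plus -edges within the same cluster.
    labels: dict mapping each pair (u, v), u < v, to '+' or '-'."""
    cost = 0
    for (u, v), sign in labels.items():
        same = cluster[u] == cluster[v]
        if (sign == '+' and not same) or (sign == '-' and same):
            cost += 1
    return cost

# Toy instance on 4 vertices: two +cliques {0,1} and {2,3}, '-' across.
V = [0, 1, 2, 3]
labels = {(u, v): '+' if {u, v} in ({0, 1}, {2, 3}) else '-'
          for u, v in combinations(V, 2)}
print(correlation_cost(labels, {0: 'A', 1: 'A', 2: 'B', 3: 'B'}))  # 0
print(correlation_cost(labels, {v: 'A' for v in V}))               # 4
```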
Given $x \in (\mathbb{R}_{\geq 0})^{\binom{[n]}{2}}$ recording pairwise distances, the Metric Violation Distance problem asks to compute the $\ell_0$ distance between $x$ and the metric cone; i.e., modify the minimum number of entries of $x$ to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an $O(\log n)$-approximation algorithm for Metric Violation Distance, exponentially improving the previous best approximation ratio of $O(OPT^{1/3})$ of Fan, Raichel, and Van Buskirk [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of Ultrametric Violation Distance, where the goal is to compute the $\ell_0$ distance to the cone of ultrametrics, and achieve a constant-factor approximation algorithm. The Ultrametric Violation Distance problem can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup [FOCS, 2021] from the $\ell_1$ norm to the $\ell_0$ norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new $O(1)$-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an $O(\log n\log\log n)$ approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of $x$ forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.
A $\mu$-biased Max-CSP instance with predicate $\psi:\{0,1\}^r \to \{0,1\}$ is an instance of Constraint Satisfaction Problem (CSP) where the objective is to find a labeling of relative weight at most $\mu$ which satisfies the maximum fraction of constraints. Biased CSPs are versatile and express several well-studied problems such as Densest-$k$-Sub(Hyper)graph and SmallSetExpansion.
Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in $\ell_p$-metrics. Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1493-1530. https://doi.org/10.1137/1.9781611977073.63. k-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in $\ell_p$-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives in $\ell_p$-metrics. We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied Max k-Coverage problem on set systems is hard to approximate to a factor greater than $(1-1/e)$, even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in $\ell_p$-metrics for factors which are close to the ones obtained for general metrics. In particular, assuming JCH we show that it is hard to approximate the k-means objective: Discrete case: to a factor of 3.94 in the $\ell_1$-metric and to a factor of 1.73 in the $\ell_2$-metric; this improves upon the previous factors of 1.56 and 1.17 respectively, obtained under the Unique Games Conjecture (UGC). Continuous case: to a factor of 2.10 in the $\ell_1$-metric and to a factor of 1.36 in the $\ell_2$-metric; this improves upon the previous factor of 1.07 in the $\ell_2$-metric obtained under UGC (and to the best of our knowledge, the continuous case of k-means in the $\ell_1$-metric was not previously analyzed in the literature). We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP '05) on Hypergraph Vertex Cover, and recover all the results stated above of Cohen-Addad and Karthik (FOCS '19) to (nearly) the same inapproximability factors but now under the standard NP$\neq$P assumption (instead of UGC). Finally, we establish a strong connection between JCH and the long-standing open problem of determining the Hypergraph Turán number. We then use this connection to prove improved SDP gaps (over the existing factors in the literature) for the k-means and k-median objectives.
A $\mu$-biased Max-CSP instance with predicate $\psi:\{0,1\}^r \to \{0,1\}$ is an instance of Constraint Satisfaction Problem (CSP) where the objective is to find a labeling of relative weight at most $\mu$ which satisfies the maximum fraction of constraints. Biased CSPs are versatile and express several well studied problems such as Densest-$k$-Sub(Hyper)graph and SmallSetExpansion. In this work, we explore the role played by the bias parameter $\mu$ on the approximability of biased CSPs. We show that the approximability of such CSPs can be characterized (up to loss of factors of arity $r$) using the bias-approximation curve of Densest-$k$-SubHypergraph (DkSH). In particular, this gives a tight characterization of predicates which admit approximation guarantees that are independent of the bias parameter $\mu$. Motivated by the above, we give new approximation and hardness results for DkSH. In particular, assuming the Small Set Expansion Hypothesis (SSEH), we show that DkSH with arity $r$ and $k = \mu n$ is NP-hard to approximate to a factor of $\Omega(r^3\mu^{r-1}\log(1/\mu))$ for every $r \geq 2$ and $\mu < 2^{-r}$. We also give a $O(\mu^{r-1}\log(1/\mu))$-approximation algorithm for the same setting. Our upper and lower bounds are tight up to constant factors, when the arity $r$ is a constant, and in particular, imply the first tight approximation bounds for the Densest-$k$-Subgraph problem in the linear bias regime. Furthermore, using the above characterization, our results also imply matching algorithms and hardness for every biased CSP of constant arity.
We study several questions related to diversifying search results. We give improved approximation algorithms for each of the following problems, together with some lower bounds. - We give a polynomial-time approximation scheme (PTAS) for a diversified search ranking problem [Bansal et al., ICALP 2010] whose objective is to minimize the discounted cumulative gain. Our PTAS runs in time $n^{2^{O(\log(1/\epsilon)/\epsilon)}} \cdot m^{O(1)}$ where $n$ denotes the number of elements in the database. Complementing this, we show that no PTAS can run in time $f(\epsilon) \cdot (nm)^{2^{o(1/\epsilon)}}$ assuming Gap-ETH; therefore our running time is nearly tight. Both of our bounds answer open questions of Bansal et al. - We next consider the Max-Sum Dispersion problem, whose objective is to select $k$ out of $n$ elements that maximize the dispersion, which is defined as the sum of the pairwise distances under a given metric. We give a quasipolynomial-time approximation scheme for the problem which runs in time $n^{O_{\epsilon}(\log n)}$. This improves upon the previously known polynomial-time algorithms with approximation ratio 0.5 [Hassin et al., Oper. Res. Lett. 1997; Borodin et al., ACM Trans. Algorithms 2017]. Furthermore, we observe that known reductions rule out approximation schemes that run in $n^{\tilde{o}_\epsilon(\log n)}$ time assuming ETH. - We consider a generalization of Max-Sum Dispersion called Max-Sum Diversification. In addition to the sum of pairwise distances, the objective includes another function $f$. For monotone submodular $f$, we give a quasipolynomial-time algorithm with approximation ratio arbitrarily close to $(1 - 1/e)$. This improves upon the best polynomial-time algorithm, which has approximation ratio $0.5$, by Borodin et al. Furthermore, the $(1 - 1/e)$ factor is tight, as achieving a better-than-$(1 - 1/e)$ approximation is NP-hard [Feige, J. ACM 1998].
The Uncapacitated Facility Location (UFL) problem is one of the most fundamental clustering problems: Given a set of clients $C$ and a set of facilities $F$ in a metric space $(C \cup F, dist)$ with facility costs $open : F \to \mathbb{R}^+$, the goal is to find a set of facilities $S \subseteq F$ to minimize the sum of the opening cost $open(S)$ and the connection cost $d(S) := \sum_{p \in C} \min_{c \in S} dist(p, c)$. An algorithm for UFL is called a Lagrangian Multiplier Preserving (LMP) $\alpha$ approximation if it outputs a solution $S\subseteq F$ satisfying $open(S) + d(S) \leq open(S^*) + \alpha d(S^*)$ for any $S^* \subseteq F$. The best-known LMP approximation ratio for UFL is at most $2$ by the JMS algorithm of Jain, Mahdian, and Saberi based on the Dual-Fitting technique. We present a (slightly) improved LMP approximation algorithm for UFL. This is achieved by combining the Dual-Fitting technique with Local Search, another popular technique to address clustering problems. From a conceptual viewpoint, our result gives theoretical evidence that local search can be enhanced so as to avoid bad local optima by choosing the initial feasible solution with LP-based techniques. Using the framework of bipoint solutions, our result directly implies a (slightly) improved approximation for the $k$-Median problem from 2.6742 to 2.67059.
Given a complete graph $G = (V, E)$ where each edge is labeled $+$ or $-$, the Correlation Clustering problem asks to partition $V$ into clusters to minimize the number of $+$edges between different clusters plus the number of $-$edges within the same cluster. Correlation Clustering has been used to model a large number of clustering problems in practice, making it one of the most widely studied clustering formulations. The approximability of Correlation Clustering has been actively investigated [BBC04, CGW05, ACN08], culminating in a $2.06$-approximation algorithm [CMSY15], based on rounding the standard LP relaxation. Since the integrality gap for this formulation is 2, it has remained a major open question to determine if the approximation factor of 2 can be reached, or even breached. In this paper, we answer this question affirmatively by showing that there exists a $(1.994 + \epsilon)$-approximation algorithm based on $O(1/\epsilon^2)$ rounds of the Sherali-Adams hierarchy. In order to round a solution to the Sherali-Adams relaxation, we adapt the {\em correlated rounding} originally developed for CSPs [BRS11, GS11, RT12]. With this tool, we reach an approximation ratio of $2+\epsilon$ for Correlation Clustering. To breach this ratio, we go beyond the traditional triangle-based analysis by employing a global charging scheme that amortizes the total cost of the rounding across different triangles.
Given $x \in (\mathbb{R}_{\geq 0})^{\binom{[n]}{2}}$ recording pairwise distances, the METRIC VIOLATION DISTANCE (MVD) problem asks to compute the $\ell_0$ distance between $x$ and the metric cone; i.e., modify the minimum number of entries of $x$ to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an $O(\log n)$-approximation algorithm for MVD, exponentially improving the previous best approximation ratio of $O(OPT^{1/3})$ of Fan et al. [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of ULTRAMETRIC VIOLATION DISTANCE (UMVD), where the goal is to compute the $\ell_0$ distance to the cone of ultrametrics, and achieve a constant-factor approximation algorithm. The UMVD can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad et al. [FOCS, 2021] from the $\ell_1$ norm to the $\ell_0$ norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new $O(1)$-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an $O(\log n \log \log n)$ approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of $x$ forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.
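As a small illustration of what a violation looks like (a hypothetical helper, not the paper's algorithm), the sketch below lists violated triangle inequalities in a distance vector $x$. Note that MVD asks for the minimum number of modified entries, which can be much smaller than the number of violated triangles.

```python
from itertools import combinations

def violated_triangles(n, x):
    """List triangles (i, j, k) in which the distance vector x violates
    a triangle inequality. x maps each pair (i, j) with i < j to a
    nonnegative distance."""
    def d(i, j):
        return x[min(i, j), max(i, j)]
    bad = []
    for i, j, k in combinations(range(n), 3):
        a, b, c = d(i, j), d(j, k), d(i, k)
        if a > b + c or b > a + c or c > a + b:
            bad.append((i, j, k))
    return bad

# Toy usage: one corrupted entry (1, 2) breaks two triangles, yet the
# l0 distance to the metric cone is 1: resetting x[1, 2] to 2.0 fixes both.
x = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 10.0,
     (0, 3): 1.0, (1, 3): 1.0, (2, 3): 1.0}
print(violated_triangles(4, x))  # [(0, 1, 2), (1, 2, 3)]
```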
In the Set Cover problem, we are given a set system with each set having a weight, and we want to find a collection of sets that cover the universe, whilst having low total weight. There are several approaches known (based on greedy approaches, relax-and-round, and dual-fitting) that achieve an $H_k \approx \ln k + O(1)$ approximation for this problem, where the size of each set is bounded by $k$. Moreover, getting a $\ln k - O(\ln \ln k)$ approximation is hard. Where does the truth lie? Can we close the gap between the upper and lower bounds? An improvement would be particularly interesting for small values of $k$, which are often used in reductions between Set Cover and other combinatorial optimization problems. We consider a non-oblivious local-search approach: to the best of our knowledge this gives the first $H_k$-approximation for Set Cover using an approach based on local search. Our proof fits in one page, and gives an integrality gap result as well. Refining our approach by considering larger moves and an optimized potential function gives an $(H_k - \Omega(\log^2 k)/k)$-approximation, improving on the previous bound of $(H_k - \Omega(1/k^8))$ (\emph{R.\ Hassin and A.\ Levin, SICOMP '05}) based on a modified greedy algorithm.
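For contrast with the non-oblivious local search studied here, a minimal Python sketch of the classical weighted greedy algorithm, one of the $H_k$-approximations referenced above; it assumes the input sets collectively cover the universe, and all names are hypothetical.

```python
def greedy_set_cover(universe, sets, weight):
    """Classical weighted greedy: repeatedly pick the set minimizing
    weight per newly covered element. When every set has size at most k,
    this achieves an H_k-approximation. Assumes the sets cover the
    universe (otherwise min() over an empty generator raises)."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = min((s for s in sets if uncovered & set(sets[s])),
                   key=lambda s: weight[s] / len(uncovered & set(sets[s])))
        chosen.append(best)
        uncovered -= set(sets[best])
    return chosen

# Toy usage: the single expensive set 's4' loses to {'s1', 's3'}.
sets = {'s1': [1, 2, 3], 's2': [3, 4], 's3': [4, 5], 's4': [1, 2, 3, 4, 5]}
weight = {'s1': 1.0, 's2': 1.0, 's3': 1.0, 's4': 3.5}
print(greedy_set_cover({1, 2, 3, 4, 5}, sets, weight))  # ['s1', 's3']
```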
Let $H$ be a fixed graph. The $H$-Transversal problem, given a graph $G$, asks to remove the smallest number of vertices from $G$ so that $G$ does not contain $H$ as a subgraph. While a simple $|V(H)|$-approximation algorithm exists and is believed to be tight for every $2$-vertex-connected $H$, the best known hardness of approximation for any tree was the $\Omega(\log |V(H)|)$-inapproximability when $H$ is a star. In this paper, we identify a natural parameter $\Delta$ for every tree $T$ and show that $T$-Transversal is NP-hard to approximate within a factor $(\Delta - 1 -\varepsilon)$ for an arbitrarily small constant $\varepsilon > 0$. As a corollary, we prove that there exists a tree $T$ such that $T$-Transversal is NP-hard to approximate within a factor $\Omega(|V(T)|)$, exponentially improving the best known hardness of approximation for tree transversals.
In the $k$-cut problem, we want to find the lowest-weight set of edges whose deletion breaks a given (multi)graph into $k$ connected components. Algorithms of Karger and Stein can solve this in roughly $O(n^{2k})$ time. However, lower bounds from conjectures about the $k$-clique problem imply that $\Omega(n^{(1-o(1))k})$ time is likely needed. Recent results of Gupta, Lee, and Li have given new algorithms for general $k$-cut in $n^{1.98k + O(1)}$ time, as well as specialized algorithms with better performance for certain classes of graphs (e.g., for small integer edge weights). In this work, we resolve the problem for general graphs. We show that the Contraction Algorithm of Karger outputs any fixed $k$-cut of weight $\alpha \lambda_k$ with probability $\Omega_k(n^{-\alpha k})$, where $\lambda_k$ denotes the minimum $k$-cut weight. This also gives an extremal bound of $O_k(n^k)$ on the number of minimum $k$-cuts and an algorithm to compute $\lambda_k$ in roughly $n^k \, \mathrm{polylog}(n)$ time. Both are tight up to lower-order factors, with the algorithmic lower bound assuming hardness of max-weight $k$-clique. The first main ingredient in our result is an extremal bound on the number of cuts of weight less than $2\lambda_k/k$, using the Sunflower lemma. The second ingredient is a fine-grained analysis of how the graph shrinks, and how the average degree evolves, in the Karger process.
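The procedure being analyzed is simple to state; below is a minimal Python sketch of one run of the Contraction Algorithm for $k$-cut on an unweighted multigraph (the paper's contribution is the success-probability analysis, not the procedure, and the helper names are hypothetical).

```python
import random

def contract_to_k_cut(edges, k, seed=None):
    """One run of Karger's Contraction Algorithm for k-cut: contract
    uniformly random edges until k super-vertices remain; the edges
    still crossing between super-vertices form the output k-cut."""
    rng = random.Random(seed)
    parent = {}
    def find(v):  # union-find root with path halving
        while parent.get(v, v) != v:
            parent[v] = parent.get(parent[v], parent[v])
            v = parent[v]
        return v
    vertices = {v for e in edges for v in e}
    alive = list(edges)  # invariant: edges whose endpoints differ
    while len({find(v) for v in vertices}) > k:
        u, v = rng.choice(alive)
        parent[find(u)] = find(v)  # contract the chosen edge
        alive = [e for e in alive if find(e[0]) != find(e[1])]
    return alive

# Toy usage: two triangles joined by one bridge; the min 2-cut is the bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min((contract_to_k_cut(edges, 2, seed=s) for s in range(50)), key=len))
```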
K-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in $\ell_p$-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives in $\ell_p$-metrics. We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied max k-coverage problem on set systems is hard to approximate to a factor greater than $1-1/e$, even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in $\ell_p$-metrics for factors which are close to the ones obtained for general metrics. In particular, assuming JCH we show that it is hard to approximate the k-means objective: $\bullet$ Discrete case: To a factor of 3.94 in the $\ell_1$-metric and to a factor of 1.73 in the $\ell_2$-metric; this improves upon the previous factors of 1.56 and 1.17 respectively, obtained under UGC. $\bullet$ Continuous case: To a factor of 2.10 in the $\ell_1$-metric and to a factor of 1.36 in the $\ell_2$-metric; this improves upon the previous factor of 1.07 in the $\ell_2$-metric obtained under UGC. We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP '05) on Hypergraph Vertex Cover, and recover all the results stated above of Cohen-Addad and Karthik (FOCS '19) to (nearly) the same inapproximability factors but now under the standard NP$\neq$P assumption (instead of UGC).
We show how to round any half-integral solution to the subtour-elimination relaxation for the TSP, while losing a less-than-1.5 factor. Such a rounding algorithm was recently given by Karlin, Klein, and Oveis Gharan based on sampling from max-entropy distributions. We build on an approach of Haddadan and Newman to show how sampling from the matroid intersection polytope, and a new use of max-entropy sampling, can give better guarantees.
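For reference, the relaxation being rounded is the standard subtour-elimination (Held-Karp) LP below; a half-integral solution is one with every $x_e \in \{0, 1/2, 1\}$.
\[
\min \sum_{e \in E} c_e x_e
\quad\text{s.t.}\quad
x(\delta(v)) = 2 \;\;\forall v \in V,\qquad
x(\delta(S)) \ge 2 \;\;\forall\, \emptyset \neq S \subsetneq V,\qquad
0 \le x_e \le 1,
\]
where $x(\delta(S)) = \sum_{e \in \delta(S)} x_e$.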
The Weighted ℱ-Vertex Deletion problem for a class ℱ of graphs asks, given a vertex-weighted graph G, for a minimum-weight vertex set S such that G-S ∈ ℱ. The case when ℱ is minor-closed and excludes some graph as a minor has received particular attention, but a constant-factor approximation for Weighted ℱ-Vertex Deletion has remained elusive. Only three cases of minor-closed ℱ are known to admit constant-factor approximations, namely Vertex Cover, Feedback Vertex Set and Diamond Hitting Set. We study the problem for the class ℱ of θ_c-minor-free graphs, under the equivalent setting of the Weighted c-Bond Cover problem, and present a constant-factor approximation algorithm using the primal-dual method. For this, we leverage a structure theorem implicit in [Joret et al., SIDMA'14], which states the following: any graph G containing a θ_c-minor-model either contains a large two-terminal protrusion, or contains a constant-size θ_c-minor-model, or contains a collection of pairwise disjoint constant-sized connected sets that can be contracted simultaneously to yield a dense graph. In the first case, we tame the graph by replacing the protrusion with a special-purpose weighted gadget. For the second and third cases, we provide a weighting scheme which guarantees a local approximation ratio. Besides making an important step in the quest to (dis)prove a constant-factor approximation for Weighted ℱ-Vertex Deletion, our result may be useful as a template for algorithms for other minor-closed families.
Given a graph stream, how can we estimate the number of triangles in it using multiple machines with limited storage? Specifically, how should edges be processed and sampled across the machines for rapid and accurate estimation? The count of triangles (i.e., cliques of size three) has proven useful in numerous applications, including anomaly detection, community detection, and link recommendation. For triangle counting in large and dynamic graphs, recent work has focused largely on streaming algorithms and distributed algorithms but little on their combinations for "the best of both worlds." In this work, we propose CoCoS, a fast and accurate distributed streaming algorithm for estimating the counts of global triangles (i.e., all triangles) and local triangles incident to each node. Making one pass over the input stream, CoCoS carefully processes and stores the edges across multiple machines so that the redundant use of computational and storage resources is minimized. Compared to baselines, CoCoS is: (a) accurate: giving up to 39× smaller estimation error; (b) fast: up to 10.4× faster, scaling linearly with the size of the input stream; and (c) theoretically sound: yielding unbiased estimates.
The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster. Despite persistent efforts on understanding the approximability of k-means, and other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate: • Continuous k-median to a factor of 2 - o(1); this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms '99). • Continuous k-means to a factor of 4 - o(1); this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms '99). • k-minsum to a factor of 1.415; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA '03). Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).
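To make the continuous-versus-discrete distinction in the last sentence concrete (stated here for k-means; k-median drops the square):
\[
\text{discrete: } \min_{\substack{C \subseteq F\\ |C|=k}} \sum_{p \in P} \min_{c \in C} d(p,c)^2
\qquad\text{vs.}\qquad
\text{continuous: } \min_{\substack{C \subseteq M\\ |C|=k}} \sum_{p \in P} \min_{c \in C} d(p,c)^2,
\]
where $F$ is a finite candidate set given with the input and $M$ is the entire metric space; in $\mathbb{R}^d$, for instance, the optimal continuous $1$-means center is the centroid of the cluster.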
Consider a random graph model where there is an underlying simple graph G = (V, E), and each edge is sampled independently with probability p ∈ [0, 1]. What is the smallest value of p such that the resulting graph G_p is connected with constant probability? This is a well-studied question for special classes of graphs, such as complete graphs and hypercubes. For instance, when G is the complete graph, we want the connectivity threshold for the Erdős-Rényi G(n, p) model: here the answer is known to be Θ(ln n / n). However, the problem is not well-understood for more general graph classes. We first investigate this connectivity threshold problem for "somewhat dense" graphs. We show that for any δ-regular, δ-edge-connected graph G, the random graph G_p is connected with constant probability once p exceeds the natural analogue of the Erdős-Rényi threshold, generalizing the case when G is the complete graph. Our proof also bounds the number of approximate mincuts in such a dense graph, which may be of independent interest. Next, for a general graph G with edge connectivity λ, we define an explicit parameter β_G ∈ (0, 2 ln n], based on the number of approximate mincuts, and show that there is a sharp transition in the connectivity of G_p at p = 1 - exp(-β_G/λ). Moreover, we show that the width of this transition is an additive O(ln λ/λ) term; this improves upon the width bound in Margulis' classical result.
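As a calibration point, the complete-graph case is the classical sharp-threshold result for G(n, p), which can be stated precisely:
\[
p = \frac{\ln n + c_n}{n}:\qquad
\Pr[G(n,p)\text{ is connected}] \;\longrightarrow\;
\begin{cases}
0, & c_n \to -\infty,\\
e^{-e^{-c}}, & c_n \to c \in \mathbb{R},\\
1, & c_n \to +\infty.
\end{cases}
\]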
The $k$-Supplier problem is an important location problem that has been actively studied in both general and Euclidean metrics. Many of its variants have also been studied, primarily on general metrics. We study two variants of $k$-Supplier, namely Priority $k$-Supplier and $k$-Supplier with Outliers, in Euclidean metrics. We obtain $(1+\sqrt{3})$-approximation algorithms for both variants, which are the first improvements over the previously-known factor-$3$ approximation (that is known to be best-possible for general metrics). We also study the Matroid Supplier problem on Euclidean metrics, and show that it cannot be approximated to a factor better than $3$ (assuming $P\ne NP$); so the Euclidean metric offers no improvement in this case.
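For readers meeting the problem for the first time, k-Supplier is the min-max location problem below; the two variants studied add per-client priority radii or allow ignoring a bounded number of clients as outliers.
\[
\min_{\substack{S \subseteq F\\ |S| \le k}} \; \max_{v \in C} \; d(v, S),
\qquad d(v,S) := \min_{u \in S} d(v,u),
\]
where $C$ is the set of clients and $F$ the set of facilities in a common metric $d$.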
We study the computational complexity of approximating the 2→q norm of linear operators (defined as $\|A\|_{2\to q} = \max_{v \neq 0} \|Av\|_q/\|v\|_2$) for q > 2, as well as connections between this question and issues arising in quantum information theory and the study of Khot's Unique Games Conjecture (UGC). We show the following: For any constant even integer q ≥ 4, a graph G is a small-set expander if and only if the projector onto the span of the top eigenvectors of G's adjacency matrix has bounded 2→q norm. As a corollary, a good approximation to the 2→q norm will refute the Small-Set Expansion Conjecture, a close variant of the UGC. We also show that such a good approximation can be obtained in exp(n^{2/q}) time, thus obtaining a different proof of the known subexponential algorithm for Small-Set Expansion. Constant rounds of the "Sum of Squares" semidefinite programming hierarchy certify an upper bound on the 2→4 norm of the projector onto low-degree polynomials over the Boolean cube, as well as certify the unsatisfiability of the "noisy cube" and "short code" based instances of Unique Games considered by prior works. This improves on the previous upper bound of exp(log^{O(1)} n) rounds (for the "short code"), and also separates the "Sum of Squares"/"Lasserre" hierarchy from weaker hierarchies that were known to require ω(1) rounds. We show reductions between computing the 2→4 norm and computing the injective tensor norm of a tensor, a problem with connections to quantum information theory. Three corollaries are: (i) the 2→4 norm is NP-hard to approximate to precision inverse-polynomial in the dimension, (ii) the 2→4 norm does not have a good approximation (in the sense above) unless 3-SAT can be solved in time exp(√n · polylog(n)), and (iii) known algorithms for the quantum separability problem imply a non-trivial additive approximation for the 2→4 norm.
Given a set F of n positive functions over a ground set X, we consider the problem of computing x* that minimizes the expression $\sum_{f \in F} f(x)$, over x ∈ X. A typical application is shape fitting, where we wish to approximate a set P of n elements (say, points) by a shape x from a (possibly infinite) family X of shapes. Here, each point p ∈ P corresponds to a function f such that f(x) is the distance from p to x, and we seek a shape x that minimizes the sum of distances from each point in P. In the k-clustering variant, each $x \in X$ is a tuple of k shapes, and f(x) is the distance from p to its closest shape in x.
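As a toy instance of this template (a minimal numerical illustration on synthetic data, not taken from the paper): let X = R^2, let each point p contribute f(x) = ||p - x||^2, and recover the standard fact that the sum of squared distances is minimized by the centroid.

```python
import numpy as np

# Toy shape-fitting instance: X = R^2, f_p(x) = ||p - x||^2 for each input point p.
# The minimizer of sum_p f_p(x) is the centroid, which we verify numerically.
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 2))          # hypothetical input point set

def cost(x, points):
    """Sum of squared Euclidean distances from each point to the shape x (here: a single point)."""
    return np.sum(np.linalg.norm(points - x, axis=1) ** 2)

centroid = P.mean(axis=0)
# No perturbation of the centroid should decrease the cost.
for _ in range(1000):
    x = centroid + rng.normal(scale=0.1, size=2)
    assert cost(x, P) >= cost(centroid, P) - 1e-9
print("centroid cost:", cost(centroid, P))
```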
Given a k-uniform hypergraph, the Ek-Vertex-Cover problem is to find the smallest subset of vertices that intersects every hyperedge. We present a new multilayered probabilistically checkable proof (PCP) construction that extends the Raz verifier. This enables us to prove that Ek-Vertex-Cover is NP-hard to approximate within a factor of $(k-1-\epsilon)$ for arbitrary constants $\epsilon>0$ and $k\ge 3$. The result is nearly tight as this problem can be easily approximated within factor k. Our construction makes use of the biased long-code and is analyzed using combinatorial properties of s-wise t-intersecting families of subsets. We also give a different proof that shows an inapproximability factor of $\lfloor \frac{k}{2} \rfloor - \epsilon$. In addition to being simpler, this proof also works for superconstant values of k up to $(\log N)^{1/c}$, where c > 1 is a fixed constant and N is the number of hyperedges.
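The factor-k upper bound mentioned above is achieved by an elementary algorithm: repeatedly take all k vertices of any uncovered hyperedge. The chosen hyperedges are pairwise disjoint, and any vertex cover must contain at least one vertex of each, so the output has size at most k·OPT. A minimal sketch:

```python
def k_approx_hypergraph_vertex_cover(hyperedges):
    """Factor-k approximation for Ek-Vertex-Cover: greedily take every vertex
    of some hyperedge that is not yet covered. The selected hyperedges form a
    disjoint collection, so OPT >= (#selections) while we pay k per selection."""
    cover = set()
    for edge in hyperedges:
        if not cover & set(edge):      # edge is still uncovered
            cover |= set(edge)         # add all k of its vertices
    return cover

# Example: a 3-uniform hypergraph on vertices 0..5.
edges = [(0, 1, 2), (2, 3, 4), (3, 4, 5)]
print(k_approx_hypergraph_vertex_cover(edges))   # {0, 1, 2, 3, 4, 5}: at most 3x the optimum {2, 3}
```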
In the k-cut problem, we are given an edge-weighted graph G and an integer k, and have to remove a set of edges with minimum total weight so that G has at least k connected components. The current best algorithms are an O(n^{(2-o(1))k}) randomized algorithm due to Karger and Stein, and an Õ(n^{2k}) deterministic algorithm due to Thorup. Moreover, several 2-approximation algorithms are known for the problem (due to Saran and Vazirani, Naor and Rabani, and Ravi and Sinha). It has remained an open problem to (a) improve the runtime of exact algorithms, and (b) to get better approximation algorithms. In this paper we show an O(k^{O(k)} n^{(2ω/3 + o(1))k})-time algorithm for k-cut, where ω is the matrix multiplication exponent. Moreover, we show a (1+ε)-approximation algorithm that runs in time O((k/ε)^{O(k)} n^{k + O(1)}), and a 1.81-approximation in fixed-parameter time O(2^{O(k^2)} poly(n)).
In this paper we derive tight bounds on the expected value of products of low influence functions defined on correlated probability spaces. The proofs are based on extending Fourier theory to an arbitrary number of correlated probability spaces, on a generalization of an invariance principle recently obtained with O'Donnell and Oleszkiewicz for multilinear polynomials with low influences and bounded degree and on properties of multi-dimensional Gaussian distributions.
The Small-Set Expansion Hypothesis (Raghavendra, Steurer, STOC 2010) is a natural hardness assumption concerning the problem of approximating the edge expansion of small sets in graphs. This hardness assumption is closely connected to the Unique Games Conjecture (Khot, STOC 2002). In particular, the Small-Set Expansion Hypothesis implies the Unique Games Conjecture (Raghavendra, Steurer, STOC 2010). Our main result is that the Small-Set Expansion Hypothesis is in fact equivalent to a variant of the Unique Games Conjecture. More precisely, the hypothesis is equivalent to the Unique Games Conjecture restricted to instances with a fairly mild condition on the expansion of small sets. Alongside, we obtain the first strong hardness of approximation results for the Balanced Separator and Minimum Linear Arrangement problems. Before, no such hardness was known for these problems even assuming the Unique Games Conjecture. These results not only establish the Small-Set Expansion Hypothesis as a natural unifying hypothesis that implies the Unique Games Conjecture, all its consequences and, in addition, hardness results for other problems like Balanced Separator and Minimum Linear Arrangement, but our results also show that the Small-Set Expansion problem lies at the combinatorial heart of the Unique Games Conjecture. The key technical ingredient is a new way of exploiting the structure of the Unique Games instances obtained from the Small-Set Expansion Hypothesis via (Raghavendra, Steurer, 2010). This additional structure allows us to modify standard reductions in a way that essentially destroys their local-gadget nature. Using this modification, we can argue about the expansion in the graphs produced by the reduction without relying on expansion properties of the underlying Unique Games instance (which would be impossible for a local-gadget reduction).
This work is concerned with approximating constraint satisfaction problems (CSPs) with an additional global cardinality constraint. For example, \maxcut is a boolean CSP where the input is a graph $G = (V,E)$ and the goal is to find a cut $S \cup \bar S = V$ that maximizes the number of crossing edges, $|E(S,\bar S)|$. The \maxbisection problem is a variant of \maxcut with an additional global constraint that each side of the cut has exactly half the vertices, i.e., $|S| = |V|/2$. Several other natural optimization problems like \minbisection and approximating Graph Expansion can be formulated as CSPs with global constraints. In this work, we formulate a general approach towards approximating CSPs with global constraints using SDP hierarchies. To demonstrate the approach we present the following results: Using the Lasserre hierarchy, we present an algorithm that runs in time $O(n^{poly(1/\epsilon)})$ that, given an instance of \maxbisection with value $1-\epsilon$, finds a bisection with value $1-O(\sqrt{\epsilon})$. This approximation is near-optimal (up to constant factors in the $O()$) under the Unique Games Conjecture. By a computer-assisted proof, we show that the same algorithm also achieves a 0.85-approximation for \maxbisection, improving on the previous bound of 0.70 (note that it is \uniquegames-hard to approximate better than a 0.878 factor). The same algorithm also yields a 0.92-approximation for \maxtwosat with cardinality constraints. For every CSP with a global cardinality constraint, we present a generic conversion from integrality gap instances for the Lasserre hierarchy to a {\it dictatorship test} whose soundness is at most the integrality gap. Dictatorship testing gadgets are central to hardness results for CSPs, and a generic conversion of the above nature lies at the core of the tight Unique Games based hardness result for CSPs \cite{Raghavendra08}.
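The "global cardinality constraint" is literally one linear equation on top of the CSP objective; for \maxbisection it reads:
\[
\max_{S \subseteq V} \; |E(S, V \setminus S)| \qquad \text{subject to} \qquad |S| = \tfrac{|V|}{2}.
\]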
The Small Set Expansion Hypothesis is a conjecture which roughly states that it is NP-hard to distinguish between a graph with a small subset of vertices whose (edge) expansion is almost zero and one in which all small subsets of vertices have expansion almost one. In this work, we prove conditional inapproximability results with essentially optimal ratios for the following graph problems based on this hypothesis: Maximum Edge Biclique, Maximum Balanced Biclique, Minimum k-Cut and Densest At-Least-k-Subgraph. Our hardness results for the two biclique problems are proved by combining a technique developed by Raghavendra, Steurer and Tulsiani to avoid locality of gadget reductions with a generalization of Bansal and Khot’s long code test whereas our results for Minimum k-Cut and Densest At-Least-k-Subgraph are shown via elementary reductions.
The Chordal Vertex Deletion (ChVD) problem asks to delete a minimum number of vertices from an input graph to obtain a chordal graph. In this paper we develop a polynomial kernel for ChVD under the parameterization by the solution size. Using a new Erdős-Pósa-type packing/covering duality for holes in nearly chordal graphs, we present a polynomial-time algorithm that reduces any instance $(G,k)$ of ChVD to an equivalent instance with $\mathrm{poly}(k)$ vertices. The existence of a polynomial kernel answers an open problem posed by Marx in 2006 [D. Marx, "Chordal Deletion Is Fixed-Parameter Tractable," in Graph-Theoretic Concepts in Computer Science, Lecture Notes in Comput. Sci. 4271, Springer, 2006, pp. 37-48]. To obtain the kernelization, we develop the first $\mathrm{poly}(\mathrm{OPT})$-approximation algorithm for ChVD, which is of independent interest. In polynomial time, it either decides that $G$ has no chordal deletion set of size $k$, or outputs a solution of size $\mathcal{O}(k^4\log^2 k)$.
We study the complexity of the maximum coverage problem, restricted to set systems of bounded VC-dimension. Our main result is a fixed-parameter tractable approximation scheme: an algorithm that outputs a (1-ε)-approximation to the maximum-cardinality union of k sets, in running time O(f(ε,k,d)·poly(n)), where n is the problem size, d is the VC-dimension of the set system, and f(ε,k,d) is exponential in (kd/ε)^c for some constant c. We complement this positive result by showing that the function f(ε,k,d) in the running-time bound cannot be replaced by a function depending only on (ε,d) or on (k,d), under standard complexity assumptions. We also present an improved upper bound on the approximation ratio of the greedy algorithm in special cases of the problem, including when the sets have bounded cardinality and when they are two-dimensional halfspaces. Complementing these positive results, we show that when the sets are four-dimensional halfspaces neither the greedy algorithm nor local search is capable of improving the worst-case approximation ratio of 1-1/e that the greedy algorithm achieves on arbitrary instances of maximum coverage.
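The greedy algorithm whose ratio is analyzed here is the textbook one: k times, pick the set covering the most new elements, which guarantees a 1 - 1/e fraction of the optimum on arbitrary instances. A minimal sketch:

```python
def greedy_max_coverage(sets, k):
    """Textbook greedy for Max k-Coverage: repeatedly pick the set that covers
    the most not-yet-covered elements. Achieves a (1 - 1/e)-approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
print(greedy_max_coverage(sets, k=2))   # picks {1,2,3} then {4,5,6}: covers all 6 elements
```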
We consider the classic Facility Location, k-Median, and k-Means problems in metric spaces of constant doubling dimension. We give the first nearly linear-time approximation schemes for each problem, making a significant improvement over the state-of-the-art algorithms. Moreover, we show how to extend the techniques used to get the first efficient approximation schemes for the problems of prize-collecting k-Medians and k-Means, and efficient bicriteria approximation schemes for k-Medians with outliers, k-Means with outliers and k-Center.
In the Densest k-Subgraph (DkS) problem, given an undirected graph G and an integer k, the goal is to find a subgraph of G on k vertices that contains the maximum number of edges. Even though the state-of-the-art algorithm of Bhaskara et al. achieves only an O(n^{1/4+ε}) approximation ratio, previous attempts at proving hardness of approximation, including those under average-case assumptions, fail to achieve a polynomial ratio; the best ratios ruled out under any worst-case assumption and any average-case assumption are only any constant (Raghavendra and Steurer) and 2^{O(log^{2/3} n)} (Alon et al.) respectively.
Let F be a finite set of graphs. In the F-DELETION problem, we are given an n-vertex graph G and an integer k as input, and asked whether at most k vertices can be deleted from G such that the resulting graph does not contain a graph from F as a minor. F-DELETION is a generic problem and by selecting different sets of forbidden minors F, one can obtain various fundamental problems such as VERTEX COVER, FEEDBACK VERTEX SET or TREEWIDTH η-DELETION. In this paper we obtain a number of generic algorithmic results about F-DELETION, when F contains at least one planar graph. The highlights of our work are: · a constant factor approximation algorithm for the optimization version of F-DELETION; · a linear time and single exponential parameterized algorithm, that is, an algorithm running in time O(2^{O(k)} n), for the parameterized version of F-DELETION where all graphs in F are connected; · a polynomial kernel for parameterized F-DELETION. These algorithms unify, generalize, and improve a multitude of results in the literature. Our main results have several direct applications, but also the methods we develop on the way have applicability beyond the scope of this paper. Our results - constant factor approximation, polynomial kernelization and FPT algorithms - are strung together by a common theme of polynomial time preprocessing.
We study the complexity of approximating the vertex expansion of graphs G = (V, E), defined as Φ^V := min_{S⊂V} n·|N(S)|/(|S|·|V∖S|). We give a simple polynomial-time algorithm for finding a subset with vertex expansion O(√(Φ^V log d)), where d is the maximum degree of the graph. Our main result is an asymptotically matching lower bound: under the Small Set Expansion (SSE) hypothesis, it is hard to find a subset with expansion less than C·√(Φ^V log d) for an absolute constant C. In particular, this implies that for all constant ε > 0, it is SSE-hard to distinguish whether the vertex expansion is < ε or at least an absolute constant. The analogous threshold for edge expansion is √Φ with no dependence on the degree (here Φ denotes the optimal edge expansion). Thus our results suggest that vertex expansion is harder to approximate than edge expansion. In particular, while Cheeger's algorithm can certify constant edge expansion, it is SSE-hard to certify constant vertex expansion in graphs.
We give a test that can distinguish efficiently between product states of n quantum systems and states which are far from product. If applied to a state psi whose maximum overlap with a product state is 1-epsilon, the test passes with probability 1-Theta(epsilon), regardless of n or the local dimensions of the individual systems. The test uses two copies of psi. We prove correctness of this test as a special case of a more general result regarding stability of maximum output purity of the depolarising channel. A key application of the test is to quantum Merlin-Arthur games with multiple Merlins, where we obtain several structural results that had been previously conjectured, including the fact that efficient soundness amplification is possible and that two Merlins can simulate many Merlins: QMA(k)=QMA(2) for k>=2. Building on a previous result of Aaronson et al., this implies that there is an efficient quantum algorithm to verify 3-SAT with constant soundness, given two unentangled proofs of O(sqrt(n) polylog(n)) qubits. We also show how QMA(2) with log-sized proofs is equivalent to a large number of problems, some related to quantum information (such as testing separability of mixed states) as well as problems without any apparent connection to quantum mechanics (such as computing injective tensor norms of 3-index tensors). As a consequence, we obtain many hardness-of-approximation results, as well as potential algorithmic applications of methods for approximating QMA(2) acceptance probabilities. Finally, our test can also be used to construct an efficient test for determining whether a unitary operator is a tensor product, which is a generalisation of classical linearity testing.
We present several polynomial- and quasipolynomial-time approximation schemes for a large class of generalized operator norms. Special cases include the $2\rightarrow q$ norm of matrices for $q>2$, the support function of the set of separable quantum states, finding the least noisy output of entanglement-breaking quantum channels, and approximating the injective tensor norm for a map between two Banach spaces whose factorization norm through $\ell_1^n$ is bounded. These reproduce and in some cases improve upon the performance of previous algorithms by Brandão-Christandl-Yard and followup work, which were based on the Sum-of-Squares hierarchy and whose analysis used techniques from quantum information such as the monogamy principle of entanglement. Our algorithms, by contrast, are based on brute force enumeration over carefully chosen covering nets. These have the advantage of using less memory, having much simpler proofs and giving new geometric insights into the problem. Net-based algorithms for similar problems were also presented by Shi-Wu and Barak-Kelner-Steurer, but in each case with a run-time that is exponential in the rank of some matrix. We achieve polynomial or quasipolynomial runtimes by using the much smaller nets that exist in $\ell_1$ spaces. This principle has been used in learning theory, where it is known as Maurey's empirical method.
We introduce a new technique for designing fixed-parameter algorithms for cut problems, called randomized contractions. We apply our framework to obtain the first fixed-parameter algorithms (FPT algorithms) with exponential speed up for the Steiner Cut and Node Multiway Cut-Uncut problems. We prove that the parameterized version of the Unique Label Cover problem, which is the base of the Unique Games Conjecture, can be solved in $2^{O(k^2\log |\Sigma|)}n^4\log n$ deterministic time (even in the stronger, vertex-deletion variant), where $k$ is the number of unsatisfied edges and $|\Sigma|$ is the size of the alphabet. As a consequence, we show that one can in polynomial time solve instances of Unique Games where the number of edges allowed not to be satisfied is upper bounded by $O(\sqrt{\log n})$ to optimality, which improves over the trivial $O(1)$ upper bound. We prove that the Steiner Cut problem can be solved in $2^{O(k^2\log k)}n^4\log n$ deterministic time and $\tilde{O}(2^{O(k^2\log k)}n^2)$ randomized time, where $k$ is the size of the cutset. This result improves the double exponential running time of the recent work of Kawarabayashi and Thorup presented at FOCS'11. We show how to combine considering "cut" and "uncut" constraints at the same time. More precisely, we define a robust problem, Node Multiway Cut-Uncut, that can serve as an abstraction of introducing uncut constraints and show that it admits an algorithm running in $2^{O(k^2\log k)}n^4\log n$ deterministic time, where $k$ is the size of the cutset. To the best of our knowledge, the only known way of tackling uncut constraints was via the approach of Marx, O'Sullivan, and Razgon [ACM Trans. Algorithms, 9 (2013), 30], which yields algorithms with double exponential running time. An interesting aspect of our algorithms is that they can handle positive real weights.
Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA): Tight Running Time Lower Bounds for Strong Inapproximability of Maximum k-Coverage, Unique Set Cover and Related Problems (via t-Wise Agreement Testing Theorem), Pasin Manurangsi, pp. 62-81, https://doi.org/10.1137/1.9781611975994.5. We show, assuming the (randomized) Gap Exponential Time Hypothesis (Gap-ETH), that the following tasks cannot be done in T(k) · N^{o(k)} time for any function T, where N denotes the input size: (1 - 1/e + ε)-approximation for Max k-Coverage for any constant ε > 0; (1 + 2/e - ε)-approximation for k-Median (in general metrics) for any constant ε > 0; (1 + 8/e - ε)-approximation for k-Means (in general metrics) for any constant ε > 0; any constant factor approximation for k-Unique Set Cover, k-Nearest Codeword Problem and k-Closest Vector Problem; (1 + δ)-approximation for k-Minimum Distance Problem and k-Shortest Vector Problem for some δ > 0. Since all problems considered here can be trivially solved in N^{O(k)} time, our running time lower bounds are tight up to a constant factor in the exponent. In terms of approximation ratios, Max k-Coverage is well-known to admit polynomial-time (1 - 1/e)-approximation algorithms, and, recently, it was shown that k-Median and k-Means are approximable to within factors of (1 + 2/e + ε) and (1 + 8/e + ε) respectively in FPT time [20]; hence, our inapproximability ratios are also tight for these three problems. For the remaining problems, no non-trivial FPT approximation algorithms are known. The starting point of all our hardness results is the Label Cover problem (with projection constraints). We show that Label Cover cannot be approximated to within any constant factor in T(k) · N^{o(k)} time, where N and k denote the size of the input and the number of nodes on the side with the larger alphabet respectively. With this hardness, the above results follow immediately from known reductions. The hardness of Label Cover is in turn shown via a t-wise agreement testing theorem of the following form: given local boolean functions f1, …, fk on domains S1, …, Sk ⊆ [n], if random t functions "weakly agree" with sufficiently large probability, then we can find a global boolean function g: [n] → {0, 1} that "mostly agrees" with "many" of the local functions. We prove such a statement in the regime where S1, …, Sk are "random-looking" sets of size Θ(n/k).
We consider the (exact, minimum) k-CUT problem: given a graph and an integer k, delete a minimum-weight set of edges so that the remaining graph has at least k connected components. This problem is a natural generalization of the global minimum cut problem, where the goal is to break the graph into k=2 pieces. Our main result is a (combinatorial) k-CUT algorithm on simple graphs that runs in n^{(1+o(1))k} time for any constant k, improving upon the previously best n^{(2ω/3+o(1))k} time algorithm of Gupta et al. [FOCS'18] and the previously best n^{(1.981+o(1))k} time combinatorial algorithm of Gupta et al. [STOC'19]. For combinatorial algorithms, this algorithm is optimal up to o(1) factors assuming recent hardness conjectures: we show by a straightforward reduction that k-CUT on even a simple graph is as hard as (k-1)-clique, establishing a lower bound of n^{(1-o(1))k} for k-CUT. This settles, up to lower-order factors, the complexity of k-CUT on a simple graph for combinatorial algorithms.
We study the parameterized complexity of approximating the k-Dominating Set (DomSet) problem where an integer k and a graph G on n vertices are given as input, and the goal is to find a dominating set of size at most F(k)·k whenever the graph G has a dominating set of size k. When such an algorithm runs in time T(k)·poly(n) (i.e., FPT-time) for some computable function T, it is said to be an F(k)-FPT-approximation algorithm for k-DomSet. Whether such an algorithm exists is listed in the seminal book of Downey and Fellows (2013) as one of the "most infamous" open problems in parameterized complexity. This work gives an almost complete answer to this question by showing the non-existence of such an algorithm under W[1] ≠ FPT and further providing tighter running time lower bounds under stronger hypotheses. Specifically, we prove the following for every computable functions T, F and every constant ε > 0: • Assuming W[1] ≠ FPT, there is no F(k)-FPT-approximation algorithm for k-DomSet. • Assuming the Exponential Time Hypothesis (ETH), there is no F(k)-approximation algorithm for k-DomSet that runs in T(k)·n^{o(k)} time. • Assuming the Strong Exponential Time Hypothesis (SETH), for every integer k ≥ 2, there is no F(k)-approximation algorithm for k-DomSet that runs in T(k)·n^{k-ε} time. • Assuming the k-SUM Hypothesis, for every integer k ≥ 3, there is no F(k)-approximation algorithm for k-DomSet that runs in T(k)·n^{⌈k/2⌉-ε} time. Previously, only constant ratio FPT-approximation algorithms were ruled out under W[1] ≠ FPT, and (log^{1/4-ε} k)-FPT-approximation algorithms were ruled out under ETH [Chen and Lin, FOCS 2016]. Recently, the non-existence of an F(k)-FPT-approximation algorithm for any function F was shown under Gap-ETH [Chalermsook et al., FOCS 2017]. Note that, to the best of our knowledge, no running time lower bound of the form n^{δk} for any absolute constant δ > 0 was known before even for any constant factor inapproximation ratio. Our results are obtained by establishing a connection between communication complexity and hardness of approximation, generalizing the ideas from a recent breakthrough work of Abboud et al. [FOCS 2017]. Specifically, we show that to prove hardness of approximation of a certain parameterized variant of the label cover problem, it suffices to devise a specific protocol for a communication problem that depends on which hypothesis we rely on. Each of these communication problems turns out to be either a well-studied problem or a variant of one; this allows us to easily apply known techniques to solve them.
We show how to bound the accuracy of a family of semi-definite programming relaxations for the problem of polynomial optimization on the hypersphere. Our method is inspired by a set of results from quantum information known as quantum de Finetti theorems. In particular, we prove a de Finetti theorem for a special class of real symmetric matrices to establish the existence of approximate representing measures for moment matrix relaxations.
We study variants of the classic $s$-$t$ cut problem and prove the following improved hardness results assuming the Unique Games Conjecture (UGC). - For any constant $k \geq 2$ and $ε> 0$, we show that Directed Multicut with $k$ source-sink pairs is hard to approximate within a factor $k - ε$. This matches the trivial $k$-approximation algorithm. By a simple reduction, our result for $k = 2$ implies that Directed Multiway Cut with two terminals (also known as $s$-$t$ Bicut) is hard to approximate within a factor $2 - ε$, matching the trivial $2$-approximation algorithm. Previously, the best hardness factor for these problems (for constant $k$) was $1.5 - ε$ under the UGC. - For Length-Bounded Cut and Shortest Path Interdiction, we show that both problems are hard to approximate within any constant factor, even if we allow bicriteria approximation. If we want to cut vertices or the graph is directed, our hardness factor for Length-Bounded Cut matches the best approximation ratio up to a constant. Previously, the best hardness factor was $1.1377$ for Length-Bounded Cut and $2$ for Shortest Path Interdiction. - Assuming a variant of the UGC (implied by another variant of Bansal and Khot), we prove that it is hard to approximate Resource Minimization Fire Containment within any constant factor. Previously, the best hardness factor was $2$. Our results are based on a general method of converting an integrality gap instance to a length-control dictatorship test for variants of the $s$-$t$ cut problem, which may be useful for other problems.
We propose an analytical framework for studying parallel repetition, a basic product operation for one-round two-player games. In this framework, we consider a relaxation of the value of projection games. We show that this relaxation is multiplicative with respect to parallel repetition and that it provides a good approximation to the game value. Based on this relaxation, we prove the following improved parallel repetition bound: for every projection game G with value at most ρ, the k-fold parallel repetition G⊗k has value at most $\left(\frac{2\sqrt{\rho}}{1+\rho}\right)^{k/4}$.
We study the problem of deleting the smallest set S of vertices (resp. edges) from a given graph G such that the induced subgraph (resp. subgraph) G\S belongs to some class ℋ. We consider the case where graphs in ℋ have treewidth bounded by t, and give a general framework to obtain approximation algorithms for both vertex and edge-deletion settings from approximation algorithms for certain natural graph partitioning problems called k-Subset Vertex Separator and k-Subset Edge Separator, respectively. For the vertex deletion setting, our framework combined with the current best result for k-Subset Vertex Separator improves approximation ratios for basic problems such as k-Treewidth Vertex Deletion and Planar-ℱ Vertex Deletion. Our algorithms are simpler than previous works and give the first deterministic and uniform approximation algorithms under the natural parameterization. For the edge deletion setting, we give improved approximation algorithms for k-Subset Edge Separator combining ideas from LP relaxations and important separators. We present their applications in bounded-degree graphs, and also give an APX-hardness result for the edge deletion problems.
We prove improved inapproximability results for hypergraph coloring using the low-degree polynomial code (a.k.a. the "short code" of Barak et al. [FOCS 2012]) and the techniques proposed by Dinur and Guruswami [FOCS 2013] to incorporate this code into inapproximability results.
We consider the matroid median problem [Krishnaswamy et al. 2011], wherein we are given a set of facilities with opening costs and a matroid on the facility-set, and clients with demands and connection costs, and we seek to open an independent set of facilities and assign clients to open facilities so as to minimize the sum of the facility-opening and client-connection costs. We give a simple 8-approximation algorithm for this problem based on LP-rounding, which improves upon the 16-approximation in Krishnaswamy et al. [2011]. We illustrate the power and versatility of our techniques by deriving (a) an 8-approximation for the two-matroid median problem, a generalization of matroid median that we introduce involving two matroids; and (b) a 24-approximation algorithm for matroid median with penalties, which is a vast improvement over the 360-approximation obtained in Krishnaswamy et al. [2011]. We show that a variety of seemingly disparate facility-location problems considered in the literature (the data placement problem, mobile facility location, k-median forest, and metric uniform minimum-latency Uncapacitated Facility Location (UFL)) in fact reduce to the matroid median or two-matroid median problems, and thus obtain improved approximation guarantees for all these problems. Our techniques also yield an improvement for the knapsack median problem.
In this paper, we present a new iterative rounding framework for many clustering problems. Using this, we obtain an (α1 + ε ≤ 7.081 + ε)-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen. For k-means with outliers, we give an (α2 + ε ≤ 53.002 + ε)-approximation, which is the first O(1)-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give α1- and (α1 + ε)-approximation algorithms for the matroid and knapsack median problems respectively, improving upon the previous best approximation ratios of 8 due to Swamy and 17.46 due to Byrka et al. The natural LP relaxation for the k-median/k-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which we have at most two fractionally open facilities. Thus, the LP integrality gap arises due to the gap between almost-integral and fully-integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully-integral solution losing only a constant factor in the approximation ratio. By further using a sparsification technique, the additive factor loss incurred by the conversion can be reduced to any ε > 0.
We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 60,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph, by storing a minuscule fraction of edges.
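A heavily simplified cousin of such estimators (for flavor only; this is not the paper's birthday-paradox procedure) keeps each edge independently with probability p and, when an edge (u, v) arrives, counts the sampled wedges it closes; since the two earlier edges of a triangle both survive with probability p², scaling by 1/p² gives an unbiased triangle count.

```python
import random
from collections import defaultdict

def streaming_triangle_estimate(edge_stream, p=0.1, seed=0):
    """Single-pass triangle estimator via independent edge sampling.
    Each triangle is detected when its last edge arrives, which requires its
    two earlier edges to have been sampled (probability p^2), so dividing the
    raw count by p^2 yields an unbiased estimate of the triangle count.
    This is a simplified illustration, not the paper's birthday-paradox method."""
    rng = random.Random(seed)
    adj = defaultdict(set)           # adjacency over *sampled* edges only
    raw = 0
    for u, v in edge_stream:
        raw += len(adj[u] & adj[v])  # sampled wedges closed by (u, v)
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return raw / (p * p)

# Tiny demo on the clique K6 (which has exactly 20 triangles); a large p keeps variance low.
K6 = [(i, j) for i in range(6) for j in range(i + 1, 6)]
print(streaming_triangle_estimate(K6, p=0.9))
```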
One of the key results in Robertson and Seymour's seminal work on graph minors is the grid-minor theorem (also called the excluded grid theorem). The theorem states that for every grid H, every graph whose treewidth is large enough relative to |V(H)| contains H as a minor. This theorem has found many applications in graph theory and algorithms. Let f(k) denote the largest value such that every graph of treewidth k contains a grid minor of size f(k) × f(k). The best previous quantitative bound, due to recent work of Kawarabayashi and Kobayashi, and Leaf and Seymour, shows that f(k) = Ω(√(log k/log log k)). In contrast, the best known upper bound implies that f(k) = O(√(k/log k)). In this article, we obtain the first polynomial relationship between treewidth and grid minor size by showing that f(k) = Ω(k^δ) for some fixed constant δ > 0, and describe a randomized algorithm, whose running time is polynomial in |V(G)| and k, that with high probability finds a model of such a grid minor in G.
In the minimum Multicut problem, the input is an edge-weighted supply graph G = (V, E) and a demand graph H = (V, F). Either both G and H are directed (Dir-MulC) or both are undirected (Undir-MulC). The goal is to remove a minimum-weight set of supply edges E' ⊆ E such that in G − E' there is no path from s to t for any demand edge (s, t) ∈ F. Undir-MulC admits an O(log k)-approximation where k is the number of edges in H, while the best known approximation for Dir-MulC is min{k, O(|V|^{11/23})}. These approximations are obtained by proving corresponding results on the multicommodity flow-cut gap. In this paper we consider the role that the structure of the demand graph plays in determining the approximability of Multicut. We obtain several new positive and negative results. In undirected graphs our main result is a 2-approximation in n^{O(t)} time when the demand graph excludes an induced matching of size t. This gives a constant-factor approximation for a specific demand graph that motivated this work, and is based on a reduction to uniform metric labeling and not via the flow-cut gap. In contrast to the positive result for undirected graphs, we prove that in directed graphs such approximation algorithms cannot exist. We prove, assuming the Unique Games Conjecture (UGC), that for a large class of fixed demand graphs Dir-MulC cannot be approximated to a factor better than the worst-case flow-cut gap. As a consequence we prove that for any fixed k, assuming UGC, Dir-MulC with k demand pairs is hard to approximate to within a factor better than k. On the positive side, we obtain a k-approximation when the demand graph excludes certain graphs as an induced subgraph. This generalizes the known 2-approximation for directed Multiway Cut to a larger class of demand graphs.
Recently, considerable efforts have been devoted to approximately computing the global and local (i.e., incident to each node) triangle counts of a large graph stream represented as a sequence of edges. Existing approximate triangle counting algorithms rely on sampling techniques to reduce the computational cost. However, their estimation errors are significantly determined by the covariance between sampled triangles. Moreover, little attention has been paid to developing parallel one-pass streaming algorithms that can be used to quickly and approximately count triangles on a multi-core machine or a cluster of machines. To solve these problems, we develop a novel parallel method REPT to significantly reduce the covariance (and even completely eliminate it in some cases) between sampled triangles. We theoretically prove that REPT is more accurate than parallelizing existing triangle count estimation algorithms in a direct manner. In addition, we conduct extensive experiments on a variety of real-world graphs, and the results demonstrate that REPT is several times more accurate than state-of-the-art methods.
Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.
Random constraint satisfaction problems (CSPs) are known to exhibit threshold phenomena: given a uniformly random instance of a CSP with n variables and m clauses, there is a value of m = Ω(n) beyond which the CSP will be unsatisfiable with high probability. Strong refutation is the problem of certifying that no variable assignment satisfies more than a constant fraction of clauses; this is the natural algorithmic problem in the unsatisfiable regime (when m/n = ω(1)).
We design a linear time approximation scheme for the Gale-Berlekamp Switching Game and generalize it to a wider class of dense fragile minimization problems including the Nearest Codeword Problem (NCP) and Unique Games Problem. Further applications include, among other things, finding a constrained form of matrix rigidity and maximum likelihood decoding of an error correcting code. As another application of our method we give the first linear time approximation schemes for correlation clustering with a fixed number of clusters and its hierarchical generalization. Our results depend on a new technique for dealing with small objective function values of optimization problems and could be of independent interest.
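In one standard formulation of the Gale-Berlekamp switching game (our paraphrase, not a quote from the paper), one is handed a sign matrix and may flip whole rows and columns; the objective is the distance to the nearest rank-one sign pattern:
\[
\min_{x, y \in \{\pm 1\}^n} \;\bigl|\{(i,j) : a_{ij} \neq x_i y_j\}\bigr|,
\qquad A = (a_{ij}) \in \{\pm 1\}^{n \times n}.
\]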
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as "real" communities, we employ approximation algorithms for the graph-partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the _network community profile plot_, which characterizes the "best" possible community—according to the conductance measure—over a wide range of size scales. We study over one hundred large real-world networks, ranging from traditional and online social networks, to technological and information networks and web graphs, and ranging in size from thousands up to tens of millions of nodes. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. Our observations agree with previous work on small networks, but we show that large networks have a very different structure. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales (up to ≈ 100 nodes); and communities of size scale beyond ≈ 100 nodes gradually "blend into" the expander-like core of the network and thus become less "community-like," with a roughly inverse relationship between community size and optimal community quality. This observation agrees well with the so-called Dunbar number, which gives a limit to the size of a well-functioning community. However, this behavior is not explained, even at a qualitative level, by any of the commonly used network-generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as test beds of community-detection algorithms. The relatively gradual increase of the network community profile plot as a function of increasing community size depends in a subtle manner on the way in which local clustering information is propagated from smaller to larger size scales in the network. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network data sets.
In order to obtain the best-known guarantees, algorithms are traditionally tailored to the particular problem we want to solve. Two recent developments, the Unique Games Conjecture (UGC) and the Sum-of-Squares (SOS) method, surprisingly suggest that this tailoring is not necessary and that a single efficient algorithm could achieve the best possible guarantees for a wide range of different problems. The UGC is a tantalizing conjecture in computational complexity which, if true, will shed light on the complexity of a great many problems. In particular, this conjecture predicts that a single concrete algorithm provides optimal guarantees among all efficient algorithms for a large class of computational problems. The SOS method is a general approach for solving systems of polynomial constraints. This approach is studied in several scientific disciplines, including real algebraic geometry, proof complexity, control theory, and mathematical programming, and has found applications in fields as diverse as quantum information theory, formal verification, and game theory, among many others. We survey some connections that were recently uncovered between the Unique Games Conjecture and the Sum-of-Squares method. In particular, we discuss new tools to rigorously bound the running time of the SOS method for obtaining approximate solutions to hard optimization problems, how these tools could allow the SOS method to provide new guarantees for many problems of interest, and how they might possibly even refute the UGC.
Minimizing the discrepancy of a set system is a fundamental problem in combinatorics. One of the cornerstones in this area is the celebrated six standard deviations result of Spencer (AMS 1985): in any system of n sets in a universe of size n, there always exists a coloring which achieves discrepancy 6√n. The original proof of Spencer was existential in nature and did not give an efficient algorithm to find such a coloring. Recently, a breakthrough work of Bansal (FOCS 2010) gave an efficient algorithm which finds such a coloring. His algorithm was based on an SDP relaxation of the discrepancy problem and a clever rounding procedure. In this work we give a new randomized algorithm to find a coloring as in Spencer's result, based on a restricted random walk we call Edge-Walk. Our algorithm and its analysis use only basic linear algebra and are "truly" constructive in that they do not appeal to existential arguments, giving a new proof of Spencer's theorem and of the partial coloring lemma.
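For intuition about the quantity being bounded, the following illustrative Python sketch (not the Edge-Walk algorithm itself) evaluates the discrepancy of a coloring and compares a uniformly random coloring, which achieves only O(√(n log n)) with high probability, against Spencer's 6√n guarantee.

```python
import math
import random

def discrepancy(sets, coloring):
    """disc(chi) = max over sets S of |sum_{i in S} chi(i)|,
    for a coloring chi: {0,...,n-1} -> {+1,-1} given as a list."""
    return max(abs(sum(coloring[i] for i in S)) for S in sets)

random.seed(1)
n = 64
sets = [set(random.sample(range(n), 16)) for _ in range(n)]
chi = [random.choice((1, -1)) for _ in range(n)]
# Random coloring vs. Spencer's existential bound of 6*sqrt(n).
print(discrepancy(sets, chi), 6 * math.sqrt(n))
```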
We consider the minimum k-way cut problem for unweighted undirected graphs with a size bound s on the number of cut edges allowed. Thus we seek to remove as few edges as possible so as to split a graph into k components, or report that this requires cutting more than s edges. We show that this problem is fixed-parameter tractable (FPT) with the standard parameterization in terms of the solution size s. More precisely, for s = O(1), we present a quadratic time algorithm. Moreover, we present a much easier linear time algorithm for planar graphs and bounded genus graphs. Our tractability result stands in contrast to the known W[1]-hardness of related problems. Without the size bound, Downey et al. [2003] proved that the minimum k-way cut problem is W[1]-hard with parameter k, and this holds even for simple unweighted graphs. Downey et al. asked about the status for planar graphs. We get linear time with fixed parameter k for simple planar graphs, since the minimum k-way cut of a planar graph has size at most 6k. More generally, we get FPT with parameter k for any graph class with bounded average degree. A simple reduction shows that vertex cuts are at least as hard as edge cuts, so the minimum k-way vertex cut is also W[1]-hard with parameter k. Marx [2004] proved that finding a minimum k-way vertex cut of size s is also W[1]-hard with parameter s. Marx asked about the FPT status with edge cuts, which we prove tractable here. We are not aware of any other cut problem where the vertex version is W[1]-hard but the edge version is FPT; e.g., Marx [2004] proved that the k-terminal cut problem is FPT parameterized by the cut size, for both edge and vertex cuts.
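As a concrete (and deliberately naive) rendering of the problem statement, the sketch below checks whether deleting a given edge set splits the graph into at least k components, and then searches over all edge subsets of size at most s. This brute force runs in roughly O(m^s) time rather than the paper's FPT bounds; it is only meant to pin down the objective, and all names are illustrative.

```python
from itertools import combinations

def components_after_cut(adj, cut_edges):
    """Number of connected components of an undirected graph
    {node: set(neighbors)} after deleting cut_edges (frozensets {u, v})."""
    removed = {frozenset(e) for e in cut_edges}
    seen, comps = set(), 0
    for start in adj:
        if start in seen:
            continue
        comps += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen and frozenset((u, v)) not in removed:
                    seen.add(v)
                    stack.append(v)
    return comps

def min_kway_cut(adj, k, s):
    """Smallest edge set of size <= s whose removal yields >= k components,
    or None if none exists. Exhaustive; the paper's algorithm is far faster."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    for size in range(s + 1):
        for cut in combinations(edges, size):
            if components_after_cut(adj, cut) >= k:
                return set(cut)
    return None

# Path 0-1-2-3: any single edge is a minimum 2-way cut.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(min_kway_cut(adj, k=2, s=1))
```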
We show a new way to round vector solutions of semidefinite programming (SDP) hierarchies into integral solutions, based on a connection between these hierarchies and the spectrum of the input graph. We demonstrate the utility of our method by providing a new SDP-hierarchy based algorithm for constraint satisfaction problems with 2-variable constraints (2-CSPs). More concretely, we show, for every 2-CSP instance $\mathcal{I}$, a rounding algorithm for $r$ rounds of the Lasserre SDP hierarchy for $\mathcal{I}$ that obtains an integral solution which is at most $\varepsilon$ worse than the relaxation's value (normalized to lie in $[0,1]$), as long as $r > k \cdot \mathrm{rank}_{\geq\theta}(\mathcal{I})/\mathrm{poly}(\varepsilon)$, where $k$ is the alphabet size of $\mathcal{I}$, $\theta = \mathrm{poly}(\varepsilon/k)$, and $\mathrm{rank}_{\geq\theta}(\mathcal{I})$ denotes the number of eigenvalues larger than $\theta$ in the normalized adjacency matrix of the constraint graph of $\mathcal{I}$. In the case that $\mathcal{I}$ is a Unique Games instance, the threshold $\theta$ is only a polynomial in $\varepsilon$, and is independent of the alphabet size. Also in this case, we can give a non-trivial bound on the number of rounds for every instance. In particular our result yields an SDP-hierarchy based algorithm that matches the performance of the recent subexponential algorithm of Arora, Barak and Steurer (FOCS 2010) in the worst case, but runs faster on a natural family of instances, thus further restricting the set of possible hard instances for Khot's Unique Games Conjecture. Our algorithm actually requires less than the $n^{O(r)}$ constraints specified by the $r$-th level of the Lasserre hierarchy, and in some cases $r$ rounds of our program can be evaluated in time $2^{O(r)} \mathrm{poly}(n)$.
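The threshold rank $\mathrm{rank}_{\geq\theta}$ that controls the number of rounds is straightforward to compute. Below is a small numpy sketch (illustrative, not from the paper) that counts the eigenvalues of the normalized adjacency matrix $D^{-1/2} A D^{-1/2}$ exceeding a threshold $\theta$.

```python
import numpy as np

def threshold_rank(adj_matrix, theta):
    """rank_{>=theta}(G): number of eigenvalues of the normalized adjacency
    matrix D^{-1/2} A D^{-1/2} larger than theta (assumes no isolated nodes)."""
    A = np.asarray(adj_matrix, dtype=float)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ A @ D_inv_sqrt
    eigvals = np.linalg.eigvalsh(M)  # M is symmetric
    return int(np.sum(eigvals > theta))

# Example: the 4-cycle has normalized adjacency eigenvalues {1, 0, 0, -1},
# so its threshold rank at theta = 0.5 is 1.
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(threshold_rank(C4, 0.5))  # -> 1
```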
We consider the problem of coloring $k$-colorable graphs with the fewest possible colors. We present a randomized polynomial time algorithm that colors a 3-colorable graph on $n$ vertices with $\min\{O(\Delta^{1/3}\log^{1/2}\Delta \log n), O(n^{1/4}\log^{1/2} n)\}$ colors, where $\Delta$ is the maximum degree of any vertex. Besides giving the best known approximation ratio in terms of $n$, this marks the first nontrivial approximation result as a function of the maximum degree $\Delta$. This result can be generalized to $k$-colorable graphs to obtain a coloring using $\min\{O(\Delta^{1-2/k}\log^{1/2}\Delta \log n), O(n^{1-3/(k+1)}\log^{1/2} n)\}$ colors. Our results are inspired by the recent work of Goemans and Williamson, who used an algorithm for semidefinite optimization problems, which generalize linear programs, to obtain improved approximations for the MAX CUT and MAX 2-SAT problems. An intriguing outcome of our work is a duality relationship established between the value of the optimum solution to our semidefinite program and the Lovász $\theta$-function. We show lower bounds on the gap between the optimum solution of our semidefinite program and the actual chromatic number; by duality this also demonstrates interesting new facts about the $\theta$-function.
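The two bounds above are easy to compare numerically. The following sketch (illustrative only; the constants hidden by the $O(\cdot)$ are suppressed, so this shows growth rates rather than exact color counts) evaluates both expressions for a $k$-colorable graph and takes their minimum.

```python
import math

def color_bound(n, Delta, k=3):
    """Growth rate of the abstract's color guarantees for a k-colorable
    graph with n vertices and maximum degree Delta. O() constants dropped,
    so this is a shape-of-the-bound illustration, not a color count."""
    by_degree = (Delta ** (1 - 2 / k)
                 * math.sqrt(math.log(max(Delta, 2)))
                 * math.log(n))
    by_n = n ** (1 - 3 / (k + 1)) * math.sqrt(math.log(n))
    return min(by_degree, by_n)

# 3-colorable case: min{ Delta^(1/3) ..., n^(1/4) ... }.
print(color_bound(10**6, 10**3))
```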
We give new rounding schemes for the standard linear programming relaxation of the correlation clustering problem, achieving approximation factors almost matching the integrality gaps: For complete graphs our approximation is 2.06 - ε, which almost matches the previously known integrality gap of 2. For complete k-partite graphs our approximation is 3. We also show a matching integrality gap. For complete graphs with edge weights satisfying triangle inequalities and probability constraints, our approximation is 1.5, and we show an integrality gap of 1.2.
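For reference, the standard LP relaxation being rounded can be written down directly: variables $x_{ij} \in [0,1]$ act as cluster "distances" ($0$ means same cluster), subject to triangle inequalities, paying $x_{ij}$ on "+" edges and $1 - x_{ij}$ on "−" edges. Here is a minimal sketch, assuming the cvxpy library; it generates all $O(n^3)$ triangle constraints, so it is for tiny instances only.

```python
import cvxpy as cp

def correlation_clustering_lp(n, plus_edges):
    """Standard correlation clustering LP on the complete graph:
    minimize sum of x[i,j] over '+' edges plus (1 - x[i,j]) over '-' edges,
    with x in [0,1], zero diagonal, and triangle inequalities."""
    x = cp.Variable((n, n), symmetric=True)
    cons = [x >= 0, x <= 1, cp.diag(x) == 0]
    for i in range(n):          # all O(n^3) triangle inequalities
        for j in range(n):
            for k in range(n):
                cons.append(x[i, j] <= x[i, k] + x[k, j])
    obj = sum(x[i, j] if (i, j) in plus_edges else 1 - x[i, j]
              for i in range(n) for j in range(i + 1, n))
    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()
    return x.value, prob.value

# Toy instance: '+' edges pair up {0,1} and {2,3}; the LP optimum is 0.
x_val, lp_val = correlation_clustering_lp(4, {(0, 1), (2, 3)})
print(round(lp_val, 3))
```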
We give an $O(\sqrt{\log n})$-approximation algorithm for the sparsest cut, edge expansion, balanced separator, and graph conductance problems. This improves the $O(\log n)$-approximation of Leighton and Rao (1988). We use a well-known semidefinite relaxation with triangle inequality constraints. Central to our analysis is a geometric theorem about projections of point sets in $\mathbb{R}^d$, whose proof makes essential use of a phenomenon called measure concentration. We also describe an interesting and natural "approximate certificate" for a graph's expansion, which involves embedding an $n$-node expander in it with appropriate dilation and congestion. We call this an expander flow.
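Edge expansion, one of the quantities approximated here, can be stated in a few lines. The sketch below (illustrative and exhaustive, hence exponential; the point of the abstract's algorithm is to approximate this within $O(\sqrt{\log n})$ efficiently) computes $h(S) = |E(S, \bar{S})|/|S|$ and minimizes over all sets of at most half the vertices.

```python
from itertools import combinations

def edge_expansion(adj, S):
    """h(S) = |E(S, V \\ S)| / |S| for an undirected graph
    given as {node: set(neighbors)}."""
    S = set(S)
    return sum(1 for u in S for v in adj[u] if v not in S) / len(S)

def expansion(adj):
    """Minimum edge expansion over all sets of size at most n/2.
    Exhaustive search: tiny graphs only, for illustration."""
    nodes = list(adj)
    return min(edge_expansion(adj, c)
               for s in range(1, len(nodes) // 2 + 1)
               for c in combinations(nodes, s))

# Two triangles joined by a single edge: the expansion is 1/3,
# witnessed by taking one triangle as S.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(expansion(adj))
```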