Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression

Type: Article

Publication Date: 2013-05-28

Citations: 209

DOI: https://doi.org/10.1145/2488608.2488621


Abstract

Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for common linear algebra problems. We show that, given a matrix A ∈ ℝ^{n×d} with n ≫ d and p ∈ [1, 2), with constant probability we can construct a low-distortion embedding matrix Π ∈ ℝ^{O(poly(d))×n} that embeds A_p, the ℓ_p subspace spanned by A's columns, into (ℝ^{O(poly(d))}, ‖·‖_p); the distortion of our embeddings is only O(poly(d)), and we can compute ΠA in O(nnz(A)) time, i.e., input-sparsity time. Our result generalizes the input-sparsity-time ℓ_2 subspace embedding of Clarkson and Woodruff [STOC '13]; for completeness, we present a simpler and improved analysis of their construction for ℓ_2. These input-sparsity-time ℓ_p embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as (1 ± ε)-distortion ℓ_p subspace embedding and relative-error ℓ_p regression. For ℓ_2, we show that a (1+ε)-approximate solution to the ℓ_2 regression problem specified by the matrix A and a vector b ∈ ℝ^n can be computed in O(nnz(A) + d^3 log(d/ε)/ε^2) time. For ℓ_p, via a subspace-preserving sampling procedure, we show that a (1 ± ε)-distortion embedding of A_p into ℝ^{O(poly(d))} can be computed in O(nnz(A) · log n) time, and that a (1+ε)-approximate solution to the ℓ_p regression problem min_{x ∈ ℝ^d} ‖Ax − b‖_p can be computed in O(nnz(A) · log n + poly(d) log(1/ε)/ε^2) time. Moreover, we can improve the embedding dimension, or equivalently the sample size, to O(d^{3+p/2} log(1/ε)/ε^2) without increasing the complexity.
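To make the ℓ_2 case concrete, the following is a minimal NumPy sketch of the CountSketch-style sparse embedding underlying the Clarkson–Woodruff construction that the abstract generalizes: each row of A is hashed to one of m target rows with a random sign, so ΠA costs O(nnz(A)) to form, and the small sketched least-squares problem is solved as a proxy for the full one. This is an illustrative sketch, not the authors' ℓ_p construction; the function name `countsketch_embed` and the specific sketch size m are illustrative choices, not from the paper.

```python
import numpy as np

def countsketch_embed(A, m, rng):
    """Apply a CountSketch-style sparse embedding Pi to A in O(nnz(A)) time.

    Each of the n rows of A is hashed to one of m target rows and scaled by
    a random sign, so Pi has exactly one nonzero entry per column.
    """
    n, d = A.shape
    h = rng.integers(0, m, size=n)        # hash bucket for each row of A
    s = rng.choice([-1.0, 1.0], size=n)   # random sign for each row of A
    PA = np.zeros((m, d))
    np.add.at(PA, h, s[:, None] * A)      # scatter-add signed rows into buckets
    return PA

# Sketch-and-solve least squares: solve the small problem
# min_x ||Pi(Ax - b)||_2 as a proxy for min_x ||Ax - b||_2.
rng = np.random.default_rng(0)
n, d = 10000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

m = 2000  # sketch size; the theory suggests m = poly(d)/eps^2 suffices
PAb = countsketch_embed(np.column_stack([A, b]), m, rng)
PA, Pb = PAb[:, :d], PAb[:, d]
x_sk, *_ = np.linalg.lstsq(PA, Pb, rcond=None)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

res_sk = np.linalg.norm(A @ x_sk - b)
res_opt = np.linalg.norm(A @ x_opt - b)
print(res_sk / res_opt)  # typically close to 1: the sketch preserves the residual
```

The sketched solve touches each nonzero of A once to build ΠA and then works with an m×d problem, which is the source of the O(nnz(A) + poly(d/ε)) running times quoted above.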

Locations

  • arXiv (Cornell University) - View - PDF
  • CiteSeer X (The Pennsylvania State University) - View - PDF

Similar Works

  • Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression (2012) – Xiangrui Meng, Michael W. Mahoney
  • Low-Rank Approximation and Regression in Input Sparsity Time (2017) – Kenneth L. Clarkson, David P. Woodruff
  • Low Rank Approximation and Regression in Input Sparsity Time (2012) – Kenneth L. Clarkson, David P. Woodruff
  • Low rank approximation and regression in input sparsity time (2013) – Kenneth L. Clarkson, David P. Woodruff
  • Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods (2006) – Petros Drineas, Michael W. Mahoney, S. Muthukrishnan
  • Subspace Embeddings and ℓ_p-Regression Using Exponential Random Variables (2013) – David P. Woodruff, Qin Zhang
  • Subspace Embedding and Linear Regression with Orlicz Norm (2018) – Alexandr Andoni, Chengyu Lin, Ying Sheng, Peilin Zhong, Ruiqi Zhong
  • Linear Dimensionality Reduction in Linear Time: Johnson-Lindenstrauss-type Guarantees for Random Subspace (2017) – Nick Lim, Robert J. Durrant
  • Ridge Leverage Scores for Low-Rank Approximation (2015) – Michael B. Cohen, Cameron Musco, Christopher Musco
  • Learning-Based Low-Rank Approximations (2019) – Piotr Indyk, Ali Vakilian, Yuan Yang
  • Tight Bounds for the Subspace Sketch Problem with Applications (2019) – Yi Li, Ruosong Wang, David P. Woodruff
  • Precise expressions for random projections: Low-rank approximation and randomized Newton (2020) – Michał Dereziński, Feynman Liang, Zhenyu Liao, Michael W. Mahoney
  • Input Sparsity and Hardness for Robust Subspace Approximation (2015) – Kenneth L. Clarkson, David P. Woodruff
  • Input Sparsity Time Low-Rank Approximation via Ridge Leverage Score Sampling (2015) – Michael B. Cohen, Cameron Musco, Christopher Musco
  • Fast Regression with an ℓ_∞ Guarantee (2017) – Eric Price, Zhao Song, David P. Woodruff