Engineering › Computational Mechanics

Sparse and Compressive Sensing Techniques

Description

This cluster of papers focuses on the theory and applications of compressed sensing, including sparse representation, signal recovery, convex optimization, matrix completion, dictionary learning, orthogonal matching pursuit, robust reconstruction, and sparsity in signal processing.

Keywords

Compressed Sensing; Sparse Representation; Signal Recovery; Convex Optimization; Matrix Completion; Sparse Approximation; Dictionary Learning; Orthogonal Matching Pursuit; Robust Reconstruction; Sparsity in Signal Processing

A full-rank matrix ${\bf A}\in \mathbb{R}^{n\times m}$ with $n<m$ generates an underdetermined system of linear equations ${\bf Ax} = {\bf b}$ having infinitely many solutions. Suppose we seek the sparsest solution, i.e., the one with the fewest nonzero entries. Can it ever be unique? If so, when? As optimization of sparsity is combinatorial in nature, are there efficient methods for finding the sparsest solution? These questions have been answered positively and constructively in recent years, exposing a wide variety of surprising phenomena, in particular the existence of easily verifiable conditions under which optimally sparse solutions can be found by concrete, effective computational methods. Such theoretical results inspire a bold perspective on some important practical problems in signal and image processing. Several well-known signal and image processing problems can be cast as demanding solutions of underdetermined systems of equations. Such problems have previously seemed, to many, intractable, but there is considerable evidence that these problems often have sparse solutions. Hence, advances in finding sparse solutions to underdetermined systems have energized research on such signal and image processing problems, to striking effect. In this paper we review the theoretical results on sparse solutions of linear systems, empirical results on sparse modeling of signals and images, and recent applications in inverse problems and compression in image processing. This work lies at the intersection of signal processing and applied mathematics, and arose initially from the wavelets and harmonic analysis research communities. The aim of this paper is to introduce a few key notions and applications connected to sparsity, targeting newcomers interested in either the mathematical aspects of this area or its applications.
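To make the ℓ1 route concrete: the sparsest-solution question above is typically relaxed to basis pursuit, min ‖x‖1 subject to Ax = b, which is a linear program. The sketch below (a random Gaussian A and SciPy's linprog are illustrative choices, not anything prescribed by the paper) recovers a 5-sparse vector from 40 equations in 100 unknowns via the split x = u − v with u, v ≄ 0.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 40, 100, 5                      # measurements, unknowns, nonzeros
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

# Basis pursuit: min ||x||_1 s.t. Ax = b, with x = u - v and u, v >= 0.
c = np.ones(2 * m)                        # objective: sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                 # A(u - v) = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * m), method="highs")
x_hat = res.x[:m] - res.x[m:]

print("recovery error:", np.linalg.norm(x_hat - x_true))
```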
 Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
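A minimal rendition of this pipeline, assuming scikit-learn and substituting a plain Lasso for the paper's noise- and outlier-aware program (no affine constraint or sparse-corruption term), could look as follows: code each point against the others, symmetrize the coefficients into an affinity, and run spectral clustering.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_subspace_clustering(X, n_clusters, alpha=0.01):
    """X: (ambient_dim, n_points). Hedged SSC sketch using a plain Lasso self-representation."""
    X = X / np.linalg.norm(X, axis=0)                # normalize the data points
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[:, others], X[:, i])             # sparse code of point i by the others
        C[others, i] = lasso.coef_
    W = np.abs(C) + np.abs(C).T                      # symmetric affinity matrix
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                assign_labels="discretize", random_state=0).fit_predict(W)
    return labels
```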
In this article, we have provided general, comprehensive coverage of the SDR technique, from its practical deployments and scope of applicability to key theoretical results. We have also showcased several representative applications, namely MIMO detection, B1 shimming in MRI, and sensor network localization. Another important application, namely downlink transmit beamforming, is described in [1]. Due to space limitations, we are unable to cover many other beautiful applications of the SDR technique, although we have done our best to illustrate the key intuitive ideas that resulted in those applications. We hope that this introductory article will serve as a good starting point for readers who would like to apply the SDR technique to their applications, and to locate specific references either in applications or theory.
 In this work we address the subspace recovery problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to segment the samples into their respective subspaces and correct the possible errors as well. To this end, we propose a novel method termed Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that LRR well solves the subspace recovery problem: when the data is clean, we prove that LRR exactly captures the true subspace structures; for the data contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for the data corrupted by arbitrary errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace segmentation and error correction, in an efficient way.
 Sparse coding---that is, modelling data vectors as sparse linear combinations of basis elements---is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on learning the basis set, also called dictionary, to adapt it to specific data, an approach that has recently proven to be very effective for signal reconstruction and classification in the audio and image processing domains. This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. A proof of convergence is presented, along with experiments with natural images demonstrating that it leads to faster performance and better dictionaries than classical batch algorithms for both small and large datasets.
 We show that various inverse problems in signal recovery can be formulated as the generic problem of minimizing the sum of two convex functions with certain regularity properties. This formulation makes it possible to derive existence, uniqueness, characterization, and stability results in a unified and standardized fashion for a large class of apparently disparate problems. Recent results on monotone operator splitting methods are applied to establish the convergence of a forward-backward algorithm to solve the generic problem. In turn, we recover, extend, and provide a simplified analysis for a variety of existing iterative methods. Applications to geometry/texture image decomposition schemes are also discussed. A novelty of our framework is to use extensively the notion of a proximity operator, which was introduced by Moreau in the 1960s.
We consider linear equations y = Ίx where y is a given vector in ℝ^n and Ί is a given n × m matrix with n < m ≀ τn, and we wish to solve for x ∈ ℝ^m. We suppose that the columns of Ί are normalized to unit ℓ2-norm, and we place uniform measure on such Ί. We prove the existence of ρ = ρ(τ) > 0 so that for large n and for all Ί's except a negligible fraction, the following property holds: for every y having a representation y = Ίx_0 by a coefficient vector x_0 ∈ ℝ^m with fewer than ρ·n nonzeros, the solution x_1 of the ℓ1-minimization problem is unique and equal to x_0. In contrast, heuristic attempts to sparsely solve such systems (greedy algorithms and thresholding) perform poorly in this challenging setting. The techniques include the use of random proportional embeddings and almost-spherical sections in Banach space theory, and deviation bounds for the eigenvalues of random Wishart matrices.
The data of interest are assumed to be represented as N-dimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M â‰Ș N of basis-function coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned N-dimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying N-dimensional signal. The number of required compressive-sensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying N-dimensional signal, and g a vector of compressive-sensing measurements; then one may approximate f accurately by utilizing knowledge of the (under-determined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressive-sensing measurements g. The proposed framework has the following properties: i) in addition to estimating the underlying signal f, "error bars" are also estimated, these giving a measure of confidence in the inverted signal; ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient number of compressive-sensing measurements have been performed; iii) this setting lends itself naturally to a framework whereby the compressive sensing measurements are optimized adaptively and hence not determined randomly; and iv) the framework accounts for additive noise in the compressive-sensing measurements and provides an estimate of the noise variance. In this paper we present the underlying theory, an associated algorithm, example results, and provide comparisons to other compressive-sensing inversion algorithms in the literature.
 Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in important large-scale applications. Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization. We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models. Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.
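The modification in question is an extra memory (Onsager) term in the residual update of iterative soft thresholding. A compact sketch is below; the residual-based threshold rule and the parameter alpha are illustrative stand-ins for the tuned thresholding policies analyzed in the paper.

```python
import numpy as np

def amp(A, y, n_iter=30, alpha=2.0):
    """Approximate message passing with soft thresholding (illustrative threshold rule)."""
    n, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        pseudo = x + A.T @ z                               # effective observation of x
        tau = alpha * np.linalg.norm(z) / np.sqrt(n)       # threshold from residual energy
        x_new = np.sign(pseudo) * np.maximum(np.abs(pseudo) - tau, 0.0)
        onsager = z * (np.count_nonzero(x_new) / n)        # (1/delta) * <eta'> * z, delta = n/N
        z = y - A @ x_new + onsager                        # residual with Onsager correction
        x = x_new
    return x

rng = np.random.default_rng(1)
n, N, k = 250, 1000, 30
A = rng.standard_normal((n, N)) / np.sqrt(n)               # roughly unit-norm columns
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = 1.0
x_hat = amp(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```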
Overcomplete representations are attracting interest in signal processing theory, particularly due to their potential to generate sparse representations of signals. However, in general, the problem of finding sparse representations must be unstable in the presence of noise. This paper establishes the possibility of stable recovery under a combination of sufficient sparsity and favorable structure of the overcomplete system. Considering an ideal underlying signal that has a sufficiently sparse representation, it is assumed that only a noisy version of it can be observed. Assuming further that the overcomplete system is incoherent, it is shown that the optimally sparse approximation to the noisy data differs from the optimally sparse decomposition of the ideal noiseless signal by at most a constant multiple of the noise level. As this optimal-sparsity method requires heavy (combinatorial) computational effort, approximation algorithms are considered. It is shown that similar stability is also available using the basis pursuit and matching pursuit algorithms. Furthermore, it is shown that these methods result in sparse approximation of the noisy data that contains only terms also appearing in the unique sparsest representation of the ideal noiseless sparse signal.
 We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.
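For the ℓ1-regularized least-squares instance min_x 0.5‖Ax − b‖ÂČ + λ‖x‖1, a bare-bones version of the accelerated scheme looks like the following (a constant step 1/L with L = ‖A‖ÂČ is a simplification of the paper's backtracking variant); dropping the two momentum lines recovers plain ISTA.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, lam, n_iter=300):
    """FISTA for 0.5*||Ax - b||^2 + lam*||x||_1 with a constant step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft(y - grad / L, lam / L)         # proximal (shrinkage) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum / extrapolation step
        x, t = x_new, t_new
    return x
```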
We present a source localization method based on a sparse representation of sensor measurements with an overcomplete basis composed of samples from the array manifold. We enforce sparsity by imposing penalties based on the ℓ1-norm. A number of recent theoretical results on sparsifying properties of ℓ1 penalties justify this choice. Explicitly enforcing the sparsity of the representation is motivated by a desire to obtain a sharp estimate of the spatial spectrum that exhibits super-resolution. We propose to use the singular value decomposition (SVD) of the data matrix to summarize multiple time or frequency samples. Our formulation leads to an optimization problem, which we solve efficiently in a second-order cone (SOC) programming framework by an interior point implementation. We propose a grid refinement method to mitigate the effects of limiting estimates to a grid of spatial locations and introduce an automatic selection criterion for the regularization parameter involved in our approach. We demonstrate the effectiveness of the method on simulated data by plots of spatial spectra and by comparing the estimator variance to the Cramér-Rao bound (CRB). We observe that our approach has a number of advantages over other source localization techniques, including increased resolution, improved robustness to noise, limitations in data quantity, and correlation of the sources, as well as not requiring an accurate initialization.
 This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem and arises in many important applications as in the task of recovering a large matrix from a small subset of its entries (the famous Netflix problem). Off-the-shelf algorithms such as interior point methods are not directly amenable to large problems of this kind with over a million unknown entries. This paper develops a simple first-order and easy-to-implement algorithm that is extremely efficient at addressing problems in which the optimal solution has low rank. The algorithm is iterative, produces a sequence of matrices $\{\boldsymbol{X}^k,\boldsymbol{Y}^k\}$, and at each step mainly performs a soft-thresholding operation on the singular values of the matrix $\boldsymbol{Y}^k$. There are two remarkable features making this attractive for low-rank matrix completion problems. The first is that the soft-thresholding operation is applied to a sparse matrix; the second is that the rank of the iterates $\{\boldsymbol{X}^k\}$ is empirically nondecreasing. Both these facts allow the algorithm to make use of very minimal storage space and keep the computational cost of each iteration low. On the theoretical side, we provide a convergence analysis showing that the sequence of iterates converges. On the practical side, we provide numerical examples in which $1,000\times1,000$ matrices are recovered in less than a minute on a modest desktop computer. We also demonstrate that our approach is amenable to very large scale problems by recovering matrices of rank about 10 with nearly a billion unknowns from just about 0.4% of their sampled entries. Our methods are connected with the recent literature on linearized Bregman iterations for $\ell_1$ minimization, and we develop a framework in which one can understand these algorithms in terms of well-known Lagrange multiplier algorithms.
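The iteration itself is short: soft-threshold the singular values of an auxiliary matrix, then take a gradient step on the observed entries. The sketch below uses the commonly quoted heuristics τ ≈ 5n and ÎŽ ≈ 1.2·n₁n₂/m as illustrative parameter choices.

```python
import numpy as np

def svt(M_obs, mask, tau, delta, n_iter=200):
    """Singular value thresholding for matrix completion.
    M_obs: matrix with observed entries (zeros elsewhere); mask: boolean array of observed positions."""
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt          # soft-threshold the singular values
        Y[mask] += delta * (M_obs[mask] - X[mask])       # gradient step on observed entries
    return X

rng = np.random.default_rng(0)
n, r = 100, 4
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
mask = rng.random((n, n)) < 0.4                                  # observe ~40% of entries
M_obs = np.where(mask, M, 0.0)
X = svt(M_obs, mask, tau=5 * n, delta=1.2 * n * n / mask.sum())
print(np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask]))
```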
 This lecture note presents a new method to capture and represent compressible signals at a rate significantly below the Nyquist rate. This method, called compressive sensing, employs nonadaptive linear projections that preserve the structure of the signal; the signal is then reconstructed from these projections using an optimization process.
Many problems in signal processing and statistical inference involve finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparseness-inducing regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is de-emphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance.
 Sparse coding--that is, modelling data vectors as sparse linear combinations of basis elements--is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets.
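scikit-learn ships an implementation of this online approach (MiniBatchDictionaryLearning); a short usage sketch is below, with random stand-in data in place of real image patches and hyperparameters chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Stand-in data: in practice these rows would be (e.g.) zero-mean 8x8 image patches.
rng = np.random.default_rng(0)
patches = rng.standard_normal((10000, 64))          # (n_samples, patch_dim)

dl = MiniBatchDictionaryLearning(n_components=128,  # number of dictionary atoms
                                 alpha=1.0,         # sparsity penalty
                                 batch_size=256,
                                 random_state=0)
dl.fit(patches)

D = dl.components_                 # learned dictionary, one atom per row
codes = dl.transform(patches[:100])  # sparse codes of the first 100 patches
print(D.shape, codes.shape)
```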
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an $m \times n$ matrix. (i) For a dense input matrix, randomized algorithms require $O(mn \log(k))$ floating-point operations (flops) in contrast to $O(mnk)$ for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to $O(k)$ passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
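The basic recipe (randomized range finder, then deterministic post-processing of the small projected matrix) fits in a few lines. The sketch below uses a Gaussian test matrix, a small oversampling parameter, and one optional power iteration; these are typical choices rather than requirements.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_power=1, seed=0):
    """Approximate rank-k SVD via a randomized range finder."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))   # random test matrix
    Y = A @ Omega                                               # sample the range of A
    for _ in range(n_power):                                    # optional power iterations
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                                      # orthonormal basis for the sample
    B = Q.T @ A                                                 # compress A to the subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 30)) @ rng.standard_normal((30, 500))   # exact rank 30
U, s, Vt = randomized_svd(A, k=30)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))            # close to machine precision
```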
 The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization.
 Conventional approaches to sampling signals or images follow Shannon's theorem: the sampling rate must be at least twice the maximum frequency present in the signal (Nyquist rate). In the field of data conversion, standard analog-to-digital converter (ADC) technology implements the usual quantized Shannon representation - the signal is uniformly sampled at or above the Nyquist rate. This article surveys the theory of compressive sampling, also known as compressed sensing or CS, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals and images from far fewer samples or measurements than traditional methods use.
 The authors present a new approach to building simpler, smaller, and cheaper digital cameras that can operate efficiently across a broader spectral range than conventional silicon-based cameras. The approach fuses a new camera architecture based on a digital micromirror device with the new mathematical theory and algorithms of compressive sampling.
This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with $m$ nonzero entries in dimension $d$ given $O(m \ln d)$ random linear measurements of that signal. This is a massive improvement over previous results, which require $O(m^{2})$ measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.
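The greedy loop is only a few lines: pick the column most correlated with the current residual, re-fit by least squares on the selected columns, and repeat. A minimal sketch (assuming roughly unit-norm columns, as in the analysis) follows.

```python
import numpy as np

def omp(A, b, n_nonzero):
    """Orthogonal Matching Pursuit: greedy support selection plus least-squares refit."""
    d = A.shape[1]
    residual = b.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(A.T @ residual)
        corr[support] = -np.inf                   # never re-pick an already selected column
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef       # orthogonalize against the chosen columns
    x = np.zeros(d)
    x[support] = coef
    return x
```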
This paper considers a natural error correcting problem with real valued input/output. We wish to recover an input vector f ∈ ℝ^n from corrupted measurements y = Af + e. Here, A is an m by n (coding) matrix and e is an arbitrary and unknown vector of errors. Is it possible to recover f exactly from the data y? We prove that under suitable conditions on the coding matrix A, the input f is the unique solution to the ℓ1-minimization problem (with ‖x‖_{ℓ1} := ÎŁ_i |x_i|) min_{g ∈ ℝ^n} ‖y − Ag‖_{ℓ1}, provided that the support of the vector of errors is not too large, ‖e‖_{ℓ0} := |{i : e_i ≠ 0}| ≀ ρ·m for some ρ > 0. In short, f can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). In addition, numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted. This work is related to the problem of finding sparse solutions to vastly underdetermined systems of linear equations. There are also significant connections with the problem of recovering signals from highly incomplete measurements. In fact, the results introduced in this paper improve on our earlier work. Finally, underlying the success of ℓ1 is a crucial property we call the uniform uncertainty principle that we shall describe in detail.
Suppose we are given a vector $f$ in a class ${\cal F} \subset \mathbb{R}^N$, e.g., a class of digital signals or digital images. How many linear measurements do we need to make about $f$ to be able to recover $f$ to within precision $\epsilon$ in the Euclidean $(\ell_2)$ metric? This paper shows that if the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct $f$ to within very high accuracy from a small number of random measurements by solving a simple linear program. More precisely, suppose that the $n$th largest entry of the vector $\vert f\vert$ (or of its coefficients in a fixed basis) obeys $\vert f\vert _{(n)} \le R \cdot n^{-1/p}$, where $R > 0$ and $p > 0$. Suppose that we take measurements $y_k = \langle f, X_k\rangle$, $k = 1, \ldots, K$, where the $X_k$ are $N$-dimensional Gaussian vectors with independent standard normal entries. Then for each $f$ obeying the decay estimate above for some $0 < p < 1$ and with overwhelming probability, our reconstruction $f^\sharp$, defined as the solution to the constraints $y_k = \langle f^\sharp, X_k \rangle$ with minimal $\ell_1$ norm, obeys $$ \Vert f - f^\sharp\Vert _{\ell_2} \le C_p \cdot R \cdot (K/\log N)^{-r}, \quad r = 1/p - 1/2. $$ There is a sense in which this result is optimal; it is generally impossible to obtain a higher accuracy from any set of $K$ measurements whatsoever. The methodology extends to various other random measurement ensembles; for example, we show that similar results hold if one observes a few randomly sampled Fourier coefficients of $f$. In fact, the results are quite general and require only two hypotheses on the measurement ensemble which are detailed.
We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as Eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.
This paper is concerned with the problem of recovering an unknown matrix from a small fraction of its entries. This is known as the matrix completion problem, and comes up in a great number of applications, including the famous Netflix Prize and other similar questions in collaborative filtering. In general, accurate recovery of a matrix from a small number of entries is impossible, but the knowledge that the unknown matrix has low rank radically changes this premise, making the search for solutions meaningful. This paper presents optimality results quantifying the minimum number of entries needed to recover a matrix of rank r exactly by any method whatsoever (the information theoretic limit). More importantly, the paper shows that, under certain incoherence assumptions on the singular vectors of the matrix, recovery is possible by solving a convenient convex program as soon as the number of entries is on the order of the information theoretic limit (up to logarithmic factors). This convex program simply finds, among all matrices consistent with the observed entries, that with minimum nuclear norm. As an example, we show that on the order of nr log(n) samples are needed to recover a random n x n matrix of rank r by any method, and to be sure, nuclear norm minimization succeeds as soon as the number of entries is of the form nr polylog(n).
 The class of L1-regularized optimization problems has received much attention recently because of the introduction of “compressed sensing,” which allows images and signals to be reconstructed from small amounts of data. Despite this recent attention, many L1-regularized problems still remain difficult to solve, or require techniques that are very problem-specific. In this paper, we show that Bregman iteration can be used to solve a wide variety of constrained optimization problems. Using this technique, we propose a “split Bregman” method, which can solve a very broad class of L1-regularized problems. We apply this technique to the Rudin–Osher–Fatemi functional for image denoising and to a compressed sensing problem that arises in magnetic resonance imaging.
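For the simplest member of that class, min_u ‖u‖1 + (ÎŒ/2)‖Au − f‖ÂČ, the splitting d = u reduces each sweep to a linear solve, a shrinkage, and a Bregman update. The sketch below is this generic instance rather than the ROF or MRI formulations treated in the paper, and its parameter values are illustrative.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman_l1(A, f, mu=10.0, lam=1.0, n_iter=100):
    """Split Bregman for min_u ||u||_1 + (mu/2)||Au - f||^2 via the splitting d = u."""
    m = A.shape[1]
    u = np.zeros(m)
    d = np.zeros(m)
    b = np.zeros(m)
    H = mu * A.T @ A + lam * np.eye(m)              # system matrix for the u-subproblem
    rhs0 = mu * A.T @ f
    for _ in range(n_iter):
        u = np.linalg.solve(H, rhs0 + lam * (d - b))    # quadratic subproblem in u
        d = soft(u + b, 1.0 / lam)                      # shrinkage subproblem in d
        b = b + u - d                                   # Bregman (dual) update
    return u
```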
This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal f ∈ ℂ^N and a randomly chosen set of frequencies Ω. Is it possible to reconstruct f from the partial knowledge of its Fourier coefficients on the set Ω? A typical result of this paper is as follows. Suppose that f is a superposition of |T| spikes f(t) = ÎŁ_{τ∈T} f(τ) ÎŽ(t − τ) obeying |T| ≀ C_M · (log N)^{-1} · |Ω| for some constant C_M > 0. We do not know the locations of the spikes nor their amplitudes. Then with probability at least 1 − O(N^{-M}), f can be reconstructed exactly as the solution to the ℓ1 minimization problem. In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for C_M which depend on the desired probability of success. Our result may be interpreted as a novel kind of nonlinear sampling theorem. In effect, it says that any signal made out of |T| spikes may be recovered by convex programming from almost every set of frequencies of size O(|T| · log N). Moreover, this is nearly optimal in the sense that any method succeeding with probability 1 − O(N^{-M}) would in general require a number of frequency samples at least proportional to |T| · log N. The methodology extends to a variety of other situations and higher dimensions. For example, we show how one can reconstruct a piecewise constant (one- or two-dimensional) object from incomplete frequency samples - provided that the number of jumps (discontinuities) obeys the condition above - by minimizing other convex functionals such as the total variation of f.
 This article is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the ℓ1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
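Principal Component Pursuit is commonly solved with an augmented Lagrangian / ADMM scheme that alternates singular-value shrinkage for the low-rank part with entrywise shrinkage for the sparse part. A hedged sketch with standard default choices for λ and ÎŒ (one common convention, not the only one) is given below.

```python
import numpy as np

def shrink(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pcp(M, n_iter=200):
    """Principal Component Pursuit M = L + S via an augmented Lagrangian iteration."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))             # standard weight on the l1 term
    mu = 0.25 * m * n / np.abs(M).sum()        # illustrative penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt     # singular value thresholding -> low-rank part
        S = shrink(M - L + Y / mu, lam / mu)   # entrywise soft thresholding -> sparse part
        Y = Y + mu * (M - L - S)               # dual ascent on the constraint M = L + S
    return L, S
```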
Given a dictionary D = {d_k} of vectors d_k, we seek to represent a signal S as a linear combination S = ÎŁ_k γ(k) d_k, with scalar coefficients γ(k). In particular, we aim for the sparsest representation possible. In general, this requires a combinatorial optimization process. Previous work considered the special case where D is an overcomplete system consisting of exactly two orthobases and has shown that, under a condition of mutual incoherence of the two bases, and assuming that S has a sufficiently sparse representation, this representation is unique and can be found by solving a convex optimization problem: specifically, minimizing the ℓ1 norm of the coefficients γ. In this article, we obtain parallel results in a more general setting, where the dictionary D can arise from two or several bases, frames, or even less structured systems. We sketch three applications: separating linear features from planar ones in 3D data, noncooperative multiuser encoding, and identification of over-complete independent component models.
With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle offers dramatic advantages over traditional linear estimation by nonadaptive kernels; however, it is a priori unclear whether such performance can be obtained by a procedure relying on the data alone. We describe a new principle for spatially-adaptive estimation: selective wavelet reconstruction. We show that variable-knot spline fits and piecewise-polynomial fits, when equipped with an oracle to select the knots, are not dramatically more powerful than selective wavelet reconstruction with an oracle. We develop a practical spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coefficients. RiskShrink mimics the performance of an oracle for selective wavelet reconstruction as well as it is possible to do so. A new inequality in multivariate normal decision theory which we call the oracle inequality shows that attained performance differs from ideal performance by at most a factor of approximately 2 log n, where n is the sample size. Moreover no estimator can give a better guarantee than this. Within the class of spatially adaptive procedures, RiskShrink is essentially optimal. Relying only on the data, it comes within a factor logÂČ n of the performance of piecewise polynomial and variable-knot spline methods equipped with an oracle. In contrast, it is unknown how or if piecewise polynomial methods could be made to function this well when denied access to an oracle and forced to rely on data alone.
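The shrinkage idea is easy to demonstrate with the PyWavelets package. The sketch below applies soft thresholding with the universal threshold σ√(2 log n) and a median-based noise estimate, i.e., a VisuShrink-style rule that is simpler than the RiskShrink threshold analyzed in the paper.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = 2048
t = np.linspace(0, 1, n)
clean = np.piecewise(t, [t < 0.3, (t >= 0.3) & (t < 0.7), t >= 0.7], [0.0, 1.0, -0.5])
noisy = clean + 0.1 * rng.standard_normal(n)

coeffs = pywt.wavedec(noisy, "db4", level=6)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745             # noise estimate from the finest scale
thresh = sigma * np.sqrt(2.0 * np.log(n))                  # universal threshold
denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(denoised_coeffs, "db4")[:n]

print("RMSE before:", np.sqrt(np.mean((noisy - clean) ** 2)),
      "after:", np.sqrt(np.mean((denoised - clean) ** 2)))
```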
In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.
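A bare-bones rendition of the alternation (OMP sparse coding via scikit-learn, then one rank-1 SVD update per atom) is sketched below; it keeps only the core K-SVD steps and handles unused atoms by simple re-randomization.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity, n_iter=15, seed=0):
    """Minimal K-SVD sketch: alternate OMP sparse coding with rank-1 SVD atom updates.
    Y: (signal_dim, n_signals) training signals, one per column."""
    rng = np.random.default_rng(seed)
    dim, n_signals = Y.shape
    D = rng.standard_normal((dim, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((n_atoms, n_signals))
    for _ in range(n_iter):
        # Sparse coding stage: OMP with a fixed number of nonzeros per signal.
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)
        # Dictionary update stage: revise each atom (and its coefficients) in turn.
        for k in range(n_atoms):
            users = np.flatnonzero(X[k, :])            # signals that currently use atom k
            if users.size == 0:
                D[:, k] = rng.standard_normal(dim)     # re-randomize an unused atom
                D[:, k] /= np.linalg.norm(D[:, k])
                continue
            X[k, users] = 0.0
            E = Y[:, users] - D @ X[:, users]          # residual with atom k removed
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                          # best rank-1 fit gives the new atom
            X[k, users] = s[0] * Vt[0, :]              # and its updated coefficients
    return D, X
```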
 We propose a new method for reconstruction of sparse signals with and without noisy perturbations, termed the subspace pursuit algorithm. The algorithm has two important characteristics: low computational complexity, comparable to that of orthogonal matching pursuit techniques when applied to very sparse signals, and reconstruction accuracy of the same order as that of linear programming (LP) optimization methods. The presented analysis shows that in the noiseless setting, the proposed algorithm can exactly reconstruct arbitrary sparse signals provided that the sensing matrix satisfies the restricted isometry property with a constant parameter. In the noisy setting and in the case that the signal is not exactly sparse, it can be shown that the mean-squared error of the reconstruction is upper-bounded by constant multiples of the measurement and signal perturbation energies.
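A direct transcription of the algorithm is short: maintain a support of size K, merge it with the K columns most correlated with the residual, solve least squares on the merged set, and prune back to the K largest coefficients. The stopping rule below (quit when the residual stops decreasing) is one simple choice.

```python
import numpy as np

def subspace_pursuit(A, y, K, n_iter=20):
    """Subspace Pursuit: refine a K-element support by merge / least squares / prune."""
    def ls_on(support):
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        return coef

    support = np.argsort(np.abs(A.T @ y))[-K:]           # K columns most correlated with y
    coef = ls_on(support)
    residual = y - A[:, support] @ coef
    for _ in range(n_iter):
        candidates = np.argsort(np.abs(A.T @ residual))[-K:]
        merged = np.union1d(support, candidates)          # candidate support of size <= 2K
        coef_m = ls_on(merged)
        support_new = merged[np.argsort(np.abs(coef_m))[-K:]]   # keep the K largest coefficients
        coef_new = ls_on(support_new)
        residual_new = y - A[:, support_new] @ coef_new
        if np.linalg.norm(residual_new) >= np.linalg.norm(residual):
            break                                         # residual stopped improving
        support, coef, residual = support_new, coef_new, residual_new
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```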
 Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers argues that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas-Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, it discusses applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. It also discusses general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
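For the lasso instance min_x 0.5‖Ax − b‖ÂČ + λ‖x‖1, the ADMM iteration is three lines per pass: a ridge-like solve, a soft threshold, and a dual update. A minimal fixed-ρ sketch (without the over-relaxation and adaptive-penalty refinements the monograph discusses) follows.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """ADMM for min 0.5*||Ax - b||^2 + lam*||x||_1 with a fixed penalty parameter rho."""
    d = A.shape[1]
    AtA = A.T @ A
    Atb = A.T @ b
    # Factor once; every x-update reuses this Cholesky factorization.
    Lchol = np.linalg.cholesky(AtA + rho * np.eye(d))
    x = np.zeros(d)
    z = np.zeros(d)
    u = np.zeros(d)
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(Lchol.T, np.linalg.solve(Lchol, rhs))   # x-update (ridge solve)
        z = soft(x + u, lam / rho)                                  # z-update (shrinkage)
        u = u + x - z                                               # scaled dual update
    return z
```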
Suppose we wish to recover a vector x_0 ∈ ℝ^m (e.g., a digital signal or image) from incomplete and contaminated observations y = Ax_0 + e; A is an n × m matrix with far fewer rows than columns (n â‰Ș m) and e is an error term. Is it possible to recover x_0 accurately based on the data y? To recover x_0, we consider the solution x^# to the ℓ1-regularization problem min ‖x‖_{ℓ1} subject to ‖Ax − y‖_{ℓ2} ≀ ϔ, where ϔ is the size of the error term e. We show that if A obeys a uniform uncertainty principle (with unit-normed columns) and if the vector x_0 is sufficiently sparse, then the solution is within the noise level, ‖x^# − x_0‖_{ℓ2} ≀ C·ϔ. As a first example, suppose that A is a Gaussian random matrix; then stable recovery occurs for almost all such A's provided that the number of nonzeros of x_0 is of about the same order as the number of observations. As a second instance, suppose one observes few Fourier samples of x_0; then stable recovery occurs for almost any set of n coefficients provided that the number of nonzeros is of the order of n/(log m)^6. In the case where the error term vanishes, the recovery is of course exact, and this work actually provides novel insights into the exact recovery phenomenon discussed in earlier papers. The methodology also explains why one can also very nearly recover approximately sparse signals.
 We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys $$m\ge C\,n^{1.2}r\log n$$ for some positive numerical constant C, then with very high probability, most n×n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but, as we demonstrate, this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it "weight decay", which may be misleading given the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by \emph{decoupling} the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at https://github.com/loshchil/AdamW-and-SGDW
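A minimal numpy sketch of the decoupled update, assuming standard Adam hyperparameters; it is meant only to contrast where the weight-decay term enters relative to an L$_2$-regularized Adam step, and is not the authors' implementation.

```python
import numpy as np

def adamw_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One Adam step with *decoupled* weight decay: the decay term is applied
    directly to the weights and never enters the adaptive moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # Coupled L2 regularization would instead add weight_decay * w to `grad`
    # before the moment updates; here the decay bypasses the adaptive scaling.
    w_new = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w_new, state

# Illustrative usage on a toy quadratic loss.
w = np.zeros(10)
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
grad = 2 * w - 1.0
w, state = adamw_step(w, grad, state)
```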
In many important statistical applications, the number of variables or parameters $p$ is much larger than the number of observations $n$. Suppose then that we have observations $y = X\beta + z$, where $\beta \in \mathbb{R}^p$ is a parameter vector of interest, $X$ is a data matrix with possibly far fewer rows than columns, $n \ll p$, and the $z_i$'s are i.i.d. $N(0, \sigma^2)$. Is it possible to estimate $\beta$ reliably based on the noisy data $y$? To estimate $\beta$, we introduce a new estimator, the Dantzig selector, which is a solution to the $\ell_1$-regularization problem $$\min_{\tilde{\beta}\in\mathbf{R}^{p}}\|\tilde{\beta}\|_{\ell_{1}}\quad\mbox{subject to}\quad \|X^{*}r\|_{\ell_{\infty}}\leq(1+t^{-1})\sqrt{2\log p}\cdot\sigma,$$ where $r$ is the residual vector $y - X\tilde{\beta}$ and $t$ is a positive scalar. We show that if $X$ obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector $\beta$ is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability, $$\|\hat{\beta}-\beta\|_{\ell_2}^2\leq C^2\cdot 2\log p\cdot\Bigl(\sigma^2+\sum_i\min(\beta_i^2,\sigma^2)\Bigr).$$ Our results are nonasymptotic and we give values for the constant $C$. Even though $n$ may be much smaller than $p$, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information about which coordinates are nonzero, and which were above the noise level. In multivariate regression and from a model selection viewpoint, our result says that it is possible nearly to select the best subset of variables by solving a very simple convex program, which, in fact, can easily be recast as a convenient linear program (LP).
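Because the Dantzig selector is itself a convex (indeed linear) program, it is easy to prototype. The sketch below uses cvxpy as an assumed modelling tool, with illustrative dimensions, and takes the threshold in the form quoted above; it is not the authors' code.

```python
import cvxpy as cp
import numpy as np

def dantzig_selector(X, y, lam):
    """Dantzig selector: minimize ||beta||_1 subject to
    ||X^T (y - X beta)||_inf <= lam (solvable as an LP)."""
    p = X.shape[1]
    beta = cp.Variable(p)
    constraints = [cp.norm_inf(X.T @ (y - X @ beta)) <= lam]
    cp.Problem(cp.Minimize(cp.norm1(beta)), constraints).solve()
    return beta.value

# Illustrative usage with lam = (1 + 1/t) * sqrt(2 log p) * sigma as in the abstract.
rng = np.random.default_rng(2)
n, p, sigma, t = 60, 200, 0.1, 3.0
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta0 = np.zeros(p)
beta0[:5] = 1.0
y = X @ beta0 + sigma * rng.standard_normal(n)
lam = (1 + 1 / t) * np.sqrt(2 * np.log(p)) * sigma
beta_hat = dantzig_selector(X, y, lam)
```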
Suppose $x$ is an unknown vector in $\mathbb{R}^m$ (a digital image or signal); we plan to measure $n$ general linear functionals of $x$ and then reconstruct. If $x$ is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements $n$ can be dramatically smaller than the size $m$. Thus, certain natural classes of images with $m$ pixels need only $n = O(m^{1/4}\log^{5/2}(m))$ nonadaptive nonpixel samples for faithful recovery, as opposed to the usual $m$ pixel samples. More specifically, suppose $x$ has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an $\ell_p$ ball for $0 < p \le 1$. The $N$ most important coefficients in that expansion allow reconstruction with $\ell_2$ error $O(N^{1/2-1/p})$. It is possible to design $n = O(N\log(m))$ nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the $N$ most important coefficients. Moreover, a good approximation to those $N$ important coefficients is extracted from the $n$ measurements by solving a linear program (Basis Pursuit in signal processing). The nonadaptive measurements have the character of "random" linear combinations of basis/frame elements. Our results use the notions of optimal recovery, of $n$-widths, and information-based complexity. We estimate the Gel'fand $n$-widths of $\ell_p$ balls in high-dimensional Euclidean space in the case $0 < p \le 1$, and give a criterion identifying near-optimal subspaces for Gel'fand $n$-widths. We show that "most" subspaces are near-optimal, and show that convex optimization (Basis Pursuit) is a near-optimal way to extract information derived from these near-optimal subspaces.
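The paper reconstructs via Basis Pursuit, a linear program; the other workhorse in this literature is greedy selection. The following plain numpy sketch of orthogonal matching pursuit is included only as a hedged contrast, and is not the procedure analyzed in the paper.

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Orthogonal matching pursuit: greedily pick the column of A most
    correlated with the current residual, then refit by least squares."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Illustrative usage on a random Gaussian measurement matrix.
rng = np.random.default_rng(3)
A = rng.standard_normal((60, 200))
x_true = np.zeros(200)
x_true[rng.choice(200, 6, replace=False)] = 1.0
x_hat = omp(A, A @ x_true, n_nonzero=6)
```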
Interrupted sampling repeater jamming (ISRJ) is characterized by its coherent processing gains and flexible modulation techniques. ISRJ generates spurious targets along the range dimension, which presents significant challenges to radar systems. However, existing ISRJ countermeasure methods struggle to eliminate ISRJ signals without compromising the integrity of the real target signal, especially under low-signal-to-noise-ratio (SNR) conditions, resulting in deteriorated sidelobes and diminished detection performance. We propose a complex-valued encoder–decoder network (CVEDNet), based on signal decomposition, to address these challenges. This network offers an end-to-end ISRJ suppression approach, working on complex-valued time-domain signals without the need for additional preprocessing. The encoding and decoding structure suppresses noise components and obtains more compact echo feature representations through layer-by-layer compression and reconstruction. A stacked dual-branch structure and multi-scale dilated convolutions are adopted to further separate the echo signal and ISRJ based on high-dimensional features. A multi-domain combined loss function integrates waveform and range-pulse-compression information to ensure the amplitude and phase integrity of the reconstructed echo waveform during training. The proposed method was validated against conventional methods in terms of jamming suppression capability, echo fidelity, and detection performance under low-SNR conditions.
 Compressive sensing in the ambiguity domain facilitates high-performance reconstruction of time-frequency distributions (TFDs) for non-stationary signals. However, identifying the optimal regularization parameter in the absence of prior knowledge remains a significant challenge. The Rényi entropy-based two-step iterative shrinkage/thresholding (RTwIST) algorithm addresses this issue by incorporating local component estimates to guide adaptive thresholding, thereby improving interpretability and robustness. Nevertheless, RTwIST may struggle to accurately isolate components in cases of significant amplitude variations or component intersections. In this work, an enhanced RTwIST framework is proposed, integrating the random sample consensus (RANSAC)-based refinement stage that iteratively extracts individual components and fits smooth trajectories to their peaks. The best-fitting curves are selected by minimizing a dedicated objective function that balances amplitude consistency and trajectory smoothness. Experimental validation on both synthetic and real-world electroencephalogram (EEG) signals demonstrates that the proposed method achieves superior reconstruction accuracy, enhanced auto-term continuity, and improved robustness compared to the original RTwIST and several state-of-the-art algorithms.
The goal of this repository is to demonstrate the efficiency and effectiveness of our strengthened Quadratic Convex (QC) relaxation formulation and algorithms for the AC Optimal Transmission Switching (ACOTS) problem.
This special issue addresses Bayesian inverse problems using data-driven priors derived from deep generative models (DGMs) and the convergence of generative modelling techniques and Bayesian inference methods. Conventional Bayesian priors often fail to accurately capture the properties and the underlying geometry of complex, real-world data distributions. In contrast, deep generative models (DGMs), which include generative adversarial networks (GANs), variational auto-encoders (VAEs), normalizing flows and diffusion models (DMs), have demonstrated tremendous success in capturing detailed data representations learned directly from empirical observations. As a result, these models produce priors endowed with superior accuracy, increased perceptual realism and enhanced capacities for uncertainty quantification within inverse problem contexts. This paradigm emerged in the late 2010s, when pioneering efforts were made to explicitly formulate Bayesian inverse problems using conditional Wasserstein generative adversarial networks (GANs). These advances have greatly improved methods for quantifying uncertainties, especially in large-scale imaging applications. Building on these fundamental insights, posterior sampling techniques utilizing DMs have demonstrated remarkable efficiency and robustness, highlighting their potential to effectively tackle complex and diverse inverse problems. The articles collected herein provide essential theoretical breakthroughs and significant algorithmic innovations, collectively demonstrating how deep generative priors mitigate traditional limitations and profoundly enrich both the practical applicability and theoretical foundations of Bayesian inversion. This article is part of the theme issue 'Generative modelling meets Bayesian inference: a new paradigm for inverse problems'.
 Non-smooth, non-convex optimization problems frequently arise in modern machine learning applications, yet solving them efficiently remains a challenge. This paper addresses the minimization of functions of the form f(x)=∑i=1mfi(x) where each component is Lipschitz continuous but potentially non-smooth and non-convex. We extend the incremental subgradient method by incorporating weak subgradients, resulting in a framework better suited for non-convex objectives. We provide a comprehensive convergence analysis for three step size strategies: constant, diminishing, and a novel dynamic approach. Our theoretical results show that all variants converge to a neighborhood of the optimal solution, with the size of this neighborhood governed by the weak subgradient parameters. Numerical experiments on classification tasks with non-convex regularization, evaluated on the Breast Cancer Wisconsin dataset, demonstrate the effectiveness of the proposed approach. In particular, the dynamic step size method achieves superior practical performance, outperforming both classical and diminishing step size variants in terms of accuracy and convergence speed. These results position the incremental weak subgradient framework as a promising tool for scalable and efficient optimization in machine learning settings involving non-convex objectives.
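To make the step-size discussion concrete, here is a toy numpy sketch of a plain incremental subgradient loop with constant, diminishing, and a placeholder "dynamic" rule. It uses ordinary subgradients of convex pieces; the weak-subgradient oracle and the specific dynamic rule of the paper are not reproduced here.

```python
import numpy as np

def incremental_subgradient(subgrads, x0, step_rule="diminishing",
                            alpha=0.05, n_epochs=100):
    """Incremental subgradient descent on f(x) = sum_i f_i(x): cycle through
    the components, stepping along one subgradient g_i(x) at a time."""
    x = x0.copy()
    k = 0
    for _ in range(n_epochs):
        for g_i in subgrads:
            k += 1
            if step_rule == "constant":
                step = alpha
            elif step_rule == "diminishing":
                step = alpha / np.sqrt(k)
            else:                                  # placeholder "dynamic" rule
                step = alpha / (1 + 0.01 * k)
            x = x - step * g_i(x)
    return x

# Toy usage: f_i(x) = |a_i^T x - b_i|, with subgradient a_i * sign(a_i^T x - b_i).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
subgrads = [lambda x, a=a, bi=bi: a * np.sign(a @ x - bi) for a, bi in zip(A, b)]
x_star = incremental_subgradient(subgrads, np.zeros(5))
```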
This paper investigates the phenomenon of double descent and proposes the use of minimax approximation (the L∞ norm) as an alternative to L2 regularization to improve the quality of model approximation. Double descent describes the dependence of the error on the complexity of the model: the error first decreases, then increases due to overfitting, and then decreases again. In contrast, in experiments with a model without regularization, the error showed a predominantly increasing trend with short periods of decline, which corresponds to an incomplete manifestation of the phenomenon. This is probably due to anomalous points in the data, which caused an exponential increase in the error at high polynomial degrees. Three approaches were compared: a classical model without regularization, a model with L2 regularization, and minimax approximation. L2 regularization added a penalty on large coefficient norms, which stabilized the error and prevented overfitting, especially at high polynomial degrees (200+). Minimax approximation minimized the maximum error, providing better robustness to anomalies and outperforming L2 regularization at low degrees (up to 50). The results confirm that minimax approximation is more effective for problems with anomalies, while L2 regularization performs better for complex models with high polynomial degrees. The findings contribute to the understanding of the double descent phenomenon and show that the choice of approach should depend on the data characteristics and model requirements.
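A small sketch of the two fitting strategies being compared, assuming a polynomial design matrix: the minimax (L∞) fit is posed as a convex problem via cvxpy, while the L2-regularized fit uses the usual ridge closed form. The degree, penalty, and data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
import cvxpy as cp

def fit_polynomial(x, y, degree, mode="minimax", lam=1e-3):
    """Fit a degree-d polynomial by either L-infinity (minimax) approximation
    or L2-regularized (ridge) least squares."""
    V = np.vander(x, degree + 1, increasing=True)          # design matrix
    if mode == "minimax":
        beta = cp.Variable(degree + 1)
        cp.Problem(cp.Minimize(cp.norm_inf(V @ beta - y))).solve()
        return beta.value
    # Ridge: penalize large coefficient norms (closed-form solution).
    return np.linalg.solve(V.T @ V + lam * np.eye(degree + 1), V.T @ y)

# Illustrative usage on noisy samples of a smooth function.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + 0.05 * rng.standard_normal(40)
beta_inf = fit_polynomial(x, y, degree=15, mode="minimax")
beta_l2 = fit_polynomial(x, y, degree=15, mode="ridge")
```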
 Abstract Purpose To determine how various compressed sensing (CS) models can accelerate alternating Look‐Locker mapping. Methods An alternating Look‐Locker acquisition was retrospectively accelerated by factors of 1–12. The data was reconstructed into 12 images with multiple CS models, which utilized combinations of spatial total variation, locally low‐rank regularization, and subspace constraints. Complex non‐linear least squares signal fitting was performed to obtain the maps. The accelerated maps were compared against the map of a full data reference reconstruction. Results A subspace‐constrained reconstruction model with spatial total variation and locally low‐rank regularization outperformed all other models as measured by map normalized root mean squared error, structural similarity index, and normalized mean absolute deviation. The subspace constraint benefited models utilizing spatial total variation but, conversely, did not benefit models utilizing only locally low‐rank regularization. Conclusion The radial 3‐D alternating Look‐Locker mapping acquisition was successfully accelerated by up to a factor of 12 with various CS models. The best‐performing model was a subspace‐constrained reconstruction, which utilized spatial total variation and locally low‐rank regularization.
Abstract Due to its reduced memory and computational demands, dynamical low-rank approximation (DLRA) has sparked significant interest in multiple research communities. A central challenge in DLRA is the development of time integrators that are robust to the curvature of the manifold of low-rank matrices. Recently, a parallel robust time integrator that permits dynamic rank adaptation and enables a fully parallel update of all low-rank factors was introduced. Despite its favorable computational efficiency, the construction as a first-order approximation to the augmented basis-update & Galerkin integrator restricts the parallel integrator's accuracy to order one. In this work, an extension to higher order is proposed by a careful basis augmentation before solving the matrix differential equations of the factorized solution. A robust error bound with an improved dependence on normal components of the vector field together with a norm preservation property up to small terms is derived. These analytic results are complemented and demonstrated through a series of numerical experiments.
 Abstract To enhance the effectiveness of rescue operations, particularly in the context of fire outbreaks and building collapses, the adoption of through-the-wall radar imaging (TWRI) technology is being explored to provide pre-event situational awareness, thereby increasing the likelihood of survival. The critical nature of time during rescue missions is underscored by the fact that the majority of existing image reconstruction algorithms fail to operate within the essential four-minute survival window for humans. Recent advancements have introduced a limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS)-based algorithm within the TWRI framework, which has demonstrated a significant reduction in image reconstruction time, positioning it as a viable solution for time-sensitive operations. This study rigorously evaluates the performance of the LBFGS algorithm to ascertain its applicability in rescue scenarios. Utilizing MATLAB, the algorithm was systematically assessed across various parameters, including the number of targets, data volumes, room dimensions, and signal-to-noise ratios. In all tested scenarios, the LBFGS-based algorithm consistently reconstructed images within the four-minute threshold, achieving satisfactory mean square error values and outperforming traditional sparse reconstruction algorithms. The findings suggest that the LBFGS algorithm is suitable for urgent rescue missions.
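As a rough illustration of how an L-BFGS-based reconstruction can be set up (the actual TWRI forward model and objective are not specified in the abstract), the sketch below runs SciPy's L-BFGS-B on a smoothed ℓ1-regularized least-squares surrogate; the operator A, the regularization weight, and the smoothing constant are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def lbfgs_sparse_reconstruct(A, y, lam=0.05, eps=1e-6):
    """Sparse scene reconstruction by L-BFGS on a smoothed l1-regularized
    least-squares objective: 0.5*||Ax - y||^2 + lam * sum(sqrt(x^2 + eps))."""
    def objective(x):
        r = A @ x - y
        smooth_l1 = np.sqrt(x**2 + eps)
        f = 0.5 * r @ r + lam * smooth_l1.sum()
        g = A.T @ r + lam * x / smooth_l1          # gradient of the smoothed objective
        return f, g
    x0 = np.zeros(A.shape[1])
    res = minimize(objective, x0, jac=True, method="L-BFGS-B")
    return res.x

# Illustrative usage with a random forward operator standing in for the radar model.
rng = np.random.default_rng(0)
A = rng.standard_normal((120, 400))
x_true = np.zeros(400)
x_true[::80] = 1.0                                 # a few point-like targets
y = A @ x_true + 0.01 * rng.standard_normal(120)
x_rec = lbfgs_sparse_reconstruct(A, y)
```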
Abstract In this paper, we consider a class of structured nonconvex nonsmooth optimization problems whose objective function is the sum of three nonconvex functions, one of which is expressed in a difference-of-convex (DC) form. This problem class covers several important structures in the literature including the sum of three functions and the general DC program. We propose a splitting algorithm and prove the subsequential convergence to a stationary point of the problem. The full sequential convergence, along with convergence rates for both the iterates and objective function values, is then established without requiring differentiability of the concave part. Our analysis not only extends but also unifies and improves recent convergence analyses in nonconvex settings. We benchmark our proposed algorithm against notable algorithms from the literature on a low-rank matrix completion problem and a simultaneously sparse and low-rank matrix estimation problem, where it exhibits very competitive results on both synthetic data and a public dataset.
Abstract Noiseless compressive sensing is a two-step setting that allows for undersampling a sparse signal and then reconstructing it without loss of information. The LASSO algorithm, based on $\ell_1$ regularization, provides an efficient and robust way to address this problem, but it fails in the regime of very high compression rate. Here we present two algorithms, based on $\ell_0$-norm regularization, that outperform LASSO in terms of compression rate in the Gaussian design setting for the measurement matrix. These algorithms are based on approximate survey propagation, an algorithmic family within the approximate message passing class. In the large system limit, they can be rigorously tracked through state evolution equations, and it is possible to exactly predict the range of compression rates for which perfect signal reconstruction is possible. We also provide a statistical physics analysis of the $\ell_0$-norm noiseless compressive sensing model. We show the existence of both a replica symmetric state and a one-step replica symmetry broken (1RSB) state for sufficiently low $\ell_0$-norm regularization. The recovery limits of our algorithms are linked to the behavior of the 1RSB solution.