Compressed Smooth Sparse Decomposition

Published Online:https://doi.org/10.1287/ijds.2022.0023

Abstract

Image-based anomaly detection systems are of vital importance in various manufacturing applications. The resolution and acquisition rate of such systems are increasing significant in recent years under the fast development of image sensing technology. This enables the detection of tiny anomalies in real time. However, such a high resolution and a high acquisition rate of image data not only slow down the speed of image processing algorithms but also, increase data storage and transmission cost. To tackle this problem, we propose a fast and data-efficient method with theoretical performance guarantee that is suitable for sparse anomaly detection in images with a smooth background (smooth plus sparse signal). The proposed method, named compressed smooth sparse decomposition (CSSD), is a one-step method that unifies the compressive image acquisition- and decomposition-based image processing techniques. To further enhance its performance in a high-dimensional scenario, a Kronecker compressed smooth sparse decomposition (KronCSSD) method is proposed. Compared with traditional smooth and sparse decomposition algorithms, significant transmission cost reduction and computational speed boost can be achieved with negligible performance loss. Simulation examples and several case studies in various applications illustrate the effectiveness of the proposed framework.

History: Kwok-Leung Tsui served as the senior editor for this article.

Funding: This work is partially support by the National Science Foundation Division of Engineering Education and Centers [Grant 2052714].

Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://doi.org/10.24433/CO.6352310.v2 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2022.0023).

1. Introduction

High-quality image sensing systems are widely used in manufacturing processes for product quality monitoring and fault diagnosis. The resolution and acquisition rate of such systems increase significantly benefiting from the rapid development of image sensing technology. For example, in a hot rolling process, an in situ image-based sensor can detect a micrometer-sized seam on a rolling bar at a speed of up to 225 miles per hour (Yan et al. 2018). For another example, to monitor solar activity, satellites can capture high-resolution solar images with a high acquisition rate, producing terabytes of data per day (Wang et al. 2018). To achieve real-time inspection, a large volume of high-resolution images needs to be transmitted and processed in real time. Such a large volume of high-resolution image data poses a big challenge not only on the speed of image processing algorithms but also, for the storage and transmission of the data itself.

Matrix decomposition-based image processing techniques are widely used in image-based process monitoring and anomaly detection. They achieve the goal by integrating the prior for background and anomaly components into the optimization problem. In terms of utilizing the low-rank and sparse property, robust principal component analysis was first proposed by Candès et al. (2011) to decompose a data matrix into low-rank and element-wise sparse components. One of its famous applications is dynamic foreground and static background separation (Bouwmans and Zahzah 2014). Following this approach, numerous algorithm variants have been proposed, including outlier pursuit (Xu et al. 2012), which aims to decompose the data matrix into a low-rank component and a column-wise sparse component, and low-rank plus compressed sparse decomposition (Mardani et al. 2013), which aims to decompose the data matrix into low-rank and compressed sparse components and so on. For utilizing smooth and sparse properties, smooth and sparse decomposition (SSD) methods (Minaee et al. 2015, Yan et al. 2017) are proposed for anomaly detection in images with smooth backgrounds. Following this approach, several explorations have been conducted, including spatiotemporal smooth sparse decomposition (ST-SSD) (Yan et al. 2018), additive tensor decomposition (Mou et al. 2021), and so on. By adopting the matrix decomposition approach, both the background and anomaly can be captured without detection time delay. However, because of the requirement of storage, transmission, and processing of the whole image signal, it cannot be applied in the scenario with low-transmission bandwidth but high processing speed requirements: for example, the solar flare detection application (Augusto et al. 2011).

To mitigate the data storage and transmission burden and improve sensing efficiency, compressive sensing (CS) (Candes et al. 2006) has been proposed, in which the data are directly collected in a compressed form and then, reconstructed accurately with high probability. More specifically, suppose that the original signal is a sparse vector yRn, and the main idea is to store and transmit a small set of compressive measurements y'=AyRp, where ARp×n is an underdetermined sensing matrix (pn) satisfying specific properties. Then, the original signal can be reconstructed from its compressed form y', on which assorted image processing algorithms can be applied for defect detection and so on. For a comprehensive review of CS, please refer to Marques et al. (2018) and Rani et al. (2018). Even though promising, the naïve approach that tries to first reconstruct the image from the compressed measurement and then, apply matrix decomposition algorithms for anomaly detection has two issues.

  1. For smooth plus sparse signals, the existence of such a sensing matrix A satisfying specific properties is unknown.

  2. The reconstruction process is usually computationally intensive (Marques et al. 2018), which on the other hand, slows down the overall computational speed of the defect detection algorithms.

Recently, to integrate the CS with matrix decomposition algorithms, Waters et al. (2011) proposed an SpaRCS method to recover low-rank and sparse matrices directly from compressive measurements, and Tanner and Vary (2020) gave a rigorous performance discussion. However, those methods do not consider the smooth plus sparse decomposition problems and are not efficient in dealing with high-order data.

In this paper, we discuss the possibility of adopting compressive data acquisition systems for image-based quality monitoring and fault detection in applications where the background is smooth, and anomalies are sparse. To achieve so, we propose a compressed smooth sparse decomposition (CSSD) framework. In this framework, the signal processing algorithms are directly applied to the compressed data, and no reconstruction step is needed. By doing so, a significant cost reduction in sensing, storage, and transmission as well as a boost in the speed of image processing algorithms can be achieved with negligible performance loss. We also established the theoretical foundation of adopting such a compressive data acquisition system for smooth plus sparse signals as well as the performance guarantee of the proposed algorithm. To further improve its performance in high-order scenarios, a Kronecker compressed smooth sparse sensing (KronCSSD) is proposed.

The remainder of this paper is organized as follows. In Section 2, we present the CSSD framework. In Section 3, we use simulation studies to validate the proposed framework. In Section 4, we demonstrate the proposed framework using several case studies. Finally, Section 5 concludes the paper.

2. Compressed Smooth Sparse Decomposition Framework

In this section, we will present the proposed CSSD framework. As mentioned in Section 1, we aim to design a fast and data-efficient method for sparse anomaly (sparse signal component) detection in signals with a smooth background (smooth signal component). For simplicity, we first discuss the methodology for one-dimensional (1D) signals and then, generalize it to n-dimensional images. Figure 1 provides an overview of the proposed methodology. There are three stages in the proposed methodology. (i) The signal is acquired in its compressed form through compressive measurement. (ii) Then, the compressed data are transmitted to the server. (iii) Finally, a decomposition algorithm will be applied directly to the compressed signal to decompose it into its corresponding smooth and sparse signal components.

Figure 1. (Color online) An Overview of the CSSD/KronCSSD Framework

Mathematically, let yRn be a smooth plus sparse signal (which will be defined formally in Section 2.1). We aim to store and transmit a small set of compressive measurements y' (i.e., y'=Ay) and then, reconstruct the smooth and sparse signal components from y'. To achieve so, there are three questions to be addressed.

  1. What is a smooth plus sparse signal?

  2. How do we compress such a signal?

  3. How do we reconstruct such a signal from compressive measurements?

The remainder of this section is organized as follows to answer those three questions. We start with a formal definition of 1D smooth plus sparse signals in Section 2.1. Based on that, we introduce the compressive measurement method and discuss its theoretical properties for such signals in Section 2.2. In Section 2.3, we present the proposed CSSD framework that can directly reconstruct the smooth and sparse signal components from the compressive measurement by solving the following optimization problem:

minθ,θaθa1s.t.A(Bθ+Baθa)y2ϵ1,(1)
where B and Ba are bases, θ and θa are corresponding coefficients, and ϵ1 is the bound for measurement error. The reconstruction accuracy is also characterized theoretically. Then, we generalize the CSSD algorithm to n dimensions using Kronecker compressive sensing (KCS) (Duarte and Baraniuk 2011) and propose a KronCSSD formulation in Section 2.4. Finally, in Section 2.5, we present the advantage of the proposed framework and give the strategy of selecting the compressive ratio, tuning parameters, and bases in practice.

2.1. The Set of Smooth Plus Sparse Signals

In this section, we define the set of smooth plus sparse signals mathematically. The smooth signals originate from the spline smoothing (De Boor and De Boor 1978, Eilers and Marx 1996), where the raw signal is approximated by a linear combination of a set of spline basis functions for smooth interpolation and denoising, and a spline regression technique is usually utilized. To improve the regression robustness with respect to outliers, outliers are explicitly accounted for in the regression model as a sparse component of the raw signal (Giannakis et al. 2011, Mateos and Giannakis 2011). This idea was further generalized to incorporate the special structure of outliers (Yan et al. 2017, 2018).

As a summary, a 1D signal yRn is defined as a smooth plus sparse signal if it can be decomposed into two signal components: (i) a smooth signal mRn in a low-dimensional subspace spanned by a set of smooth bases (i.e., m=Bθ, where BRn×r is a basis matrix with rn) and (ii) a sparse signal aRn in a relatively high-dimensional subspace spanned by a set of predefined bases, of which the coefficients admit sparse property (i.e., a=Baθa, where BaRn×q is a basis matrix with qn and θaRq is an s-sparse vector; i.e., θa0s). Given a smooth plus sparse signal y, we define the aforementioned decomposition as SSD (i.e., y=m+a).

To ensure the uniqueness of SSD in a nontrivial case when nr+q, the following definition is introduced.

Definition 2.1.

The local support property of Ba, BaRn×q. Ba only has local support such that each column of Ba only has nonzero values inside a specific interval. The length of this interval is defined as l(Ba).

Notice that l(Ba){1,,n}. The local support property with a small l ensures the sparsity of a. For example, the B-spline basis has a local support property (Unser 1999).

Definition 2.2.

The incoherence condition of B, BRn×r. Let B=VT be the reduced singular value decomposition (SVD) of B, where URn×r, ΣRr×r, and VRr×r. Its incoherence condition parameter μ(B) is defined as the smallest value such that

maxi{1,..,r}UTei2μ(B)rn,
where ei is the i th standard basis vector in Rn.

Notice that μ(B)[1,n/r]. The incoherent condition with a small μ ensures that the m is not sparse (Candès et al. 2011).

The following theorem ensures the uniqueness of the SSD decomposition.

Theorem 2.1.

If μ(B)<n(2rsl)1, then the SSD decomposition is unique with respect to m and a.

Theorem 2.1 gives the condition that the smooth plus sparse signal can be uniquely decomposed into a smooth part and a sparse part. The proof of Theorem 2.1 is in Appendix A.

Formally, we define the set of smooth plus sparse signals as follows.

Definition 2.3.

The set of smooth and sparse signals is defined as MSr,s,μ,l:

MSr,s,μ,l={yRn|y=Bθ+Baθa, BaRn×q, l(Ba)=l, θaRq,θa0s, BRn×r, μ(B)=μ<n(2rsl)1,θRr}.

For such a smooth plus sparse signal, how to compress it while ensuring the reconstruction performance will be discussed in the next section.

2.2. Compressive Sensing for Smooth Plus Sparse Signals

As mentioned in Section 1, to ensure the reconstruction performance, the sensing matrix A has to satisfy the so-called restricted isometry property (RIP) (Candes 2008). It has been proven that a random matrix can satisfy the RIP property for the sparse signal (Candes 2008), the low-rank signal (Recht et al. 2010), and the rank plus sparse signal (Tanner and Vary 2020). However, the existence of such a matrix for the smooth plus sparse signal, which is the foundation of adopting compressed data acquisition techniques in applications with smooth background and sparse anomalies, is still unknown. In this section, we will discuss the existence of such a sensing matrix. Before stating the result, we will first present the relevant definitions that are necessary to derive the main result.

Definition 2.4.

RIP for MSr,s,μ,l. Let ARp×n be a linear measurement matrix. For every quadruple (r,s,μ,l), define the restricted isometry constant (RIC) δr,s,μ,l to be the smallest positive constant such that

(1δr,s,μ,l)y2Ay2(1+δr,s,μ,l)y2, yMSr,s,μ,l.

If such a δr,s,μ,l(0,1) exists, we say that A satisfies the RIP.

Theorem 2.2.

Suppose that δr,2s,μ,l<1 for some integer r,s,l1 and positive numbers μ<n(2rsl)1; then, there is a y0 in the set MSr,s,μ,l, which is the only solution for Ay0=b. 

Theorem 2.2 guarantees the uniqueness of the smooth plus sparse signal that satisfies the sensing equation when A satisfies the RIP. The proof of Theorem 2.2 is in Appendix B.

Next, we prove that for the set of smooth plus sparse signals, MSr,s,μ,l, there exists such a matrix A satisfying the RIP property with RIC = δr,s,μ,l with high probability.

Notice that the RIP for a matrix is difficult to verify. A suitable set of random matrices that obey the RIP for the set of sparse vectors with high probability (Recht et al. 2010, Tanner and Vary 2020) is defined as follows.

Definition 2.5.

Nearly isometric matrices (Baraniuk et al. 2008). Let ARp×n be a random variable that takes values in linear maps from Rn to Rp; then, for any yRn, A is nearly isometrically distributed if

  1. E[Ay22]=y22 and

  2. Pr(|Ay22y22|ϵy22)2epc0(ϵ), 0<ϵ<1,

    where c0(ϵ) is a constant that only depends on ϵ.

The p×n matrix with independent, identically distributed (i.i.d.) Gaussian entries satisfies those two properties (Baraniuk et al. 2008) (i.e., AijN(0,1p), with c0(ϵ)=ϵ2/4ϵ3/6). There are also other distributions satisfying the nearly isometric property, such as the p×n matrix with i.i.d. Bernoulli entries and their related distribution (Baraniuk et al. 2008).

Then, the following theorem states that the nearly isometric matrices can also serve as the sensing matrix for smooth plus sparse signals and gives the magnitude of the number of linear measurements.

Theorem 2.3.

Let ARp×n be a matrix from the families described in Definition 2.5. Furthermore, assume that μ<n(2rsl)1 and the basis matrix Ba for the sparse signal component satisfies the RIP with RIC δBa,s(0,1): that is, δBa,s to be the smallest positive constant such that

(1δBa,s)θa2Baθa2(1+δBa,s)θa2, θa{θaRq,θa0s}.

For a given δ(0,1), there exists constants c1,c2>0 depending only on δ, such that the RIC for MSr,s,μ,l is upper bounded by δ, with the probability of at least 1exp(c1p), whenever

pc2(ln2+rln24δτ1+s(1+ln24δτ0+lnns)),
where η=μrsln, τ0=1(1δBa,s)(1η2), τ1=B2(1+11η2)and B=(BTB)1BT.

Theorem 2.3 states that if the nearly isometric matrix is selected as the sensing matrix, the RIC for MSr,s,μ,l is upper bounded with high probability. In practice, B and Ba are prespecified based on the understanding/engineering knowledge of the process (please refer to Section 2.5.4 for more detail). Theorem 2.3 also provides a guidance on the magnitude of linear measurements p, which determines the compressive measurement matrix A.

The proof of Theorem 2.3 is in Appendix C.

In this section, we answered two fundamental questions. (i) How do we compress the smooth plus sparse signal (design the compressive measurement matrix A)? (ii) How many linear measurements are needed to preserve the information in smooth plus sparse signals with high probability?

In the next section, we will discuss the problem formulation to recover the smooth and sparse signal components simultaneously from the compressed signal using the CSSD framework.

2.3. Compressed Smooth Sparse Decomposition

As mentioned in Section 1, one way to recover the smooth component and sparse component is first reconstructing the compressed image and then, using the SSD algorithm. However, this will slow down the speed of the defect detection algorithm. Instead, we propose to solve the one-step convex relaxation Problem (1). Natural questions are if we can recover the smooth and sparse signals from the compressed measurement y by solving Problem (1) and what the accuracy is. The following theorem guarantees the recovery performance of the proposed convex relaxation Problem (1).

Theorem 2.4.

Let ARp×n be a matrix from the families described inDefinition 2.5. Let the signal y=m0+a0MSr,s,μ,l, where m0=Bθ0 and a0=Baθa0. Assume that Problem (1) is feasible, and let the optimal solution be m*=Bθ* and a*=Baθa*. Assume that the basis matrix Ba for sparse signal component satisfies the RIP with RIC δBa,2s(0,1). Let a=(1+α1+α2)γ2+2α2+2 and c =1γ2α1α2α22, where α1=η1η2, α2=2η12η2, γ=1+δBa,2s1δBa,2s, and η=μrsln. Suppose that r,s,lN and μ<n(2rsl)1, such that c>0  and δr,3s,μ,l(0,c/a); then,

a0a*2=Baθa0Baθa*2Caϵ1
and
m0m*2=Bθ0Bθ*2Cmϵ1,
where
Ca=(1+γ2)(1+α2)1+δr,3s,μ,lcaδr,3s,μ,l
and
Cm=1+δr,3s,μ,l+(δr,3s,μ,l+γ21+γ2α1+11+γ2α2)Ca(1δr,3s,μ,l).

Theorem 2.4 gives the conditions that the proposed convex relaxation Problem (1) can recover the true smooth and sparse signal up to a constant times the noise bound.

The proof of Theorem 2.4 is in Appendix D.

Notice that one advantage of the proposed CSSD framework is that it is compatible with existing decomposition algorithms. For example, for the SSD algorithm proposed by Yan et al. (2017), the problem formulation becomes

minθ,θayA(Bθ+Baθa)22+λθa1,(2)
which can be solved efficiently by using the algorithm proposed by Yan et al. (2017).

2.4. Kronecker Compressed Smooth Sparse Decomposition

In the previous section, we discussed the CSSD framework for the 1D signal. In this section, we will generalize the proposed CSSD to KronCSSD framework for high-order tensor data.

Let YRn1××nd be the original signal and yRN be its corresponding vectorized signal (i.e.,y=vec(Y) and N=i=1dni). Let ARp×N be a measurement matrix satisfying the RIP. Let yRp be the compressed data, such that y=Ay. B=i=1dBi and Ba=i=1dBai are the known bases for smooth and sparse components, respectively. Notice that problem formulation (1) can still be used for recovering the high-order smooth and sparse signal components from the compressive measurement. However, there are two issues. (i) In practice, the global CS measurements matrix A is hard to realize using the CS device (Duarte and Baraniuk 2011). (ii) The resulting bases B and Ba can be extremely large as the dimension of the data increases, which will not only cause a big challenge in the storage of such a large matrix but also, result in the computational issue in handling such large matrices.

In reality, the high-dimensional tensor data usually have low-rank properties along each mode, which have been extensively exploited in tensor low-rank modeling techniques, such as CANDECOMP/PARAFAC (CP)/Tucker decompositions (Kolda and Bader 2009). This makes it possible for designing a sensing matrix for each mode, which is called Kronecker CS (Duarte and Baraniuk 2011). Inspired by the KCS method, we propose the KronCSSD framework formulation as follows.

Let AiRpi×ni be a measurement matrix with RIP for each mode of the tensor; then, we have the following formulation,

minΘ,Θavec(Θa)1s.t.vec(Y1Y)2ϵ1,Y1=Θ×1(A1B1)×2×d(AdBd)+Θa×1(A1Ba1)×2×d(AdBad),(3)
where Y=Y×1A1×2×dAd is the compressive measurement. ΘRr1××rd and ΘaRq1××qd are the basis coefficients for the smooth and sparse signal components, respectively.

The proposed KronCSSD framework is a nontrivial generalization of the CSSD framework for 1D signals. Its performance will be shown empirically in simulation and case studies. The theoretical discussion of the KronSSD framework is left for future work.

For example, for a two-dimensional (2D) image YRn1×n2, when adopting the SSD algorithm (Yan et al. 2017), the problem formulation becomes

minΘ,ΘaYA1B1ΘB2TA2TA1Ba1ΘaBa2TA2T22+λvec(Θa)1,(4)
where Y is the compressed image (i.e., Y=A1YA2T). It can be solved efficiently by using the algorithm proposed by Yan et al. (2017).

2.5. Discussion

2.5.1. Advantages of the Proposed CSSD/KronCSSD Methods.

In this section, we will give a brief discussion about the advantages of the proposed CSSD/KronCSSD methods. We define the compressive ratio as c=i=1dpi/ni. The smaller the compressive ratio, the fewer data will be transmitted. The proposed CSSD/KronCSSD methods have the following characteristics.

  1. We propose to directly acquire the compressed image, which not only reduces the sensing cost but also, reduces the data transmission and storage cost by i=1dpi/ni times.

  2. The smooth and sparse signal components can be recovered by solving a smaller-scale convex optimization problem with input data i=1dpi/ni times smaller than that of the original problem, which significantly boosts the computation.

2.5.2. Compressive Ratio Selection in Practice.

Theorems 2.3 and 2.4 show that the 1D smooth plus sparse signal can be recovered with high probability from a compressive measurement if the compressive ratio is above a specific threshold. However, there are several parameters (such as c1,c2) that are difficult to obtain when calculating the threshold in practice. We propose a practical procedure of selecting the compressive ratio utilizing the historical data. Suppose some training signals with their real background and anomalies are available; the compressive ratio can be selected with the following guidelines.

  1. If there is a requirement for reconstruction accuracy for smooth and sparse signal components, then p^ is chosen as the smallest value that satisfies such a requirement.

  2. If there is no such requirement, we recommend choosing the compressive ratio corresponding to the sharp change point of the slope of the loss function-compressive ratio curve. This point exists because of the existence of such a threshold, after which the reconstruction with high probability is guaranteed by Theorems 2.3 and 2.4. We will demonstrate this in the simulation study in Section 3.1.

For high-order tensor data, the selection of the pi for each mode can be challenging. We provide some empirical guidelines as follows.

  1. If the smoothness of the background is similar along different modes, a unified compressive ratio is recommended.

  2. If the smoothness of the background is different, more sensing budget should be allocated to the mode along which the background is less smooth.

  3. We recommend fixing the ratio among pi along different modes and adopting steps (i) and (ii) for 1D signal to determine the compressive ratio.

2.5.3. Tuning Parameter Selection in Practice.

Notice that in problem formulations (1) and (3), there is a hyperparameter ϵ1, indicating the bound for measurement noise. If the measurement error bound is known from the accuracy of the measurement device, it can be directly used here. Otherwise, a crossvalidation step using historical data is recommended. Similarly, there is a tuning parameter λ in their corresponding Lagrangian form Equations (2) and (4), which controls the sparsity of the decomposed anomaly. Crossvalidation can be used to determine this parameter. For more detail, please refer to Yan et al. (2017).

2.5.4. Bases Selection in Practice.

The bases B and Ba are prespecified based on the understanding/engineering knowledge of the process. The selection of such bases is discussed in detail in section 3.4 of Yan et al. (2017). In general, any smooth basis, such as splines or kernels, can be used for the background. For sparse anomalies, such as small regions scattered over the background or in the form of thin lines, an identity basis is recommended. Linear (quadratic) B splines are recommended for anomalous regions with sharp corners (curved boundaries). However, to ensure the uniqueness of the smooth sparse decomposition, we do require the bases B and Ba to satisfy specific properties, such as those mentioned in Definitions 2.1 and 2.2 and Theorem 2.1, which should be checked for selected bases. We will demonstrate this in the simulation study in Section 3.1.

3. Simulation Study

In this section, we will demonstrate the proposed CSSD and KronCSSD framework with simulation studies. First, the CSSD method is applied on 1D signals in Section 3.1, and then, we apply the KronCSSD method on 2D images in Section 3.2.

3.1. CSSD on a 1D Signal

A 1D signal (yRn) is assumed to be a superposition of the smooth signal component (m=Bθ), the sparse signal component (a=Baθa), and noise e (i.e., y=m+a+e). To demonstrate the proposed CSSD framework, we will first conduct compressive data acquisition on the raw signal (i.e., y=Ay). Then, the data reconstruction and decomposition are achieved in one step. The performance is evaluated by the relative error between the true signal a and the reconstructed one a^ (i.e., aa^2/a2 and mm^2/m2 for the smooth background).

In the simulation study, we generate a 1D signal from MSr,s,μ,l and let n=1,000. The smooth background is generated from a random linear combination of B-spline bases with three knots (r=10) (i.e., m=Bθ, where BRn×r and θRr is a random vector such that θiN(0,1), i{1,,4}). The incoherence condition parameter μ=μ(B)=0.82. The sparse signal component is generated by a sparse random linear combination of degree 2 B-spline bases with 500 knots (q=500,l=4): that is, a=Baθa, where BaRn×q and θaRq is a four-sparse random vector (s=4) such that its nonzero elements follow i.i.d. standard normal distribution. Notice that the RIC, δBa,s for matrix Ba, is hard to calculate in general. However, there is a loose upper bound that can be used, which is δBa,sδBa,p=max{λmax1,1λmin}=0.60 , where λmax and λmin are the maximum and minimum singular values of Ba, respectively. The noise signal generated as a random vector eRn is a random vector such that eiN(0, 0.0012), i{1,,n}. Figure 2 shows a sample raw signal and its smooth, sparse components.

Figure 2. (Color online) A Sample Raw Signal and Its Smooth and Sparse Components
Notes. (a) Raw signal. (b) Smooth signal component m. (c) Sparse signal component a.

Before we state the result, we first check the assumption of Theorems 2.3 and 2.4. For Theorem 2.3, it is easy to check that 0.82=μn2rsl=1,0002×10×4×4=6.25 and also, that δBa,s0.60(0,1). For Theorem 2.4, δBa,2s0.60(0,1), c=0.37>0, and δr,3s,μ,l(0,0.04). Notice that the range for δr,3s,μ,l is small in this case because of the loose upper bound for δBa,2s, which can be further improved. This indicates that the smooth and sparse signal can be recovered by the proposed algorithm with high probability provided that the compressive ratio is above a specific threshold, which is demonstrated by the following observation.

The simulation is repeated 100 times, and the average log relative error for the background and sparse signal components with respect to compressive ratio are shown in Figure 3. We can observe a large error at the beginning, and it drops very fast with an increase of the compressive ratio. Then, above a threshold (0.1 approximately) of compressive ratio, the error becomes small (below 3%), and the decrease of error becomes less significant, which demonstrates the effectiveness of the reconstruction algorithm. The threshold of 0.1 can be chosen as the compressive ratio mentioned in Section 2.5.2.

Figure 3. (Color online) Average Log Relative Reconstruction Error
Notes. (a) Log relative error for the smooth component. (b) Log relative error for the sparse component.

The reconstructed signal components in 1 of the 100 simulations are shown in Figure 4. We can observe that when adopting the compressive ratio of 0.1, both the smooth and sparse signal components can be reconstructed with high accuracy.

Figure 4. (Color online) Reconstructed Smooth and Sparse Signal Components
Notes. (a) Smooth signal component of compressive ratios 0.02 (left panel), 0.05 (center panel), and 0.1 (right panel). (b) Sparse signal component of compressive ratios 0.02 (left panel), 0.05 (center panel), and 0.1 (right panel).

We also vary the magnitude of sparse signals to examine the reconstruction performance of the proposed method. The result and analysis are provided in Appendix H.

3.2. CSSD on a 2D Image

In this simulation study, we aim to decompose an image into the smooth background, sparse anomalies, and noise. A 350×350 image with smooth background and the sparse anomaly is generated similar to Yan et al. (2017) (i.e., Y=M+A+E, where M is the smooth background, A is the sparse anomalies, and E is i.i.d. Gaussian noise such that EiNID(0,σ2)). The smooth background is generated from a linear combination of B-spline bases with 3×3 knots, and the anomalies are generated from a sparse linear combination of B-spline bases with 88×88 knots. The background and anomalies are shown in Figure 5.

Figure 5. (Color online) True Background and Anomalies

We can see that the anomaly size covers a large range. The mean absolute value of the background plus anomaly is μ=0.21. In this simulation, we first study the reconstruction performance under a noise-free scenario. Then, we increase the noise level to test the robustness of the algorithm.

3.2.1. Noise-Free Case.

In this section, we vary the compressive ratio from 4% to 100%. For each compressive ratio, we simulate 100 times and the average false-negative rate (FNR) and false-positive rate (FPR) are reported in Figure 6. The FPR is defined as the portion of normal pixels predicted as anomaly,

FPR=i,j(1IA0{A(i,j)})IA^0{A^(i,j)}i,j(1IA0{A(i,j)}),
and the FNR is defined as the portion of anomalous pixels predicted as normal background,
FNR=i,jIA0{A(i,j)}(1IA^0{A^(i,j)})i,jIA0{A(i,j)},
where A is the true anomaly, A^ is the predicted anomaly, and IΩ() is the indicator function: that is,
IΩ(x)={1,  if xΩ0,  otherwise.

Figure 6. (Color online) Average FNR and FPR

From Figure 6, we can see that both the FPR and FNR ratios decrease as the compressive ratio increases. The FPR drops so fast that when the compressive ratio achieves 8%, there is no false alarm, which is desired for the anomaly detection algorithm. The FNR drops slower, and less than 5% of the anomaly pixels are ignored when the compressive ratio achieves 8%. However, it does not miss any cluster of the anomalies, even though some of them are small.

The recovered sparse signal components are shown in Figure 7 for compressive ratios 4% (Figure 7(a)), 8% (Figure 7(b)), 33% (Figure 7(c)), and 73% (Figure 7(d)). For comparison, we apply the SSD algorithm (Yan et al. 2017), of which the FNR and the FPR are zero and computation time is 0.16 seconds.

Figure 7. (Color online) The Recovered Sparse Signal Components
Notes. (a) Compressive ratio 4%. (b) Compressive ratio 8%. (c) Compressive ratio 33%. (d) Compressive ratio 73%.

We record the computation time in Figure 8. A significant boosting of the computation is observed when the compressive ratio is 8%, which speeds up the SSD algorithm by 4.3 times.

Figure 8. (Color online) The Computation Time

3.2.2. Noisy Case.

The signal-to-noise ratio μ/ϵ [4,40] is studied to evaluate the robustness of the algorithm. The three-dimensional plot of FPR, the signal-to-noise ratio, and the compressive ratio are shown in Figure 9.

Figure 9. (Color online) The Contour Plot of FPR, Signal-to-Noise Ratio, and Compressive Ratio

The purple line indicates the equipotential line of FPR = 0.01. We can see that with the increase of the signal-to-noise ratio, we are allowed to use a less compressive ratio to achieve satisfactory decomposition results. The majority of the area lies below the equipotential line of FPR = 0.01, which means that the proposed algorithm is robust.

4. Case Study

In this section, we use three real cases to demonstrate the effectiveness of the proposed CSSD/KronCSSD framework. For comparison, we also apply the SSD method in each case study. The compressive ratio (c) and the average computational time (t) for a single image are reported in Table 1. A significant transmission bandwidth reduction and computation boost can be observed with negligible performance degradation (Figures 1012), compared with the vanilla SSD algorithm.

Table

Table 1. Comparison Between KronCSSD and SSD

Table 1. Comparison Between KronCSSD and SSD

Surface defectSolar flareIndentation
ct/sct/sct/s
KronCSSD54%0.07322%0.03448%0.053
SSD10.13510.08610.094
Figure 10. (Color online) Steel Rolling Images and Detected Anomalies
Notes. (a) Raw image. (b) KronCSSD result. (c) SSD result.
Figure 11. (Color online) Solar Activity Images and Detected Solar Flare
Notes. (a) Raw image. (b) KronCSSD result. (c) SSD result.
Figure 12. (Color online) Silicon Stress Map and Detected Indentation
Notes. (a) Raw image. (b) KronCSSD result. (c) SSD result.

4.1. Surface Defect Detection in a Steel Rolling Process

As mentioned in Section 1, vision sensors collect high-resolution images of the product surface with a high data acquisition rate in the rolling processes. This poses a challenge in data storage, transmission, and processing. One sample image of size 128×512 with typical anomalies is shown in Figure 10(a). The black scratches shown in the red block are surface anomalies. For a detailed description, the readers are encouraged to refer to Yan et al. (2018). The detected anomalies are shown in Figure 10, (b) and (c) by using the proposed KronCSSD algorithm and the SSD algorithm, respectively. The data set has 100 images, and more example results can be found in Appendix I.

The KronCSSD method achieves a similar anomaly detection performance with 54% bandwidth and is 1.8 times faster. By adopting the KronCSSD method, (i) we can achieve a faster anomaly detection and thus, reduce the loss through a timely intervention of the manufacturing process, and (ii) we can keep the manufacturing inspection information for a longer time period with the same storage capability, which is important for root cause analysis.

4.2. Solar Flare Detection

Another important application is solar flare detection from satellite images. A solar flare is defined as a sudden, transient, and intense variation in brightness over the sun’s surface. It has a significant influence on radio communication on Earth. Each second, thousands of high-resolution images are captured by a satellite, which poses a big challenge to real-time data transmission and processing (Yan et al. 2018). The data set has 300 images, and more example images can be found in Appendix I.

One sample image of size 232× 292 with a typical solar flare is shown in Figure 11(a), where the yellowish bright region is the solar flare. The detected anomalies are shown in Figure 11, (b) and (c) by using the proposed KronCSSD and SSD algorithms, respectively. The KronCSSD method achieves a similar solar flare detection performance as the SSD method but with 22% bandwidth, and it is 2.5 times faster. Notice that the decomposed images can also be used for downstream tasks, such as control charts and so on, which are beyond the scope of this paper.

By adopting the KronCSSD method, we can improve the transmission rate under the same transmission bandwidth and thus, achieve almost five times faster solar flare detection, which is of vital importance for protecting radio communications, power grids, and navigation systems.

4.3. Silicon Surface Indentation Detection

The stress map of size 90×550 of a silicon surface laminate with surface indentation is shown in Figure 12, where clusters of high-stress areas indicate the surface indentation (Yan et al. 2017). We aim to detect those high-stress areas. The detected anomalies are shown in Figure 12, (b) and (c). We can see that the KronCSSD method achieves a similar detection performance with the SSD method but with 40% bandwidth, and it is 1.8 times faster, which significantly reduces the storage and transmission cost and improves the processing speed.

5. Conclusion

In this paper, we proposed a CSSD framework for efficient data acquisition, transmission, and processing for sparse anomaly detection in smooth backgrounds. To further enhance its computational efficiency, a KronCSSD framework is proposed for tensor data.

The contributions of this work are twofold. (i) Theoretically, we showed the feasibility of combining compressive sensing and smooth sparse decomposition. This enables the adoption of a compressive data acquisition approach. (ii) Practically, the proposed framework is compatible with many existing decomposition-based anomaly detection algorithms, such as SSD, ST-SSD, and so on, which achieve both a significant cost reduction in sensing, storage, and transmission and a boost in speed but with negligible loss in their performance.

In this article, we use a simulation study to demonstrate the effectiveness and robustness of the proposed CSSD/KronCSSD framework. Three case studies across different applications demonstrate the versatility of the proposed framework. The authors believe that the CSSD/KronCSSD framework can be applied in a wider range of applications toward more efficient data acquisition, transmission, and processing.

Further studies on the theoretical properties of the proposed KronCSSD can be a future direction.

Appendix A

Proof of Theorem 2.1.

We prove Theorem 2.1 by contradiction. Assume there exists two different decompositions for the same smooth plus sparse signal y (i.e., y=m1+a1=m2+a2, where m1m2 and a1a2). Then,

m1m2=a1+a2.(A.1)

Because m1m2, we can normalize both side by m1m22 and denote m˜=(m1m2)/m1m22 and a˜=(a1+a2)/m1m22. Notice that m˜ is in the column space of B, which is spanned by columns of the U (recall that B=UΣVT; i.e., m˜=Ux, where x is the coefficient vector, xRr). Because m˜2=1, we have x2=1. We can bound each element in m˜ as follows:

|m˜i|=eiTUxeiTU2x2maxi{1,..,r}UTei2μ(B)rn<12ls, i{1,,n}.

According to Definition 2.1, a10ls and a20ls. Therefore, a˜02ls. Moreover, according to Equation (A.1), we conclude that m˜02ls. Denote the support of m˜ as Tm˜; we have m˜2=iTm˜|m˜i|2<1, which is a contradiction. □

Appendix B

Proof of Theorem 2.2.

We prove Theorem 2.2 with contradiction. Assume there exists another vector y=Bθ+BaθaMSr,s,μ,l such that Ay=b and yy0. Then, z=yy0=B(θθ0)+Ba(θaθa0) is a nonzero vector. Because θaθa01θa1+θa012s, by Definition 2.3, we have that zMSr,2s,μ,l. Therefore, 0=Az2(1δr,2s,μ,l)z2>0, which is a contradiction. Notice that the proof is inspired by the proof of lemma 3.1 in Candes and Tao (2005). 

Appendix C

Proof of Theorem 2.3.

This proof is inspired by Baraniuk et al. (2008) and Tanner and Vary (2020).

We will first derive the RIC for a fixed subspace MSr,T,μ,l of MSr,s,μ,l when θa is restricted in a fixed subspace T with the fixed support such that the number of nonzero elements is s:

MSr,T,μ,l={yRn|y=Bθ+Baθa,BaRn×q, l(Ba)=l, θaT, BRn×r,μ(B)=μ<n(2rsl)1, θRr}.

Then, we use a covering argument that counts over all possible sparse subspaces T with support less than or equal to s. Finally, we can derive the RIC for MSr,s,μ,l.

The following lemma describes the RIC for a fixed subspace MSr,T,μ,l and is proved in Appendix E.

Lemma C.1.

RIC for a fixed subspace MSr,T,μ,l. Let ARp×n be a matrix from the families described inDefinition 2.5. Furthermore, assume that μ<n(2rsl)1 and that the basis matrix Ba for the sparse signal component satisfies the RIP with RIC δBa,s(0,1): that is, δBa,s is the smallest positive constant such that

(1δBa,s)θa2Baθa2(1+δBa,s)θa2, θa{θaRq|θa0s}.

For a given δ(0,1), there exists a constant c0>0 depending only on δ, such that the RIC for MSr,T,μ,l is upper bounded by δ with the probability of at least 12(24δτ1)r(24δτ0)sepc0(δ/2), where η=μrsln, τ0=1(1δBa,s)(1η2), τ1=B2(1+11η2), and B=(BTB)1BT.

Notice that for a fixed subspace MSr,T,μ,l, the RIP will fail with probability less than or equal to 2(24δτ1)r(24δτ0)sepc0(δ/2). Because there are (ns)(ens)s such subspaces, the probability to fail for MSr,s,μ,l, which is a combination of those (ns) subspaces, will be less than or equal to

(ns)2(24δτ1)r(24δτ0)sepc0(δ2)exp(c0(δ2)p+ln2+rln24δτ1+s(1+ln24δτ0+lnns)).

Then, for any give δ, there exist c1,c2>0, such that the probability to fail for MSr,s,μ,l is less than or equal to  exp(c1p), provided that pc2(ln2+rln24δτ1+s(1+ln24δτ0+lnns)), where c2=[c0(δ2)c1]1. This finishes the proof.

Appendix D

Proof of Theorem 2.4.

The proof of Theorem 2.4 is inspired by Candes et al. (2006) and Tanner and Vary (2020). Assume that in Problem (1), ϵ1 is properly chosen such that Problem (1) is feasible. In the following discussion, we will use ()* to denote the optimal solution of Problem (1) and ()0 to denote the signal we wish to recover. Let R=X*X0=Rm+Ra, where Rm=mm0=Bθ*Bθ0 and Ra=Baθa*Baθa0 are the residuals of the smooth and sparse signal components, respectively.

Let h=θa*θa0=hT0+hT0c, where T0 is the support of θa0, hT0 denotes the projection of h onto T0 such that

hT0(t)={t,if tT00,otherwise,
and T0c denotes the complementary set of T0.

Because θa0 is feasible and θa* is the optimal solution of Problem (1), we must have θa*1θa01, which is equivalent to θa0+hT0+hT0c1θa01. Because T0 and T0c are complementary to each other, we have θa0+hT01+hT0c1θa01. Because θa0+hT01θa01hT01, we have θa01hT01+hT0c1θa01. Hence, hT0c1hT01. Because hT01shT02, we have

hT0c1shT02.(D.1)

Similar to Candes et al. (2006), we order the elements of T0c in decreasing order of their magnitude and enumerate T0c as v1,,vn|T0|. Then, T0c is divided into subsets Tic of size M, where

Tic={vj:(i1)MjiM}.

Let hTic  be the projection of h onto Tic; we have

hTic0M, i1TicTjc=Ø, ijhTi+1c21MhTic1, i1,(D.2)
where the last inequality comes from the fact that T0c is in decreasing order, such that
|hTi+1c|(v)1MjTic|hTic|(j) .vTi+1c.

Define RTica=BahTic and RT0a=BahT0, and combine Equations (D.1) and (D.2). Then, we have

j2RTjca2j21+δBa,MhTjc2(a)j11+δBa,MhTjc1M=1+δBa,MhT0c1M(b)s1+δBa,MhT02Ms1+δBa,MRT0a2M1δBa,M,
where (a) follows Equation (D.2), (b) follows Equation (D.1), and the RIP property of Ba is used because hTjc0M.

Denote γ=1+δBa,M+s1δBa,M+s. (A tighter bound can be achieved by using 1+δBa,M1δBa,M. However, we adopt δBa,M+s instead of δBa,M for simplicity in the following proof.) Then, we have

j2RTjca2sMγRT0a2.(D.3)

Next, we derive the bounds for Rm and Ra.

Bound for Rm:

ARm22=|ARm, A(RRa)|=|ARm,A(RRa)|=|ARm,AR+ARm,ARa||ARm,AR|+|ARm,ARa|=|ARm,AR|+|ARm,A(RT0a+RT1ca+j2RTjca)||ARm,AR|+|ARm,A(RT0a+RT1ca)|+j2|ARm,ARTjca|.(D.4)

In the following discussion, we will bound those terms. According to the Cauchy–Schwarz inequality, the first term can be bounded as

|ARm,AR|ARm2AR21+δr,s,μ,lϵ1Rm2,(D.5)
where the last inequality comes from the RIP property and the first constraint in Problem (1).

The third term can be bounded as follows. Denoting z1=Rm/Rm2 and z2=RTjca/RTjca2, we have

|ARm,ARTjca|Rm2RTjca2=|Az1,Az2|=14|A(z1+z2)22A(z1z2)22|(a)14max{|(1+δr,M,μ,l)z1+z222(1δr,M,μ,l)z1z222|,|(1+δr,M,μ,l)z1z222(1δr,M,μ,l)z1+z222|}=|δr,M,μ,l+z1,z2|(b)δr,M,μ,l+η11η12,
where η1=μrMln; inequality (a) follows the RIP property because z1+z2MSr,M,μ,l and z1z2MSr,M,μ,l; and inequality (b) follows Equation (F.1) in the proof of Lemma E.1 and z12=z22=1,z1,z2η11η12z12z22=η11η12, provided that M2s.

Therefore,

|ARm,ARTjca|(δr,M,μ,l+η11η12)Rm2RTjca2.(D.6)

Similarly, the second term can be bounded as

|ARm,A(RT0a+RT1ca)|(δr,M+s,μ,l+η21η22)Rm2RT0a+RT1ca2,(D.7)
where η2=μr(M+s)ln, provided that Ms.

Plugging Equations (D.5)–(D.7) into Equation (D.4), we have

ARm22Rm2(1+δr,s,μ,lϵ1+(δr,M+s,μ,l+η21η22)RT0a+RT1ca2+(δr,M,μ,l+η11η12)RTjca2)
(a)Rm2(1+δr,s,μ,lϵ1+(δr,M+s,μ,l+η21η22)RT0a+RT1ca2+(δr,M,μ,l+η11η12)sMγRT0a2),
where inequality (a) follows Equation (D.3).

According to the RIP property, we have

(1δr,s,μ,l)Rm22Rm2(1+δr,s,μ,lϵ1+(δr,M+s,μ,l+η21η22)RT0a+RT1ca2+(δr,M,μ,l+η11η12)sMγRT0a2).

Consequently, we have

Rm21+δr,s,μ,lϵ1+((δr,M,μ,l+η11η12)sMγ2+(δr,M+s,μ,l+η21η22))RT0a+RT1ca2(1δr,s,μ,l),(D.8)
where the inequality follows from
RT0a2=BahT02(a)1+δBa,shT02(b)1+δBa,shT0+hT1c2(c)1+δBa,s1δBa,M+sBa(hT0+hT1c)2γRT0a+RT1ca2,(D.9)
inequalities (a) and (c) follow the RIP property of Ba, and inequality (b) follows that T0T1c=Ø.

Bound for Ra:

A(RT0a+RT1ca)22=|A(RT0a+RT1ca), A(RT0a+RT1caR+R)|=|A(RT0a+RT1ca),AR|+|A(RT0a+RT1ca),A(Rm+j2RTjca)||A(RT0a+RT1ca),AR|+|ARm,A(RT0a+RT1ca)|+j2|A(RT0a+RT1ca),ARTjca|.(D.10)

In the following discussion, we will bound those terms. According to the Cauchy–Schwarz inequality, the first term can be bounded as

|A(RT0a+RT1ca),AR|A(RT0a+RT1ca)2AR21+δr,M+s,μ,lϵ1RT0a+RT1ca2,(D.11)
where the last inequality comes from the RIP property and the first constraint in Problem (1).

The third term can be bounded as follows. Denoting z2=RTjca/RTjca2, j2 and z3=(R(T0)a+R(T1c)a)/R(T0)a+R(T1c)a2, we have

|A(RT0a+RT1ca),ARTjca|RT0a+RT1ca2RTjca2=|Az3,Az2|=14|A(z3+z2)22A(z3z2)22|(a)14max{|(1+δr,2M+s,μ,l)z3+z222(1δr,2M+s,μ,l)z3z222|, |(1+δr,M+2s,μ,l)z3z222(1δr,2M+s,μ,l)z3+z222|}=|δr,2M+s,μ,l+z3,z2|(b)δr,2M+s,μ,l,
where inequality (a) follows the RIP property because z3+z2MSr,2M+s,μ,l and z3z2MSr,2M+s,μ,l. Inequality (b) comes from that TicTjc=Ø, ij and T0Tic=Ø,i.

Therefore,

|ARm,ARTjca|δr,2M+s,μ,lRT0a+RT1ca2RTjca2.(D.12)

Plugging Equations (D.7), (D.11), and (D.12) into Equation (D.10), we have

A(RT0a+RT1ca)22RT0a+RT1ca2(1+δr,M+s,μ,lϵ1+(δr,M+s,μ,l+η21η22)Rm2+δr,2M+s,μ,lj2RTjca2) (a)RT0a+RT1ca2(1+δr,M+s,μ,lϵ1+(δr,M+s,μ,l+η21η22)Rm2+δr,2M+s,μ,lsMγRT0a2)(b)RT0a+RT1ca2(1+δr,M+s,μ,lϵ1+(δr,M+s,μ,l+η21η22)Rm2+δr,2M+s,μ,lsMγ2RT0a+RT1ca),
where inequalities (a) and (b) follows the same argument as deriving Equation (D.8).

According to the RIP property, we have

(1δr,M+s,μ,l)RT0a+RT1ca22RT0a+RT1ca2(1+δr,M+s,μ,lϵ1+(δr,M+s,μ,l+η21η22)Rm2+δr,2M+s,μ,lsMγ2RT0a2).

Consequently, we have

RT0a+RT1ca2(1+δr,M+s,μ,lϵ1+(δr,M+s,μ,l+η21η22)Rm2+δr,2M+s,μ,lsMγ2RT0a+RT1ca)(1δr,M+s,μ,l).(D.13)

Notice that Equations (D.8) and (D.13) still hold if we relax δr,M+s,μ,l, δr,s,μ,l to δr,2M+s,μ,l. For simplicity, here we replace δr,M+s,μ,l, δr,s,μ,l with δr,2M+s,μ,l in the following derivation.

Plugging Equation (D.8) into Equation (D.13) and letting xRT0a+RT1ca2, yRm2, we have

(D1B1B2D2C1)xA1ϵ1+B1D2A2ϵ1,(D.14)
where
A1=1+δr,2M+s,μ,l, B1=(δr,2M+s,μ,l+η21η22),C1=δr,2M+s,μ,lsMγ2, D1=(1δr,2M+s,μ,l),
and
A2=1+δr,2M+s,μ,l, B2=(δr,2M+s,μ,l+η11η12)sMγ2+(δr,2M+s,μ,l+η21η22), D2=(1δr,2M+s,μ,l).

Here, we require D1B1B2D2C1>0 in Equation (D.14) and let M=s, which is

1γ2α1α2α22((1+α1+α2)γ2+2α2+2)δr,3s,μ,l>0.(D.15)

Let a=(1+α1+α2)γ2+2α2+2 and c=1γ2α1α2α22, where α1=η1η2, α2=2η12η2, γ=1+δBa,2s1δBa,2s, and η=μrsln. If c>0,  then there exist a δr,3s,μ,l>0 such that Equation (D.15) is valid. Then, the denominator aδr,3s,μ,l+c>0 δr,3s,μ,l(0,c/a) . Consequently,

RT0a+RT1ca2(1+α2)1+δr,3s,μ,lcaδr,3s,μ,lϵ1.

Notice that from Equations (D.1) and (D.9), we have

j2RTjca2γRT0a2γ2RT0a+RT1ca2.

Therefore, Ra2RT0a+RT1ca2+j2RTjca2Caϵ1, where

Ca=(1+γ2)(1+α2)1+δr,3s,μ,lcaδr,3s,μ,l.

Similarly, we can bound Rm2 as Rm2Cmϵ1,  where

Cm=1+δr,3s,μ,l+(δr,3s,μ,l+γ21+γ2α1+11+γ2α2)Ca(1δr,3s,μ,l).

Appendix E

Proof of Lemma C.1.

In this section, we will provide the proof for Lemma C.1. By linearity of the measurement matrix A, without loss of generality, it is enough to prove this lemma when y2=1. The proof mainly has two steps. First, the bounds for θa and θ are derived, and a finite set of points to approximate the set MSr,T,μ,l to any accuracy in norm 2 sense can be found. Then, the concentration inequality can be applied through a union bound. This is a common approach in compressive sensing literature (Baraniuk et al. 2008, Tanner and Vary 2020).

To derive the upper bounds for θa and θ, we first derive the upper bounds for the signals m=Bθ and a=Baθa, which are given in the following lemma.

Lemma E.1.

The smooth signal component m and sparse signal component a of the signal y in MSr,T,μ,l with μ<n2rsl can be bounded as follows:

m2=Bθ2y21η2,(E.1)
a2=Baθa2y21η2,(E.2)
where η=μrsln.

The proof is presented in Appendix F.

According to the RIP for Ba, we have

(1δBa,s)θa2Baθa2(1+δBa,s)θa2.(E.3)

Combining Equations (E.2) and (E.3), we have

θa21(1δBa,s)Baθa2y2(1δBa,s)(1η2)=1(1δBa,s)(1η2).

Denoting τ0=1(1δBa,s)(1η2), we have θa2τ0. Recall that y=Bθ+Baθa; we have θ=B(yBaθa), where B=(BTB)1BT. Therefore, according to the triangle inequality and the Cauchy–Schwarz inequality, we have

θ2B2(y2+Baθa2)B2(1+11η2).

Denoting τ1=B2(1+11η2), we have θ2τ1.

Because we have derived the bounds for θa and θ, the covering number of MSr,T,μ,l is given by the following lemma whose proof is in Appendix G.

Lemma E.2.

There exists a set QMSr,T,μ,l, such that for all yMSr,T,μ,l, with y2=1 we have minqQqy2δ4 and |Q|(24δτ1)r(24δτ0)s, where |Q| is its cardinality.

Next, we will prove the main result by applying the concentration inequality (Definition 2.5(ii)) with union bound. Let ϵ=δ/2,

(1δ2)q22Aq22(1+δ2)q22 qQ,(E.4)
with probability greater than 12|Q|epc0(δ/2).

Because δ(0,1), we have 1δ21δ2 and 1+δ21+δ2. Then, Equation (E.4) can be written as

(1δ2)q2Aq2(1+δ2)q2 qQ,(E.5)
with probability greater than 12|Q|epc0(δ/2).

By the triangle inequality, we have

Ay2A(yq)2+Aq2.(E.6)

Define

U=maxyMSr,T,μ,l,y2=1Ay2,(E.7)
which is attainable because MSr,T,μ,l is closed.

Combining Equations (E.5) and (E.6), we have yMSr,T,μ,l, with y2=1; there exists a qMSr,T,μ,l, such that Ay2A(qy)2+(1+δ2)q2, with probability greater than 12|Q|epc0(δ/2).

Because qyMSr,T,μ,l, if qy=0, we have Ay2(1+δ2)q2=1+δ2.

If qy0, we have Ay2A(qy)qy22qy2+(1+δ2)q2δ4U+1+δ2.

Notice that the second inequality comes from Equation (E.7) combined with Q being a δ4 covering of MSr,T,μ,l. In summary, we have Ay2δ4U+1+δ2.

Because U is attainable, according to Equation (E.7), we have Uδ4U+1+δ2. Consequently, we have U1+34δδ1+δ because δ<1. Therefore, Ay21+δ, with probability greater than 12|Q|epc0(δ/2).

Similarly, we can prove that Ay21δ with probability greater than 12|Q|epc0(δ/2).

Finally, according to Lemma E.2, we have that

12|Q|epc0(δ2)12(24δτ1)r(24δτ0)sepc0(δ2).

This finishes the proof.

Appendix F

Proof of Lemma E.1.

To prove the result, we first derive a nontrivial upper bound for the inner produce between m and a. Let B=UΣVT be the reduced SVD of B; then,

|mTa|=|θTBTa|=|θTVΣUTa|=|θTVΣUTinaiei|=|θTVΣinaiUTei|θTVΣ2inaiUTei2θTVΣ2in|ai|UTei2θTVΣ2in|ai|maxj{1,r}UTej2(a)θTVΣUT2a1μrn(b)μrslnm2a2,(F.1)
where inequality (a) follows θTVΣ2=θTVΣUT2 because UTU=I and Definition 2.2. Inequality (b) follows from a1lsa2.

Let η=μrsln because μ<nrsl; we have η<1 and

|mTa|=|y22m22a22|2ηm2a2.

Therefore, we have

m22+a22y222ηm2a2.

By completing the square, we have

(m2+ηa2)2+(1η2)a22y220.

Because (m2+ηa2)20, we have

a21(1η2)y2.

Similarly, we can derive that

m21(1η2)y2.

This finishes the proof.

Appendix G

Proof of Lemma E.2.

We first state results for the covering number of a set (Vershynin 2018). The covering number of a smallest ϵ net for a unit l2 norm ball in d-dimensional space is (3/ϵ)d.

Let M={mRn|m=Bθ, θRr, θ2τ1,μ(B)=μ} and S={aRn|a=Baθa,θaT,θa2τ0,l(Ba)=l}. There exist two finite δ8 covering sets of M and S, which are QMM and QSS.

For all qMQM and for all mM, we have

minqMQMmqM2δ8;
for all qSQs and for all aS, we have
minqSQsaqS2δ8.

Therefore, we have |QM|(24δτ1)r and |Qs|(24δτ0)s.

Define QMS={qM+qS|qMQM, qSQS}MSr,T,μ,l. Then, yMSr,T,μ,l, there exists a pair qMS=qM+qSMSr,T,μ,l, such that

qMSy2=qMm+qSa2qMm2+qSa2δ4.

Therefore, QMS is a δ/4 covering of MSr,T,μ,l and |QMS|(24δτ1)r(24δτ0)s. This finishes the proof.

Appendix H. Simulation Study by Varying the Magnitude of Sparse Signals

We adopt the same simulation data generation procedure as in Section 3.1 while changing the distribution of elements in θaRq such that the nonzero elements follow i.i.d. normal distribution with mean zero and standard deviation σs in the range of {0.065, 0.125, 0.25, 0.5}. The example signals and reconstruction performance are shown in Figure H.1.

Figure H.1. (Color online) Simulation Study by Varying σs
Notes. (a)–(d) Example signals corresponding to different σs, (e) log relative error for the smooth component, and (f) log relative error for the sparse component. (a) σs=0.065. (b) σs=0.125. (c) σs=0.25. (d) σs=0.5. (e) Log relative error for the smooth component. (f) Log relative error for the sparse component

There are several observations.

  1. The log relative error of the smooth component does not change with different magnitudes of σs. This agrees with the theoretical result in Theorem 2.4, where the reconstruction error is bounded by a constant times the noise bound because the noise level is kept the same in all simulations.

  2. The log relative error of the sparse component decreases as σs increases. This also agrees with the theoretical result in Theorem 2.4. Because the log relative error of the sparse component is defined as logaa^2/a2, according to Theorem 2.4, the reconstruction error term can be approximated by Caϵ1 (i.e., aa^2Caϵ1, where Ca is independent of θa). a2=Baθa2 increases as the magnitude of elements in θa increases. Therefore, logaa^2/a2 will decrease as σs increases.

  3. The 0.1 threshold (approximately) of the compressive ratio still holds with different magnitudes of sparse signal component.

Appendix I. Sample Case Study Images

Appendix I.1. Sample Images of Case Study 4.1

See Figure I.1.

Figure I.1. Sample Images of Surface Defect Detection in the Steel Rolling Process
Notes. The top row shows raw images. The middle row shows corresponding SSD results. The bottom row shows KronCSSD results.

Appendix I.2. Sample Images of Case Study 4.2

See Figure I.2.

Figure I.2. (Color online) Sample Images of Solar Flare Detection
Notes. The top row shows raw images. The middle row shows corresponding SSD results. The bottom row shows KronCSSD results.

References

  • Augusto CRA, Fauth AC, Navia CE, Shigeouka H, Tsui KH (2011) Connection among spacecrafts and ground level observations of small solar transient events. Experiment. Astronomy 31(2):177–197.Google Scholar
  • Baraniuk R, Davenport M, DeVore R, Wakin M (2008) A simple proof of the restricted isometry property for random matrices. Constructive Approximation 28(3):253–263.Google Scholar
  • Bouwmans T, Zahzah EH (2014) Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vision Image Understanding 122:22–34.Google Scholar
  • Candes EJ (2008) The restricted isometry property and its implications for compressed sensing. Competus Rendus Math. 346(9–10):589–592.Google Scholar
  • Candes EJ, Tao T (2005) Decoding by linear programming. IEEE Trans. Inform. Theory 51(12):4203–4215.Google Scholar
  • Candes EJ, Romberg JK, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59(8):1207–1223.Google Scholar
  • Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J. ACM 58(3):1–37.Google Scholar
  • De Boor C, De Boor C (1978) A Practical Guide to Splines, vol. 27 (Springer-Verlag, New York).Google Scholar
  • Duarte MF, Baraniuk RG (2011) Kronecker compressive sensing. IEEE Trans. Image Processing 21(2):494–504.Google Scholar
  • Eilers PH, Marx BD (1996) Flexible smoothing with B-splines and penalties. Statist. Sci. 11(2):89–121.Google Scholar
  • Giannakis GB, Mateos G, Farahmand S, Kekatos V, Zhu H (2011) USPACOR: Universal sparsity-controlling outlier rejection. Tichavsky P, Cernocky H, Prochazka A, eds. 2011 IEEE Internat. Conf. Acoustics Speech Signal Processing (IEEE, New York), 1952–1955.Google Scholar
  • Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev. 51(3):455–500.Google Scholar
  • Mardani M, Mateos G, Giannakis GB (2013) Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies. IEEE Trans. Inform. Theory 59(8):5186–5205.Google Scholar
  • Marques EC, Maciel N, Naviner L, Cai H, Yang J (2018) A review of sparse recovery algorithms. IEEE Access 7:1300–1322.Google Scholar
  • Mateos G, Giannakis GB (2011) Robust nonparametric regression by controlling sparsity. 2011 IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP).Google Scholar
  • Minaee S, Abdolrashidi A, Wang Y (2015) Screen content image segmentation using sparse-smooth decomposition. 2015 49th Asilomar Conf. Signals Systems Comput.Google Scholar
  • Mou S, Wang A, Zhang C, Shi J (2021) Additive tensor decomposition considering structural data information. IEEE Trans. Automation Sci. Engrg. 19(4):2904–2917.Google Scholar
  • Rani M, Dhok SB, Deshmukh RB (2018) A systematic review of compressive sensing: Concepts, implementations and applications. IEEE Access 6:4875–4894.Google Scholar
  • Recht B, Fazel M, Parrilo PA (2010) Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3):471–501.Google Scholar
  • Tanner J, Vary S (2020) Compressed sensing of low-rank plus sparse matrices. Preprint, submitted July 18, https://arxiv.org/abs/2007.09457v1.Google Scholar
  • Unser M (1999) Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine 16(6):22–38.Google Scholar
  • Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, United Kingdom).Google Scholar
  • Wang A, Xian X, Tsung F, Liu K (2018) A spatial-adaptive sampling procedure for online monitoring of big data streams. J. Quality Tech. 50(4):329–343.Google Scholar
  • Waters AE, Sankaranarayanan AC, Baraniuk RG (2011) SpaRCS: Recovering low-rank and sparse matrices from compressive measurements. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1089–1097.Google Scholar
  • Xu H, Caramanis C, Sanghavi S (2012) Robust PCA via outlier pursuit. IEEE Trans. Inform. Theory 58(5):3047–3064.Google Scholar
  • Yan H, Paynabar K, Shi J (2017) Anomaly detection in images with smooth background via smooth-sparse decomposition. Technometrics 59(1):102–114.Google Scholar
  • Yan H, Paynabar K, Shi J (2018) Real-time monitoring of high-dimensional functional data streams via spatio-temporal smooth sparse decomposition. Technometrics 60(2):181–197.Google Scholar