
Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision

International Journal of Computer Vision

Abstract

As a convex relaxation of the rank minimization model, the nuclear norm minimization (NNM) problem has attracted significant research interest in recent years. The standard NNM regularizes all singular values equally, yielding a convex norm that is easy to compute. However, this restricts its capability and flexibility in dealing with many practical problems, where the singular values have clear physical meanings and should be treated differently. In this paper we study the weighted nuclear norm minimization (WNNM) problem, which adaptively assigns weights to different singular values. As the key step in solving general WNNM models, the theoretical properties of the weighted nuclear norm proximal (WNNP) operator are investigated. Although WNNP is nonconvex, we prove that it is equivalent to a standard quadratic programming problem with linear constraints, which allows the original problem to be solved with off-the-shelf convex optimization solvers. In particular, when the weights are sorted in non-descending order, the optimal solution can be obtained in closed form. With WNNP, solving strategies for multiple extensions of WNNM, including robust PCA and matrix completion, can be readily constructed under the alternating direction method of multipliers paradigm. Furthermore, inspired by the reweighted sparse coding scheme, we present an automatic weight setting method, which greatly facilitates the practical implementation of WNNM. The proposed WNNM methods achieve state-of-the-art performance in typical low-level vision tasks, including image denoising, background subtraction and image inpainting.



Notes

  1. A general proximal operator is defined on a convex problem to guarantee an accurate projection. Although the problem here is nonconvex, we strictly prove in Sect. 3 that it is equivalent to a convex quadratic programming problem. We thus also call it a proximal operator throughout the paper for convenience.

  2. http://www.cs.tut.fi/foi/GCF-BM3D/BM3D.zip

  3. http://people.csail.mit.edu/danielzoran/noiseestimation.zip

  4. http://lear.inrialpes.fr/people/mairal/software.php

  5. http://www4.comp.polyu.edu.hk/~cslzhang/code/NCSR.rar

  6. http://www.csee.wvu.edu/xinl/demo/saist.html

  7. The SAR image was downloaded at http://aess.cs.unh.edu/radar%20se%20Lecture%2018%20B.html.

  8. The color image was used in previous work (Portilla 2004).

  9. http://www.cs.cmu.edu/ftorre/codedata.html

  10. http://winsty.net/brmf.html

  11. http://sites.google.com/site/yinqiangzheng/

  12. http://www.cs.cmu.edu/~deyum/Publications.htm

  13. The color versions of images #3, #5, #6, #7, #9, #11 are used in this MC experiment.

  14. http://www.gris.informatik.tu-darmstadt.de/sroth/research/foe

  15. http://gpi.upf.edu/static/vnli/interp/interp.html

  16. http://people.ee.duke.edu/mz1/Softwares

  17. http://www.imm.dtu.dk/pcha/mxTV/,

References

  • Arias, P., Facciolo, G., Caselles, V., & Sapiro, G. (2011). A variational framework for exemplar-based image inpainting. International Journal of Computer Vision, 93(3), 319–347.


  • Babacan, S. D., Luessi, M., Molina, R., & Katsaggelos, A. K. (2012). Sparse Bayesian methods for low-rank matrix estimation. IEEE Transactions on Signal Processing, 60(8), 3964–3977.


  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.


  • Buades, A., Coll, B., & Morel, J. M. (2005). A non-local algorithm for image denoising. In CVPR.

  • Buades, A., Coll, B., & Morel, J. M. (2008). Nonlocal image and movie denoising. International Journal of Computer Vision, 76(2), 123–139.


  • Buchanan, A. M., & Fitzgibbon, A. W. (2005). Damped Newton algorithms for matrix factorization with missing data. In CVPR.

  • Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.


  • Candès, E. J., & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.


  • Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing sparsity by reweighted \(l_1\) minimization. Journal of Fourier Analysis and Applications, 14(5–6), 877–905.


  • Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 11.


  • Chan, T. F., & Shen, J. J. (2005). Image processing and analysis: Variational, PDE, wavelet, and stochastic methods. Philadelphia: SIAM Press.


  • Chartrand, R. (2007). Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10), 707–710.


  • Chartrand, R. (2012). Nonconvex splitting for regularized low-rank + sparse decomposition. IEEE Transactions on Signal Processing, 60(11), 5810–5819.


  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095.


  • Dahl, J., Hansen, P. C., Jensen, S. H., & Jensen, T. L. (2010). Algorithms and software for total variation image reconstruction via first-order methods. Numerical Algorithms, 53(1), 67–92.


  • De La Torre, F., & Black, M. J. (2003). A framework for robust subspace learning. International Journal of Computer Vision, 54(1–3), 117–142.


  • Ding, X., He, L., & Carin, L. (2011). Bayesian robust principal component analysis. IEEE Transactions on Image Processing, 20(12), 3419–3430.


  • Dong, W., Zhang, L., & Shi, G. (2011). Centralized sparse representation for image restoration. In ICCV.

  • Dong, W., Shi, G., & Li, X. (2013). Nonlocal image restoration with bilateral variance estimation: A low-rank approach. IEEE Transactions on Image Processing, 22(2), 700–711.


  • Dong, W., Shi, G., Li, X., Ma, Y., & Huang, F. (2014). Compressive sensing via nonlocal low-rank regularization. IEEE Transactions on Image Processing, 23(8), 3618–3632.


  • Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3), 613–627.


  • Eriksson, A., & Van Den Hengel, A. (2010). Efficient computation of robust low-rank matrix approximations in the presence of missing data using the \(l_1\) norm. In CVPR.

  • Fazel, M. (2002). Matrix rank minimization with applications. PhD thesis, Stanford University.

  • Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. In American Control Conference (ACC).

  • Gu, S., Zhang, L., Zuo, W., & Feng, X. (2014). Weighted nuclear norm minimization with application to image denoising. In CVPR.

  • Jain, P., Netrapalli, P., & Sanghavi, S. (2013). Low-rank matrix completion using alternating minimization. In ACM Symposium on Theory of Computing.

  • Ji, H., Liu, C., Shen, Z., & Xu, Y. (2010). Robust video denoising using low rank matrix completion. In CVPR.

  • Ji, S., & Ye, J. (2009). An accelerated gradient method for trace norm minimization. In ICML (pp. 457–464).

  • Ke, Q., & Kanade, T. (2005). Robust \(l_1\) norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR.

  • Kwak, N. (2008). Principal component analysis based on \(l_1\)-norm maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(9), 1672–1680.


  • Levin, A., & Nadler, B. (2011). Natural image denoising: Optimality and inherent bounds. In CVPR.

  • Levin, A., Nadler, B., Durand, F., & Freeman, W.T. (2012). Patch complexity, finite pixel correlations and optimal denoising. In ECCV.

  • Li, L., Huang, W., Gu, I. H., & Tian, Q. (2004). Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11), 1459–1472.


  • Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., & Ma, Y. (2009). Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. In International Workshop on Computational Advances in Multi-Sensor Adaptive Processing.

  • Lin, Z., Liu, R., & Su, Z. (2011). Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS.

  • Lin, Z., Liu, R., & Li, H. (2015). Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning, 99(2), 287–325.


  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2010). Robust subspace segmentation by low-rank representation. In ICML.

  • Liu, R., Lin, Z., De la Torre, F., & Su, Z. (2012). Fixed-rank representation for unsupervised visual learning. In CVPR.

  • Lu, C., Tang, J., Yan, S., & Lin, Z. (2014a). Generalized nonconvex nonsmooth low-rank minimization. In CVPR.

  • Lu, C., Zhu, C., Xu, C., Yan, S., & Lin, Z. (2014b). Generalized singular value thresholding. arXiv preprint arXiv:1412.2231.

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Non-local sparse models for image restoration. In ICCV.

  • Meng, D., & De la Torre, F. (2013). Robust matrix factorization with unknown noise. In ICCV.

  • Mirsky, L. (1975). A trace inequality of John von Neumann. Monatshefte für Mathematik, 79(4), 303–306.


  • Mnih, A., & Salakhutdinov, R. (2007). Probabilistic matrix factorization. In NIPS.

  • Mohan, K., & Fazel, M. (2012). Iterative reweighted algorithms for matrix rank minimization. The Journal of Machine Learning Research, 13(1), 3441–3473.


  • Moreau, J. J. (1965). Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France, 93, 273–299.


  • Mu, Y., Dong, J., Yuan, X., & Yan, S. (2011). Accelerated low-rank visual recovery by random projection. In CVPR.

  • Nie, F., Huang, H., & Ding, C. H. (2012). Low-rank matrix recovery via efficient Schatten p-norm minimization. In AAAI.

  • Oh, T. H., Kim, H., Tai, Y. W., Bazin, J. C., & Kweon, I. S. (2013). Partial sum minimization of singular values in RPCA for low-level vision. In ICCV.

  • Peng, Y., Ganesh, A., Wright, J., Xu, W., & Ma, Y. (2012). RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2233–2246.


  • Portilla, J. (2004). Blind non-white noise removal in images using Gaussian scale mixtures. In Proceedings of the IEEE Benelux Signal Processing Symposium.

  • Rhea, D. (2011). The case of equality in the von Neumann trace inequality. Preprint.

  • Roth, S., & Black, M. J. (2009). Fields of experts. International Journal of Computer Vision, 82(2), 205–229.


  • Salakhutdinov, R., & Srebro, N. (2010). Collaborative filtering in a non-uniform world: Learning with the weighted trace norm. In NIPS.

  • She, Y. (2012). An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Computational Statistics & Data Analysis, 56(10), 2976–2990.


  • Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations. In ICML.

  • Srebro, N., Rennie, J., & Jaakkola, T.S. (2004). Maximum-margin matrix factorization. In NIPS.

  • Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611–622.


  • Wang, N., & Yeung, D.Y. (2013). Bayesian robust matrix factorization for image and video processing. In ICCV.

  • Wang, S., Zhang, L., & Liang, Y. (2012). Nonlocal spectral prior model for low-level vision. In ACCV.

  • Wright, J., Peng, Y., Ma, Y., Ganesh, A., & Rao, S. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In NIPS.

  • Zhang, D., Hu, Y., Ye, J., Li, X., & He, X. (2012a). Matrix completion by truncated nuclear norm regularization. In CVPR.

  • Zhang, Z., Ganesh, A., Liang, X., & Ma, Y. (2012b). TILT: Transform invariant low-rank textures. International Journal of Computer Vision, 99(1), 1–24.


  • Zhao, Q., Meng, D., Xu, Z., Zuo, W., & Zhang, L. (2014). Robust principal component analysis with complex noise. In ICML.

  • Zheng, Y., Liu, G., Sugimoto, S., Yan, S., & Okutomi, M. (2012). Practical low-rank matrix approximation under robust \(l_1\) norm. In CVPR.

  • Zhou, M., Chen, H., Ren, L., Sapiro, G., Carin, L., & Paisley, J. W. (2009). Non-parametric Bayesian dictionary learning for sparse image representations. In NIPS.

  • Zhou, X., Yang, C., Zhao, H., & Yu, W. (2014). Low-rank modeling and its applications in image analysis. arXiv preprint arXiv:1401.3409.

  • Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. In ICCV.

Download references

Acknowledgments

This work is supported by the Hong Kong RGC GRF grant (PolyU 5313/13E).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lei Zhang.

Additional information

Communicated by Jean-Michel Morel.

Appendix

Appendix

In this appendix, we provide the proof details of the theoretical results in the main text.

1.1 Proof of Theorem 1

Proof

For any \({{\varvec{X}}}, {{\varvec{Y}}}\in \mathfrak {R}^{m\times {n}}(m>n)\) , denote by \(\bar{{{\varvec{U}}}}{{\varvec{D}}}\bar{{{\varvec{V}}}}^T\) and \( {{\varvec{U}}}\varvec{\varSigma }{{\varvec{V}}}^T\) the singular value decomposition of matrix \({{\varvec{X}}}\) and \({{\varvec{Y}}}\), respectively, where \(\varvec{\varSigma }=\left( \begin{array}{cc} diag(\sigma _1,\sigma _2,...,\sigma _n)\\ \mathbf 0 \end{array} \right) \in \mathfrak {R}^{m\times {n}}\), and \({{\varvec{D}}}=\left( \begin{array}{cc} diag(d_1,d_2,...,d_n)\\ \mathbf 0 \end{array} \right) \) are the diagonal singular value matrices. Based on the property of Frobenius norm, the following derivations hold:

$$\begin{aligned}&\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _F^2+\Vert {{\varvec{X}}}\Vert _{w,*}\\&\quad = Tr\left( {{\varvec{Y}}}^T{{\varvec{Y}}}\right) -2Tr\left( {{\varvec{Y}}}^T{{\varvec{X}}}\right) +Tr\left( {{\varvec{X}}}^T{{\varvec{X}}}\right) +\sum _i^n w_i d_i\\&\quad =\sum _i^n\sigma _i^2-2Tr\left( {{\varvec{Y}}}^T{{\varvec{X}}}\right) +\sum _i^nd_i^2+\sum _i^n w_id_i. \end{aligned}$$

Based on the von Neumann trace inequality in Lemma 1, we know that \(Tr\left( {{\varvec{Y}}}^T{{\varvec{X}}}\right) \) achieves its upper bound \(\sum _i^n\sigma _i d_i\) if \({{\varvec{U}}} = \bar{{{\varvec{U}}}}\) and \({{\varvec{V}}} = \bar{{{\varvec{V}}}}\). Then, we have

$$\begin{aligned}&\min _{{\varvec{X}}}\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _F^2+\Vert {{\varvec{X}}}\Vert _{w,*}\\&\quad \Leftrightarrow \min _{{\varvec{D}}}\sum _i^n\sigma _i^2-2\sum _i^n\sigma _i d_i+\sum _i^nd_i^2+\sum _i^n w_id_i\\&\quad s.t. d_1\ge d_2 \ge ...\ge d_n \ge 0 \\&\quad \Leftrightarrow \min _{{{\varvec{D}}}}\sum _{i}(d_i-\sigma _i)^2+w_id_i\\&\quad s.t. ~d_1\ge d_2 \ge ...\ge d_n \ge 0. \end{aligned}$$

From the above derivation, we can see that the optimal solution of the WNNP problem in (5) is

$$\begin{aligned} {{\varvec{X}}}^*= {{\varvec{U}}}{{\varvec{D}}}{{\varvec{V}}}^T, \end{aligned}$$

where \({{\varvec{D}}}\) is the optimum of the constrained quadratic optimization problem in (6).

End of proof. \(\square \)
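
The constrained quadratic problem (6) can indeed be handed to an off-the-shelf solver, as Theorem 1 suggests. The following is a minimal sketch (not the authors' released code) using a generic sequential quadratic programming routine; the function name solve_wnnp_qp and the choice of SLSQP are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def solve_wnnp_qp(sigma, w):
    """Solve min_d sum_i (d_i - sigma_i)^2 + w_i * d_i
       s.t. d_1 >= d_2 >= ... >= d_n >= 0   (problem (6))."""
    sigma = np.asarray(sigma, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(sigma)

    def objective(d):
        return np.sum((d - sigma) ** 2 + w * d)

    # Ordering constraints d_i - d_{i+1} >= 0, expressed for SLSQP.
    constraints = [{"type": "ineq", "fun": (lambda d, i=i: d[i] - d[i + 1])}
                   for i in range(n - 1)]
    bounds = [(0.0, None)] * n               # non-negativity d_i >= 0
    d0 = np.maximum(sigma - w / 2.0, 0.0)    # initial guess from the closed form below

    res = minimize(objective, d0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x
```

When the weights are non-descending, this general solver is unnecessary: Corollary 1 below gives the optimum in closed form.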

1.2 Proof of Corollary 1

Proof

Without the ordering constraint, the optimization problem (6) decouples into the following independent subproblems, one for each singular value:

$$\begin{aligned}&\min _{d_i\ge 0}(d_i-\sigma _i)^2+w_id_i\\&\quad \Leftrightarrow \min _{d_i\ge 0}\left( d_i-(\sigma _i-\frac{w_i}{2})\right) ^2. \end{aligned}$$

It is not difficult to derive its global optimum as:

$$\begin{aligned} \bar{d}_i = max\left( \sigma _i-\frac{w_i}{2},0\right) ,~i=1,2,...,n. \end{aligned}$$
(15)

Since we have \(\sigma _1 \ge \sigma _2 \ge ... \ge \sigma _n\) and the weight vector has a non-descending order \(w_1\le w_2 \le ... \le w_n\), it is easy to see that \(\bar{d}_1 \ge \bar{d}_2 \ge ... \ge \bar{d}_n\). Thus, \(\bar{d}_{i=1,2,...,n}\) satisfy the constraint of (6), and the solution in (15) is then the globally optimal solution of the original constrained problem in (6).

End of proof. \(\square \)
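
For non-descending weights, Corollary 1 therefore yields a direct implementation of the WNNP operator: compute the SVD of \({{\varvec{Y}}}\) and apply the weighted soft-thresholding (15) to its singular values. Below is a minimal numpy sketch; the name wnnp is illustrative and this is not the authors' released implementation.

```python
import numpy as np

def wnnp(Y, w):
    """argmin_X ||Y - X||_F^2 + ||X||_{w,*} for a non-descending weight
    vector w of length min(Y.shape)."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    d = np.maximum(sigma - w / 2.0, 0.0)   # Eq. (15): d_i = max(sigma_i - w_i/2, 0)
    return (U * d) @ Vt                    # U diag(d) V^T
```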

1.3 Proof of Theorem 2

Proof

Denote by \({{\varvec{U}}}_k\varvec{\varLambda }_k{{\varvec{V}}}_k^T\) the SVD of the matrix \(\{{{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_k-{{\varvec{E}}}_{k+1}\}\) in the \((k+1)\)-th iteration, where \(\varvec{\varLambda }_k = \{diag(\sigma _k^1, \sigma _k^2 , ..., \sigma _k^n)\}\) is the diagonal singular value matrix. Based on the conclusion of Corollary 1, we have

$$\begin{aligned} {{\varvec{X}}}_{k+1}={{{\varvec{U}}}_k\varvec{\varSigma }_k{{\varvec{V}}}_k^T}, \end{aligned}$$
(16)

where \(\varvec{\varSigma }_k = {\mathcal {S}}_\mathbf{w /\mu _k}(\varvec{\varLambda }_k)\) is the singular value matrix after weighted shrinkage. Based on the Lagrange multiplier updating method in step 5 of Algorithm 1, we have

$$\begin{aligned} \begin{aligned} \Vert {{\varvec{L}}}_{k+1}\Vert _F&=\Vert {{\varvec{L}}}_k+\mu _k({{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1})\Vert _F\\&=\mu _k\Vert \mu _k^{-1}{{\varvec{L}}}_k+{{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1}\Vert _F\\&=\mu _k\Vert {{\varvec{U}}}_k\varvec{\varLambda }_k{{\varvec{V}}}_k^T-{{\varvec{U}}}_k\varvec{\varSigma }_k{{\varvec{V}}}_k^T\Vert _F\\&=\mu _k\Vert \varvec{\varLambda }_k-\varvec{\varSigma }_k\Vert _F\\&=\mu _k\Vert \varvec{\varLambda }_k - \mathcal {S}_\mathbf{w /\mu _k}(\varvec{\varLambda }_k)\Vert _F\\&\le \mu _k\sqrt{\sum _i \left( \frac{w_i}{\mu _k}\right) ^2}\\&=\sqrt{\sum _i w_i^2}. \end{aligned} \end{aligned}$$
(17)

Thus, \(\{{{\varvec{L}}}_{k}\}\) is bounded.

To analyze the boundedness of \(\varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\), we first note that the following inequality holds because steps 3 and 4 attain the globally optimal solutions of the \({{\varvec{X}}}\) and \({{\varvec{E}}}\) subproblems:

$$\begin{aligned} \varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\le \varGamma ({{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k},\mu _k). \end{aligned}$$

Then, based on the way we update \({{\varvec{L}}}\):

$$\begin{aligned} {{\varvec{L}}}_{k+1} = {{\varvec{L}}}_k+\mu _k({{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1}), \end{aligned}$$

we have

$$\begin{aligned}&\varGamma (X_k,E_k,L_k,\mu _k) \\&\quad = \varGamma (X_k,E_k,L_{k-1},\mu _{k-1})\\&\qquad +\frac{\mu _k-\mu _{k-1}}{2}\left\| Y-X_{k}-E_{k}\right\| _F^2\\&\qquad +\langle L_k-L_{k-1},Y-X_k-E_k\rangle \\&\quad = \varGamma (X_k,E_k,L_{k-1},\mu _{k-1})\\&\qquad +\frac{\mu _k - \mu _{k-1}}{2}\left\| \mu ^{-1}_{k-1}\left( L_k-L_{k-1}\right) \right\| _F^2\\&\qquad +\left\langle L_k-L_{k-1},\mu ^{-1}_{k-1}\left( L_k-L_{k-1}\right) \right\rangle \\&\quad = \varGamma (X_k,E_k,L_{k-1},\mu _{k-1})\\&\qquad +\frac{\mu _k+\mu _{k-1}}{2\mu ^{2}_{k-1}}\left\| L_k-L_{k-1}\right\| _F^2. \end{aligned}$$

Denote by \(\varTheta \) an upper bound of \(\Vert {{\varvec{L}}}_k-{{\varvec{L}}}_{k-1}\Vert _F^2\) for all \(k\ge 1\). We have

$$\begin{aligned} \varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\le&\varGamma ({{\varvec{X}}}_{1},{{\varvec{E}}}_{1},{{\varvec{L}}}_{0},\mu _0)\\&+\varTheta \sum _{k=1}^\infty \frac{\mu _k+\mu _{k-1}}{2\mu _{k-1}^{2}}. \end{aligned}$$

Since the penalty parameter \(\{\mu _k\}\) satisfies \(\sum _{k=1}^\infty \mu _k^{-2}\mu _{k+1}<+\infty \), we have

$$\begin{aligned} \sum _{k=1}^\infty \frac{\mu _k+\mu _{k-1}}{2\mu _{k-1}^{2}}\le \sum _{k=1}^\infty \mu _{k-1}^{-2}\mu _{k}<+\infty . \end{aligned}$$

Thus, we know that \(\varGamma ({{\varvec{X}}}_{k+1},{{\varvec{E}}}_{k+1},{{\varvec{L}}}_{k},\mu _k)\) is also upper bounded.

The boundedness of \(\{{{\varvec{X}}}_{k}\}\) and \(\{{{\varvec{E}}}_{k}\}\) can be easily deduced as follows:

$$\begin{aligned}&\Vert {{\varvec{E}}}_{k}\Vert _1+\Vert {{\varvec{X}}}_{k}\Vert _{w,*}\\&\quad =\varGamma ({{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k-1},\mu _{k-1})+\frac{\mu _{k-1}}{2}( \frac{1}{\mu ^2_{k-1}}\Vert {{\varvec{L}}}_{k-1}\Vert _F^2\\&\qquad - \Vert {{\varvec{Y}}}-{{\varvec{X}}}_k-{{\varvec{E}}}_k+ \frac{1}{\mu _{k-1}}{{\varvec{L}}}_{k-1}\Vert _F^2)\\&\quad = \varGamma ({{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k-1},\mu _{k-1})-\frac{1}{2\mu _{k-1}}(\Vert {{\varvec{L}}}_{k}\Vert _F^2-\Vert {{\varvec{L}}}_{k-1}\Vert _F^2). \end{aligned}$$

Thus, \(\{{{\varvec{X}}}_{k}\}\), \(\{{{\varvec{E}}}_{k}\}\) and \(\{{{\varvec{L}}}_{k}\}\) generated by the proposed algorithm are all bounded. There exists at least one accumulation point for \(\{{{\varvec{X}}}_{k},{{\varvec{E}}}_{k},{{\varvec{L}}}_{k}\}\). Specifically, we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert {{\varvec{Y}}}-{{\varvec{X}}}_{k+1}-{{\varvec{E}}}_{k+1}\Vert _F&=\lim _{k\rightarrow \infty }\frac{1}{\mu _k}\Vert {{\varvec{L}}}_{k+1}-{{\varvec{L}}}_{k}\Vert _F =0, \end{aligned}$$

and the accumulation point is a feasible solution to the objective function.

We then prove that the change of the variables in adjacent iterations tends to zero. For the \({{\varvec{E}}}\) subproblem in step 3, we have

$$\begin{aligned}&\lim _{k\rightarrow \infty }\Vert {{\varvec{E}}}_{k+1}-{{\varvec{E}}}_{k}\Vert _F\\&\quad =\lim _{k\rightarrow \infty }\Vert \mathcal {S}_{\frac{1}{\mu _k}}\left( {{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{X}}}_{k}\right) -\left( {{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{X}}}_{k}\right) \\&\qquad -2\mu _k^{-1}{{\varvec{L}}}_{k}-\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F\\&\quad \le \lim _{k\rightarrow \infty }\frac{mn}{\mu _k}+\Vert 2\mu _k^{-1}{{\varvec{L}}}_{k}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F=0, \end{aligned}$$

in which \(\mathcal {S}_{\frac{1}{\mu _k}}(\cdot )\) is the soft-thresholding operation with parameter \(\frac{1}{\mu _k}\), and m and n are the dimensions of the matrix \({{\varvec{Y}}}\).

To prove \(\lim _{k\rightarrow \infty }\Vert {{\varvec{X}}}_{k+1}-{{\varvec{X}}}_{k}\Vert _F=0\), we recall the updating strategy in Algorithm 1 which makes the following inequalities hold:

$$\begin{aligned}&{{\varvec{X}}}_{k}={{{\varvec{U}}}_{k-1}\mathcal {S}_\mathbf{w /\mu _{k-1}}(\varvec{\varLambda }_{k-1}){{\varvec{V}}}_{k-1}^T},\\&{{\varvec{X}}}_{k+1}={{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{E}}}_{k+1}-\mu _k^{-1}{{\varvec{L}}}_{k+1}, \end{aligned}$$

where \({{\varvec{U}}}_{k-1}\varvec{\varLambda }_{k-1}{{\varvec{V}}}_{k-1}^T\) is the SVD of the matrix \(\{{{\varvec{Y}}}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}-{{\varvec{E}}}_{k}\}\) in the k-th iteration. We then have

$$\begin{aligned}&\lim _{k\rightarrow \infty }\Vert {{\varvec{X}}}_{k+1}-{{\varvec{X}}}_{k}\Vert _F\\&\quad =\lim _{k\rightarrow \infty }\Vert ({{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{E}}}_{k+1}-\mu _k^{-1}{{\varvec{L}}}_{k+1})-{{\varvec{X}}}_{k}\Vert _F\\&\quad =\lim _{k\rightarrow \infty }\Vert ({{\varvec{Y}}}+\mu _k^{-1}{{\varvec{L}}}_{k}-{{\varvec{E}}}_{k+1}-\mu _k^{-1}{{\varvec{L}}}_{k+1})-{{\varvec{X}}}_{k}\\&\qquad +({{\varvec{E}}}_{k}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1})-({{\varvec{E}}}_{k}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1})\Vert _F\\&\quad \le \lim _{k\rightarrow \infty }\Vert {{\varvec{Y}}}+\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}-{{\varvec{E}}}_{k}-{{\varvec{X}}}_{k}\Vert _F+\Vert {{\varvec{E}}}_{k}\\&\qquad -{{\varvec{E}}}_{k+1}+\mu _{k}^{-1}{{\varvec{L}}}_{k}-\mu _k^{-1}{{\varvec{L}}}_{k+1}-\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F\\&\quad \le \lim _{k\rightarrow \infty }\Vert \varvec{\varLambda }_{k-1} - \mathcal {S}_\mathbf{w /\mu _{k-1}}(\varvec{\varLambda }_{k-1})\Vert _F+\Vert {{\varvec{E}}}_{k}-{{\varvec{E}}}_{k+1}\Vert _F\\&\qquad +\Vert \mu _{k}^{-1}{{\varvec{L}}}_{k}-\mu _k^{-1}{{\varvec{L}}}_{k+1}-\mu _{k-1}^{-1}{{\varvec{L}}}_{k-1}\Vert _F\\&\quad = 0. \end{aligned}$$

End of proof. \(\square \)
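
For reference, the alternating updates analyzed in this proof (steps 3–5 of Algorithm 1 for the WNNM-RPCA model \(\min _{{{\varvec{X}}},{{\varvec{E}}}}\Vert {{\varvec{E}}}\Vert _1+\Vert {{\varvec{X}}}\Vert _{w,*}\ \text {s.t.}\ {{\varvec{Y}}}={{\varvec{X}}}+{{\varvec{E}}}\)) can be sketched as follows. This is an illustrative outline rather than the authors' code; the parameters mu0, rho and the fixed iteration count are assumptions, and the proof additionally requires the penalty sequence to grow fast enough that \(\sum _k \mu _k^{-2}\mu _{k+1}<+\infty \).

```python
import numpy as np

def soft_threshold(A, tau):
    """Element-wise soft-thresholding S_tau(.) used in the E-subproblem."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def wnnm_rpca(Y, w, mu0=1.0, rho=1.1, iters=100):
    """w: non-descending weight vector of length min(Y.shape)."""
    X = np.zeros_like(Y)
    E = np.zeros_like(Y)
    L = np.zeros_like(Y)
    mu = mu0
    for _ in range(iters):
        # Step 3: E-subproblem, soft-thresholding with parameter 1/mu.
        E = soft_threshold(Y - X + L / mu, 1.0 / mu)
        # Step 4: X-subproblem, weighted singular value shrinkage,
        # i.e. Eq. (16) with Sigma_k = S_{w/mu_k}(Lambda_k).
        U, s, Vt = np.linalg.svd(Y + L / mu - E, full_matrices=False)
        X = (U * np.maximum(s - w / mu, 0.0)) @ Vt
        # Step 5: multiplier update L_{k+1} = L_k + mu_k (Y - X_{k+1} - E_{k+1}),
        # followed by the penalty increase mu_{k+1} = rho * mu_k.
        L = L + mu * (Y - X - E)
        mu = rho * mu
    return X, E
```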

1.4 Proof of Remark 1

Proof

Based on the conclusion of Theorem 1, the WNNM problem can be equivalently transformed into a constrained singular value optimization problem. Furthermore, when the reweighting strategy \(w_i^{\ell +1}=\frac{C}{\sigma _i^\ell ({{\varvec{X}}})+\varepsilon }\) is used, the singular values of \({{\varvec{X}}}\) are always sorted in non-ascending order, and the weight vector thus follows a non-descending order. It is then easy to deduce that the sorted orders of the sequences \(\{\sigma _i({{\varvec{Y}}}), \sigma _i({{\varvec{X}}}_\ell ),w_i^\ell ; i=1,2,\cdots ,n\}\) remain unchanged during the iterations. Thus, the optimization of each singular value \(\sigma _i({{\varvec{X}}})\) can be analyzed independently. For simplicity, in the following we omit the subscript i and denote by y a singular value of matrix \({{\varvec{Y}}}\), and by x and w the corresponding singular value of \({{\varvec{X}}}\) and its weight.

For the weighting strategy \(w^\ell =\frac{C}{x^{\ell -1}+\varepsilon }\), we have

$$\begin{aligned} x^\ell =max\left( y-\frac{C}{x^{\ell -1}+\varepsilon },0\right) . \end{aligned}$$

Since we initialize \(x^0\) as the corresponding singular value of the matrix \({{\varvec{X}}}_0={{\varvec{Y}}}\), and each \(x^\ell \) is the result of a soft-thresholding operation on the positive value \(y=\sigma _i({{\varvec{Y}}})\), \(\{x^\ell \}\) is a non-negative sequence. The convergence value \(\lim _{\ell \rightarrow \infty } x^\ell \) is analyzed under the following two conditions.

  1. (1) \(c_2<0\): From the definitions of \(c_1\) and \(c_2\), we have \((y+\varepsilon )^2-4C<0\). In this case, the quadratic equation \(x^2+(\varepsilon -y)x+C-y\varepsilon =0\) has no real solution, and the function \(f(x) = x^2+(\varepsilon -y)x+C-y\varepsilon \) attains its positive minimum value \(C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}\) at \(x=\frac{y-\varepsilon }{2}\). \(\forall \tilde{x}\ge 0\), the following inequalities hold

    $$\begin{aligned}&f(\tilde{x})\ge f\left( \frac{y-\varepsilon }{2}\right) \\&\tilde{x}^2+(\varepsilon -y)\tilde{x}\ge -\frac{(y-\varepsilon )^2}{4}\\&\tilde{x}-\frac{C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}}{\tilde{x}+\varepsilon }\ge y-\frac{C}{\tilde{x}+\varepsilon }. \end{aligned}$$

    The sequence \(x^{\ell +1}=max\left( y-\frac{C}{x^{\ell }+\varepsilon },0\right) \) with initialization \(x^0=y\) is therefore monotonically decreasing. We have \(x^\ell <y\), and

    $$\begin{aligned} x^\ell -\left( y-\frac{C}{x^\ell +\varepsilon }\right) >\frac{C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}}{y+\varepsilon }. \end{aligned}$$

    If \(x^\ell \le \frac{C-y\varepsilon }{y}\), we have \(y-\frac{C}{x^\ell +\varepsilon }\le 0\) and \(x^{\ell +1} = max\left( y-\frac{C}{x^{\ell }+\varepsilon },0\right) =0\). If \(x^\ell >\frac{C-y\varepsilon }{y}\), then there exists \(N\in \mathbb {N}\) such that \(x^{\ell +N}<x^\ell -N\cdot \frac{C-y\varepsilon -\frac{(y-\varepsilon )^2}{4}}{y+\varepsilon }\) falls below \(\frac{C-y\varepsilon }{y}\). The sequence \(\{x^\ell \}\) thus shrinks to 0 monotonically.

  2. (2) \(c_2\ge 0\): In this case \(y>0\) must hold, because if \(y=0\) we would have \(c_2=(y+\varepsilon )^2-4C=\varepsilon ^2-4C<0\). For positive C and a sufficiently small \(\varepsilon \), \(c_1\) is also non-negative:

    $$\begin{aligned}&c_2 = (y+\varepsilon )^2-4C\ge 0\\&(y+\varepsilon )^2\ge 4C\\&y-\varepsilon \ge 2(\sqrt{C}-\varepsilon )\\&c_1=y-\varepsilon \ge 0. \end{aligned}$$

    Having \(c_2\ge 0\), \(c_1\ge 0\), we have

    $$\begin{aligned} \bar{x}_2 = \frac{y-\varepsilon +\sqrt{(y-\varepsilon )^2-4(C-\varepsilon y)}}{2}>0. \end{aligned}$$

    For any \(x>\bar{x}_2>0\), the following inequalities hold:

    $$\begin{aligned}&f(x) = x^2+(\varepsilon -y)x+C-y\varepsilon>0\\&\left[ x-\left( y-\frac{C}{x+\varepsilon }\right) \right] (x+\varepsilon )>0\\&x>y-\frac{C}{x+\varepsilon } . \end{aligned}$$

    Furthermore, we have

    $$\begin{aligned} x>y-\frac{C}{x+\varepsilon }>y-\frac{C}{\bar{x}_2+\varepsilon }=\bar{x}_2. \end{aligned}$$

    Thus, for \(x^0=y>\bar{x}_2\), we always have \(x^\ell>x^{\ell +1}>\bar{x}_2\); the sequence is monotonically decreasing and bounded below by \(\bar{x}_2\). We now prove by contradiction that it converges to \(\bar{x}_2\). Suppose \(\{x^\ell \}\) converges to \(\hat{x}\ne \bar{x}_2\); then \(\hat{x}>\bar{x}_2\) and \(f(\hat{x})>0\). By the definition of convergence, \(\forall \epsilon >0\), \(\exists N\in \mathbb {N}\) s.t. \(\forall \ell \ge N\), the following inequality must be satisfied

    $$\begin{aligned} |x^\ell -\hat{x}|<\epsilon . \end{aligned}$$
    (18)

    We also have the following inequalities:

    $$\begin{aligned}&f(x^N) \ge f(\hat{x})\\&\left[ x^N-\left( y-\frac{C}{x^N+\varepsilon }\right) \right] (x^N+\varepsilon ) \ge f(\hat{x})\\&\left[ x^N-\left( y-\frac{C}{x^N+\varepsilon }\right) \right] (y+\varepsilon ) \ge f(\hat{x})\\&x^N-\left( y-\frac{C}{x^N+\varepsilon }\right) \ge \frac{f(\hat{x})}{y+\varepsilon }\\&x^{N}-x^{N+1}>\frac{f(\hat{x})}{y+\varepsilon } \end{aligned}$$

    If we take \(\epsilon =\frac{f(\hat{x})}{2(y+\varepsilon )}\), then \( x^{N}-x^{N+1}> 2\epsilon \), and we can thus obtain

    $$\begin{aligned} |x^{N+1}-\hat{x}|&=|x^{N+1}-x^N+x^N-\hat{x}|\\&\ge \left| |x^{N+1}-x^N|-|x^N-\hat{x}|\right| \\&> 2\epsilon -\epsilon =\epsilon . \end{aligned}$$

    This, however, contradicts (18), and thus \(\{x^\ell \}\) converges to \(\bar{x}_2\).

End of proof. \(\square \)
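
The fixed-point behaviour established above is easy to check numerically. The sketch below (illustrative, not from the paper) iterates \(x^{\ell +1}=max\left( y-\frac{C}{x^{\ell }+\varepsilon },0\right) \) from \(x^0=y\); when \(c_2\ge 0\) it settles at the larger root \(\bar{x}_2\), and when \(c_2<0\) it shrinks to zero.

```python
import numpy as np

def reweighted_shrinkage(y, C, eps=1e-8, iters=50):
    """Fixed-point iteration x <- max(y - C / (x + eps), 0), started at x = y."""
    x = y
    for _ in range(iters):
        x = max(y - C / (x + eps), 0.0)
    return x

# Example with c_2 = (y + eps)^2 - 4C >= 0: the iteration approaches the
# larger root x_bar_2 = (y - eps + sqrt((y + eps)^2 - 4C)) / 2.
y, C, eps = 3.0, 1.0, 1e-8
x_inf = reweighted_shrinkage(y, C, eps)
x_bar2 = (y - eps + np.sqrt((y + eps) ** 2 - 4 * C)) / 2
print(x_inf, x_bar2)   # both close to (3 + sqrt(5))/2 ~ 2.618
```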


Cite this article

Gu, S., Xie, Q., Meng, D. et al. Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision. Int J Comput Vis 121, 183–208 (2017). https://doi.org/10.1007/s11263-016-0930-5
