Fast deep parallel residual network for accurate super resolution image processing
Introduction
The single image super resolution (SISR) problem (Glasner, Bagon, & Irani, 2009) has attracted many researchers in the computer vision community since the 1970s (Duchon, 1979). It has a wide range of applications, including medical imaging, astronomy, machine vision and autonomous driving. These application areas share a common objective: more image information is required for further processing. Hence the basic task in SISR is to recover a high-resolution (HR) image from a low-resolution (LR) input.
Early approaches to SISR offered a variety of ideas and algorithms, including local linear regression (Timofte, De Smet, & Van Gool, 2014), sparse coding (Jianchao, Wright, Huang, & Ma, 2008), dictionary learning (Yang, Wright, Huang, & Ma, 2010) and random forests (Schulter, Leistner, & Bischof, 2015). These shallow methods proved quite successful in both the theory and the application of image super resolution.
As convolutional neural network models blossomed in the computer vision community, the Super-Resolution Convolutional Neural Network (SRCNN) (Dong, Loy, He, & Tang, 2016) started a new era of faster and far more accurate results that inspired new algorithms and techniques. It applied a fully convolutional structure to perform the non-linear mapping from LR to HR, and without any hand-engineered features SRCNN provided significant improvements over non-deep-learning models. However, as a representative small deep learning network, SRCNN lacks the learning capacity to deliver satisfactory results on large and complicated data.
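SRCNN's fully convolutional pipeline (patch extraction, non-linear mapping, reconstruction) can be sketched as three stacked convolutions applied to the bicubic-upscaled LR image. The following minimal numpy sketch uses the 9-1-5 filter sizes and 64/32 feature counts reported for SRCNN, but the weights are random and the loop-based convolution is purely illustrative, not an efficient or trained implementation.

```python
import numpy as np

def conv2d(x, w):
    """'Same' 2-D convolution of a (H, W, C_in) map with (k, k, C_in, C_out) filters."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]  # (k, k, C_in) receptive field
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
lr_up = rng.random((16, 16, 1))                 # bicubic-upscaled LR input (Y channel)

w1 = rng.standard_normal((9, 9, 1, 64)) * 0.01  # patch extraction and representation
w2 = rng.standard_normal((1, 1, 64, 32)) * 0.01 # non-linear mapping
w3 = rng.standard_normal((5, 5, 32, 1)) * 0.01  # HR reconstruction

sr = conv2d(relu(conv2d(relu(conv2d(lr_up, w1)), w2)), w3)
print(sr.shape)  # 'same' padding keeps the spatial size of the input
```

Because every layer is convolutional, the same network applies to inputs of any spatial size, which is the property the text highlights.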
After SRCNN inspired new research in SISR, many new state-of-the-art methods have appeared in recent years, following two main directions of enhancement. On one side, some researchers have designed network structures that increase network depth while preventing gradient explosion and vanishing. Kim, Kwon Lee, and Mu Lee (2016a) showed that, based on global residual learning, a deeper network called Very Deep Super-Resolution (VDSR) with 20 convolution layers gains better accuracy and convergence speed in image super resolution. At the same time, Kim, Kwon Lee, and Mu Lee (2016b) proposed another network structure, the Deeply-Recursive Convolutional Network (DRCN), with weight-balanced residual learning in a recursive layer design, which also demonstrated excellent results. More recently, the Deep Laplacian Pyramid Networks for Super-Resolution (LapSRN) (Lai, Huang, Ahuja, & Yang, 2017) introduced an innovative cascade (pyramid) structure that produces the output step by step; this network has shown strong results at 8× upscaling and proposed a new loss function. On the other side, methods such as Enhanced Deep Residual Networks for Super-Resolution (EDSR) (Lim, Son, Kim, Nah, & Lee, 2017) and the deep convolutional neural network with selection units for super-resolution (SelNet) (Choi & Kim, 2017) broke through some existing restrictions by training on much higher-quality data (Agustsson & Timofte, 2017), which produces a better resulting system. However, EDSR, SelNet and other methods from the NTIRE challenge (Timofte et al., 2017) achieve only average quality on recent benchmarks when using the same training datasets.
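The global residual learning idea behind VDSR is that the network predicts only the residual (high-frequency detail) between the interpolated LR image and the HR target, and the final output is the sum of the two. A minimal sketch, with nearest-neighbour upsampling via `np.kron` standing in for bicubic interpolation and a zero-output placeholder standing in for the 20-layer branch:

```python
import numpy as np

def upsample_nn(img, scale):
    """Nearest-neighbour upsampling; a stand-in for the bicubic interpolation VDSR uses."""
    return np.kron(img, np.ones((scale, scale)))

def residual_branch(x):
    """Placeholder for the deep convolutional branch; an untrained one would be near zero."""
    return np.zeros_like(x)

lr = np.arange(16.0).reshape(4, 4)       # toy low-resolution image
interp = upsample_nn(lr, 2)              # coarse HR estimate carries the low frequencies
sr = interp + residual_branch(interp)    # global residual learning: output = input + residual
print(sr.shape)
```

The skip connection means the branch only has to model the (sparse) residual, which is what makes very deep super-resolution networks converge quickly.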
In this paper, a novel deep network design using parallel residual learning (DPRN) is proposed to achieve high-quality image super resolution with a new 35-layer deep network layout. In this approach, convolutional layers are grouped into residual combinations and placed into branches. Each layer receives input from the previous two layers, and each branch performs local residual learning before passing its output to the next branch. After up-sampling, the original input conducts global residual learning with the branches' output to produce the final result. The DPRN structure avoids information deterioration during training while increasing the depth of the network, so the convolutional layers in the residual combinations can learn more information. The network also applies the Adam optimizer (Kingma & Ba, 2015) instead of common Stochastic Gradient Descent (SGD) to provide adaptive learning rates for different parameters and reduce resource consumption during training. Experimental results show that the proposed DPRN gains 1.08 dB, 0.21 dB and 0.22 dB over SRCNN, VDSR and LapSRN, respectively, at 2× upscaling on the Set5 (Bevilacqua, Roumy, Guillemot, & Alberi-Morel, 2012) test dataset. Furthermore, the model execution time of DPRN is faster than most existing methods: it averages 27.18 fps across all four test datasets, demonstrating its capability for real-time applications.
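The Adam optimizer mentioned above adapts the learning rate per parameter from running estimates of the gradient's first and second moments. A minimal sketch of one update step follows; the hyperparameter values are the commonly used defaults, an assumption here rather than the settings reported for DPRN, and the toy quadratic objective is for illustration only.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive per-parameter step from running moment estimates."""
    m = b1 * m + (1 - b1) * grad        # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (uncentred variance)
    m_hat = m / (1 - b1 ** t)           # bias correction for zero initialisation
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimise f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array(5.0)
m = v = np.array(0.0)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(float(theta))  # close to the minimum at 0
```

Because the effective step size is normalised by `sqrt(v_hat)`, parameters with noisy or large gradients are updated more cautiously than under plain SGD.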
This paper is organized into five sections. Section 1 gives a brief introduction to single image super resolution, the current state-of-the-art methods and the general content of this paper. Section 2 describes the related work: we review the deep residual network (ResNet) concept (He, Zhang, Ren, & Sun, 2016) and recent state-of-the-art methods such as VDSR (Kim et al., 2016a), DRCN (Kim et al., 2016b) and LapSRN (Lai et al., 2017). Section 3 explains the technical details of the proposed DPRN and how the new network achieves good super resolution quality compared to other existing methods. Finally, the experimental results on benchmark datasets and the conclusions for the proposed method are presented in Sections 4 and 5, respectively.
Related work
Inspired by biological processes (Matsugu, Mori, Mitari, & Kaneda, 2003), CNN-based models have shown great effectiveness and usability in computer vision applications. In this section, we review recent state-of-the-art methods in the SISR area and the key idea of ResNet (He et al., 2016), which provide the basis for DPRN. Fig. 1 shows the network structures of ResNet, VDSR and DRCN, where ReLU (Nair & Hinton, 2010) and batch normalization (Ioffe & Szegedy, 2015) layers have been …
Deep parallel residual network
In this section, the detailed design and technical explanation of DPRN are presented. We introduce a new approach for connecting multiple residual branches. Each branch performs an initial feature mapping at its first convolution, and the information then passes to residual combinations for parallel convolutional training. The first convolutional layer H0 conducts local residual learning with the output from the residual combinations. The result of each branch is delivered to a batch normalization layer (…
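The wiring described above can be sketched schematically: inside a residual combination each layer receives the outputs of the previous two layers, the branch adds its first feature map H0 back to the combination output (local residual learning), and the input is added to the final output (global residual learning). The sketch below is an assumption-laden schematic only: per-pixel linear maps with ReLU stand in for the actual 3×3 convolutions, batch normalization and up-sampling are omitted, and the layer counts are arbitrary.

```python
import numpy as np

def conv(x, w):
    """Per-pixel linear map + ReLU; a stand-in for a 3x3 convolutional layer."""
    return np.maximum(x @ w, 0.0)

def residual_combination(h0, weights):
    """Each layer takes the sum of the outputs of the previous two layers as input."""
    h_prev2, h_prev1 = h0, conv(h0, weights[0])
    for w in weights[1:]:
        h_prev2, h_prev1 = h_prev1, conv(h_prev1 + h_prev2, w)
    return h_prev1

def branch(x, weights):
    h0 = conv(x, weights[0])                       # initial feature mapping H0
    combo = residual_combination(h0, weights[1:])  # parallel convolutional training
    return h0 + combo                              # local residual learning

rng = np.random.default_rng(1)
feat = 8
x = rng.random((6, 6, feat))                       # toy feature map
ws = [rng.standard_normal((feat, feat)) * 0.1 for _ in range(5)]
out = branch(x, ws)
hr = x + out                                       # global residual learning with the input
print(hr.shape)
```

The two-layer fan-in plays the role the text assigns to the residual combinations: every layer sees information from more than one predecessor, so features degrade less as depth grows.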
Experiment
In this section, we compare the proposed DPRN with several current state-of-the-art methods on standard test datasets. The results include quantitative and qualitative evaluation of accuracy, a runtime comparison and the model convergence trend. The experiments show that our DPRN approach achieves the best accuracy for image reconstruction, while the average model execution time meets the requirement for real-time human eye perception (>24 fps). We also discuss the …
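The dB figures quoted throughout (e.g. the 1.08 dB gain over SRCNN) are peak signal-to-noise ratio (PSNR) values, the standard accuracy metric for super resolution, computed from the mean squared error against the ground-truth HR image. A minimal sketch:

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
est = np.full((4, 4), 0.1)   # constant error of 0.1 everywhere -> MSE = 0.01
print(psnr(ref, est))        # 10 * log10(1 / 0.01) = 20.0 dB
```

Higher is better, and because the scale is logarithmic, a fraction-of-a-dB gain such as 0.21 dB over VDSR corresponds to a real reduction in reconstruction error.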
Conclusions
In this article, a novel deep convolutional neural network for single image super resolution, named the Deep Parallel Residual Network (DPRN), has been proposed for superior accuracy and balanced real-time model execution. Our model consists of residual combinations and residual branches that conduct an efficient local and global residual learning algorithm. Each convolutional layer in a residual combination performs parallel learning from the previous two layers. In …
CRediT authorship contribution statement
Feng Sha: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Writing - original draft, Writing - review & editing. Seid Miad Zandavi: Software, Validation. Yuk Ying Chung: Supervision, Writing - review & editing.
References (35)
- Agustsson, E., & Timofte, R. (2017). NTIRE 2017 challenge on single image super-resolution: Dataset and study.
- Bevilacqua, M., Roumy, A., Guillemot, C., & Alberi-Morel, M. L. (2012). Low-complexity single-image super-resolution...
- Bottou, L. (2012). Stochastic gradient descent tricks. In Neural networks: Tricks of the trade.
- Choi, J.-S., & Kim, M. (2017). A deep convolutional neural network with selection units for super-resolution.
- Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Dong, C., et al. (2016). Accelerating the super-resolution convolutional neural network.
- Duchi, J., et al. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research.
- Duchon, C. E. (1979). Lanczos filtering in one and two dimensions. Journal of Applied Meteorology.
- Glasner, D., Bagon, S., & Irani, M. (2009). Super-resolution from a single image.
- Goodfellow, I., et al. (2014). Generative adversarial nets.
- He, K., et al. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition.
- Huang, J.-B., et al. (2015). Single image super-resolution from transformed self-exemplars.
- Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.
- Jia, Y., et al. (2014). Caffe: Convolutional architecture for fast feature embedding.
- Matsugu, M., Mori, K., Mitari, Y., & Kaneda, Y. (2003). Subject independent facial expression recognition with robust face detection using a convolutional neural network. Neural Networks.
- Yang, J., Wright, J., Huang, T., & Ma, Y. (2008). Image super-resolution as sparse representation of raw image patches.