A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification

https://doi.org/10.1016/j.isprsjprs.2017.11.004

Abstract

High-resolution remote sensing (HRRS) image scene classification plays a crucial role in a wide range of applications and has been receiving significant attention. Recently, remarkable efforts have been made to develop a variety of approaches for HRRS scene classification, among which deep-learning-based methods achieve considerable performance gains over previous state-of-the-art methods. However, deep-learning-based methods face a severe limitation: a great number of manually annotated HRRS samples are needed to obtain a reliable model, and sufficiently large annotated datasets are still scarce in the field of remote sensing. In addition, building a large-scale HRRS image dataset is challenging due to the abundant diversities and variations in HRRS images. To address this problem, we propose a semi-supervised generative framework (SSGF), which combines deep learning features, a self-label technique, and a discriminative evaluation method to complete the tasks of scene classification and dataset annotation. On this basis, we further develop an extended algorithm (SSGA-E) and evaluate it through extensive experiments. The experimental results show that SSGA-E outperforms most fully-supervised and semi-supervised methods: it achieves the third-best accuracy on the UCM dataset and the second-best accuracy on the WHU-RS, NWPU-RESISC45, and AID datasets. These impressive results demonstrate that the proposed SSGF and its extension are effective in solving the problem of lacking annotated HRRS datasets: they learn valuable information from unlabeled samples to improve classification ability and produce a reliable annotated dataset for supervised learning.

Introduction

The presently available technologies for earth observation (e.g., multi/hyper-spectral imaging, synthetic aperture radar) generate many types of airborne and satellite images with high resolutions (spatial, spectral, and temporal) (Cheng et al., 2017, Plaza et al., 2011, Gamba, 2013, Cantalloube and Nahum, 2013, Lu et al., 2017, Li et al., 2016, Yuan et al., 2017, Liu et al., 2017). The main task has shifted to intelligent earth observation through massive high-resolution remote sensing (HRRS) images, that is, intelligently classifying land use and land cover (LULC) scenes acquired from airborne or space platforms (Gómez-Chova et al., 2015). Remote sensing image scene classification, which plays an important role in earth observation and is receiving significant attention, categorizes scene images into an independent set of semantic-level LULC class labels according to image content. During the past few decades, remarkable efforts have been made to develop various methods for HRRS image scene classification in a wide range of applications (Wang et al., 2016, Li and Wang, 2015, Dou et al., 2014, Cheng and Han, 2016, Ma et al., 2016, Yu et al., 2016, Dópido et al., 2013, Li et al., 2014), such as LULC determination, urban planning, environmental protection, and crop monitoring.

Deep-learning-based methods, which have improved upon state-of-the-art records in many research fields, have been widely applied to natural image classification, object recognition, natural language processing, and text processing (Chatfield et al., 2014, Simonyan and Zisserman, 2015, He et al., 2016, Krizhevsky et al., 2012, Szegedy et al., 2015). Due to their remarkable performance, these methods have been used to analyze HRRS images and have achieved more impressive results than traditional shallow methods for scene classification (Castelluccio et al., xxxx, Hu et al., 2015, Zhang et al., 2016, Zhao and Du, 2016, Luo et al., 2017, Wang et al., 2017, Cheng et al., 2016). Although satellite and aerial images are dramatically increasing in both quality and quantity, deep-learning-based methods in a fully-supervised learning fashion (Zhang et al., 2015) require a large-scale, manually annotated dataset to obtain ideal classifiers. However, there is no HRRS dataset of a scale comparable to ImageNet (Deng et al., 2009) that meets the requirements of deep-learning-based methods in remote sensing. Additionally, in contrast to natural images, an annotated HRRS dataset needs to be labeled by experts and engineers, which greatly increases the difficulty of acquiring a large-scale annotated dataset of HRRS images.

The acquisition of unlabeled images is much easier than the acquisition of a manually annotated dataset. Hence, using the original, unlabeled data to generate labeled data could solve the problem of lacking labeled samples. The self-label technique (Triguero et al., 2015) is one available solution, which aims to obtain an enlarged annotated dataset from unlabeled samples via semi-supervised learning. However, existing self-label methods have a significant weakness: they annotate samples using handcrafted features. Handcrafted features are designed by experts and engineers to solve classification tasks, and they suffer from severe limitations. On the one hand, extensive domain expertise and engineering skill are needed to design them. On the other hand, their representational capability is bounded by the ingenuity of the human designer. Ideal features should be generated automatically and have powerful representation ability. Fortunately, deep learning features, which are learned from data by a deep neural network architecture with remarkable performance, can address the limitations of handcrafted features. Therefore, we replace the handcrafted features of self-label techniques with deep learning features in this work.

We propose a semi-supervised generative framework with deep learning features (SSGF) for HRRS image scene classification to address the lack of sufficiently annotated HRRS datasets. The details of this framework are summarized below:

  • 1.

    Deep convolutional neural network (CNN) features are transferred to replace traditional handcrafted features due to their powerful representation ability. This enables the discovery of the rich diversities and variations hidden in HRRS images and provides a better understanding of scene classes.

  • 2.

    A co-training (Blum and Mitchell, 1998) self-label method is used to learn valuable information from unlabeled samples and obtain an annotated dataset. It not only makes use of low-confidence samples but also suppresses the problem of misclassification.

  • 3.

    A discriminative evaluation method enhances the classification of confusing classes with similar texture structures and visual features, which further improves the reliability of the generated samples.

By combining these three techniques, the proposed SSGF is able to learn effective information from unlabeled data to improve classification ability. Therefore, with a limited number of annotated samples and a large number of unlabeled samples, an ideal model can be obtained, and the enlarged dataset generated by the model becomes available for supervised learning. To evaluate the performance of SSGF, we further develop an extended algorithm (SSGA-E). The major contributions of this work are summarized as follows:
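To make the self-labeling loop above concrete, the following is a minimal sketch of a co-training round over two feature "views." Simple nearest-centroid classifiers stand in for the paper's CNN-feature classifiers, and the function names and confidence threshold `tau` are our own illustrative choices, not the paper's:

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # One mean feature vector per class; assumes every class has at
    # least one labeled sample.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_with_confidence(centroids, X):
    # Distance to each centroid, turned into a softmax-style confidence.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    p = np.exp(-d)
    p /= p.sum(axis=1, keepdims=True)
    return p.argmax(axis=1), p.max(axis=1)

def co_training_self_label(Xa, Xb, y, labeled, n_classes, tau=0.8, rounds=5):
    """Two feature views (Xa, Xb) alternately pseudo-label the unlabeled
    samples they are confident about (confidence >= tau); each view's new
    labels enlarge the labeled pool available to the other view."""
    y, labeled = y.copy(), labeled.copy()
    for _ in range(rounds):
        grew = False
        for X in (Xa, Xb):
            c = fit_centroids(X[labeled], y[labeled], n_classes)
            pred, conf = predict_with_confidence(c, X)
            take = (~labeled) & (conf >= tau)
            y[take], labeled[take] = pred[take], True
            grew = grew or bool(take.any())
        if not grew:  # stop once no view adds new pseudo-labels
            break
    return y, labeled
```

In the actual framework, the two views would come from different deep feature extractors, and the discriminative evaluation step would further screen the pseudo-labels before they enter the enlarged dataset.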

  • 1.

    Focusing on the problem of insufficient annotated datasets in remote sensing, we propose a semi-supervised generative framework. It improves the capability of scene classification by learning from unlabeled instances and generates a reliable annotated dataset for supervised learning.

  • 2.

    On this basis, we further develop an extended algorithm. We have performed extensive experiments to evaluate the proposed method on four public HRRS datasets. The experimental results show that the proposed method outperforms most fully-supervised methods, achieving the third-best accuracy on the UCM dataset and the second-best accuracy on the WHU-RS, NWPU-RESISC45, and AID datasets. These results demonstrate that the proposed SSGA-E is effective in solving the problem of insufficient annotated datasets for HRRS image scene classification.

The remainder of this paper is organized as follows: In Section 2, we briefly review related work on deep learning methods and scene classification. In Section 3, the deep neural networks used in this work are briefly introduced. The semi-supervised generative framework and an extended algorithm are proposed and explained in detail in Section 4. We present and discuss the experimental results in Section 5. Finally, conclusions are drawn in Section 6.

Section snippets

Related work and background

In the early 1970s, the spatial resolution of satellite images was extremely coarse, and pixel sizes were similar to or larger than the objects of interest (Janssen and Middelkoop, 1992). Therefore, available methods for the analysis of remote sensing images were based on the pixel level since the early 1970s (Blaschke et al., 2008, Blaschke, 2010). With the advance of remote sensing technology, a greater number of HRRS images have become obtainable, such as the UCMerced Land Use dataset (Yang and Newsam,

Deep Convolutional Neural Networks (CNNs)

In this section, we first discuss the typical structure of a CNN and the back-propagation algorithm used to compute the gradients with respect to the weight parameters of the network. Next, all CNNs used in this paper are briefly introduced.
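As a minimal illustration of the gradient update that back-propagation applies at a network's output layer, the sketch below trains a single softmax layer on fixed features by gradient descent. This is a generic textbook example, not one of the paper's networks:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Train a softmax output layer with cross-entropy loss.

    The gradient w.r.t. the weights is X^T (p - onehot(y)) / n, which is
    exactly the error signal back-propagation produces at the output
    layer of a CNN before it flows to earlier layers.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]          # one-hot targets
    for _ in range(epochs):
        p = softmax(X @ W + b)        # forward pass
        gW = X.T @ (p - Y) / n        # backward pass: weight gradient
        gb = (p - Y).mean(axis=0)     # bias gradient
        W -= lr * gW                  # gradient-descent update
        b -= lr * gb
    return W, b
```

In a full CNN, the same chain-rule machinery propagates `p - Y` backward through the convolutional layers; here it stops at a single linear layer.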

A semi-supervised generative framework for HRRS scene classification

In this section, we first mathematically define the problem. Next, we propose the semi-supervised generative framework and an extended algorithm in detail.
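A standard formalization of the semi-supervised setup the snippet refers to (the notation here is illustrative, not necessarily the paper's own):

```latex
% A small labeled set and a large unlabeled set, with C scene classes:
\mathcal{L} = \{(x_i, y_i)\}_{i=1}^{n_l}, \qquad
\mathcal{U} = \{x_j\}_{j=1}^{n_u}, \qquad
n_u \gg n_l, \qquad y_i \in \{1, \dots, C\}.
% The framework learns a classifier f and pseudo-labels \hat{y}_j for a
% confident subset \mathcal{U}' \subseteq \mathcal{U}, producing the
% enlarged training set
% \mathcal{L} \cup \{(x_j, \hat{y}_j) \mid x_j \in \mathcal{U}'\}.
```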

Experimental results

In this section, we detail the series of experiments conducted to evaluate the performance of the proposed SSGA-E for annotating remote sensing images on four HRRS image datasets. The detailed experimental setup and the results, together with analysis, are presented below.

Conclusion and future work

In this paper, we focus on the problem of insufficient manually labeled samples in remote sensing and develop a semi-supervised generative framework that uses a limited number of labeled samples together with extensive unlabeled samples to build reliable annotated datasets for HRRS scene classification. The proposed framework combines deep-learning-based features, the co-training-based self-label method, and the discriminative evaluation method to complete the annotation task. The

Acknowledgments

We gratefully acknowledge the editor, associate editor, and reviewers for their comments in helping us to improve this work. We also acknowledge the support of the National Natural Science Foundation of China (No. 41571413 and No. 41701429); the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG170625); and NVIDIA Corporation for the donation of the Titan X GPU used in this research.

References (60)

  • A. Blum et al. Combining labeled and unlabeled data with co-training.
  • H.M. Cantalloube et al. Airborne SAR-efficient signal processing for very high resolution. Proc. IEEE (2013).
  • Castelluccio, M., Poggi, G., Sansone, C., Verdoliva, L. Land Use Classification in Remote Sensing Images by...
  • Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A., 2014. Return of the devil in the details: Delving deep into...
  • G. Cheng et al. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Rem. Sens. (2016).
  • G. Cheng et al. Remote sensing image scene classification: benchmark and state of the art. Proc. IEEE (2017).
  • N. Dalal et al. Histograms of oriented gradients for human detection.
  • E. Davenport et al. Knowledge management: semantic drift or conceptual shift? J. Educ. Lib. Inf. Sci. (2000).
  • J. Deng et al. ImageNet: a large-scale hierarchical image database.
  • I. Dópido et al. Semisupervised self-learning for hyperspectral image classification. IEEE Trans. Geosci. Rem. Sens. (2013).
  • M. Fu et al. Unsupervised feature learning for scene classification of high resolution remote sensing image.
  • P. Gamba. Human settlements: a global challenge for EO data processing and interpretation. Proc. IEEE (2013).
  • L. Gómez-Chova et al. Multimodal classification of remote sensing images: a review and future directions. Proc. IEEE (2015).
  • J. Han et al. Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans. Geosci. Rem. Sens. (2015).
  • He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE...
  • G.E. Hinton et al. A fast learning algorithm for deep belief nets. Neural Comput. (2006).
  • F. Hu et al. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Rem. Sens. (2015).
  • L.L. Janssen et al. Knowledge-based crop classification of a Landsat Thematic Mapper image. Int. J. Rem. Sens. (1992).
  • Y. Jia et al. Caffe: convolutional architecture for fast feature embedding.
  • I. Jolliffe. Principal Component Analysis (2002).