Biosystems Engineering

Volume 174, October 2018, Pages 50-65

Research Paper
Transfer learning for the classification of sugar beet and volunteer potato under field conditions

https://doi.org/10.1016/j.biosystemseng.2018.06.017

Highlights

  • Transfer learning provided very promising performance for weed/crop classification.

  • The highest classification accuracy of 98.7% was obtained with VGG-19.

  • All scenarios and pre-trained networks were feasible for real-time applications.

  • Data augmentation may improve the classification accuracy.

Abstract

Classification of weeds amongst cash crops is a core procedure in automated weed control. Within the EU SmartBot project, which addressed volunteer potato control in sugar beet, the aim was to control more than 95% of the volunteer potatoes while ensuring that less than 5% of the sugar beet plants were undesirably controlled. A promising way to meet these requirements is deep learning. Training an entire network from scratch, however, requires a large dataset and a substantial amount of time, and in this situation transfer learning can be a promising solution. This study first evaluates a transfer learning procedure with three different implementations of AlexNet and then assesses the performance difference amongst six network architectures: AlexNet, VGG-19, GoogLeNet, ResNet-50, ResNet-101 and Inception-v3. All networks had been pre-trained on the ImageNet dataset and were used to classify images of sugar beet and volunteer potato taken under ambient varying light conditions in agricultural environments. Amongst the different implementations of AlexNet, the highest classification accuracy was 98.0%, obtained with an AlexNet architecture modified to generate binary output. Amongst the different networks, the highest classification accuracy of 98.7% was obtained with VGG-19 modified to generate binary output. Transfer learning proved to be effective and showed robust performance on plant images acquired in different periods of various years on two types of soil. All scenarios and pre-trained networks were feasible for real-time applications (classification time < 0.1 s). Classification, however, is only one step in weed detection, and a complete pipeline for weed detection may reduce the overall performance.

Introduction

Volunteer potato is a source of potato blight (Phytophthora infestans) and viral diseases, and its presence in a sugar beet field can reduce the crop yield by 30% (O'Keeffe, 1980). Sugar beet farmers in the Netherlands have a statutory obligation to control volunteer potato plants to no more than two remaining plants per m² by the 1st of July (Nieuwenhuizen, 2009). For the automated control of volunteer potato in a sugar beet field, a vision-based, small-sized robot was developed within the EU-funded SmartBot project. Due to the small size of the robot and the required battery operation, the platform had to operate without additional infrastructure and needed to robustly detect weeds in a scene fully exposed to ambient lighting conditions. Additional infrastructure such as a hood and lighting equipment, as used for instance by Nieuwenhuizen, Hofstee, and Van Henten (2010) and Lottes et al. (2016), was therefore not considered viable. The robotic platform is shown in Fig. 1.

The classification of weeds amongst cash crops, i.e. weed/crop discrimination, is the core procedure for automated weed detection. In a pipeline for weed detection, vegetation segmentation is followed by classification of the segmented vegetation into weeds and crop. This classification step traditionally involves two aspects: selection of the discriminative features as well as selection of the classification techniques (Suh, Hofstee, IJsselmuiden, & Van Henten, 2016).

Regarding the features used for discrimination, many studies have used colour, shape (biological morphology) and texture, either individually or in combination (Ahmed et al., 2012; Åstrand & Baerveldt, 2002; Gebhardt & Kühbauch, 2007; Pérez et al., 2000; Persson & Åstrand, 2008; Slaughter et al., 2008; Swain et al., 2011; Zhang et al., 2010). However, these features have shown poor performance under widely varying natural light conditions (Suh, Hofstee, IJsselmuiden, & Van Henten, 2018). Other features such as the Scale Invariant Feature Transform (SIFT) (Lowe, 2004) and Speeded Up Robust Features (SURF) (Bay, Ess, Tuytelaars, & Van Gool, 2008) have shown their potential for the classification of plant species in recent studies (Kazmi et al., 2015; Suh et al., 2018; Wilf et al., 2016). However, the highest classification accuracy obtained with SIFT and SURF in Suh et al. (2018) still fell short of the requirements set by the previous study of Nieuwenhuizen (2009): the resulting automatic weeding system should effectively control more than 95% of the volunteer potatoes while ensuring less than 5% undesired control of the sugar beet plants. Therefore, within the framework of the EU SmartBot project, a solution was needed that achieves a classification accuracy of 95% or more as well as a misclassification of both sugar beet [false negative (FN)] and volunteer potato [false positive (FP)] of less than 5%. In addition, a classification time of less than 0.1 s per image was needed because these algorithms are to be used in a real-time field application.
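To make these requirements concrete, the following minimal sketch (illustrative only, not code from this study) computes the three measures from binary counts, taking sugar beet as the positive class in line with the FN/FP labelling above:

    # Illustrative sketch (assumption: sugar beet is the positive class,
    # matching the FN/FP labelling in the text above).
    def classification_metrics(tp, fn, fp, tn):
        total = tp + fn + fp + tn
        accuracy = (tp + tn) / total  # requirement: >= 0.95
        fn_rate = fn / (tp + fn)      # sugar beet misclassified: must be < 0.05
        fp_rate = fp / (fp + tn)      # volunteer potato misclassified: must be < 0.05
        return accuracy, fn_rate, fp_rate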

A promising way to meet these requirements is to use a deep learning approach. In recent studies, deep neural networks have shown their potential in an agricultural context for plant identification and classification. Grinblat, Uzal, Larese, and Granitto (2016) used a convolutional neural network (ConvNet, or CNN), a specific type of deep network, for plant identification from leaf vein patterns. Although binary images of vein patterns were used, the study showed the potential of ConvNets for plant identification. Sun, Liu, Wang, and Zhang (2017) used a residual network (ResNet), one of the most common ConvNet architectures for classification tasks, for plant species identification on images acquired with mobile phones. A classification accuracy of 91.78% was obtained, but 10,000 images were needed to train the network. Dyrmann, Karstoft, and Midtiby (2016) classified 22 plant species using a ConvNet and obtained a classification accuracy of 86.2%. In their study, images were acquired under controlled conditions, quite different from the conditions confronting SmartBot, and more than 10,000 images were needed to train the network from scratch. Obtaining such a large number of images, however, is a challenging task in agricultural fields (Xie, Jean, Burke, Lobell, & Ermon, 2016). Moreover, training an entire ConvNet from scratch requires a substantial amount of time (Jean et al., 2016; Yosinski et al., 2014) and is an expensive task that may be hard to realise in practice. In such a situation, transfer learning can be a promising solution.

The objective and novelty of this paper are to address crop/weed classification in uncontrolled agricultural environments while reducing the required amount of training data and training time through transfer learning.

Transfer learning has proven successful in real-world applications (Jean et al., 2016; Shin et al., 2016; Sun et al., 2014; Xie et al., 2016). According to Goodfellow, Bengio, and Courville (2016), transfer learning refers to exploiting what has been learned in one setting in another, different setting. In transfer learning, a base network is trained on a base dataset and task, and the (pre-)trained network is then reused for another task (Yosinski et al., 2014). Interestingly, although a ConvNet is trained on a specific dataset to perform a specific task, the generic features extracted by the ConvNet appear to be powerful and to perform very well on other classification tasks (Donahue et al., 2014; Razavian et al., 2014). Transfer learning has recently been applied in several agricultural applications such as disease detection (Fuentes, Yoon, Kim, & Park, 2017); however, the transfer learning procedure has not yet been investigated in detail for plant classification.
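As a concrete illustration of this reuse, the minimal sketch below loads an ImageNet-pre-trained AlexNet and replaces only its task-specific output layer; PyTorch/torchvision is an assumed stand-in here, not the tooling used in this study:

    import torch.nn as nn
    from torchvision import models

    # Base network, pre-trained on the base task (ImageNet classification).
    model = models.alexnet(weights="IMAGENET1K_V1")

    # Keep the learned feature hierarchy; replace only the 1000-class output
    # layer so the network produces the two target classes.
    in_features = model.classifier[6].in_features    # 4096
    model.classifier[6] = nn.Linear(in_features, 2)  # sugar beet vs volunteer potato

The modified network can then be fine-tuned on the much smaller target dataset, which is the essence of the scenarios evaluated in this study.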

In this study, firstly, three different transfer learning scenarios were evaluated using AlexNet (Krizhevsky, Sutskever, & Hinton, 2012). Then, the performance of the following six pre-trained networks was compared: AlexNet, VGG-19 (Simonyan & Zisserman, 2015), GoogLeNet (Szegedy, Liu et al., 2015), ResNet-50 and ResNet-101 (He, Zhang, Ren, & Sun, 2016a), and Inception-v3 (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2015). The classification performance in both evaluations was analysed regarding classification accuracy as well as training and classification time, given the fact that this approach should be used in a real-time field application.
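Because the real-time requirement is expressed as classification time per image, a simple timing harness along the following lines can be used to compare architectures; this is an illustrative sketch using torchvision model definitions, not the benchmark code of this study:

    import time
    import torch
    from torchvision import models

    # The six architectures compared in this study (ImageNet weights).
    nets = {
        "AlexNet": models.alexnet(weights="IMAGENET1K_V1"),
        "VGG-19": models.vgg19(weights="IMAGENET1K_V1"),
        "GoogLeNet": models.googlenet(weights="IMAGENET1K_V1"),
        "ResNet-50": models.resnet50(weights="IMAGENET1K_V1"),
        "ResNet-101": models.resnet101(weights="IMAGENET1K_V1"),
        "Inception-v3": models.inception_v3(weights="IMAGENET1K_V1"),
    }

    x = torch.randn(1, 3, 299, 299)  # dummy image; Inception-v3 needs 299 x 299
    with torch.no_grad():
        for name, net in nets.items():
            net.eval()
            net(x)                   # warm-up pass, excluded from timing
            t0 = time.perf_counter()
            net(x)
            print(f"{name}: {time.perf_counter() - t0:.3f} s per image")

A single forward pass is timed here for brevity; averaging over many images would give a fairer estimate.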

The first section of this paper describes ConvNets and their popular architectures. The following section describes three different transfer learning scenarios. The performance assessment amongst six pre-trained networks is then described, followed by the experimental setup, including the collection of the field image dataset and the performance measures used. Finally, the experimental results are presented with the corresponding discussion, leading to the conclusions.

Section snippets

Convolutional neural networks and popular architectures

Convolutional neural networks (ConvNets, or CNNs) are a specialised type of deep neural networks that are designed to process multi-dimensional data such as signals (1D), images (2D) and videos (3D) (LeCun et al., 2015, LeCun et al., 1998). ConvNets have gained huge success in many applications since AlexNet won the ImageNet competition in 2012 with a breakthrough performance (Sainath et al., 2013, Schwing and Urtasun, 2015, Sermanet et al., 2013, Zeiler and Fergus, 2014). Motivated by the

Three scenarios for transfer learning

Transfer learning aims to overcome the shortage of training data and time by transferring information or features extracted from pre-trained ConvNets (Oquab et al., 2014; Weiss et al., 2016). AlexNet was used as the pre-trained ConvNet. Two options are available in transfer learning: using the ConvNet as a feature extractor and using the ConvNet as a classifier. Based on these options, three transfer learning scenarios were formulated from the following hypotheses:


Classification performance amongst different ConvNet architectures

The following six pre-trained deep networks were evaluated to assess the classification performance amongst different ConvNet architectures: AlexNet, VGG-19, GoogLeNet, ResNet-50, ResNet-101 and Inception-v3. Each network was modified to produce binary classification output of sugar beet and volunteer potato, as was done in scenario 2 with AlexNet (Section 3.2), by removing the original last layer and adding two new FC layers. Then, each modified network was fine-tuned with 500 randomly

Field image collection and image dataset

For crop image acquisition, a camera was mounted at a height of 1 m and perpendicular to the ground on a custom-made frame carried by a mobile platform (Husky A200, Clearpath, Canada) (Fig. 6). The camera (NSC1005c, NIT, France) was equipped with two Kowa 5 mm lenses (LM5JC10M, Kowa, Japan) with a fixed aperture. The camera was set to operate in automatic acquisition mode with default settings. The camera had two identical complementary metal-oxide semiconductor (CMOS) sensors providing left

Scenario 1 – AlexNet as a fixed feature extractor

Three classifiers were trained, using supervised learning, based on the 4096 feature values that were extracted from each of AlexNet's two FC layers FC6 and FC7 separately. The classification performance was evaluated with TP, FN, FP, TN, classification accuracy, training time and classification time as shown in Table 2.
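A minimal sketch of this fixed-feature-extractor setup is given below, with PyTorch/torchvision and scikit-learn as assumed stand-ins for the tools actually used in the study (whether the FC6 activations were taken before or after the ReLU is also an assumption here):

    import torch
    from torchvision import models
    from sklearn.svm import SVC

    alexnet = models.alexnet(weights="IMAGENET1K_V1").eval()

    # Truncate the classifier after FC6 (+ ReLU); dropout is inert in eval
    # mode, so the forward pass yields the 4096 feature values per image.
    fc6_extractor = torch.nn.Sequential(
        alexnet.features, alexnet.avgpool, torch.nn.Flatten(),
        *list(alexnet.classifier.children())[:3],
    )

    def fc6_features(batch):                     # batch: (N, 3, 224, 224)
        with torch.no_grad():
            return fc6_extractor(batch).numpy()  # (N, 4096) feature matrix

    # One of the external classifiers: an SVM with a quadratic kernel
    # (in scikit-learn terms, a degree-2 polynomial kernel).
    svm = SVC(kernel="poly", degree=2)
    # svm.fit(fc6_features(train_images), train_labels)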

Using the features from FC6, the highest classification accuracy of 97.0% was obtained with an SVM with a quadratic kernel; while the lowest classification accuracy of 90.8% was

Discussion

The classification performance obtained using transfer learning in this study exceeds previously reported accuracies, for instance those of Persson and Åstrand (2008), Nieuwenhuizen et al. (2010), and Suh et al. (2018). Given the widely varying circumstances in natural fields, the highest classification accuracy (98.7%) obtained in this study is, to the best of our knowledge, considerably better than that of any other approach reported in the literature for crop and weed classification. To further

Conclusion

This study evaluated a transfer learning procedure and assessed the performance amongst different ConvNet architectures for the classification of sugar beet and volunteer potato under ambient varying light conditions. Three different implementation scenarios were assessed using AlexNet, and the performance of the following six pre-trained networks was compared: AlexNet, VGG-19, GoogLeNet, ResNet-50, ResNet-101 and Inception-v3.

Transfer learning provided very promising performance for the

Acknowledgements

The work presented in this paper was carried out within the Agrobot part of the SmartBot project and was funded by Interreg IVa, the European Fund for Regional Development of the European Union, and the Product Board for Arable Farming. We thank Gerard Derks at the experimental farm Unifarm of Wageningen University for arranging and managing the experimental fields.

References (59)

  • D.C.C. Slaughter et al., Autonomous robotic weed control systems: A review, Computers and Electronics in Agriculture (2008)
  • H.K. Suh et al., Sugar beet and volunteer potato classification using bag-of-visual-words model, scale-invariant feature transform, or speeded up robust feature descriptors and crop row information, Biosystems Engineering (2018)
  • Y. Sun et al., Deep learning for plant identification in natural environment, Computational Intelligence and Neuroscience (2017)
  • K.C. Swain et al., Weed identification using an automated active shape matching (AASM) technique, Biosystems Engineering (2011)
  • J. Wang et al., Transferring pre-trained deep CNNs for remote scene classification with general features learned from linear PCA network, Remote Sensing (2017)
  • M.Z. Alom et al., The history began from AlexNet: A comprehensive survey on deep learning approaches (2018)
  • B. Åstrand et al., An agricultural mobile robot with vision-based perception for mechanical weed control, Autonomous Robots (2002)
  • A. Canziani et al., An analysis of deep neural network models for practical applications (2016)
  • L. Chen et al., Adaptive local receptive field convolutional neural networks for handwritten Chinese character recognition, Pattern Recognition (Communications in Computer and Information Science) (2014)
  • J. Donahue et al., DeCAF: A deep convolutional activation feature for generic visual recognition
  • A. Fuentes et al., A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition, Sensors (Switzerland) (2017)
  • S. Gebhardt et al., A new algorithm for automatic Rumex obtusifolius detection in digital images using colour and texture features and the influence of image resolution, Precision Agriculture (2007)
  • Y. Gong et al., Deep convolutional ranking for multilabel image annotation
  • I. Goodfellow et al., Deep learning (2016)
  • K. He et al., Deep residual learning for image recognition
  • K. He et al., Identity mappings in deep residual networks
  • F. Hu et al., Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery, Remote Sensing (2015)
  • N. Jean et al., Combining satellite imagery and machine learning to predict poverty, Science (2016)
  • R. Jozefowicz et al., An empirical exploration of recurrent network architectures