Kennard-Stone method outperforms the Random Sampling in the selection of calibration samples in SNPs and NIR data

This study evaluated how the choice of the training subset influences the construction of predictive models and their validation

Roberta de Amorim Ferreira; Gabriely Teixeira; Luiz Alexandre Peternelli


Scholarcy highlights

  • In predictive modeling, splitting the data set into the part used to build the models and the part used to validate them is a crucial step in model development
  • Near-infrared (NIR) spectroscopy data set: the original NIR spectra of the 256 sugarcane leaf samples are shown in Figure 2
  • For the analysis and modeling of NIR spectroscopy data, some pretreatments should be applied to the data matrix to remove or minimize sources of spectral variability and to improve selectivity
  • It was observed that the values from the KS partition method were, in general, better than those obtained from Random Sampling within all repetitions
  • For the SNP data set, the mean root mean squared error (RMSE) values obtained by the two methods differ significantly at the 1% significance level, which indicates the superiority of the KS approach for splitting a data set for prediction purposes
  • Different splits of the data set into training and testing subsamples lead to predictive models that are statistically different
  • The KS method may be a good alternative to commonly used partition methods for SNP data
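
The highlights name the Kennard-Stone (KS) and Random Sampling partitions but do not show how either is computed. The sketch below follows the usual textbook formulation of KS: start from the two samples that are farthest apart, then repeatedly add the sample whose distance to its nearest already-selected sample is largest. It is an illustration only, not the authors' implementation; the function names `kennard_stone` and `random_split`, the use of Euclidean distance on the raw rows, and the toy data are all assumptions.

```python
import math
import random


def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by the classic Kennard-Stone rule.

    X is a list of equal-length numeric rows (hypothetical input format);
    distances are Euclidean, which is an assumption of this sketch.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Full pairwise distance matrix (acceptable for a few hundred samples).
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Step 1: start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # For each remaining sample, track the distance to its nearest selected one.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    # Step 2: repeatedly add the sample farthest from the selected set.
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    return selected, remaining


def random_split(n, n_train, seed=0):
    """Baseline: shuffle the indices and cut (simple random sampling)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


# Toy one-dimensional data, purely illustrative.
X = [[0.0], [1.0], [2.0], [10.0]]
ks_train, ks_test = kennard_stone(X, 3)
rs_train, rs_test = random_split(len(X), 3)
```

On the toy data, KS first picks the two extremes (indices 0 and 3) and then index 2, so the calibration set spans the range of the data by construction, while the random split gives no such guarantee. A comparison like the one summarized above would repeat both splits, fit a model on each training subset, and compare the resulting test-set RMSE values.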
