Table 2 Calibration, cross-validation, validation and independent test set 1 (ITS1) results for each algorithm on the 6 datasets

From: Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae

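| Dataset | Algorithm | Samples | No. var | RMSEC | R²Cal | RMSECV | R²CV | LV | RMSEV | RMSEP-ITS1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 | PLS | 178 | 1851 | 2.68 | 0.55 | 3.16 | 0.39 | 10 | 2.90 | 3.88 |
| Dataset 1 | iPLS | 178 | 180 | 2.41 | 0.64 | 2.92 | 0.55 | 10 | 2.97 | 5.52 |
| Dataset 1 | enPLS | 175 | 400 | 1.71 | 0.82 | 2.04 | 0.74 | na | 2.62 | 7.01 |
| Dataset 1 | MASS | 173 | 258 | 2.00 | 0.74 | 2.28 | 0.66 | 10 | 2.93 | 4.04 |
| Dataset 1 | VCPA | 178 | 11 | 2.36 | 0.65 | 2.52 | 0.60 | 10 | 3.11 | 4.64 |
| Dataset 1 | svmLinear | 178 | 1851 | na | na | 2.83 | 0.59 | na | 2.70 | 4.29 |
| Dataset 2 | PLS | 156 | 1851 | 1.85 | 0.83 | 2.28 | 0.74 | 10 | 2.71 | 4.08 |
| Dataset 2 | iPLS | 156 | 120 | 1.54 | 0.93 | 1.20 | 0.90 | 10 | 2.41 | 3.88 |
| Dataset 2 | enPLS | 152 | 300 | 0.81 | 0.97 | 1.05 | 0.95 | na | 1.89 | 4.19 |
| Dataset 2 | MASS | 153 | 385 | 0.87 | 0.96 | 1.10 | 0.94 | 10 | 2.41 | 4.33 |
| Dataset 2 | VCPA | 156 | 10 | 1.88 | 0.82 | 2.08 | 0.78 | 10 | 2.49 | 3.29 |
| Dataset 2 | svmLinear | 156 | 1851 | na | na | 1.89 | 0.81 | na | 2.13 | 4.60 |
| Dataset 3 | PLS | 160 | 1851 | 2.05 | 0.80 | 2.61 | 0.70 | 10 | 2.85 | 5.53 |
| Dataset 3 | iPLS | 160 | 60 | 1.97 | 0.81 | 2.41 | 0.78 | 10 | 2.29 | 5.61 |
| Dataset 3 | enPLS | 158 | 350 | 0.76 | 0.97 | 1.44 | 0.90 | na | 1.96 | 4.29 |
| Dataset 3 | MASS | 158 | 441 | 1.24 | 0.93 | 1.59 | 0.88 | 10 | 2.06 | 4.17 |
| Dataset 3 | VCPA | 160 | 10 | 1.95 | 0.82 | 2.05 | 0.80 | 8 | 2.55 | 3.40 |
| Dataset 3 | svmLinear | 160 | 1851 | na | na | 1.94 | 0.82 | na | 2.23 | 3.76 |
| Dataset 4 | PLS | 200 | 1851 | 2.10 | 0.76 | 2.60 | 0.64 | 10 | 2.43 | 5.18 |
| Dataset 4 | iPLS | 200 | 60 | 1.71 | 0.84 | 2.17 | 0.80 | 10 | 2.41 | 4.05 |
| Dataset 4 | enPLS | 195 | 350 | 0.85 | 0.96 | 1.32 | 0.90 | na | 1.49 | 3.56 |
| Dataset 4 | MASS | 196 | 140 | 1.55 | 0.87 | 1.78 | 0.82 | 10 | 1.98 | 3.95 |
| Dataset 4 | VCPA | 200 | 11 | 2.28 | 0.71 | 2.39 | 0.69 | 7 | 2.72 | 6.44 |
| Dataset 4 | svmLinear | 200 | 1851 | na | na | 1.99 | 0.77 | na | 1.74 | 4.32 |
| Dataset 5 | PLS | 334 | 1851 | 2.94 | 0.50 | 3.16 | 0.43 | 10 | 3.42 | 3.57 |
| Dataset 5 | iPLS | 334 | 180 | 2.50 | 0.64 | 2.76 | 0.58 | 10 | 2.72 | 6.70 |
| Dataset 5 | enPLS | 330 | 200 | 1.77 | 0.82 | 2.07 | 0.75 | na | 3.10 | 4.69 |
| Dataset 5 | MASS | 329 | 466 | 2.20 | 0.71 | 2.36 | 0.67 | 10 | 3.10 | 3.67 |
| Dataset 5 | VCPA | 334 | 12 | 2.82 | 0.54 | 2.89 | 0.51 | 8 | 3.70 | 4.79 |
| Dataset 5 | svmLinear | 334 | 1851 | na | na | 2.66 | 0.63 | na | 2.81 | 3.70 |
| Dataset 6 | PLS | 494 | 1851 | 3.24 | 0.43 | 3.50 | 0.34 | 10 | 3.29 | 3.43 |
| Dataset 6 | iPLS | 494 | 120 | 3.21 | 0.44 | 3.36 | 0.41 | 8 | 2.99 | 5.01 |
| Dataset 6 | enPLS | 479 | 300 | 1.76 | 0.83 | 2.21 | 0.73 | na | 2.77 | 3.33 |
| Dataset 6 | MASS | 492 | 482 | 2.58 | 0.64 | 2.83 | 0.56 | 10 | 3.08 | 2.96 |
| Dataset 6 | VCPA | 494 | 10 | 3.43 | 0.47 | 3.15 | 0.46 | 10 | 3.43 | 2.48 |
| Dataset 6 | svmLinear | 494 | 1851 | na | na | 2.68 | 0.61 | na | 2.78 | 3.49 |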

  1. Note: Each of the six datasets was used to generate models with six regression algorithms. The root mean squared error (RMSE) is reported for the calibration, cross-validation and validation sets, and for independent test set 1. This measure, in units of days, approximates how much prediction error is present across the range of ages in each dataset
  2. Abbreviations: No. var, number of variables used; RMSEC, root mean squared error of calibration; R²Cal, coefficient of determination of calibration; RMSECV, root mean squared error of cross-validation; R²CV, coefficient of determination of cross-validation, based on the actual vs predicted ages averaged over the 5- or 10-fold cross-validation; LV, number of latent variables used in PLS regression (if applicable); RMSEV, root mean squared error of the validation set; RMSEP-ITS1, root mean squared error of prediction for independent test set 1; na, not available for RMSEC and R²Cal (not calculated natively in the implementation of svmLinear) or not applicable for LV (enPLS uses ensemble models, and support vector machines do not use latent variables)
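
The two kinds of summary statistic reported in the table can be illustrated with a minimal Python sketch. It assumes RMSE is the square root of the mean squared difference between actual and predicted ages, and that R² is the usual coefficient of determination (1 minus residual over total sum of squares); the software used in the study may define R² differently, for example as a squared correlation. The function names and the ages below are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the inputs (here, days)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical mosquito ages in days; illustrative only, not data from the study.
actual_ages = [1.0, 3.0, 5.0, 7.0, 9.0]
predicted_ages = [2.0, 3.0, 4.0, 8.0, 9.0]

print(f"RMSE = {rmse(actual_ages, predicted_ages):.3f} days")
print(f"R2   = {r_squared(actual_ages, predicted_ages):.3f}")
```

Applying the same two functions to calibration, cross-validation, validation and independent test predictions would give values analogous to the RMSEC, RMSECV, RMSEV and RMSEP-ITS1 columns, under the assumptions stated above.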