In this study, we developed practical models for predicting malaria vector breeding sites using topographic variables. The two models, based on different DEMs, performed similarly, and both achieved good accuracy at the training and testing sites. The present study confirmed that combining multiple topographic variables is effective for predicting the larval habitats of malaria vectors. Elevation and slope directly affect surface water flow, since water flows from higher to lower elevations. More complex variables such as plan and profile curvature, CI, TWI, and TPI are also related to surface water flow. These variables are correlated but capture somewhat different aspects of the terrain; thus, the multivariate logistic model performed much better than the univariate models. Previous studies using both topographic and land-cover variables have predicted vector breeding sites with high accuracy. For example, one of the models developed by Clennon et al. had an AUC >0.95. The accuracy of our models was not as high. However, our models identified high-risk areas that made up about half of the total area but contained nearly 80% of the breeding sites, which would help reduce the target areas for vector control. We consider this an acceptable level of accuracy for predicting breeding sites.
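The two accuracy measures mentioned above (AUC, and the share of breeding sites captured by the high-risk half of the area) can both be computed from a grid of predicted probabilities. The sketch below is illustrative only: it assumes each DEM cell has a predicted probability and a 0/1 flag for an observed breeding site, assumes all cells have equal area, uses invented toy values, and the function names `auc` and `capture_fraction` are hypothetical. The AUC is computed with the rank-based (Mann-Whitney) formula rather than by tracing the ROC curve.

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    cell with a breeding site scores higher than a randomly chosen cell
    without one, with ties counted as one half. O(n^2), fine for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need cells both with and without breeding sites")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def capture_fraction(scores, labels, area_fraction):
    """Flag the highest-scoring cells until they make up `area_fraction` of all
    cells (equal-area cells assumed) and return the share of breeding-site
    cells that fall inside the flagged area."""
    total_sites = sum(labels)
    if total_sites == 0:
        raise ValueError("no breeding sites in labels")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = order[:round(area_fraction * len(scores))]
    return sum(labels[i] for i in flagged) / total_sites


# Ten toy cells (invented values): predicted probability and site presence.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))          # 0.833 (20 of 24 pairs)
print(capture_fraction(scores, labels, 0.5))  # 0.75 (3 of 4 sites)
```

In this toy grid, flagging the top half of the cells captures three of the four breeding-site cells, which is the same kind of statement as "about half of the area contained nearly 80% of the sites".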
Our models have several fundamental practical advantages over those of previous studies. First, because our models require only freely available DEMs that cover most of the Earth's land surface (including all tropical areas), they can be applied even in places where good-quality satellite images are not available. Furthermore, the present analyses were mainly conducted using free software. This permits easy evaluation and application of our models in a prospective study area. The only necessary resources are a personal computer, an internet connection, and moderate computer skills. This is an economic advantage of our models, particularly in countries with limited resources. Second, our models were developed using the results of an extensive survey over the entire area of a large island (42 km²). Our study site was larger than those of previous studies [12, 13]. Given that the island encompasses a range of topographic features and the survey identified various types of breeding sites, we expect that our models may be applicable to a variety of environments with differing topography and different types of natural breeding sites. Third, in the present study we tested the accuracy of the model predictions in an independent area and confirmed good performance. Assessing the predictiveness of models in an area independent of the training site is considered the best approach for testing models and evaluating their applicability [48, 49]. Previous studies did not examine model performance using independent data sets [13, 16].
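Independent-area validation means fitting on one spatial block and scoring a different block, rather than splitting cells at random; with a random split, neighbouring cells (which tend to resemble each other) end up in both sets and accuracy can look better than it is. The sketch below is a minimal illustration under stated assumptions: the cells are invented toy data with a northing coordinate and two predictors (slope and TWI), the fit is a plain gradient-descent logistic regression standing in for whatever statistical package was actually used, and `BOUNDARY_KM` is a hypothetical boundary between the training and testing areas.

```python
import math
import random


def linear(b, x):
    """Linear predictor b0 + b1*x1 + ... + bk*xk (the logit)."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Logistic regression by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    b = [0.0] * (k + 1)  # b[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-linear(b, xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        b = [bj - lr * g / n for bj, g in zip(b, grad)]
    return b


def auc(scores, labels):
    """Rank-based AUC; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def features(cell):
    """Centre and scale slope (0-30 deg) and TWI (2-14) to roughly [-0.5, 0.5]."""
    return [(cell[1] - 15.0) / 30.0, (cell[2] - 8.0) / 12.0]


# Invented toy cells: (northing_km, slope_deg, twi, has_site). Sites are made
# more likely on gentle slopes with high TWI.
rng = random.Random(0)
cells = []
for _ in range(400):
    northing, slope, twi = rng.uniform(0, 10), rng.uniform(0, 30), rng.uniform(2, 14)
    p_true = 1.0 / (1.0 + math.exp(-(1.0 - 0.2 * slope + 0.4 * (twi - 8.0))))
    cells.append((northing, slope, twi, 1 if rng.random() < p_true else 0))

# Spatial hold-out: fit on the northern block only, evaluate on the southern block.
BOUNDARY_KM = 5.0  # hypothetical boundary between training and testing areas
train = [c for c in cells if c[0] >= BOUNDARY_KM]
test = [c for c in cells if c[0] < BOUNDARY_KM]

coef = fit_logistic([features(c) for c in train], [c[3] for c in train])
test_scores = [1.0 / (1.0 + math.exp(-linear(coef, features(c)))) for c in test]
test_auc = auc(test_scores, [c[3] for c in test])
print([round(b, 2) for b in coef], round(test_auc, 3))
```

No testing-area cell is ever seen by `fit_logistic`, so `test_auc` measures transfer to a new area rather than fit to the training area.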
It is of interest how widely our models can be applied to areas with different geographical settings. The absolute probability of water pooling in an area with given topographic features is likely to differ among areas with different levels of precipitation and different soil types. However, the topographic models may still predict the relative likelihood of water retention, as long as water is more likely to be retained in valley bottoms or in plains near the foot of a mountain than on steep slopes. We hope that our models will be tested in different geographical settings for further validation.
All else being equal, breeding sites would be expected to be predicted more accurately with higher-resolution data. However, in the present study, the lower-resolution (90-m) SRTM DEM performed slightly better than the higher-resolution (30-m) ASTER DEM. One possible reason is a difference in how elevation is measured: SRTM measures elevation from radar signals reflected off the Earth's surface, while ASTER estimates elevation by comparing two optical images of the same area taken from different viewing angles (a stereo pair). SRTM is the more direct measurement and thus should be more accurate at the same resolution. The elevation of the surface of Lake Victoria was 1134 m in SRTM but varied from 1126 to 1128 m in ASTER; the former appears to be closer to the actual elevation of Lake Victoria. This result agrees with Clennon et al., who reported that information derived from SRTM performed better than ASTER data in predicting breeding sites.
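The expectation that finer data should help can be made concrete: a 90-m cell covers roughly a 3 × 3 block of 30-m cells, so a small depression that a 30-m DEM resolves can be averaged away at 90 m. The sketch below is a simplified, assumed illustration on an invented elevation grid; real SRTM and ASTER products are not derived from each other by averaging, and `local_depth` is a hypothetical pit indicator, not one of the topographic indices used in this study.

```python
def block_mean(grid, factor=3):
    """Coarsen a fine elevation grid (list of equal-length rows) by averaging
    each factor x factor block; partial blocks at the edges are dropped."""
    n_rows, n_cols = len(grid) // factor, len(grid[0]) // factor
    coarse = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def local_depth(grid, r, c):
    """Hypothetical pit indicator: lowest of the 8 neighbours minus the cell
    itself. Positive means the cell sits below all neighbours and could pond
    water; zero or negative means it can drain. (r, c) must be an interior cell."""
    neighbours = [grid[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1)
                  if (i, j) != (0, 0)]
    return min(neighbours) - grid[r][c]


# Toy 9 x 9 "30-m" grid: a gentle plane falling 0.1 m per cell eastward,
# with a 1-m-deep pit in a single cell (values are invented).
fine = [[1140.0 - 0.1 * col for col in range(9)] for _ in range(9)]
fine[4][4] -= 1.0
coarse = block_mean(fine)  # 3 x 3 "90-m" grid

print(round(local_depth(fine, 4, 4), 2))    # 0.9: the pit is visible at 30 m
print(round(local_depth(coarse, 1, 1), 2))  # -0.19: averaged away at 90 m
```

This is why higher resolution would be expected to help when other conditions are held constant; the SRTM result suggests that measurement accuracy can outweigh resolution in practice.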
It was also unexpected that our models were more accurate at the testing site, with independent data, than at the training site where they were developed. This departs from the usual pattern, because models are optimized to the training site. One possible explanation is that a large proportion of the southern test area was sloped and mountainous and was therefore predicted as low risk. Breeding sites were concentrated near the lake shore, and model performance may be higher in such geographic settings. Another possible reason is the difference in climatic conditions between the two study periods. The training site was surveyed at the peak of the rainy season (April 2006), whereas the testing site was surveyed towards the end of the rainy season (May 2010). Considerable portions of the potential breeding sites had dried up by May 2010, as there was little rainfall during the 10 days prior to this survey (unpublished data). It is possible that only stable habitats remained at the testing site during the survey period and that these were relatively easy to predict.
The predictiveness of our models varied with habitat type. Although predictability was better than random for all habitat types, drainage/ditches, puddles, footprints, swamps, and tire tracks were highly predictable, while rock pools and riverbeds were less predictable. The high predictability for tire tracks and footprints could be related to the models' ability to predict flat lowland areas where small depressions in the land surface, such as footprints and tire tracks, are likely to retain water. These sites may have topographic features similar to those of swamps and are likely to occur around the fringes of swamps. These results suggest that our models are most useful where the main breeding sites are swamps and footprints. The low predictability of riverbeds was unexpected, because riverbed pools form along drying streams, whose locations should be easy to predict from topography. When we displayed the riverbed habitats on the maps, they were not well aligned with the river channels. We suspect that misclassification of this habitat type is the reason for the low predictability of riverbeds.
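Differences in predictability among habitat types can be summarised by computing an accuracy statistic separately for each type. The sketch below assumes each observed site carries a habitat-type label and the model's predicted probability at its location, and uses a hypothetical fixed high-risk threshold; the statistic shown (the share of each type's sites that fall in the high-risk class) is simpler than a per-type AUC and is not necessarily the measure used in this study. The observations are invented.

```python
from collections import defaultdict


def capture_by_type(sites, threshold):
    """sites: iterable of (habitat_type, predicted_probability) pairs.
    Returns {habitat_type: share of that type's sites whose predicted
    probability is at or above `threshold`}."""
    counts = defaultdict(lambda: [0, 0])  # habitat_type -> [captured, total]
    for habitat, prob in sites:
        counts[habitat][1] += 1
        if prob >= threshold:
            counts[habitat][0] += 1
    return {h: captured / total for h, (captured, total) in counts.items()}


# Invented example observations, not data from this study.
sites = [("swamp", 0.8), ("swamp", 0.7), ("swamp", 0.4),
         ("footprint", 0.9), ("footprint", 0.6),
         ("riverbed", 0.3), ("riverbed", 0.6),
         ("rock pool", 0.2)]
shares = capture_by_type(sites, threshold=0.5)
print(shares)  # swamp 2/3, footprint 1.0, riverbed 0.5, rock pool 0.0
```

If the high-risk class covers, say, half of the area, sites placed at random would fall in it about half of the time, so a habitat type is predicted better than random only when its share clearly exceeds the area fraction.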
There are some limitations to the present study. First, because we did not use land-cover information, for the sake of simplicity, predictive power may have been limited. This would be particularly important in areas where the vector species prefers specific land-cover types and land cover in the area of interest is heterogeneous. Both the training and testing sites in the present study had been entirely deforested and thus had relatively homogeneous landscapes. In such areas, land cover would not be an important limiting factor for the vectors, which may be one reason for the high accuracy of our models. Similarly, if the target area consists of sub-areas with soil types that differ in water-holding capacity, the predictability of our models would be limited. When land cover and soil type seriously limit the predictability of simple topographic models, the use of satellite imagery should be considered, as in previous studies [12, 13].
Second, we used the results of an extensive survey that was carried out once, in the rainy season (April). Therefore, our models may predict breeding sites well in the rainy season but not in the dry season. Given that the locations of breeding sites are likely to differ between the rainy and dry seasons [12, 54], another model may be necessary to predict dry-season breeding sites.
Third, because we prioritized model simplicity, we assumed linear relationships between the predictors and the logit of the probability of presence, and did not consider any non-linear relationships or interactions among variables. For example, because mosquito larvae never occur in either dry soil or fast-running water, the relationship between larval habitat and certain indices such as TWI may be unimodal over a wide range. It is possible that the simplicity of our models sacrificed some predictive power.
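A unimodal response of this kind can be represented within the same logistic framework by adding a squared term, logit(p) = b0 + b1·TWI + b2·TWI², which has a single maximum at TWI = -b1 / (2·b2) when b2 < 0. The sketch below uses invented coefficients purely to show the shape; they are not estimates from this study, and the function names are hypothetical.

```python
import math


def prob_quadratic(twi, b0, b1, b2):
    """Presence probability under logit(p) = b0 + b1*twi + b2*twi**2."""
    z = b0 + b1 * twi + b2 * twi ** 2
    return 1.0 / (1.0 + math.exp(-z))


def peak_twi(b1, b2):
    """TWI at which the quadratic logit (and hence p) is maximal; needs b2 < 0."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 < 0")
    return -b1 / (2.0 * b2)


# Invented coefficients: probability rises with wetness up to a point, then
# falls again toward the wettest cells (e.g. channels with running water).
B0, B1, B2 = -12.0, 2.4, -0.12
print(round(peak_twi(B1, B2), 6))  # 10.0
for twi in (5, 10, 15):
    print(twi, round(prob_quadratic(twi, B0, B1, B2), 3))
```

Because p is a monotone function of the logit, p peaks at the same TWI as the quadratic; with a linear term alone, p can only rise or fall monotonically with TWI. The cost is one extra parameter per variable treated this way.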
Fourth, our survey did not distinguish between species of Anopheles mosquitoes. Because different species may prefer aquatic sites with different topographic features, the models would likely fit best if each species were treated separately. In our models, the relationship between topographic features and the presence of larval habitats may be less clear than it would be in species-specific models. In the study area, members of the Anopheles gambiae species complex (A. gambiae and A. arabiensis) made up the majority of the larval samples, and A. funestus occurred less frequently in the rainy season. Our models should therefore be applied with caution in areas with different vector species.
Fifth, because we did not record aquatic sites without Anopheles larvae, we could not model the formation of aquatic sites. The formation of standing water on the ground surface is a purely physical process and should therefore be best predicted by topographic variables. The occurrence of Anopheles larvae, in contrast, may also depend on biological factors such as water quality, the presence of predators, and proximity to blood meals. When the observed habitats were overlaid on the model projections (Figures 2, 3, 4 and 5), some areas were shown as high risk but had no habitats. These might be aquatic sites that are not inhabited by anopheline larvae. Ideally, two separate models would be developed: one to predict aquatic sites and one to predict anopheline habitats.
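The two-model structure suggested above can be written as a product of probabilities, P(larvae) = P(water) × P(larvae | water), where the first factor would come from a topographic model of aquatic-site formation and the second from biological covariates. The sketch below only shows how such outputs would be combined; both probability inputs and the function name are hypothetical, and it assumes larvae can occur only where standing water forms.

```python
def p_habitat(p_water, p_larvae_given_water):
    """Probability that a cell is an anopheline habitat, assuming larvae occur
    only where standing water forms: P(larvae) = P(water) * P(larvae | water)."""
    for p in (p_water, p_larvae_given_water):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_water * p_larvae_given_water


# Hypothetical inputs: a wet cell that is rarely colonised (e.g. many
# predators) versus a moderately wet cell that is usually colonised.
wet_but_unused = p_habitat(0.9, 0.2)  # about 0.18
moderate_used = p_habitat(0.5, 0.8)   # 0.4
print(round(wet_but_unused, 2), round(moderate_used, 2))
```

Keeping the factors separate would let a cell like `wet_but_unused` be recognised as likely standing water but an unlikely habitat, a distinction that a single model fitted to larval presence cannot make.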
Lastly, it should be noted that topographic models have a fundamental limitation when man-made habitats are important breeding sites for the vectors. Our survey on Rusinga found that more than half of all breeding sites were artificial, such as holes and containers. Such breeding sites may be especially important in urban environments, where an alternative way to predict breeding sites would be necessary.
Despite these limitations, our study supports the feasibility of localized mosquito habitat management and provides a simple approach to habitat modeling. We have demonstrated the feasibility of predicting potential breeding habitats of Anopheles mosquitoes using topographic variables derived from freely available DEMs. Our models are expected to be useful in defining target areas for larval control. Furthermore, applying these models to large areas may help identify high-risk areas for malaria infection, because malaria prevalence is likely to be higher in areas with many potential breeding sites of malaria vectors [55, 56].