
Topographic models for predicting malaria vector breeding habitats: potential tools for vector control managers



Identification of malaria vector breeding sites can enhance control activities. Although associations between malaria vector breeding sites and topography are well recognized, practical models that predict breeding sites from topographic information are lacking. We used topographic variables derived from remotely sensed Digital Elevation Models (DEMs) to model the breeding sites of malaria vectors. We further compared the predictive strength of two different DEMs and evaluated the predictability of various habitat types inhabited by Anopheles larvae.


Using GIS techniques, topographic variables were extracted from two DEMs: 1) the Shuttle Radar Topography Mission 3 DEM (SRTM3, 90-m resolution) and 2) the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global DEM (ASTER, 30-m resolution). We used breeding-site data from an extensive field survey conducted on an island in western Kenya in 2006. Topographic variables were extracted for 826 breeding sites and for 4520 randomly assigned negative points. Logistic regression modelling was applied to characterize the topographic features of malaria vector breeding sites and to predict their locations. Model accuracy was evaluated using the area under the receiver operating characteristic curve (AUC).


All topographic variables derived from both DEMs were significantly associated with breeding habitats, except for aspect derived from SRTM. The magnitude and direction of the association for each variable were similar for the two DEMs. Multivariate models for SRTM and ASTER showed similar levels of fit as indicated by the Akaike information criterion (3959.3 and 3972.7, respectively), although the SRTM model was slightly better. Prediction accuracy, as indicated by AUC, was also similar for SRTM (0.758) and ASTER (0.755) at the training site. Both the SRTM and ASTER models showed higher AUC at the testing site than at the training site (0.829 and 0.799, respectively). Predictability varied among habitat types: drains, footprints, puddles and swamps were the most predictable.


Both SRTM and ASTER models had similar predictive potentials, which were sufficiently accurate to predict vector habitats. The free availability of these DEMs suggests that topographic predictive models could be widely used by vector control managers in Africa to complement malaria control strategies.


Human malaria is the most serious parasitic disease in the tropics. The current global malaria control strategy, based on long-lasting insecticidal nets (LLINs), indoor residual spraying (IRS), and artemisinin-based combination therapies (ACTs), has decreased malaria morbidity and mortality worldwide [1, 2]. To reduce malaria transmission further, strategies that supplement LLINs, IRS, and ACTs are essential [2–4].

Targeting the immature stages may be a useful supplementary vector-control strategy, because LLINs and IRS kill or repel only adult mosquitoes [5–11]. In recent larval control trials in Africa, hand-applied insecticides were successful only in settings where mosquito larval habitats were well defined and not extensive [2], suggesting that defining target areas is essential for larval control. However, identifying larval habitats over a large area is generally not easy. Extensive surveys of breeding sites are expensive, time-consuming, and labor-intensive, and thus not feasible in countries with limited resources. It would therefore be beneficial to have practical models that can predict the locations of malaria vector breeding sites from easily obtainable information.

A few attempts have been made to predict malaria vector breeding sites from remote sensing and topographic information [12, 13]. Mushinzimana et al. [12] modelled the presence of larval habitats in the Kenyan highlands using land-cover variables derived from remotely sensed images (LANDSAT, IKONOS, and aerial photographs) together with topographic variables. Their model predicted larval habitats with nearly 80% sensitivity. Clennon et al. [13] examined various combinations of land-cover and topographic variables to predict larval habitats in southern Zambia. Their model, using land-cover variables derived from LANDSAT imagery and topographic variables derived from digital elevation models (DEMs), successfully predicted the occurrence of aquatic sites and of malaria vector larval habitats. Similar models have been developed to predict areas at risk of malaria infection using land-cover and topographic variables [14, 15].

These modeling approaches are effective for areas where such land-cover and topographic information is available. However, developing complex models using both land-cover and topographic variables may not be practical in general, because satellite imagery and high-quality aerial photographs are sometimes unavailable, expensive, or unusable owing to cloud cover. Moreover, land-cover classification is a tedious and time-consuming task. Topographic information, on the other hand, is now freely available as DEMs for nearly the entire world and can be processed with free software. Given that topography is of fundamental importance in controlling surface water flow and pooling, it should have potential for predicting where water bodies suitable for malaria vector breeding would form. In previous studies, topographic variables have frequently been identified as important predictors of high-risk areas for malaria infection [14, 16, 17] and of vector breeding sites [12, 13, 17].

In this study, we developed practical models that require only topographic information to predict the location of malaria vector breeding sites with acceptable accuracy. We used the results of an extensive survey of an entire large island in Lake Victoria. The island consists of various areas with differing topographic features, such as mountain peaks, cliffs, gentle slopes, streams, plains, swamps, etc.; thus, we expect that our model will be widely applicable to various environments with a range of topographic features.


Study area

The primary study area covered the entire area (42 km²) of Rusinga Island, Mbita District, western Kenya (Figure 1). The island is the second-largest island in the Kenyan part of Lake Victoria and has been extensively deforested and cultivated. Streams are seasonal, and the main water source for the population is the lake. The rainfall pattern in the area is bimodal, with a long rainy season from March through May and a short rainy season around November. In 1983, the island was connected to the mainland by a 200-m-long causeway [18]. To test the accuracy of the models developed on the island, we used a second study area, Nyamanga, located on the adjacent mainland (Figure 1). This area covers approximately 9 km². At both sites, most houses are constructed of a stick framework plastered with a mixture of mud and cow dung, with a corrugated iron roof. Few houses have more than two rooms. The majority of residents belong to the Luo ethnic group. Although Dholuo is the main language spoken, many residents speak English and Kiswahili. The main economic activities are fishing and farming.

Figure 1

Location of the study area. (Right) Map of Kenya showing the location of the study area. (Left) Study area showing the training site (Rusinga) and the testing site (Nyamanga).

Mosquito larval habitat survey

An extensive survey of mosquito larval habitats covering the entire area of Rusinga Island was conducted in April 2006. Prior to the field survey, information on habitats on Rusinga Island was provided by a community-based malaria control project [18]. Field assistants visited the breeding sites that had been monitored by the project members and examined them for anopheline larvae. The assistants also searched for other water pools throughout the island. Each potential breeding site was examined for anopheline larvae using a standard mosquito dipper (350 ml; BioQuip Products, Rancho Dominguez, California, USA), with a maximum of 50 dips per site. Breeding sites were categorized as tire tracks, footprints, drains and ditches, swamps, riverbeds, puddles, artificial containers and holes, and tree holes. When multiple footprints were present in an area, they were considered a single site. Habitats within the lake were excluded from the survey [19–22]. Artificial holes and containers, and tree holes, were excluded from model development because the formation of these habitats is unlikely to be affected by topography. The coordinates of each site with anopheline larvae were recorded using a global positioning system (GPS). Anopheline larvae were not identified to species in this survey. However, previous studies have found that anopheline species in non-lake habitats in this area are mainly members of the Anopheles gambiae complex and the Anopheles funestus complex, which are important malaria vectors [19–23]. Supplementary data were collected in the course of a longitudinal survey in Nyamanga in 2010. Before the longitudinal survey, we identified 160 potential breeding sites where anopheline larvae had been confirmed or were considered likely to occur. These sites in Nyamanga had been surveyed routinely by field assistants. The sites where anopheline larvae were confirmed in May 2010 were used as positive sites for validating the models.

Digital elevation models (DEMs)

We used two digital elevation models (DEMs): 1) the Shuttle Radar Topography Mission 3 (SRTM3, 90-m resolution) DEM and 2) the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global DEM (ASTER GDEM, 30-m resolution). The SRTM DEM was collected during a space shuttle mission in 2000 using a multi-frequency, multi-polarization radar system. Its absolute vertical and horizontal accuracies were specified as ≤ 10 m and ≤ 20 m, respectively, at the 90% level [24]. The data cover the land surface between 60° N and 54° S and can be obtained free of charge from the National Aeronautics and Space Administration (NASA) website [24]. ASTER GDEM was generated from a large number of ASTER images by automated processing using a stereo-correlation method [25]. The vertical and horizontal accuracies estimated for ASTER GDEM prior to its production were 20 m and 30 m, respectively (both at the 95% confidence level) [25]. This DEM covers the land surface between 83° N and 83° S and can also be downloaded free of charge [25]. More detailed technical information on these DEMs is available from the respective websites [24, 25].

Both DEMs are distributed in geographic coordinates (latitude and longitude in decimal degrees). To make the spatial scale easier to interpret in metric units, we reprojected both DEMs to the Universal Transverse Mercator (UTM) projection (Zone 36 South) using the System for Automated Geoscientific Analyses (SAGA) [26].


Although a predictive model is often, and perhaps best, built using techniques such as logistic regression that rely on both presence and absence data [27, 28], absence data were unavailable in our study because the survey was designed to record only positive breeding sites. As an alternative, we generated random points within the study area and treated them as negative cases. In theory, statistical power increases with the number of pseudo-negatives [29]. However, because this study uses grid data of finite resolution, multiple points in the same grid cell would be redundant. There are approximately 5,000 grid cells on Rusinga Island in the 90-m SRTM DEM, and the number of natural larval habitats was 826 (see Results). We therefore aimed to generate approximately five times as many pseudo-absence points as positive points, so that the total number of points was close to 5,000. Using Microsoft Excel 2007, we initially generated 20,000 coordinates within the extent of the island. The elevation of each point was then extracted from the SRTM DEM using SAGA [26]. Points with elevation below 1135 m were regarded as lying on the lake and were removed. Points on another small island and on a cape of the mainland were then removed manually. Random points very close to observed positive points are likely to have the same or very similar topographic features as the nearest positive sites, which may weaken the association between the topographic variables and the occurrence of habitats. We therefore removed random points within 50 m of the positive sites; this cut-off distance was chosen following Mushinzimana et al. [12]. This procedure removed approximately 10% of the random points on the island. Thus, we obtained 4524 random points on the island that were at least 50 m from any positive site and treated them as negative points. The use of such pseudo-absence data in modeling is a recognized technique [30–32].
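The pseudo-absence procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' workflow (they used Excel and SAGA). It assumes coordinates already projected to metres (e.g. UTM), a rectangular sampling extent, and a hypothetical `elevation_at` callable standing in for a DEM lookup. The manual removal of points on the neighbouring island and cape is not reproduced.

```python
import math
import random

def generate_pseudo_absences(positives, bbox, elevation_at, n_candidates,
                             lake_level=1135.0, min_dist=50.0, seed=0):
    """Hypothetical sketch of pseudo-absence generation.

    positives    -- list of (x, y) breeding-site coordinates in metres
    bbox         -- (xmin, ymin, xmax, ymax) of the sampling extent in metres
    elevation_at -- callable (x, y) -> elevation in metres (stand-in for a DEM lookup)
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    xmin, ymin, xmax, ymax = bbox
    kept = []
    for _ in range(n_candidates):
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Points below the lake-level threshold are treated as open water and dropped.
        if elevation_at(x, y) < lake_level:
            continue
        # Points closer than min_dist to any observed breeding site are dropped.
        if any(math.hypot(x - px, y - py) < min_dist for px, py in positives):
            continue
        kept.append((x, y))
    return kept
```

The brute-force distance check is adequate for a few thousand points; for much larger data sets a spatial index (for example, a k-d tree) would be the more efficient choice.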

Topographic variables

Eight topographic variables that reflect local terrain structure and potentially influence water content were extracted from the DEMs using the Terrain Analysis module of SAGA. First, the DEMs were smoothed to fill in isolated elevation pits (or spikes), which typically represent errors or areas of internal drainage that interrupt the estimation of water flow (Pre-processing, Fill Sink; Planchon and Darboux, 2001) [33]. Basic Terrain Analysis (BTA) was then applied to the pre-processed DEM. Of the 11 variables derived from BTA, 6 were used in the analysis: slope, aspect, plan curvature, profile curvature, convergence index, and wetness index. In addition, the topographic position index (TPI) was calculated at two different scales. In total, 9 variables, including pre-processed elevation, were derived from each DEM and examined as model predictors. The implications of each variable for surface water accumulation are summarized below.
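To illustrate what the pit-filling step does, the sketch below raises every cell that has no downhill path to the grid edge up to its spill elevation. Note that it uses the priority-flood algorithm, a different but widely used method swapped in here because it is short; unlike the Planchon and Darboux method used in the study, it leaves filled depressions perfectly flat rather than imposing a small gradient. The function name is hypothetical, and edge cells are assumed to drain off the grid.

```python
import heapq

def priority_flood_fill(dem):
    """Return a copy of `dem` (2-D list of elevations) with all pits filled to spill level."""
    nrows, ncols = len(dem), len(dem[0])
    filled = [list(row) for row in dem]
    seen = [[False] * ncols for _ in range(nrows)]
    heap = []
    # Seed the priority queue with all edge cells, which are assumed to drain outward.
    for r in range(nrows):
        for c in range(ncols):
            if r in (0, nrows - 1) or c in (0, ncols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                seen[r][c] = True
    # Always expand from the lowest cell reached so far; an unvisited neighbour lower
    # than that cell must lie in a pit, so it is raised to the current level.
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols and not seen[rr][cc]:
                    seen[rr][cc] = True
                    filled[rr][cc] = max(filled[rr][cc], z)
                    heapq.heappush(heap, (filled[rr][cc], rr, cc))
    return filled
```

For example, a single-cell pit surrounded by cells at 5 m is raised to 5 m, whereas a pit next to a 2-m outlet on the grid edge is raised only to 2 m.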

Elevation is a fundamental physical parameter defining soil-water gravitational potential energy [34] and is the primary influence on water movement throughout a landscape as well as within drainage channels. To enhance the applicability of our models to other areas with different elevation ranges, we converted the original elevation to relative elevations defined as the original elevation minus the lowest elevation in the area. The elevation of the surface of Lake Victoria was set as the lowest elevation for this analysis.

Slope is a measure of the change in elevation over a certain distance, or the difference in elevation between neighbouring cells, expressed as an angle from 0 to 90°. Slope strongly influences overland and subsurface flow velocity, drainage, and accumulation of water [35, 36].

The aspect of a land surface is the direction the slope faces, ranging from 0 to 360°. It determines the amount of sunlight a site receives, which may affect mosquito larval survival [12, 35]. In this study, aspect was cosine-transformed so that values ranged from −1 (south-facing slope) to +1 (north-facing slope).
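As an illustration of how slope, aspect and the cosine transform relate, the following sketch computes them for the centre cell of a 3 × 3 elevation window. It uses Horn's finite-difference method. This choice is an assumption, because SAGA offers several slope and aspect algorithms and the text does not state which one BTA used. The sketch also assumes that the first row of the window is the northern row. The function name is illustrative.

```python
import math

def slope_aspect(window, cell_size):
    """Return (slope in degrees, aspect in degrees clockwise from north, cos(aspect))
    for the centre of a 3x3 elevation window (row 0 = north) using Horn's method."""
    (a, b, c), (d, _e, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)  # gradient towards east
    dz_dy = ((a + 2 * b + c) - (g + 2 * h + i)) / (8 * cell_size)  # gradient towards north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None, None  # flat cell: aspect is undefined
    # A slope faces downhill, i.e. opposite to the gradient vector.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect, math.cos(math.radians(aspect))
```

A window that rises towards the north faces south, so its aspect is 180° and its cosine aspect is −1, matching the sign convention described above.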

Curvature is a measure of the rate of change of a slope per unit distance [37]. Curvature theoretically ranges from −1 to +1 and can be categorized into profile and plan curvatures. Profile curvature is parallel to the direction of the maximum slope. A negative value indicates that the surface is upwardly convex at that cell, and a positive value indicates that the surface is upwardly concave; a value of zero indicates that the surface is linear. Profile curvature affects the acceleration or deceleration of flow across the surface. Plan curvature is perpendicular to the direction of maximum slope. A positive value indicates the surface is sidewardly convex at that cell, and a negative value indicates the surface is sidewardly concave; a value of zero indicates the surface is linear. Plan curvature relates to the convergence and divergence of flow across a surface. Considering both profile and plan curvature together allows for a more accurate understanding of flow across a surface [38, 39].

Convergence index (CI) is used to determine whether water flow from neighbouring cells diverges (positive values to 100) or converges (negative values to −100). Convergence is calculated using the direction of water flow between adjacent cells based on the aspects of neighbouring cells [40].

Topographic wetness index (TWI) predicts soil moisture based on the assumption that a point with a larger upslope contribution area has greater inflow of surface water and that a point with a shallower slope has less outflow. TWI is calculated using the ratio of the upslope contributing area (A) to the tangent of local slope (tan β). Theoretically, TWI ranges from 0 to + ∞. High values of TWI are found for converging, flat terrain, while low values are typical of steep, diverging areas [41].
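The usual formulation of TWI takes the natural logarithm of this ratio, TWI = ln(a / tan β), where a is the upslope contributing area per unit contour width. The sketch below computes it on a small, already pit-filled grid using single-direction (D8) flow routing. The D8 routing, the use of the steepest downslope gradient as tan β, and the hypothetical `min_tan_beta` floor for flat cells are all simplifying assumptions, so the result should not be expected to match SAGA's wetness index output exactly.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_twi(dem, cell_size, min_tan_beta=1e-3):
    """Sketch of TWI = ln(a / tan(beta)) on a pit-free DEM (2-D list, metres) with D8 routing."""
    nrows, ncols = len(dem), len(dem[0])
    receiver = [[None] * ncols for _ in range(nrows)]
    tan_beta = [[0.0] * ncols for _ in range(nrows)]
    # Find each cell's steepest downslope neighbour and the gradient towards it.
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    grad = (dem[r][c] - dem[rr][cc]) / (cell_size * math.hypot(dr, dc))
                    if grad > tan_beta[r][c]:
                        tan_beta[r][c], receiver[r][c] = grad, (rr, cc)
    # Accumulate contributing area from the highest cell downwards, so every cell has
    # received all of its upslope area before passing it on to its receiver.
    area = [[cell_size * cell_size] * ncols for _ in range(nrows)]
    cells = sorted(((dem[r][c], r, c) for r in range(nrows) for c in range(ncols)), reverse=True)
    for _, r, c in cells:
        if receiver[r][c] is not None:
            rr, cc = receiver[r][c]
            area[rr][cc] += area[r][c]
    # Specific catchment area a = contributing area / contour width (one cell width).
    return [[math.log((area[r][c] / cell_size) / max(tan_beta[r][c], min_tan_beta))
             for c in range(ncols)] for r in range(nrows)]
```

On a simple downhill transect, TWI increases towards the lowest cell, consistent with the statement above that high values occur in converging, flat terrain.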

Topographic position index (TPI) is the deviation of a point elevation from the specified local mean, calculated by dividing the elevation difference by its standard deviation. TPI ranges from -∞ to + ∞; negative values indicate valley bottoms, while positive values signify areas such as hilltops and ridges [13, 42]. Many physical and biological processes acting on the landscape are highly correlated with topographic positions such as hilltops, valley bottoms, exposed ridges, flat plains, upper or lower slopes etc. Examples of these processes include solar radiation, hydrologic balance and response, wind exposure, etc. These biophysical attributes in turn are key predictors of habitat suitability, community composition, and species distribution and abundance [42]. TPI is an inherently scale-dependent factor; thus, both a local and an area-wide scale were considered (500 m and 2000 m, respectively). The 500-m neighbourhood assists in detecting local valleys and hills, while the 2000-m neighbourhood enables identification of larger-scale features such as large U-shaped valleys, gently sloping hills, and the tops of plateaus [13].
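The standardized TPI described above can be sketched for a single cell as follows. The neighbourhood is assumed to be circular and to include the centre cell itself, and it is simply truncated at the grid edge. These are assumptions, as is the convention of returning 0 for a perfectly flat neighbourhood; SAGA's implementation may differ in these details. On a 90-m grid, the 500-m and 2000-m scales correspond to reaches of 5 and 22 cells.

```python
import math
import statistics

def standardized_tpi(dem, row, col, cell_size, radius):
    """(z - neighbourhood mean) / neighbourhood SD, using every cell whose centre lies
    within `radius` metres of (row, col), the centre cell included."""
    reach = int(radius // cell_size)
    values = []
    for r in range(row - reach, row + reach + 1):
        for c in range(col - reach, col + reach + 1):
            inside = 0 <= r < len(dem) and 0 <= c < len(dem[0])
            if inside and math.hypot(r - row, c - col) * cell_size <= radius:
                values.append(dem[r][c])
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # perfectly flat neighbourhood
    return (dem[row][col] - statistics.mean(values)) / sd
```

A cell lower than its surroundings yields a negative value (valley bottom) and a cell higher than its surroundings a positive value (hilltop or ridge).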

Statistical analyses for model development

We used logistic regression models to select variables that explained the presence and absence of anopheline larval habitats [43–45]. We first used univariate logistic regression to screen potentially important variables (P < 0.05) before conducting multiple regression modeling [45]. To avoid collinearity, we checked for correlation among the variables. When two variables were highly correlated (Pearson's correlation coefficient > 0.8), the variable with the larger Akaike information criterion (AIC) in the univariate logistic regression was removed [13]. Variables retained in the final models were selected using both forward and backward procedures (the step function in R, version 2.13 [46]), with AIC as the selection criterion [47]. All statistical analyses were performed using R 2.13. To keep the models relatively simple, interaction terms were not fitted. The logistic regression equation for each model was applied to the topographic variables in each grid cell to generate risk maps of anopheline larval habitat occurrence.
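The collinearity rule above (drop the member of a highly correlated pair with the larger univariate AIC) can be sketched without external libraries. This is not the authors' R code. Logistic regression is fitted here by Newton-Raphson, and AIC is computed as −2 log L + 2k. When several variables are mutually correlated, the rule is applied greedily in order of increasing AIC, which is one reasonable generalisation of the pairwise rule. The P < 0.05 screen and the stepwise selection are omitted, and all function names are illustrative.

```python
import math

def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b.x by Newton-Raphson; X holds predictor rows without intercept.
    Returns (coefficients with the intercept first, AIC)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return beta, -2.0 * loglik + 2 * k

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def screen_collinear(columns, y, r_max=0.8):
    """Keep variables in order of increasing univariate AIC, skipping any variable
    correlated above r_max with one already kept."""
    aic = {name: fit_logistic([[v] for v in col], y)[1] for name, col in columns.items()}
    selected = []
    for name in sorted(columns, key=aic.get):
        if all(abs(pearson(columns[name], columns[s])) <= r_max for s in selected):
            selected.append(name)
    return selected, aic
```

For example, if two slope-like layers are nearly identical, only the one with the smaller univariate AIC survives, while an unrelated layer is kept.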

We further examined models that included random effects, with and without spatial autocorrelation. The models and results are described in Additional file 1.

Evaluating model predictions

Assessing the predictive ability of a model is a critical step in ensuring its proper application [48, 49]. We evaluated the predictive performance of our models using independent breeding-site data obtained from Nyamanga in 2010. For this assessment, we employed the receiver operating characteristic (ROC) approach [50–52]. The area under the ROC curve (AUC) provides an assessment of model performance and predictive power [49]. AUC values range from 0 to 1, where 0.5 indicates accuracy no better than random and 1.0 indicates perfect discrimination [50].
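The AUC can be computed directly from predicted probabilities, without drawing the ROC curve, as the probability that a randomly chosen positive site receives a higher score than a randomly chosen pseudo-absence, with ties counted as one half. This is the Mann-Whitney formulation of the AUC. The sketch below uses a simple pairwise loop, which is adequate for a few thousand points; the function name is illustrative.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Perfect separation gives 1.0, a model that cannot distinguish positives from negatives gives about 0.5, and a fully inverted ranking gives 0.0.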

We also examined the applicability of our models to various types of natural breeding sites. For both the training and testing sites, the two models were separately fitted for each type of breeding site. All of the pseudo-absence points were used in each case and the AUC was used to compare the predictability of different habitat types.
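A habitat-specific comparison of this kind can be sketched as follows. The sketch assumes that each model's predicted probability is evaluated against the positives of one habitat type at a time, with the full set of pseudo-absences reused for every type. The input layout and function name are hypothetical.

```python
from collections import defaultdict

def auc_by_habitat(positives, neg_scores):
    """positives: list of (habitat_type, predicted_p); neg_scores: predicted p at all
    pseudo-absence points. Returns {habitat_type: AUC}."""
    by_type = defaultdict(list)
    for habitat, score in positives:
        by_type[habitat].append(score)
    result = {}
    for habitat, scores in by_type.items():
        # Mann-Whitney form of the AUC: pairwise wins, ties counted as one half.
        wins = sum(1.0 if s > n else 0.5 if s == n else 0.0
                   for s in scores for n in neg_scores)
        result[habitat] = wins / (len(scores) * len(neg_scores))
    return result
```

Because the pseudo-absences are shared, differences in AUC between habitat types reflect only how well each type's locations are ranked above the same background points.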

Visualization of the models

To represent the models in map form, we applied the resulting logistic regression model to the topographic variables for each grid, using the relationship:

$$p = \frac{1}{1 + e^{-f}}$$

$$f = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_n x_n$$

where $b_0$ is the intercept and $b_1$ to $b_n$ are the coefficients of the topographic variables $x_1$ to $x_n$, respectively. A map of the predicted probability p was generated using the grid calculator function in SAGA.
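The cell-by-cell evaluation of p from f can be sketched as follows. The layer names and coefficient values are hypothetical placeholders, not the fitted values in Table 5. The logistic function is written in a form that avoids floating-point overflow for large negative f.

```python
import math

def logistic(f):
    """p = 1 / (1 + e^(-f)), arranged to avoid overflow when f is very negative."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)

def predict_grid(intercept, coefs, layers):
    """intercept: b0; coefs: {layer_name: b_i}; layers: {layer_name: 2-D list of x_i}.
    Returns a 2-D list of predicted probabilities p, one per grid cell."""
    names = list(coefs)
    first = layers[names[0]]
    return [[logistic(intercept + sum(coefs[n] * layers[n][r][c] for n in names))
             for c in range(len(first[0]))]
            for r in range(len(first))]
```

For example, with $b_0 = 0$ and a single coefficient of 1, a cell with $x_1 = \ln 3$ has $f = \ln 3$ and therefore $p = 1/(1 + 1/3) = 0.75$.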

For ease of interpretation, we set a cut-off value of p that divided the training site into high- and low-risk areas of nearly equal extent. We then counted the numbers of positive sites located in the high- and low-risk areas.
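One simple way to obtain such a cut-off is to take the median of the predicted p over all grid cells of the training site, so that roughly half of the cells lie above it. The sketch below does this and counts the positive sites that fall in the high-risk half. Using the median is an assumption about how "nearly equal extent" was achieved, and the function name is illustrative.

```python
import statistics

def split_high_low(risk_values, positive_risks):
    """risk_values: p for every grid cell; positive_risks: p at each observed breeding site.
    Returns (cut-off, share of cells above it, number of positive sites above it)."""
    cutoff = statistics.median(risk_values)
    high_share = sum(v > cutoff for v in risk_values) / len(risk_values)
    n_high = sum(p > cutoff for p in positive_risks)
    return cutoff, high_share, n_high
```

The cut-off obtained at the training site can then be reused unchanged at the testing site, where the share of area above it need not be exactly one half.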


Survey results

On Rusinga Island, Anopheles larvae were present at 2137 aquatic sites during the survey conducted in April 2006. Of these, 1129 were in artificial containers and holes. Because our purpose was to develop models that predict breeding sites from topographic variables, these artificial breeding sites were excluded from model development. Thirty-two water bodies in tree cavities were also excluded for the same reason. GPS coordinates or habitat type information were missing for 144 sites. Ultimately, 826 natural breeding sites of 7 different types were included in the analysis (Table 1). In Nyamanga, many of the 160 previously identified potential breeding sites had dried up, and anopheline larvae were confirmed at 54 sites of 5 types in May 2010 (Table 1).

Table 1 Number (%) of natural breeding sites of malaria vectors in training and testing sites

Simple regression analysis

The simple logistic regression analyses for both DEMs revealed that all topographic variables were significantly associated with vector breeding sites except aspect derived from SRTM (Table 2). For both DEMs, breeding sites were positively associated with TWI and negatively associated with all other variables. For SRTM, TWI was the best predictor of breeding sites, followed by TPI500 and slope. For ASTER, slope was the best predictor, followed by elevation and TWI. Among all univariate models for the two DEMs, the SRTM TWI model had the smallest AIC and was thus considered the best.

Table 2 Summary of univariate logistic regression on malaria vector breeding sites with topographic variables extracted from the two different DEMs

Correlations between predictive variables

For both SRTM and ASTER, the topographic variables that were associated with the occurrence of breeding sites in the simple regression were significantly correlated with each other (Table 3). However, a high correlation (Pearson coefficient > 0.8) was observed only between TPI500 and CI for the SRTM DEM. CI was therefore excluded from the multivariate SRTM model because it had a larger AIC than TPI500 in the simple regression (Table 2). For ASTER, all pairs of variables had Pearson coefficients < 0.8, and thus all 9 variables were used in the multivariate model (Table 4). TWI was negatively correlated with all of the other variables (Tables 3 and 4). The other variables were positively correlated with each other, except for slope and profile curvature, which were weakly negatively correlated for both SRTM and ASTER (Tables 3 and 4). For ASTER, cosine aspect also showed weak negative correlations with plan curvature and profile curvature. Although these patterns were similar for SRTM and ASTER, the correlations were slightly but consistently stronger for the SRTM variables.

Table 3 Pearson correlation coefficients between topographic variables extracted from SRTM DEM
Table 4 Pearson correlation coefficients between topographic variables extracted from ASTER DEM

Multiple logistic regression analyses

Multiple logistic regression was applied to derive models predicting potential breeding sites. The models from the two DEMs showed similar performance in terms of AIC and AUC (Tables 5 and 6). In the final SRTM model with the minimum AIC, all 7 variables entered were retained, whereas in the ASTER model, CosAspect and TPI2000 were excluded (Table 5). The SRTM model had a slightly smaller AIC than the ASTER model, indicating a slightly better fit. The AUC of the SRTM model on the training data set was 0.758, slightly higher than that of the ASTER model (0.755; Table 6).

Table 5 Multiple logistic regression models using SRTM and ASTER DEMs
Table 6 Accuracy of prediction by the two models at the training and testing sites, expressed as the area under the receiver operating characteristic (ROC) curve (AUC), together with sensitivity and specificity

For the testing site, both models had higher AUC scores than for the training site (Table 6). As with the training site, the SRTM model had a higher AUC value (0.829) than the ASTER model (0.799, Table 6).

When random effects were included in the models to account for spatial dependence between nearby sites, the models fitted the training site better than those without random effects. However, they did not perform better than the simple logistic models when applied to the testing site (results shown in Additional file 1).

Applicability to different habitat types

The performance of the models in predicting the different habitat types was compared using AUC (Table 7). For both the SRTM and ASTER models, prediction accuracy was high for drains/ditches, footprints, puddles and swamps. By contrast, accuracy was relatively low for rock pools and riverbeds. High predictability for drains/ditches and swamps was also observed at the testing site, where puddles were also predicted with high accuracy. When predictability was examined separately for each habitat type, the SRTM model was not clearly better than the ASTER model.

Table 7 Accuracy of the model prediction of different types of breeding sites

Visualization of the models

For both the SRTM and ASTER models, the risk maps agreed well with the observed locations of breeding sites.

For the SRTM model, the high risk area with p > 0.123 made up 49.8% of the total area of the island, and contained 658 (79.7%) breeding sites (Figure 2). The high-risk areas with p > 0.130 in the ASTER model made up 49.7% of the total area and contained 675 (81.7%) breeding sites (Figure 3).

Figure 2

SRTM model: the likelihood of the presence of breeding sites in Rusinga based on logistic regression modeling with the topographic variables presented in Table 5. Observed breeding sites are indicated with white dots.

Figure 3

ASTER model: the likelihood of the presence of breeding sites in Rusinga based on logistic regression modeling with the topographic variables presented in Table 5. Observed breeding sites are indicated with white dots.

The high accuracy of the model predictions at the testing site is also apparent in the maps (Figures 4 and 5). The area with p > 0.123 in the SRTM model made up 44.7% of the total area and contained 47 (87.0%) breeding sites (Figure 4). For the ASTER model, the high-risk area with p > 0.130 made up 48.1% of the total area and contained 48 (88.9%) breeding sites (Figure 5).

Figure 4

SRTM model: the likelihood of the presence of breeding sites in Nyamanga based on logistic regression modeling with the topographic variables presented in Table 5. Observed breeding sites are indicated with white dots.

Figure 5

ASTER model: the likelihood of the presence of breeding sites in Nyamanga based on logistic regression modeling with the topographic variables presented in Table 5. Observed breeding sites are indicated with white dots.


In this study, we developed practical models for predicting malaria vector breeding sites using topographic variables. The two models using different DEMs performed similarly, and their accuracy was good at both the training and testing sites. The present study confirmed that using multiple topographic variables in combination is effective for predicting larval habitats of malaria vectors. Elevation and slope have direct effects on surface water flow, since water flows from high to low elevations. More complex variables such as plan and profile curvature, CI, TWI, and TPI are also related to surface water flow. These variables are correlated but capture somewhat different aspects of the terrain; thus, the multivariate logistic models performed much better than the univariate models. Previous studies using both topographic and land-cover variables have succeeded in predicting vector breeding sites with high accuracy; for example, one of the models developed by Clennon et al. [13] had an AUC > 0.95. The accuracy of our models was not as high. However, our models identified high-risk areas that made up about half of the total area but included nearly 80% of the breeding sites, which would help narrow the target areas for vector control. We consider this an acceptable level of accuracy for predicting breeding sites.

Our models have fundamental practical advantages over those of previous studies. First, because our models require only freely available DEMs that cover most of the Earth's land surface (including all tropical areas), they can be applied even in places where good-quality satellite images are unavailable. Furthermore, the present analyses were mainly conducted using free software, which allows our models to be easily evaluated and applied in a prospective study area. The only resources required are a personal computer, an internet connection, and moderate computer skills. This is an economic advantage of our models, particularly in countries with limited resources. Second, our models were developed using the results of an extensive survey over the entire area of a large island (42 km²), an area larger than those of previous studies [12, 13]. Given that the island includes a range of topographic features and the survey identified various types of breeding sites, we expect that our models may be applicable to a variety of environments with differing topography and different types of natural breeding sites. Third, we tested the accuracy of the model predictions in an independent area and confirmed good performance. Assessing the predictive performance of models in an area independent of the training site is considered the best approach for model testing and for evaluating model applicability [48, 49]. Previous studies did not examine model performance using independent data sets [13, 16].

An open question is how far our models can be applied to areas with different geographical settings. The absolute probability of water pooling in an area with given topographic features will differ among areas with different levels of precipitation and different soil types. However, the topographic models may still predict the relative likelihood of water retention, as long as water is more likely to be retained in valley bottoms or on plains near the foot of a mountain than on steep slopes. We hope that our models will be tested in different geographical settings for further validation.

Breeding sites would be expected to be predicted more accurately with higher-resolution data, other conditions being equal. However, in the present study, the lower-resolution (90-m) SRTM DEM performed slightly better than the higher-resolution (30-m) ASTER DEM. One possible reason is a difference in how elevation is measured: SRTM measures elevation from radar signals reflected from the Earth's surface, whereas ASTER estimates elevation by comparing two optical images taken at a certain interval. SRTM is the more direct measurement and thus should be more accurate at the same resolution. The surface elevation of Lake Victoria was 1134 m in SRTM and varied from 1126 to 1128 m in ASTER; the former appears closer to the actual elevation of Lake Victoria [53]. This result agrees with Clennon et al. [13], who reported that information derived from SRTM performed better than ASTER data in predicting breeding sites.

It was also unexpected that our models were more accurate at the testing site, with independent data, than at the training site where they were developed. This deviates from the norm, because models are optimized for the training site. One possible explanation is that a large percentage of the southern test area was sloped and mountainous and was therefore predicted as low risk. Breeding sites were concentrated near the lake shore, and model performance may be higher in such geographic settings. Another possible reason is the difference in climatic conditions between the two survey periods. The training site was surveyed at the peak of the rainy season (April) of 2006, whereas the testing site was surveyed towards the end of the rainy season (May) of 2010. A considerable portion of the potential breeding sites had dried up by May 2010, as there was little rainfall during the 10 days before this survey (unpublished data). It is possible that only stable habitats remained at the testing site during the survey period, and such habitats are relatively easy to predict.
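
The first explanation, that a large share of easily classified low-risk terrain can raise the AUC, can be illustrated numerically. Because the AUC averages over all (breeding site, negative point) pairs, adding negative points that every breeding site outscores, such as points on steep slopes, increases the AUC even though the model’s ability to separate sites from wet-lowland negatives is unchanged. All scores in this Python sketch are hypothetical.

```python
def pairwise_auc(site_scores, negative_scores):
    """Fraction of (site, negative) pairs where the site scores higher; ties count 0.5."""
    wins = sum(
        1.0 if s > n else 0.5 if s == n else 0.0
        for s in site_scores
        for n in negative_scores
    )
    return wins / (len(site_scores) * len(negative_scores))


# Hypothetical predicted probabilities.
sites = [0.70, 0.55, 0.45, 0.35]
lowland_negatives = [0.60, 0.50, 0.40, 0.30, 0.20]  # hard to separate from sites
slope_negatives = [0.02] * 10                       # steep terrain, predicted low risk

auc_lowland_only = pairwise_auc(sites, lowland_negatives)
auc_with_slopes = pairwise_auc(sites, lowland_negatives + slope_negatives)
print(f"lowland negatives only: AUC {auc_lowland_only:.2f}")    # 0.70
print(f"adding steep-slope negatives: AUC {auc_with_slopes:.2f}")  # 0.90
```

This is one reason AUC values are most comparable between areas with similar proportions of clearly unsuitable terrain.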

The predictiveness of our models varied with habitat type. Although predictability was better than random for all habitat types, drains/ditches, puddles, footprints, swamps, and tire tracks were highly predictable, whereas rock pools and riverbeds were less predictable. The high predictability of tire tracks and footprints could reflect the models’ ability to identify flat lowland areas, where small depressions in the land surface, such as footprints and tire tracks, are likely to retain water. These sites may have topographic features similar to those of swamps and are likely to occur around swamp fringes. These results suggest that our models are most useful where the main breeding sites are swamps and footprints. The low predictability of riverbed habitats was unexpected, because riverbed pools form along drying streams, which should be easily predicted from topography. When we displayed the riverbed habitats on the maps, they did not lie closely along the river channels. We suspect that misclassification of this habitat type is the reason for its low predictability.
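
Habitat-specific predictability can be summarised by computing a separate AUC for each habitat type, assuming (as a simplification) that the breeding sites of each type are scored against the same set of negative points. The Python sketch below does this with hypothetical scores; the habitat labels mirror those in the text, but the numbers are invented.

```python
def pairwise_auc(site_scores, negative_scores):
    """Fraction of (site, negative) pairs where the site scores higher; ties count 0.5."""
    wins = sum(
        1.0 if s > n else 0.5 if s == n else 0.0
        for s in site_scores
        for n in negative_scores
    )
    return wins / (len(site_scores) * len(negative_scores))


def auc_by_habitat(sites_by_type, negative_scores):
    """AUC for each habitat type, each evaluated against the shared negative points."""
    return {
        habitat: pairwise_auc(scores, negative_scores)
        for habitat, scores in sites_by_type.items()
    }


# Hypothetical predicted probabilities at negative points and at breeding sites by type.
NEGATIVES = [0.10, 0.20, 0.30, 0.40, 0.50]
SITES_BY_TYPE = {
    "swamp": [0.80, 0.70, 0.60],
    "footprint": [0.65, 0.55, 0.45],
    "rock pool": [0.45, 0.35, 0.25],
}

results = auc_by_habitat(SITES_BY_TYPE, NEGATIVES)
for habitat, value in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{habitat:10s} AUC = {value:.2f}")
```

An AUC of 0.5 corresponds to random ranking, so a habitat type can be better than random (here, rock pools at 0.60) yet still much less predictable than others.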

There are some limitations to the present study. First, because we did not use land-cover information, for simplicity, predictive power may have been limited. This would be particularly important where the vector species prefers specific land-cover types and the land cover in the area of interest is heterogeneous. Both the training and testing sites in the present study had been entirely deforested and thus had relatively homogeneous landscapes. In such areas, land cover would not be an important limiting factor for the vectors, which may be one reason for the high accuracy of our models. Similarly, if the target area consists of sub-areas with soil types that differ in water-holding capacity, the predictability of our models would be limited. Where land-cover and soil heterogeneity seriously limit the predictability of simple topographic models, satellite imagery should be considered, as in previous studies [12, 13].

Second, we used the results of an extensive survey that was carried out once, in the rainy season (April). Therefore, our models may predict breeding sites well in the rainy season but not in the dry season. Given that the locations of breeding sites are likely to differ between the rainy and dry seasons [12, 54], a separate model may be necessary to predict dry-season breeding sites.

Third, because we prioritized model simplicity, we assumed linear relationships between the predictors and the logit of the probability of presence and did not consider any non-linear relationships or interactions among variables. For example, because mosquito larvae never occur in either dry soil or fast-running water, the relationship between larval habitat and certain indices, such as TWI, may be unimodal over a wide range. The simplicity of our models may therefore have sacrificed some predictability.
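
A unimodal response of this kind can be accommodated within logistic regression by adding a squared term, so that logit(p) = b0 + b1 × TWI + b2 × TWI², where a negative b2 produces a hump. The pure-Python sketch below fits the linear and quadratic forms by Newton-Raphson to hypothetical grouped counts and compares them by AIC, the fit criterion used elsewhere in this study. The data and function names are invented for illustration; this is not the model reported here.

```python
import math

# Hypothetical grouped data: TWI class midpoints, number of surveyed points per
# class, and how many of those points were breeding sites (a hump-shaped pattern).
TWI = [2, 4, 6, 8, 10, 12]
TRIALS = [20, 20, 20, 20, 20, 20]
SITES = [2, 6, 12, 13, 7, 2]


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict(beta, design):
    """Logistic-transformed linear predictor for each design row."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in design]


def fit_binomial_logit(design, successes, trials, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression on grouped counts via Newton-Raphson.

    Returns (coefficients, log-likelihood without the constant binomial term).
    """
    k, rows = len(design[0]), range(len(design))
    beta = [0.0] * k
    for _ in range(max_iter):
        p = predict(beta, design)
        score = [sum((successes[i] - trials[i] * p[i]) * design[i][j] for i in rows)
                 for j in range(k)]
        info = [[sum(trials[i] * p[i] * (1 - p[i]) * design[i][j] * design[i][q] for i in rows)
                 for q in range(k)] for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = predict(beta, design)
    loglik = sum(y * math.log(pi) + (n_i - y) * math.log(1 - pi)
                 for y, n_i, pi in zip(successes, trials, p))
    return beta, loglik


# Centre and scale TWI so the squared term stays well conditioned.
centre = sum(TWI) / len(TWI)
z = [(t - centre) / 2.0 for t in TWI]
linear_design = [[1.0, zi] for zi in z]
quadratic_design = [[1.0, zi, zi * zi] for zi in z]

beta_lin, ll_lin = fit_binomial_logit(linear_design, SITES, TRIALS)
beta_quad, ll_quad = fit_binomial_logit(quadratic_design, SITES, TRIALS)
aic_lin = 2 * len(beta_lin) - 2 * ll_lin
aic_quad = 2 * len(beta_quad) - 2 * ll_quad
print(f"linear:    AIC {aic_lin:.1f}")
print(f"quadratic: AIC {aic_quad:.1f}, squared-term coefficient {beta_quad[2]:.2f}")
```

With these counts the quadratic model has the lower AIC and a negative squared-term coefficient. Because the omitted binomial constant is the same for both models, it cancels in the AIC comparison. Note that a quadratic term forces the response to be symmetric around its peak, which real data may not support.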

Fourth, our survey did not distinguish Anopheles species. Because different species may prefer aquatic sites with different topographic features, model fit would likely be best if each species were modelled separately. In our models, the relationship between topographic features and the presence of larval habitats would be less clear than in species-specific models. In the study area, members of the Anopheles gambiae species complex (A. gambiae and A. arabiensis) made up the majority of the larval samples, whereas A. funestus occurred less frequently in the rainy season. Our models should therefore be applied with caution to areas with different vector species.

Fifth, because we did not record aquatic sites without Anopheles larvae, we could not model the formation of aquatic sites. Formation of standing water on the ground surface is a purely physical process and so should be best predicted by topographic variables. The occurrence of Anopheles larvae, in contrast, may also depend on biological factors such as water quality, the presence of predators, and proximity to blood-meal hosts. When the observed habitats were overlaid with the model projections (Figures 2, 3, 4 and 5), some areas were shown as high risk but contained no habitats. These may be aquatic sites that are not inhabited by anopheline larvae. Ideally, two separate models would be developed: one predicting aquatic sites and one predicting anopheline habitats.
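
The two-model structure suggested above can be written as a product of probabilities, assuming (as the text implies) that larvae can only occur where standing water has formed. Here x denotes the topographic variables and z hypothetical biological covariates (water quality, predators, proximity to hosts); the notation is ours, not the authors’. Fitting both factors would require recording aquatic sites without larvae as well as those with larvae, which this survey did not do.

```latex
\[
\Pr(\text{habitat} \mid \mathbf{x}, \mathbf{z})
  = \Pr(\text{water} \mid \mathbf{x}) \times \Pr(\text{larvae} \mid \text{water}, \mathbf{x}, \mathbf{z})
\]
\[
\operatorname{logit} \Pr(\text{water} \mid \mathbf{x}) = \alpha_0 + \boldsymbol{\alpha}^{\top}\mathbf{x},
\qquad
\operatorname{logit} \Pr(\text{larvae} \mid \text{water}, \mathbf{x}, \mathbf{z})
  = \gamma_0 + \boldsymbol{\gamma}^{\top}\mathbf{x} + \boldsymbol{\delta}^{\top}\mathbf{z}
\]
```

Under this decomposition, a high-risk area with no observed habitat would correspond to a high first factor (water likely) combined with a low second factor (larvae unlikely).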

Lastly, the topographic model has a fundamental limitation where man-made habitats are important breeding sites of the vectors. Our survey on Rusinga found that more than half of all breeding sites were artificial, such as holes and containers. Such breeding sites may be especially important in urban environments, where an alternative approach to predicting breeding sites would be necessary.

Despite these limitations, our study lends support to localized mosquito habitat management and offers a simple approach to habitat modelling. We have demonstrated the feasibility of predicting potential breeding habitats of Anopheles mosquitoes using topographic variables derived from freely available DEMs. Our models are expected to be useful in defining target areas for larval control. Furthermore, applying these models to large areas may help identify high-risk areas for malaria infection, because malaria prevalence is likely to be higher in areas with many potential breeding sites of malaria vectors [55, 56].
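
One simple way to turn a prediction map into target areas for larval control is to rank map cells by predicted probability and treat the highest-ranked cells until a chosen share of known breeding sites is covered. The Python sketch below assumes a hypothetical grid of ten cells with known site counts and an 80% coverage target; neither the rule nor the numbers come from this study.

```python
def select_target_cells(cell_probs, sites_per_cell, coverage=0.8):
    """Rank cells by predicted probability and keep adding them until the
    selected cells contain at least `coverage` of the known breeding sites.

    Returns (selected cell ids, fraction of cells selected, fraction of sites covered).
    """
    total_sites = sum(sites_per_cell.values())
    if total_sites == 0:
        raise ValueError("no known breeding sites to cover")
    ranked = sorted(cell_probs, key=cell_probs.get, reverse=True)
    selected, covered = [], 0
    for cell in ranked:
        if covered / total_sites >= coverage:
            break
        selected.append(cell)
        covered += sites_per_cell.get(cell, 0)
    return selected, len(selected) / len(cell_probs), covered / total_sites


# Hypothetical grid cells: predicted probability and number of known breeding sites.
cell_probs = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.4, "E": 0.3,
              "F": 0.2, "G": 0.1, "H": 0.05, "I": 0.05, "J": 0.01}
sites_per_cell = {"A": 4, "B": 3, "C": 1, "D": 1, "E": 1}

cells, area_share, site_share = select_target_cells(cell_probs, sites_per_cell)
print(cells, f"{area_share:.0%} of cells", f"{site_share:.0%} of known sites")
```

In this toy grid, 30% of the cells hold 80% of the known sites, so the remaining cells could be given lower priority. In a new area without survey data, the probability threshold reached at the training site would have to be carried over instead.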


This study has demonstrated that, with the advent of the freely available SRTM and ASTER DEMs, topographic models that predict malaria vector breeding habitats can be developed. Such models could be used practically and widely to complement targeted malaria control strategies. In particular, the resulting maps can help exclude areas where breeding sites are unlikely to be present and thus help prioritize high-risk areas more precisely. Targeted larval control would make the most of limited resources and should therefore be strongly considered in integrated malaria management in Africa.


  1. O’Meara WP, Mangeni JN, Steketee R, Greenwood B: Changes in the burden of malaria in sub-Saharan Africa. Lancet Infect Dis. 2010, 10 (8): 545-555. 10.1016/S1473-3099(10)70096-7.

  2. Fillinger U, Lindsay SW: Larval source management for malaria control in Africa: myths and reality. Malaria J. 2011, 10: 353. 10.1186/1475-2875-10-353.

  3. Beier J, Keating J, Githure J, Macdonald M, Impoinvil D, Novak R: Integrated vector management for malaria control. Malaria J. 2008, 7 (1): 4. 10.1186/1475-2875-7-4.

  4. Ototo EN, Githeko AK, Wanjala CL, Scott TW: Surveillance of vector populations and malaria transmission during the 2009/10 El Niño event in the western Kenya highlands: opportunities for early detection of malaria hyper-transmission. Parasit Vectors. 2011, 4: 144. 10.1186/1756-3305-4-144.

  5. Fillinger U, Ndegwa B, Githeko A, Lindsay SW: Integrated malaria vector control with microbial larvicides and insecticide treated nets in the western Kenyan highlands: a controlled trial. Bull World Health Organ. 2009, 87: 655-665. 10.2471/BLT.08.055632.

  6. Li L, Bian L, Yakob L, Zhou G, Yan G: Temporal and spatial stability of Anopheles gambiae larval habitat distribution in Western Kenya highlands. Int J Health Geogr. 2009, 8: 70. 10.1186/1476-072X-8-70.

  7. Li L, Bian L, Yakob L, Zhou G, Yan G: Analysing the generality of spatially predictive mosquito habitat models. Acta Trop. 2011, 119 (1): 30-37. 10.1016/j.actatropica.2011.04.003.

  8. White MT, Griffin JT, Churcher TS, Ferguson NM, Basáñez MG, Ghani AC: Modelling the impact of vector control interventions on Anopheles gambiae population dynamics. Parasit Vectors. 2011, 4: 153. 10.1186/1756-3305-4-153.

  9. Gouagna LC, Dehecq JS, Girod R, Boyer S, Lempérière G, Fontenille D: Spatial and temporal distribution patterns of Anopheles arabiensis breeding sites in La Reunion Island: multi-year trend analysis of historical records from 1996-2009. Parasit Vectors. 2011, 4: 121. 10.1186/1756-3305-4-121.

  10. Overgaard HJ, Reddy VP, Abaga S, Matias A, Reddy MR, Kulkarni V, Schwabe C, Segura L, Kleinschmidt I, Slotman MA: Malaria transmission after five years of vector control on Bioko Island, Equatorial Guinea. Parasit Vectors. 2012, 5: 253. 10.1186/1756-3305-5-253.

  11. Animut A, Gebre-Michael T, Balkew M, Lindtjørn B: Abundance and dynamics of anopheline larvae in a highland malarious area of south-central Ethiopia. Parasit Vectors. 2012, 5: 117. 10.1186/1756-3305-5-117.

  12. Mushinzimana E, Munga S, Minakawa N, Li L, Feng CC, Bian L, Kitron U, Schmidt C, Beck L, Zhou G, Githeko AK, Yan G: Landscape determinants and remote sensing of anopheline mosquito larval habitats in the western Kenya highlands. Malaria J. 2006, 5: 13. 10.1186/1475-2875-5-13.

  13. Clennon JA, Kamanga A, Musapa M, Shiff C, Glass GE: Identifying malaria vector breeding habitats with remote sensing data and terrain-based landscape indices in Zambia. Int J Health Geogr. 2010, 9: 58. 10.1186/1476-072X-9-58.

  14. Cohen JM, Ernst KC, Lindblade KA, Vulule JM, John CC, Wilson ML: Local topographic wetness indices predict household malaria risk better than land-use and land-cover in the western Kenya highlands. Malaria J. 2010, 9: 328. 10.1186/1475-2875-9-328.

  15. Moss WJ, Hamapumbu H, Kobayashi T, Shields T, Kamanga A, Clennon J, Mharakurwa S, Thuma PE, Gregory G: Use of remote sensing to identify spatial risk factors for malaria in a region of declining transmission: a cross-sectional and longitudinal community survey. Malaria J. 2011, 10: 163. 10.1186/1475-2875-10-163.

  16. Cohen JM, Ernst KC, Lindblade KA, Vulule JM, John CC, Wilson ML: Topography-derived wetness indices are associated with household-level malaria risk in two communities in the western Kenyan highlands. Malaria J. 2008, 7: 40. 10.1186/1475-2875-7-40.

  17. Atieli HE, Zhou G, Lee M, Kweka EJ, Afrane Y, Mwanzo I, Githeko AK, Yan G: Topography as a modifier of breeding habitats and concurrent vulnerability to malaria risk in the western Kenya highlands. Parasit Vectors. 2011, 4: 241. 10.1186/1756-3305-4-241.

  18. Opiyo P, Mukabana WR, Kiche I, Mathenge E, Killeen GF, Fillinger U: An exploratory study of community factors relevant for participatory malaria control on Rusinga Island, western Kenya. Malaria J. 2007, 6: 48. 10.1186/1475-2875-6-48.

  19. Minakawa N, Mutero CM, Githure JI, Beier JC, Yan G: Spatial distribution and habitat characterization of anopheline mosquito larvae in Western Kenya. Am J Trop Med Hyg. 1999, 61: 1010-1016.

  20. Minakawa N, Seda P, Yan G: Influence of host and larval habitat distribution on the abundance of African malaria vectors in western Kenya. Am J Trop Med Hyg. 2002, 67 (1): 32-38.

  21. Fillinger U, Sonye G, Killeen GF, Knols BGJ, Becher N: The practical importance of permanent and semipermanent habitats for controlling aquatic stages of Anopheles gambiae sensu lato mosquitoes: operational observation from a rural town in western Kenya. Trop Med Int Health. 2004, 9: 1274-1289. 10.1111/j.1365-3156.2004.01335.x.

  22. Minakawa N, Dida GO, Sonye GO, Futami K, Njenga SM: Malaria vectors in Lake Victoria and adjacent habitats in western Kenya. PLoS One. 2012, 7 (3): e32725. 10.1371/journal.pone.0032725.

  23. Amek N, Bayoh N, Hamel M, Lindblade KA, Gimnig JE, Odhiambo F, Laserson KF, Slutsker L, Smith T, Vounatsou P: Spatial and temporal dynamics of malaria transmission in rural Western Kenya. Parasit Vectors. 2012, 5: 86. 10.1186/1756-3305-5-86.

  24. The Shuttle Radar Topography Mission (SRTM). (last accessed July 2012)

  25. The Advanced Spaceborne Thermal Emission Reflection Radiometer Global DEM (ASTER GDEM). (last accessed July 2012)

  26. SAGA, System for Automated Geoscientific Analyses. (last accessed November 2010)

  27. Manel S, Dias JM, Buckton ST, Ormerod SJ: Alternative methods for predicting species distribution: an illustration with Himalayan river birds. J Appl Ecol. 1999, 36: 734-747. 10.1046/j.1365-2664.1999.00440.x.

  28. Mladenoff DJ, Sickley TA, Wydeven AP: Predicting gray wolf landscape recolonization: logistic regression models vs. new field data. Ecol Appl. 1999, 9: 37-44. 10.1890/1051-0761(1999)009[0037:PGWLRL]2.0.CO;2.

  29. Barbet-Massin M, Jiguet F, Albert CH, Thuiller W: Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol Evol. 2012, 3: 327-338. 10.1111/j.2041-210X.2011.00172.x.

  30. Wisz MS, Guisan A: Do pseudo-absence selection strategies influence species distribution models and their predictions? An information-theoretic approach based on simulated data. BMC Ecol. 2009, 9: 8. 10.1186/1472-6785-9-8.

  31. Phillips SJ, Anderson RP, Schapire RE: Maximum entropy modeling of species geographic distributions. Ecol Modell. 2006, 190: 231-259. 10.1016/j.ecolmodel.2005.03.026.

  32. VanDerWal J, Shoo LP, Graham C, Williams SE: Selecting pseudo-absence data for presence-only distribution modeling: how far should you stray from what you know? Ecol Modell. 2009, 220: 589-594. 10.1016/j.ecolmodel.2008.11.010.

  33. Planchon O, Darboux F: A fast, simple and versatile algorithm to fill the depressions of digital elevation models. Catena. 2001, 46: 159-176.

  34. Moore ID, Gessler PE, Nielsen GA, Peterson GA: Soil attribute prediction using terrain analysis. Soil Sci Soc Am J. 1993, 57: 443-452. 10.2136/sssaj1993.03615995005700020026x.

  35. Gorsevski PV, Gessler P, Foltz RB: Spatial prediction of landslide hazard using logistic regression and GIS. 4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4): Problems, Prospects and Research Needs. 2000, Banff, Alberta, Canada. (accessed online on September 17, 2011)

  36. Warren SD, Hohmann MG, Auerswald K, Mitasova H: An evaluation of methods to determine slope using digital elevation data. Catena. 2004, 58: 215-233. 10.1016/j.catena.2004.05.001.

  37. Band LE: Topographic partition of watersheds with digital elevation models. Water Resour Res. 1986, 22: 15-24. 10.1029/WR022i001p00005.

  38. Ohlmacher GC: Plan curvature and landslide probability in regions dominated by earth flows and earth slides. Eng Geol. 2007, 91: 117-134. 10.1016/j.enggeo.2007.01.005.

  39. Buckley A: Understanding curvature rasters, in imagery, mapping, mapping centre lead. (last accessed May 2012)

  40. Olaf C: Convergence Index. 2001. (last accessed June 2012)

  41. Schmidt F, Persson A: Comparison of DEM data capture and topographic wetness indices. Precis Agric. 2003, 4: 179-192. 10.1023/A:1024509322709.

  42. Weiss A: Topographic position and landforms analysis. 2001, San Diego, CA: ESRI User Conference.

  43. Manel S, Dias JM, Ormerod SJ: Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol Modell. 1999, 120: 337-347. 10.1016/S0304-3800(99)00113-1.

  44. Luoto M, Seppala M: Modelling the distribution of palsas in Finnish Lapland with logistic regression and GIS. Permafrost Periglac Process. 2002, 13: 17-28. 10.1002/ppp.404.

  45. Hosmer DW, Lemeshow S: Applied logistic regression. 2000, New York, USA: John Wiley and Sons.

  46. R Development Core Team: R: a language and environment for statistical computing. 2008, Vienna, Austria: R Foundation for Statistical Computing.

  47. Akaike H: Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory. Edited by: Petrov BN, Csaki F. 1973, Budapest: Akademiai Kiado, 267-281.

  48. Pearce J, Ferrier S: Evaluating the predictive performance of habitat models developed using logistic regression. Ecol Modell. 2000, 133: 225-245. 10.1016/S0304-3800(00)00322-7.

  49. Manel S, Williams HC, Ormerod SJ: Evaluating presence-absence models in ecology: the need to account for prevalence. J Appl Ecol. 2001, 38: 921-931.

  50. Fielding AH, Bell JF: A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv. 1997, 24: 38-49. 10.1017/S0376892997000088.

  51. McPherson JM, Jetz W: Effects of species’ ecology on the accuracy of distribution models. Ecography. 2007, 30: 135-151.

  52. Stevenson M, Nunes T, Sanchez J, Thornton R, Reiczigel J, Robison-Cox J, Sebastiani P: An R package for the analysis of epidemiological data, version 0.9. 2012.

  53. Awange JL, Sharifi MA, Ogonda G, Wickert J, Grafarend EW, Omulo MA: The falling lake Victoria water level: GRACE, TRIMM and CHAMP satellite analysis of the lake basin. Water Resour Manage. 2008, 22: 775-796. 10.1007/s11269-007-9191-y.

  54. Dieter KL, Huestis DL, Lehmann T: The effects of oviposition-site deprivation on Anopheles gambiae reproduction. Parasit Vectors. 2012, 5: 235. 10.1186/1756-3305-5-235.

  55. Zhou G, Munga S, Minakawa N, Githeko AK, Yan G: Spatial relationship between adult malaria vector abundance and environmental factors in Western Kenya highlands. Am J Trop Med Hyg. 2007, 77: 29-35.

  56. Kulkarni MA, Desrochers RE, Kerr JT: High resolution niche models of malaria vectors in northern Tanzania: a new capacity to predict malaria risk? PLoS One. 2010, 5 (2): e9396. 10.1371/journal.pone.0009396.

We particularly wish to thank Dr. Ulrike Fillinger for providing the habitat information on Rusinga Island, and the community members for their cooperation in the field. The financial aid received from Nekken Domonkai is highly appreciated. This study was supported in part by the Japanese Government Ministry of Education, Science, Sports, and Culture as a doctoral fellowship award to JCN (2009–2013). This study was also supported in part by the Global Centre of Excellence Program, Institute of Tropical Medicine, Nagasaki University, Nagasaki, Japan.

Author information

Corresponding author

Correspondence to Toshihiko Sunahara.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JCN was the principal investigator and was responsible for the study design, data analysis, interpretation, and writing of the manuscript. TS, KG, KF, and NM assisted in the study design, data analysis and interpretation. PA, GS, GD, KF, and NM assisted with the field-work. TS, KG, and NM revised the manuscript for intellectual content. TS, KG, and NM supervised the work. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Nmor, J.C., Sunahara, T., Goto, K. et al. Topographic models for predicting malaria vector breeding habitats: potential tools for vector control managers. Parasites Vectors 6, 14 (2013).
