The data
The maps presented here show overlaid areas of species-specific predicted occurrence based on the climatic and environmental variables provided to the BRT model. Each species map in the composite maps included only those pixels where the model predicted a probability of presence greater than 0.5. As with all species mapping, the quality of the output depends largely on the amount and quality of the data input into the model. Species occurrence data are often poorly distributed spatially [7–10] or are limited numerically (e.g. An. leucosphyrus/An. latens, n = 12 (Table 1)). The modelling methodology allowed these data to be supplemented with randomly assigned (and therefore more spatially dispersed) pseudo-presence points taken from within the EO area of the species' range [10]. These pseudo-data were given half the weight of the 'true' occurrence data. However, where the occurrence data were limited, the pseudo-data may have exerted a greater influence on the final model, and therefore on the area of predicted presence.
This influence can be seen in the predicted species occurrence on New Guinea island. The EO ranges for An. farauti s.l., An. punctulatus s.l. and An. koliensis indicate blanket coverage across the whole island, without accounting for the highland areas that run along almost the entire central length of the island. Members of the Punctulatus Group, which includes An. farauti s.l., An. punctulatus s.l. and An. koliensis, are not known to occur at altitudes higher than 2300 m (Bangs, unpub obs), yet the highlands on this island peak at 4884 m with Puncak Jaya (Mt. Carstensz) [24]. The range of these three DVS centres on New Guinea island, with limited spread to some of the smaller neighbouring islands and, in the case of An. farauti s.l., to northern Australia [8]. This small range may have concentrated the pseudo-presence points, which may have fallen at both lower and higher altitudes, so that the model was unable to establish altitude as a limiting factor for these species.
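The way half-weighted pseudo-data can still dominate a small occurrence dataset can be illustrated with simple arithmetic. The following is a minimal sketch, not the modelling code used for the maps: the function name `pseudo_weight_share` and the number of pseudo-presence points (500) are hypothetical, and the share of total weight is only a rough proxy for influence on a fitted BRT model. Only the half weighting and the n = 12 example come from the text.

```python
def pseudo_weight_share(n_true, n_pseudo, pseudo_weight=0.5, true_weight=1.0):
    """Return the fraction of total presence weight carried by pseudo-presence points.

    pseudo_weight=0.5 reflects the half weighting described in the text;
    everything else here is an illustrative assumption.
    """
    if n_true < 0 or n_pseudo < 0:
        raise ValueError("point counts must be non-negative")
    pseudo_total = n_pseudo * pseudo_weight
    total = n_true * true_weight + pseudo_total
    if total == 0:
        raise ValueError("at least one point is required")
    return pseudo_total / total


# Hypothetical number of pseudo-presence points; the text does not give one.
N_PSEUDO = 500

# Compare a data-poor species (n = 12, as for An. leucosphyrus/An. latens)
# with progressively better-sampled, hypothetical species.
for n_true in (12, 100, 1000):
    share = pseudo_weight_share(n_true, N_PSEUDO)
    print(f"n_true={n_true:5d}  pseudo share of weight={share:.3f}")
```

Under these assumed numbers, the fewer the true occurrence records, the larger the share of weight carried by the pseudo-data, which is consistent with the concern raised above for species with limited data.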
The quality of the occurrence data also relies on accurate species identifications reported in the source literature. The data were faithfully abstracted from each source and no assumptions were made; however, this will have introduced some varying level of error. For example, 'An. funestus' was rarely reported as a species complex, but was also rarely subjected to the additional molecular methods of identification (e.g. Polymerase Chain Reaction (PCR)) [25, 26] necessary to accurately identify the members of the complex. Moreover, it is possible that some studies were actually reporting more than one member of the Funestus Group or Subgroup rather than An. funestus s.s. or even the An. funestus complex. The same may also be said for the An. maculatus group in Asia.
For some species there is also current debate about their taxonomy. For example, the identity and vectorial capacity of An. messeae are currently in question: there is some suggestion that An. daciae may be responsible for malaria transmission previously attributed to An. messeae and may be sympatric with it across much of its range, which might also explain the apparent high polymorphism associated with An. messeae [27, 28].
Despite some uncertainty in species classifications that cannot be corrected, the presence points for each species were carefully examined by the TAG, and those points that were clearly unreliable or related to dubious species identifications were removed at an early stage in the mapping process.
The maps
The maps presented here show the predicted occurrence of the DVS. They do not, however, indicate the probability of presence, although this information does underlie the distribution of positive and negative pixels (and is indicated on the original species maps [8–10]). A pixel is marked as 'present' where the BRT model indicated a probability of presence greater than 0.5. Therefore, within these 'positives' the probability will range from > 0.5 to ≤ 1. Similarly, a pixel is marked as 'absent' where the BRT model indicated a probability of presence less than 0.5, and so will include probabilities from 0 up to 0.5. These probability values are defined by the interaction of the environmental and climatic variables that the BRT model identifies as predictors, and they indicate where the environment is suitable for the species to exist. Hence such probabilities provide no direct information about potential species abundance but are simply the full output of the analysis. However, as these probabilities may indicate increasing or decreasing environmental suitability, it is feasible that these measures could be used to estimate species abundance at a specified location [29–31]. Further work is needed to try to establish a quantifiable link between these probabilities and DVS abundance.
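The thresholding described above, and the loss of information it implies, can be sketched in a few lines. This is an illustrative example only: the function names `to_presence` and `overlay`, the tiny 2 x 3 grids and their probability values are all hypothetical, and representing the composite as a per-pixel count of species is an assumption rather than the method used to draw the maps. Only the 0.5 cut-off comes from the text.

```python
THRESHOLD = 0.5  # cut-off stated in the text


def to_presence(prob_grid, threshold=THRESHOLD):
    """Mark a pixel as present only where probability is strictly above the threshold."""
    return [[p > threshold for p in row] for row in prob_grid]


def overlay(presence_grids):
    """Count, for each pixel, how many species are predicted present (assumed composite)."""
    n_rows = len(presence_grids[0])
    n_cols = len(presence_grids[0][0])
    return [
        [sum(grid[r][c] for grid in presence_grids) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical probability surfaces for two species on the same 2 x 3 grid.
species_a = [[0.10, 0.50, 0.51],
             [0.90, 0.49, 1.00]]
species_b = [[0.70, 0.20, 0.80],
             [0.30, 0.60, 0.40]]

presence_a = to_presence(species_a)
presence_b = to_presence(species_b)
composite = overlay([presence_a, presence_b])

print(presence_a)
print(composite)
```

In this sketch the pixels with probabilities 0.51 and 1.00 are both simply 'present', which illustrates why the binary maps carry no information about how suitable a positive pixel is.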
Figure 1 provides the best currently available evidence-based global picture of the distributions of the main DVS. However, there will always be locations where the process has resulted in an oversimplification and the models do not pick up areas where a species may or may not be present. For example, in Africa, there is some question regarding the extensive predicted presence of An. funestus (species or members of the complex) within the highland areas of Ethiopia (Kiszewski, pers com). Indeed, elsewhere in the country, even where it is found, members of the Funestus Subgroup are rarely considered dominant, with An. arabiensis regarded as the major vector species [32]. Only one known study has conducted PCR identification of Funestus Group specimens from Ethiopia and only reported An. parensis, a non-vector, as present [33].
A lack of data across a large swath of central Africa should also be noted; for example, only 3 sites reporting DVS occurrence were found for the Central African Republic, 2 for Congo and 23 for DRC [9]. Such areas may therefore not be accurately represented by the model, especially where variable or unique environments and ecologies exist.
Predicting species occurrence accurately can be problematic for the large number of islands in the Asian-Pacific region, and for small islands elsewhere. Overall, the models appear to have done well (based on TAG expert opinion); however, there are a few cases where the model does not pick up areas of known presence. For example, the occurrence of An. pseudopunctipennis has been reported on the island of Grenada in the Americas (see [10]), yet the model does not indicate its presence there. However, An. aquasalis is correctly predicted to occur on this island. Similarly, An. barbirostris s.l. on the Lesser Sunda Island chain (including Flores, Sumbawa, Sumba, Timor and others) is not fully represented despite the existence of a published data point on Flores, and despite the islands being clearly within the EO range of this DVS. Anopheles barbirostris s.l. shows dramatically varying behavioural attributes and vector importance over its geographical range in Indonesia, being of little or no epidemiological significance in Java and Sumatra in contrast to its role as a primary malaria vector in the eastern regions of the archipelago ([34, 35]; Bangs, unpub obs), thereby illustrating some of the difficulties with certain species and with interpreting the finer details of distribution maps.
The scale of these regional and global maps can also limit the visibility of some areas of presence on the smaller islands. For example, on the Maluku Island chain in eastern Indonesia, where An. farauti s.l. is an important vector along the coasts and An. punctulatus s.l. a vector inland, the maps do indicate the presence of these vectors, but mostly as sporadic individual pixels, so their presence is easy to overlook.
The maps presented here show the predicted distributions of a number of species complexes without reference to the sibling species they represent. Moreover, the molecular forms (M and S) of An. gambiae are not distinguished despite reported behavioural differences between them. This is due to a lack of spatially dispersed data providing accurate and defensible sibling species or form identification. It is hoped that such data will become increasingly available as the importance of correctly and fully identifying these species becomes more widely accepted, allowing updated and detailed species-specific maps to be produced in the future.