Cryptic diversity in an Atlantic Forest malaria vector from the mountains of South-East Brazil

Background Anopheles (Kerteszia) cruzii is the primary vector of human and simian malarias in Brazilian regions covered by the Atlantic Rainforest. Previous studies found that An. cruzii presents high levels of behavioural, chromosomal and molecular polymorphisms, which led to the hypothesis that it may be a complex of cryptic species. Here, An. cruzii specimens were collected in five sites in South-East Brazil located at different altitudes on the inner and coastal slopes of two mountain ranges covered by Atlantic Rainforest, known as Serra do Mar and Serra da Mantiqueria. Partial sequences for two genes (Clock and cpr) were generated and compared with previously published sequences from Florianópolis (southern Brazil). Genetic diversity was analysed with estimates of population structure (FST) and haplotype phylogenetic trees in order to understand how many species of the complex may occur in this biome and how populations across the species distribution are related. Results The sequences from specimens collected at sites located on the lower coastal slopes of Serra do Mar (Guapimirim, Tinguá and Sana) clustered together in the phylogenetic analysis, while the major haplotypes from sites located on higher altitude and at the continental side of the same mountains (Bocaina) clustered with those from Serra da Mantiqueira (Itatiaia), an inner mountain range. These two An. cruzii lineages showed statistically significant genetic differentiation and fixed characters, and have high FST values typical of between species comparisons. Finally, in Bocaina, where the two lineages occur in sympatry, we found deviations from Hardy-Weinberg equilibrium due to a deficit of heterozygotes, indicating partial reproductive isolation. These results strongly suggest that at least two distinct lineages of An. cruzii (provisorily named “Group 1” and “Group 2”) occur in the mountains of South-East Brazil. Conclusions At least two genetically distinct An. cruzii lineages occur in the Atlantic Forest covered mountains of South-East Brazil. The co-occurrence of distinct lineages of An. cruzii (possibly incipient species) in those mountains is an interesting biological phenomenon and may have important implications for malaria prevalence, Plasmodium transmission dynamics and control. Electronic supplementary material The online version of this article (10.1186/s13071-018-2615-0) contains supplementary material, which is available to authorized users.


Background
Autochthonous malaria transmission in Brazil is currently reported outside the Amazon region in areas covered by the Atlantic Rainforest, where vectors, humans and nonhuman primates co-exist [1,2]. In this biome, one of world's top biodiversity hotspots [3][4][5][6], the main vectors are Anopheles Meigen, 1818 mosquitoes of the subgenus Kerteszia Theobald, 1905, which uses Bromeliaceae watertank plants as breeding sites [2,7,8]. Anopheles (Kerteszia) cruzii Dyar & Knab, 1908, known for its role as vector of the "bromeliad malaria" since the early twentieth century [9,10], is considered the primary vector of human and simian malaria currently occurring in South (S) and South-East (SE) Brazilian regions [2,11].
In the last 50 years several studies documented that An. cruzii populations from different regions of the country show great behavioural [12], morphological [8,13], chromosomal [14][15][16][17] and molecular [18][19][20][21][22][23][24] polymorphism, leading to the hypothesis of An. cruzii being a complex of cryptic species. In Rio de Janeiro (RJ), Brazil's third most densely populated state with 16 million habitants [25], more than 1700 malaria cases were reported from 1990 to 2015, mostly due to transmission by An. cruzii [1]. However, in spite of An. cruzii historical importance as a malaria vector and the continuous occurrence of autochthonous malaria cases in RJ, few studies have investigated the genetic diversity of this species in the state [2]. Research to date has tended to focus on comparing individuals from RJ mountains with those from other Brazilian states rather than investigating the genetic diversity among distinct sites within the state. A previous study using allozymes showed that a population from Nova Iguaçu, located near Tinguá, RJ, on the coastal side of Serra do Mar (a mountain range running parallel to the Atlantic Ocean from RJ to the South of Brazil) is genetically similar to the Florianópolis population, in the South Brazilian region; these localities are~800 km apart [18].
On the other hand, multilocus DNA sequencing studies found evidence for two sympatric independent lineages, both of them distinct from lineages of other Brazilian states, occurring in Itatiaia, a site in the Serra da Mantiqueira mountains, which is located in the inner North-West of the state, separated from Serra do Mar by the Paraíba Valley [22,24] (Fig. 1). However, since these studies were conducted separately and under different methodologies, the relationship between Serra da Mantiqueira and Serra do Mar populations remain unclear. Moreover, data from RJ are restricted to a small range of the An. cruzii distribution in the state and no further investigations have been made to understand the genetic diversity of the complex in this region.
The co-occurrence of more than one lineage of An. cruzii in the mountains of South-East Brazil may have important implications for malaria control, since they may be differentially involved in malaria transmission. Thus, three questions remain unsolved: (i) How many evolutionarily independent lineages of An. cruzii exists in the mountains of South-East Brazil and where do they occur? (ii) What is the relationship between An. cruzii individuals from distinct sites within the biome? (iii) If more than one lineage occur in South-East Brazil, which processes may have played a role in their differentiation?
In order to answer these questions, this work analysed the genetic diversity of An. cruzii individuals from four sites covering the distribution of the Atlantic Forest throughout RJ, and one site in São Paulo's Serra da Bocaina, and compared the genetic variability of these five populations from South-East Brazil to previously published data from Florianópolis (southern Brazil) [22,24].
Partial fragments of cpr (211 bp) and Clock genes (224 bp) were used in this study, since they have previously been shown to be good markers for identification of differentiating lineages in An. cruzii [21,22,24] and also in the Anopheles triannulatus [26] complexes of cryptic species. The cpr gene (NADPH cytochrome P450 reductase), has a putative function in Drosophila odorant clearance [27]. Since olfactory cues may have important influence in mate recognition [28], cpr may be important in sexual isolation between closely related species. The Clock gene encodes a transcription factor important in the control of dipteran circadian cycles [29]. Circadian cycle genes have been useful in the identification of cryptic diversity in African anophelines [30] and other dipteran groups such as fruit flies [31] and sand flies [32]. Furthermore, since variability in daily rhythms have an influence on malaria transmission by anophelines [33], knowledge regarding divergence in this kind of molecular marker may lend to useful insights in to malaria dynamics of the Atlantic Forest.
Our results with these two markers (cpr and Clock) strongly suggest that there are at least two independent lineages of An. cruzii in the mountains of South-East Brazil. until the adults emerged. Additionally, adult mosquitoes were collected starting an hour before sunset and ending 2 hours after dusk (17:00 h to 20:00 h). Species identification was carried out using morphological identification keys [34]. Briefly, the morphological characters used to diagnose An. cruzii adults were: scutum with four dark longitudinal lines, fifth hind tarsomeres dark scaled on at least 50% at base, with an apical white band, wing vein R 4 + 5 white-scaled except for a small dark spot at the base and another at the apex. After identification, adult mosquitoes were kept in 95% ethanol at -20°C for DNA extraction. The data for source (adult or reared in the laboratory) and sex of each sample are shown in Additional file 1: Table S1.

Molecular analysis
The cpr and Clock partial gene sequences from Florianópolis and some sequences from Itatiaia (cpr gene: Ita02 to Ita12; Clock gene: Ita02 to Ita13) have already been published (GenBank accession numbers GU072619-GU072646 and GU072709-GU072730 for cpr; JX129234-JX129257 and GU016402-GU016425 for Clock; see Additional file 2 for details). These sequences were added to the study dataset in order to better understand how populations in RJ are related to those from other Brazilian sites which previously showed divergence when compared to Itatiaia [21][22][23][24]. The additional sequences from Itatiaia (10 mosquitoes for cpr and 8 for Clock) as well as those from Tinguá (10 mosquitoes for both genes), Guapimirim (16 mosquitoes for cpr and 9 for Clock), Sana (12 mosquitoes for cpr and 9 for Clock) and Bocaina (12 mosquitoes for cpr and 16 for Clock) were obtained by PCR, cloning and sequencing as described below. An alignment of all sequences for each gene is presented in Additional file 2.
Genomic DNA was extracted from each individual mosquito according to Jowett [35] and used in PCR reactions carried out in an Eppendorf Mastercycler® thermocycler (Westbury, USA) using the proofreading Pfu DNA polymerase (Biotools, Madrid, Spain) and previously described primers [21,24]. PCR products were purified and cloned using CloneJET™ PCR Cloning Kit (Fermentas Life Sciences, Carlsbad, USA). Sequencing of positive clones was carried out in an ABI Prism 3730 DNA sequencer at the Oswaldo Cruz Institute using the ABI Prism Big Dye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems, Foster City, USA). The identity of the cloned fragments was determined by BlastX analysis against the NCBI nr database (http:// www.ncbi.nlm.nih.gov/BLAST/). At least 12 clones were sequenced for each mosquito to mitigate PCR errors, and allow the identification of the two alleles. The sequences were then edited and consensus sequences representing the two alleles were generated. The individuals were classified as homozygotes when only one haplotype was observed among the twelve sequences. The sequences were submitted to the GenBank database under the accession numbers KT724974-KT725197.
DNA sequence analysis and haplotype genealogies DNA sequences were aligned with ClustalX [36] and phylogenetic trees were constructed for each gene under the Bayesian Inference method using MrBayes v.3.2.5 [37,38]. The best-fit substitution models F81 + G (Clock gene) and SYM + G (cpr gene) were selected following the BIC criterion as implemented in JModelTest [39]. Indel information was included in the analysis after coding gaps into a binary matrix using FastGap software [40]. Analysis ran for 5,000,000 generations (MCMC -'burn-in' = 25%) from two independent random trees to generate a convergence diagnostic. The final trees presented here are the consensus between the two final trees generated for each gene. Results from all phylogenetic analyses were viewed in FigTree v.1.4.2 [41]. P RO S EQ v.2.91 [42] and Arlequin v.3.11 [43] software were used to obtain pairwise estimates of population differentiation [44]. The comparison of genetic distances, estimated by pairwise F ST , with geographical distances by Mantel test [45] was performed with the ade4 package [46] using R Software v.3.3.2 [47].

Results
A total of 152 sequences (two alleles from each individual) were analysed for the Clock gene (32 sequences from Bocaina, 18 from Guapimirim, 40 from Itatiaia, 18 from Sana, 20 from Tinguá and 24 from Florianópolis). For the cpr gene, a total of 170 sequences were obtained (again the two alleles from each mosquito; 24 sequences from Bocaina, 32 from Guapimirim, 42 from Itatiaia, 24 from Sana, 20 from Tinguá and 28 from Florianópolis). Sequences from the cpr gene were 211 bp long and sequences from Clock were 224 bp long. Both loci include one intron that show a number of indels. All base substitutions were silent or occurred within the introns. An alignment of all sequences for each gene is presented in Additional file 2.
Population structure Table 1 shows the pairwise F ST estimates for Clock and cpr genes. The highest F ST values, all of them statistically significant (P < 0.005), were found in the comparisons between Itatiaia and Bocaina, with the other four populations from Florianópolis, Guapimirim, Tinguá and Sana (F ST between 0.423-0.734 for Clock and between 0.388-0.733 for cpr). Such differentiation is unexpected within a single evolutionary lineage: it has been proposed that F ST values above 0.35 correspond to different species [48]. On the other hand, all comparisons between Bocaina and Itatiaia, or between the four other populations yield much lower F ST values (all below 0.35). These results suggest that the sampled populations belong to at least two independent evolutionary lineages, i.e. to different incipient species.
It is interesting to examine in more detail the F ST data. The Clock pairwise F ST values comparing Itatiaia with Bocaina and also those values comparing the populations from Group 1 were usually under 0.08 and most of them were not statistically significant. Using the cpr gene, albeit most of them significant, the pairwise F ST values among Guapimirim, Tinguá and Sana (from Group 1) were low, usually under 0.16, while those between Florianópolis and these other three populations from Group 1 were moderately high (from 0.24 to 0.32). This differentiation of the Florianópolis population is somewhat expected, given that it is lo-cated~800 km away from the remaining ones, whereas Guapimirim, Tinguá and Sana are at most 130 km apart. However, the differentiation among An. cruzii populations cannot be explained by a simple isolation by distance model: the correlation between pairwise F ST values and geographical distance was low and statistically non-significant for both genes (cpr: r = 0.127, P = 0.22; Clock: r = -0.097, P = 0.46).

Haplotype genealogies
In order to gain a deeper insight on the differentiation of the An. cruzii populations, phylogenetic trees of the alleles were generated under the Bayesian Inference method for the two genes, cpr and Clock (Figs. 2 and 3). In both trees, the most basal split separated the An. cruzii haplotypes in two large groups (henceforth called Groups 1 and 2) with posterior probabilities (pp) of 82% for cpr and 100% for Clock.
Group 1 was composed mainly of sequences obtained from individuals from the coastal side of Serra do Mar mountain range (Sana, Guapimirim and Tinguá) and was the same occurring in Florianópolis (South Brazil). Group 2 was composed exclusively by sequences from Itatiaia and Bocaina individuals in both trees. Hence the phylogenetic trees agree with the F ST data analysed in the previous section, i.e. the sampled An. cruzii populations belong to at least two incipient species (Group 1 and Group 2) that cannot be distinguished morphologically, at least using the available taxonomic keys.

Sympatry of group 1 and group 2 haplotypes in the Bocaina population
The phylogenetic analyses disclosed another interesting phenomenon: the Bocaina population, which was sampled for the first time in the present study, contained Group 1 and Group 2 haplotypes for both genes (Figs. 2 and 3), i.e. the incipient species would be sympatric there. This allows an additional test for the hypothesis that Group 1 and Group 2 correspond to different species: if this is true, it is possible (although not mandatory) that we will find deviations from the Hardy-Weinberg equilibrium (namely, reduced amount of heterozygotes) due to partial reproductive isolation. Indeed, we observed 6 individuals cpr 1 /cpr 1 , 5 cpr 2 /cpr 2 and 1 cpr 1 /cpr 2 , whereas the expected counts are 3.5, 2.5 and 6, respectively (χ 2 = 8.4, df = 1, P = 0.004). For the Clock gene, the heterozygote deficiency is less pronounced and not statistically significant (observed: 1 Clock 1 /Clock 1 , 13 Clock 2 /Clock 2 and 2 Clock 1 /Clock 2 ; expected: 0.25 Clock 1 /Clock 1 , 12.25 Clock 2 /Clock 2 and 3.5 Clock 1 / Clock 2 , respectively; P > 0.05). The most probable explanation for the deficiency of heterozygotes in sympatry is reproductive isolation [49].
It is important to note that the Bocaina population does not represent a simple scenario of two completely isolated species: we observed one cpr 1 /cpr 2 heterozygote (individual Boc13, which would be a hybrid; marked with a purple dashed arrow in Fig. 2), and two Clock 1 /Clock 2 heterozygotes (Boc07 and Boc08; purple and orange dashed arrows in Fig. 3). The overall frequency of hybrids would be around 10% (1 out 12 for cpr; 2 out 16 for Clock). Furthermore, the Clock and cpr genotypes are not always coherent in Bocaina individuals (namely, individuals 19 and 20 are cpr 1 /cpr 1 and Clock 2 /Clock 2 ), which is compatible with introgression or incomplete lineage sorting [49]. This complex scenario of partial isolation and differential introgression is similar to what has been observed in other anopheline species [50,51]. Further studies of An. cruzii populations, particularly focusing in Bocaina and using a broader range of markers, are necessary to better understand the phenomenon.

Fixed differences in cpr sequences
All variations in the cpr gene occurred in the intron, which shows a number of polymorphic indels and microsatellites (Additional file 2). The analysis of these polymorphisms showed that fixed differences are present between individuals from Groups 1 and 2. Table 2 shows the number of copies of each repeat in An. cruzii studied  Table 2.

Discussion
Several studies have shown significant differentiation among An. cruzii populations [8,[12][13][14][15][16][17][18][19][20][21][22][23][24]. We found that at least two independent An. cruzii groups (probably incipient species) occur in South and South-East Brazillian Atlantic Rainforests: Group 1 encompasses populations from Florianópolis, Guapimirim, Tinguá and Sana as well as part of the Bocaina sample, and Group 2 is composed of Itatiaia and the remainder of the Bocaina sample. There is good evidence that the distinction between Group 1 and Group 2 is real and biologically meaningful: (i) it is the deepest node in the haplotype phylogeny of both cpr and Clock genes, with strong statistical support (pp of 82 and 100%, respectively); (ii) there are fixed differences between them ( Table 2); (iii) Group 1 and Group 2 occur in sympatry in Bocaina and we found a statistically significant deficiency in heterozygote frequency for the cpr gene, indicating partial reproductive isolation; (iv) the genetic divergence between Group 1 and Group 2 populations is very high, with an average F ST of 0.57 (Table 1). Such differentiation is greater than expected within a single evolutionary lineage without any kind of reproductive isolation; F ST values above 0.35 usually correspond to different species [48]. Our data reject the scenario of a single species with broad distribution in which physically distant populations are more differentiated (i.e. a simple isolation by distance model). For example, Tinguá is 120 km apart from Itatiaia and 750 km apart from Florianópolis, and yet the population from this locality is genetically much more similar to Florianópolis (Table 1); overall, there was no significant correlation between geographical and genetic distance (r = 0.127 and -0.097 for cpr and Clock, respectively; Table 1). Hence, it seems safe to conclude that there are at least two different species in the sampled populations of An. cruzii. They are probably incipient species, since we observed Group 1 / Group 2 hybrids in the Bocaina population, where the two forms occur in sympatry. Overall, the An. cruzii case seems similar to that of the An. gambiae complex [51]: a complex of closely related species with partial introgression, which originates a patchwork of undifferentiated and differentiated genomic regions, with some markers showing more clear evidences of reproductive isolation (e.g. rDNA "M" and "S" forms in An. gambiae; cpr "Group 1" and "Group 2" alleles in An. cruzii).
Further studies of An. cruzii populations, particularly focusing in Bocaina and using a broader range of markers, are clearly desirable, since there are many interesting open questions. For example, what is the introgression pattern across the An. cruzii genome? Are there additional species? A possible hint for this latter possibility is the observation of some differentiation within Group 1 haplotypes: the Bocaina sequences do not appear at random in cpr and Clock trees, but instead formed a well defined cluster with strong statistical support (Figs. 2 and 3). We are now pursuing these two questions.
The results presented here are consistent with previous studies. A study using allozymes as genetic markers concluded that no significant differences could be found between An. cruzii populations from Florianopolis (SC) and Tinguá (RJ) [18]. On the other hand, sequences from Itatiaia (SE) were distinct from those from Florianópolis (S) [24], which we confirmed here with a larger sample.
A question that warrants future studies is: what is the cause and history of the differentiation between Group 1 and Group 2 populations? Geographical distance per se is not a likely explanation. These mosquitoes strictly depend on bromeliad for their life-cycle, and the rainforests at the top of mountain ranges were mostly replaced by grassland during pre-and full-glacial times [52]. Thus, repeated periods of isolation in the last 2.5 Myr between An. cruzii populations from the coastal slope of Serra do Mar (e.g. Tinguá) and Serra da Mantiqueira (e.g. Itatiaia) might have happened; such dynamics are known to have favoured speciation in groups as diverse as flatworms [53], harvestmen [54], snakes [55], frogs [56][57][58][59][60], primates [61][62][63], rodents [64], birds [65][66][67] and several malaria vectors [68]. Under this scenario, the Bocaina population, which is located midway between the coastal Serra do Mar and Serra da Mantiqueira populations, could be a zone of secondary contact between the incipient species, which typically is accompanied by hybridization and partial introgression [69,70].
Another open question, this time with direct implications for human health, is: are there differences in the vectorial capacities of the distinct lineages of An. cruzii? To date, the autochthonous human malaria cases from RJ seem to be restricted to the coastal side of Serra do Mar mountains, where An. cruzii is considered the main vector [2,71,72]. So, the knowledge about the co-occurrence of more than one lineage of An. cruzii in those mountains may have important implications for malaria control, especially if they are differentially involved in malaria transmission.

Conclusions
The analysis of genetic differentiation of cpr and Clock genes indicates that at least two evolutionarily Table 2 Microsatellite repeat polymorphisms in cpr sequences from individuals of distinct localities reveals fixed differences. Repeat polymorphisms were identified in the cpr sequences from individuals sampled in distinct localities. Itatiaia sequences have fixed differences [(A) 3 , (CG) 1 , (CT) 4 ] when compared with those from Florianópolis, Guapimirim, Tinguá and Sana [(A) 2 , (CG) 2-3 , (CT) [6][7][8][9] ]. In Bocaina, both repeat patterns co-occur. Based on Additional file 2, (A) 2-3 repeats are located between positions 18 to 20, (CG) 1-3 between 21 to 26 and (CT) 4-9 between 147 to 164 positions independent lineages of An. cruzii (probably incipient species) occur in the Atlantic Forests of South-East Brazil. Individuals from these lineages present significant genetic divergence and fixed differences. One of these lineages, Group 1, occurs mainly on the coastal side of the Serra do Mar mountain range in altitudes below 600 m and Group 2 occurs in the inner sites of Serra do Mar and on the Serra da Mantiqueira mountains at higher altitudes (above 900 m). The two groups occur in sympatry in the Bocaina population, where we found evidence for incomplete reproductive isolation.

Additional files
Additional file 1: