Lutzomyia longipalpis s.l. in Brazil and the impact of the Sao Francisco River in the speciation of this sand fly vector

Lutzomyia longipalpis s.l. (Diptera: Psychodidae) is the principal vector of Leishmania infantum chagasi in the Americas, and constitutes a complex of species. Various studies have suggested an incipient speciation process based on behavioral isolation driven by the chemotype of male sexual pheromones. It is well known that natural barriers, such as mountains and rivers can directly influence population divergence in several organisms, including insects. In this work we investigated the potential role played by the Sao Francisco River in eastern Brazil in defining the current distribution of Lu. longipalpis s.l. Our studies were based on analyses of polymorphisms of the cytochrome b gene (cyt b) sequences from Lu. longipalpis s.l. available in public databases, and from additional field-caught individuals. Altogether, 9 distinct populations and 89 haplotypes were represented in the analyses. Lu. longipalpis s.l. populations were grouped according to their distribution in regards to the 10°S parallel: north of 10°S (<10°S); and south of 10°S (>10°S). Our results suggest that although no polymorphisms were fixed, moderate genetic divergences were observed between the groups analyzed (i.e., FST = 0.184; and Nm = 2.22), and were mostly driven by genetic drift. The population divergence time estimated between the sand fly groups was about 0.45 million years (MY), coinciding with the time of the change in the course of the Sao Francisco River, during the Mindel glaciation. Overall, the polymorphisms on the cyt b haplotypes and the current speciation process detected in Lu. longipalpis s.l. with regards to the distribution of male sexual pheromones suggest a role of the Sao Francisco River as a significant geographical barrier in this process.


Background
Lutzomyia longipalpis s.l. (Diptera: Psychodidae) is the vector of Leishmania infantum chagasi, the causative agent of visceral leishmaniasis (VL) in the New World [1][2][3]. This sand fly species has a wide, though discontinuous distri-bution ranging from southern Mexico to northern Argentina [4]. The pattern of distribution of this sand fly is directly associated with a notable population divergence due to a reduced gene flow, allowing the appearance of sibling species [5]. In the last few years many investigators suggested the existence of at least four sibling species of Lu. longipalpis s.l. in South and Central Americas, based on the study of various chromosomal and molecular markers [6][7][8][9][10][11].
Recent studies also have suggested the presence of sibling species in eastern Brazil [12,13]. These studies relied on the characterization of different male sexual pheromones, the "love songs" resulted from male wing vibration during courtship, as well as microsatellite markers and speciation genes. As many as five different chemotypes of sexual pheromones were characterized from Lu. longipalpis s.l. in Brazil [14]. Three of the chemotypes are pheromones of the homosesquiterpene family, 3-methyl-α-himachalene (3MαH), (S)-9-methyl-germacrene-B (9MGB) and (S)-9methyl-germacrene-B+ (9MGB+) (the latter two can be differentiated based on the amounts of 9MGB produced). Two other chemotypes are members of the diterpene family: cembrene-1 and cembrene-2. Interestingly, different chemotypes can be associated with different male copulation sounds or love songs generated by wing flapping: 3MαH is associated with the copulation songs Pulse songtype1; 9MGB is associated with either the Pulse songtype2 or the Pulse songtype3; cembrene-1 production is associated with Burstsong [15]. No copulation songs have been associated with the production of pheromones 9MGB+ and cembrene-2 [3,14,15].
Lu. longipalpis s.l. in eastern Brazil is comprised of sibling species [3]. However, studies have yet to demonstrate a link between these sibling species and geographic barriers that potentially participate in the speciation process of this sand fly. One such barrier is the Sao Francisco River, the second largest river in Brazil extending 3160 km. The Sao Francisco runs generally north behind the coastal range before turning east to form the border between the state of Bahia and the states of Pernambuco and Alagoas. It enters the Atlantic Ocean between the states of Alagoas and Sergipe ( Figure 1A). Until the Mindel glaciation period (0.4 million years ago -MY) the Sao Francisco ran northward towards the State of Piaui ( Figure 1B). During the Mindel glaciation, the course of the river was altered to its current location [16], as shown in Figure 1.
Natural geographical barriers such as rivers have been shown to significantly influence genetic diversification and speciation in distinct taxa [17][18][19][20][21]. Thus, we analyzed the population structuring of Lu. longipalpis s.l. in eastern Brazil based on different haplotypes of the cytochrome b gene (cyt b), and investigated the role of the Sao Francisco River in the speciation of this sand fly. Our results suggest that the natural barrier represented by the Sao Francisco River in Lu. longipalpis s.l. has played a significant role in the development of sibling species of this sand fly. The data are supported by evidence from recent studies on the geological history of the Sao Francisco River [16]. To our knowledge, this work is the first to link the ongoing speciation of Lu. longipalpis s.l. in eastern Brazil with a significant natural barrier.

Sand flies
Lu. longipalpis s.l. used in this study were captured in five separate locations ( Figure 1 and Table 1) using CDC light traps. Trapping of sand flies was done as described previously [22]. Adult sand flies were killed by freezing at -20°C and identified as described [4].

DNA Extraction, Amplification and Sequencing
The DNA samples were prepared as reported previously [22], and amplification of cyt b was performed using the primers N1N-5'GGCAYWTTGCCTCGAWTTCGWTATGA3' and CB3-5'CAYATTCAACCWGAATGATA3' [23]. PCR reactions were routinely carried out in 50 μl volume containing 10 pmoles of each primer, 1 ng of Lu. longipalpis s.l. DNA, 5 μl of 10× PCR buffer with 2 mM of MgCl 2 (Biotools, B&M Labs, S.A. Madrid, Spain), 60 μM of dNTPs and 1.5 units of Taq DNA polymerase (Biotools). PCR reactions were performed as follows: a hot start at 94°C/3 min, followed by five cycles at 94°C/30 sec, 38°C/30 sec and 72°C/90 sec, and 35 cycles at 94°C/30 sec, 42°C/30 sec and 72°C/ 90 sec. The PCR products were purified with the Nucleospin Extract kit (Macherey-Nagel Inc., Easton, PA) and cycle-sequenced using the BigDye sequencing kit (Applied Biosystems, Foster City, CA), according to the manufacturer instructions. The sequencing reactions were performed using an ABI PRISM ® 3100 sequence analyzer (Applied Biosystems). Each sample was sequenced twice and the sequencing quality was assessed using Phred [24] with a cutoff value of 25. The haplotypes were assembled using CAP3 [25].

Analysis of Genetic Variability
Analyses of genetic variability were performed in 34 haplotypes identified from Lu. longipalpis s.l. used in this work, including haplotypes described previously [26,27]. The indexes of genetic variability were estimated using DnaSP 4.10 [28]. The parameters measured were: the number of haplotypes (h); the haplotypic diversity (Hd); the number of segregating sites (S); the pairwise nucleotide diversity (π); the net nucleotide substitution (d A ); and the average number of nucleotide substitutions (d XY ).
The θ parameter (2N e μ) was estimated for θπ and θ s . θπ is based on the average pairwise number of differences between sequences [29] and θ s is estimated from the number of segregation sites in a population [30]. The following tests were performed to determine whether the pattern of polymorphisms were in accordance with the model of neutral evolution: D [31], D* and F* [32], and Fs [33]. The Fs test was also used to assess population expansion.
The extent of nucleotide differentiation (F ST ) [34] and the level of gene flow (Nm) [35] among the populations also were analyzed using DnaSP 4.10 [28]. The F ST values were plotted in a matrix and used to create a distance-based Neighbor-Joining (NJ) tree using MEGA v3.1 [36]. The Kimura-2P (K2P) genetics distance between clades of the NJ tree was also calculated using the MEGA v3.1 software.
The genetic distances were used to calculate the time of divergence between clades, based on the rate of mutation of 2.3% sequence divergence per million years [37]. The minimum spanning networks and the mismatch distributions were obtained using Arlequin v2.0 [38]. Raggedness tests were performed to determine whether mismatch distributions fit the model of stable or expanding populations [39][40][41]. Also, we assessed the significance of the effect of geographic distance on differentiation via the Mantel test [42]. This test was based on the linear regres-Lu. longipalpis s.l. across the Brazilian NE region Figure 1 Lu. longipalpis s.l. across the Brazilian NE region. Shown is the Sao Francisco River in its current course (A), and prior to the Pleistocene period (B). (A) Lu. longipalpis s.l. used in this study were sampled from various locations across the Brazilian NE (numbers 1-8) and from a single location in the SE (number 9); a, indicates the population in Sobral, Ceara, where two distinct male sexual pheromones (9MGB+ and cembrene-1) are found in sympatry; grey arrow indicates a secondary contact between Lu. longipalpis s.l. in Sobral, as suggested by [55]. According to our results we propose that this secondary contact took place following the course change of the Sao Francisco River. Gray dots (appearing in northerly direction) represent the old course of the Sao Francisco. (B) Original south-to-north course of the Sao Francisco River, with the distribution of the various known Lu. longipalpis s.l. male sexual pheromones (a-h); Arrowhead indicates expansion of Lu. longipalpis s.l. towards the NE region based on sub-Andean/Amazonian gene pool expansion [7]; Grey lines depicts a model for expansion of Lu. longipalpis s.l. populations towards SE and NE areas during the Pliocene-Pleistocene period, using the Sao Francisco River as a geographic barrier. Horizontal dashed gray lines represent the 10°S parallel.
sion between the value of the F ST /(1 -F ST ) [43] and the geographical straight-line distance (in kilometers, Km) between each population pair. Geographic distances between capture sites were obtained from the Coordinate Distance Calculator [44]. Results of the Mantel test were a reflection of 1,000 permutations using Mantel v1.18 [45].
The populations analyzed in this study were combined into two groups, taking into account the sand fly distribution in relation to the 10°S parallel. Thus, individuals that were originally captured north of the 10°S parallel were named <10°S; and those captured south of the 10°S parallel, >10°S

Results
In this work we analyzed phylogeographic patterns of natural populations of Lu. longipalpis s.l.. in the Brazilian NE region. Our studies were based on polymorphisms of a 261 bp DNA fragment from the 3' end of the cyt b gene. The phylogenetic signal shown by this short fragment was higher than those presented by fragments of 318 and 489 nucleotides (unpublished data). Our analyses included twenty nine haplotypes previously described [26,27] and five novel haplotypes identified from Lu. longipalpis s.l. from the Brazilian NE.
Our data shows that Ll-2 is the most frequent haplotype and the haplotypic diversity is high (0.947) despite the geographically widespread distribution of haplotypes Ll-1, Ll-2 and Ll-7 (Tables 2 and 3). Furthermore, no genetic differences were found between the populations of Juazeiro and Natal, Juazeiro and Patos, Natal and Patos, Natal and Calumbi, and Patos and Calumbi. In contrast, the highest genetic distance was estimated between populations of Itamaraca and Monte Santo (d A = 0.496%; Table  4).
A FST distance-based Neighbor-Joining (NJ) tree was obtained for all populations investigated in this study. Two clades were identified: (I) including the populations of Natal, Patos, Juazeiro, João Pessoa, and Itamaracá; and (II) including the populations of Calumbi, Pancas, Monte Santo, and Feira de Santana ( Figure 2). In light of the low level of genetic structuring and geographic distribution the two clusters seen in Figure 2 were referred to as group <10°S (for populations north of 10°S parallel) and group >10°S (for populations south of 10°S parallel). According to its geographic localization the population of Calumbi was also included in <10°S.
The net nucleotide substitution (d A ) between the groups <10°S and >10°S was 0.217% ( Table 6). The estimates of haplotypic diversity within each group were high (Table  3), although some haplotypes (Ll-1, Ll-2, Ll-7 e Ll-8) presented a widespread geographic distribution. The average number of nucleotide differences (π) varied from 0.7% to 1.2% per site, or from 1.85 to 3.18 per nucleotide (Table  3), and no site were fixed at different nucleotides between groups. Within the 55 haplotypes from the group <10°S,  (2) Population groups were separated according to their geographic coordinates. Haplotypes shared between population groups are shown in bold; haplotypes shared by the populations within a group is shown in italics and underlined. Populations from which haplotypes were identified in this work are marked by ( 1 ), and the haplotypes obtained from the population from Itamaracá is marked by (*).
Intra-population group analysis showed values of genetic structuring (F ST ) and gene flow (Nm) of 0.078 and 5.92 for the group <10°S (Table 6). Equivalent results were also obtained for the group >10°S (F ST = 0.136 and Nm = 3.16, Table 6). The values of F ST among the two groups support moderate genetic divergence and higher gene flow values (F ST = 0.184 and Nm = 2.22, Table 6), whereas the pattern of genetic structuring did not follow the model of isolation by distance (Figure 3).
To assess gene flow between the various populations investigated, we applied the minimum spanning network (MSN) analysis ( Figure 4). The MSN constructed with all 34 different haplotypes ( Figure 4A) shows that most of the haplotypes shared by groups <10°S and >10°S were positioned at the interior nodes, representing ancestral haplotypes [46,47]. Thus, the Nm of 2.22 migrates per generation between groups <10°S and >10°S may reflect a greater influence from the shared ancestral polymorphisms than by ongoing gene flow, according to the outline of the MSN. Nevertheless, ongoing gene flow between these populations is still possible, as alternate connections between the terminal haplotypes are observed (Figure 4A).
MSN analysis was separately applied to the population groups <10°S and >10°S ( Figure 4B and 4C). In both cases, interior shared haplotypes were also observed for each group. However, the existence of shared haplotypes at the tips of the networks, and the connections between    Neutrality tests [48] performed between the groups <10°S and >10°S resulted in negative values. Although the estimates of θs and θπ are higher than those of π, the D test [31] and the D* and F* tests [32] were not significant (Tables 3 and 7). On the other hand, the Fs test [33] showed significant values for the same population groups. The mismatch distribution for groups <10°S and >10°S was unimodal (not shown), with very small raggedness statistics (0.025 and 0.050, respectively).
The genetic distances (K2P) between groups was 1.2%, between groups <10°S and >10°S. Based on cyt b molecular clock, the divergence time between these groups were estimated to be 0.45 million years ago (MY).   the model of isolation by distance (r = 0.347, P = 0.9628; Figure 3).
Here, the genetic structuring of ten Lu. longipalpis s.l. populations from eastern Brazil was estimated through the analysis of 26 segregating sites of a partial segment of 261 nucleotides from the mitochondrial gene cyt b. The populations studied were clustered in two groups (as indicated in Materials and Methods), with individuals in group <10°S correspond to the sand flies that produce the chemotype 3 (cembrene-1 and 9MGB+); and those in the group >10°S to chemotypes 1, 2 and 5 (9MGB, 3MαH and 9MGB+).
Intra-population analyses of groups <10°S and >10°S revealed low and moderate genetic differentiation. However, in both cases, the values of Nm (Tables 4) were above the threshold required for differentiation by genetic drift (Nm < 2, [49]). Also, the presence of several tip connections between haplotypes of distinct populations, and a shared tip haplotype in the MSN for the group <10°S (   between the groups <10°S and <10°S ( Table 6). The Fs test and the smoothness of the mismatch distribution indicated that these groups are in geographic expansion.
Since population expansion interferes with the estimates of gene flow [39][40][41], the Nm values obtained may be overestimated obscuring the actual population structuring. This, along with the prevalence of ancient introgression, rather than ongoing gene flow, indicates that genetic drift is mainly responsible for the moderate genetic differentiation between these groups.
Our results from the cyt b molecular clock suggest that the time of divergence between the Lu. longipalpis s.l. in groups <10°S and >10°S occurred around 0.45 MY. A similar divergence period (0.39 MY) was estimated between the populations of Baturite (located north the 10°S parallel) and Jacobina (located south of the same parallel). That estimation was based on the analysis of the molecular clock of the cytochrome oxidase II gene, according to a genetic distance of K2P = 0.009 [7], and a rate of divergence of 2.3% per MY [37]. The estimated time of divergence between the two groups investigated here correspond to the Mindel glaciation (0.38-0.41 MY) that took place during the Early to Middle Pleistocene, and was associated with the shift in the Sao Francisco River's course [16].
The Sao Francisco River is the second largest river in Brazil, separating the States of Pernambuco (PE) and Ceara (CE) from Bahia (BA) and Minas Gerais (MG). On its original course, the river flowed from MG, bearing northward through BA and continuing towards the coast of the State of Piaui (PI) through what is today the Piaui River and the Parnaiba River Basins. Following the uplift of the Parnaiba River Basin and the formation of a transversal geological fault to Northeastern coast, the course of the Sao Francisco River was altered to its current location [16], as shown in Figure 1A.
The reduced gene flow between sibling species and moderate genetic differentiation between groups <10°S and >10°S are consistent with a distribution pattern likely driven by the Sao Francisco River. Additionally, the estimated time of the divergence of these two groups coincides with the change of the river course. However, this divergence cannot be associated with reduced gene flow as mitochondrial DNA is prone to introgress through incipient species boundaries [50], and as demonstrated via cyt b gene analysis of Lu. longipalpis s.l. sympatric cryptic species  (1) Neutrality tests (for both neutral evolution and population demography) were performed as indicated in the text. Significant values were those which P < 0.05, indicated by (1) . Figure 4 Mantel Test. Correlation between the genetic structuring and the geographical distances between the 9 populations of Lu. longipalpis s.l. used in this study. [51]. Taken together, the data suggest that the Sao Francisco River is a significant geographic barrier between populations of Lu. longipalpis s.l., and could also have contributed to the current level of population diversity seen for this sand fly.

Mantel Test
This putative role of the Sao Francisco River in the speciation of Lu. longipalpis s.l. may also explain the current distribution of male sexual pheromones chemotypes. A plausible scenario is based on the suggestion that 9MGB represents an ancestral chemotype pheromone and that the diterpenes (i.e., cembrene-1 and 2) are more recent chemotypes [14]. Dispersal of Lu. longipalpis s.l. belonging to the Brazilian clade [7] is estimated to have occurred in the Plioceno-Pleistocene [52] when the Sao Francisco River flowed on a South-to-North direction [16]. Thus, the Sao Francisco River would have served as a barrier to the introduction of the 9MGB chemotype in the Brazilian NE coast.
The current chemotype distribution in the Brazilian NE possibly followed a process similar to what is described for the ring species hypothesis [53]. Under this hypothesis Lu. longipalpis s.l. would have reached the coastal areas of the NE by circumventing the River's nascent, along its west margin, in Minas Gerais, South Eastern Brazil (as shown in Figure 1B). This process led to the adaptation to different habitats, and to differentiation into a population producing the cembrane-1 chemotype presently found in the NE of Brazil. This idea is supported by the presence of Lu. longipalpis s.l. in the Brazilian SE producing different pheromone chemotypes, such as 9MGB found in Lapinha Cave, cembrene-2 in found in Jaiba, and 3MαH found in Jacobina [14,54]. This scenario also supports the suggestion that Lu. longipalpis s.l. sibling species currently found in Sobral were the result of vicariance 1MY ago [55], and prior to the change in the course of the Sao Francisco River. Thus, the sympatric species in Sobral were isolated by the Sao Francisco River, leading to the development of 9MGB+ on the west and cembrene-1 on the east margins. Following the shift of the Sao Francisco River these species reached the current secondary contact proposed by [55] ( Figure 1A). Our results suggesting greater similarity between Lu. longipalpis s.l. from Juazeiro (located on the southern rim of the Sao Francisco River) and flies belonging to group <10°S in comparison to flies belonging to group >10°S can be explained by a recent break-down of this geographic barrier, either by a recent geographic accident or through human activity in the region. This is consistent with our data indicating that no ongoing mitochondrial DNA haplotype exchange is detected between individuals from Juazeiro and individuals from group >10°S ( Figure 4A).
Lu. longipalpis s.l. secreting cembrene-1 also are present in areas other than the Brazilian NE [14,54,56]. In two such areas (Marajo Island and Santarem), the presence of cembrene-1 was possibly due to a recent geographic expansion after the change in course of the Sao Francisco River.
A study using microsatellite analysis indicated strong similarities between Lu. longipalpis s.l. from Marajo Island and flies from either Sobral or Natal [56]. Accordingly, our data supports a pattern compatible with population expansion from the latter two locations towards the former based on the mismatch distribution. In regards to the cembrene-1 population currently found in Espirito Santo do Pinhal, Sao Paulo [57], biogeographic and genetic studies are necessary in order to explain its origin.

Conclusion
Based on the data presented here we propose that the Sao Francisco River restricted gene flow between Lu. longipalpis s.l., participating in speciation processes of this sand fly. This paper contributes to our understanding on the expansion of Lu. longipalpis s.l. in Brazil and provides novel clues in regards to several aspects of the divergence of this important vector species.