Molecular markers for analyses of intraspecific genetic diversity in the Asian Tiger mosquito, Aedes albopictus

Background The dramatic worldwide expansion of Aedes albopictus (the Asian tiger mosquito) and its vector competence for numerous arboviruses represent a growing threat to public health security. Molecular markers are crucially needed for tracking the rapid spread of this mosquito and to obtain a deeper knowledge of population structure. This is a fundamental requirement for the development of strict monitoring protocols and for the improvement of sustainable control measures. Methods Wild population samples from putative source areas and from newly colonised regions were analysed for variability at the ribosomal DNA internal transcribed spacer 2 (ITS2). Moreover, a new set of 23 microsatellite markers (SSR) was developed. Sixteen of these SSRs were tested in an ancestral (Thailand) and two adventive Italian populations. Results Seventy-six ITS2 sequences representing 52 unique haplotypes were identified, and AMOVA indicated that most of their variation occurred within individuals (74.36%), while only about 8% was detected among populations. Spatial analyses of molecular variance revealed that haplotype genetic similarity was not related to the geographic proximity of populations and the haplotype phylogeny clearly indicated that highly related sequences were distributed across populations from different geographical regions. The SSR markers displayed a high level of polymorphism both in the ancestral and in adventive populations, and FST estimates suggested the absence of great differentiation. The ancestral nature of the Thai population was corroborated by its higher level of variability. Conclusions The two types of genetic markers here implemented revealed the distribution of genetic diversity within and between populations and provide clues on the dispersion dynamics of this species. It appears that the diffusion of this mosquito does not conform to a progressive expansion from the native Asian source area, but to a relatively recent and chaotic propagule distribution mediated by human activities. Under this scenario, multiple introductions and admixture events probably play an important role in maintaining the genetic diversity and in avoiding bottleneck effects. The polymorphic SSR markers here implemented will provide an important tool for reconstructing the routes of invasion followed by this mosquito.


Background
The Asian tiger mosquito, Aedes (Stegomyia) albopictus is a vector of numerous arboviruses including Dengue, Chikungunya, and a variety of epizootic viruses [1][2][3]. It is considered to be the most invasive mosquito species in the world [4], and it is listed among the top hundred most dangerous invasive species (http://www.issg.org/ database/species/search.asp?st=100ss).
Originally a native of the tropical forests of Southeast Asia, it has spread during the last 30-40 years to both tropical and temperate regions worldwide [4,5]. Global warming can result in wetter and warmer conditions that favour the spread of this mosquito and its aquatic larvae [6,7]. The dramatic expansion of this aggressive daytime biting mosquito has increased public health concerns for arbovirus-related disease outbreaks [8][9][10][11]. Some biological traits such as feeding preferences [12], winter diapause [13], and vector competence [14] of Ae. albopictus vary according to its geographic origin. Given that these traits may be genetically determined and may influence pathogen transmission, it is of paramount importance to define the geographic origin of invading populations.
Furthermore, as the threat represented by this mosquito is growing due to the lack of sustainable control measures and of the progressive spread of insecticide-resistance [15,16], efficient surveillance methods may depend on the detailed knowledge of the population structure and dynamics of this vector [10,17]. Moreover, the potential presence of cryptic subgroups, that may be unaffected by control methods, should not be under-estimated. Indeed, great variation in terms of genome size has been previously identified in strains from different regions [18][19][20][21]. Variations in the rDNA cistron have also been reported within and among populations of this mosquito [22]. In this context, the Internal Transcribed Spacer 2 (ITS2) of nuclear ribosomal DNA has been shown to be a useful marker, because of its power in detecting differentiation, not only among cryptic species [23][24][25], but also among conspecific populations [26][27][28]. As ITS2 is one of the less functionally restricted regions of the rDNA, it can accumulate mutations within isolated populations relatively quickly and can thus be an indicator of genetic discontinuity between populations [29,30]. The genetic diversity and the phylogeographic relationships among populations have been extensively analysed using mitochondrial DNA (mtDNA) markers [31][32][33][34][35]. The data from populations collected in various regions of the world were consistent in displaying the presence of weak sequence variation at mtDNA loci, probably due to the recent spread of the species, but also due to the low mutation rate of these markers. Indeed, when a set of 13 microsatellite markers were used to analyse the invasion of the Australian region [36], they appeared to be more informative, compared to mtDNA markers, in resolving the population structure and the invasion dynamics. Moreover, considering that Ae. albopictus populations are infected by the symbiont Wolbachia [37], whose presence can impact mtDNA diversity within host populations, the utility of mtDNA as a neutral marker is reduced [38].
In the present study we used the ITS2 region to address the presence of intraspecific differentiation (genetic barriers) among groups of populations from different ancestral and recently colonized geographic areas. Moreover, we developed additional novel microsatellite markers for this mosquito species. We provide evidence of their usefulness for the analysis of the genetic composition of populations.

Mosquito samples
Nine wild population samples of Ae. albopictus were collected across the species range (Table 1). Two samples were from Thailand (Ban Rai, Lampang Province in the North and Phato, Chumphon Province in the South); two from Réunion (St. Denis in the North and St. Pierre in the South-West); and five from Northern Italy (Brescia, Castellanza, Pavia, Modena, Cesena). The Thai samples are from a region that pertains to the putative home range of the species, while Réunion and the Italian samples are representative of old (17th and 18th century) [39] and recently colonized (1990) [40] areas, respectively. For each of the samples, eggs were collected using ovitraps and the emerging adults were reared in the insectary at 27°C, with 60-80% relative humidity and a 12:12 h (L:D) photoperiod. The identities of all the samples were confirmed using the morphological keys of Rueda [41]. The nine population samples were screened for sequence variability of the ITS2 marker.
The Rimini laboratory strain (F35) was used for the construction of the microsatellite-enriched libraries. This strain was established from mosquitoes collected in Rimini (Emilia Romagna) at the Centro Agricoltura Ambiente 'G. Nicoli' (Crevalcore, Italy). Three of the nine wild samples previously described, Ban Rai, Cesena and Brescia, were also screened to assess the variability of the newly developed microsatellites markers.

ITS2 marker characterization
DNA was extracted according to Baruffi et al. [42] from individual females of the population samples. The extracted DNA was resuspended in TE (10 mM Tris-HCl, pH 8, 1 mM EDTA) and its concentration was quantified using a Nanodrop ND-1000 spectrophotometer (Nano-drop Technologies Inc., Wilmington, DE, USA). The ribosomal DNA ITS2 region was amplified from the DNA samples using the universal primers 5.8Sf 5′-atcactcggctcgtggatcg-3′ and 28Sr 5′-atgcttaaatttagggg gtagtc-3′ [43]. These primers anneal to highly conserved sequences in the 5.8S and 28S rDNA genes flanking the ITS2 region. Reactions were performed in a volume of 25 μl with approximately 3 ng DNA, 1.5 mM MgCl 2 , Reaction buffer (10 mM Tris-HCl pH 9, 50 mM KCl), 200 µM dNTP, 10 pmol of each primer and 1 unit Taq DNA polymerase (Invitrogen). Amplifications were performed using an Eppendorf MasterCycler with the following cycle conditions: an initial denaturing step at 94°C for 2 min; twenty-five cycles of denaturation at 94°C for 20 s, annealing at 54°C for 15 s, extension at 72°C for 25 s, followed by a final extension at 72°C for 5 min.
The PCR products were separated on 1.5% agarose gels, stained with ethidium bromide and photographed under ultraviolet light. Amplification products were ligated into the TOPO 2.1 vector and transformed into competent One Shot cells using the TOPO TA cloning kit (Invitrogen). DNA inserts were sequenced on both strands.

ITS2 data analysis
The boundaries of the ITS2 region were determined by comparison with the available Ae. albopictus sequences in GenBank. The ITS2 sequences were aligned using MUSCLE 3.8 [44]. Nucleotide diversity and gene diversity parameters were determined using Arlequin 3.5.1.2 [45]. Geographic structure in the ITS2 data set was investigated by spatial analyses of molecular variance implemented in SAMOVA 1.0 [46]. This approach identifies groups of populations that are genetically and geographically homogeneous and maximally differentiated from each other. The method requires the a priori definition of the number of groups (K) of populations that exist and generates F-statistics (F SC , F ST and F CT ) using an AMOVA approach. Different numbers of groups (K) were tested, and a simulated annealing procedure permitted the identification of the composition of each of the K groups that maximizes the F CT index (proportion of total genetic variance due to differences between groups). The program was run for two to eight groups (K = 2 to K = 8) each time with the simulated annealing process repeated 100 times, starting each time with a different partition of the population samples into the K groups. A phylogenetic analysis was also performed including the unique ITS2 sequences or haplotypes identified in the nine population samples by Arlequin 3.5.1.2 [45]. The haplotype sequences together with an Ae. flavopictus ITS2 outgroup sequence (GenBank accession number: AF353559.1) were aligned using MUSCLE 3.8 [44]. The Kimura-2-parameter and Γ distribution model indicated by the Akaike Information Criterion (AIC = 3064) and the Bayesian Information Criterion (BIC = 3899), was used to perform Maximum Likelihood analyses in MEGA 5.2.2 [47]. The reliability of the resulting haplotype tree was determined by 1000 bootstrap replications. The mid-point rooted tree was drawn using FigTree v1.4 (http://tree.bio.ed. ac.uk/software/figtree/).

Microsatellite isolation and characterization
Total genomic DNA was extracted individually from 64 mosquitoes (40 males and 24 females) from the Rimini strain using the method reported by Baruffi et al. [42]. The DNA was subsequently pooled and submitted to Genetic Identification Services, GIS (Chatsworth, CA, USA, http://www.genetic-id-services.com) for library construction. The library was enriched for four different di-and tri-nucleotide microsatellite motifs, namely (CA)n, (GA)n, (AAC)n, and (ATG)n. Insert DNA from individual clones was amplified by PCR following GIS guidelines. The PCR products were purified using QIAquick columns (Qiagen) and sequenced using the DYEnamic ET Terminator Cycle Sequencing Kit (Amersham Bioscience) with the universal M13 primers (M13f 5′-aggaaacagctat gaccatg-3′ and M13r 5′-acgacgttgtaaaacgacgg-3′). Clones with inserts less than 200 bp were not considered for sequencing. Sequencing products were resolved on the Applied Biosystems model 377 DNA Sequencer (Applied Biosystems). The resulting sequences were screened for the presence of microsatellite motifs using WebSat [48]. All the clone sequences were checked for redundancy using BLASTn [49].
Microsatellite sequences that shared one or both the 5′/3′ flanking regions in common were discarded to exclude potential multi-locus microsatellites families. The sequences that were found to be unique within the library were subjected to BLASTn analysis against the nr database to identify eventual hits with previously described sequences. Primer pairs were designed for the flanking regions of the identified SSR sequences using Primer 3 [50]. Preliminary PCR screenings were performed on 20 individuals (10 males and 10 females) from the Rimini strain to validate the primer efficiencies in single SSR locus amplifications. DNA amplifications were performed on a Mastercycler gradient (Eppendorf). The PCR reaction mixture consisted of 50 ng genomic DNA, 1x PCR buffer, 1.5 mM MgCl 2 , 270 µM dNTPs (Invitrogen), 1 U Taq polymerase (Invitrogen), and 10 μM of each primer, one of which was 5′ labelled with a fluorescent dye, in a final volume of 15 μl. PCR cycling conditions were 94°C for 3 min, 30 cycles of 94°C for 30 s, 57-60°C for 30 s, and 72°C for 30 s, followed by a final extension step of 72°C for 10 minutes. Aliquots of PCR products were separated by electrophoresis on 2% agarose gels stained with ethidium bromide, and visualized under UV light. Each PCR product was then diluted 1:10 in ddH 2 O water and 1 μl of this dilution was added to 9 μl of a mixture of deionized formamide and GeneScan-500 ROX size standard (Applied Biosystems). After denaturation for 4 min at 94°C the fragments were resolved on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems) with the GENESCAN software package (Applied Biosystems). To avoid genotyping errors, the program TANDEM [51] was used to automate binning of the microsatellite allele length variants.

Microsatellite data analysis
The parameters of intra-and inter-population genetic diversity, such as observed allele size, number of actual alleles (n a ), number of effective alleles (n e ), number of private alleles (n p ) and AMOVA were computed using GenALEx 6.5 [52]. Observed and expected heterozygosity (H O and H E , respectively), gene diversity (H S ) and F ST values were computed using MICROSATELLITE ANALYSER (MSA) V.4.05 [53]. Linkage disequilibrium, the frequency of null alleles (A n ) and the deviations from Hardy-Weinberg expectations were computed using GENEPOP version 4.2 [54,55]. For loci with fewer than five alleles, an exact test of Hardy-Weinberg proportions was performed. For loci with five alleles or more, an unbiased estimate of the exact probability was obtained using the Markov chain method of Guo & Thompson [56] for each combination of locus and population. The allelic Polymorphic Information Content (PIC) was derived using CERVUS [57].

ITS2 sequence variability
ITS2 sequences were obtained from an average of three individuals from each of the nine population samples collected in Thailand, Réunion and Italy (Table 2). In addition, to determine whether multiple copies of the ITS2 region varied within individuals, up to four clones were sequenced from each of these individuals. The primers amplified a product of 502-588 bp in length that included part of 5.8S and 28S sequences. The ITS2 sequence itself ranged from 322 to 408 bp in length with an average GC composition of 51%. The 76 ITS2 sequences, obtained from a total of 26 individuals, represent 52 unique sequence-types or haplotypes ( Table 2). The gene diversity estimates were very high across the analysed sequences and  An estimate of variability distribution in and among the nine tested population samples is reported in Table 3. AMOVA indicated that most of the variation occurs within individuals (74.36%) while only about 8% of total variation is detected among populations. The interindividual variation within samples accounted for 17.51% (P < 0.05) of the total variance. When the three main geographical population groups (Thailand, Réunion and Italy) are accounted for (Table 4), similar and relatively low percentages of variation were observed among groups (8.12%) and among populations within groups (7.09%), indicating that the genetic similarity is not related to the geographic proximity of populations. Indeed, according to the SAMOVA analysis (Table 5) the best partitioning of genetic diversity was obtained when the nine samples were grouped into seven groups: F CT =0.186, P = 0.011. Here the genetic diversity is not only among geographical regions, as even the two ancestral Thai populations are separated into two groups, while the North-Italian populations are separated into three groups and the two Réunion populations into two groups.

ITS2 haplotype phylogeny and population distribution
The phylogenetic analysis based on the 52 ITS2 haplotype sequences identified in the nine geographic populations resulted in a Maximum Likelihood tree with a log likelihood of −1428.3 (Figure 1). The ITS2 sequences form a compact cluster with very few nodes supported by bootstrap values greater than 50%. Highly related sequences are distributed across populations from different geographical regions; some evidence of clustering of tightly related sequences is present in Ban Rai, Phato (Thailand) and to a lesser degree in St. Pierre (Réunion).

SSR: Isolation and characterization
Out of a total of 109 sequenced clones obtained from the Ae. albopictus library, 92 contained tandem DNA repeats with at least 5 repeat units. BLASTn analyses showed that one or both flanking regions were shared in 42 sequences and they were thus excluded from the subsequent characterization. Within the remaining 50 sequenced clones, five harboured more than one tandem DNA repeat, resulting in the identification of 55 microsatellite motifs. Among these, trinucleotide motifs were the most abundant (28), followed by dinucleotide (22), mononucleotide (3), heptanucleotide (1) and hendecanucleotide motifs (1). Among the trinucleotides motifs, CAA was highly abundant (17) followed by TGA (10). Among the dinucleotide motifs, TG was predominant (14) followed by CT (8). Thirty-seven of these microsatellite motifs contained perfect repeat arrays. Twelve    Forty-five microsatellite sequences were considered for further validation, as the others lacked sufficient flanking regions to design primers. After the PCR screening on 20 individuals from the Rimini laboratory strain, 23 SSR sequences were validated as loci ( Table 6). The nucleotide sequences of these validated SSR loci were submitted to GenBank [Genbank: KP859591-KP859613].

Evaluation of microsatellite polymorphism in ancestral and adventive populations
On the basis of a preliminary screening performed on the Rimini strain, 16 (Aealbmic1 to Aealbmic16) of the 23 validated SSR loci (Table 6) were used to analyse the degree of polymorphism in 79 mosquitoes from three populations: Ban Rai (Thailand), Brescia and Cesena (Italy) ( Table 7). The average Polymorphic Information Content (PIC) estimated across the 16 loci was 0.56 ± 0.29 in Ban Rai, 0.50 ± 0.26 in Cesena and 0.47 ± 0.21 in Brescia, suggesting that these loci are informative for population analyses. Heterozygous individuals were present in both sexes for all 16 loci, excluding any condition of sex linkage. Tests for Hardy-Weinberg equilibrium (HWE), after sequential Bonferroni correction [58], revealed that five loci did not meet the expectations of this model (Table 7): three (Aealbmic4, 11,13) in the Ban Rai sample, one in Cesena (Aealbmic3) and two in Brescia (Aealbmic13, 16). However, the locus/population combinations that were not in HWE were not concentrated at any locus or in any population. As no evidence of linkage disequilibrium between loci was assessed, these 16 loci can be considered to be independent. In the three populations most of the loci display relatively high levels of gene diversity (Hs). The frequency of null alleles across the loci was generally relatively low apart from Aealbmic15 with an estimate of 0.98 in Cesena. Two alleles were detected in Cesena at the Aealbmic15 locus, but one was very rare. In the other populations this locus was monomorphic for the common allele. Aealbmic1 is monomorphic in Ban Rai and Cesena, while in Brescia two alleles were present. Across the loci a higher level of polymorphism was observed in Ban Rai, with a mean number of alleles per locus of 6.19 (Table 7). Private alleles (n p ) were detected in all populations, with the highest average value in the Ban Rai sample (2.25). Analysis of molecular variance (AMOVA) across the three populations indicates that the greatest portion of variance, 76.62%, is found within individuals, while 16.80% is among individuals and only the remaining 6.57% is among populations. The F ST values between each pair of populations are statistically significant and indicate that the degree of differentiation between the two Northern Italian samples, Cesena and Brescia (F ST = 0.061), is not very different from those between these two Italian samples and the Ban Rai sample (F ST = 0.072 in both comparisons).

Discussion
In the absence of a fully annotated genome, molecular markers for Ae. albopictus are crucially needed to track the rapid spread of this mosquito that is causing increasing concern due to its status as a vector of several arboviruses of public health importance [3,9]. The two types of genetic markers here implemented give a picture of the distribution of genetic diversity within and between populations providing clues on the dispersion dynamics of this species. ITS2 sequence variation has been analysed in mosquitoes from the supposed native range, Thailand, from an old colonised area, Réunion, and from a newly invaded area, Italy. It appears that substantial variability, both in terms of haplotype number and gene diversity, is present in the native as well in the two areas of introduction. The haplotype variability is mostly concentrated within individuals (74.36%) or within populations (84.78%), while only a relatively small proportion into geographical differences, both at single population (8.13%) and at geographic area level (8.12%). This implies that, according to SAMOVA, the haplotype variation, besides being high, has a distribution that is independent from geography. Different haplotypes are present in each individual of the nine samples, and the tree clearly demonstrates that they are similar and dispersed across native and adventive populations (Figure 1). Substantially concordant conclusions can be obtained with the newly developed SSR markers, which display a high level of polymorphism both in the ancestral (Ban Rai, Thailand) and adventive populations (North Italian samples), previously scored for ITS2 variability. Although the chromosomal location of these markers remains unknown, the assessed linkage equilibrium between them suggests that they are statistically independent and their variability patterns might reflect genome wide patterns across populations. Based on the population samples here analysed, the SSR markers were, as expected, more Motif type (P: perfect repeats; I: imperfect repeats). 2 The fluorescent label (6FAM or HEX) on the forward primer is indicated for the 16 loci that were subsequently used for the population analysis. 3 Annealing temperature (T a ). 4,5 SSR sequences that correspond to previously identified loci: (alb212) [57] and (AealbD2) [58], respectively.  informative than ITS2 in revealing the slight genetic diversity between native and derived populations both in terms of variability and differentiation. About 7% of total variability (AMOVA) is represented by the differences between the three populations, while the highest variability was found within populations. This is an agreement with a previous study on mosquitoes collected in Réunion [59]. The F ST estimates indicate that the differentiation between the two Italian samples is slightly smaller (F ST = 0.061) than that between the Italian and Thai samples (F ST = 0.072), suggesting the absence of great differentiation between ancestral and derived populations. The ancestral nature of the Thai population is also corroborated by its high level of variability. Taken together these genetic data raise questions on the way this mosquito is spreading and becoming established worldwide. Clearly, our data indicate that the diffusion of this mosquito does not conform to a progressive expansion away from the native Asian source area, but supports a relatively recent and chaotic propagule distribution mediated by human activities. Under this scenario, multiple introductions and admixture events probably play an important role in maintaining the genetic diversity and in avoiding bottleneck effects.

Conclusions
In this study we have exploited nuclear molecular markers that have allowed us to highlight the presence of high intraspecific variability and a low/moderate level of differentiation of Ae. albopictus populations collected in different eco-geographic areas. Most of the genetic variation was detected at the individual level and this contributed to the high genetic variation observed within populations. This genetic feature, also revealed using other markers [7,34,36,59] may have facilitated the characteristic ability of this invasive species to expand and adapt to novel environments.
The highly polymorphic SSR markers here implemented, together with the set previously described [36,59,60], represent an important tool for reconstructing the routes of invasion and for the identification of the origins of mosquito outbreaks.