Analysis of population genetic structure of Indian Anopheles culicifacies species A using microsatellite markers

Background Anopheles culicifacies sensu lato is an important vector of malaria in Southeast Asia contributing to almost 70% of malaria cases in India. It exists as morphologically similar sibling species A, B, C, D and E with varied geographical distribution patterns. Vector control measures have been difficult for this important vector as the sibling species have developed varying levels of resistance to the currently used insecticides. In view of the importance of this vector, we developed and validated a set of microsatellite markers and the same were used to analyze the population genetic structure of five different geographical populations of An. culicifacies A. Methods Anopheles culicifacies A samples were collected from different localities across India, and genotyping was performed using eight microsatellite markers on ABI Prism 310 Genetic Analyzer. Several statistical analyses were performed to ascertain the genetic diversity that exists within and between the populations. Results The markers were found to be moderately polymorphic in the populations. Genetic analysis indicated significant genetic differentiation between the majority of the population pairs analyzed and was not found to be related to the geographical distances between populations. Conclusion This is the first and successful attempt to test the microsatellite markers developed for population genetic analysis of An. culicifacies A. Host feeding and breeding habits of species A suggest that factors other than ecological and geographical barriers were responsible for the genetic differentiation that has been observed between the populations.


Background
Anopheles culicifacies sensu lato has a wide distribution in India and extends into the west to Ethiopia, Yemen, Iran, Afghanistan and Pakistan, and in the east to Bangladesh, Myanmar, Thailand, Cambodia and Vietnam and is also found in Nepal and southern China in the north and extends to Sri Lanka in the south [1,2]. It is an important malaria vector in India, Sri Lanka and in the countries west of India. It is responsible for 60-70% of new malaria cases in India [2]. This taxon has now been recognized as a species complex with five members provisionally designated as species A and B [3], C [4], D [5] and E [6]. Species, A, B, C and D, were recognized following the observation of a total absence or significant deficiency of heterozygotes in natural populations for the alternate arrangements observed in polytene chromosomes due to paracentric inversions. The fifth species, species E, was identified by correlating Y-chromosome polymorphisms of sons and the sporozoite positivity of mothers.
There is little information available about the population structure and gene flow that occurs between and/or within An. culicifacies sibling species populations in India. Several studies were carried out to examine the population structure of vectors in Africa and other countries, namely An. gambiae s. s. [7], An. arabiensis [8], An. funestus [9], An. darlingii [10] and many other vectors, using microsatellite markers and mitochondrial genes. Recently, two major vectors found in India, An. stephensi an urban malaria vector [11] and An. baimaii, a vector in north eastern states [12] were analysed for population genetic structure and gene flow using microsatellite markers and mitochondrial DNA genes respectively. Microsatellites are highly polymorphic genetic markers used extensively for studying genetic structure of populations. Realizing the importance of knowledge on the population structure and gene flow for implementing insecticide resistance management strategies for the control of An. culicifacies sibling species, microsatellite markers developed in our laboratory [13] were used in this study to understand the population structure of An. culicifacies species A populations, and the results are reported in this paper.

Sample collection and species identification
Anopheles culicifacies samples were collected from five different localities in India, namely the states (districts); Haryana (Sonepat), Gujarat (Kheda), Karnataka (Bijapur), Rajasthan (Jodhpur), and Uttar Pradesh (Allahabad). Specific spatial details of the collection sites are given in Figure 1 [14] and were selected to represent different ecosettings that prevail in India. Collections of indoor-resting mosquitoes were made by hand-collection using suction tube and torchlight. Specimens collected from resting sites were confirmed as An. culicifacies on a morphological basis, following the mosquito identification key by Christophers [15]. For sibling species identification, semi-gravid females were used for cytotaxonomic identification and genotyping. Ovaries from individual semigravid females were pulled out and stored in modified Carnoy's fixative (1:3 glacial acetic acid and methanol) and the carcasses of the mosquito were stored in isopropanol for DNA isolation. The two vials of each mosquito were given a corresponding code number for species correlation. Ovaries were processed for the preparation of polytene chromosomes and diagnostic inversion genotype analyses [16] were used for sibling species identification. In cases where specimens were not in a suitable Figure 1 Distribution of An. culicifacies sibling species in India and sites used in the study (source Ref [24]). Map of India showing the distribution of An. culicifacies sibling species in India. Site locations are indicated by arrows. States are indicated within parenthesis. stage for cytological identification, 28S-D3 -PCR assay was used [17].

Study sites
Study sites are shown in Figure 1 [14], which also shows the distribution of sibling species mapped from earlier studies based on which the present study sites were selected.
Genomic DNA extraction and Microsatellite genotyping of field collected samples The DNA was isolated from individual mosquitoes following the procedure described previously [18] and stored at −20°C until further use. In total thirty-one microsatellite markers were isolated in An. culicifacies A in our previous investigation [13]. Randomly, eight microsatellite markers, which were found to be polymorphic in the laboratory reared specimens and in one field population of An. culicifacies were used in the present study. Linkage relationship of these loci is not known. Each of the primer pairs belonging to the selected loci (Table 1), were labeled at 5′ with TET™, HEX™ or 6-FAM™ fluorescent dyes (Microsynth Corporation, Switzerland). Multiplex PCRs were set up by grouping 2 or 3 primer pairs together depending on their annealing temperatures, dyes, and sizes and PCR amplifications. The PCR primers and procedures followed in the study were same as described earlier [13]. The resultant amplified products were resolved using an ABI Prism 310 Genetic Analyzer (Applied Bio systems). Alleles were sized relative to an internal fluorescent standard marker and the results were analyzed using the Genescan software version 3.1 (Applied Bio systems).

Statistical analysis
Random mating i.e., agreement to Hardy-Weinberg Equilibrium (HWE) within each of the populations was tested by using the Arlequin v.2.0 software [19]. An unbiased estimate of the exact P-value for each locus was computed using the Markov chain method of Geo and Thompson [20] with the forecasted chain length of 100,000 steps and dememorization steps of 1000. The observed proportion of heterozygote deficiencies (D) and the frequencies of null alleles (r) that caused the heterozygote deficiencies were estimated following the method described in Chakraborty et al., [21]. Genetic variability among the populations was measured by Wright's F-statistics [22]. Further, F-statistics values were also calculated according to the method of Weir and Cockerham using the GenAlex v.5.4 (MS Excel-based genetic analysis tool) [23]. The significance of F ST values was tested using the formula described by Workman and Niswander where Chi square = 2NF ST (where N is the population size with n-1 degrees of freedom for 'n' subpopulations) [24].
The effective migration rate (Nm) was estimated according to Slatkin's (1987) formula; Nm = 1/4 [(1/ F ST )-1] [25]. To investigate, if levels of differentiation were related to geographical distances, the regression of F ST (1-F ST ) on the natural log (ln) of geographical distance was used [26].

Results
Morphologically identified An. culicifacies specimens were collected from five different localities in India that were selected to best represent the diversity of Indian geography and its ecology ( Figure 1). Of the specimens identified, both cytotaxonomically and by PCR methods, a total of 104 species A samples were analyzed for the population structure of An. culicifacies in India. Details of the number of samples from each of the five localities are given in Table 1 and the location of the study sites are shown in Figure 1.

Distribution and the level of genetic diversity
A total of 104 samples were analyzed for eight microsatellite markers, in order to understand various population genetic parameters that might be influencing the present genetic make-up of An. culicifacies. Genetic diversity using microsatellite markers was compared by examining the number of alleles and its heterozygosity observed among the five different populations of species A. Various

Hardy-Weinberg equilibrium (HWE)
The genotype frequencies at several loci did not conform to HWE. For the majority of markers, the observed number of heterozygotes was less than the expected number and this deficiency was statistically significant (P < 0.05). A total of 40 tests (8 loci, 5 populations) in species A for conformance to HWE at the locus level within the populations were performed. Of these tests, 29 (72.5%) tests showed a significant deviation from HWE in species A. Furthermore, in all the cases heterozygote deficiency was observed ( Table 2). The loci ACAV1B213, ACA61 and ACAVB93A showed a significant heterozygote deficiency in all the five populations, while the locus ACA11B5 showed a significant departure in four out of the five populations. Loci AcAB93 and AcAVIIB46 showed heterozygote deficiency in three populations while loci AcA59 and AcAVIIIB40 showed heterozygote deficiency in only two populations. Presence and frequency of null alleles was tested using the method of Chakraborty et al. [21] (Table 3). Average estimate of null allele frequency ranged from 0.02 to 1 with a maximum average of 0.45 in all the populations.
Locus AcAVB93A showed the maximum frequency of null alleles in all the populations, while ACA11B5 contributed the least (Table 3).
Correlation between the population divergence and isolation of populations by distance The five populations exhibited remarkably similar allelic distributions on averaging all loci but differential patterns were observed among the loci in four out of eight loci (Table 3). A single allele of 206 bp of the marker ACA59 was predominant in all the five populations of species A, and with reference to the ACAVIB213 locus, each of the populations had a different predominant allele (Table 3). There were also alleles unique to one or the other populations, but the frequencies of these alleles were very low (data not shown). The maximum frequency of alleles at each of the loci ranged from 0.2 to 0.9 in the populations (Table 3). Genetic variability between species A populations was studied using F-statistics. The pair-wise genetic differentiation among species A populations were found to be significant in eight out of the ten combinations and ranged from 0.0853 -0.2483. The populations sampled from five different geographical regions represented northern, western, eastern, and southern parts of the country (Figure 1). The shortest distance between the two populations was 400 km between the districts Kheda and Jodhpur, while the longest was 1730 km between the districts Sonepat and Bijapur. A high level of genetic differentiation was observed between the species A populations. The mean F ST for all the markers and all the populations was 0.155. The F ST values were non-significant for two population comparisons, namely Kheda and Sonepat; and Bijapur and Sonepat, which indicates a low genetic differentiation between these populations. For all other population comparisons, the values were significant suggesting a high genetic differentiation. Furthermore, to understand the role of geographical distance in generating the genetic distance between the sampled populations, the Mantel test was performed. The test revealed no significant correlation between the pair-wise F ST /(1 -F ST ) against the natural logarithm of pair-wise geographical distance (r2 = 0.3952; Figure 2), suggesting that the population genetic structure of An. culicifacies in India did not conform to the isolationby-distance model. This was also evident from the analysis of molecular variance which is calculated based on the F ST values, and is found to be greater within the pairs of populations analyzed (87%) than that among the populations (13%) (Figure 3).

Gene flow
The Nm-values were calculated based on the F ST values for all the comparisons and the values are tabulated in Table 4. In all the pair-wise population combinations, , and Kheda vs. Bijapur (5.6) which are distantly located, at 1000 and 1400 km apart, respectively, and it is only for these two population comparisons, that the FST values are non-significant. The measure of gene flow again supports that the genetic differentiation observed is not related to geographical distance.

Discussion
All the eight microsatellite markers examined in An. culicifacies species A populations were polymorphic.
Maximum number of alleles at each locus ranged from 4 to 10 with the exception for one marker (ACAVVIB213), which exhibited 17 alleles, suggesting that these markers are moderately polymorphic in the populations. Except for AcAVIIB46 and AcAVB93A with the least number of alleles, the other markers did not exhibit all the alleles in any of the five populations. Significant deviations from the Hardy-Weinberg expectations for these markers were observed in the populations studied. The deficiency of heterozygotes observed in these populations could be due to population sub-structuring or due to the presence of null alleles, i.e. mutation/s in the primer sites that may prevent annealing of primers to the DNA samples. This results in either total failure of amplification (homozygous for null allele), or most commonly one strand will be amplified and the sample will be scored as a homozygote, which in fact is a heterozygote [27]. During genotyping, a few DNA samples did not amplify for one or the other marker. In order to rule out the problems of multiplexing, these samples   were also genotyped separately by setting up an individual PCR. As these samples were normally amplified in some other combinations, non-amplification is necessarily not due to the poor quality of DNA or other general PCR procedural problems. Grouping together of different gene pools (Wahlund effect) and non-random mating within the populations i.e. presence of sub-populations as a reason for the deficiencies is not being favored because (i) collections were of only indoor-resting mosquitoes, collected from both human dwellings and cattle sheds, and mosquitoes were collected from structures within a single village or one or two more which are within 5-10 km range, (ii) the extensive cytotaxonomic studies carried out in different parts of the country did not indicate any sub-populations and (iii) that D values up to 10% do not indicate the presence of sub-populations [20]. The D-values observed in this study are significantly low. Therefore, presence of null alleles is being considered to be mainly responsible for the deficiencies of heterozygotes observed at these loci. However, it is not known whether the markers not being in Hardy-Weinberg equilibrium has had any influence on the population genetic parameters estimated in this study. Keeping this in view and that the markers were randomly selected not knowing whether they represent the entire genome, a few conservative conclusions are being drawn from the F-statistics data. F ST values have shown a great genetic differentiation between the pairs of populations analyzed. Low Nm values (<1) between Haryana and Uttar Pradesh, and Rajasthan and Haryana suggest a limited gene flow. The geographical distances of these two population pairs were less than those observed for Gujarat and Haryana, and Gujarat and Karnataka which had maximum Nm values (>5 and 7 respectively). Vindhyachal Mountain ranges which pass through central India (Maharashtra and Madhya Pradesh States) separate northern and southern parts of India. Karnataka is located in the southern part of India. There are no known geographical barriers that exist between the other four populations studied. Furthermore, anopheline species in general are known to have limited flight range. For species A, which is predominantly zoophagic (maximum anthropophilic index observed was 3-4%), An. culicifacies being a rural mosquito and agriculture being practiced extensively in these areas indicate free availability of cattle for blood feeding. Irrigation channels are preferred breeding habitat for species A. This suggests that other than geographical or ecological, some other barriers are playing a role for differential genetic differentiation levels observed between these populations. Residual sprays with effective insecticides alter and interfere with population sizes. At the time of collection of mosquitoes, no spray operations were going on in sites in Uttar Pradesh, Haryana and Rajasthan states. In Gujarat and Karnataka, malathion was being sprayed in the study sites, which even at present is effective on An. culicifacies populations in these areas. Therefore, the effect of insecticide sprays on the mosquito population sampling cannot be ruled out.

Conclusion
This is the first and successful attempt made to study the population genetic structure of An. culicifacies using microsatellite markers. Genetic analysis of five different   populations of species A was carried out with eight microsatellite markers. F ST values indicated significant genetic differentiation between the majorities of the population pairs analyzed. We hope that these results may add a further step in understanding the dynamics of the vector species for planning effective vector control activities based on population genetic structure.