Genetic differentiation and population structure of Anopheles funestus from Uganda and the southern African countries of Malawi, Mozambique, Zambia and Zimbabwe

Background Anopheles funestus (s.s.) is a primary vector of the malaria parasite Plasmodium falciparum in Africa, a human pathogen that causes almost half a million deaths each year. The population structure of An. funestus was examined in samples from Uganda and the southern African countries of Malawi, Mozambique, Zambia and Zimbabwe. Methods Twelve microsatellites were used to estimate the genetic diversity and differentiation of An. funestus from 13 representative locations across five countries. These were comprised of four sites from Uganda, three from Malawi and two each from Mozambique, Zambia and Zimbabwe. Results All loci were highly polymorphic across the populations with high allelic richness and heterozygosity. A high genetic diversity was observed with 2–19 alleles per locus and an average number of seven alleles. Overall, expected heterozygosity (He) ranged from 0.65 to 0.79. When samples were pooled three of the 12 microsatellite loci showed Hardy–Weinberg equilibrium. Unsupervised Bayesian clustering analysis of microsatellite data revealed two clusters with An. funestus samples from Mozambique, Uganda and Zambia falling into one group and Malawi and Zimbabwe into another. The overall genetic differentiation between the populations was moderate (FST = 0.116). Pairwise differentiation between the pairs was low but significant. A weak but significant correlation was established between genetic and geographical distance for most populations. Conclusions High genetic diversity revealed by the loci with low to moderate differentiation, identified two clusters among the An. funestus populations. Further research on the population dynamics of An. funestus in east and southern Africa is essential to understand the implications of this structuring and what effect it may have on the efficient implementation of mosquito vector control strategies.

. The traditional methods of vector control, insecticide-treated bednets and indoor residual house spraying, have been very effective in controlling An. funestus due to its strong association with humans and human habitations [1,2]. However, the development of insecticide resistance across the region [5,6] has seen vector control programs compromised in some places [7][8][9]. This has also played an important role in the slow-down of the gains made in malaria control over the past 10 years in Africa [10,11]. To achieve the goals of malaria elimination and eradication, not only are new products and interventions needed, but monitoring the effectiveness of all vector control interventions is crucial.
Mosquito vector surveillance covers the monitoring of all aspects of mosquito biology, from geographical distribution to the more complex genetic diversity within and between populations. Studies of population genetics provide insight into gene flow between mosquito populations and through that, the likelihood of the spread of genes (e.g. those conferring insecticide resistance) across geographical regions. In east and southern Africa, studies focused on the population structure of An. funestus have been carried out in Uganda [12][13][14], Malawi [12,[15][16][17], Kenya [18], Tanzania [19], Mozambique [12,16,17] and Zambia [16]. These and other studies have revealed various levels of differentiation over the African continent, from no or very low structure to high population structure of An. funestus [12][13][14][15][16][17][18][19].
Understanding the gene flow dynamics of populations of malaria vector mosquitoes is important for strategic planning of vector control interventions. In this study, the population structure of An. funestus was examined using specimens from one country in east Africa (Uganda) and four countries in southern Africa (Malawi, Mozambique, Zambia and Zimbabwe). Twelve microsatellite markers distributed across all five chromosome arms were used to address the questions: (i) Is there any evidence for population genetic subdivision within this collection of An. funestus? and (ii) What is the maximum amount of differentiation observed within each selected site? Population structure analysis quantifies the amount of genetic exchange that occurs between sample sets and provides insight on how these underlying genetic components may be used to address biological phenomena such as insecticide resistance, parasite transmission and dispersal of the mosquito.

Mosquito collection study sites
Preserved mosquito samples from selected sites (Table 1) in five African countries (Malawi, Mozambique, Uganda, Zambia and Zimbabwe) (Fig. 1) were used. From Malawi, sample collections were from three sites; Karonga, Majete and Likoma Island. Karonga is found in the far north of Malawi, Majete is an area near Majete Wildlife Reserve in the south-west of Malawi, and Likoma is an island in the north eastern corner of Lake Malawi close to Mozambique. In Malawi, Karonga has two seasons, a dry season from May to mid-November and a wet season from mid-November to April, temperature varies between 16.11-33.3 °C and humidity varies between 54-79%. In Likoma, the wet season starts from November to April and the rest of the year from May to October is almost dry. Temperatures vary between 22.4-31.4 °C and humidity varies between 43.7-83.6%. In Majete there are two distinct seasons, the dry season from May to September and the wet season from November to April. Temperature varies between 22-34 °C and the average humidity is 71%.
In Mozambique, samples were collected from Maciana and Matola, near Maputo in southern Mozambique. Matola is the largest suburb of Maputo and lies on the coastal plain north of Espirito Santo estuary. In Matola, found 42 m above sea level, the climate is characterized by a warm and wet season from December to March, and a dry and cold season from June to August. Precipitation averages 1050 mm (41 inches) per year, with abundant rainfall from December to March, and rare rains from May to October. Temperature varies between 15-34 °C and relative humidity ranges between 28-89%. Maciana, at 68 m above sea level is characterized by a warm and wet season from November to April, and a dry and cold season from May to October. Precipitation averages 815 mm (32 inches) per year, most of which falls from November to March. The rainiest month is January (170 mm). Temperature varies between 11-34 °C and relative humidity ranges between 59-67%.
Uganda experiences a warm tropical climate with heavy rains in March-May and September-November for the South. In northern Uganda, the wet season extends from April to November with peaks in April and August. Samples were obtained from Agule, Apac, Lira and Kamuli districts in North and central Uganda. Agule and Apac were originally one district so do not have much variation in climate characteristics. Agule lies 1034 m above sea level. Temperature varies between 20-34 °C and relative humidity varies between 28-88%. Apac lies on 1040 m above sea level with a tropical climate, over the course of the year, the temperature varies between 18-35 °C. Temperatures are highest in February, with an average of around 25.1 °C, whilst July is the coldest month of the year with an average of 22.1 °C. Precipitation over this area, averages 1305 mm. Humidity varies between 21% and 69%. Lira lies at 1091 m above sea level and has an average temperature of 23.2 °C and a yearly average precipitation of 1376 mm. The wet season runs from April to mid-November and with two dry seasons (December to February and June to August). February is the warmest month of the year with an average temperature of 33 °C. July has the lowest average temperature, with an average of 21.6 °C. Overall, the average humidity for Lira is 58%. On the other hand, Kamuli lies 1185 m above sea level and temperature varies between 15.5-32 °C. March is the warmest month of the year, with an average temperature of 22.7 °C, whilst July has the lowest average temperature of 20.7 °C. There is a great deal of rainfall in Kamuli, even in the driest month (January) with 63 mm of rainfall. April is the wettest with an average of 193 mm. Average rainfall is 1325 mm over the year. Humidity varies between 62-95%.
Zambian collections were from Nchelenge and Namwala. Namwala is located in the north-western corner of the southern province next to Kafue National Park. Zambia has a tropical climate, which experiences wet and dry periods. In Namwala, the wet season lasts from mid-November to March and the dry season from end of November to April. The cool season is from May to August with temperatures varying between 10-27 °C and the dry season from September to November with temperatures around 26 °C to 37 °C and humidity between 19-75%. Some days are particularly dry where humidity can be as low as 0%. Nchelenge district is located near the Democratic Republic of Congo next to Lake Mweru. For Nchelenge, the wet season lasts from September to May and the dry season from end of May to September. The cool season is from December to April with temperatures varying between 15-27 °C. The hot season is from September to October with temperatures around 22 °C to 34 °C and humidity varying between 22-89%. In Nchelenge, humidity can also reach a low of 0%.
In Zimbabwe, samples were obtained from Honde and Mangwanda. Both sites are located within Mutasa district. In Zimbabwe, Honde, from late October to around the end of April, the weather is hot and humid. Temperatures rise up to 28 °C and is when most of the convectional rainfall is received. Average rainfall is 1150 mm. From May to the beginning of July, the temperatures are very low, up to 2 °C, whilst August is very windy. During September to October very hot temperatures are recorded, where the maximum temperature may average 30 °C. Humidity varies between 11-45%. In Mangwanda, the climate is warm and temperate. The summers have much more rainfall with an average of 809 mm. From May to August the temperatures are very low, varying between 9-16 °C, while August is very windy. From September to December, it is very hot and the temperatures range between 16-28 °C. The average humidity is 73%.

Mosquito collection and identification
Specimens were collected using indoor aspirations for all mosquitoes mainly in the living/sleeping area of the houses (huts), but for two sites human landing catches were carried out (Table 1). Anopheline mosquitoes were identified morphologically [2] and preserved individually in 0.6 ml microcentrifuge tubes with silica gel and stored at room temperature until processing. Molecular identification was carried out using the standard method of Koekemoer et al. [20] and the clade TaqMan assay of Choi et al. [21].

DNA extraction and microsatellite genotyping
Genomic DNA extractions were performed on the legs of the individual An. funestus mosquitoes using a DNA extraction kit prepGEM ® Insect (Cat. No. PIN0050; ZyGEM, Hamilton, New Zealand) following the manufacturer's instructions. In short, a leg of an individual mosquito was homogenized with reagents, incubated for 15 min at 75 °C and at 95 °C for 5 min. A 10 µl elution of genomic DNA was obtained for each sample from the extraction.
Twelve microsatellite loci were selected from previously published An. funestus sequence data [22][23][24][25], based on their high polymorphism and no evidence for null alleles. These microsatellites were: FUNF and AFUB12 on chromosomal arm 3L; AFND19, AFND20 and AFND41 on 3R; AFUB10, AFUB11, FUNL and AFND23 located on 2L; AFUB3 and FUNO located on 2R; and FUNQ on X ( Table 2). PCR amplification of the microsatellite loci was carried out individually in a 25 µl reaction volume, using 5-10 ng of template DNA. The reaction mixture contained 2.5 µl (1× PCR) buffer (Takara Bio Inc, Shiga, Japan), 1.5 mM MgCl 2 (Takara Bio Inc), 0.2 mM of each dNTP (Takara Bio Inc), 10 pmol of each primer, 0.1 U of Taq DNA polymerase (Takara Bio Inc). The forward primer was labelled at the 5′-end with either PET, HEX or 6-FAM with an ABI PRISM 377 (Thermo Fisher Scientific, MA, USA). Alleles were sized relative to an internal size standard using Peak Scanner v1.0 (Applied Biosystems, Toulouse, France). Output files (*.fsa) generated by the autosequencer were loaded into Peak Scanner ™ software. Determination of fragment size was performed using default settings, with GS500LIZ specified as size standard and 'PP -Primers Present' in the ' Analysis Method' column. Peak information showing height, area and size were labelled and scored. The resulting fragment sizes were used for further analyses.

Microsatellite diversity, differentiation and structure analysis
Microsatellite allele and genotype frequencies were determined using Arlequin 3.5 [26] and FSTAT 2.9.3 [27]. These frequencies were then used to assess deviation from Hardy-Weinberg equilibrium at each locus, each population sample, and overall as indicated by the inbreeding coefficient (F IS ). Linkage disequilibrium between pairs of microsatellite loci was assessed using FSTAT 2.9.3 [27]. The level of genetic differentiation between sampling groups was assessed using F ST [28]. Pairwise F ST between sampling groups and their significance was assessed using the randomization approach implemented in FSTAT 2.9.3 with Bonferroni-adjusted P-values.
The significance of genetic differentiation between populations based on allelic distribution across populations was examined using a Fisher exact test with FSTAT 2.9.3 [27]. The pairwise F ST was assessed by estimating Wright's F-statistics, calculated according to Weir & Cockerham [28]. A locus by locus analysis of molecular variance (AMOVA) [29] was performed using Arlequin 3.5 to determine the relative contribution of within-sampling groups and between sampling group's genetic diversity to the overall genetic diversity. Finally, population structure was inferred using a Bayesian model-based clustering algorithm to assign individuals (probabilistically) Each run was carried out with 1,000,000 iterations after a 'burn-in' period of 100,000, using the admixture model and correlated allele frequencies.
To check for convergence of the Markov Chain Monte Carlo (MCMC), 5 replicates for each value of K were checked for the consistency of results [29,30]. Based on allele diversity, individuals with unique alleles were grouped together into assumed populations (K) which is pre-determined. The K value with the maximum posterior probability Pr (X|K) value was retained and assumed to be the most probable number of clusters in that population. The estimated number of clusters (K) is taken to be the value of K with the highest Pr (X|K). All obtained files were run through Structure Harvester [31] followed by CLUMPP [32] and DISTRUCT [33]. The visual representations of the clusters were viewed in Ghostview, a graphical interface for Ghostscript (https ://www.ghost scrip t.com). As an estimate of gene flow, the number of migrants per population per generation (Nm ≈ (1 − F ST )/4F ST was conducted [34]. The correlation between genetic and geographical distances was assessed by the regression of F ST / (1 − F ST ) on the logarithm (ln) of geographical distance, in GenALEx 6.503 [35,36].

Principal coordinates analysis
For the majority of the time, principal coordinates analysis (PCoA) uses Euclidean distance between two points that are being compared. To visualize the genetic relatedness among individuals, we performed PCoA after all the preliminary analyses using the GenALEx 6.503 [35,36] program to explore the relationships between the An. funestus samples. PCoA was conducted based on mosquito genotypes. The genotypes at each site were transformed with natural log (ln (x + 1)). Then, the sites on the PCoA ordination map were marked as groups from the cluster analysis extracted from the collected sites. PCoA is an indirect gradient analysis method for seeking the strongest linear correlation structure among variables [35,36] and it is a technique widely used for reducing the dimensions of multivariate problems. In the PCoA, eigenvalues, which explain a portion of the original total variance, were calculated. Each axis score using the eigenvector, which contains the coefficients of the linear equation for a given axis, was shown in an ordination [35,36].

Mosquito identification
A total of 323 An. funestus individuals were identified from the 13 localities. Both Clades I and II were found (Table 1), with Clade I found in all sites but Clade II only in Mozambique and Zambia. A repeat of the single Clade II specimen from Kamuli in Uganda was performed and it was still assigned to Clade II. However, many more An. funestus samples need to be examined, given that the TaqMan assay has been shown to have some limitations in its accuracy [37].

Genetic diversity
Levels of microsatellite polymorphism were moderate to high for all the twelve loci. One site from Malawi and two from Uganda had small sample sizes (less than 15 specimens,  Table S1).
These heterozygosity values were not significantly different among populations (P = 0.36). A summary of the mean variation of microsatellite loci from each country is presented in Table 3. Overall allelic richness averaged seven alleles per locus with a range of 2-19 (Table 2). Locus FUNQ had the lowest number of alleles (n = 2) and AFUB11 the greatest number of alleles (n = 19).

Hardy-Weinberg equilibrium
When samples were analyzed as a single population (i.e. pooled together), three loci (AFUB19, FUNQ and FUNF) of the twelve showed significant deviations from the Hardy-Weinberg equilibrium (HWE). This may have been due to significant heterozygote deficiency or due to the sub-structuring within the population. When these were removed no significant deviation was observed.

Genetic differentiation and population structure
Genetic divergence between the sampling groups estimated by F ST ranged from low to moderate. Values between 0.00-0.05 indicate little divergence, 0.05-0.15 moderate divergence, 0.15-0.25 high divergence and over 0.25 a very high degree of divergence [30,38,39]. Genetic differentiation between all pairs of the sites and samples was estimated based on allele frequency differences at microsatellite level between the studied populations. The values of F ST between pairwise population comparisons for all loci varied from 0.006 to 0.396 across populations. These were significant within countries (P < 0.05) for Mozambique (Maciana and Matola), Uganda (between Kamuli and the other three localities) and Zambia (Nchelenge and Namwala) ( Table 4). The    STRU CTU RE provided consistent results over 5 replicated runs tested for each K. The probability of data Ln Pr (X|K) increased from K = 1 to K = 2 until it reached a maximum value at K = 2, after which values decreased gradually for all countries (Fig. 2). Thus, in agreement with the F ST results, the most likely number of genetic clusters in the dataset was two. However, when STRU CTU RE was re-run with only 8 loci, three clusters were obtained for Malawi and Mozambique, and the rest of the countries retained the original two clusters (Additional file 2: Figure S1).
Additional insight into how variation was partitioned across the countries was shown by an analysis of molecular variance (AMOVA). The AMOVA of all twelve microsatellites confirmed the differentiation and structure analysis with the variation between individuals, within populations, and among populations, being 38%, 50.24% and 0.21% (Table 5), respectively. Variation among individuals within populations and among individuals explained 50% and 38% of the total variation, respectively. Almost all genetic diversity (88%) was partitioned within populations, indicating small differences. Global F ST estimates revealed a slight but significant overall degree of genetic divergence (P < 0.05 on 30,000 Markov chains) ( Table 6). Only for Uganda were conditions suitable for AMOVA and much of the variation was recorded as among individuals within populations. This is in accordance with other studies that have found that most of the variation was with individuals among populations [12,[40][41][42].

Isolation by distance and gene flow
A Mantel test for Matrix correspondence called the isolation by distance (IBD) hypothesis was performed separately for each country using GenAlEx 6.501 to test for the occurrence of a positive correlation (Rxy > 0) between the genetic matrix and the geographical distances.  Table 7). The highest N em country average was in the Zimbabwe populations.

Principal coordinates analysis
The PCoA analysis found that the two most important principal coordinate axes accounted for less than 80% of the total variance in relatedness between genetic and geographical distance for An. funestus microsatellite loci, which would be the threshold required for statistical significance in PCoA (Fig. 3). The top two PCoA  Table 6 Global test of differentiation *P < 0.05 components explained 16.1% and 12.5% of the total variance and grouped the individuals into two main clusters (Fig. 3). Groupings were between Malawi and Mozambique, and Uganda and Zambia. Zimbabwe samples uniquely mixed with all the other populations.

Discussion
Genetic methods of assessing An. funestus population structure and differentiation were employed to provide information on within and between population differentiation, diversity and gene flow. Understanding these characteristics allows for better tailoring of vector control methods. Molecular data revealed overall high genetic variability among An. funestus populations from eastern and southern Africa, highlighting an important level of allelic richness and gene diversity. Low to moderate differentiation of the 13 An. funestus populations studied was shown at the nuclear DNA level with the 12 microsatellites genotyping as polymorphic across sites. This is consistent with low to moderate levels of genetic differentiation found in individuals among An. funestus populations in east, west and southern Africa [12,18,19,[39][40][41][42] and in other Anopheles species [19,[43][44][45]. The low to moderate levels of genetic differentiation may also be the result of barriers to gene flow between the populations. In our case, genetic differentiation among the sampled An. funestus populations had a weak yet significant correlation with geographical distance and therefore might be affected by other factors that need investigation. Our findings confirm those of Ayala et al. [40], Michel et al. [12] and Barnes et al. [15], who found An. funestus to be structured following a similar pattern of isolation by distance. The differentiation deviations could be a result of inbreeding as observed in the high F IS values for some populations. In addition, these values could also be attributed to the presence of the Wahlund effect (reduction of heterozygosity caused by subpopulation structure) or the spatial pooling of individuals originating from different houses or focal points [46]. Changes have taken place over the studied populations and collections were done at different times and over different years and this could be influencing the HWE of An. funestus populations [47,48]. HWE states that the genes and genotypes will remain constant in the absence of migration, genetic drift and inbreeding from generation to generation [30,48]. The changes in this instance could also have been caused by epistatic natural selection and random genetic drift. These features can affect the flow of genes conditioning vector competence and insecticide resistance. However, other factors such as microclimate, increasing urbanization and global warming may play a part that needs to be investigated. The rate of gene flow over time is considered suitable for estimating exchange of genes between populations [38]. N em values of gene flow for An. funestus populations have been reported to vary from 17 to 483 [12,18,39]. The recorded values of N em for An. funestus in this study were much lower , probably as a result of the geographical distances between the populations which ranged from 140 km between Maciana and Matola in Mozambique, to 3000 km between the most southern site of Matola and the most northern site of Lira in Uganda. Furthermore, samples were collected at different time points spanning 10 years and in different seasons. Thus geographical distances that separate the studied populations may not be the only barriers to limit gene flow among them, but other environmental factors may also play a role. Analysis of molecular variance indicated that most of the genetic variation (88%) was maintained within the populations rather than between them. These results confirm findings of Barnes et al. [15], Michel et al. [12] and Ogola et al. [18] from countries in east, west and southern Africa.
A study on upregulation of insecticide resistance metabolic enzyme genes provided evidence of population structuring between northern and southern sites in Malawi [16]. The Mantel test used here for correlation of genetic variation with geographical distance (isolation by distance hypothesis) showed positive correlation between sites in Malawi (R xy = 0.083, P = 0.007), not indicating environmental or other barriers to gene flow. Only the two sites in Mozambique (Matola and Maciana, 140 km apart) gave negative results for isolation by distance. Genetic/geographical distance correlations are variable with previous studies showing these to be weak but significant [12,22], weak and non-significant [18], or nonexistent [19].
Using genetic structure analysis, the populations studied here were clustered into two groups, Malawi/Zimbabwe versus Uganda/Zambia/Mozambique. These groups do not coincide with the mitochondrial Clades I and II of Choi et al. [21], both of which were shown to occur in Mozambique and Zambia in the present study. A recent study by Jones et al. [37] based on the full mitogenomes of An. funestus from Zambia, the Democratic Republic of Congo and Tanzania, showed that there were two strong maternal lineages in Zambia and Tanzania. Whether the PCoA results illustrated in Fig. 3 here, showing two distinct clusters in the Zambian sample reflect the mitogenome clusters, remains to be confirmed. The TaqMan assay [21] was not always accurate in identifying the clades [37] and this might also impact on the results shown in our study.

Conclusions
Two genetically distinct clusters were revealed in An. funestus with populations from Mozambique, Uganda and Zambia forming one group and Malawi and Zimbabwe the second group. However, almost all genetic diversity (88%) was partitioned within populations, indicating small differences, except for the PCoA of the Zambian samples which were clearly divergent. The variable effective population sizes, climate and different collection time points, may be some of the factors affecting the differentiation. Future research should investigate changes in mosquito populations over time, especially as insecticide use and their coverage evolve, new interventions are rolled out, and climate/land use changes.