Genetic diversity and population structure of Glossina pallidipes in Uganda and western Kenya

Background: Glossina pallidipes has been implicated in the spread of sleeping sickness from southeastern Uganda into Kenya. Recent studies indicated resurgence of G. pallidipes in Lambwe Valley and southeastern Uganda after what were deemed to be effective control efforts. It is unknown whether the G. pallidipes belt in southeastern Uganda extends into western Kenya. We investigated the genetic diversity and population structure of G. pallidipes in Uganda and western Kenya.
Results: AMOVA indicated that differences among sampling sites explained a significant proportion of the genetic variation. Principal component analysis and Bayesian assignment of microsatellite genotypes identified three distinct clusters: western Uganda, southeastern Uganda/Lambwe Valley, and Nguruman in central-southern Kenya. Analyses of mtDNA confirmed the results of microsatellite analysis, except in western Uganda, where Kabunkanga and Murchison Falls populations exhibited haplotypes that differed despite homogeneous microsatellite signatures. To better understand possible causes of the contrast between mitochondrial and nuclear markers, we tested for sex-biased dispersal. Mean pairwise relatedness was significantly higher in females than in males within populations, while mean genetic distance was lower and relatedness higher in males than in females in between-population comparisons. Two populations sampled on the Kenya/Uganda border exhibited the lowest levels of genetic diversity. Microsatellite alleles and mtDNA haplotypes in these two populations were a subset of those found in neighboring Lambwe Valley, suggesting that Lambwe was the source population for flies in southeastern Uganda. The relatively high genetic diversity of G. pallidipes in Lambwe Valley suggests that large relict populations remained even after repeated control efforts.
Conclusion: Our research demonstrated that G. pallidipes populations in Kenya and Uganda do not form a contiguous tsetse belt. While Lambwe Valley appears to be a source population for flies colonizing southeastern Uganda, this dispersal does not extend to western Uganda. The complicated phylogeography of G. pallidipes warrants further efforts to distinguish the role of historical and modern gene flow and possible sex-biased dispersal in structuring populations.


Background
Glossina pallidipes is a major vector of animal African trypanosomiasis. The vector has also been implicated in the transmission of Human African Trypanosomiasis (HAT). For example, the expansion of T. b. rhodesiense (Tbr) sleeping sickness beyond its traditional focus in southeastern Uganda to western Kenya in the 1950s was attributed to G. pallidipes [1]. The first confirmed case of Tbr in Kenya was reported in 1942, having spread from southeastern Uganda along the Sio River. The spread was attributed to G. pallidipes infestation in the Busia district on the Kenya-Uganda border [2]. Further evidence for involvement of G. pallidipes in transmission of HAT was obtained from the isolation of the T. b. rhodesiense parasite from G. pallidipes [3]. Despite its role as a vector of trypanosomiasis, the dynamics of G. pallidipes populations in Uganda and the extent to which these populations are linked by dispersal to western Kenya populations were hitherto unknown. Department of Lands and Survey maps produced in the late 1960s [4] indicate that G. pallidipes is contiguously distributed across west-central Uganda, occurring in one main belt. However, GIS prediction maps show the existence of two belts, in the western and southeastern parts of the country [5]. The southeastern belt extends into western Kenya and is believed to have been responsible for extending the focus of T. b. rhodesiense HAT into western Kenya [2]. The western belt occupies areas around Murchison Falls in Uganda and regions south of Lake Albert.
In the first half of the 20th century, G. pallidipes was the most abundant species in southeastern Uganda, followed by G. brevipalpis and G. f. fuscipes, respectively [6]. Although all three species were thought to have dispersed into the area from other places [7], no records exist on the source populations. Fluctuation of G. pallidipes trap densities in Uganda in the 1970s and early 1980s [8], [9], possibly due to competition between G. pallidipes and G. fuscipes, led some authors to conclude that G. pallidipes could have disappeared from southeastern Uganda [10]. Recently it was claimed that G. pallidipes had re-invaded southeastern Uganda, leading to significant increases in prevalence of trypanosome infections in cattle [11]. The re-invasion hypothesis demands a deeper understanding of the dynamics of G. pallidipes populations in southeastern Uganda and the adjoining western Kenya fly belt.
Whereas the population structure of G. pallidipes in Kenya and elsewhere has been extensively studied at micro- and macrogeographic scales [12][13][14][15][16], no such studies have been carried out in Uganda. Furthermore, it is unclear whether the G. pallidipes belt in southeastern Uganda is contiguous with the western Kenya fly belt, encompassing the traditional HAT foci of Busia, Teso, and Lambwe Valley (Figure 1), as suggested earlier [17]. Here we report on the population structure of G. pallidipes in Uganda. In order to identify the source of tsetse in southeastern Uganda and to evaluate the extent to which proposed fly belts form discrete units, we evaluated the connectivity between Ugandan populations and populations sampled at Kapesur and Lambwe Valley in western Kenya, and from Nguruman in southwestern Kenya.
Sites in the Lambwe and Nguruman areas were previously described [14]. Collection dates and sampling coordinates are reported in Table 1. Tsetse flies were trapped using biconical traps baited with cow urine and acetone [18]. Samples were morphologically identified as G. pallidipes, their sex was determined, and individual flies were stored in 85% ethanol, transferred to the laboratory and stored at -20°C until DNA extraction.

DNA extraction, amplification, sequencing and genotyping
Genomic DNA was extracted using the DNeasy blood and tissue kit (Qiagen) as per the manufacturer's instructions. For each population, 16 to 21 of the microsatellite-genotyped individuals (see below) were randomly selected for mtDNA typing. In the case of samples from Kabunkanga (KB), we typed all of the individuals in the sample. We amplified a 473 bp fragment of cytochrome c oxidase subunit I (COI) using the primers GpCOI_F1 (5'-GAGCCTTAATTGGAGATGATC-3') and GpCOI_R1 (5'-GATGTGCTCATACAATAAATCC-3'). Fragments were amplified in a 30 μl reaction employing 1X buffer (Applied Biosystems), 0.2 mM each dNTP (New England Biolabs), 0.6 μM primers, 2 mM MgCl2, 0.4 mg/mL BSA (New England Biolabs) and 0.6 units AmpliTaq Gold DNA Polymerase (Applied Biosystems) using 50 cycles and an annealing temperature of 50°C. Sequencing was performed on a 3730xl DNA Analyzer (Applied Biosystems). Sequences were deposited in GenBank (Additional file 1, Table S1).

Descriptive statistics and marker validation
For mtDNA, we calculated haplotype diversity (Hd) and nucleotide diversity (π) using the program DnaSP v5 [24]. For microsatellites, we calculated allelic richness, as well as observed (Ho) and expected (He) heterozygosities, using the program GenAlex v. 6.41 [25]. Loci were tested for deviations from Hardy-Weinberg equilibrium (HWE) and for linkage disequilibrium (LD) using the program Genepop v4.0 [26]. Markov chain parameters were set to 10,000 dememorizations, 1000 batches, and 10,000 iterations per batch for both tests.
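As a rough illustration of the two mtDNA summary statistics named above, the following sketch computes haplotype diversity and nucleotide diversity from a small alignment. It assumes the sample-size-corrected (Nei) estimator for Hd and treats every mismatch between aligned sequences as one difference; the function names and the toy sequences are hypothetical and are not taken from the study or from DnaSP.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Sample-size-corrected haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(sequences):
    """Average number of pairwise nucleotide differences per site (pi)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / (n_pairs * length)


# Toy alignment (hypothetical data, not from the study).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
hd = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"Hd = {hd:.3f}, pi = {pi:.3f}")
```

Whether a given program applies the same small-sample correction should be checked against its own documentation before comparing values.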

Population differentiation and structure
For both mtDNA and microsatellite data, we calculated estimates of pairwise differentiation between populations using the program Arlequin v3.1 [27] and tested for significant differentiation using 1000 permutations. We employed measures of differentiation based on haplotype or allelic frequencies (FST) and measures that accounted for the evolutionary distance between haplotypes or alleles (ΦST, RST). We also performed an analysis of molecular variance (AMOVA) on both mtDNA and microsatellite data to evaluate the extent to which genetic variation was explained by differences among and within populations. We evaluated evolutionary relationships among maternal (mtDNA) tsetse lineages using a parsimony network generated by the program TCS v1.21 [28]. Finer-scale structuring at microsatellite loci was assessed using the Bayesian model-based clustering algorithm implemented in STRUCTURE 2.2 [29]. STRUCTURE assigns individuals to K populations based on their multilocus genotypes. We conducted five independent Markov chain Monte Carlo (MCMC) assignment runs for each K from K = 1 to K = 6, assuming an admixture model with correlated allele frequencies. Each run used 250,000 MCMC steps after discarding the first 50,000 as burn-in. Evanno's criterion [30] and the method of Pritchard et al. [29] were used to identify the likeliest number of clusters. For final assignment of individuals to clusters, we used 500,000 MCMC steps. As an alternative approach for summarizing microsatellite variation across populations, we performed principal component analysis (PCA) using the "adegenet" package in R [31]. In contrast to the Bayesian assignment algorithm above, the PCA approach does not make any assumptions about HWE or LD and allows for a visual assessment of the degree to which populations differ from each other.
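The ΔK statistic behind Evanno's criterion can be sketched as follows. This is a minimal illustration, assuming that the second-order difference is taken on the run means of ln P(D|K) and divided by the standard deviation across replicate runs at that K; the function name and the log-likelihood values are hypothetical and do not come from the study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta_K} for K values that have both neighbours.

    lnp_by_k maps K -> list of ln P(D|K) values, one per replicate run.
    delta_K = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)), with L the run mean.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the smallest and largest K have no second difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta_K is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta


# Hypothetical log-likelihoods for K = 1..6, five runs each (not study data).
toy_runs = {
    1: [-5200.0, -5201.0, -5199.0, -5200.5, -5199.5],
    2: [-4900.0, -4902.0, -4898.0, -4901.0, -4899.0],
    3: [-4600.0, -4601.0, -4599.0, -4600.5, -4599.5],
    4: [-4590.0, -4595.0, -4585.0, -4592.0, -4588.0],
    5: [-4585.0, -4600.0, -4570.0, -4590.0, -4580.0],
    6: [-4583.0, -4610.0, -4560.0, -4595.0, -4575.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k, {k: round(v, 2) for k, v in delta_k.items()})
```

Because ΔK cannot be computed for the smallest and largest K tested, it is usually read alongside the raw ln P(D|K) curve, which is consistent with the use of both criteria here.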

Sex-specific dispersal
Microsatellite data were used to obtain pairwise genetic distance and relatedness values between individuals for all flies collected in 2008 in Uganda (KB: 4 males, 12 females; MF: 9 males, 39 females; and OK: 4 males, 26 females). Genetic distances were calculated based on [32] by using the program Alleles in Space 1.0 [33], both within and between populations, while pairwise relatedness was computed in Kingroup v2 [34] via maximum-likelihood estimation [35]. Means, standard errors, and 95% confidence intervals were determined for males and females separately. The significance of within- and between-population differences between the two sexes was tested using one-sided t-tests. Only the 2008 samples were considered in order to avoid temporal fluctuations and afford a snapshot of sex-specific dispersal patterns.
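The comparison of means between the sexes can be illustrated with a small sketch of the test statistic. It assumes the unequal-variance (Welch) form of the t-test, which the text does not specify, and it stops at the t value and degrees of freedom; the one-sided P-value would then be read from a t distribution. The function name and the relatedness values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for the one-sided alternative mean(a) > mean(b)."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical within-population pairwise relatedness values (not study data).
r_females = [0.30, 0.25, 0.35, 0.28, 0.32]
r_males = [0.10, 0.15, 0.05, 0.12, 0.08]
t_stat, dof = welch_t(r_females, r_males)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A large positive t in this orientation would support higher relatedness among females; swapping the argument order tests the opposite one-sided alternative.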

Microsatellite validation
Following sequential Bonferroni correction, we detected no significant linkage between any pair of loci in any of the six populations. We detected a significant departure from HWE in locus GpB20b in three populations (Additional file 2, Table S2). Only Kapesur exhibited a significant departure after Bonferroni correction, and the absolute value of FIS in this population was close to zero. Therefore, all further analyses included GpB20b.
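The sequential Bonferroni procedure referred to above can be sketched as a step-down rule over sorted P-values. This is a minimal illustration assuming a table-wide significance level of 0.05; the function name and the P-values are hypothetical.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where a test remains significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P-value is compared with alpha/m, the next with
        # alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # once one test fails, all larger P-values fail too
    return significant


# Hypothetical P-values from five tests (not study data).
toy_p = [0.001, 0.04, 0.03, 0.20, 0.012]
flags = sequential_bonferroni(toy_p)
print(flags)
```

In this toy case, two of the four tests that are nominally significant at 0.05 survive correction, which mirrors how a locus can depart from HWE in several populations before correction but in only one afterwards.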

Genetic diversities
Among 113 Glossina pallidipes, twenty-two mitochondrial haplotypes were detected, of which only two (haplotypes 7 and 17) were shared among sampling sites (Okame, Kapesur, and Lambwe; Table 1 and Additional file 1, Table S1). Twenty mitochondrial haplotypes were 'private' (i.e., confined to a single sampling site). Nine haplotypes were singletons. Mitochondrial diversity, the probability that two randomly chosen haplotypes differ, varied from 0.43 in Okame to 0.86 in Lambwe. The overall mean was 0.65 ± 0.15. Nucleotide diversities π (the average number of nucleotide differences per site) varied nearly ten-fold, from only c. 0.0015 in Nguruman to 0.014 at Kapesur, with an overall mean of 0.008 ± 0.005.
Among 237 flies (474 genomes), seven microsatellite loci afforded 51 alleles (Table 1). The number of alleles per locus ranged from 2 at GmC17 to 18 at GpB20b. Average allelic richness (AR, allelic diversity corrected for variations in sample size) ranged from 3.2 to 5.6 and was least in Okame and Kapesur; these locations also exhibited the lowest heterozygosities, although He values did not differ significantly from the four other estimates. Microsatellite diversity measures for Lambwe were not significantly greater than in Okame and Kapesur (H (df = 2) = 0.27, P = 0.87). The foregoing locations share the same fly belt, but Lambwe G. pallidipes has been subjected to repeated and unsuccessful eradication attempts in the past 30 years [36].
Estimates of the expected (He) and observed (Ho) microsatellite diversities were closely similar, roughly indicating random mating within populations. Formal tests of the null hypothesis FIS = 0 indicated a significant departure at only one locus in only one sample (GpB20b at KP; Additional file 2, Table S2). Population FIS, averaged over loci, indicated no departures from random mating within populations (Table 3).
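The relationship between Ho, He and FIS used in this paragraph can be illustrated for a single locus. The sketch assumes the uncorrected expected heterozygosity (one minus the sum of squared allele frequencies) and FIS = 1 - Ho/He; the function name and the genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (Ho, He, FIS) for one locus from diploid genotypes (a1, a2)."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Expected heterozygosity from allele frequencies over the 2n gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    f_is = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, f_is


# Hypothetical allele sizes at one microsatellite locus (not study data).
toy_locus = [(150, 150), (150, 154), (154, 154), (150, 154)]
ho, he, f_is = locus_stats(toy_locus)
print(ho, he, f_is)
```

Here Ho equals He, so FIS is zero, the value expected under random mating; a heterozygote deficit gives a positive FIS.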

Genetic differentiation and population structure
AMOVA results (Table 2) confirmed that differences among populations contributed a significant proportion of the variance observed in the distribution of mtDNA haplotypes. The proportion of the variance explained by population differences was greater when accounting for the evolutionary relationships among haplotypes (ΦST) than when considering haplotype frequencies alone. As evident in a haplotype network (Figure 1 and Additional file 1, Table S1), only two of the 22 haplotypes were shared between any populations. These two haplotypes were the only sequences recovered from flies in Okame (OK) and Kapesur (KP) and represented a subset of the haplotypes found in flies from Lambwe Valley (LV). One of these haplotypes is found within a clade of widespread maternal lineages (Clade C), while the other is the only representative of Clade B. Clades A and B are quite distant from each other and from Clade C (2.4% and 3.3% of constituent nucleotides, respectively), although both clades include haplotypes from sampling sites that are geographically close to sites where only Clade C haplotypes are recovered. The topologies of Clades A and B are also very different from that of Clade C. Clades A and B include either only one (Clade B) or a few (Clade A) recently diverged haplotypes, as suggested by the small number of mutational steps that separate them. Clade C not only comprises more than twice as many haplotypes as Clade A, but also includes haplotypes that are separated by more mutational steps from other haplotypes in the same population than from haplotypes found only in other populations.
Microsatellite data revealed three clusters of genetically distinct populations, which were consistent with the patterns of genetic differentiation indicated by mitochondrial DNA, with the notable exception of flies from Murchison Falls. In contrast to the large pairwise distance observed between Murchison Falls (MF) and Kabunkanga (KB) in mtDNA (FST = 0.255, ΦST = 0.867; Table 3

Sex-specific dispersal
We used microsatellite data collected from three localities (KB, MF and OK), where flies were sampled in the same season (March-April 2008), to determine whether mobility differs significantly between sexes. Table 5 and Figure 4 show the results. Mean pairwise relatedness was significantly higher in females than in males within populations. Between-population comparisons in the KB-MF-OK triangle (approximately 35,000 km²) showed greater genetic similarity between males than between females. The lack of significance in one comparison, between KB and OK (341 km apart), is likely due to small male sample sizes at both sites.

Discussion
All studies to date of breeding structure in the Morsitans group tsetse flies have indicated highly structured populations among which there has been little detectable gene flow [16]. Our results in G. pallidipes are in agreement with the earlier findings. Populations in western Uganda were significantly differentiated from flies in the northeastern corner of Lake Victoria, and these populations were further differentiated from the population in Nguruman in south-central Kenya. Furthermore, tsetse populations were not homogeneous within the three regions. Indices of differentiation inferred from mtDNA and microsatellites indicated that populations at Okame, Kapesur and Lambwe Valley form a genetically homogeneous group relative to the populations lying approximately 400 km to the east or west. Within this group, however, genetic diversity was lower in Okame and Kapesur than in Lambwe Valley. In fact, mtDNA haplotypes recovered from Okame and Kapesur formed a subset of those found in the Lambwe Valley. Similarly, with the exception of one allele at one locus, microsatellite alleles in Okame and Kapesur were also a subset of those found in the Lambwe Valley (data not shown). Past control operations under the Farming in Tsetse Controlled Areas (FITCA) project (http://www.au-ibar.org/index.php/en/projects/completed-projects/fitca/achievements) are likely to be responsible for the genetic structuring. Historically, the three populations may have been part of a large, panmictic, and genetically diverse population, and control activities may have severely reduced population sizes in Okame and Kapesur, leading to the observed reduction in genetic diversity. Once the FITCA project ended in the early 2000s, gene flow from Lambwe Valley could have led to increased genetic diversity and allelic homogenization. Alternatively, the Okame and Kapesur populations may not be relicts of a larger population but may have originated from two recent colonizations from the Lambwe Valley. A priori, both scenarios are equally likely. However, since earlier genetic studies indicated that the Lambwe Valley tsetse population is large and has been in residence for a long time [14], it is most likely that the low genetic diversity observed in Okame and Kapesur flies is due to recent colonization rather than a past bottleneck.
As in the Lake Victoria region, populations in western Uganda differed significantly over the approximately 190 km separating Kabunkanga and Murchison Falls. Populations of G. pallidipes at Kabunkanga and Murchison Falls exhibited similar microsatellite frequencies, but extremely divergent mtDNA haplotypes.
Because of differences in evolutionary rates and inheritance patterns between bi-parentally inherited microsatellite loci and maternally inherited mtDNA, direct comparisons between the results of these two types of molecular marker might be misleading. To investigate the possibility of sex-biased dispersal, in addition to comparing microsatellite and mtDNA results, we carried out an individual-based, sex-specific analysis of the level of genetic differentiation and relatedness using only microsatellite data. If dispersal is sex-biased, we expect to encounter higher genetic differentiation within populations and more genetically similar individuals across populations in the better-dispersing sex, while the more philopatric sex will exhibit higher relatedness values between individuals within populations and increased genetic dissimilarity and lesser relatedness between populations relative to the more mobile sex [37]. Our data suggest that males disperse over longer distances than females (Table 5 and Figure 4). Although females are believed to be highly mobile [38] due to their relatively larger body size, males are active for longer periods of time [39] and devote bloodmeals exclusively to the production of fat, which is used as an energy reserve for flight [40]. Additionally, the asymmetry in male versus female dispersal could be attributed to flight constraints imposed on females by carrying a larva, which can double the weight of a female at the peak of pregnancy [41].
[Table note: dF, mean pairwise genetic distance between female flies; dM, mean pairwise genetic distance between male flies; rF, mean pairwise relatedness between females; rM, mean pairwise relatedness between males. One-sided t-tests were used to ascertain the significance of male-biased dispersal, i.e. higher relatedness and lower genetic distance of male versus female individuals between populations. Significant P-values are denoted in bold, italic font.]
The male-biased dispersal inferred from microsatellite data needs further scrutiny: the study was not designed to test this hypothesis, and the small male sample sizes did not allow for rigorous testing. The low level of microsatellite differentiation between tsetse at Kabunkanga and Murchison Falls is also hard to reconcile with the absolute divergence in COI sequences observed between tsetse flies from these two sites. The net average nucleotide divergence was 2.7%, consistent with a divergence time of 1.8 million years, assuming a molecular clock ticking at 1.5% divergence per million years [33]. Therefore, unequal dispersal rates would have to have been maintained for an extremely long period in order to generate the conflicting signals in microsatellites and mtDNA.
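The divergence-time figure quoted above follows directly from the stated clock rate. Writing d for the net nucleotide divergence between the two sites and r for the assumed pairwise divergence rate (both symbols are introduced here only for illustration), the arithmetic is:

```latex
T \approx \frac{d}{r}
  = \frac{2.7\%}{1.5\%\ \text{per million years}}
  = 1.8\ \text{million years}
```

The estimate scales inversely with the assumed rate, so a faster clock would shorten the inferred divergence time proportionally.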
A mitochondrial sweep, due perhaps to Wolbachia infection favoring the amplification of a particular mitochondrial lineage in one population, could have shortened the time frame over which this apparent divergence accumulated. Even in this case, though, sufficient time has passed to allow the accumulation of mtDNA diversity in both populations without any concomitant exchange of haplotypes. Owing to the possibility of past bottlenecks and rare long-distance colonizations, as well as sex-biased dispersal, the phylogeographic history of G. pallidipes appears to be complex.
Aside from the seemingly contradictory signals from microsatellites and mtDNA in western Uganda, we also observed neighboring populations in the Lake Victoria region that shared two mtDNA lineages differing by about 2% without observing any of the intervening haplotypes. This would suggest either that G. pallidipes colonized the Lake Victoria region independently at least twice, or that a very large and diverse population of G. pallidipes underwent a severe bottleneck or series of lesser bottlenecks, leaving only remnants of the past diversity. A deeper understanding of the phylogeography of G. pallidipes will require greater context, and range-wide relationships should be explored more thoroughly in the future.
The current study greatly enhances our understanding of G. pallidipes population dynamics especially in Uganda, which has been a missing link in previous samplings. To the best of our knowledge, this is the first report on the population structure of this species in Uganda based on natural samples. In an earlier paper that described the population structure of G. pallidipes at a macrogeographic scale covering almost its entire range, only a single sample from a laboratory colony of G. pallidipes originating from Uganda nearly three decades ago was analyzed [15]. In another study Ouma et al. [14] discussed the relict G. pallidipes populations in Lambwe and Nguruman, and demonstrated temporal and seasonal stability of G. pallidipes populations in these areas. Such temporal stability has also been reported in G. fuscipes fuscipes [42]. These previous studies were reviewed [16] and suggested significant differentiation among natural populations of G. pallidipes in eastern and southern Africa. However, in the absence of samples from Uganda, it was always difficult to put the data into perspective and understand the re-infestation of western Kenya including Lambwe Valley and Busia-Teso regions by G. pallidipes.
The findings of this study have reaffirmed the importance of gathering genetic data prior to implementing area-wide tsetse vector control operations, as recommended for the creation of G. p. gambiensis-free zones in the Niayes region of Senegal [43]. Genetic data should be generated as part of baseline data collection to provide the much-needed scientific evidence upon which control measures can be effectively implemented.

Conclusion
This study underscores the importance of tailoring both monitoring and control measures to the population-specific circumstances and history, and the importance of understanding the evolutionary dynamics likely to have shaped the breeding structure of each population. This is exemplified by our findings at different levels:
1-On a broad spatial scale our results point to the presence of at least three genetically discrete fly belts among which there has been little detectable gene flow in the region extending from western Uganda to Nguruman in southwestern Kenya. Such strong geographic structuring of G. pallidipes should limit the geographic scale on which area-wide vector control needs to be implemented.
2-On a local scale our data point to specific populations where control and detection methods need improvement. In keeping with earlier studies [14], [16], we have identified the Lambwe Valley as a region where such revision is needed for two reasons. First, despite years of intensive control efforts and very low fly densities detected by current trapping methods, the population is still highly variable genetically and thus probably quite large. Second, this population has served as a source for seeding neighboring regions.
3-Our data suggest the existence of both historical (mtDNA-microsatellite comparison) and current (microsatellite-inferred) male-biased dispersal. This contradicts the general idea that females are better dispersers than males. Because of its relevance for control and eventual sterile insect release activities, it should be further explored to understand its genetic, ecological and physiological underpinnings. Further research is needed to clarify sex-biased dispersal in this species and to demonstrate it in other morsitans group flies.
4-Finally, control efforts on small populations may vary in efficacy and can be optimized if coupled with inferences from genetic data, as exemplified by Okame/Kapesur. If flies from these sites originated as rare immigrations from neighboring sites, as we suggest, control efforts can be long-lasting even if control measures and monitoring activities are lessened over time, because the probability of re-infestation is low, as shown by extensive studies on breeding structure in morsitans group flies. On the other hand, if increases in tsetse densities in a given area are due to expansion of relict populations rather than re-infestation, then efforts can be ineffective if local control is lessened before the population is completely extirpated. Complete extirpation is difficult not only to achieve but also to evaluate using traditional sampling methods.

Additional material
Additional file 1: Table S1. Cytochrome oxidase I in Glossina pallidipes: frequencies of haplotypes observed across populations and associated GenBank accession numbers.