Multilocus sequence typing of Enterocytozoon bieneusi in crab-eating macaques (Macaca fascicularis) in Hainan, China

Background Enterocytozoon bieneusi is one of common intestinal pathogens in humans and animals including non-human primates (NHPs). Many zoonotic pathogens including E. bieneusi have been found in these animals. However, there are few studies on the population structure of E. bieneusi in NHPs. To infer the gene diversity and population genetics of E. bieneusi, we selected 88 E. bieneusi-positive samples from crab-eating macaques for multilocus characterizations in this study. Methods The E. bieneusi isolates examined belonged to three common genotypes with different host ranges by sequence analysis of the ribosomal internal transcribed spacer (ITS): Type IV (n = 44), Macaque3 (n = 24) and Peru8 (n = 20). They were further characterized by sequence analysis at four microsatellite and minisatellite loci (MS1, MS3, MS4 and MS7). DnaSP, Arlequin and LIAN were used to analyze the sequence data together with those from the ITS locus to infer the population genetics. Subpopulation structure was inferred using phylogenetic and STRUCTURE analyses. Results Seventy-two (81.8%), 71 (80.7%), 76 (86.4%) and 79 (89.8%) samples were amplified and sequenced successfully at the MS1, MS3, MS4 and MS7 loci, respectively, with 53 having sequence data at all five MLST loci including ITS. Altogether, 33 multilocus genotypes (MLGs) were produced based on concatenated sequences from the 53 samples. In phylogenetic analyses of sequences and allelic data, four major subpopulations (SPs) were observed with different ITS genotypes in each of them: Type IV and Peru8 in SP1 and SP2; Type IV, Macaque3 and Peru8 in SP3; and Type IV and Macaque3 in SP4. SP3 and SP4 were phylogenetically related and might be NHP-specific based on the fact that Macaque3 is mostly found in NHPs. A strong linkage disequilibrium (LD) was observed among the multilocus sequences and allelic data. Conclusions The significant LD in the multilocus sequence analysis indicated the presence of an overall clonal population structure of E. bieneusi in crab-eating macaques. The inconsistent segregation of MLGs among ITS genotypes suggested some occurrence of genetic recombination. These observations should improve our understanding of the population genetics of E. bieneusi in NHPs.

Approximately 500 E. bieneusi genotypes have been identified based on sequence analysis of the internal transcribed spacer (ITS) region of the rRNA gene [15,16]. They belong to 11 phylogenetic groups. Among them, ITS genotypes in Group 1 and Group 2 have been found in a broad range of hosts including humans and are probably responsible for most zoonotic or cross-species E. bieneusi infections whereas host adaptation seems to be more common in genotypes of Groups 3 to 11 according to a recent review of E. bieneusi [6]. To date, more than 100 E. bieneusi genotypes have been detected in NHPs, mostly belonging to Group 1 [6,9,10,13,14,[17][18][19][20][21]. Among them, Type IV and Peru8 are common ITS genotypes in humans and various animals in many countries, including China [22]. Conversely, Macaque3 has been mostly found in monkeys and occasionally in dogs in China [1,21,[23][24][25][26].
Multilocus sequence typing (MLST) based on sequence analysis of four markers (MS1, MS3, MS4 and MS7) with simple tandem repeats and the ITS has been used in investigating the transmission dynamics of E. bieneusi in humans and some animals, including pigs, sheep, rabbits, calves, black bears, giant pandas and horses [27][28][29][30][31][32][33][34][35]. Thus far, only three studies have used the MLST analysis to characterize E. bieneusi isolates in NHPs, and only one study has found the presence of an overall clonal population structure and an epidemic population structure in subpopulations of this pathogen in NHPs [11,36,37].
In this study, 88 E. bieneusi-positive samples from crab-eating macaques (Macaca fascicularis) in Hainan Province, China were used for MLST analysis and population genetic characterization of E. bieneusi. The present study aims at evaluating the genetic diversity and population structure of E. bieneusi in NHPs to better understand the transmission and epidemiology of E. bieneusi in these animals.

Source of E. bieneusi-positive samples
The 88 E. bieneusi-positive samples used in this study were from crab-eating macaques in Hainan, China. They consisted of ITS genotypes Type IV (n = 44), Macaque3 (n = 24) and Peru8 (n = 20), which were identified by sequence analysis of the ITS region of the rRNA gene in a previous study [8]. They were collected from different animals and used for further molecular characterization in this study. Other minor E. bieneusi genotypes from that study were not included in the present study, as the small numbers of samples for each of them made examinations of intra-genotypic variations of them difficult.

MLST analysis of E. bieneusi
Four established MLST loci were targeted in this study, including three microsatellite loci (MS1, MS3 and MS7) and one minisatellite locus (MS4). These loci were amplified by nested PCR using published primers and conditions [32]. A positive control (DNA of genotype BEB4 which usually infects cattle rather than NHPs) and a negative control (reagent water without DNA) were used in all PCR tests. PCR was performed twice on each DNA sample at each genetic locus. The secondary PCR products generated were bi-directionally sequenced on an ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA). The nucleotide sequences obtained were assembled using software ChromasPro 1.32 (http://techn elysi um.com.au/Chrom asPro .html), and aligned with each other and homologous sequences downloaded from GenBank using ClustalX (http://clust al.org). The sequence alignments at the four genetic loci (MS1, MS3, MS4 and MS7) were used in the assessment of sequence polymorphism within the ITS genotypes, determination of MLGs and phylogenetic analyses of the MLST data.

Population genetic analysis of sequence data
The multilocus sequences data (concatenated sequences from all five markers, including the ITS) generated were used in the analysis of the population genetics of suggested some occurrence of genetic recombination. These observations should improve our understanding of the population genetics of E. bieneusi in NHPs.

Population genetic analysis of allelic profile data
Pairwise intergenic LD based on allelic profiles at the five genetic loci was evaluated using the exact test and Markov chain parameters implemented in Arlequin version 3.5.1.2 [39]. The standardized index of association (I S A ) was measured using LIAN 3.7 [40].

Subpopulation structure analysis
The phylogenetic relationship among MLGs including both SNPs and INDELs was assessed using the maximum likelihood (ML) analysis and general time reversible model implemented in MEGA 6 (https ://www. megas oftwa re.net). The reliability of clusters formed was assessed using bootstrap analysis with 1000 replicates. An un-weighted pair group mean average (UPGMA) tree of the allelic data was generated using MEGA 6. Wright's fixation index (F ST ) of E. bieneusi isolates was calculated using software Arlequin version 3.5.2.2. STRU CTU RE version 2.3.1 was used to identify distinct sub-populations, with K-values of 2-4 [41]. F ST was calculated to assess the robustness of the substructuring and to determine genetic distances between subpopulations.

Multilocus genotypes
Genetic diversity at the MLST loci was in the form of both SNPs and indels, with 24, 5, 25, 4 and 3 segregation sites at the MS1, MS3, MS4, MS7 and ITS loci, respectively (Table 2). Altogether, 6, 4, 6, 2 genotypes were detected at the MS1, MS3, MS4 and MS7 loci, respectively. The intragenic LD among segregating sites for each locus was calculated using the linear regression equation. The MS3, MS7 and ITS loci were in complete linkage disequilibrium (LD = 1). In contrast, the remaining two loci (MS1 and MS4) had incomplete LD, indicating the potential presence of genetic recombination. This was confirmed by further assessment of intragenic recombination, which showed the presence of two and five Rms at the MS1 and MS4 loci, respectively.
Of the 88 samples analyzed by MLST, 53 were positive at all four genetic loci. To determine the MLGs, sequences of the five genetic loci (MS1, MS3, MS4, MS7 and ITS) from each of the 53 samples were concatenated. Genetic diversity was assessed based on both total polymorphic sites (finite population variance estimates) and segregating sites only (infinite population variance estimates). Together they formed 33 MLGs with an Hd of 0.98 (Table 3). The mean numbers of pairwise differences in polymorphic and segregating sites were 16 and 14, respectively. The frequency of the MLGs ranged from 9.4% (one MLG with five samples), 7.5% (four MLGs each with three samples), 3.8% (eight MLGs each with two samples), to 1.9% (20 MLGs each with one sample) ( Table 4).

Population structure
Inter-locus LD over all segregating sites within the multilocus contig was assessed for all 53 samples. This produced a Z n s value of 0.185; the probability P-values for expected Z n s ≤ 0.185 was 0.976. The incomplete LD among these 53 samples (|D′| Y = 0.8679 − 0.2781X) indicated a possible occurrence of genetic recombination, which was confirmed by results of the recombination analysis (10 Rms) (Table 3). Thereafter, we used Fu's statistic test to evaluate the genotype frequency in this population. In this test, we obtained Fs values of -5.56 (P (Fs < − 5.56) = 0.158) and − 6.91 (P (Fs < − 6.91) = 0.122) based on polymorphic sites and segregating sites, respectively ( Table 3). The results of Fu's statistic tests indicated the presence of an excessive number of alleles.
We calculated I S A to assess the LD between alleles in pairwise combinations of genetic loci [39]. Results of this analysis of allelic profile data showed a positive I S A value together with V D > L. Furthermore, a significant P MC value (< 0.001) was obtained using the Monte Carlo method based on the null hypothesis of panmixia (Table 5). All these results indicated that the population was in LD. However, values of V D < L and P MC > 0.001 were obtained for subpopulations identified in this study or ITS genotype. Thus, these subpopulations and ITS genotypes were in linkage equilibrium (LE) and had the epidemic population structure.

Subpopulation structure
Maximum likelihood analysis of the concatenated sequences of the 33 MLGs showed four main subpopulations (SPs). SP1 contained samples of three MLGs of ITS genotype Type IV and two MLGs of ITS genotype Peru8, SP2 contained six MLGs of ITS genotype Type IV and two MLGs of ITS genotype Peru8, SP3 contained seven MLGs of ITS genotype Type IV, four MLGs of ITS genotype Peru8, and five MLGs of ITS genotype Macaque3, while SP4 contained two MLGs of ITS genotype Type IV and two MLGs of ITS genotype Macaque3 (Fig. 1). Similarly, four major clades were formed in UPGMA analysis of 33 MLGs in agreement with results of ML analysis of the concatenated sequence data (Fig. 2).
In STRU CTU RE analysis of the MLST data, we used the smallest value of K (K = 2) to measure population differentiation within the E. bieneusi population. Two subpopulations (SPs) with alpha (α) value of 0.104 were produced (1, 2; Fig. 3a). SP1 contained 14 MLGs, while SP2 contained the remaining 19 MLGs (Fig. 3a). When the K value was increased to 3, three clear SPs were formed (α = 0.074; Fig. 3b). SP1 included 8 MLGs belonging to SP1 at K = 2, SP2 included 3 MLGs belonging to SP1 and 7 MLGs belonging to SP2 at K = 2, while SP3 included 3 MLGs belonging to SP1 and 12 MLGs belonging to SP2 at K = 2 (Fig. 3a, b). When using K = 4, four clear SPs formed, with individuals in SP1 being similar to SP1 at K = 2 and 3 (Fig. 3). In addition, the α value of 0.061 was similar to the values generated at K = 2 and 3, suggesting the substructures at K = 4 could be realistic.

Discussion
Currently, multilocus sequence typing has been used to analyze the genetic diversity of E. bieneusi around the world [15,42,43]. In Peru, India and Nigeria, 66 MLGs were identified in HIV-infected individuals [33,37]. In China, 59 and 44 MLGs were identified in NHPs and pigs, respectively, while some genetic diversity based on MLST data was also observed in E. bieneusi isolates from domestic, wild, and zoo animals [29,36,[42][43][44][45][46][47][48]. However, few studies analyzed the MLGs of this pathogen in NHPs in China. In the present study, 33 MLGs were identified from 53 E. bieneusi-positive samples from macaque monkeys, indicating a high genetic diversity within some common ITS genotypes (Type IV, Macaques3 and Peru8), and supporting high-resolution of the multilocus sequencing tool established by Feng et al. [32]. A strong and significant intragenic LD was observed in the analysis of the MLST data, indicating an overall clonal population structure of E. bieneusi in this study. In previous studies, the strong and significant intragenic   [36,37,47,49,50]. Therefore, there was a clonal endemicity of E. bieneusi in the study population. Nevertheless, the MLGs found in crab-eating macaques were not segregated by ITS genotypes, indicating the existence of genetic recombination in the E. bieneusi population studied. In previous investigations, E. bieneusi MLGs were segregated by ITS genotypes for some genotypes with narrower host ranges, such as A, EbpA, SCO2 and Cs-4, but not for zoonotic genotypes such as Type IV, D, WL11 and Peru11 [15,16]. Similarly, in this study, MLGs of E. bieneusi in crab-eating macaques were also not segregated by ITS genotypes, especially for the zoonotic genotypes Type IV and Peru8. Genetic recombination appears to be common in some microsporidia of insects, including Kneallhazia solenopsae in fire ants and Andreanna caspii in mosquitoes. In a previous study in China, the MLGs of some common ITS genotypes (Type IV, Peru8, D, Macaque3, CM2, Peru11, Henan V, PigE-BITS7) in NHPs were not segregated by ITS genotypes in phylogenetic analysis, suggesting a possible occurrence of sexual recombination in E. bieneusi [36]. Therefore, data from both studies have indicated that there could be some sexual recombination among common E. bieneusi genotypes in NHPs.
Four SPs were obtained in this study among MLGs isolates within Group 1, according to results of phylogenetic and subpopulation genetics analyses. The four SPs all appear to have an epidemic population structure. In previous studies, the ITS genotype Macaque3 was mostly found in NHPs, and only occasionally in dogs [8-10, 26, 51]. Therefore, the clustering of the host-adapted ITS genotype Macaque3 in SP3 and SP4 suggests the NHP-specific nature of these two subpopulations. Both potentially zoonotic and host-adapted subpopulations have been identified in Group 1 in previous studies. The former included MLGs of ITS genotypes Type IV, D, Peru8, WL11, and Peru11 in humans, NHPs, pigs and  [1,15,21,36,52,53]. Therefore, SP3 and SP4 including MLGs of Macaque3 were potentially NHPs-adapted subpopulations, while SP1 and SP2 only including MLGs of ITS genotype Type IV and Peru8 were potentially zoonotic subpopulations.

Conclusions
A much higher genetic diversity was revealed in E. bieneusi isolates from crab-eating macaques by multilocus sequence typing (33 MLGs) than by ITS genotyping (three ITS genotypes). This appears to have resulted from genetic recombination among common ITS genotypes despite an overall clonal population structure in E. bieneusi. Nevertheless, the MLGs generated from the analysis have formed four subpopulations with different host ranges and an epidemic population structure. These results provide data on the population and subpopulation structure of E. bieneusi in NHPs, and would improve our perception of E. bieneusi epidemiology in these animals.