Population structure and geographical segregation of Cryptosporidium parvum IId subtypes in cattle in China

Background Cryptosporidium parvum is a zoonotic pathogen worldwide. Extensive genetic diversity and complex population structures exist in C. parvum in different geographical regions and hosts. Unlike the IIa subtype family, which is responsible for most zoonotic C. parvum infections in industrialized countries, IId is identified as the dominant subtype family in farm animals, rodents and humans in China. Thus far, the population genetic characteristics of IId subtypes in calves in China are not clear. Methods In the present study, 46 C. parvum isolates from dairy and beef cattle in six provinces and regions in China were characterized using sequence analysis of eight genetic loci, including msc6-7, rpgr, msc6-5, dz-hrgp, chom3t, hsp70, mucin1 and gp60. They belonged to three IId subtypes in the gp60 gene, including IIdA20G1 (n = 17), IIdA19G1 (n = 24) and IIdA15G1 (n = 5). The data generated were analyzed for population genetic structures of C. parvum using DnaSP and LIAN and subpopulation structures using STRUCTURE, RAxML, Arlequin, GENALEX and Network. Results Seventeen multilocus genotypes were identified. The results of linkage disequilibrium analysis indicated the presence of an epidemic genetic structure in the C. parvum IId population. When isolates of various geographical areas were treated as individual subpopulations, maximum likelihood inference of phylogeny, pairwise genetic distance analysis, substructure analysis, principal components analysis and network analysis all provided evidence for geographical segregation of subpopulations in Heilongjiang, Hebei and Xinjiang. In contrast, isolates from Guangdong, Shanghai and Jiangsu were genetically similar to each other. Conclusions Data from the multilocus analysis have revealed a much higher genetic diversity of C. parvum than gp60 sequence analysis. Despite an epidemic population structure, there is an apparent geographical segregation in C. parvum subpopulations within China.


Background
Cryptosporidium spp. are apicomplexan pathogens that can cause debilitating gastrointestinal illness in animals and humans, with diarrhea as the main clinical symptom [1]. There is extensive genetic variation within the genus Cryptosporidium. Among the nearly 40 Cryptosporidium species identified, C. parvum is the most important species causing zoonotic cryptosporidiosis [2]. It has a wide host range, with over 20 subtype families based on sequence analysis of the 60 kDa glycoprotein (gp60) locus [3]. Among the most common subtype families, IIa and IId are zoonotic, while IIc and IIe are anthroponotic [2,4].
Cattle are among the most common hosts of C. parvum, with pre-weaned calves being considered the most important reservoir for zoonotic C. parvum infection [5]. Differences in virulence and transmission dynamics of C. parvum have been observed among geographical regions [6]. Subtyping of C. parvum in bovine studies identified an exclusive occurrence of IId subtypes in calves in China, mostly IIdA15G1 and IIdA19G1 [7]. Moreover, these IId subtypes have caused outbreaks of cryptosporidiosis in calves in several areas in China, leading to the occurrence of significant mortality [8,9]. In contrast, pre-weaned calves in industrialized countries are mostly infected with C. parvum IIa subtypes, especially IIaA15G2R1 [6,7].
Population genetic studies based on highly polymorphic loci can shed light on the true genetic diversity of C. parvum in disease endemic areas and compensate for the relatively low resolution of the single gp60 locus because of the likely occurrence of genetic recombination among loci and the existence of genetic determinants of other phenotypic traits [3,10]. Multilocus typing tools based on genetic loci with simple tandem repeats have been used in studies of the population genetic characteristics of C. parvum, leading to the discovery of high genetic diversity, significant geographical segregation and complex population structure [11,12]. Thus far, a range of genetic structures of C. parvum have been identified, including panmictic (unrestricted gene flow and linkage equilibrium among loci), clonal (largely restricted gene flow and linkage disequilibrium among loci), and epidemic (underlying panmictic structure masked by an abundance of genetically identical clones) [2].
Most previous studies of the population genetics of C. parvum have focused on the IIa subtype family. A mostly panmictic population structure for the C. parvum IIa subtype family has been found in humans and calves in many industrialized nations [12-21]. This could be related to the transmission intensity and reproductive characteristics of the IIa subtype family. Indeed, IIa subtypes, especially the hyper-transmissible IIaA15G2R1, are the dominant ones in cattle and humans in these countries [2]. In addition, one study of IIaA15G2R1 has also shown an epidemic population structure and common occurrence of genetic recombination within the subtype [16]. Several analyses of the IId subtype family have demonstrated potential differences in population structure between the IId and IIa subtype families. For example, IIa subtypes in cattle in Spain have a panmictic structure while IId subtypes in sheep have a clonal structure [20,22]. This was supported by a population genetic study of the C. parvum IId subtype family in China, Egypt and Sweden, which found a mostly clonal population structure.
The aim of this study was to explore the population genetic characteristics of IId subtypes of C. parvum in cattle in China using multilocus sequence typing (MLST) of isolates.

Sample sources
Forty-six isolates of C. parvum IId subtypes, including IIdA20G1 (n = 17), IIdA19G1 (n = 24) and IIdA15G1 (n = 5), from beef and dairy cattle in Xinjiang, Heilongjiang, Hebei, Shanghai, Jiangsu and Guangdong, China, were selected for the population genetic analysis. They were from previous and ongoing studies of the molecular epidemiology of cryptosporidiosis in cattle [8,23,24]. The geographical distribution of isolates and their gp60 subtype designations are shown in Table 1 and Fig. 1. The six provinces and autonomous regions are representative ones in China, including the south (Guangdong), east (Shanghai and Jiangsu), center (Hebei), northeast (Heilongjiang) and northwest (Xinjiang). These areas have some of the largest dairy farms in China. The three C. parvum subtypes examined in the study are the most common subtypes in China, responsible for over 90% of C. parvum infections in cattle. They were identified by DNA sequence analysis of the gp60 gene [23].

PCR and sequence analyses
Eight polymorphic loci including gp60 with simple tandem repeats were used in the characterization of C. parvum isolates in the present study. In addition to gp60, they included msc6-7 (serine repeat antigen), rpgr (retinitis pigmentosa GTPase regulator), msc6-5 (hypothetical trans-membrane protein), dz-hrgp (hydroxyproline-rich glycoprotein), chom3t (T-rich gene fragment), hsp70 (70 kDa heat shock protein) and mucin1 (mucin-like protein). Nested PCR was used in the analysis of these genetic loci as previously described [25]. Each isolate was analyzed twice by PCR at each genetic locus. Reagent-grade water was used as a negative control, whereas DNA of C. parvum IOWA isolate (IIaA15G2R1 subtype) was used as a positive control. Positive PCR products were sequenced on an ABI 3730 Genetic Analyzer (Applied Biosystems, CA, USA). The sequences generated were assembled using ChromasPro v.2.1.8 (http://technelysium.com.au/ChromasPro.html) and aligned with reference sequences from each locus using the program Clustal X v.2.1 (http://www.clustal.org/).

Population genetic analyses
The sequences from the eight loci were tandemly concatenated for each isolate. The multilocus genotypes (MLGs) with the same sequences were analyzed for gene diversity (Hd), linkage disequilibrium (LD) and recombination events (Rms) using the software DnaSP version 6.12.03 (http://www.ub.edu/dnasp/) with consideration of both sequence length polymorphism and nucleotide substitutions [26]. The genetic structure of C. parvum IId subtypes was assessed by measuring the standardized index of association (I^S_A) and the relationship between the observed variance of pairwise distances (V_D) and its 95% critical value (L) using the online software LInkage ANalysis (LIAN), v.3.7 (http://guanine.evolbio.mpg.de/cgi-bin/lian/lian.cgi.pl/query) [27].
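As an illustration of the gene diversity measure, the following minimal sketch groups isolates into MLGs and computes Hd with Nei's unbiased estimator, Hd = n/(n − 1) × (1 − Σp_i²). It is not the DnaSP implementation; the function names and the allele labels are hypothetical and chosen only for the example.

```python
from collections import Counter


def multilocus_genotype(alleles_by_locus):
    """Join per-locus allele labels (fixed locus order) into one MLG key."""
    return "|".join(alleles_by_locus)


def gene_diversity(mlgs):
    """Nei's unbiased gene diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(mlgs)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(mlgs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical allele labels at three loci for five isolates (not study data).
isolates = [
    ("A1", "B1", "C1"),
    ("A1", "B1", "C1"),
    ("A1", "B2", "C1"),
    ("A2", "B2", "C1"),
    ("A2", "B2", "C2"),
]
mlgs = [multilocus_genotype(i) for i in isolates]
hd = gene_diversity(mlgs)  # four MLGs among five isolates
print(f"MLGs: {len(set(mlgs))}, Hd = {hd:.2f}")
```

Under this estimator, a sample in which every isolate carries the same MLG gives Hd = 0, and a sample in which every isolate carries a different MLG gives Hd = 1.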

Substructure analyses
Maximum likelihood analysis implemented in the software RAxML v.8.0.0 (http://epa.h-its.org/raxml/submit_single_gene) was used in clustering nucleotide sequences of all isolates using the General Time Reversible (GTR) model [28]. Subpopulations within the 46 isolates were assessed using the software STRUCTURE [29]. Several analyses of allelic data were performed by using K (likely populations) ranging from 2 to 10 and 50,000 iterations after a 'burn-in' of 50,000 iterations. Output at K = 3-5 provided the best fit to MLST data and was used in further analyses. Pairwise genetic distance (F_ST) was calculated using Arlequin v.3.5 (http://cmpg.unibe.ch/software/arlequin3/) in the evaluation of the genetic differentiation between MLGs of C. parvum. Principal coordinates analysis (PCoA) via covariance matrix with data standardization was performed on the generated matrices with the software GENALEX v.6.501 (http://biology-assets.anu.edu.au/GenAlEx) [30]. A median-joining phylogeny was generated using Network software v.5.0 (www.fluxus-engineering.com/sharenet.htm) to estimate the genetic segregation and evolutionary trend of C. parvum [31].
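To make the idea of pairwise genetic differentiation concrete, the sketch below computes F_ST between two subpopulations with the simple estimator of Hudson, Slatkin and Maddison (F_ST = 1 − H_w/H_b, where H_w and H_b are the mean numbers of differences within and between subpopulations). This is a swapped-in estimator for illustration and is not the procedure used by Arlequin; the allelic profiles are hypothetical.

```python
from itertools import combinations, product


def n_diff(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise difference among isolates of one subpopulation."""
    pairs = list(combinations(pop, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise difference between isolates of two subpopulations."""
    pairs = list(product(pop1, pop2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """F_ST = 1 - H_w / H_b; each subpopulation needs >= 2 isolates."""
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    h_b = mean_between(pop1, pop2)
    if h_b == 0:
        return 0.0  # no differences at all between the two samples
    return 1.0 - h_w / h_b


# Hypothetical allelic profiles at three loci (not study data).
fixed_a = [(1, 1, 1), (1, 1, 1), (1, 1, 1)]
fixed_b = [(1, 1, 2), (1, 1, 2)]
mixed_x = [(1, 1, 1), (1, 1, 2)]
mixed_y = [(2, 2, 1), (2, 2, 2)]

fst_fixed = hudson_fst(fixed_a, fixed_b)  # each sample fixed for its own MLG
fst_mixed = hudson_fst(mixed_x, mixed_y)  # some diversity within each sample
print(f"fixed: {fst_fixed:.2f}, mixed: {fst_mixed:.2f}")
```

Two samples that are each fixed for a different MLG give the maximum value of 1, while diversity within the samples lowers the estimate. With very small samples this estimator can return negative values, which are usually interpreted as no detectable differentiation.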

MLST subtypes and sequence polymorphism
Forty-one of the 46 isolates were successfully amplified at all eight loci. Among the loci, dz-hrgp, rpgr and mucin1 had relatively higher sequence polymorphism, with 5, 4 and 4 subtypes being identified, respectively. In contrast, the 44 isolates generated the same sequence at the normally polymorphic chom3t locus (Additional file 1: Table S1). Altogether, 17 MLGs were obtained from these isolates of C. parvum. Among them, the IIdA19G1 isolates from Guangdong, Jiangsu and Shanghai consisted of 12 MLGs. In addition, the IIdA20G1 isolates from Hebei and Heilongjiang had two geographically segregated MLGs. The IIdA15G1 isolates from Xinjiang had 3 different MLGs (Additional file 1: Table S1). Sequence data of all eight loci were concatenated to make a multilocus contig of 4740 bp in length. There was a high genetic diversity (Hd = 0.89) within the C. parvum IId population in China (Table 2). Among the IIdA19G1 isolates, the genetic diversity of isolates from Shanghai (Hd = 0.94) was greater than that of isolates from Guangdong (Hd = 0.78) or Jiangsu (Hd = 0.67) (Table 2). This could be attributed to the difference in the number of farms examined in different regions. In contrast, IIdA20G1 isolates had relatively low genetic diversity (Hd = 0.48). Among them, isolates from Hebei and Heilongjiang showed high genetic homogeneity (Hd = 0.00) within each population. In contrast, IIdA15G1 isolates from Xinjiang were highly heterogeneous (Hd = 1.00) (Table 2).

Population structure of IId subtypes of C. parvum
In the analysis of the genetic structure of IId subtypes with V_D and L measurements, an epidemic genetic structure was obtained in the overall population (I^S_A = −0.0421, P_MC = 0.889 and V_D = 1.1307 < L = 2.3307) (Table 3). In further analyses, most of the subpopulations by region or gp60 subtype also had the epidemic genetic structure, except for the subpopulations of Heilongjiang, Hebei and Xinjiang, for which the structure could not be determined due to the small sample size (Table 3).
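The quantities reported above can be sketched as follows, assuming the definitions of Haubold and Hudson: V_D is the variance of the number of loci at which pairs of isolates differ, V_e = Σ h_j(1 − h_j) is its expectation under linkage equilibrium, I^S_A = (V_D/V_e − 1)/(l − 1) for l loci, and L is the upper 95% limit of V_D obtained by Monte Carlo shuffling of alleles within loci. This is an illustrative re-implementation, not the LIAN code; the use of the population variance, the resampling settings and the allelic profiles are assumptions made for the example.

```python
import random
from itertools import combinations


def locus_diversity(column):
    """h_j = n/(n-1) * (1 - sum p^2) for the alleles at one locus."""
    n = len(column)
    freqs = [column.count(a) / n for a in set(column)]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def pairwise_distance_variance(profiles):
    """V_D: variance of the number of mismatched loci over isolate pairs."""
    d = [sum(x != y for x, y in zip(a, b))
         for a, b in combinations(profiles, 2)]
    mean = sum(d) / len(d)
    return sum((v - mean) ** 2 for v in d) / len(d)


def standardized_ia(profiles):
    """Return (I^S_A, V_D) for a list of equal-length allelic profiles."""
    n_loci = len(profiles[0])
    v_d = pairwise_distance_variance(profiles)
    h = [locus_diversity([p[j] for p in profiles]) for j in range(n_loci)]
    v_e = sum(x * (1.0 - x) for x in h)
    if v_e == 0:
        raise ValueError("expected variance is zero; index undefined")
    return (v_d / v_e - 1.0) / (n_loci - 1), v_d


def monte_carlo_limit(profiles, n_resamples=1000, seed=0):
    """L: 95th percentile of V_D after shuffling alleles within each locus."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    columns = [[p[j] for p in profiles] for j in range(n_loci)]
    sims = []
    for _ in range(n_resamples):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        sims.append(pairwise_distance_variance(list(zip(*shuffled))))
    sims.sort()
    return sims[int(0.95 * n_resamples)]


# Hypothetical, completely linked profiles at three loci (not study data).
linked = [(1, 1, 1)] * 3 + [(2, 2, 2)] * 3
ia, v_d = standardized_ia(linked)
limit = monte_carlo_limit(linked)
print(f"I^S_A = {ia:.3f}, V_D = {v_d:.3f}, L = {limit:.3f}")
```

In this scheme, V_D > L rejects the null hypothesis of linkage equilibrium, whereas V_D < L is consistent with it. An epidemic structure is typically inferred when linkage disequilibrium seen with all isolates disappears once isolates with the same MLG are counted as one individual.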

Subpopulations of IId subtypes of C. parvum
Maximum likelihood analysis of the sequences grouped the 41 isolates into several evolutionary clusters (Fig. 2). Among them, IIdA20G1 isolates from Heilongjiang formed one cluster separated from other isolates including IIdA20G1 isolates from Hebei. Another cluster was formed by IIdA15G1 isolates from Xinjiang. In contrast, there was no significant geographical clustering among IIdA19G1 isolates from Jiangsu, Shanghai and Guangdong (Fig. 2).
A similar result was obtained in STRUCTURE analysis of allelic data. At all K-values used in the analysis, the IIdA20G1 isolates from Heilongjiang were clearly separated from isolates of other regions, including those from Hebei that had the same gp60 subtype. The best separation of subpopulations by gp60 subtype was seen at a K-value of 3; all three C. parvum subtypes formed their own clusters (Fig. 3). In addition, regardless of the K-values (3-5) used in the analyses, IIdA19G1 isolates from Guangdong, Shanghai and Jiangsu clustered together (Fig. 3). This was supported by the results of PCoA and median-joining network analyses, in which isolates from Heilongjiang, Hebei and Xinjiang formed their own clusters.
The results of F_ST analysis supported the occurrence of geographically associated subpopulations of C. parvum IId subtypes. By gp60 subtype, isolates of IIdA15G1, IIdA19G1 and IIdA20G1 were genetically segregated from each other with high statistical significance (Table 4). Within the IIdA20G1 subtype, there was a significant differentiation between isolates from Hebei and Heilongjiang (χ² = 15.0, df = 1, P < 0.0001). In contrast, the differentiation among IIdA19G1 isolates from Guangdong, Shanghai and Jiangsu was low. Compared with IIdA20G1 isolates from Heilongjiang, there was also reduced differentiation between IIdA20G1 isolates from Hebei and IIdA19G1 isolates from Jiangsu and Shanghai (Table 5).

Discussion
The population genetic analysis of eight polymorphic loci has unravelled a high genetic diversity among isolates of C. parvum IId subtypes from different geographical areas in China. Although they were identical at the gp60 locus, the IIdA19G1 isolates differed at most other genetic loci including dz-hrgp, msc6-5, msc6-7, mucin1 and rpgr. Similarly, IIdA20G1 isolates from Hebei and Heilongjiang differed from each other at the hsp70 locus, while IIdA15G1 isolates from Xinjiang differed from each other at the dz-hrgp, hsp70, msc6-5 and msc6-7 loci.
Results of the LD analysis indicate the presence of an epidemic genetic structure of C. parvum IId subtypes in the present study. This could be attributed to the high prevalence of C. parvum in calves as the result of concentrated animal feeding operations and the limited number of IId subtypes in China [2]. Indeed, IIdA19G1 and IIdA15G1 are dominant subtypes in cattle in China [7,32]. Previously, isolates of C. parvum IId subtypes from China, Egypt and Sweden were shown to have a clonal population structure with limited genetic recombination [25]. The discrepancy in the inference of population genetic structure between these two studies was largely due to whether the analysis took the over-representation of the same MLG in the study population into consideration. If this had been taken into consideration, the previously reported clonal population of C. parvum IId subtypes could in fact be an epidemic population.
Significant geographical segregation was observed in the IIdA15G1 isolates from Xinjiang and the IIdA20G1 isolates from Heilongjiang based on phylogenetic, substructure, PCoA and F_ST analyses. Previous reports indicated that most IIa isolates of C. parvum form country-specific populations. For example, an eBURST-based analysis revealed geographical differences among isolates from Uganda, Israel, Serbia, Turkey and New Zealand [33]. Similarly, a significant geographical segregation was also identified among 692 C. parvum isolates from Italy, Ireland and Scotland [13]. The same situation was also observed in IId isolates of C. parvum between China and Sweden in a previous MLST study [25]. Other studies, however, have failed to identify geographical segregation in C. parvum populations, but they were conducted over smaller geographical areas within a country [17,18]. In the present study, diverse isolates were obtained from the southern, northeastern and northwestern regions of China, leading to the identification of unique subpopulations of C. parvum in the more geographically isolated Xinjiang and Harbin. In contrast, isolates from Shanghai, Jiangsu and Guangdong had frequent genetic exchanges with no significant geographical barriers among them. These regions were chosen with the consideration of both geographical representation and the intensity of cattle production. The three C. parvum subtypes examined in the study are the most common ones in China, responsible for over 90% of C. parvum infections in cattle. Additional population genetic analysis of more C. parvum isolates from other areas and other subtypes is needed to support the observations in this study.
Table 3 Results of linkage disequilibrium analysis of allelic profile data from Cryptosporidium parvum at eight genetic loci. a Considering isolates with the same MLG as one individual
The presence of multiple MLGs on almost all farms in Guangdong, Shanghai and Jiangsu suggests the presence of a significant intra-farm genetic diversity in C. parvum. This was not revealed by gp60-based subtyping, as all isolates belonged to IIdA19G1. Nevertheless, this is in agreement with previous population genetic studies of C. parvum in European countries [12,15,19,22,25,34]. This intra-farm genetic diversity of C. parvum in cattle may be attributed to frequent animal trade among farms, which is known to increase the heterogeneity of C. parvum and the complexity of infections [14,19,21].
[Figure caption, partially recovered: panels a (2D) and b (3D). Each solid sphere represents an MLG. The color of the spheres indicates the geographical origin of the isolates, while the size of the spheres represents the number of isolates.]

Conclusions
Despite the presence of only a limited number of gp60 subtypes of C. parvum in cattle in China, a much higher genetic diversity is evident in MLST characterization of isolates at both farm and region levels. Nevertheless, biological selection has led to the dominance of limited numbers of geographically segregated MLGs of C. parvum in calves in China, with an apparent epidemic population structure. Currently, the veterinary and public health significance of this biological selection of C. parvum subpopulations is not entirely clear. Efforts should be made to monitor the genetic evolution of this unique zoonotic pathogen in China.