As with diagnostic test development, validation of any microbial typing system is essential. Guidelines have been set out for bacterial typing systems [24, 25], and many of them can be applied to parasites, particularly haploid protozoa such as Cryptosporidium. In the current study, the performance of MLFT was assessed in bovine-derived C. parvum, and the scheme was found to provide good typeability, specificity, precision and discrimination.
Markers were selected because they had shown promising results in other studies of bovine C. parvum [12, 22, 26] and were ranked highly in a recent review of published multilocus genotyping methods [15]. An ideal typing scheme would have markers distributed evenly across several chromosomes [27], which is not the case in the current scheme: of the four most discriminatory markers, three are on Chromosome 8. MM18 and MM19 are some distance from each other, whereas TP14 and MM18 lie closer together on Chromosome 8. Selecting markers on different chromosomes would remove any confounding effect of physical linkage and would allow the resulting data to be analysed robustly at the population genetic level. As more genomic data become available for C. parvum, it should be possible to select more appropriate markers for MLFT schemes.
Typeability using nested PCR was considered acceptable at 84 %. Samples that amplified inconsistently with 18S rRNA primers (for example, amplifying in only one or two of three replicates) were also difficult to amplify with the MLFT primers (data not shown), and 7 samples failed to amplify with any of the MLFT primers. This suggests that these samples contained low quantities of C. parvum DNA, or DNA of poor quality, because the 18S rRNA protocol has been shown to be very sensitive, perhaps owing to the multi-copy nature of this gene. DNA was prepared from stool using standard methods; however, template quality may have adversely affected typeability. The DNA samples from the Cheshire study were prepared in 2004, 10 years before use in the current study, which may have allowed the DNA to degrade. In addition, samples may have contained low numbers of oocysts, because calves in neither study were sampled on the basis of clinical signs and many were not in the acute stage of infection. MLFT has been shown to compare favourably with MLST in terms of typeability [13], possibly because tandem repeat units interfere with sequencing (the “stutter effect”), which reduces the typeability of sequence-based methods.
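Typeability is usually expressed as the proportion of tested samples that yield a type. The minimal Python sketch below assumes that each sample's result is stored as a dictionary of marker calls, with None for a failed amplification. The profiles and sizes are hypothetical. Whether a "typed" sample needs a call at every marker or only at a given marker is a definitional choice the text does not specify, so both versions are shown.

```python
from typing import Dict, List, Optional

# One dictionary per sample: marker name -> fragment size (bp), or None when
# the marker failed to amplify. All sizes and samples are hypothetical.
Profile = Dict[str, Optional[int]]


def marker_typeability(profiles: List[Profile], marker: str) -> float:
    """Fraction of samples that gave a size call at a single marker."""
    called = sum(1 for p in profiles if p.get(marker) is not None)
    return called / len(profiles)


def full_profile_typeability(profiles: List[Profile], markers: List[str]) -> float:
    """Fraction of samples with a call at every marker (strict definition)."""
    complete = sum(1 for p in profiles
                   if all(p.get(m) is not None for m in markers))
    return complete / len(profiles)


MARKERS = ["MM5", "MM18", "MM19", "TP14"]
PROFILES: List[Profile] = [
    {"MM5": 235, "MM18": 290, "MM19": 200, "TP14": 300},
    {"MM5": 262, "MM18": None, "MM19": 200, "TP14": 300},
    {"MM5": None, "MM18": None, "MM19": None, "TP14": None},  # e.g. degraded template
    {"MM5": 235, "MM18": 296, "MM19": 210, "TP14": 309},
]

for marker in MARKERS:
    print(marker, marker_typeability(PROFILES, marker))
print("complete profiles:", full_profile_typeability(PROFILES, MARKERS))
```

On these hypothetical profiles the per-marker values are 0.75, 0.5, 0.75 and 0.75, while the complete-profile typeability is 0.5, which shows how the stricter definition yields a lower figure.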
Although C. parvum is the most prevalent species in young calves, C. bovis and C. ryanae are also occasionally identified, whereas C. andersoni is usually found in older animals [7, 28]. We wanted to verify that the primers would not co-amplify any non-parvum species present in undetected mixed-species infections, as such products could be misinterpreted as a new C. parvum allele. The current study tested a limited number of non-parvum species prevalent in cattle and found that the primers did not amplify them. However, a greater range of species may be encountered when environmental, human or wildlife samples are typed. Primer-BLAST analysis was therefore carried out, and it established that C. hominis could be co-amplified with these primer pairs, as shown in previous studies using these markers [26]. This may be of value to public health laboratories, as the same scheme could potentially be used for both of the major causes of cryptosporidiosis in humans. However, a comprehensive review of MLFT markers in both species concluded that different sets of markers are probably required for each species [15]. It is always advisable to assign samples to species level before further typing.
Precision, in terms of the repeatability of sizes obtained within our laboratory, was good, with the possible exception of MM5 allele 2. Between laboratories, fragment sizes differed to some degree, reducing the reproducibility of sizing. The consistent difference of 11 bp in MS9 sizing remains unexplained, and unfortunately sequence data are not available for the historical dataset. We consider it unlikely that this is a true reproducibility issue, given that it is limited to this marker; in any case, MS9 was excluded from further analysis because, like MS1, it was monoallelic in our samples. Comparing MLFT results between laboratories is therefore more challenging than comparing, for example, sequence data; crucially, however, our results show that the tool is reproducible with respect to allele assignment. Larger-scale inter-laboratory validation is now warranted. In future, marker-specific size standards that include the sizes of all known alleles may improve reproducibility.
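Because reproducible allele assignment matters more than absolute size, a laboratory can call alleles by matching each measured size to the nearest predefined bin after removing any systematic offset between instruments. The Python sketch below assumes hypothetical bin centres (borrowed from the MM5 sizes of 235 bp and 262 bp quoted in the text), a hypothetical ±2 bp tolerance and a constant per-laboratory offset such as the 11 bp difference observed for MS9. None of these parameters is taken from the study protocol.

```python
from typing import Dict, Optional

# Hypothetical bin centres (bp) for one marker, borrowed from the MM5 sizes
# quoted in the text purely for illustration.
BINS: Dict[str, float] = {"allele_1": 235.0, "allele_2": 262.0}


def assign_allele(size_bp: float, bins: Dict[str, float],
                  lab_offset: float = 0.0,
                  tolerance: float = 2.0) -> Optional[str]:
    """Map a measured size to the nearest bin after subtracting a systematic
    per-laboratory offset; return None if no bin lies within tolerance."""
    corrected = size_bp - lab_offset
    name, centre = min(bins.items(), key=lambda item: abs(item[1] - corrected))
    return name if abs(centre - corrected) <= tolerance else None


print(assign_allele(234.6, BINS))                   # laboratory A
print(assign_allele(246.1, BINS))                   # laboratory B, uncorrected
print(assign_allele(246.1, BINS, lab_offset=11.0))  # laboratory B, corrected
```

Without the offset correction, a size of 246.1 bp from the second laboratory falls outside both bins and is left uncalled; subtracting the 11 bp offset recovers allele_1. Leaving out-of-tolerance sizes uncalled, rather than forcing them into the nearest bin, flags putative new alleles for sequencing.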
The current protocol may not measure fragment size accurately, as shown when sequence-derived and fragment sizes were compared. Again, measuring size accurately may matter less than assigning the correct alleles. Sequence and fragment sizes obtained by CE have been shown to differ in other studies [13]. One factor that may affect accuracy is the size standard used. ROX400HD has 21 size markers compared with 16 for ROX500, so it sizes fragments up to 400 bp more accurately. In addition, Applied Biosystems’ literature states that the 250 bp marker of ROX500 cannot be used to size samples because it is sensitive to small temperature variations during CE. Fragments in this range may therefore be sized less accurately, which may particularly affect MM5, with alleles of 235 bp and 262 bp. However, MS9 has fragments >400 bp, so ROX400HD could not be used for this marker; for consistency, ROX500 was used throughout.
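The effect of dropping the 250 bp ROX500 peak can be illustrated with a simplified sizing routine. Applied Biosystems software offers methods such as Local Southern; the Python sketch below instead uses plain linear interpolation between the two retained ladder peaks that bracket a sample peak, which is simpler than Local Southern. The scan positions in the example ladder are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical ladder: (scan position, known size in bp) for part of a
# ROX500-style standard. Scan values are invented for illustration.
LADDER: List[Tuple[float, float]] = [
    (2600.0, 160.0), (3000.0, 200.0), (3480.0, 250.0),
    (4000.0, 300.0), (4400.0, 340.0),
]


def size_fragment(scan: float, ladder: List[Tuple[float, float]],
                  exclude_bp: Tuple[float, ...] = (250.0,)) -> float:
    """Size a peak by linear interpolation between the two retained ladder
    peaks that bracket it (simpler than Local Southern sizing)."""
    points = sorted((s, bp) for s, bp in ladder if bp not in exclude_bp)
    scans = [s for s, _ in points]
    if len(points) < 2 or not scans[0] <= scan <= scans[-1]:
        raise ValueError("peak lies outside the sizing range of the ladder")
    # Index of the first ladder peak to the right of the sample peak.
    i = min(bisect_right(scans, scan), len(points) - 1)
    (s0, bp0), (s1, bp1) = points[i - 1], points[i]
    return bp0 + (scan - s0) * (bp1 - bp0) / (s1 - s0)


print(size_fragment(3350.0, LADDER))                 # 250 bp peak excluded
print(size_fragment(3350.0, LADDER, exclude_bp=()))  # 250 bp peak used
```

With the 250 bp peak excluded, a peak at scan 3350 is sized by interpolating between the 200 bp and 300 bp markers (235.0 bp); if the hypothetical 250 bp peak is included, the same peak is sized at about 236.5 bp. The wide gap left by the excluded marker is why fragments between 200 bp and 300 bp, such as the MM5 alleles, may be sized less precisely.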
The level of discrimination required of a particular typing tool depends on the epidemiological question being addressed, and the population genetics of the microbe should also be considered. Here we were seeking a tool to answer geographically and temporally local epidemiological questions in a relatively conserved parasite. In Cryptosporidium, the ideal MLFT tool should be able to discriminate between geographically local isolates [27]; for this reason we used isolates from two cross-sectional studies that sampled farms within relatively small spatio-temporal windows. The results show that the typing scheme fulfilled this criterion, as 14/23 MLGs were unique to a single sampled farm, although most of the MLGs detected belonged to the same clonal complex (data not shown). In addition, the finding that most calves and farms had a single MLG suggests that the scheme is not overly discriminatory for regional (such as catchment-level) studies. One application of this tool is to study transmission dynamics between and within farms by investigating whether farms have “unique” or “common” MLGs and single or multiple MLGs, and how stable these are over time (manuscript in preparation).
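The farm-level summaries described here (whether each MLG is unique to one farm or shared between farms, and how many MLGs each farm carries) reduce to simple set operations. The minimal Python sketch below assumes that results are held as (farm, calf, MLG) records; all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (farm, calf, MLG); identifiers are hypothetical


def farms_per_mlg(records: List[Record]) -> Dict[str, Set[str]]:
    """Set of farms on which each MLG was detected."""
    farms: Dict[str, Set[str]] = defaultdict(set)
    for farm, _calf, mlg in records:
        farms[mlg].add(farm)
    return farms


def unique_mlgs(records: List[Record]) -> List[str]:
    """MLGs detected on exactly one farm ("unique" rather than "common")."""
    return sorted(m for m, farms in farms_per_mlg(records).items() if len(farms) == 1)


def mlgs_per_farm(records: List[Record]) -> Dict[str, int]:
    """Number of distinct MLGs detected on each farm."""
    per_farm: Dict[str, Set[str]] = defaultdict(set)
    for farm, _calf, mlg in records:
        per_farm[farm].add(mlg)
    return {farm: len(mlgs) for farm, mlgs in per_farm.items()}


RECORDS: List[Record] = [
    ("farm_A", "calf_1", "MLG1"),
    ("farm_A", "calf_2", "MLG1"),
    ("farm_B", "calf_3", "MLG1"),
    ("farm_B", "calf_4", "MLG2"),
    ("farm_C", "calf_5", "MLG3"),
]
print(unique_mlgs(RECORDS))
print(mlgs_per_farm(RECORDS))
```

Here MLG1 is shared between farm_A and farm_B, so only MLG2 and MLG3 count as farm-unique, and farm_B is the only farm with multiple MLGs.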
The MLFT scheme showed better discriminatory power than standard subtyping by gp60 sequencing alone, as measured by Simpson's index of diversity (SID). Sequencing of gp60 alone discriminated poorly because the majority (85/136) of samples belonged to the common gp60 subtype IIaA15G2R1. As samples were not independent but were clustered by farm, the SID values may not be applicable to the general cattle population; indeed, this non-independence may reduce the apparent SID. Some markers were more informative than others: MS1 and MS9 were monoallelic in the samples tested. However, some diversity has been reported at these markers in other studies of cattle [11, 12], although alleles other than those found in the current study appear to be very rare in Scottish and Irish calves [12, 22]. Widmer and Sullivan [27] recommended using the minimum number of markers that gives the required resolution; a recent review estimated an average marker redundancy of 23 % in C. parvum [15]. Redundancy was also evident in the current study: SID using only the three or four most informative markers was estimated at 85 % (81–90 %) and 89 % (85–92 %), respectively.
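SID in typing studies is commonly calculated with the Hunter and Gaston formula, SID = 1 - [1/(N(N - 1))] × Σ n_j(n_j - 1), where N is the number of isolates and n_j the number belonging to type j. The Python sketch below builds MLGs from a chosen subset of markers, so that the effect of adding or dropping markers on SID can be compared. The confidence interval uses the variance approximation of Grundmann et al. (interval = SID ± 2σ); the text does not state which method produced its intervals, so this is an assumption. All profiles are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Hashable, List, Sequence, Tuple


def simpsons_index(types: Sequence[Hashable]) -> float:
    """Hunter and Gaston SID: 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def simpsons_ci(types: Sequence[Hashable]) -> Tuple[float, float, float]:
    """SID with an approximate 95 % CI, assuming the Grundmann et al.
    variance: (4/N) * (sum(p_j^3) - (sum(p_j^2))^2)."""
    n = len(types)
    d = simpsons_index(types)
    p = [c / n for c in Counter(types).values()]
    var = (4.0 / n) * (sum(x ** 3 for x in p) - sum(x ** 2 for x in p) ** 2)
    half = 2.0 * sqrt(max(var, 0.0))
    return d, max(0.0, d - half), min(1.0, d + half)


def mlgs(profiles: List[Dict[str, int]], markers: List[str]) -> List[Tuple[int, ...]]:
    """Multilocus genotype of each profile, restricted to the chosen markers."""
    return [tuple(p[m] for m in markers) for p in profiles]


# Hypothetical allele sizes (bp) for six isolates.
PROFILES: List[Dict[str, int]] = [
    {"MM5": 235, "MM18": 290, "TP14": 300},
    {"MM5": 235, "MM18": 290, "TP14": 300},
    {"MM5": 235, "MM18": 296, "TP14": 300},
    {"MM5": 235, "MM18": 296, "TP14": 309},
    {"MM5": 262, "MM18": 290, "TP14": 300},
    {"MM5": 235, "MM18": 290, "TP14": 300},
]

for subset in (["MM5"], ["MM5", "MM18"], ["MM5", "MM18", "TP14"]):
    d, lo, hi = simpsons_ci(mlgs(PROFILES, subset))
    print(subset, round(d, 2), (round(lo, 2), round(hi, 2)))
```

On these six hypothetical profiles, MM5 alone gives an SID of 0.33 and all three markers give 0.8. Comparing SID across nested marker subsets in this way is one means of quantifying redundancy, such as the small gain from three to four markers reported in the text.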
Other studies using fragment sizing have included a region of the gp60 gene similar to that used in sequence analysis for subtype assignment; when used in this way, the locus is often referred to as GP15. We chose sequence analysis for this gene because gp60 sequencing is a good library typing tool: it has been adopted almost universally by Cryptosporidium researchers worldwide, allowing easy comparison of subtypes. Being sequence-based, it offers more discrimination than fragment sizing of the same region, although mixed profiles can be problematic. As shown in the current study, however, the discriminatory ability of this single-locus sequence type is not sufficient for local epidemiological questions such as outbreak investigations.
Where harmonised schemes are being developed to allow source attribution, it should be considered whether the markers are informative both in C. parvum from potential sources of oocysts (livestock, wildlife, etc.) and in C. parvum from humans; previous studies have used the markers trialled here to type both human- and bovine-derived C. parvum successfully [12].
Three additional markers, MSA, MSD and MSF, were applied to representatives of the 23 MLGs identified in the current study using primers reported in the literature [11]; all three were monoallelic in our samples, producing fragments of 229 bp, 274 bp and 156 bp, respectively (data not shown). These sizes correspond with reported allele sizes for these markers [11]. In the current study, only isolates of gp60 subtype family IIa were tested; cattle in other countries, including Portugal [29, 30], have been shown to shed gp60 subtype family IId infrequently, although this family has not been reported to date in UK cattle. Human C. parvum in the UK most commonly belongs to gp60 subtype family IIa, but other families are occasionally identified, notably IId and, rarely, IIc [6]. Further work is required to determine how the markers proposed in the current study perform in C. parvum of non-IIa gp60 subtype families.
Calves in both the Cheshire 2004 and NE Scotland 2011 cross-sectional studies appeared to shed the same predominant alleles of the markers used, and allele frequency distributions were very similar, suggesting that these alleles are fairly stable in UK calf populations. The same alleles also predominate in other Scottish studies of calves [12], as shown in Table 5; a few additional alleles were detected, in very low numbers, probably because of differences in study design. We also reviewed the available literature in which the same markers and primers were used. As noted previously, this comparison was limited by the lack of a coordinated approach to marker selection. In addition, allele sizes are not always reported, and where they are given, it cannot be proved definitively that reported alleles are the same, because of the problems with fragment sizing described above. Authors rarely state whether reported sizes are sequence sizes or binned fragment sizes. A study of C. parvum in Italian livestock used the same marker combination as the current study, but with small differences in the second-round primer sequences (TP14 reverse and MM19 forward) [26]. These primer sequences were also applied to C. parvum samples collected from calves in Ireland in 2003–2005 [22]. Interestingly, in both of these studies the sizes and frequencies of alleles reported for MM5, MM18, TP14 and MS1 were similar to those found in the current study: for MM5, 94–98 % were 233/260 bp; for MM18, 65–95 % were 290/296 bp; for TP14, 89–95 % were 300/309 bp; and for MS1, 66–99 % were 362 bp. The amended MM19 forward primer sequence used by Drumo et al. and De Waele et al. aligns with the published C. parvum reference genome and may be superior to that used in the current study. These primer differences also account for the small number of base differences detected when we aligned our allele sequences with other microsatellite sequences using BLAST.
In addition to the better typeability of MLFT compared with MLST reported by Diaz et al. [13], we found that MLFT compares favourably with sequencing in terms of time and cost. Although fluorescently labelled primers add to the cost of standard PCR, fragment analysis was economical compared with sequencing, at approximately £0.80 (€1, $1.28) per read. This cost could have been reduced further by multiplexing PCR products into one well for sizing, either by using different fluorescent labels or by ensuring that expected fragment sizes were sufficiently different to be distinguished. In addition, 10/118 (8.5 %) samples had more than one allele at one or more markers, which would not have been detected by direct sequence typing. This is similar to the 11.6 % of infections found to be mixed in an Italian study of humans and livestock [26]. In other studies, the criteria for assigning mixed genotypes are unclear or differ from ours. For example, one Scottish study defined a sample as mixed if the height of the secondary peak was >10 % of that of the main peak, which may explain why it detected a relatively high prevalence of mixed infections, up to 37 % in Aberdeenshire [12]. The prevalence of mixed infections may also increase with the age of the animals sampled, as older animals will have been exposed to more sources of oocysts. However, MLST has the advantages of providing greater discrimination than MLFT and, in single-genotype infections, superior accuracy and reproducibility.
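The dependence of mixed-infection prevalence on the calling criterion can be made explicit with a peak-height ratio rule. The Python sketch below retains, at each marker, the tallest peak plus every peak taller than a chosen fraction of it, and flags a sample as mixed if any marker retains more than one allele. The 10 % default mirrors the Scottish study's criterion rather than the rule used here, the peak data are hypothetical, and the rule deliberately ignores stutter and pull-up peaks, which a real pipeline would need to filter first.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (size in bp, height in RFU); values are hypothetical


def call_marker(peaks: List[Peak], min_ratio: float = 0.10) -> List[float]:
    """Sizes retained as alleles: the tallest peak plus any peak whose
    height exceeds min_ratio times the tallest peak's height."""
    if not peaks:
        return []
    main_height = max(h for _, h in peaks)
    return sorted(size for size, h in peaks
                  if h == main_height or h > min_ratio * main_height)


def is_mixed(sample: Dict[str, List[Peak]], min_ratio: float = 0.10) -> bool:
    """A sample is called mixed if any marker retains more than one allele."""
    return any(len(call_marker(peaks, min_ratio)) > 1 for peaks in sample.values())


SAMPLE: Dict[str, List[Peak]] = {
    "MM18": [(290.2, 4000.0), (296.1, 900.0)],  # secondary peak at 22.5 % of main
    "TP14": [(300.0, 3000.0)],
}
print(is_mixed(SAMPLE, min_ratio=0.10))
print(is_mixed(SAMPLE, min_ratio=0.25))
```

A secondary MM18 peak at 22.5 % of the main peak makes the sample mixed under a 10 % threshold but not under a 25 % threshold, so the same electropherogram can contribute to quite different prevalence estimates depending on the criterion chosen.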
Our preferred software was STRand, as we found that PeakScanner had problems with bleed-through when the product signal was too strong. This problem can be easily detected and corrected manually in STRand. When we compared results, the same sample sized with the two programs showed minimal variation. Both programs are free to download.