Complete mitochondrial genomes of the ‘intermediate form’ of Fasciola and Fasciola gigantica, and their comparison with F. hepatica

Background Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. (‘intermediate form’) is unclear. Methods Single specimens inferred to represent Fasciola sp. (‘intermediate form’; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). Results The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. Conclusions The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries.


Background
Food-borne trematodiases are an important group of neglected parasitic diseases. More than 750 million people are at risk of such trematodiases globally [1,2]. Fascioliasis is caused by liver flukes of the genus Fasciola, and has a significant adverse impact on both human and animal health worldwide [3]. Human fascioliasis is caused by the ingestion of freshwater plants or water contaminated with metacercariae of Fasciola [4]. It is estimated that millions of people are infected worldwide, and more than 180 million people are at risk of this disease worldwide [5]. To date, no vaccine is available to prevent fascioliasis. Fortunately, this disease can be treated effectively using triclabendazole [6], but there are indications of resistance developing against this compound [7].
The Fasciolidae is a family of flatworms and includes the genus Fasciola. Both F. hepatica and F. gigantica, which commonly infect livestock animals and humans (as definitive hosts), are recognized as valid species [8]. The accurate identification of species and genetic variants is relevant in relation to studying their biology, epidemiology and ecology, and also has applied implications for the diagnosis of infections. Usually, morphological features, such as body shape and perimeter as well as length/ width ratio, are used to identify adult worms of Fasciola [9]. However, such phenotypic criteria are unreliable for specific identification and differentiation, because of considerable variation and/or overlap in measurements between F. hepatica and F. gigantica [10].
Due to these constraints, various molecular methods have been used for the specific identification of Fasciola species and their differentiation [5]. For instance, PCRbased techniques using genetic markers in nuclear ribosomal (r) and mitochondrial (mt) DNAs have been widely used [11][12][13]. The sequences of the first and second internal transcribed spacers (ITS-1 and ITS-2 = ITS) of nuclear rDNA have been particularly useful for the specific identification of F. hepatica and F. gigantica, based on a consistent level of sequence difference (1.2% in ITS-1 and 1.7% in ITS-2) between them and much less variation within each species [11,14]. Nonetheless, studies in various countries, including China [5], Iran [15], Japan [16], Korea [14], Spain [17] and Tunisia [18], have shown that some adult specimens of Fasciola sp., which are morphologically similar to F. gigantica [10], are characterized by multiple sequence types (or "alleles") of ITS-1 and/or ITS-2, reflected in a mix between those of F. hepatica and F. gigantica [11,12]. Some authors [19][20][21] have suggested that such specimens (sometimes called 'intermediate forms') represent hybrids of F. hepatica and F. gigantica.
In the present study, we undertook an independent, genetic comparison of Fasciola sp. (i.e. 'intermediate form') and F. gigantica with F. hepatica. To do this, we characterized the mt genomes of individual specimens of Fasciola sp. and F. gigantica whose identity was defined based on their ITS-1 and/or ITS-2 sequences, and assessed their relationships by comparison with F. hepatica and various other trematodes using complete, inferred mt amino acid sequence data sets.

Ethics statement
This study was approved by the Animal Ethics Committee of Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences (Permit code. LVRIAEC2012-006). Adult specimens of Fasciola were collected from bovids, in accordance with the Animal Ethics Procedures and Guidelines of the People's Republic of China.
Parasites and isolation of total genomic DNA Adult specimens of Fasciola sp. were collected from the liver of a dairy cow (Bos taurus) in Heilongjiang province, China. Adult specimens of F. gigantica were collected from the liver of a buffalo (Bubalus bubalis) in Guangxi province, China. The worms were washed extensively in physiological saline, fixed in ethanol and then stored at −20°C until use. Single specimens were identified as Fasciola sp. or F. gigantica based on PCR-based sequencing of the ITS-1 and ITS-2 rDNA regions [11,12].
Long-range PCR-based sequencing of mt DNA To obtain some mt gene sequence data for primer design, regions (400-500 bp) of the cox1 and nad4 genes were PCR-amplified and sequenced using relatively conserved primers JB3/JB4.5 and ALF/ALR [13,22], respectively. Using BigDye terminator v.3.1 chemistry (Applied Biosystems, Weiterstadt, Germany), the amplicons were sequenced in both directions in a PRISM 3730 sequencer (ABI, USA). After sequencing regions of the cox1 and nad4 genes of both Fasciola sp. and F. gigantica, two internal pairs of conserved primers were designed (Table 1). These pairs were then used to long PCR-amplify the complete mt genome [23] in two overlapping fragments (cox1-nad4;~9 kb and, nad4-cox1 =~6 kb) from a proportion of total genomic DNA (10-20 ng) from one individual of Fasciola sp. and another of F. gigantica. The cycling conditions used were 92°C for 2 min (initial denaturation), then 92°C for 10 s (denaturation), 58-63°C for 30 s (annealing), and 60°C for 5 min (extension) for 5 cycles, followed by 92°C for 2 min, 92°C for 10 s, 58-63°C for 30 s, and 66°C for 5 min for 20 cycles, and a final extension at 66°C for 10 min. Each amplicon, which represented a single band in a 0.8% (w/v) agarose gel, following electrophoresis and ethidium-bromide staining [23], was column-purified and then sequenced using a primer-walking strategy [24].

Sequence analyses
Sequences were manually assembled and aligned against each other, and then against the complete mt genome sequences of 11 other trematodes (see section on Phylogenetic analysis) using the program Clustal X 1.83 [25] and manual adjustment, in order to infer gene boundaries. Open-reading frames (ORFs) were established using the program ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/ gorf.html), employing the trematode mt code, and subsequently compared with those of F. hepatica [26]. Translation initiation and termination codons were identified based on comparisons with those of F. hepatica [26]. The secondary structures of 22 tRNA genes were predicted using tRNAscan-SE [27] with manual adjustment [28], and rRNA genes were predicted by comparison with those of F. hepatica [26].

Sliding window analysis of nucleotide variation
To detect variable nucleotide sites, pairwise alignments of the complete genomes, including tRNAs and all intergenic spacers, were performed using Clustal X 1.83. The complete mt genome sequences of Fasciola sp. and F. gigantica were aligned with that published previously for F. hepatica (NC_002546) [26], and sliding window analysis was conducted using DnaSP v.5 [29]. A sliding window of 300 bp (in 10 bp overlapping steps) was used to estimate nucleotide diversity Pi (π) across the alignment. Nucleotide diversity was plotted against mid-point positions of each window, and gene boundaries were identified.

Phylogenetic analysis
The amino acid sequences conceptually translated from individual genes of the mt genomes of each Fasciola sp. and F. gigantica were concatenated. For comparative purposes, amino acid sequences predicted from published mt genomes of selected members of the subclass Digenea, including F. hepatica (NC_002546) [ [32], Orientobilharzia turkestanicum (HQ283100) [33], Schistosoma mansoni (NC_002545) [34], S. japonicum (HM120846) [35], S. mekongi (NC_002529) [34], S. spindale (DQ157223) [36] and S. haematobium (DQ157222) [35] [Schistosomatidae], were also included in the analysis. A sequence representing Gyrodactylus derjavinoides (accession no. NC_010976) was included as an outgroup [37]. All amino acid sequences were aligned using the program MUSCLE [38] and subjected to phylogenetic analysis using Bayesian inference (BI), as described previously [39,40]. Phylograms were displayed using the program Tree View v.1.65 [41]. In addition, all publicly available sequences of NADH dehydrogenase subunit 1 gene (nad1) of Fasciola sp.. F. gigantica and F. hepatica were aligned (over a consensus length of 359 bp) using MUSCLE, the alignment was modified manually, and then subjected to phylogenetic analysis by BI, applying the General Time Reversible (GTR) model. Nodal support values for the final phylogram were determined from the final 75% of trees obtained using a sample frequency of 100. The analysis was performed until the potential scale reduction factor approached 1 and the average standard deviation of split frequencies was less than 0.01. An nad1 sequence of Fascioloides magna was used as an outgroup in phylogenetic analysis.

Results
Identity of the two liver flukes, and features of the mt genomes The ITS-1 and ITS-2 sequences (GenBank accession no. KF543341) of the specimen of Fasciola sp. from Heilongjiang province were the same as that of an 'intermediate form' of Fasciola from China (AJ628428, AJ557570 and AJ557571) reported previously [11,12], which is characterized by polymorphic positions at 10 positions in ITS-1 and ITS-2 (Additional file 1: Figure S1; Table 2). Based on these key polymorphic positions (cf. [11,12]), this specimen of Fasciola sp. from China was inferred to be a hybrid between F. gigantica and F. hepatica. The ITS-1 and ITS-2 sequences of the F. gigantica sample (accession no. KF543340) from Guangxi province were consistent with that of the same species from Niger (AM900371) and did not have any polymorphic positions ( Table 2). The complete mt genome sequences representing Fasciola sp. (GenBank accession no. KF543343) and F. gigantica (accession no. KF543342) were 14,453 bp and 14,478 bp in size, respectively. Each mt genome contains 12 proteincoding genes (cox1-3, nad1-6, nad4L, cytb and atp6), 22 transfer RNA genes and two ribosomal RNA genes (rrnS and rrnL), but lack an atp8 gene ( Figure 1). The mt genome arrangement of the two flukes is the same as that of F. hepatica [26], but as expected, distinct from Schistosoma spp. [36]. All genes are transcribed in the same direction and have a high A + T content (62.7%). The ATrich regions of both mt genomes are located between tRNA-Glu and tRNA-Gly, and tRNA-Gly and cox3.

Annotation
For the two liver flukes, the protein-coding genes were in the following order: , and the lengths of the all protein-coding genes are the same for Fasciola sp. and F. gigantica ( Table 3). The inferred nucleotide and amino acid sequences of each of the 12 mt proteins of two liver flukes were compared. A total of 3,356 amino acids are encoded in the both mt genomes. All protein-coding genes have ATG, TTG or GTG as their initiation codon (Table 3). All protein-coding genes have TAG as their termination codon, except for cox3 and nad3, which have TAA in Fasciola sp. (Table 3) The two ribosomal RNA genes (rrnL and rrnS) of Fasciola sp. and F. gigantica were inferred based on comparisons with sequences from those of F. hepatica. The rrnL of Fasciola sp. and F. gigantica is located between tRNA-Thr and tRNA-Cys, and rrnS is located between tRNA-Cys and cox2. The length of rrnL is 987 bp for both Fasciola sp. and F. gigantica. The size of the rrnS genes is 769 bp and 771 bp for Fasciola sp. and F. gigantica, respectively. The A + T contents of rrnL and rrnS are~62% and~61% for Fasciola sp. and F. gigantica, respectively.
Two AT-rich non-coding regions (NCR) in the mt genomes Fasciola sp. and F. gigantica were inferred. In both mt genomes, the long NCR (841 bp) is located between the tRNA-Gly and cox3 (Figure 1), has an A + T content of~53% and contains eight perfect, 86 bp tandem repeats (TR1 to TR8). The short NCR is 174-176 bp in length, is located between tRNA-Glu and tRNA-Gly ( Figure 1) and has an A + T content of~72%.
Comparative mt genomic analyses of Fasciola sp. and F. gigantica with F. hepatica The complete mt genome sequences representing Fasciola sp. and F. gigantica are 9 bp shorter and 16 bp longer than F. hepatica (14,462 bp in length) [26], respectively. A comparison of the nucleotide sequences of each mt gene, and the amino acid sequences, conceptually translated from all mt protein-encoding genes of the three flukes, is given in Table 4. Across the entire mt genome, the sequence difference was 2.6% (380 nucleotide substitutions) between Fasciola sp. and F. gigantica, 11.8% (1712 nucleotide substitutions) between Fasciola sp. and F. hepatica, and 11.8% (1714 nucleotide substitutions)

Present study C/T A/T C/T T/A C/T T/C T/C C/T C/T T KF543341
* Sequence positions were determined by comparison with that of a previous study [11].   [26], except for the 22 tRNA genes, which are designated using one-letter amino acid codes, with numerals differentiating each of the two leucine-and serine-specifying tRNAs (L1 and L2 for codon families CUN and UUR, respectively; S1 and S2 for codon families AGN and UCN, respectively). Large non-coding region (NS); small non-coding region (NL).
between F. gigantica and F. hepatica. The difference across both nucleotide and amino acid sequences of the 12 protein-coding was 11.6% (1167 nucleotide substitutions) and 9.54% (320 amino acid substitutions) between the Fasciola sp. and F. hepatica; 11.6% (1167 nucleotide substitutions) and 9.83% (330 amino acid substitutions) between the F. gigantica and F. hepatica; and 2.8% (281 nucleotide substitutions) and 2.1% (71 amino acid substitutions) between the Fasciola sp. and F. gigantica, respectively. Nucleotide variability in the mt genome among Fasciola sp., F. gigantica and F. hepatica Sliding window analysis across the mt genomes of Fasciola sp., F. gigantica and F. hepatica provided an estimation of nucleotide diversity Pi (π) for individual mt genes ( Figure 2). By computing the number of variable positions per unit length of gene, the sliding window indicated that the highest and lowest levels of sequence variability were within the genes nad6 and cytb, respectively. Conserved regions were identified within nad1 and cox1 genes. In this study, the cytb and nad1 genes are the most conserved protein-coding genes, and nad6, nad5 and nad4 are the least conserved.

Phylogenetic analysis
Phylogenetic analysis of the concatenated amino acid sequence data for all 12 mt proteins ( Figure 3) showed that the Fasciolidae clustered to the exclusion of representatives of the families Paragonimidae (P. westermani) and Opisthorchiidae (O. viverrini, O. felineus and C. sinensis); the Schistosomatidae clustered separately with strong nodal support (posterior probability (pp) = 1.0). Within the Fasciolidae, Fasciola sp. and F. gigantica clustered together with strong support (pp = 1.0), to the exclusion of F. hepatica, with the former two taxa being more closely related than either was to F. hepatica. In addition, phylogenetic analysis using the nad1 data supports clustering of  the Fasciola sp. with aspermic F. gigantica x F. hepatica hybrids characterised previously [42] (Additional file 2: Figure S2).

Discussion
The present comparative, genetic investigation of representative specimens of Fasciola sp. (i.e. the 'intermediate form'), F. gigantica and F. hepatica using whole mt genomic and protein sequence data sets showed that Fasciola sp. and F. gigantica were more closely related than either was to F. hepatica. This finding was also supported by an analysis of nad1 sequence data (cf. Additional file 2: Figure S2). Although this evidence might suggest that Fasciola sp. is a population variant of F. gigantica, previous studies [19][20][21] have proposed that Fasciola sp. is a hybrid of F. gigantica and F. hepatica. The combined use of mtDNA (if indeed maternally inherited in fasciolids) and nuclear DNA markers [43] should assist in exploring the "hybridization/speciation" hypotheses [44]. Clearly, there is consistent evidence from various studies [11,12,14]  has not yet been estimated using a mutation scanning-or cloning-based sequencing [45], the polymorphic positions in the sequences determined by direct sequencing [11,14] indicate a clear pattern of introgression between the F. gigantica and F. hepatica. Although mt genomic (11.8%) and inferred protein (9.83%) sequence differences between these two species is substantial, the explanation that Fasciola sp. represents a hybrid between these two recognized species seems plausible, given that the karyotypes of both diploid F. hepatica and F. gigantica are the same (2n = 20) [46,47] and that the magnitude of sequence variation (1.7%) in ITS-2 (a species marker) between F. gigantica and F. hepatica is comparable with the highest level (1.3-1.6%) in this rDNA region between some schistosome species for which hybrids (i.e. S. haematobium × S. bovis; S. haematobium × S. guineenis; S. haematobium × S. intercalatum) have been reported [48][49][50]. While hybridization seems possible, another explanation might be ITS rDNA "lineage sorting and retention of ancestral polymorphism" [51,52], but this is perhaps less likely, given a clear pattern of mixing of ITS sequences seen in Fasciola sp. (cf. Additional file 1: Figure S1). In addition, polyploidy or diploidy in aspermic Fasciola [20] needs to be considered, and warrants future investigation. Perhaps the aspermic Fasciola specimens described in the literature [53] were infertile hybrids of F. gigantica and F. hepatica (in situations where both species occur in sympatry). Questions that might be addressed directly in relation to Fasciola sp. are: Are eggs from Fasciola sp. fertilized and viable? If miracidia develop and emerge from these eggs, are they infective to snails? If they do infect snails, do the ensuing adult worms (in the definitive host) contain sperm and are these worms fertile, and what is their ploidy? These questions should be addressed, and could be complemented by detailed light and transmission electron microscopic investigations of a relatively large number of adult specimens of Fasciola sp., F. gigantica and F. hepatica (preferably from different countries), which have been unequivocally and individually identified based on their ITS-1 and ITS-2 sequences. Such a study Figure 3 Genetic relationships of Fasciola sp. with Fasciola gigantica and F. hepatica, and other trematodes. Phylogenetic analysis of the concatenated amino acid sequence data representing 12 protein-coding genes was conducted using Bayesian inference (BI), using Gyrodactylus derjavinoides (NC_010976) as an outgroup.
should pay particular attention to the morphology of the reproductive organs, sperm and oocytes, and the karyotypes of worms, and establish whether or not Fasciola sp. from China is polyploid and/or aspermic [20].
Moreover, although challenging, laborious and timeconsuming, it would be highly informative to conduct hybridization studies in vivo, whereby individual miracidia from eggs from adults of each Fasciola sp., F. gigantica and F. hepatica would be used to infect (separately) their lymnaeid snail hosts, to raise clonal populations of cercariae and metacercariae of these three taxa, so that mixed infections (in different combinations and with monospecific controls) could be established in, for example, sheep or goats, to attempt to cross-hybridize the three taxa in a pairwise manner. Using such an experimental design, eggs and adult worms could then be examined in detail at both the electron microscopic, karyotypic and molecular levels. Importantly, in these experiments, ITS-1 and/or ITS-2 could be used to establish the genotypes of subsamples of individuals, and mt markers derived from mt genomes determined here and of F. hepatica could be used to determine haplotypes and mtDNA inheritance if the cross-hybridization studies were successful. Therefore, the present markers could be employed, in combination, to establish the biological relationship of the three taxa through in vivo experiments, but also in the field in sympatric and allopatric populations, if they occur. Combined with the use of markers in nuclear and mt genomes, advanced genomic sequencing, optical mapping and microimaging techniques might assist studies of Fasciola sp. in China and other countries.