Assessment of the genetic relationship between Dictyocaulus species from Bos taurus and Cervus elaphus using complete mitochondrial genomic datasets

Background Dictyocaulus species are strongylid nematodes of major veterinary significance in ruminants, such as cattle and cervids, and cause serious bronchitis or pneumonia (dictyocaulosis or “husk”). There has been ongoing controversy surrounding the validity of some Dictyocaulus species and their host specificity. Here, we sequenced and characterized the mitochondrial (mt) genomes of Dictyocaulus viviparus (from Bos taurus) with Dictyocaulus sp. cf. eckerti from red deer (Cervus elaphus), used mt datasets to assess the genetic relationship between these and related parasites, and predicted markers for future population genetic or molecular epidemiological studies. Methods The mt genomes were amplified from single adult males of D. viviparus and Dictyocaulus sp. cf. eckerti (from red deer) by long-PCR, sequenced using 454-technology and annotated using bioinformatic tools. Amino acid sequences inferred from individual genes of each of the two mt genomes were compared, concatenated and subjected to phylogenetic analysis using Bayesian inference (BI), also employing data for other strongylids for comparative purposes. Results The circular mt genomes were 13,310 bp (D. viviparus) and 13,296 bp (Dictyocaulus sp. cf. eckerti) in size, and each contained 12 protein-encoding, 22 transfer RNA and 2 ribosomal RNA genes, consistent with other strongylid nematodes sequenced to date. Sliding window analysis identified genes with high or low levels of nucleotide diversity between the mt genomes. At the predicted mt proteomic level, there was an overall sequence difference of 34.5% between D. viviparus and Dictyocaulus sp. cf. eckerti, and amino acid sequence variation within each species was usually much lower than differences between species. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 mt proteins showed that both D. viviparus and Dictyocaulus sp. cf. eckerti were closely related, and grouped to the exclusion of selected members of the superfamilies Metastrongyloidea, Trichostrongyloidea, Ancylostomatoidea and Strongyloidea. Conclusions Consistent with previous findings for nuclear ribosomal DNA sequence data, the present analyses indicate that Dictyocaulus sp. cf. eckerti (red deer) and D. viviparus are separate species. Barcodes in the two mt genomes and proteomes should serve as markers for future studies of the population genetics and/or epidemiology of these and related species of Dictyocaulus.


Background
Species of Dictyocaulus (Strongylida: Dictyocaulidae) are economically important parasitic nematodes of the lungs of various ungulate animals, including domestic and wild ruminants (e.g., cattle and deer) [1], and are causative agents of bronchitis and pneumonia (dictyocaulosis or "husk") [2]. All members of the Dictyocaulidae have direct life cycles [3]. Adult nematodes live in the bronchi, where the ovo-viviparous females produce eggs, from which first stage larvae (L1s) hatch in the lung. The L1s are shed in the faeces from the infected host. Under favourable environmental conditions, the L1s develop through to the infective third-stage larvae (L3s) over a period of~4-6 days. After ingestion by the host, the L3s migrate through the intestinal wall to the mesenteric lymph nodes, moult, and, as fourth stage larvae (L4s), are transported to the lungs. The L4s penetrate the alveoli, moult and then develop into dioecious adults. The period from ingestion to reproductive maturity is estimated at 21-35 days [3].
Since Dictyocaulus was first erected [4], there has been ongoing controversy as to the validity of some species within this genus, particularly those infecting bovids and/or cervids [5], because of a lack of reliable morphological features for their unequivocal identification. The identification of Dictyocaulus species and populations is not only important from a taxonomic perspective, but also has implications for studying the host and geographical distributions of the parasites, the cross transmissibility of Dictyocaulus between or among host species (particularly between bovine and cervid hosts) and also for the control of dictyocaulosis. Although molecular tools, employing genetic markers in ribosomal DNA, have found utility for systematic and/or epidemiological studies of some species [6][7][8][9][10][11][12][13], there is still limited information on the genetic composition of Dictyocaulus populations in different ruminant host species and countries around the world.
Mitochondrial (mt) genomes provide markers for systematic, genetic and epidemiological investigations of species of strongylid nematodes, with a perspective on discovering population variants and cryptic species and exploring transmission patterns linked to particular genotypes of a species [14][15][16]. For instance, we have shown that not only are mt genomic regions useful for exploring the population genetic structures of Dictyocaulus viviparus [12,17], concatenated amino acid sequences inferred from whole mt genomes can provide barcodes for nematodes [16]. Although nucleotide sequence variation can be considerable (≤3.5%; upon pairwise comparison) in some of the protein coding mt genes of D. viviparus studied to date [12], the inferred amino acid sequence variation is less. For example, previous studies have shown amino acid sequence variation of 0-1.5% (over 131 positions in COX1) (among 252 individual worms) [17], of 0-1.6% for COX3 (over 125 positions) and 0-2.3% for NAD5 (over 132 positions) (72 individual worms) [12]. Importantly, current evidence indicates that concatenated amino acid sequence datasets can be employed for the retesting of hypotheses regarding the systematic relationships of nematodes; such sequence datasets are relatively large and usually have excellent phylogenetic signal, often achieving nodal support values of 98-100% in tree reconstructions [16,18]. High throughput sequencing and new computational approaches [19] have underpinned these advances, and now enable mt proteomic barcodes to be defined for Dictyocaulus species from a range of ungulate hosts. Here, as a first step, we used a massively parallel sequencing method and semi-automated bioinformatic pipeline for the characterisation of the mt genomes of D. viviparus (from Bos taurus) and Dictyocaulus sp. cf. eckerti (from Cervus elaphus) (cf. [20]), which we compared directly with those of other lungworms, for which published whole mt genomic datasets were available [16,21]. We also studied the genetic relationships between these two Dictyocaulus species and selected representatives of the order Strongylida, and identified regions in the mt genomes of Dictyocaulus that might serve as markers for future population genetic or molecular epidemiological studies.

Parasites, DNA isolation and identification
Adult specimens of Dictyocaulus were collected from the bronchi of the lungs of cattle (from Sweden) or red deer (from New Zealand) in previous studies [17,20,22,23]. The worms originally collected were washed extensively in physiological saline and then stored at −80°C until use. Upon thawing, the anterior and posterior ends of each nematode were cut off and cleared in lactophenol for subsequent morphological identification. The midbody section of each worm was used for the extraction of genomic DNA using a small-scale sodium dodecylsulphate (SDS)/proteinase K digestion and mini-column purification (Wizard DNA Clean-Up Kit, Promega, USA) [23]. The molecular identity of each specimen was verified by PCR-based sequencing of the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA (rDNA) using an established method [23].
Sequencing and assembly of mt genomes Using the protocol described by Hu et al. [24], the complete mt genome was amplified by long-PCR (BD Advantage 2, BD Biosciences) as two overlapping amplicons (~5 kb and~10 kb) from the genomic DNA from the mid-body section of a single male specimen of D. viviparus from B. taurus and of Dictyocaulus sp. cf. eckerti from C. elaphus. Amplicons were consistently produced from the positive control samples (total genomic DNA of Angiostrongylus vasorum); in no case was a product detected for any of the negative (no-template) controls. Amplicons were then treated with shrimp alkaline phosphatase and exonuclease I, and quantified in a spectrophotometer (ND-1000 UV-VIS v.3.2.1, Nano-Drop Technologies). Following an electrophoretic analysis of quality, the two amplicons (2.5 μg of each) from each of the two worms from each host species were pooled and then sequenced using the 454 Genome Sequencer FLX (Roche) [25]. The mt genome sequences (GenBank accession nos. JX519459 and JX519460) were each assembled from (~300 bp) reads using the program CAP3 [26].

Annotation and analysis of mt genomic sequence data
The genes and features of each mt genome were annotated using an established computational pipeline [16]. In brief, each protein-encoding mt gene was identified by local alignment comparison (six reading frames) using amino acid sequences conceptually translated from corresponding genes from the mt genome of a reference species (e.g., Metastrongylus pudendotectus; accession no. GQ888714; [16]). The large and small subunits of the mt rrn (ribosomal RNA = rRNA) genes (rrnS and rrnL, respectively) were identified by local alignment of nucleotide sequence data. The trn (transfer RNA = tRNA) genes were predicted (from both strands) according to their structure, using scalable models, based on the standard nematode mt tRNAs [15]. All predicted trn genes were then grouped, based on their anti-codon sequence, and identified based on the amino acid encoded by this anti-codon. Two separate trn gene groups were predicted each for serine (one each for the anticodons AGN and UCN, respectively) and leucine (one each for the anticodons CUN and UUR, respectively), because these trn genes are duplicated in many invertebrate mt genomes [15]. All predicted tRNAs within each amino acid group were ranked based on structural "strength" (as inferred by the number of nt mismatches in each stem), and the 100 best-scoring structures for each group were compared by BLASTn against a custom database representing all published nematode mt genome sequences available in the Gen-Bank database (accessible via www.ncbi.nlm.nih.gov).
All trn genes of each mt genome were then identified and annotated based on maximum sequence identity to known nematode tRNAs. Annotated sequence data were imported using the program SEQUIN (available via http://www.ncbi.nlm.nih.gov/Sequin/) for the final verification of the mt genome organization/annotation prior to submission to the GenBank database.
Sliding window analysis was performed on the aligned, complete mt genome sequences of the two Dictyocaulus species using DnaSP v.5 [27]. The alignment of these sequences was achieved using the program MUSCLE v.3.8 [28], as implemented in SeaView v.4 [29]. Keeping the nucleotides in frame, there were no ambiguously aligned regions. A sliding window of 300 bp (steps of 10 bp) was used to estimate nucleotide diversity (π) over the entire alignment; indels were excluded using DnaSP. Nucleotide diversity for the entire alignments was plotted against midpoint positions of each window, and gene boundaries were defined.

Results
Features of the mt genomes of D. viviparus and Dictyocaulus sp. cf. eckerti from red deer The circular mt genome sequences determined for D. viviparus and Dictyocaulus sp. cf. eckerti were 13,310 bp and 13,296 bp in size (Figure 1), being similar to those published for Metastrongylus pudendotectus (13,793 bp) and M. salmi (13,778 bp) [16]. The nucleotide compositions of the two Dictyocaulus mt genomes were similar, being~24.6% for A, 6.3% for C, 17.3% for G and 51.8% for T ( Table 1). As expected, these two mt genomes were AT-rich, with T and C being the most and least favoured nucleotides, respectively. Both mt genomes contained genes encoding 12 proteins (COX1-3, NAD1-6, NAD4L, ATP6 and CYTB), two rrn and 22 trn genes but, as expected (cf. [15]), lacked an atp8 gene. All genes were inferred to be transcribed in the same direction (5' > 3') ( Figure 1) and the gene arrangement (GA2) was the same as that of other strongylid nematodes [15].

Protein genes and codon usages
The initiation and termination codons predicted for the protein encoding genes of D. viviparus were compared with those of Dicyocaulus sp. cf. eckerti ( Table 2). The commonest start codon for D. viviparus was ATT and TTG (for three of 12 proteins each), followed by ATA and GTT (two genes), ATC and TTA (one gene each); for Dictyocaulus sp. cf. eckerti, it was ATA (four genes), followed by TTG (three genes), ATG (two genes), ATT, GTT and TTA (one gene each). Eight mt protein genes in D. viviparus and nine in Dictyocaulus sp. cf. eckerti had a TAA or TAG translation termination codon; the other protein genes ended in an abbreviated stop codon, such as TA or T ( Table 2). For both Dicyocaulus species, the 3'-ends of these genes were immediately adjacent to a downstream trn gene ( Figure 1; Table 2), consistent with the arrangement for M. salmi [16].
The codon usage for the 12 protein genes of D. viviparus was also compared with Dictyocaulus sp. cf. eckerti ( Table 3); 62 of the 64 possible codons were used. Neither codon CGC (Arg) nor CTC (Leu) were utilized in the mt genome of D. viviparus. Codons CGC (Arg) and GAC (Asp) were not utilized in the mt genome of Dictyocaulus sp. cf. eckerti. The preferred nucleotide usage at the third codon position of mt protein genes of D. viviparus and Dictyocaulus sp. cf. eckerti reflects the overall nucleotide composition of these mt genomes. At this position, T is the most frequently, and C the least frequently used. For D. viviparus and Dictyocaulus sp. cf. eckerti, the codons ending in A have higher frequencies than the codons ending in G, which is similar to, for example, other members of the order Strongylida and Caenorhabditis elegans (Rhabditida), but distinct from Ascaris suum (Ascaridida) and Onchocerca volvulus (Spirurida) (accession nos. in [15]). As the usage of synonymous codons is proposed to be preferred in gene regions of functional significance, codon bias might be linked to selection at silent sites [30,31].
AT bias in the nucleotide composition was also reflected in a bias in the amino acid composition of proteins. The AT-rich codons represent the amino acids Phe, Ile, Met, Tyr, Asn or Lys, and the GC-rich codons represent Pro, Ala, Arg or Gly. In the mt genomes of D. viviparus and Dictyocaulus sp. cf. eckerti, the most frequently used codons are TTT (Phe), TTA (Leu), ATT (Ile), TTG (Leu), TAT (Tyr) and GTT (Val). The least frequently used codons were CTA, CTG (Leu), ATC (Ile), GTC (Val), AGC (Ser), CCC (Pro), GCC (Ala), TAC (Tyr), CAC (His), AAC (Asn), CGA (Arg), TCC (Ser) and GGC (Gly). All four GC-rich only codons are represented here, and every codon had at least one C. When the frequencies of synonymous codons within the AT-rich group, such as Phe (TTT, 16.1% and 16.2%; TTC, 0.7% and 0.3%), Ile (ATT, 5.7% and 5.3%; ATC, 0.15% and 0.03%), were compared between both mt genomes, the frequency was always less if the third position was a C.

Transfer RNA genes
Twenty-two trn genes were identified in the mt genomes of both D. viviparus and Dictyocaulus sp. cf. eckerti. The trn gene sequences ranged from 50-68 nt in length. The trn structures had a 7 bp amino-acyl arm, a 3-4 bp DHU arm, a 4-5 bp anticodon stem, a 7 base anticodon loop, with a T always preceding an anticodon and a purine always following an anticodon. Twenty of the 22 trn genes (i.e. excluding the two trnS genes) had a predicted secondary structure with a 3-4 bp DHU stem  and a DHU loop of 5-8 bases, in which the variable TψC arm and loop were replaced by a "TV-replacement loop" of 6-12 bases, in accordance with other nematodes [15]. The mt trnS of both Dictyocaulus mt genomes had a secondary structure consisting of a DHU replacement loop of 4-6 bases, 3 bp TψC arm, TψC loop of 6-9 bases and a variable loop of 4 bases, consistent with other nematodes of the class Secernentea [32,33], but distinct from Trichinella spiralis and Trichuris suis (class Adenophorea) [34,35].    Gene abbreviations are according to [69].

Ribosomal RNA genes
The rrnS and rrnL genes of each of the two Dictyocaulus species were identified by sequence comparisons with homologous sequences of A. cantonensis. The rrnS gene was located between trnE and trnS (UCN), and rrnL was between trnH and nad3. The two genes were separated from one another by the protein genes nad3, nad5, nad6 and nad4L (Figure 1). The amino acid sequences predicted from individual protein-encoding mt genes of D. viviparus were compared with those of Dictyocaulus sp. cf. eckerti (Table 4). Pairwise comparisons of the concatenated sequences revealed identities of 63.0-96.6% (65.5% overall) between them. Based on identity, COX-1 was the most conserved protein, whereas NAD6 and ATP6 were the least conserved (see Table 4). Phylogenetic analysis of the concatenated amino acid sequence data for the 12 mt proteins showed that both D. viviparus and Dictyocaulus sp. cf. eckerti were closely related, and grouped (pp = 1.00) to the exclusion of Metastrongylus and Angiostrongylus species (Metastrongyloidea) as well as H. contortus (Trichostrongyloidea), An. caninum (Ancylostomatoidea) and O. dentatum (Strongyloidea) (Figure 2).

Nucleotide variation between the mt genomes of D. viviparus and Dictyocaulus sp. cf. eckerti
Nucleotide diversities calculated from pairwise comparisons across the mt genomes of D. viviparus and Dictyocaulus sp. cf. eckerti, achieved by sliding window analyses, are shown in Figure 3. Also indicated on the graph are the regions of cox1 (452 bp), and of trnC_M_D_G (320 bp), rrnL (484 bp), nad5 (426 bp) and cox3 (401 bp) used previously as markers to assess genetic diversity within D. viviparus populations [12,17]. International Union of Pure and Applied Chemistry (IUPAC) codes (N = A, G, C or T; Y = C or T; R = A or G) were used. Gene product abbreviations are according to [69].
In Figure 3, these markers are indicated with M1, M2, M3, M4 and M5, respectively, and lengths include PCR primers. Greatest nucleotide diversity was detected within nad6, followed by peaks of variation within nad5, nad1 and atp6. Gene-by-gene nucleotide diversity was highly variable, but, by far, the least variation was recorded within cox1. Of the markers used previously by Hu et al. [17] (i.e. M1) and Höglund et al. [12] (i.e. M2-M5), M1 and M2 (within cox1) captured the least, and M4 (within nad5) captured the greatest sequence variation. The relatively low variation captured by targeting trn genes is also indicated by a trough for M2. Overall, the full sliding window indicates a wealth of new genes capable of providing high levels of nucleotide variation for population genetic studies.
Whether the parasite from red deer (i.e., Dictyocaulus cf. eckerti) represents a distinct species from (or a  population variant of ) "D. eckerti" (cf. [50]) remains to be established. A previous study showed that the magnitude of sequence difference (~6-8%) in ITS-2 rDNA of Dictyocaulus from fallow deer from Germany [7] is greater than variation (0.4-2.6%) among selected individuals of Dictyocaulus from red deer [10], although the degree of genetic variation within the operational taxonomic unit from fallow deer is not yet known. D. eckerti was originally described from the reindeer, Rangifer tarandus, from Western Siberia (cited in [50]), which raises questions as to the specific identity of Dictyocaulus from various cervid species [5,8,13,55]. Although Durette-Desset et al. [5] proposed that dictyocaulids of European cervids be called Dictyocalus noerneri, molecular and morphological data have shown that roe deer and moose can harbour D. capreolus (e.g., [8,54]) and that chamois (Rupicapra rupicapra) might harbour a unique species [13]. This controversy and the findings of numerous previous studies emphasize the need for detailed investigations of Dictyocaulus specimens from various species of wild and domestic bovid and cervid hosts from different continents, which could be achieved using a combined morphological and mt proteomic barcoding approach.
There is also significance in using mt genetic markers for studying the genetic composition of populations of Dictyocaulus spp., given that there are few morphological features for the specific differentiation of some developmental stages (i.e., larvae) [3] and given that cryptic species have been detected within the Strongylida [56,57]. In nematodes, mt DNA is usually more variable in sequence within a species than ITS-2 and other rDNA regions [56], indicating that mt gene regions are well suited for studying the population genetics of parasitic nematodes [14,56,58,59]. The sliding window analysis conducted herein displayed distinct patterns of nucleotide diversity between the two mt genomes representing Dictyocaulus (Figure 3). Low variability is useful for the design of oligonucleotide primers that flank mt regions with high variability for population genetic or epidemiological investigations. Some previous studies have shown the utility of some mt gene regions. For instance, Hu et al. [17] employed primer sets, originally designed to cox1 of flatworms [60], for PCR-based mutation scanning and selective sequencing analysis of D. viviparus individuals from 17 different populations (farms). Within-species variation in cox1 was low (0.3-2.3%) [17], similar to findings for some parasitic nematodes of plants and insects [61][62][63][64][65] but distinctly different from findings for gastrointestinal trichostrongyloid nematodes of domestic ruminants [66,67]. In the present study, cox1 is revealed to be the gene with the lowest nucleotide diversity between the two Dictyocaulus species. In another study, Höglund et al. [12] showed distinct genetic substructuring in D. viviparus populations from Sweden using four mt regions (in the cox3, nad5, trnC_M_D_G and rrnL genes) covering 11% (1542 bp) of the mt genome of D. viviparus (Figure 1). Although some of the gene regions (e.g., cox3 and nad5) used to date provided some resolution of genetic variation (<6% among haplotypes, upon pairwise comparison), the present sliding window analysis indicates other candidate genes for capturing greater diversity. That the cox1 region (of 393 bp) is the most conserved relative to other regions (see Figure 3; M1) is a feature that facilitates the design of primers for PCR, but might come at the cost of missing signal (of sequence variation) required for analysis. Here, numerous sequence tracts in the mt genome have been identified as potentially suitable for population genetic or taxonomic studies (Figure 3). The comparison among the mt genomes of different species of nematodes characterised to date now provides the prospect for designing generic or specific primers to capture regions of variability most suited to the resolution needed for analysis, and also provides opportunities for a range of DNA amplification-based approaches, including multiplexed diagnostic methods (cf. [68]).
PCR primers can be designed rationally in conserved regions flanking "variable sequence tracts" within the mt genome considered to be most informative for population genetic studies of Dictyocaulus from different bovid and cervid hosts. Utilizing such primer sets, PCR-based single-strand conformation polymorphism (SSCP) analysis [23] could be applied to screen large numbers of individual specimens representing different host species and populations for haplotypic variability (at any developmental stage). A previous study has shown the merit of SSCP for exploring the genetic variation in various populations of D. viviparus in Sweden [17], and could be applied to large-scale studies of Dictyocaulus specimens representing distinct operational taxonomic units (based on ITS-2) and different ungulate host species.
Using a range of variable mt regions, in combination with classical parasitological techniques, it might also be possible to assess the cross transmission of particular species or genetic variants (haplotypes) of Dictyocaulus between/among cattle and cervid hosts, and their pathogenicity in different host species (cf. [22]). Furthermore, it will also be significant to extend mt genome sequencing to a range of lungworm species of domestic and wild ruminants, and assess their genetic relationships. The present study shows the relevance of the mt genomes of Dictyocaulus species for future systematic investigations and, importantly, stimulates a reassessment of the phylogenetic position of the family Dictyocaulidae in relation to the Metastrongylidae, Trichostrongylidae and other families within the order Strongylida (cf. [11]).

Conclusions
Comparative analyses of proteomic sequence datasets inferred from the mt genomes of Dictyocaulus sp. cf. eckerti (red deer) and D. viviparus indicate that these parasites are closely related species. Barcodes identified in the mt genomes and proteomes could serve as markers for future studies of the systematics, population genetics and/or epidemiology of a range of Dictyocaulus species as well as other lungworms.