The mitochondrial genome of Protostrongylus rufescens – implications for population and systematic studies

Background Protostrongylus rufescens is a metastrongyloid nematode of small ruminants, such as sheep and goats, causing protostrongylosis. In spite of its importance, the ecology and epidemiology of this parasite are not entirely understood. In addition, genetic data are scant for P. rufescens and related metastrongyloids. Methods The mt genome was amplified from a single adult worm of P. rufescens (from sheep) by long-PCR, sequenced using 454-technology and annotated using bioinformatic tools. Amino acid sequences inferred from individual genes of the mt genomes were concatenated and subjected to phylogenetic analysis using Bayesian inference. Results The circular mitochondrial genome was 13,619 bp in length and contained two ribosomal RNA, 12 protein-coding and 22 transfer RNA genes, consistent with nematodes of the order Strongylida for which mt genomes have been determined. Phylogenetic analysis of the concatenated amino acid sequence data for the 12 mt proteins showed that P. rufescens was closely related to Aelurostrongylus abstrusus, Angiostrongylus vasorum, Angiostrongylus cantonensis and Angiostrongylus costaricensis. Conclusions The mt genome determined herein provides a source of markers for future investigations of P. rufescens. Molecular tools, employing such mt markers, are likely to find applicability in studies of the population biology of this parasite and the systematics of lungworms.


Background
Protostrongylus rufescens is a metastrongyloid nematode of small ruminants, including sheep and goats (definitive hosts) in most parts of the world [1]. The dioecious adults of this nematode live in the respiratory system (terminal bronchioles and alveoli) of the definitive host. Here, the females produce eggs, from which first-stage larvae (L1s) hatch within the airways of the lung. L1s then migrate via the bronchial/tracheal escalator to the pharynx, are swallowed and are then excreted in the faeces. L1s infect a molluscan intermediate host (snail) and then develop, under favourable environmental conditions, into third-stage larvae (L3s) [1]. L3s within an infected intermediate host are then ingested by the ruminant host, penetrate the gut wall and then migrate via the lymphatic system or blood stream to the lungs, where they develop to adult worms. The prepatent period is reported to be ~4-9 weeks [2]. Although P. rufescens infection is widespread, it does not usually cause major clinical disease. Nonetheless, pathological changes, characterized by chronic, eosinophilic, granulomatous pneumonia, can be detected upon post mortem examination. Adult worms reside mainly in the bronchioles and alveoli, and are surrounded by macrophages, giant cells, eosinophils and other inflammatory cells, which produce grey or beige plaques (1-2 cm) under the pleura in the dorsal border of the caudal lung lobes [3].
Little is known about fundamental aspects of the epidemiology and ecology of P. rufescens. Molecular tools employing suitable genetic markers can underpin fundamental studies in these areas, with a perspective on investigating transmission patterns linked to particular genotypes of a parasite and on discovering population variants or cryptic species [4,5]. Advances in nucleic acid sequencing and bioinformatics have provided a foundation for characterizing the mt genomes from parasitic nematodes as a source of genetic markers for such explorations. Here, we used an established, massively parallel sequencing-bioinformatics pipeline [6] for the characterization of the mt genome of P. rufescens, which we compared with those of related metastrongyloid nematodes, for which mt genomic sequence data are available. We also studied the genetic relationships among these lungworms and selected representatives within the order Strongylida, and suggest that selected regions in the genome of P. rufescens should serve well as markers for future studies of the ecology and epidemiology of this nematode around the world.

Parasite and genomic DNA isolation
Adult worms of P. rufescens were collected from the lungs of a fresh sheep cadaver in Victoria, Australia, washed extensively in physiological saline and then stored at −80°C. Upon thawing, genomic DNA was isolated from a single adult male specimen using an established method of sodium dodecyl-sulphate (SDS)/proteinase K digestion and subsequent mini-column purification [7]. The identity of the specimen was verified by PCR-based sequencing (BigDye chemistry v.3.1) of the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA [7].

Long-PCR, sequencing and mt genome assembly
From the genomic DNA extracted from the single male worm, the complete mt genome was amplified by long-PCR (BD Advantage 2, BD Biosciences) as two overlapping amplicons (~5 kb and ~10 kb), using the protocol described by Hu et al. [8], with appropriate positive (i.e., Haemonchus contortus DNA) and negative (i.e., no template) controls. Amplicons were consistently produced from the positive control samples; in no case was a product detected for the negative controls. Amplicons were then treated with shrimp alkaline phosphatase and exonuclease I [9], and quantified by spectrophotometry. Following agarose electrophoretic analysis, the two amplicons (2.5 μg of each) were pooled and subsequently sequenced using the 454 Genome Sequencer FLX (Roche) [10] according to an established protocol [6]. The mt genome sequence was assembled using the program CAP3 [11] from individual reads (of ~300 bp).

Annotation and analyses of sequence data
Following assembly, the mt genome of P. rufescens was annotated using the bioinformatic annotation pipeline developed by Jex et al. [6]. Briefly, the open reading frame (ORF) of each protein-coding mt gene was identified (six reading frames) by comparison to those of the mt genome of Angiostrongylus vasorum [GenBank: JX268542; [12]]. The small and large subunits of the mt ribosomal RNA genes (rrnS and rrnL, respectively) were identified by local alignment. The transfer RNA (tRNA) genes were predicted (from both strands) based on their structure, using scalable models based on the standard mt tRNAs for nematodes [5]. Predicted tRNA genes were then grouped according to their anti-codon sequence and identified based on the amino acid encoded by the anti-codon. Two separate tRNA gene groups were predicted each for leucine (Leu) (one each for the anticodons CUN and UUR, respectively) and for serine (Ser) (one each for the anticodons AGN and UCN, respectively), as these tRNA genes are duplicated in many invertebrate mt genomes, including those of nematodes [5]. All predicted tRNAs for each amino acid group were ranked according to the "strength" of their structure (inferred based on minimum nucleotide mismatches in each stem); for each group, the 100 best-scoring structures were compared by BLASTn against a database comprising all tRNA genes for each amino acid for all published mt genome sequences of nematodes (available via http://drake.physics.mcmaster.ca/ogre/; [13]). The tRNA genes were then identified and annotated based on their highest sequence identity to known nematode tRNAs. Annotated sequence data were imported using the program SEQUIN (via http://www.ncbi.nlm.nih.gov/Sequin/), the mt genome structure was verified and the final sequence was submitted as an SQN file to the GenBank database.
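The six-frame ORF search mentioned above can be illustrated with a minimal sketch. It is not the pipeline of Jex et al. [6]; the function names, the `min_codons` threshold and the simple start-to-stop scan are assumptions made for illustration. The start codons (ATT, TTG, ATA, ATG) and complete stop codons (TAA, TAG) are those reported for P. rufescens in this article; abbreviated stop codons (T or TA) are not handled here.

```python
# Illustrative six-frame ORF scan (a sketch, not the annotation pipeline used in the study).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STARTS = {"ATT", "TTG", "ATA", "ATG"}  # start codons reported for P. rufescens
STOPS = {"TAA", "TAG"}                 # complete stop codons only


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, frame, start, end) for ORFs with at least min_codons codons.

    Coordinates are 0-based on the strand that was scanned; end is exclusive
    and includes the stop codon. min_codons is a hypothetical threshold.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i  # open an ORF at the first start codon seen
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close the ORF and look for the next start
    return orfs
```

In practice, candidate ORFs found in this way would still need to be compared with the genes of a reference genome (here, that of An. vasorum) before being assigned a gene name.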

Phylogenetic analysis of concatenated amino acid sequence datasets
The amino acid sequences were predicted from individual mt genes of P. rufescens and of other nematodes for which mt genome sequences are available [6,12,14-20] (Table 1). All amino acid sequences were aligned using the program MUSCLE [21] and then subjected to phylogenetic analysis. For this analysis, best-fit models of evolution were selected using ProtTest 3.0 [22] employing the Akaike information criterion (AIC) [23]. Bayesian inference analysis was conducted using MrBayes 3.1.2 [24], with a fixed mtREV amino acid substitution model [25], using four rate categories approximating a Γ distribution, four chains and 200,000 generations, sampling every 100th generation. The first 200 generations were removed from the analysis as burn-in.
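As an illustration of how per-gene amino acid alignments can be joined into one concatenated dataset, the following sketch builds a simple supermatrix. The function name, the input layout and the gap-filling of genes missing from a taxon are assumptions for illustration and are not taken from the study.

```python
# Illustrative concatenation of per-gene alignments (a sketch under stated assumptions).
def concatenate_alignments(alignments, gene_order):
    """Join aligned sequences gene by gene for every taxon.

    alignments: {gene: {taxon: aligned_sequence}} (hypothetical input layout)
    gene_order: list of gene names giving the concatenation order
    A taxon lacking a gene receives a block of gaps of that gene's width.
    """
    taxa = sorted({taxon for gene in gene_order for taxon in alignments[gene]})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to one length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(blocks) for taxon, blocks in parts.items()}
```

For example, with `cox1` aligned for taxa A and B and `nad1` available only for A, taxon B receives gap characters across the `nad1` block, so all concatenated sequences keep the same length.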

Features of the mt genome
The circular mt genome sequence of P. rufescens is 13,619 bp in length. It contains two ribosomal genes, 12 protein-coding (cox1-3, nad1-6, nad4L, atp6 and cytb) and 22 tRNA genes. The gene arrangement (GA2) in the mt genome of P. rufescens was the same as that of all other strongylid nematodes studied to date [5,26]. All of the 36 genes are transcribed in the same direction (5′ to 3′) (Figure 1). Overall, the genome is AT-rich, as expected for strongylid nematodes [12,20,27,28], with T being the most favoured nucleotide and C the least favoured. The nucleotide contents were 25.9% (A), 6.8% (C), 18.6% (G) and 48.6% (T) (Table 2). The longest non-coding (AT-rich) region, located between the genes trnA and trnP, was 223 bp in length (see Figure 1); its AT-content was 83.4%, significantly greater than for all other parts of the mt genome (Table 2).
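The percentages above imply an overall AT-content of 25.9% + 48.6% = 74.5%. A minimal sketch of how such values can be computed from a sequence is given below; the function names are illustrative assumptions, and ambiguous bases are simply ignored.

```python
# Illustrative nucleotide composition and AT-content (a sketch, not the authors' code).
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T, ignoring any other symbol."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the combined percentage of A and T."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]
```

Applying `at_content` separately to the whole genome and to the region between trnA and trnP would allow the comparison of AT-content described above.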

Ribosomal RNA genes
The rrnS and rrnL genes of P. rufescens were identified by sequence comparison with An. vasorum. The rrnS gene was located between trnE and trnS (UCN), and rrnL was between trnH and nad3. The two genes were separated from one another by the protein-encoding genes nad3, nad5, nad6 and nad4L (Figure 1). The sizes of the rrnS and rrnL genes of P. rufescens were 683 bp and 959 bp, respectively. The lengths of these two genes were similar to those of other metastrongyloids for which mt genomes are known (694-699 bp for rrnS, and 958-961 bp for rrnL [12,20,26-28]) (Figure 1), and amongst the shortest for metazoan organisms [29].

Protein-coding genes and codon usage
The prediction of initiation and termination codons for the protein-coding genes of P. rufescens (Table 3) revealed that the commonest start codon was ATT (for five of 12 proteins), followed by TTG (four genes), ATA (two genes) and ATG (one gene). Ten mt protein genes of P. rufescens were predicted to have a TAA or TAG translation termination codon. The other two protein genes ended in an abbreviated stop codon (T or TA) (Table 3). The codon usage for the 12 protein-encoding genes of P. rufescens was also compared with that of other metastrongyloid nematodes, Aelurostrongylus (Ae.) abstrusus, An. cantonensis, An. costaricensis and An. vasorum [12,20,28] (Table 4). All 64 codons were used. The preferred nucleotide usage at the third codon position of mt protein genes of P. rufescens reflects the overall nucleotide composition of the mt genome. At this position, T was the most frequently used nucleotide, and C the least frequently used. For P. rufescens, the codons ending in A had higher frequencies than the codons ending in G, which is similar to, for example, other members of the order Strongylida and Caenorhabditis elegans (Rhabditida), but distinct from Ascaris suum (Ascaridida) and Onchocerca volvulus (Spirurida) [14-17,30]. As the usage of synonymous codons is proposed to be preferred in gene regions of functional importance, codon bias appears to be linked to selection at silent sites and to translation efficiency [31,32].
The AT bias in the genome is also reflected in the amino acid composition of predicted proteins. The AT-rich codons represent the amino acids Phe, Ile, Met, Tyr, Asn and Lys, and GC-rich codons represent Pro, Ala, Arg and Gly. In the mt genome of P. rufescens, the most frequently used codons were TTT (Phe), TTA (Leu), ATT (Ile), TTG (Leu), TAT (Tyr), GGT (Gly), AAT (Asn) and GTT (Val). Six of these codons are AT-rich, and one of them is GC-rich. Seven of the eight codons contained an A or a T at two positions, except for GGT (Gly), which contained a T only in the third position. None of them had a C at any position. The least frequently used codons were CTC, CTG (Leu), GTC (Val), AGC (Ser), CCC (Pro), GCC (Ala), CAC (His), CGA (Arg), TCC (Ser), GGC (Gly) and ACC (Thr). All four GC-rich codons were represented here, and every codon had at least one C. When the frequencies of synonymous codons within the AT-rich group, such as Phe (TTT, 14.2%; TTC, 1.2%), Ile (ATT, 5.6%; ATC, 0.7%), Tyr (TAT, 5.6%; TAC, 0.9%) and Asn (AAT, 3.8%; AAC, 0.7%), were compared, the frequency was always less if the third position was a C.
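Codon frequencies and third-position composition of the kind reported above can be tallied with a short sketch such as the following. The function names are illustrative assumptions; each coding sequence is assumed to start in frame, and a trailing partial codon (such as an abbreviated stop codon T or TA) is ignored.

```python
# Illustrative codon usage tally (a sketch, not the authors' code).
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence read in frame from its first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_percentages(counts):
    """Convert codon counts to percentages of all counted codons."""
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


def third_position_composition(counts):
    """Return the percentage of A, C, G and T at the third codon position."""
    third = Counter()
    for codon, n in counts.items():
        third[codon[2]] += n
    total = sum(third.values())
    return {base: 100.0 * third[base] / total for base in "ACGT"}
```

Counts from the 12 protein-coding genes can be summed (Counter objects support addition) before percentages are calculated, so that frequencies refer to the full protein-coding complement.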

Transfer RNA genes
Twenty-two tRNA gene sequences were predicted in the mt genome of P. rufescens. These sequences ranged from 52 to 63 nt in length. The tRNA structures had a 7 bp amino-acyl arm, a 4 bp DHU arm, a 5 bp anticodon stem, a 7 base anticodon loop, a T always preceding an anticodon as well as a purine always following an anticodon. Twenty of the 22 tRNA genes (i.e. excluding the two trnS genes) have a predicted secondary structure with a 4 bp DHU stem and a DHU loop of 4-10 bases, in which the variable TψC arm and loop are replaced by a "TV-replacement loop" of 4-11 bases, in accordance with most nematodes whose mt genomes have been characterised [5]. The mt trnS for P. rufescens has a secondary structure consisting of a DHU replacement loop of 7 bases, 3 bp TψC arm, TψC loop of 4-6 bases and a  [29,34,35]. Overlaps of one to four nucleotides are found between the genes trnH and rrnL, nad4L and trnW, trnY and nad1, and trnI and trnR within the mt genome of P. rufescens.
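The ranking of candidate tRNA structures by the number of mismatches in each stem, as described in the annotation methods, can be sketched as follows. This is not the scoring used in the pipeline of Jex et al. [6]; the function names and input layout are assumptions, as is the decision to treat G-T (G-U in RNA) wobble pairs as paired.

```python
# Illustrative stem-mismatch scoring for tRNA candidates (a sketch under stated assumptions).
# Watson-Crick pairs plus G-T wobble, written in the DNA alphabet (assumption).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def stem_mismatches(five_prime, three_prime):
    """Count unpaired positions between the two strands of one stem.

    Both strands are given 5' to 3'; the 3' strand is reversed for pairing.
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("stem strands must have equal length")
    paired = zip(five_prime.upper(), reversed(three_prime.upper()))
    return sum((a, b) not in PAIRS for a, b in paired)


def rank_candidates(candidates):
    """Order candidates from fewest to most total stem mismatches.

    candidates: list of (name, [(five_prime, three_prime), ...]) (hypothetical layout)
    """
    scored = [
        (sum(stem_mismatches(a, b) for a, b in stems), name)
        for name, stems in candidates
    ]
    return [name for _, name in sorted(scored)]
```

The best-ranked candidates for each amino acid group would then be compared against known nematode tRNA sequences, as in the BLASTn step described in the methods.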

Amino acid sequence comparisons and genetic relationships of P. rufescens with metastrongyloid and other nematodes
The amino acid sequences predicted from individual protein-encoding mt genes of P. rufescens were compared with those of Ae. abstrusus, An. cantonensis, An. costaricensis, An. vasorum, Dictyocaulus viviparus and and O. dentatum and S. vulgaris (strongyloids) (Figure 2) (pp = 1.00).

Implications
The characterisation of the mt genome of P. rufescens provides genetic markers for future population genetic and systematic studies. As sequence variation in ITS-2 nuclear rDNA is usually low within most species of strongylid nematodes [36], mt DNA is better suited for assessing population genetic variation. Therefore, PCR-based analytical approaches, using cox1, nad1 and nad4 (displaying varying levels of within-species divergence), could be used to study haplotypic variation in P. rufescens populations in sheep and goats and also in molluscan hosts. Given that species complexes are commonly encountered in bursate nematodes [1,4,36], it would be interesting to prospect for cryptic species, to assess whether distinct genotypes/haplotypes of P. rufescens exist in sheep and goats as well as snails [37], and to establish whether particular subpopulations of P. rufescens occur in particular environments or geographical regions/countries, and have particular patterns of transmission. It would also be interesting to assess the genetic structure of P. rufescens populations using PCR-coupled mutation scanning and sequencing of selected mt gene regions (such as cox1 and nad4), and to estimate mt DNA diversity within populations and gene flow among populations. Findings for this lungworm (with an indirect life cycle via a molluscan intermediate host) could be compared with those for D. viviparus (with a direct life cycle), which has been reported to have surprisingly low mt DNA diversity within populations and limited gene flow among populations [38,39]. The complete mt genome of P. rufescens provides a basis for extended comparative mt genomic/proteomic analyses of other protostrongyloids of ruminants, including P. brevispiculum, P. davtiani, P. hobmaieri, P. rushi, P. skrjabini, P. stilesi, Cystocaulus ocreatus, Neostrongylus lineatus and Muellerius capillaris (the last of which is a particularly pathogenic parasite in goats), and those of other animal hosts, such as lagomorphs and pinnipeds. The utility of predicted mt proteomic datasets, with their high phylogenetic signal and consistently high nodal support values in recent systematic analyses [6,12,27,28,33], provides an opportunity to reassess the evolutionary relationships of lungworms (order Strongylida). For example, the family Protostrongylidae is distinguished from other metastrongyloids by only a couple of morphological characters, i.e., the gubernaculum and telamon in adult male worms [40], and it is proposed that protostrongyloids of lagomorphs originated from their ancestors primarily infecting sheep, goats, antelopes and deer [41]. Analyses of inferred mt proteomic data sets from a range of protostrongyloids should allow relationships within the family Protostrongylidae and also the origin of the protostrongylids of lagomorphs to be assessed. In addition, there has been considerable debate as to the relationships among suborders within the Strongylida, based on the use of phenotypic characters [42].
On one hand, it has been hypothesized that the suborder Metastrongylina (to which species of Protostrongylus, Metastrongylus, Aelurostrongylus and Angiostrongylus belong) originated from ancestors in the Strongylina [43,44] or Trichostrongylina [45,46]. On the other hand, it has been proposed that the Metastrongylina gave rise to the Strongylina [47].
To date, molecular phylogenetic analyses of nuclear ribosomal DNA sequence data [48,49] have suggested that the Trichostrongylina are basal to the Metastrongylina, which represented a monophyletic assemblage. However, Jex et al. [6], using mitochondrial sequence data, showed that the major suborders within the Strongylida (e.g., the Metastrongylina, Strongylina and Trichostrongylina) were each resolved as distinct, monophyletic clades with maximum statistical and nodal support (posterior probability = 1.00; bootstrap = 100). A detailed analysis using inferred mt proteomic data sets would allow an independent assessment of the systematic relationships of these suborders.

Conclusions
Comparative analyses of proteomic sequence datasets inferred from the mt genomes of P. rufescens and other lungworms indicate that P. rufescens is closely related to Ae. abstrusus, An. cantonensis, An. costaricensis and An. vasorum. The mt genome determined herein should provide a source of markers for future investigations of P. rufescens. Molecular tools, employing such mt markers, are likely to find applicability in studies of the population biology of this parasite and the systematics of lungworms.