The complete mitochondrial genome of the Columbia lance nematode, Hoplolaimus columbus, a major agricultural pathogen in North America

Background: The plant-parasitic nematode Hoplolaimus columbus is a pathogen with a wide host range that causes substantial yield losses in agricultural fields in North America. This study describes, for the first time, the complete mitochondrial genome of H. columbus from South Carolina, USA.
Methods: The mitogenome of H. columbus was assembled from Illumina 300 bp paired-end reads. It was annotated and compared to other published mitogenomes of plant-parasitic nematodes in the superfamily Tylenchoidea. The phylogenetic relationships between H. columbus and six other genera of plant-parasitic nematodes were examined using protein-coding genes (PCGs).
Results: The mitogenome of H. columbus is a circular AT-rich DNA molecule 25,228 bp in length. The annotation comprises 12 PCGs, 2 ribosomal RNA genes, and 19 transfer RNA genes. No atp8 gene was found in the mitogenome of H. columbus, but long non-coding regions were observed, in agreement with reports for other plant-parasitic nematodes. The mitogenomic phylogeny of plant-parasitic nematodes in the superfamily Tylenchoidea agreed with previous molecular phylogenies. Mitochondrial gene synteny in H. columbus was unique but similar to that reported for other closely related species.
Conclusions: The mitogenome of H. columbus is unique within the superfamily Tylenchoidea but exhibits similarities in both gene content and synteny to other closely related nematodes. Among other uses, this new resource will facilitate population genomic studies in lance nematodes from North America and beyond.


Keywords: Hoplolaimus, Lance nematode, Ecdysozoa, Mitochondrial genome, Phylogeny, de novo assembly

Background
In the phylum Nematoda, plant-parasitic species can be distinguished from animal parasites and from non-parasitic relatives by their well-developed mouthparts and stylet, which allow them to penetrate sturdy plant cell walls while digging and feeding [1,2]. A number of plant-parasitic nematodes are currently recognized as major pathogens of agricultural crops worldwide, causing losses of more than 150 billion USD annually in the USA [3]. In a recent USA survey of agricultural pathogens, six main genera of plant-parasitic nematodes were recognized as serious crop threats [1]: cyst nematodes (Heterodera spp.); lance nematodes (Hoplolaimus spp.); root-knot nematodes (Meloidogyne spp.); lesion nematodes (Pratylenchus spp.); reniform nematodes (Rotylenchulus spp.); and dagger nematodes (Xiphinema spp.). Moreover, some of these pathogens, such as lance nematodes, can damage horticultural fields, golf courses, and turfgrasses. These plant-parasitic nematodes can also cause serious indirect environmental problems by favoring chemical overuse during nematode management [4].
Lance nematodes are migratory ecto-endoparasites with a distinct cephalic region and a massive, well-developed stylet [4]. According to the current taxonomic view, which relies on a combination of molecular and morphological characters, lance nematodes belong to the class Chromadorea, infraorder Tylenchomorpha [2,4-6]. They have a wide host range that includes, among others, turf grasses, cereals, soybean, corn, cotton, sugar cane, and some trees [1,4,7]. They live in soil, feed on plant roots, move inside or around plant tissue, and destroy cortex cells, which can result in necrotic root lesions [8]. Hoplolaimus columbus, also known as the 'Columbia lance nematode', is considered among the most economically important species in the world [1]. This nematode was described as a new species from samples collected in Columbia, South Carolina, USA. Later, the same species was reported in the states of North Carolina, Georgia, Alabama, and Louisiana [8-10]. In the field, H. columbus is parasitic on cotton and soybean, on which pathogenicity has been demonstrated; production losses for cotton are typically 10-25%, and losses for soybean can be as high as 70% in the southeastern USA [11-14]. Although H. columbus has been found in some Asian countries [2], there are no reports yet of crop damage in that region. Hoplolaimus columbus belongs to the subgenus Basirolaimus together with 17 other nematode species [2]. Species in the subgenus Basirolaimus have been reported in Asian countries, including China, India and Japan [2]. Nonetheless, the only species in the subgenus Basirolaimus so far recognized as a major agricultural pathogen is H. columbus (i.e. in the USA) [2,15,16]. Considering its wide distribution and damage to crops, a better genomic understanding of H. columbus would help to clarify its population genetic structure and its effects, or the lack thereof, on commercially relevant crops.
Morphological characteristics alone are of limited use for distinguishing among closely related species in the genus Hoplolaimus, given the remarkable similarity of their internal and external organs and body parts [5,7]. Molecular markers have been shown to be useful for species identification and for understanding phylogenetic relationships and population genetics in different species of lance nematodes [5,15-17]. Although previous work has provided valuable insights into nematode phylogeny [18-21], short nuclear and/or mitochondrial gene markers are sometimes uninformative for revealing fine to moderate population genetic structure within a species [22]. This shortcoming can be addressed by developing genomic resources for lance nematodes. Although the genomes of a number of plant-parasitic nematodes have been sequenced and analyzed before [23-28], no genomic resources exist yet for lance nematodes.
In this study, we sequenced and de novo assembled the complete mitochondrial genome of the Columbia lance nematode H. columbus. In addition to annotating and providing a detailed description of the mitochondrial chromosome in this crop pathogen, we used protein-coding genes to explore phylogenetic relationships among plant-parasitic nematodes belonging to the class Chromadorea, superfamily Tylenchoidea.

Collection of specimens, DNA extraction and whole-genome amplification
Soil samples containing specimens of H. columbus were collected from the Edisto Research Center in Blackville, South Carolina (33°21'56.2"N, 81°19'46.9"W) and transported to Clemson University for further study. In the laboratory, nematodes were first extracted from soil samples using the sugar centrifugal flotation method [29]. A few fixed specimens were then identified using diagnostic key characters under an optical microscope [30]. Next, live nematodes (n = 9) were submerged in distilled water, starved for two weeks, and placed in a 3% hydrogen peroxide solution (Aaron Industry, Clinton, SC, USA) for 5 min before being washed three times in distilled water to eliminate potential microorganisms inhabiting their surface. Then, the same nematodes were placed separately in DNA Away solution (Molecular Bio-Products Inc., San Diego, CA, USA) to eliminate potential DNA and DNase contamination and washed three times using PCR-grade water. Total DNA from each H. columbus specimen was extracted using a Sigma-Aldrich extract-N-Amp kit (XNAT2) (Sigma-Aldrich, St. Louis, MO, USA). The whole genome size of H. columbus was estimated to be ~300 million bp using flow cytometry [31,32]. Whole-genome amplification (WGA) of each individual nematode was then performed using an Illustra Ready-To-Go GenomiPhi V3 DNA amplification kit (GE Healthcare, Chicago, IL, USA) following the manufacturer's instructions. Three WGA replicates per nematode were performed, and the replicate with the highest DNA concentration, as measured with a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA), was selected for next-generation sequencing library preparation.

Library preparation and whole genome shotgun sequencing
The Nextera XT kit (Illumina, San Diego, CA, USA) was used for library preparation following the manufacturer's instructions. Library concentration and fragment size distribution after library preparation were determined using a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA), respectively. Sequencing was conducted on an Illumina MiSeq with the v3 chemistry kit. A total of ~56 million reads (paired-end 300 bp) were generated, and 98.11% of these reads were of high quality. Approximately 13 Gb of sequence data had a quality score (Q-score) > 30.
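The sequencing figures above can be related to one another with simple arithmetic. The following Python sketch is illustrative only: it assumes that "~56 million reads" counts individual 300 bp reads rather than read pairs, and it uses the ~300 million bp flow-cytometry genome size estimate reported in the previous section. The function and variable names are hypothetical.

```python
# Back-of-the-envelope sequencing yield, using figures quoted in the text.
# Assumption: "~56 million reads" counts individual reads, not read pairs.
READS = 56_000_000          # approximate number of reads
READ_LENGTH_BP = 300        # nominal read length
Q30_BASES = 13e9            # ~13 Gb with Q-score > 30
GENOME_SIZE_BP = 300e6      # flow-cytometry estimate of nuclear genome size


def total_yield_bp(reads: int, read_length: int) -> int:
    """Nominal number of sequenced bases."""
    return reads * read_length


def q30_fraction(q30_bases: float, total_bases: float) -> float:
    """Share of nominal bases with Q-score > 30."""
    return q30_bases / total_bases


def nominal_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth if reads were spread evenly over the genome."""
    return total_bases / genome_size


total = total_yield_bp(READS, READ_LENGTH_BP)
print(f"nominal yield: {total / 1e9:.1f} Gb")
print(f"Q30 share:     {q30_fraction(Q30_BASES, total):.1%}")
print(f"nominal depth: {nominal_coverage(total, GENOME_SIZE_BP):.0f}x")
```

Under these assumptions the nominal yield is about 16.8 Gb, of which roughly three quarters exceeds Q30. The depth figure is only a nominal average: whole-genome amplification tends to produce uneven coverage, and the high-copy mitochondrial genome is expected to be covered far more deeply than the nuclear genome.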

Mitophylogenomics in the superfamily Tylenchoidea
The phylogenetic analysis included full mitochondrial genomes belonging to a total of 14 nematode species in the class Chromadorea, of which 12 were plant-parasitic nematodes in the superfamily Tylenchoidea (Table 1). Caenorhabditis elegans (non-parasitic) and Ascaris suum (animal-parasitic) were used as outgroup terminals in our phylogenetic analysis. Each of the 12 PCGs (see results) was first aligned using MAFFT version 7 [46], and the output files were converted into Phylip format using the web server Phylogeny.fr [47,48]. Then, poorly aligned positions in each of the 12 PCG sequence alignments were trimmed using BMGE (block mapping and gathering with entropy) [49]. SequenceMatrix [50] was used to concatenate all 12 PCG alignments in the following order: atp6-cox1-cox2-cox3-cytb-nad1-nad2-nad3-nad4-nad4L-nad5-nad6. The GTR + G nucleotide substitution model (Additional file 1: Table S1), selected using SMS (smart model selection) (http://www.atgc-montpellier.fr/sms/) [51], was used for maximum likelihood (ML) phylogenetic analysis conducted on the web server IQ-TREE (http://www.iqtree.org/) [52] with the default settings but enforcing the GTR + G model of nucleotide substitution. A total of 100 bootstrap replicates were employed to explore support for each node in the resulting phylogenetic tree, which was depicted using the web server iTOL (Interactive Tree of Life) (https://itol.embl.de/) [53].
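The concatenation step can be illustrated with a short Python sketch. This is not the SequenceMatrix implementation; it is a minimal stand-in that assumes each trimmed alignment has already been loaded as a dictionary mapping taxon name to aligned sequence. The function name, the gap-filling behavior for taxa missing a gene, and the toy sequences are all hypothetical.

```python
# Gene order used for concatenation, as stated in the text.
GENE_ORDER = ["atp6", "cox1", "cox2", "cox3", "cytb", "nad1",
              "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6"]


def concatenate_alignments(alignments, gene_order, missing="-"):
    """Join per-gene alignments into one supermatrix.

    alignments: dict gene -> dict taxon -> aligned sequence (equal length
    within a gene). Taxa absent from a gene are padded with `missing`.
    Returns (supermatrix, partitions), where partitions lists
    (gene, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted({t for g in gene_order for t in alignments[g]})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene in gene_order:
        block = alignments[gene]
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(block.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Toy two-gene example with made-up sequences (not real data).
toy = {
    "atp6": {"taxonA": "ATG-", "taxonB": "ATGC"},
    "cox1": {"taxonA": "TTA"},
}
matrix, parts = concatenate_alignments(toy, ["atp6", "cox1"])
print(matrix)
print(parts)
```

Recording the partition coordinates alongside the supermatrix keeps the gene boundaries available, which is useful if a partitioned model is explored later; the analysis described here applied a single GTR + G model to the whole matrix.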
According to the prediction by MiTFi, the mitogenome of H. columbus comprises 19 tRNA genes, ranging in length from 50 bp (trnQ) to 73 bp (trnC), including 2 trnW genes with different anticodons (UCA and CCA). Most of the tRNA genes were encoded in the same direction as the PCGs and the two rRNA genes (rrnS and rrnL), except for the trnR gene, which was encoded in the opposite direction. Four tRNA genes were missing: trnA, trnM, trnN and trnT. Structure predictions of the different tRNAs are shown in Fig. 2. Most often, nematode tRNAs do not exhibit a regular canonical cloverleaf structure, either lacking the T-arm or missing both arms [54,55]. In H. columbus, variable loops were found on the acceptor stem (trnC and trnE), on the T-stem (trnC, trnR, trnS1 and trnV), and on the anticodon arm (trnE, trnF, trnG, trnR, trnV and trnY). The T-arm was missing in trnE, trnG, trnH, trnL1, trnL2, trnP, trnV and trnY. The D-arm was missing in trnS1 and trnS2. The predicted structure of the trnW(tga) gene had a T-stem but no T-loop.
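The tRNA count can be reconciled with the genes reported as missing and duplicated. The sketch below assumes, as a baseline not stated in the text, the standard set of 22 tRNA genes typical of metazoan mitogenomes (one per amino acid, with two each for leucine and serine). The names used are illustrative.

```python
# Assumed baseline: the 22 tRNA genes typical of metazoan mitogenomes.
CANONICAL_TRNAS = {
    "trnA", "trnC", "trnD", "trnE", "trnF", "trnG", "trnH", "trnI",
    "trnK", "trnL1", "trnL2", "trnM", "trnN", "trnP", "trnQ", "trnR",
    "trnS1", "trnS2", "trnT", "trnV", "trnW", "trnY",
}

# Reported in the text for H. columbus.
MISSING = {"trnA", "trnM", "trnN", "trnT"}
COPY_NUMBER = {"trnW": 2}  # two trnW genes with different anticodons


def expected_trna_count(canonical, missing, copy_number):
    """Count tRNA genes: canonical set minus missing, plus extra copies."""
    present = canonical - missing
    return sum(copy_number.get(gene, 1) for gene in present)


print(expected_trna_count(CANONICAL_TRNAS, MISSING, COPY_NUMBER))
```

With 4 of the 22 genes absent and trnW present twice, the tally is 22 - 4 + 1 = 19, matching the number of tRNA genes predicted by MiTFi.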
The rrnS and rrnL genes identified in the mitochondrial genome of H. columbus were 598 bp and 901 bp long, respectively (Fig. 1). The rrnS gene was located between trnK and trnS2. The rrnL gene was located next to nad3, in agreement with that reported for the mitogenomes of Pratylenchus vulnus, Meloidogyne chitwoodi and M. incognita. The overall nucleotide composition of the rrnS gene was A = 30.10%, T = 39.96%, C = 11.37%, and G = 21.57%, and that of the rrnL gene was A = 32.41%, T = 45.84%, C = 7.33%, and G = 14.43%.
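A quick consistency check on the reported base compositions is to confirm that the four percentages for each gene sum to approximately 100%. The Python sketch below uses only the values quoted above; the function names and the 0.05 tolerance are illustrative choices.

```python
# Base compositions (percent) as quoted in the text.
REPORTED = {
    "rrnS": {"A": 30.10, "T": 39.96, "C": 11.37, "G": 21.57},
    "rrnL": {"A": 32.41, "T": 45.84, "C": 7.33, "G": 14.43},
}


def composition_total(comp):
    """Sum of the four base percentages, rounded to two decimals."""
    return round(sum(comp.values()), 2)


def at_content(comp):
    """Combined A + T percentage, rounded to two decimals."""
    return round(comp["A"] + comp["T"], 2)


def sums_to_100(comp, tolerance=0.05):
    """True if the percentages add up to 100 within rounding error."""
    return abs(composition_total(comp) - 100.0) <= tolerance


for gene, comp in REPORTED.items():
    print(gene, composition_total(comp), at_content(comp), sums_to_100(comp))
```

As quoted, the rrnL values sum to 100.01%, which is consistent with rounding, whereas the rrnS values sum to 103.00%. This suggests that at least one of the rrnS percentages may be misreported; the check cannot indicate which one.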
Two long non-coding regions were identified, which might be useful in the future for nematode population genetics. One long non-coding region was located between the nad4L and trnR genes (NCR1, 7661 bp), and the second one was located between the trnR and trnK genes (NCR2, 3157 bp) (Fig. 1, Table 2). Long non-coding regions > 4000 bp have been reported in other plant-parasitic nematodes such as Pratylenchus vulnus (6847 bp), Meloidogyne chitwoodi (5404 bp), and Meloidogyne incognita (4097 bp), but H. columbus has the longest non-coding regions reported so far. The two regions were heavily A + T rich, with an overall base composition of A = 29.79%, T = 39.86%, C = 10.90%, and G = 19.44% in NCR1, and A = 28.10%, T = 50.71%, C = 5.35%, and G = 15.84% in NCR2. A large number of microsatellite repeats were detected in the two NCRs (n = 104 and 72 in NCR1 and NCR2, respectively) (Additional file 1: Table S3). Tandem Repeats Finder detected 13 repeats in NCR1 (the longest repeat consensus was 237 bp and the shortest was 34 bp) and 5 repeats in NCR2 (the longest repeat consensus was 23 bp and the shortest was 18 bp) (Additional file 1: Table S4). No tandem repeats were found in the other, shorter intergenic spaces. Secondary structure prediction analysis using RNAfold detected a large number of hairpin structures in the two long NCRs (Additional file 2: Figure S1). Altogether, the high A + T content, tandemly repeated sequences, and predicted hairpin secondary structures suggest that these two NCRs are possibly involved in the initiation of replication in the mitochondrial genome of H. columbus; all these features have been observed in the putative mitochondrial genome control region/D-loop of other invertebrates [56-60].
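To illustrate what detecting microsatellite repeats involves, the following Python sketch scans a sequence for short motifs repeated in tandem using a regular expression. It is not the tool or the thresholds used in this study, which are not specified in this section; the motif size range (1 to 6 bp), the minimum of four repeats, the function name, and the toy sequence are assumptions made for illustration.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, unit, n_repeats) for perfect tandem repeats.

    start is 0-based. The shortest repeating unit is preferred, so
    'ATATATAT' is reported as (AT)4 rather than (ATAT)2.
    Thresholds are illustrative assumptions, not the study's settings.
    """
    seq = seq.upper()
    # Lazy quantifier tries the shortest unit first; the backreference
    # requires at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence (not real data): an (AT)4 repeat flanked by other bases.
print(find_microsatellites("GGATATATATCC"))
```

A scan of this kind only finds perfect repeats. Dedicated programs such as Tandem Repeats Finder also tolerate mismatches and indels, which is why counts from different tools and settings are not directly comparable.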
The ML phylogenetic analysis (Fig. 3) confirmed the monophyly of the superfamily Tylenchoidea and placed H. columbus in a monophyletic clade together with Radopholus similis, Rotylenchulus reniformis, Heterodera glycines and Globodera ellingtonae, in agreement with previous molecular phylogenies [24-28]. Our results also supported the placement of H. columbus in the family Hoplolaimidae. Moreover, the analysis revealed Pratylenchus vulnus to be sister to the genus Meloidogyne, and all species belonging to the genus Meloidogyne clustered together into a well-supported monophyletic clade. De Ley & Blaxter [6] suggested classifying Meloidogyninae as a fully separate family based on SSU rDNA phylogenies, and their view is supported by our mitophylogenomic analysis.
The synteny of protein-coding genes, ribosomal RNA genes, and non-coding regions observed in H. columbus was compared with that of other species in the superfamily Tylenchoidea with completely annotated mitogenomes available in GenBank (Fig. 3). The mitogenome annotation of Rotylenchulus reniformis was not available in GenBank, so its gene order was predicted in this study using the web server MITOS. The gene order found in H. columbus is unique but somewhat similar to that reported for other species in the same superfamily. A visual comparison also suggests that synteny might represent a useful phylogenetic character in this clade: gene order tended to be more similar among more closely related species (Fig. 3), although variability was relatively high, as expected given that the comparison was made among different genera of the superfamily Tylenchoidea.

Conclusions
This study de novo assembled, for the first time, the mitochondrial genome of H. columbus, which also represents the first mitochondrial genome description for the genus Hoplolaimus. The mitogenome of H. columbus is relatively large compared to that of other plant-parasitic nematodes, exhibits long non-coding regions, and has a unique gene order within the superfamily Tylenchoidea. The mitophylogenomic analysis also agreed with a previous phylogenetic hypothesis established using the SSU rDNA marker and confirmed the taxonomic relationships among species in the superfamily Tylenchoidea. Ultimately, we envision that this new genomic resource for H. columbus will help to improve our knowledge of the biology and population genetics of this economically and ecologically relevant agricultural pathogen in both Asia and North America.
Additional file 1: Table S1. Model selection for phylogenetic analysis by smart model selection (SMS). Table S2. Microsatellite repeats in intergenic spaces. Table S3. Microsatellite repeats in non-coding regions. Table S4. Tandem repeats in non-coding regions.
Additional file 2: Figure S1. Secondary structure prediction analysis of non-coding regions in the mitochondrial genome of Hoplolaimus columbus by FORNA.