Highly rearranged mitochondrial genome in Falcolipeurus lice (Phthiraptera: Philopteridae) from endangered eagles

Background Fragmented mitochondrial (mt) genomes and extensive mt gene rearrangements have been frequently reported from parasitic lice (Insecta: Phthiraptera). However, relatively little is known about the mt genomes from the family Philopteridae, the most species-rich family within the suborder Ischnocera. Methods Herein, we use next-generation sequencing to decode the mt genome of Falcolipeurus suturalis and compare it with the mt genome of F. quadripustulatus. Phylogenetic relationships within the family Philopteridae were inferred from the concatenated 13 protein-coding genes of the two Falcolipeurus lice and members of the family Philopteridae using Bayesian inference (BI) and maximum likelihood (ML) methods. Results The complete mt genome of F. suturalis is a circular, double-stranded DNA molecule 16,659bp in size that contains 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and three non-coding regions. The gene order of the F. suturalis mt genome is rearranged relative to that of F. quadripustulatus, and is radically different from both other louse species and the putative ancestral insect. Phylogenetic analyses revealed clear genetic distinctiveness between F. suturalis and F. quadripustulatus (Bayesian posterior probabilities=1.0 and bootstrapping frequencies=100), and that the genus Falcolipeurus is sister to the genus Ibidoecus (Bayesian posterior probabilities=1.0 and bootstrapping frequencies=100). Conclusions These datasets help to better understand gene rearrangements in lice and the phylogenetic position of Falcolipeurus and provide useful genetic markers for systematic studies of bird lice. Graphic abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13071-021-04776-5.

Lice are permanent, obligate, and often host-specific ectoparasites commonly found on birds and mammals. The suborder Ischnocera (approximately 3120 species) is currently divided into two families, Philopteridae (approximately 2600 species) and Trichodectidae [12]. Complete mt genomes of only a limited number of philopterid species have been sequenced: Bothriometopus macrocnemis [13], Campanulotes bidentatus compar [14], Campanulotes compar [11], Coloceras sp. [15], Falcolipeurus quadripustulatus [11], Ibidoecus bisignatus [15], Columbina picui, Columbina cruziana, and Columbicola columbae [16]. These studies have found extensive gene rearrangements in philopterid mt genomes. A recent report has demonstrated that highly fragmented, mt minicircles are present in four species of the genus Columbicola [16], indicating that fragmented mt genomes are more prevalent in parasitic lice than previously hypothesized [11]. These studies demonstrate that our knowledge of the structure in mt genomes of bird lice is far from comprehensive. Additional data is needed to understand the pattern and mechanisms of genome fragmentation and rearrangement in bird lice.
Owing to maternal inheritance, relatively high evolution rate, conserved gene components, and low rate of recombination, the mt genome has been widely used as a genetic marker for comparative, evolutionary, and phylogenetic analysis at different taxonomic levels [17,18]. In this study, we (i) characterize the complete mt genome sequences of F. suturalis from the tawny eagle, (ii) compare it with that of F. quadripustulatus from the vulture, and (iii) assess the phylogenetic position of Falcolipeurus within the Philopteridae.

Sample collection and DNA extraction
Adult samples of F. suturalis were taken from a tawny eagle Aquila rapax in the Beijing Wildlife Rescue Center, China. Lice were washed twice with sterile physiological saline solution (0.85%), and initial identification as F. suturalis made based on morphology and host species ( Fig. 1) [19], and then stored in 95% (v/v) ethanol at −40 °C. Total genomic DNA was extracted from 60 individual lice (30 females and 30 males) using the DNeasy Tissue Kit (Promega, Madison, USA) following manufacturer instructions. Two pairs of primers [20], mtd6-mtd11 and 12SA-12SB, were used to amplify fragments of cox1 (600 bp) and rrnS (350 bp) genes, respectively, for use as assembly baits [4].

Sequencing, assembling and verification
The quality of extracted genomic DNA was tested by agarose gel electrophoresis [21] and DNA concentration were quantified by Qubit 2.0 Fluorometer (Thermo Scientific). A genomic DNA library (350 bp inserts) was prepared and sequenced by Novogene Bioinformatics Technology Co., Ltd. (Tianjin, China) using Illumina HiSeq 2500 (250 bp paired-end reads). Raw reads were filtered with PRINSEQ [22]. Preliminary cox1 and rrnS sequences of F. suturalis were used as initial references for assembly in Geneious v 11.1.5 (minimum overlap identity = 99%, minimum overlap = 150 bp, maximal gap size = 5 bp) [23]. Genome size and circular organization were verified by long PCR using four pairs of specific  Table S1; Additional file 2: Figure S1).

Annotation and sequence analysis
Protein-coding and rRNA genes were identified by alignment to homologous genes of previously sequenced mt genome of the vulture louse F. quadripustulatus [11] using the MAFFT v7.122 software [24]. tRNA genes were identified using ARWEN [25] and tRNAscan-SE [26], with manual adjustment. Annotated mt genomes were illustrated using the visualize module of MitoZ [27]. Nucleotide composition and amino acid sequences of each protein-coding genes and codon usage were analyzed using MEGA v6.0 [28]. Asymmetry of base composition was calculated as the following formula: ATskew = (A−T)/(A+T), GC-skew = (G−C)/(G+C) [29].

Phylogenetic analysis
A total of 23 mt genomes from lice, including ten species from the Philopteridae, three species from the Trichodectidae, one elephant louse, Haematomyzus elephantis, and eight species of sucking lice, were used for phylogenetic analysis [3, 5, 7, 9-16, 20, 30-32], with one wallaby louse, Heterodoxus macropus (GenBank: AF270939), used as an outgroup [33]. Amino acid sequences of 12 protein-coding genes (except for nad2 because it is unidentified in the H. elephantis mt genome) were aligned individually using MAFFT [23]. Alignments of the individual genes were concatenated into a single dataset. Ambiguously aligned areas were removed by Gblocks 0.91b with the option of less stringent selection [34], and subjected to phylogenetic analyses under Bayesian inference (BI) and maximum likelihood (ML). BI was conducted with four independent Markov chains run for 5,000,000 metropolis-coupled MCMC generations, sampling trees every 500 generations in MrBayes v3.2.7 [35]. The initial 25% (2500) trees of each MCMC were discarded as burn-in and the majority-rule consensus tree used to calculate Bayesian posterior probabilities (Bpp). For ML analysis, the alignment was partitioned by gene, and bootstrapping frequencies (Bf ) performed using the rapid bootstrapping option with 100 iterations. The MtART (all 12 genes) model was selected as the most suitable model of evolution by ProtTest v2.4 based on the Akaike information criterion (AIC) [36]. ML analyses were computed using RAxML v2.2.3 [37]. Phylograms were drawn using FigTree v.1.31.
F. suturalis can be identified by the following morphological characters: (1) body slender in deep dark brown colour; (2) head longer than wide with pointed anterior and slightly broad posterior; (3) dorsally, 3 dark brown spots on each side of preantennal area and 1 dark brown horizontal stripe near antennal area; (4) antennae sexually dimorphic, male antennae stout and long on segment 1, female antenna slender and short on all 5 segments; (5) thorax widened from top to bottom; (6) dorsal thorax with 1 pronotum and 2 pteronotums, pronotum larger than pteronotum, 1 stout short seta on posterior margin of each pteronotum; (7) ventral thorax, mesosternum with 1 pair dark brown spots near midlegs; (8) 3 pairs of legs slender, forelegs small, midlegs and hindlegs progressively larger; (9) abdomen longer than wide; (10) dorsal abdomen deep dark brown colour, 2nd segment slightly narrower than thoracic pteronotum and with 2 divided tergite plates; (11) tergites on dorsal abdominal segments 3 to 6 slightly narrow in middle on both male and female; (12) male with 2 divided tergite plates laterally and female with 1 tergite on each dorsal abdominal segment of 7 and 8; (13) ventral abdomen with 9 segments and in white colour, 1 pair of dark brown spots laterally on each segment of 2 to 9; (14) female with 1 small dark brown spot in inverted triangle shape in middle of ventral abdominal segments 8 and 9.

Genome organization
A total of 3.7 Gb of data (about 20-fold coverage) was obtained from the Illumina HiSeq 2500 platform, raw sequencing of 15,178,382 paired reads. Reads were cleaned (9,630,532 clean pairs) and assembled. The longest assembled contig was 16,659 bp in size and was the complete mt genome of F. suturalis (GenBank accession: MW696813). All 37 mt genes typical of metazoan mt genomes were present. Gene arrangement was distinct from that of F. quadripustulatus [11] and other philopterids [15]. Overall nucleotide composition was: A = 27.8%, T = 44.8%, C = 11.1%, G = 16.3%. All mt genes were encoded on the heavy strand, except tRNA-Arg. Three overlapping regions in the mt genome were observed: nad4L/nad1, tRNA-His/tRNA-Asp and tRNA-Asp/tRNA-Arg, ranging from 4 to 8 bp overlaps (Table 1). In additional, 22 intergenic regions were observed, ranging from 1 to 180 bp in size. The longest spacer was between tRNA-S 2 and tRNA-G (Table 1).
Total A + T and G + C content of the complete mt genome was 73.0% and 27.0%, respectively, consistent with the nucleotide content of lice reported in previous studies [11,15] (Table 2). Negative AT skew (−23.3) and positive GC skew (18.9) were found ( Table 2), consistent with other louse mt genomes [11,15]. All bird lice from the Philopteridae reported to date have strong strand asymmetry (GC skew between 6.3% and 38.1%) ( Table 2).

Comparative analyses between F. suturalis and F. quadripustulatus
The entire mt genome of F. suturalis is 537 bp longer than that of F. quadripustulatus [11]. A comparison of nucleotide and amino acid sequences in each proteincoding gene of the two Falcolipeurus species is given in Table 3. Nucleotide sequence differences across the entire mt genome was 31.4%. The magnitude of nucleotide sequence variation in each gene between F. suturalis and F. quadripustulatus ranged from 13.2 to 27.5%. The greatest variation was observed in atp8 (27.5%), and the least difference was found in cytb (13.2%) ( Table 3). For rrnL and rrnS, sequence difference is 28.4% and 14.6%, respectively, between F. suturalis and F. quadripustulatus (Table 3). Amino acid sequences inferred from individual mt protein genes of F. suturalis were also compared with those  . 2 The mt genome of Falcolipeurus suturalis. All genes are on the same DNA strand and are transcribed clockwise, except for tRNA-Arg (R). Protein-coding and rRNA genes are indicated with the standard nomenclature. tRNA genes are indicated with the one-letter code of their corresponding amino acids. There are two tRNA genes for leucine: L 1 for codons CUN and L 2 for UUR; and two tRNA genes for serine: S 1 for codons AGN and S 2 for UCN. "NCR1" refers to the first non-coding region. "NCR2" refers to the second non-coding region. "NCR3" refers to the third non-coding region of F. quadripustulatus. Amino acid sequence differences ranged from 4.5 to 41.2%, with cox1 being the most conserved protein, and atp8 the least conserved (Table 3). Our results are consistent with other species-level comparisons in lice. For example, amino acid divergence in the 13 protein-coding genes of C. picui and C. cruziana was 5.5-50% [16], while between C. bidentatus compar and C. compar it was 0-37.3% [11,14]. Taken together, the molecular evidence presented here supports that F. suturalis and F. quadripustulatus represent distinct louse species.

Gene rearrangement
The mt genome arrangements of both Falcolipeurus species differ substantially from those of other species in the Philopteridae and from the putative ancestral insect (Fig. 4). Compared with the putative ancestral insect, no mt gene arrangements are shared with the mt genomes of Falcolipeurus species, as all genes are moved and/or inverted relative to their ancestral positions (Fig. 4). Of the 13 protein-coding genes, four (nad5, nad4, nad4L, and nad1) are inverted in both Falcolipeurus species relative to the putative ancestral insect. Only two gene   blocks are shared between Bothriometopus and the putative ancestral insect: tRNA-Gly-nad3 and atp8-atp6 [13], while one gene block is shared between Campanulotes species and the ancestral insect: atp8-atp6 [11,14,33]. Three gene blocks, tRNA-Val-cox3, tRNA-Tyr-cox2, and tRNA-Leu CUN -nad4L, are shared between Falcolipeurus and Ibidoecus [11]. Two gene blocks, nad4-tRNA-Leu UUR and tRNA-Gly-nad3, are shared between Falcolipeurus and Campanulotes. In the Philopteridae, only one gene block, tRNA-Ile-cox1, is shared across all Philopterids, excepting I. bisignatus, Columbina, and Columbicola. Such a lack of conserved gene arrangements in the mt genome of bird lice complicates the accurate reconstruction and identification of rearrangement events across their history [13]. Usually, gene arrangement in mt genomes is very conserved between congeneric lice [11,14]. Gene rearrangements between F. suturalis and F. quadripustulatus were identified consisting of at least one translocation (Fig. 5). The nad3 gene is located between cox2 and tRNA-Thr in F. quadripustulatus, but between tRNA-Gly and tRNA-Leu CUN in F. suturalis (Fig. 5). This gene rearrangement between the mt genomes of two Falcolipeurus species indicated that the rate of rearrangements of mt genes may vary substantially among closely related groups of lice [38]. It is interesting that two congeneric Falcolipeurus species differ by a rearrangement. Although this pattern has been found in Columbicola [16] and multiple Anoplura species [11], it is the first time that it has been found in a louse without fragmented genomes.
One tRNA gene (tRNA-Gly) was lacking, while the duplication of three genes (tRNA-Thr, tRNA-Tyr, and cox2) was found in the F. quadripustulatus mt genome [11]. However, all 37 genes have been identified in the F. suturalis mt genome. Gene duplications have also been reported in the mt genomes of several families in the class Insecta, such as the Reduviidae and Thripidae [39][40][41]. In addition, tRNA loss has been also found in the mt genome of several families of the class Insecta [11,42].

Phylogenetic relationships
Phylogenetic analysis shows the genetic distinctiveness between F. suturalis and F. quadripustulatus (Bpp = 1.0; Bf = 100). The branch leading to the two Falcolipeurus species is much longer than the branch of two Columbina species. The genus Falcolipeurus is more closely related to the genus Ibidoecus than to other genera (Bpp = 1.0, Fig. 6; Bf = 100, Fig. 7), which was consistent with that of a previous study [11]. Ten species of the Philopteridae were included in this study. The Philopteridae was paraphyletic with strong support in Bayesian analysis (Bpp > 0.9, Fig. 6) and weak support in ML analysis (Bf > 17, Fig. 7).
DNA sequencing provides the opportunity to further evaluate phylogenetic relationships in the Philopteridae. Phylogenetic relationships in the Philopteridae have been investigated with analyses of the nuclear gene sequences. For example, Cruickshank et al. analyzed elongation factor-1 alpha (EF-1α) sequences of 127 species from the four louse suborders, showing the Philopteridae to be paraphyletic [43]. Yoshizawa and Johnson analyzed mt 12S and 16S rDNA sequences of 18 species also showed the family to be paraphyletic [44]. However, Johnson et al. analyzed 1107 single-copy orthologous genes of 46 species and showed that the Philopteridae was monophyletic [45]. De Moya et al. analyzed 2370 orthologous genes and also showed that Philopteridae was monophyletic [46].
Mt genome sequences are effective molecular markers for the study of phylogenetic and systematic relationships at various taxonomic ranks across the phylum Arthropoda, including for ectoparasites [44][45][46][47][48]. Recently, a study has also indicted that mt genes can provide reliable reconstructions of evolutionary relationships in parasitic lice [6]. In the present study, the monophyly of the Philopteridae was rejected by mt genomic phylogenetic analyses. Song et al. inferred the high-level phylogeny of parasitic lice with the mt genome sequences of 25 species of parasitic lice, and showed that the Philopteridae was paraphyletic [11]. To date, the phylogenetic position of the Philopteridae within the Phthiraptera has not been confidently determined. Although mt genomic data has proven to be useful as genetic markers to explore the phylogenetic relationships among major lineages of parasitic lice [9,   Phylogenetic relationships among 10 species of the family Philopteridae inferred by Maximum likelihood from deduced amino acid sequences of 12 protein-coding genes. One wallaby louse, Heterodoxus macropus as an outgroup. Bootstrapping frequencies (Bf ) were indicated at nodes species of this family were not included across these studies [11]. Additional complete mt genomes of bird louse species representing multiple families should be included in future analysis to help resolve the phylogenetic position of the Philopteridae within the Phthiraptera.

Conclusions
The current study presents the entire mt genome of F. suturalis and compared it with the mt genome of F. quadripustulatus. Its gene order is rearranged relative to other Falcolipeurus species, and represents a new pattern within the Phthiraptera. These novel datasets will help to better understand the gene rearrangements and phylogenetic position of Falcolipeurus and provide useful genetic markers for systematics and phylogenetic studies of bird lice.