Complete mitogenomes of Anopheles peditaeniatus and Anopheles nitidus and phylogenetic relationships within the genus Anopheles inferred from mitogenomes

Background Despite the medical importance of mosquitoes of the genus Anopheles in the transmission of malaria and other human diseases, the phylogenetic relationships within the genus are not settled, and the characteristics of the mitochondrial genome (mitogenome) are not thoroughly understood. Methods The present study sequenced and analyzed the complete mitogenomes of An. peditaeniatus and An. nitidus, investigated genome characteristics, and inferred the phylogenetic relationships of 76 Anopheles spp. Results The complete mitogenomes of An. peditaeniatus and An. nitidus are 15,416 and 15,418 bp long, respectively, and both include 13 PCGs, 22 tRNAs, two rRNAs and one control region (CR). Mitogenomes of Anopheles spp. are similar to those of other insects in general characteristics; however, the trnR and trnA have been reversed to “trnR-trnA,” as has been reported in other mosquito genera. Genome variations mainly occur in CR length (493–886 bp) with six repeat unit types identified for the first time that demonstrate an evolutionary signal. The subgenera Lophopodomyia, Stethomyia, Kerteszia, Nyssorhynchus, Anopheles and Cellia are inferred to be monophyletic, and the phylogenetic analyses support a new phylogenetic relationship among the six subgenera investigated, in which subgenus Lophopodomyia is sister to the other five subgenera, and the remaining five subgenera are divided into two clades, one comprising the sister taxa Stethomyia + Kerteszia, and the other consisting of subgenus Nyssorhynchus as sister to the sister-group Anopheles + Cellia. Four series (Neomyzomyia, Pyretophorus, Neocellia and Myzomyia) of the subgenus Cellia, and two series (Arribalzagia and Myzorhynchus) of the subgenus Anopheles were found to be monophyletic, whereas three sections (Myzorhynchella, Argyritarsis and Albimanus) and their subdivisions of the subgenus Nyssorhynchus were polyphyletic or paraphyletic.
Conclusions The study comprehensively characterized the mitogenomes of the genus Anopheles and the mitogenome-based phylogenetics of the genus, and provides information for further study on the mitogenomes, phylogenetics and taxonomic revision of the genus. Supplementary Information The online version contains supplementary material available at 10.1186/s13071-021-04963-4.

complexes worldwide [1]. Anophelinae mosquitoes can transmit a variety of diseases, and are one of the most important groups of insects in medicine, as they are the only vectors of human malarial parasites, which caused 229 million cases and 409,000 deaths worldwide in 2019 [2]. In addition to malaria parasites, Anopheles mosquitoes also transmit filarial parasites [3]. Some studies have shown that Anopheles mosquitoes also harbor arboviruses, such as o'nyong-nyong virus, which multiply in the mosquito vectors before transmission to a vertebrate host [4]. Due to their exceptional importance, mosquitoes of this genus have been the subject of more taxonomic studies than any other mosquito group.
The classification of Anopheles started more than 100 years ago [5], when it was treated as one of 18 genera in the Anophelinae, while Cellia, Nyssorhynchus, Stethomyia and Kerteszia were also treated as independent genera based on morphological characteristics. Subsequently, the five genera were successively included as subgenera of the genus Anopheles based on the number and location of specialized setae on the male genital gonocoxites and other characteristics [6][7][8]. Three additional subgenera, Lophopodomyia, Baimaia and Christya were established within the genus Anopheles [9][10][11]. Due to the diversity of species contained in the subgenera Anopheles, Cellia and Nyssorhynchus, taxonomists divided some species into informal categories such as sections, series and groups. The earliest phylogenetic studies for Anopheles were mainly based on morphological characters and single genes. Different data sets and phylogenetic inference methods often lead to inconsistent results between studies, and therefore phylogenetic relationships in Anopheles have not been well settled.
There have been a number of representative phylogenetic studies on the genus Anopheles. An analysis including 63 species in Anophelinae based on 163 morphological characters suggested the monophyly of the subgenera Cellia, Nyssorhynchus, Stethomyia, Kerteszia and Lophopodomyia [12]. In Nyssorhynchus, the three sections Albimanus, Argyritarsis and Myzorhynchella were suggested to be paraphyletic. In Cellia, only the series Cellia was considered to be monophyletic. In Anopheles, series Arribalzagia and Lophoscelomyia were considered to be monophyletic, while the series Cycloleppteron + Arribalzagia was nested within series Myzorhynchus [12]. Some further morphology-based studies also suggested the monophyly of the subgenera Nyssorhynchus, Cellia and Kerteszia, and indicated a sister relationship between subgenera Kerteszia and Nyssorhynchus [11,13,14]. An analysis based on a COX1 + ITS2 dataset suggested the monophyly of subgenera Anopheles and Cellia, and the analysis using the ITS2 dataset alone resulted in the same conclusion, which was not supported by the COX1 dataset alone [15]. Two studies based on mitogenomes, including 50 and 33 species, respectively, both also supported the monophyly of the subgenera Anopheles, Nyssorhynchus, Cellia and Kerteszia [16,17]. Generally, the monophyly of the subgenera Anopheles, Nyssorhynchus, Cellia, Stethomyia, Kerteszia and Lophopodomyia has been supported by most recent studies; however, the sections and series within the subgenera Anopheles, Nyssorhynchus and Cellia have not been well resolved. There is a need to elucidate the phylogeny of the genus Anopheles using more species, more data and updated phylogenetic analysis approaches.
The mitochondrion is an important organelle in eukaryotic cells, with a genome independent of the nucleus, the mitochondrial genome (mitogenome) [18]. The mitogenome typically has a small genome size, low levels of recombination and maternal inheritance, and therefore it has been widely used as a molecular marker for the identification of species, phylogenetic inference and population structure research [19,20]. Since the publication of the first insect mitogenome (Drosophila yakuba) in 1985 [21], the number of insect mitogenomes has increased rapidly. Phylogenetic studies based on insect mitogenomes have shown good results in Diptera [22], Orthoptera [23], Coleoptera [24] and Hymenoptera [25]. To date the complete mitogenomes of 125 species of Culicidae have been sequenced, of which 74 species are from the genus Anopheles. Dipteran mitogenomes are mostly 14-20 kb long and include 37 genes (13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes and 22 transfer RNA (tRNA) genes) and a control region (CR), arranged in a compact circular genome [26]. The genome structure in all reported mosquito mitogenomes is similar to the typical mitogenomes of Diptera; however, the trnA and trnR of mosquitoes are rearranged to form "trnR-trnA" [16,17,21].
In the present study, we sequenced and annotated the complete mitogenomes of An. peditaeniatus and An. nitidus, and analyzed the mitogenome characteristics of 76 species in the genus Anopheles. Additionally, we constructed the phylogenetic relationships of these 76 species. This study provides new insights into the mitogenome characteristics and phylogenetic relationships in the genus Anopheles.

Sample collection and DNA extraction
Specimens of An. peditaeniatus and An. nitidus were collected from Yadong County (29° 11′ 46″ N, 95° 12′ 11″ E), Tibet, China, in July 2014, and Tiebei County, Jilin Province, China (42° 27′ 21″ N, 128° 06′ 18″ E) in July 2013. All samples were preserved in individual vials. After morphological identification using keys reported previously [27], samples were stored in 100% alcohol and housed at −20 °C until DNA extraction. Total DNA was extracted from an individual adult mosquito using the QIAGEN Genomic DNA Kit [28], and used for 350 bp library construction and Illumina high-throughput sequencing by Shenzhen Huitong Biotechnology Co. Ltd.

Mitogenome sequencing, annotation and characteristics analysis
Genome sequencing using paired-end sequencing (PE 150) was carried out using the Illumina HiSeq X Ten platform by Huitong Biotechnology Co., Ltd. In total, 20.41 Gb (An. peditaeniatus) and 25.96 Gb (An. nitidus) clean data were obtained after filtering of raw data (20.54 Gb for An. peditaeniatus and 26.15 Gb for An. nitidus) using the NGS QC Toolkit [29], and the sequencing depth was 288.9X (An. peditaeniatus) and 5162X (An. nitidus). Subsequently, the mitogenome reads were extracted using the BLAST program with An. sinensis mitogenome sequence as reference, and assembled using de novo mitogenome assembly with SPAdes 3.9.0 [30].

Phylogenetic analysis
Multiple sequence alignments of the PCGs were performed on the TranslatorX server (http://translatorx.co.uk/) using the MAFFT amino acid alignment mode. Gblocks with the default setting in TranslatorX was used to remove the ambiguously aligned positions. Individual alignments were concatenated in SequenceMatrix [39]. PartitionFinder 2.0 was used to determine the best-fit substitution model for each gene according to the Akaike information criterion (AIC), and the default values for the initial partition settings were applied [40]. Phylogenetic analyses were performed using maximum likelihood (ML) inference in IQ-TREE 1.6.10 [41] and Bayesian inference (BI) analysis in MrBayes v.3.2.7a [42] using Culex pipiens pallens as outgroup (Table 1). Bootstrap values were calculated using 1000 replicates for ML. BI was performed as two independent runs, each with four chains, and these chains ran simultaneously for 10,000,000 generations, with sampling every 1000 steps, and a 25% burn-in rate. Phylogenetic trees were drawn using FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
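As a rough check on the BI settings described above, the number of posterior samples retained after burn-in can be worked out directly. The sketch below is only illustrative arithmetic: the function name `retained_samples` is hypothetical, and the count ignores the starting state that MCMC software may also record.

```python
def retained_samples(generations: int, sample_every: int,
                     burnin_fraction: float, runs: int):
    """Return (samples per run, samples discarded per run, samples kept over all runs).

    Illustrative only: counts one sample per `sample_every` generations and
    ignores the initial state of each run.
    """
    per_run = generations // sample_every
    discarded = int(per_run * burnin_fraction)
    kept_total = (per_run - discarded) * runs
    return per_run, discarded, kept_total


# Settings stated in the text: 10,000,000 generations, sampling every 1000,
# 25% burn-in, two independent runs.
per_run, discarded, kept_total = retained_samples(10_000_000, 1000, 0.25, 2)
print(per_run, discarded, kept_total)
```

Under these assumptions each run contributes 7500 post-burn-in samples to the summary.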

Nucleotide composition and genome organization
The complete mitogenomes of An. peditaeniatus (GenBank: MT822295) and An. nitidus (GenBank: MW401801) are both circular genomes with full lengths of 15,416 and 15,418 bp, respectively (Fig. 1). Both are composed of 37 genes (including 13 PCGs, 22 tRNA genes and two rRNA genes) and one control region (CR). There are 22 genes (nine PCGs and 13 tRNAs) located on the majority coding strand (J-strand), while the other 15 genes (four PCGs, nine tRNAs and two rRNAs) are on the minority strand (N-strand). Compared with the typical Diptera mitogenome (e.g., Drosophila yakuba), both An. peditaeniatus and An. nitidus have a "trnR-trnA" rearrangement. The AT content of the mitogenomes of the two species is high, 78.32% and 78.26%, respectively, with obvious AT bias (Additional file 1: Table S1). The AT-skew of An. peditaeniatus (0.0322) is higher than the average AT-skew of mosquito mitogenomes (0.0283), whereas the AT-skew of the An. nitidus mitogenome (0.0266) is lower than the mosquito average. The GC-skew in An. peditaeniatus (−0.1587) and An. nitidus (−0.1536) is higher than the average GC-skew value in the mosquitoes investigated (−0.16048).
The three-dimensional scatter plot of AT content, AT-skew and GC-skew of mitogenomes in the genus Anopheles is shown in Fig. 2. AT-skew ranged from 0.005 in An. gilesi to 0.043 in An. christyi. All mitogenomes display negative GC-skews ranging from −0.207 in An. parvus to −0.136 in An. punctulatus. Most species of the subgenera Nyssorhynchus and Cellia have similar AT content and AT/GC-skew (closely distributed in the three-dimensional scatter plot), whereas species in the subgenera Lophopodomyia, Stethomyia, Kerteszia and Anopheles are widely distributed in the plot for AT content, AT-skew and GC-skew.
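For readers who wish to reproduce composition statistics of this kind, the sketch below uses the conventional definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), computed on a single strand. It is a minimal illustration rather than the pipeline used in this study: the function name `composition_stats` and the toy sequence are assumptions, and which strand is supplied (here taken to be the J-strand) affects the sign of both skews.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Return AT content (%), AT-skew and GC-skew for one strand of a sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C bases")
    return {
        "at_content": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Toy sequence (not real data): 41 A, 38 T, 9 G and 12 C.
toy = "A" * 41 + "T" * 38 + "G" * 9 + "C" * 12
stats = composition_stats(toy)
print(stats)
```

A positive AT-skew indicates more A than T on the strand analyzed, and a negative GC-skew indicates more C than G, which is the pattern reported above for all Anopheles mitogenomes.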

Protein-coding genes
The total nucleotide lengths of the PCGs of An. peditaeniatus and An. nitidus were 11,223 and 11,168 bp, respectively. In An. peditaeniatus, ATN is used as the start codon for all genes except COX1 and ND5, which use TCG and GTG as start codons, respectively. In An. nitidus, all PCGs initiate with ATN as the start codon, except COX1, which uses TCG (Table 2).
The RSCU values of mitogenomes in the genus Anopheles are presented in Additional file 2: Table S2. Anopheles species have different usage frequencies of synonymous codons; UUA is the most frequently used codon, followed by CGA, GGA, GCU. The amino acid Leu has the highest usage percentage for all 76 mitogenomes investigated with an average of 16.37%, followed by Phe (9.69%), Ile (9.31%) and Ser (8.48%), whereas Cys has the lowest percentage (0.99%). The usage percentages of amino acids do not differ significantly between different subgenera (Fig. 3).
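Relative synonymous codon usage (RSCU) is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates that definition under the invertebrate mitochondrial genetic code (NCBI translation table 5); it is not the software used in this study, the names `CODON_TABLE` and `rscu` are hypothetical, and codons are keyed in DNA notation (TTA rather than UUA).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict:
    """Return {codon: RSCU} for an in-frame coding sequence.

    Stop codons and codons containing ambiguous bases are skipped;
    amino acids absent from the sequence are omitted from the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    values = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)  # equal usage within the family
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy sequence: four TTA and two TTG (Leu), then one ATG and one ATA (Met).
toy_cds = "TTA" * 4 + "TTG" * 2 + "ATG" + "ATA"
result = rscu(toy_cds)
print(result["TTA"], result["TTG"], result["ATA"])
```

In this toy case Leu has six synonymous codons under table 5, so four TTA codons out of six Leu codons give an RSCU of 4.0, while the evenly used Met codons each score 1.0.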

Transfer RNAs, ribosomal RNAs and CR
The total length of tRNAs in An. peditaeniatus and An. nitidus was 1475 bp and 1476 bp, respectively, while the length of individual tRNAs varies from 64 to 72 bp. All tRNAs can fold into the typical clover-leaf structure of four stems and loops, except for trnS2 which has lost the dihydrouridine (DHU) arm (Additional file 3: Figure S1). The length of the rRNAs was 2125 bp, with an AT content of 81.36%, in An. peditaeniatus and 2122 bp, with an AT content of 81.39%, in An. nitidus. The control regions (CRs) of Anopheles mitogenomes are located between rrnS and trnI, with lengths of 575 and 580 bp and AT content of 94.43% and 93.62% in An. peditaeniatus and An. nitidus, respectively. Six repeat unit types are found in the CRs of Anopheles mitogenomes (Additional file 4: Figure S2). All species have a 15-27 bp poly-T stretch, located immediately after 140-212 bp of conserved sequence. The poly-T stretch is adjacent to the conserved motif 5′-CCCCTA-3′ in 68 species, whereas this motif is replaced by 5′-ATTGTA-3′ in An. cracens and An. dirus, and by 5′-TTCCCC-3′ in An. kompi, An. nimbus, An. gilesi and An. pseudotibiamaculatus. The second repeat type is 12-55 bp long and composed of 2-6 repeats, located downstream of the poly-T stretch, and is found in 54 species. The third type ([TA(A)]n stretch), with 22-91 repeats, is found in 36 species. The fourth type is a 12-38 bp region composed of 2-5 repeats adjacent to trnI and found in 40 species. The remaining two repeat unit types are found in only a few species; one is a 15-36 bp region located after the second repeat type and found in five species, while the last type is a 108-171 bp region, the longest of the six types and found in only four species.
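A poly-T stretch of the kind described above can be located in a CR sequence with a simple pattern search. The following is a minimal sketch, not the procedure used in this study: the function name `longest_poly_t`, the 15-bp minimum (taken from the lower bound of the reported 15-27 bp range) and the toy sequence are assumptions, and the strand orientation of a real CR must be checked before searching.

```python
import re


def longest_poly_t(region: str, min_len: int = 15):
    """Return (start, end) of the longest run of at least `min_len` T bases.

    Coordinates are 0-based and end-exclusive; returns None if no run qualifies.
    """
    best = None
    for match in re.finditer(r"T{%d,}" % min_len, region.upper()):
        if best is None or match.end() - match.start() > best[1] - best[0]:
            best = (match.start(), match.end())
    return best


# Toy control region (not real data): 20 bp of flanking sequence,
# an 18-bp poly-T stretch, then the motif CCCCTA.
toy_cr = "ACGA" * 5 + "T" * 18 + "CCCCTA" + "TATATA"
span = longest_poly_t(toy_cr)
if span is not None:
    start, end = span
    print(start, end, toy_cr[end:end + 6])  # bases immediately after the stretch
```

Inspecting the six bases that follow the reported span is one way to check for an adjacent motif such as 5′-CCCCTA-3′.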

Characteristics of the mitogenome sequences of the genus Anopheles
Comparison of mitogenome sequences in the genus Anopheles shows that the length variation mainly exists in the CRs, similar to earlier reported mitogenomes in insects [43,44]. The gene number, gene composition, codon usage and tRNA secondary structures are similar to those of other reported mitogenomes of Diptera [22,45]. However, trnR and trnA are reversed to form "trnR-trnA" relative to the ancestral insect arrangement, as has been reported in other genera of Culicidae [21,45].
The present study identified six repeat unit types in CRs for the first time in Anopheles mitogenomes. Among the six types, the poly-T stretch has also been found in other insects, and may be involved in recognition of the replication origin of mitochondrial DNA (mtDNA) [46]. The conserved sequences in CRs have been reported to be taxon-specific and to carry evolutionary information, and have been used as important evidence in phylogenetic inference for taxa of the genera Culex and Lutzia [47]. However, the evolutionary information carried in the CRs of the genus Anopheles does not seem stable and reliable.

Phylogenetic relationships
The present study suggests that all six subgenera investigated are monophyletic, and the phylogenetic analysis shows that subgenus Lophopodomyia is sister to the other five subgenera, and the remaining five subgenera are divided into two clades, one including a sister-taxon (Stethomyia + Kerteszia), and the other consisting of subgenus Nyssorhynchus as the sister to a sister-group (Anopheles + Cellia). A phylogenetic study based on 163 morphological characters for 64 species in the subfamily Anophelinae using the approximate weighting (AW) method showed that the subgenera Lophopodomyia, Stethomyia, Kerteszia, Nyssorhynchus and Cellia were monophyletic, whereas the subgenus Anopheles was polyphyletic. Two subgenera, Lophopodomyia and Stethomyia, were nested within the subgenus Anopheles [12]. A later morphology-based phylogenetic analysis, which used 167 characters for 66 species in the Anophelinae analyzed with both the equal weighting (EW) and implied weighting (IW) methods, found the same results as described above [14]. All analyses from these three methods showed that the subgenera Nyssorhynchus and Kerteszia were sister-taxa, while the AW and EW methods suggested that Nyssorhynchus + Kerteszia was sister-group to subgenus Cellia + subgenera Lophopodomyia, Stethomyia and Anopheles, and the IW method found a clade comprising the sister-taxon (Nyssorhynchus + Kerteszia) and subgenus Cellia, and this clade was sister-group to the three subgenera Lophopodomyia, Stethomyia and Anopheles.
In contrast, a molecular-based phylogenetic analysis, using COI, COII and 5.8S rRNA for 47 species of Anopheles and using the ML method, supported the monophyly of the subgenera Stethomyia, Kerteszia, Nyssorhynchus, Anopheles and Cellia, and this study suggested the subgenus Anopheles was sister-group to all other subgenera, and placed the subgenus Cellia as a sister-group to a clade which comprised subgenus Nyssorhynchus and a sister-taxon (Stethomyia + Kerteszia) [48]. A recent study of amino acid sequences of 1085 single-copy orthologous genes from 18 species in the subgenera Nyssorhynchus, Anopheles and Cellia analyzed with the ML method found that all three subgenera were monophyletic, and showed that the subgenus Nyssorhynchus was sister to a sister-taxon (Anopheles + Cellia) [49]. Our prior study using mitogenome PCG nucleotide sequences from 50 species in Culicidae with the ML and BI methods showed that the subgenera Nyssorhynchus, Anopheles and Cellia were monophyletic, with the sister relationship between subgenus Nyssorhynchus and a sister-taxon (Anopheles + Cellia) [16].
All six Anopheles subgenera included in the comprehensive phylogenetic analyses discussed above were suggested to be monophyletic except for the subgenus Anopheles, which was recognized as polyphyletic in both morphology-based inferences, while it was monophyletic in the three molecular-based inferences. Importantly, the study based on 18 whole nuclear genomes showed that the subgenus Anopheles was monophyletic [49]. The present study supported the monophyly of all six subgenera. Studies based on 18 whole nuclear genomes [49] and 50 whole mitogenomes [16] both suggested that the subgenus Nyssorhynchus was sister to the sister-group (Anopheles + Cellia), as does the present study. A recent study based on COI, COII and 5.8S rRNA found that the subgenera Stethomyia and Kerteszia were sisters [48], as in the present study. The subgenus Lophopodomyia was grouped with the subgenera Anopheles and Stethomyia in both morphology-based studies [12,14], whereas it has not previously been included in molecular-based studies [16,48,49]. The current study found that Lophopodomyia was sister to the other five subgenera. In general, the phylogenetic relationships inferred from morphology and those based on molecular data are quite different, and further studies including more species and data are needed to elucidate relationships among subgenera.
Within the subgenus Cellia, the four series investigated (Neomyzomyia, Pyretophorus, Neocellia and Myzomyia) all appear to be monophyletic (posterior probability [pp] = 1 and bootstrap value [bv] = 100% for their clades); Neomyzomyia was sister-group to the other three series, and Pyretophorus was sister to the sister-taxon (Neocellia + Myzomyia). The current results are consistent with those from our earlier study, which was also based on whole mitogenomes [16], and close to those based on 18S, 28S, COI and COII data in both taxon monophyly and relationships [50]. However, the early morphology-based study found all four series to be paraphyletic [12]. This suggests that results from molecular and morphological data often conflict, as discussed above.
Within the subgenus Anopheles, the two sections Angusticorn (from which only series Anopheles was included) and Laticorn (two series Myzorhynchus and Arribalzagia included) are both polyphyletic. The series Myzorhynchus and Arribalzagia are both monophyletic (pp = 1 and bv ≥ 96% for their clades), while if An. lindesayi were excluded, the series Anopheles would also be monophyletic (pp = 0.93 and bv = 85%), with the sister relationship between Anopheles and a sister-taxon (Myzorhynchus + Arribalzagia). Analysis of COI, COII and 5.8S rRNA suggested that the sections Laticorn and Angusticorn and the series Anopheles and Myzorhynchus were polyphyletic. In one morphology-based study, the sections Laticorn and Angusticorn and the series Myzorhynchus and Anopheles were paraphyletic [12]. The other morphology study found section Laticorn and the series Arribalzagia and Myzorhynchus to be monophyletic, while section Angusticorn and the series Anopheles were polyphyletic [14]. All of these studies suggested that section Angusticorn and series Anopheles were polyphyletic, and most studies found the section Laticorn to be polyphyletic, whereas series Arribalzagia was always monophyletic while series Myzorhynchus may be monophyletic.
Within the subgenus Nyssorhynchus, three sections, Myzorhynchella, Argyritarsis and Albimanus, were investigated, and the subdivisions in all three sections appear to be polyphyletic or paraphyletic. A morphology-based study suggested that sections Albimanus, Argyritarsis and Myzorhynchella were paraphyletic [12]. Two molecular studies found that the three sections were not monophyletic [51,52]. All four studies demonstrate that the taxonomy and phylogenetics of Nyssorhynchus remain unsettled, and more study is necessary to reconstruct the taxonomic system of the subgenus.

Conclusions
This study analyzed the complete mitogenomes of An. peditaeniatus and An. nitidus and investigated phylogenetic relationships among 76 species in the genus Anopheles. These mitogenomes have the same general characteristics found in earlier reports from insects; however, the trnR and trnA are reversed in comparison to other Diptera mitogenomes, as has been reported in other genera in the Culicidae. Genome variations mainly occur in the CRs, which range in length from 493 to 886 bp and contain six repeat unit types, identified here for the first time. The subgenera Lophopodomyia, Stethomyia, Kerteszia, Nyssorhynchus, Anopheles and Cellia were all found to be monophyletic and showed a new phylogenetic relationship among the six subgenera investigated. The four series Neomyzomyia, Pyretophorus, Neocellia and Myzomyia in the subgenus Cellia were found to be monophyletic, as were the series Arribalzagia and Myzorhynchus in the subgenus Anopheles, while the series Anopheles and the three sections Myzorhynchella, Argyritarsis and Albimanus in Nyssorhynchus, together with their subdivisions, were polyphyletic or paraphyletic. Further studies of more mosquito species are needed to elucidate the phylogenetic relationships in the genus Anopheles.