The mitochondrial genomes of Culex tritaeniorhynchus and Culex pipiens pallens (Diptera: Culicidae) and comparison analysis with two other Culex species

Background Culex tritaeniorhynchus and Culex pipiens pallens are the major vectors of the Japanese encephalitis virus and Wuchereria bancrofti, the causative agent of filariasis. The knowledge of mitochondrial genomes has been widely useful for the studies on molecular evolution, phylogenetics and population genetics. Methods In this study, we sequenced and annotated the mitochondrial (mt) genomes of Cx. tritaeniorhynchus and Cx. p. pallens, and performed a comparative analysis including four known mt genomes of species of the subgenus Culex (Culex). The phylogenetic relationships of Cx. tritaeniorhynchus, Cx. p. pallens and four known Culex mt genome sequences were reconstructed by maximum likelihood based on concatenated protein-coding gene sequences. Results Culex tritaeniorhynchus and Cx. p. pallens mt genomes are 14,844 bp and 15,617 bp long, both consists of 13 PCGs, 22 tRNAs, 2 rRNAs and 1 CR (not sequenced for Cx. tritaeniorhynchus). The initiation and termination codons of PCGs are ATN and TAA, respectively, except for COI starting with TCG, and COI and COII terminated with T. tRNAs have the typical clover-leaf secondary structures except for trnS(AGN) that is lacking the DHU stem. 16S rRNA and 12S rRNA secondary structures were drawn for the first time for mosquito mt genomes. The control region of Cx. p. pallens mt genome is 747 bp long and with four tandem repeat structures. Phylogenetic analyses demonstrated that the mt genome of Cx. tritaeniorhynchus was significantly separated from the remaining five mt genomes of Culex spp. Culex p. pipiens, Cx. p. pallens and Cx. p. quinquefasciatus formed a monophyletic clade with Cx. p. quinquefasciatus linked in the middle of the clade, and Cx. p. pallens should have the same taxonomic level as Culex p. pipiens and Cx. p. quinquefasciatus. Conclusions The mt genomes of Cx. tritaeniorhynchus and Cx. p. pallens share the same gene composition and order with those of two other Culex species. Culex p. pallens of the Pipiens complex should have the same taxonomic level as Culex p. pipiens and Cx. p. quinquefasciatus investigated. We enriched the Culex mt genome data and provided a reference basis for further Culex mt genome sequencing and analyses. Electronic supplementary material The online version of this article (doi:10.1186/s13071-016-1694-z) contains supplementary material, which is available to authorized users.


Background
Mitochondrion, also known as "power plant", is the structure for energy production and the main site for aerobic respiration in eukaryote cells [1]. Along with the rapid spread of mosquito-borne diseases, the research on mitochondrial (mt) genomes is of increasing importance to both, basic research and mosquito control. In most insects the mt genome is a small circular molecule with a length of c. 15 Kb, containing 13 protein-coding genes (PCGs), 2 ribosomal RNA genes (rRNAs), 22 transfer RNA genes (tRNAs) and 1 (A + T)-rich control region (CR) [2][3][4]. Due to the stable structure, coding content conservation, maternal inheritance, rapid evolution rate, no recombination and a high copy number, the mt genome has been widely used in the pattern analysis of molecular evolution, and in the studies on phylogeography, phylogenetics and population genetics [5][6][7][8][9].
Culex, the largest genus in the Culicidae, is worldwide distributed [10], and its many species are important vectors of mosquito-borne diseases, including epidemic encephalitis and lymphatic filariasis [11]. So far, there have only been two species of this genus with mt genome sequences available in the GenBank database, Cx. p. pipiens and Cx. quinquefasciatus [12]. Culex p. pipiens is a recognized vector of encephalitis viruses in North America, Rift Valley fever virus in Egypt, and transmits lymphatic filariasis and canine dogworm in Eastern Asia, and Cx. p. quinquefasciatus is an important vector of filariasis in the tropics [13,14]. There are four mt genome sequences for C. p. pipiens from different populations (three of these completely identical) and two mt genome sequences for Cx. p. quinquefasciatus with nucleotide differences, reported in the GenBank.
Culex tritaeniorhynchus, an important vector of Japanese encephalitis virus, belongs to the Sitiens group of the subgenus Culex and has a wide distribution in China, Japan, Korea, south-east Asia, India and Pakistan [15]. In 2012, there were 67,900 individuals infected by Japanese encephalitis virus, which led to 20,400 deaths and 14,000-24,000 cases of neurological impairment in Asia and the Western Pacific [16]. Culex p. pallens, belonging to the Pipiens complex of the Pipiens group, is widely distributed in China, Korea and Japan, and has been considered to be the major vector of bancroftian filariasis [15]. The classification of the Pipiens species complex has long been debated [17].
In the present study, we sequenced the mt genomes of Cx. tritaeniorhynchus and Cx. p. pallens with PCR amplifications, analyzed their characteristics including the composition and biases of nucleotides, codon usage, tRNA and rRNA secondary structure, and predicted tandem repeats of the control region. In addition, we carried out a comparative analysis of the newly-sequenced mt genomes with two other reported earlier in the genus, reconstructed the phylogenetic relationships using 13 protein-coding genes of six mt genome sequences, and discussed the classification of the Pipiens group. This comprehensive study of the known mt genomes within the genus Culex establishes an information frame of Culex spp. mt genomes and an important basis for further work.

Sample origin and mtDNA extraction
The colonies of Cx. tritaeniorhynchus and Cx. p. pallens were originally collected from fields in Beijing and Shaanxi, respectively, and reared at the National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing. The adult samples were collected from the colonies, and then preserved in 100 % ethanol and stored at -80°C prior to DNA extraction. Mitochondrial genomic DNA was extracted from single adult mosquito based on the method established in the Institute of Entomology and Molecular Biology, Chongqing Normal University, China [18].

PCR amplification, sequencing, assembly and annotation
The complete mt genomes were amplified using 18 PCR primer pairs designed in Chongqing Normal University [19]. PCR amplification conditions included an initial denaturation at 94°C for 1 min, followed by 32-36 cycles of denaturation at 94°C for 40 s, annealing at 46-54°C for 45 s, extension at 68°C for 1 min and a final extension at 72°C for 10 min. PCR products were column-purified (Qiagen QIAquick PCR Purification Kit, Hilden, Germany), separated by 1.0 % agarose gel electrophoresis, and sequenced with the same primers. The amplified fragments with low concentration or unclear electropherograms were cloned into the vector pMD-19 T (TaKaRa) according to the manufacturer's recommendations before sequencing.
Mt genome sequences of Cx. tritaeniorhynchus and Cx. p. pallens were assembled into contigs using DNAMANx software. Annotations of the mt genomes were based on comparisons with mtDNA genes of Cx. p. pipiens and Cx. quinquefasciatus. Furthermore, the identification of the tRNA genes and drawing of the stem-loop secondary structures were conducted using the program tRNAscan-SE Search Server v.1.21 [20]. The identification of tRNA genes also referred to those in other two Culex spp. if the genes could not be completely recognized. rRNA genes were recognized by comparison with other mosquito sequences using comparative RNA Web (CRW) [21], and the secondary structures of rRNA were constructed with Mfold Web Server [22]. Nucleotide composition and codon usage were calculated using the programme MEGA 5.0 [23]. The six mt genome sequences were aligned using DNAMANx software, and sliding window analysis was performed using DnaSP V.5.10.01 [24], to reveal the nucleotide diversity at different nucleotide positions throughout the entire mt genomes. Each gene or unit skew analysis were assessed with the formulas: ATskew = (A %-T %)/(A % + T %) and GCskew = (G %-C %)/(G % + C %) [25]. In PCG genes, relative synonymous codon usage (RSCU) was calculated using the formula: the number of same codon coding a given amino acid divided by the total numbers of all synonymous codons coding the amino acid. The tandem repeat (TR) sequences in CR were examined using Tandem Repeats Finder program [26].

Phylogenetic analysis
The phylogenetic relationships of the mt genome sequences for Cx. tritaeniorhynchus, Cx. p. pallens and four other known sequences for Culex spp. were reconstructed with Anopheles gambiae as the outgroup (Additional file 1: Table S1). The concatenated nucleotide sequences of 13 PCGs were used for the phylogenetic construction using Maximum Likelihood (ML) with PHYML [27]. The bestfit model of nucleotide substitution, the GTR + I + G model, was determined for the ML tree inference with Modeltest 3.7 [28]. The bootstrap values for 1,000 replicates and the genetic distances of each clade were calculated with PHYML, and the values larger than 50 % and the genetic distances were marked on each node of the tree.

Results and discussion
Annotation and base composition of the mt genomes of Cx. tritaeniorhynchus and Cx. p. pallens The circular mt genomes of Cx. tritaeniorhynchus and Cx. p. pallens were 14,844 bp ( Fig. 1a, GenBank number KT851544; CR lacking) and 15,617 bp long (Fig. 1b, GenBank number KT851543, with CR), respectively. They both consisted of 13 PCGs, 22 tRNAs, 2 rRNAs and 1 CR (not sequenced for Cx. tritaeniorhynchus), which is a composition typical for the metazoan taxa [2] and conserved in the known mosquito mt genomes [29,30]. There were 16 overlapping nucleotide areas between genes/CR with a total length of 46 bp and 74 bp, respectively; these areas ranged between 1-7 bp except for that between 12S rRNA and CR region of Cx. p. pallens, which was 34 bp long ( Table 1). There were 12 intergenic spacers with a total length of 79 bp and 90 bp, respectively, in these two species, and the spacers ranged between 1-19 bp.

Protein-coding genes
The total length of PCGs in the two mt genomes was 11,225 bp and 11,234 bp ( Fig. 2b), with AT content  Table 2), respectively. The AT content at the 1st, 2nd and 3rd codon position were all above 67.3 %, with that in the 3rd codon position reaching 92.4-92.9 %, which is a characteristic widespread in insect mt genomes [31].
The PCGs of six Culex mt genomes showed an overall negative AT-skew and positive GC-skew, with negative AT-skew at the 1st, 2nd and 3rd codon position except for the 3rd position of Cx. p. pallens mt genome, and positive GC-skew at the 1st, but negative at the 2nd and 3rd codon position (Table 2). The PCGs in Cx. tritaeniorhynchus used the initiation codon ATN except for COI that had the initiation codon TCG [8,29,32], and six genes (ATP6, COIII, COII, Cytb, ND4 and ND4L), four genes (ATP8, ND1, ND3 and ND6) and the other genes (ND2 and ND5) used the initiation codon ATG, ATA and ATC, respectively (Table 3). Culex p. pallens mt genome had the same initiation codon except for COII that used ATA. These two mt genomes had the complete termination codon TAA except for COI and COII that had the incomplete termination codon T, which was supplied to complete termination codon TAA by polyadenylations during the posttranscriptional process [33]. A comparison of the six mt genomes revealed that termination codons were different in COIII, CytB, ND3 and ND4.
There were 58 and 60 different codons used in Cx. tritaeniorhynchus and Cx. p. pallens mt genomes, respectively (Table 4). RSCU analysis showed that UUA, CGU, GCU, UCU and GGA were most used codons in the two mt genomes, and GGC, UGG, ACG and UCG were least used in Cx. tritaeniorhynchus, whereas CUG, CUC, GGC and AGG were least used in Cx. p. pallens. Results of the RSCU analysis showed that the codons with A and T in the 3rd position were overused when compared to other synonymous codons. For example, the codon TTA for Leu presented a RSCU value of 5.08 and 5.14 in the two mt genomes, and the codon CTG also for Leu showed a RSCU value of 0 and 0.01, respectively. The Leu was most present amino acid; on the contrary, Cys was least used one in the amino acid sequences of the two mt genomes, which were consistent with other Culex mt genomes studied to date (Fig. 3).

Transfer RNA and ribosomal RNA genes
Out of 22 tRNAs in the two mt genomes, 18 were predicted using tRNAscan-SE Search Server v.1.21 [20], and the remaining four tRNAs (trnS, trnR, trnK and trnL) were identified by comparing with known Culex spp. mt genomes. The tRNAs of Cx. tritaeniorhynchus totaled 1,490 bp in length (Fig. 2b), had a 78.8 % AT content (Fig. 2c) and ranged from 65 bp (trnY) to 72 bp (trnV); those of Cx. p. pallens totaled 1,482 bp, had 78.9 % AT content and ranged from 64 bp (trnR) to 72 bp (trnV). The anticodons of the 22 tRNAs were identical with published reference mosquito mt genomes [29,30] ( Table 1). The tRNA secondary structure is a typical clover-leaf structure in many insects, including four stems [dihydorouridine (DHU), amino acids (AA), TΨC, anticodons (AC)] and loops [DHU, TΨC, AC and variable (V)] [34][35][36]. The 22 tRNAs of the Cx. tritaeniorhynchus and Cx. p. pallens mt genomes also had the typical cloverleaf secondary structures except for trnS (AGN) that was lacking the DHU stem (Additional file 2: Figure S1a) and contained 30 and 25 mismatched base pairs, respectively, which were distributed in AA stem (8 and 6 bp), DHU stem (11 and 10 bp), AC stem (8 and 6 bp) and TΨC (3 bp in both species). Those mismatched base pairs affect the thermodynamic stability compared    P  TAA  TAA  T  T  TAA  TAA  TAA  TAA  TAA  TAA  TAA  TAA  TAA   Cx. tritaeniorhynchus  T  ATG  ATA  TCG  ATG  ATG  ATG  ATA  ATC  ATA  ATG  ATG  ATC  ATA   P  TAA  TAA  T  T  TAA  TAA  TAA  TAA  TAA  TAA  TAA  TAA  TAA   Cx. p. pipiens Turkey  T  ATG  ATA  TCG  ATG  ATG  ATG  ATA  ATC  ATA  ATG  ATG  ATC  ATA   P  TAA  TAA  T  T  TA  T  TAA  TAA  T  TA  TAA  TAA  TAA   Cx. p. pipiens  T  ATG  ATA  TCG  ATG  ATG  ATG  ATA  ATC  ATA  ATG  ATG  ATC  ATA   P  TAA  TAA  T  T  TA  T  TAA  TAA  T  TA  TAA  TAA  TAA   Cx. quinquefasciatus USA  T  ATG  ATA  TCG  ATG  ATG  ATG  ATA  ATC  ATA  ATG  ATG  ATC  ATA   P  TAA  TAA  T  T  TA  T  TAA  TAA  T  TA  TAA  TAA  TAA   Cx. quinquefasciatus  T  ATG  ATA  TCG  ATG  ATG  ATG  ATA  ATC  ATA  ATG  ATG  ATC  ATA   P  TAA  TAA  T  T  TAA  TAA  TAA  TAA  T  TAA  TAA  TAA  TAA with Watson-Crick pairs [37,38], and are a common phenomenon but could be corrected in posttranscriptional RNA editing processes [39]. The 12S rRNAs in the Cx. tritaeniorhynchus and Cx. p. pallens mt genomes were both located between trnV and CR and the 16S rRNAs between trnL and trnV, as in the mt genomes in other metazoan species [29,30,32]. The 12S rRNAs were 757 bp and 804 bp long with AT contents 80.6 % and 81.2 %, respectively, and the 16S rRNAs were 1,338 bp and 1,334 bp long with AT contents 82.5 % and 83.2 %, respectively. The rRNA secondary structures were inferred for the first time for a mosquito mt genome (Additional file 2: Figure S1b and c). They all contained G-C, A-U and G-U base pairs with G-U not being considered as mismatched base pairs [37]. The

The control regions
The CR of Cx. p. pallens mt genome was 747 bp long, located between 12S rRNA and trnI (Table 1), and had a highest AT content 88.8 %. We identified four TR structures (Fig. 4): TR I (nucleotides 14,966-14,998 in the mt genome sequence), TR II (15,275), TR III (15,335) and TR IV (15,593). The TR I was 33 bp long and composed of an ATAA unit, TR II 18 bp and poly-T, TR III 39 bp and microsatellites (TA) n and TR IV 86 bp, and microsatellites (AT) n . The comparison with two other known Culex spp. mt genomes  revealed that these four TR structures were highly conserved, with exception of a thymine change in TR II and two thymine and one adenine changes in TR III in Cx. p. pallens (Fig. 4).

Nucleotide diversity throughout the whole mt genome
The aligned sequence length of the six Culex spp. mt genomes was 15,669 bp, and the sliding window analysis showed that nucleotide diversity (Pi) value at nucleotide positions ranged from 0 to 0.10433 (Fig. 5). There were four high nucleotide divergence regions identified with the Pi-values greater than 0.06 in each region, and these

Phylogenetic analysis
Due to the high conservation of the amino acid sequences in the six Culex spp. mt genomes with close phylogenetic relationships, the ML method was used to infer the phylogenetic relationships using nucleotide sequences of 13 PCGs with An. gambiae as the outgroup (Fig. 6). On the tree, the mt genome of Cx. tritaeniorhynchus appeared separate from the remaining five which formed a strongly supported clade (100 % bootstrap support and a genetic distance of 0.02705). This result is consistent with traditional classification, in which Cx.   [13]. Culex p. pallens and Cx. pipiens from Turkey were grouped with 94 % bootstrap support and a genetic distance of 0.00063. Interestingly, the two mt genomes of Cx. quinquefasciatus were linked between Cx. p. pallens and Cx. p. pipiens, and their separation was not supported (47 % bootstrap support), with a very small genetic distance of 0.00020. Culex p. pipiens was separated from the other four mt genomes in the Pipiens complex with 100 % of bootstrap support and a relatively large genetic distance of 0.01119. Miller et al. [40] constructed a phylogeny of the Pipiens species complex based on rDNA ITS-1 and ITS-2 (1,326 aligned sites). In the tree involving 14 Culex spp., Cx. p. pipiens, Cx. p. pallens and Cx. quinquefasciatus were clustered into a unique clade, inside which there was seldom further divergence. In the tree involving 26 mosquito species representing 13 geographical populations of the Pipiens complex, there were two large clades: the Cx. quinquefasciatus clade and Cx. pipiens clade, and the intermediate pipiens-quinquefasciatus hybrids formed a group with Cx. pipiens pallens, which was linked to the Cx. pipiens clade. The hybridization in the Pipiens complex has been widely reported, e.g. between Cx. p. pipiens and Cx. p. quinquefasciatus in Madagascar [41], Argentina [42] and California [43], and between Cx. p. pallens and Cx. quinquefasciatus in southern Japan, Korea and China [44,45]. More recently, Liu et al. [14] investigated genetic polymorphisms in the Pipiens Complex in Lhasa, China, using multiplex PCR and sequencing at the 2nd intron of ace-2. The results revealed that 36 mosquitoes (34.29 % of the total) were homozygous (13 Cx. p. pipiens, 20 Cx. p. pallens and 3 Cx. p. quinquefasciatus), whereas 69 (65.71 %) were heterozygous (41 between Cx. p. pipiens and Cx. p. pallens, 1 Cx. p. pipiens and Cx. p. quinquefasciatus, 14 Cx. p. pallens and Cx. p. quinquefasciatus, and 13 among Cx. p. pipiens, Cx. p. pallens and Cx. p. quinquefasciatus). These results demonstrated that the three "subspecies or species" can cross each other in sympatry.
For the Pipiens complex, Harbach [17] used the Pipiens Assemblage to avoid difficulties associated with the meaning of the word "complex". He concluded that Cx. pipiens and Cx. quinquefasciatus were separate species which evolved in Africa and hybridize in non-indigenous areas where they were unintentionally introduced by humans; and Cx. pallens has no taxonomic status under the provisions of the International Code of Zoological Nomenclature.
Whether the Pipiens Assemblage is a single polytypic species or a complex of sibling species has long been disputed [17]. Despite extensive morphological and physiological/behavioral variation, there has been no conclusive phylogenetic analysis to support their species status and the hybridization widely occurs where their populations overlap. Our study based on mt genomes strongly suggests that Cx. p. pipiens, Cx. p. pallens and Cx. p. quinquefasciatus were monophyletic, but their taxonomic status is still unsettled. Considering the bootstrap support and genetic distances in the present analysis, these all should have same taxonomic level, species, subspecies or form.

Conclusions
We sequenced and analyzed the mt genomes of Cx. tritaeniorhynchus and Cx. p. pallens. The gene composition and order of the two mt genomes are the Fig. 6 Phylogenetic tree of the six Culex spp. mt genomes based on nucleotide sequences of 13 protein-coding genes. Maximum Likelihood analysis was used to construct the tree with Anopheles gambiae as the outgroup. The genetic distances and bootstrap values from 1,000 replicates are marked at the nodes. Stars indicate the newly-generated mt genome sequences