Comparative analysis of apicoplast genomes of Babesia infective to small ruminants in China

Background Babesiosis is an economically important disease caused by tick-borne apicomplexan protists of the genus Babesia. Most apicomplexan parasites, including Babesia, have a plastid-derived organelle termed an apicoplast, which is involved in critical metabolic pathways such as fatty acid, iron-sulphur, haem and isoprenoid biosynthesis. Apicoplast genomic data can provide significant information for understanding and exploring the biological features, taxonomic and evolutionary relationships of apicomplexan parasites, and identify targets for anti-parasitic drugs. However, there are limited data on the apicoplast genomes of Babesia species infective to small ruminants. Methods PCR primers were designed based on the previously reported apicoplast genome sequences of Babesia motasi Lintan and Babesia sp. Xinjiang using Illumina technology. The overlapped apicoplast genomic fragments of six ovine Babesia isolates were amplified and sequenced using the Sanger dideoxy chain-termination method. The full-length sequences of the apicoplast genomes were assembled and annotated using bioinformatics software. The gene contents and order of apicoplast genomes obtained in this study were defined and compared with those of other apicomplexan parasites. Phylogenetic trees were constructed on the concatenated amino acid sequences of 13 gene products using MEGA v.6.06. Results The results showed that the six ovine Babesia apicoplast genomes consisted of circular DNA. The genome sizes were 29,916–30,846 bp with 78.7–81.0% A + T content, 29–31 open reading frames (ORF) and 23–24 transport RNAs. The ORFs encoded four DNA-directed RNA polymerase subunits (rpoB, rpoCl, rpoC2a and rpoC2b), 13 ribosomal proteins, one elongation factor TU (tufA), two ATP-dependent Clp proteases (ClpC) and 7–11 hypothetical proteins. Babesia sp. has three more genes than Babesia motasi (rpl5, rps8 and rpoB). Phylogenetic analysis showed that Babesia sp. is located in a separate clade. Babesia motasi Lintan/Tianzhu and B. motasi Ningxian/Hebei were divided into two subclades. Conclusions To our knowledge, this study is the first to elucidate the whole apicoplast genomic structural features of six Babesia isolates infective to small ruminants in China using Sanger sequencing. The data provide useful information confirming the taxonomic relationships of these parasites and identifying targets for anti-apicomplexan parasite drugs. Electronic supplementary material The online version of this article (10.1186/s13071-019-3581-x) contains supplementary material, which is available to authorized users.


Background
Most apicomplexan protists cause important diseases of humans and other animals, including malaria (Plasmodium spp.), toxoplasmosis (Toxoplasma gondii), cryptosporidiosis (Cryptosporidium spp.), cyclosporiasis (Cyclospora cayetanensis), coccidiosis (Eimeria spp.), babesiosis (Babesia spp.), theileriosis (Theileria spp.) and neosporosis (Neospora caninum) [1]. These parasites, with the exception of Cryptosporidium spp. [2], possess a unique, essential, vestigial plastid known as the apicoplast. The discovery of the photosynthetic apicomplexan Chromera (chromalveolate hypothesis), the structural characteristics of the apicoplast/plastid genome and phylogenetic analysis of the glyceraldehyde-3-phosphate dehydrogenase and cytochrome c oxidase subunit 2 genes have demonstrated that the apicoplast has evolved through secondary endosymbiosis of a red alga [3][4][5][6]. The apicoplast is also involved in critical metabolic pathways such as fatty acid, iron-sulphur, haem and isoprenoid biosynthesis. Previous studies have shown that Plasmodium falciparum, Babesia bovis and Babesia bigemina cannot grow with the antibiotic fosmidomycin that causes loss of the apicoplast [7,8]. Interestingly, isopentenyl pyrophosphate supplementation completely reverses death following treatment with fosmidomycin [8]. The apicoplast and some of these metabolic pathways are vital for parasite survival, thus making the apicoplast an attractive target for anti-parasitic drugs.
In the present study, we employed a Sanger dideoxy chain-termination method to sequence the apicoplast genomes of BspXJ, BspDH, BmLT, BmTZ, BmHB and BmNX, and used bioinformatics software to conduct the assembly and annotation of full sequences of these isolates to verify the published sequences of BspXJ and BmLT sequenced using Illumina technology. Phylogenetic trees were constructed using the apicoplast genomic data to determine the taxonomic relationships among these geographical isolates of Babesia infective to sheep and goats in China. The study provides essential data for clarifying the classification of the Babesia species infective to small ruminants and lays the foundation for performing research on metabolic pathways and identifying diagnostic markers and drug targets for Babesia infections.

Parasites and isolation of genomic DNA
BspXJ, BspDH, BmLT, BmTZ, BmHB and BmNX were isolated from splenectomised sheep that were infected or infested with field-collected sheep blood or ticks from Xinjiang, Hebei and Gansu (Lintan, Tianzhu, Ningxian and Dunhuang counties) provinces during the period 2000-2010. The purified merozoites of BspXJ, BspDH, BmLT, BmTZ, BmHB and BmNX were provided by the vectors and vector-borne diseases laboratory in Lanzhou Veterinary Research Institute, China [38]. Genomic DNA was extracted from the merozoites using a QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions. The DNA concentration and quality were measured using the 260/280 nm absorbance ratio on a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). DNA was stored at − 20 °C until PCR amplification.

Amplification and sequencing of apicoplast genomes
The PCR primers were designed on the basis of the reported apicoplast genomic sequences of BspXJ (KX881914) and BmLT (KX881915), sequenced using Illumina technology [36]. The primer details are shown in Additional file 1: Table S1. Overlapped fragments covering whole apicoplast genomes were amplified from the genomic DNA of BspXJ, BspDH, BmLT, BmTZ, BmHB and BmNX. The PCR products were directly sequenced using a BigDye Terminator v.3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA, USA) on an ABI 3730 DNA analyser (Applied Biosystems) or cloned into pBluescript II SK(+) using a Clone Express ® II One Step Cloning Kit (Vazyme, Nanjing, China) for subsequent sequencing.

Assembly and annotation of the apicoplast genomes
The CLC Genomics Workbench v.7.5.1 (Qiagen, Redwood City, CA, USA) was used to assemble the apicoplast genomes according to the user manual. Genome annotation was performed using the software Artemis [39] and BLAST (http://blast .ncbi.nlm.nih.gov/Blast .cgi). The use of these two methods guarantees the robustness of the annotation. The E-value of BLAST is 0.0 and we were blasting against a non-redundant protein sequence (nr) database. The tRNA genes were identified using tRNAscan-SE v.2.0 (http://lowel ab.ucsc.edu/tRNAs can-SE/) with the default search mode and other mitochondrial sequence sources [40]. Genetic maps were obtained using the online software CGView (http://stoth ard.afns. ualbe rta.ca/cgvie w_serve r/) [41]. Mauve (http://gel. ahabs .wisc.edu/mauve ) was used to generate the genome comparisons [42]. The nucleotide sequences and annotation information reported in this article were submitted to the GenBank database under the accession numbers MH992224-MH992229.

Phylogenetic analysis
The 18 apicoplast genomic sequences, six from the ovine Babesia isolates from this study and 12 from apicomplexan parasites obtained from GenBank [Babesia sp. Xinjiang (BspXJ), B. motasi Lintan (BmLT), B. bovis, B. orientalis, B. microti, Theileria parva, Plasmodium falciparum, Cyclospora cayetanensis, P. chabaudi chabaudi, Toxoplasma gondii, Eimeria tenella and Leucocytozoon caulleryi], and one chloroplast genomic sequence of Chromera velie (GenBank: HM222967), were used in the phylogenetic analysis (Table 1). On the basis of the annotation information, 13 encoding genes (rpl2, rpl4, rpl6, rpl14, rpl16, rps2, rps3, rps4, rps7, rps11, rps12, tufA and ropB) of each genomic sequence were used to deduce their amino acid sequences. The concatenated sequences of 4683 amino acid residues from each species were put into multi-alignment using Clustal W with further manual verification. Subsequently, MEGA v.6.06 (http://www. megas oftwa re.net/) software was applied to conduct phylogenetic analysis. A bootstrap phylogenetic tree demonstrating the relationship of BspXJ, BspDH, BmLT, BmTZ, BmNX and BmHB to other apicomplexan parasites was created by the maximum likelihood (ML) method or neighbour-joining (NJ) method, using a distance matrix corrected for nucleotide substitutions based on the JTT with Freqs model. In addition, phylogeny of the whole apicoplast nucleotide sequence was constructed by the ML method based on the Kimura 2-parameter model. A bootstrap analysis was used to assess the robustness of the clusters using 1000 replicates.

Sequence analysis of the apicoplast genomes of ovine Babesia
Sequencing and assembly revealed that the apicoplast genomes of the six ovine Babesia isolates were formed of circular DNA, ranging from 29,916 to 30,846 bp in length with a high A + T content of 78.7-81.0% (Table 1). Bioinformatics analysis indicated that the circular DNA contained a small subunit and a large subunit ribosomal RNA (SSU and LSU), 23-24 transfer ribonucleic acids (tRNAs) and five to six ribosomal protein large subunits (rpl), eight to nine ribosomal protein small subunits (rps), four to five subunits of DNA-directed RNA polymerase (rpo), two ATP-dependent Clp proteases (clpC1, clpC2), one elongation factor TU (tufA), and seven to eleven hypothetical protein genes (hyp) ( Table 2).
All coding genes were transcribed in the same orientation (Additional file 2: Figure S1; Additional file 3: Figure  S2; Additional file 4: Figure S3; Additional file 5: Figure  S4; Additional file 6: Figure S5; Additional file 7: Figure S6). In total, 7094, 7220, 7384, 7436, 7163 and 7212 amino acids were encoded in the apicoplast genomes of the six ovine Babesia species. All the protein-encoding genes had ATG as a translation start codon. Most of the apicoplast protein-encoding genes had TAA as a translation stop codon, followed by TGA and TAG ( Table 3). The alignment of apicoplast genomes of ovine Babesia isolates indicated that the identities of BspXJ/DH, BmLT/TZ and BmNX/HB were 99.8, 99.9 and 99.9%, respectively. BmLT/TZ and BmNX/HB had 70.9 and 71.6-71.7% identity, respectively, to BspXJ/DH, and that between BmLT/TZ and BmNX/HB was 86.9-87.0%. Based on the nucleotide sequence analysis of the whole apicoplast, there were multiple base differences among BspXJ/DH, BmLT/TZ and BmNX/HB (Additional file 8: Table S2).

Alignment of apicoplast genomes with those of other apicomplexan parasites
Four gene clusters were found in the BspXJ/DH, BmLT/ TZ and BmHB/NX apicoplast genomes. They were in synteny with the same gene clusters of other apicomplexan parasites (Additional file 9: Figure S7). Gene cluster 1 included those encoding ribosomal proteins and tufA genes (Fig. 1). Similar to the gene organization of B. bovis, B. orientalis, B. microti, T. parva, T. gondii, E. tenella and C. cayetanensis apicoplast genomes, the six ovine Babesia isolates lacked the rpl23 gene, but it was present in P. chabaudi chabaudi and P. falciparum between the rpl4 and rpl2 genes. Hypothetical protein genes were not found in Bsp/DH, B. microti, T. parva, T. gondii and P. falciparum between rps7 and tufA, whereas BmLT/ TZ, BmNX/HB, B. bovis and B. orientalis had two to seven hypothetical protein genes in the locus. Gene cluster 2 consisted of hypothetical protein genes and ClpC chaperone genes. Similar to B. bovis, B. orientalis, B. microti and T. parva, the ClpC genes of the six ovine Babesia isolates were duplicated, with both copies containing the AAA_2 ATPase domain, whereas T. gondii and P. falciparum contained one ClpC gene (Fig. 1).
Gene cluster 3 included RpoB, RpoC and rps2 genes, which were consistent with the gene orientation and contents of cluster 3 in B. bovis, B. orientalis, B. microti and T. parva. Cluster 3 of the piroplasma apicoplast genomes lacks the sufB gene involved in iron-sulphur cluster synthesis. Gene cluster 4 of the piroplasms had several tRNA genes and a single set of SSU and LSU genes, which are transcribed in the same orientation, whereas, this cluster consisted of two sets of SSU and LSU genes in opposite orientations in other parasites. Unlike BspXJ/DH, BmLT/TZ and BmNX/HB, one to two tRNA genes exist between the SSU and LSU genes in B. bovis, B. orientalis, B. microti and T. parva (Fig. 1).

Comparison of genomic data sequenced by the Sanger dideoxy chain-termination method and Illumina technology
The apicoplast genomic sequences of BspXJ (KX881914) and BmLT (KX881915) sequenced using Illumina  11 ORFs (hyp1-11) 11 ORFs (hyp1-11) 9 ORFs (hyp1-9) 9 ORFs (hyp1- 9) technology (designated BspXJ-Illumina and BmLT-Illumina) were compared with those obtained in this study using the Sanger dideoxy chain-termination method (designated BspXJ-Sanger, MH992224; BmLT-Sanger, MH992226). The results indicated that the size of the apicoplast genome, the number of tRNA and protein-encoding genes, and the initiation and termination codons used in the encoding genes are different in the two sets of data from Sanger and Illumina (Tables 1, 3). In addition, the A + T contents and the number of rRNA molecules were consistent but there were differences in the bases at several nucleotide positions along the full-length genome between the first-generation and second-generation data ( Table 4).

Phylogenetic analysis
The ML and NJ trees were constructed using the concatenated amino acid sequences of 13 gene products (4683 Table 3 Initiation codons and termination codons used in encoding genes of the six ovine Babesia apicoplast genomes a a1 and a2 are the same sample, sequenced using the Sanger and Illumina method, respectively b b1 and b2 are the same sample, sequenced using the Sanger and Illumina method, respectively

Species
Total no. of protein-encoding  genes   Initiation codon  Termination codon   ATG  ATA  ATT  ATC  TAA  TGA (Fig. 2). In the whole apicoplast nucleotide phylogenetic tree, the grouping result for the six ovine Babesia isolates is the same as that of the amino acid phylogenetic trees (Additional file 10: Figure S8).

Discussion
Babesia spp. are tick-transmitted haemoprotozoans that infect various animal species including humans, causing loss of livestock and public health concern in tropical and subtropical regions of southern Europe, Africa, Asia, Australia and the Americas [9,36]. The apicoplast is a unique organelle found in apicomplexan parasites; it is considered a target for screening anti-parasitic drugs because it plays important roles in metabolic pathways for fatty acid, iron-sulphur, haem and isoprenoid biosynthesis [43]. In the present study, the apicoplast genomes of six ovine Babesia isolates were sequenced, assembled, annotated and compared with those of other apicomplexan parasites. The apicoplast genomes are smaller than those of most apicoplexan parasites [24,28,29,33,37,44], but slightly larger than those of P. chabaudi chabaudi [35] and B. microti [25]. The content and order of genes in the cluster differ among the parasites. The rpl23, rpl11 and sufB genes are deficient in the apicoplast genomes of the six ovine Babesia spp., which is similar to B. bovis, B. orientalis, B. microti, T. parva, T. gondii, E. tenella and C. cayetanensis. It is possible that these genes were lost during the genetic evolution of the apicoplast in most apicomplexan parasites [25]. Therefore, some researchers speculate that the rpl23 and rpl11 genes do not play an important role in the growth and development of these parasites; alternatively, they may be translated at other sites in the apicoplast genome or directly encoded by the nuclear genome. The loss of the sufB gene may have been caused by the Fig. 2 Phylogenetic relationships of Babesia infective to small ruminants in China and other apicomplexan parasites. Phylogeny was inferred with a maximum likelihood analysis of amino acid sequence data for 13 selected apicoplast genome-encoded genes based on distances calculated with the JTT with Freqs model. Chromera velia (HM222967) was used as the outgroup. Bootstrap values > 50% from 1000 replicates are shown on the nodes. Babesia obtained in this study is shown as triangles gene inversion of cluster 3 during the evolution of Piroplasmida [24,25]. In this study, different numbers of hypothetical protein genes were found by aligning the apicomplexan apicoplast genomes, which suggests that hyp genes may be one of the causes of the variation in apicoplast genome size. Previous studies have suggested that rearrangements and loss of the genes involved in photosynthesis were thought to be responsible for the formation of apicoplast [24,25]. Furthermore, gene deletion (sufB, rpl23, rpl11 and hyp), inversion (RNA polymerase), duplication (ClpC and hyp) and restructuring (SSU and LSU) were also important events during the early evolution of Babesia species. These modifications may have caused the apicoplast genomes of BmNX and BmHB to be the smallest among the reported Babesia in cattle and small ruminants.
Comparison of the BspXJ and BmLT apicoplast genome data sequenced by first-generation and secondgeneration sequencing techniques showed that there were some differences in genome sizes and the numbers of tRNA and protein-encoding genes. For example, rpl5 and rpl6 genes were present between rpl14 and rps5 genes in the second-generation sequence data of BmLT, while only the rpl6 gene exists in this region in the firstgeneration sequence data of the four B. motasi isolates. It is possible that these differences involving nucleotide substitutions, insertions and deletions were due to the different sequencing methods and annotation software used. Additionally, settings of a few parameters comprising ORF length, nested ORFs, start and stop codon, and genetic code were used. Therefore, it is recommended to perform first-generation sequencing to verify sequences based on the second-generation sequencing sequence, to annotate genes using two or more annotation software programs and then further improve the data by using BLAST.
Studies on the biological characteristics (including morphology, pathogenicity, vector tick, serology and in vivo or in vitro propagation) and molecular classification (target genes including SSU, ITS, LSU, HSP90, cox1, cytb, cox3, RPS8 and TRAP) of the Babesia species infective to small ruminants in China have shown that these parasites are divided into two Babesia species: B. motasi and Babesia sp. There are two further subspecies of B. motasi, named BmLT/TZ and BmNX/HB (with differences in pathogenicity, serology, in vitro culture features and target gene sequences). These findings are consistent with the Uilenberg inference [12-15, 17, 18, 20, 38, 45-51]. In this study, phylogenetic analyses using concatenated amino acids or whole apicoplast nucleotide sequences confirmed the taxonomical relationships among Babesia species infective to small ruminants in China. In the ML and NJ trees, the ovine Babesia fell into two groups: Babesia sp. and B. motasi. Babesia motasi was further divided into two small clades, BmLT/TZ and BmHB/NX. The apicoplast genomes in apicomplexan Babesia parasites remain conserved throughout all the species, including the six ovine Babesia isolates, and Babesia apicoplast functions are significantly different from those of the host [25], suggesting that they might be useful as targets for the development of potent and safe therapies for the treatment of babesiosis. Our data provide useful information confirming the taxonomical relationships of these parasites and identifying targets for anti-apicomplexan parasite drugs.

Conclusions
In the present study, we have accomplished the sequencing, assembly and annotation of the apicoplast genomes of six ovine Babesia isolates from China. This study has also confirmed our previous inference that there are two Babesia species (Babesia sp. and B. motasi) infective for small ruminants in China, and that the four B. motasi isolates possibly belong to two subspecies (BmLT/ BmTZ and BmNX/BmHB). Further studies are needed to validate the effects of anti-Babesia drugs against DNA replication, transcription, translation and 2-C-methyld-erythritol 4-phosphate pathways of the apicoplast.