Bioinformatic analysis of beta carbonic anhydrase sequences from protozoans and metazoans

Background: Despite the high prevalence of parasitic infections, and their impact on global health and economy, the number of drugs available to treat them is extremely limited. As a result, the potential consequences of large-scale resistance to any existing drugs are a major concern. A number of recent investigations have focused on the effects of potential chemical inhibitors on bacterial and fungal carbonic anhydrases. Among the five classes of carbonic anhydrases (alpha, beta, gamma, delta and zeta), beta carbonic anhydrases have been reported in most species of bacteria, yeasts, algae, plants, and particular invertebrates (nematodes and insects). To date, there has been a lack of knowledge on the expression and molecular structure of beta carbonic anhydrases in metazoan (nematodes and arthropods) and protozoan species.
Methods: Here, the identification of novel beta carbonic anhydrases was based on the presence of the highly conserved amino acid sequence patterns of the active site. A phylogenetic tree was constructed based on codon-aligned DNA sequences. Subcellular localization prediction for each identified invertebrate beta carbonic anhydrase was performed using the TargetP webserver.
Results: We verified a total of 75 beta carbonic anhydrase sequences in metazoan and protozoan species by proteome-wide searches and multiple sequence alignment. Of these, 52 were novel, and contained highly conserved amino acid residues, which are inferred to form the active site in beta carbonic anhydrases. Mitochondrial targeting peptide analysis revealed that 31 enzymes are predicted with mitochondrial localization; one was predicted to be a secretory enzyme, and the other 43 were predicted to have other undefined cellular localizations.
Conclusions: These investigations identified 75 beta carbonic anhydrases in metazoan and protozoan species, and among them there were 52 novel sequences that were not previously annotated as beta carbonic anhydrases. Our results will not only change the current information in proteomics and genomics databases, but will also suggest novel targets for drugs against parasites.


Background
Carbonic anhydrases (CAs) are ubiquitous metalloenzymes. They are encoded by five evolutionarily divergent gene families, and the corresponding enzymes are designated α-, β-, γ-, δ- and ζ-CAs. α-CAs are present in animals, some fungi, bacteria, algae, and the cytoplasm of green plants. β-CAs are expressed mainly in fungi, bacteria, archaea, algae, and chloroplasts of monocotyledons and dicotyledons. γ-CAs are expressed in plants, archaea, and some bacteria. δ- and ζ-CAs are present in several classes of marine phytoplankton [1][2][3][4][5][6]. A total of 13 enzymatically active α-CAs have been reported in mammals: CA I, CA II, CA III, CA VII, and CA XIII are cytosolic enzymes; CA IV, CA IX, CA XII, CA XIV, and CA XV are membrane-bound; CA VA and CA VB are mitochondrial; CA VI is secreted; and CA VIII, CA X, and CA XI are acatalytic CA-related proteins [3,7]. The active site of CA contains a zinc ion (Zn2+), which has a critical role in the catalytic activity of the enzyme. ζ- and γ-CAs represent exceptions to this rule since they can use cadmium (ζ), iron (γ), or cobalt (γ) as cofactors [8][9][10]. CAs are involved in many biological processes, such as respiration involving transport of CO2 and bicarbonate between metabolizing tissues, pH homeostasis, electrolyte transfer, bone resorption, calcification, and tumor progression. They also participate in some biosynthetic reactions, such as gluconeogenesis, lipogenesis, and ureagenesis [3,11][12][13][14].
The first β-CA was serendipitously discovered by Neish in 1939 [15]. In 1990, the cDNA sequence of spinach (Spinacia oleracea) chloroplast CA was determined, and found to be non-homologous to animal α-CA [16,17]. Thereafter, cDNA sequences of β-CA from pea (Pisum sativum) and Arabidopsis thaliana were determined [17][18][19]. It is believed that the plant β-CAs are distributed in the chloroplastic stroma, thylakoid space, and cytoplasm of plant cells [17]. Many putative β-CAs have been discovered since 1990, not only in photosynthetic organisms, but also in eubacteria, yeast, and archaea [17].
β-CA is an important accessory enzyme for many CO2- or HCO3−-utilizing enzymes (e.g. RuBisCO in chloroplasts, cyanase in E. coli [42], urease in H. pylori [43], and carboxylases in Corynebacterium glutamicum [44]). In cyanobacteria, β-CA is an essential component of the CO2-concentrating carboxysome organelle [17,45]. β-CA activity is required for growth of E. coli bacteria in air [46]; it is dispensable, however, if the atmospheric partial pressure of CO2 is high or during anaerobic growth in a closed vessel at low pH, where copious CO2 is generated endogenously. β-CA is also needed for growth of C. glutamicum [44,47] and some yeasts, such as S. cerevisiae [40]. In higher plants, the Flaveria bidentis genome contains at least three β-CA genes, named CA1, CA2, and CA3 [48]. The functional roles of β-CAs in plants are not yet fully understood, even though a lot of new data has emerged in recent years. C3 and C4 plants have different mechanisms for carbon fixation and photosynthesis and, thus, β-CAs might possess different roles, depending on the location of the enzyme and the type of plant [49]. In plants, the highest CA activity has been found within the chloroplast stroma, but there is also some CA activity in the cytosol of mesophyll cells [50]. Carbon dioxide coming from the external environment must be rapidly hydrated by β-CA and converted into HCO3− for the phosphoenolpyruvate carboxylase enzyme [49]. Additionally, CAs play a role in photosynthesis by facilitating diffusion into and across the chloroplast, and by catalyzing HCO3− dehydration to supply CO2 for RuBisCO. Interestingly, both RuBisCO and β-CA expression levels increase together when P. sativum is transferred from an environment with high levels of CO2 to one with low levels [47].
Crystal structures of β-CAs reveal that a zinc ion (Zn2+) is ligated by two conserved cysteines and one conserved histidine [5]. Until now, the only X-ray crystal structure determined for a plant β-CA is that of P. sativum [51]. E. coli was the first bacterium for which a β-CA crystal structure was determined [20]. β-CA can adopt a variety of oligomeric states with molecular masses ranging from 45 to 200 kDa [52].
The first metazoan β-CAs were reported in 2010 [41]. In one of the studies [4,41], two genes encoding β-CAs (y116a8c.28 and bca-1) were identified in Caenorhabditis elegans. Another study reported a novel β-CA gene identified from FlyBase, which was named DmBCA (short for Drosophila melanogaster β-CA) [4]. Additionally, orthologs were retrieved from sequence databases, and reconstructed when necessary. The results confirmed the presence of β-CA sequences in 55 metazoan species, such as Aedes aegypti, Culex quinquefasciatus, Anopheles gambiae, Drosophila virilis, Tribolium castaneum, Nasonia vitripennis, Apis mellifera, Acyrthosiphon pisum, Daphnia pulex, Caenorhabditis elegans, Pristionchus pacificus, Trichoplax adhaerens, Caligus clemensi, Lepeophtheirus salmonis, Nematostella vectensis, Strongylocentrotus purpuratus, and Saccoglossus kowalevskii. The DmBCA enzyme was produced as a recombinant protein in Sf9 insect cells, and its kinetic and inhibition profiles were determined. The enzyme showed high CO2 hydratase activity, with a kcat of 9.5 × 10⁵ s⁻¹ and a kcat/KM of 1.1 × 10⁸ M⁻¹ s⁻¹. DmBCA was inhibited by the clinically used sulfonamide acetazolamide, with an inhibition constant of 49 nM. Subcellular localization studies have indicated that DmBCA is probably a mitochondrial enzyme, as is also suggested by sequence analysis.
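As a rough consistency check on the kinetic constants quoted above, an apparent KM for CO2 can be back-calculated from kcat and kcat/KM. This is only a sketch that assumes simple Michaelis-Menten kinetics; the resulting KM is not a value reported in this text, and the variable names are illustrative.

```python
# Back-calculate an apparent K_M from the two kinetic constants quoted for DmBCA.
# Assumption: simple Michaelis-Menten behaviour, so K_M = k_cat / (k_cat / K_M).

k_cat = 9.5e5            # turnover number, s^-1 (from the text)
k_cat_over_km = 1.1e8    # catalytic efficiency, M^-1 s^-1 (from the text)

km_molar = k_cat / k_cat_over_km      # apparent K_M in mol/L
km_millimolar = km_molar * 1e3        # same value expressed in mmol/L

print(f"Apparent K_M = {km_millimolar:.1f} mM")
```

Under that assumption the implied KM is in the millimolar range, which is several orders of magnitude above the 49 nM inhibition constant quoted for acetazolamide.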
In this study, using bioinformatics tools, we discovered and verified the presence of β-CA in various other metazoan species, and, for the first time, in protozoa. Previously, most β-CA proteins have been identified in protein databases as 'unknown' proteins or 'putative' CAs, without a specific reference to β-CAs. Based on the present findings, new avenues will be opened to biochemically characterize β-CAs and their inhibitors in arthropods, nematodes and protozoans.

Methods
Identification of putative β-CA enzymes in protozoan and metazoan species and multiple sequence alignment
Identification of novel β-CAs was based on the presence of the highly conserved amino acid sequence patterns of the active site, namely Cys-Xaa-Asp-Xaa-Arg and His-Xaa-Xaa-Cys, which are also marked in Additional file 1: Figure S1. In total, 75 invertebrate β-CA sequences were retrieved from Uniprot (http://www.uniprot.org/) for alignment analysis, and one bacterial sequence (Pelosinus fermentans) was included as an outgroup. All protein sequences were aligned using Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) [54], and the alignment was visualized in Jalview [53]. The sequences were manually curated to remove residues associated with an incorrect starting methionine. A total of 90 residues were removed from the N-terminal end of Uniprot IDs D4NWE5_ADIVA, G0QPN9_ICHMG, D6WK56_TRICA, I7LWM1_TETTS and I7M0M0_TETTS. The modified protein sequences were then re-aligned. This protein alignment then served as the template for codon alignment of the corresponding nucleotide sequences using the Pal2Nal program (http://www.bork.embl.de/pal2nal/) [55].
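The idea behind the codon alignment step can be illustrated with a minimal sketch. It is not the Pal2Nal program itself: the function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes that each coding sequence is ungapped, in frame, and matches its protein exactly apart from an optional terminal stop codon.

```python
# Thread an ungapped coding sequence onto one row of a gapped protein alignment.
# Illustrative only; Pal2Nal additionally checks that codons translate to the
# aligned residues and handles frameshifts and mismatches.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Return the codon-level alignment row for one gapped protein row."""
    cds = cds.upper()
    # Drop a terminal stop codon, which has no residue in the protein alignment.
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence length does not match protein length")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    # Each residue consumes one codon; each protein gap becomes a three-base gap.
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


if __name__ == "__main__":
    # Toy example: protein row "M-CA" with coding sequence ATG TGT GCT + stop.
    print(thread_codons("M-CA", "ATGTGTGCTTAA"))
```

Because gaps are inserted only in multiples of three bases, the reading frame is preserved across the alignment, which is what makes the result suitable for codon-based phylogenetic analysis.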

Prediction of subcellular localization
Subcellular localization prediction of each identified invertebrate β-CA was performed using the TargetP webserver (http://www.cbs.dtu.dk/services/TargetP/). TargetP is built from two layers of neural networks, where the first layer contains one dedicated network for each type of pre-sequence [cTP (chloroplast transit peptide), mTP (mitochondrial targeting peptide), or SP (secretory signal peptide)], and the second is an integrating network that outputs the actual prediction (cTP, mTP, SP, other). It is reported to discriminate between cTPs, mTPs, and SPs with sensitivities and specificities higher than those obtained with other available subcellular localization predictors [59].

Results
Multiple sequence alignment
The Uniprot search of potential β-CA sequences, and the subsequent multiple sequence alignment, identified 75 β-CAs in metazoan and protozoan species, of which 23 sequences were reported as β-CAs previously [4]. Thus, 52 metazoan and protozoan β-CA sequences were novel and are reported here for the first time. All 75 β-CAs in metazoan and protozoan species are shown in Table 1. The multiple sequence alignment results of these 75 β-CAs, plus a bacterial β-CA sequence from Pelosinus fermentans, are shown as Additional file 1: Figure S1. Multiple sequence alignment of all animal β-CAs confirmed conservation of the known active site motifs CxDxR and HxxC in all identified enzymes. Several other key residues were also highly conserved. Notably, all β-CA sequences from Leishmania species (Leishmania donovani, Leishmania infantum, Leishmania major, and Leishmania mexicana) contained a 71-residue N-terminal extension not present in any other sequences.
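A motif check of this kind can be sketched with regular expressions. The snippet below is only an illustration of screening a protein sequence for the CxDxR and HxxC patterns; the function name `find_beta_ca_motifs` and the short test sequences are invented for the example and are not real β-CA sequences, and the study itself relied on database searches and alignment rather than on this code.

```python
# Screen a protein sequence for the two active-site patterns named in the text.
# "x" in CxDxR / HxxC stands for any residue, written as "." in the regex.
import re

# Lookaheads are used so that overlapping matches are also reported.
MOTIFS = {
    "CxDxR": re.compile(r"(?=C.D.R)"),
    "HxxC": re.compile(r"(?=H..C)"),
}


def find_beta_ca_motifs(sequence: str) -> dict:
    """Return 0-based start positions of each motif in the sequence."""
    sequence = sequence.upper()
    return {name: [m.start() for m in pattern.finditer(sequence)]
            for name, pattern in MOTIFS.items()}


def has_both_motifs(sequence: str) -> bool:
    """True if at least one CxDxR and one HxxC pattern are present."""
    return all(find_beta_ca_motifs(sequence).values())


if __name__ == "__main__":
    toy = "MKCADSRLLLHAGCGG"  # invented sequence containing both patterns
    print(find_beta_ca_motifs(toy), has_both_motifs(toy))
```

A pattern match alone is a weak filter, since short motifs occur by chance; in practice it would be combined with the alignment-based confirmation described above.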

Phylogenetic analysis
The results of the phylogenetic analysis of 75 β-CAs in metazoan and protozoan species are shown in Figure 1. A β-CA sequence from the Pelosinus fermentans bacterium was used as an outgroup [60]. The phylogenetic results represent the evolutionary root of β-CAs in metazoan and protozoan species, the similarity between them, and duplications that have occurred. The branching pattern and branch lengths reveal interesting evolutionary relationships of β-CAs in various invertebrate species. There is a close relationship between our bacterial outgroup and Trichomonas vaginalis β-CAs, both having originated well before the other species within the tree. β-CAs of nematodes and arthropods are located in the lower evolutionary branches. In the protozoan Tetrahymena thermophila and Paramecium tetraurelia clades, significant duplications of β-CA have occurred, with 8 and 5 distinct proteins, respectively. Meanwhile, metazoan and nematode species tend to have just one or two β-CAs. Surprisingly, β-CAs of the nematode Trichinella spiralis and trematode Schistosoma mansoni appear more closely related to arthropod than to nematode enzymes. The triangle located near the bottom of Figure 1 represents the clade of β-CAs in different Drosophila species. The details of the phylogenetic tree of β-CAs in Drosophila species are shown in Figure 2.
The likely presence of inaccuracies in some of the database sequences, and inherent limitations of Bayesian inference, prompted use of additional phylogenetic methods. These analyses generally supported the major features of the final tree achieved via Bayesian inference.

Subcellular localization of β-CAs
The predictions for subcellular localization of the 75 β-CAs are shown in Table 2. Of these, 31 were predicted to have a mitochondrial localization, one (Anopheles darlingi, Uniprot ID: E3X5Q8) was predicted to be secreted, and the remaining 43 were predicted to have other cellular localizations. The predictions were based on the analysis of 175 N-terminal amino acids of each sequence. The Name column lists both the Uniprot ID of each β-CA and the scientific name of the metazoan or protozoan species.

Discussion
This study shows that the β-CA enzyme is present in a range of protozoans and metazoans. A total of 75 sequences were identified and a phylogenetic tree was constructed. The multiple sequence alignment results revealed that all 75 sequences have the highly conserved residues (cysteine, aspartic acid, arginine, and histidine) consistent with a β-CA enzyme (Additional file 1: Figure S1). Most of the metazoan and protozoan β-CAs, and corresponding coding sequences, were designated as uncharacterized sequences or CAs with no class specification. These can now be assigned to β-CAs in proteomics and genomics databases. β-CAs have been identified in the mitochondria of a variety of different organisms, such as plants [61], green algae [62], fungi [1,63], and Drosophila melanogaster [4]. Our results of subcellular localization prediction (Table 2) suggested that 31 of the β-CAs are targeted to mitochondria. In mitochondrial targeting peptides (mTPs), arginine, alanine and serine are over-represented, while negatively charged amino acid residues (aspartic acid and glutamic acid) are rare. Furthermore, mTPs are believed to form an amphiphilic α-helix, which is important for the import of the nascent protein into the mitochondrion [59]. The successful construction of the TargetP predictor demonstrates that protein sorting signals can be recognized with reasonable reliability from amino acid sequence data alone, thus, to some extent, mimicking the cellular recognition processes [59]. The prediction of mitochondrial localization for many of the proteins studied is also supported by previous experimental data showing that recombinant DmBCA protein is indeed located in mitochondria of insect cells [4]. As mitochondrial proteins, the β-CAs may contribute to key metabolic functions. Among the mammalian α-CAs, CA VA and CA VB are the only enzymes that have been localized exclusively to mitochondria. Functional studies, summarized in [64], have implicated them in several metabolic processes, such as gluconeogenesis, urea synthesis, and fatty acid synthesis. It has been shown previously that the gluconeogenic enzyme pyruvate carboxylase is expressed in protozoan (Toxoplasma gondii) mitochondria [65]. This enzyme utilizes bicarbonate to convert pyruvate to oxaloacetate. Mitochondrial CA V is also involved in lipid synthesis through the pyruvate carboxylation reaction [66]. Importantly, lipid metabolism is of crucial importance for parasites. Lipids serve as cellular building blocks, signaling molecules, energy stores, posttranslational modifiers, and pathogenesis factors [67]. Parasites rely on complex metabolic systems to satisfy their lipid needs. The present findings open a new avenue to investigate whether mitochondrial β-CAs are functionally involved in these processes.
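The compositional bias of mTPs noted in this section (Arg, Ala and Ser over-represented; Asp and Glu rare) can be made concrete with a small sketch that summarizes the N-terminal composition of a sequence. This is not how TargetP works internally, which relies on neural networks; the function name `nterm_composition`, the default 30-residue window, and the toy sequence are all assumptions chosen for illustration.

```python
# Summarize the N-terminal amino acid composition relevant to mTP-like regions.
# Illustrative heuristic only; it does not reproduce TargetP predictions.
from collections import Counter


def nterm_composition(sequence: str, window: int = 30) -> tuple:
    """Return (fraction of R/A/S, fraction of D/E) in the first `window` residues."""
    nterm = sequence[:window].upper()
    if not nterm:
        raise ValueError("empty sequence")
    counts = Counter(nterm)
    ras_fraction = sum(counts[aa] for aa in "RAS") / len(nterm)
    acidic_fraction = sum(counts[aa] for aa in "DE") / len(nterm)
    return ras_fraction, acidic_fraction


if __name__ == "__main__":
    # Invented 10-residue example: 6 of 10 residues are R/A/S, 2 of 10 are D/E.
    ras, acidic = nterm_composition("MRASRASDEL", window=10)
    print(f"R/A/S fraction: {ras:.2f}, D/E fraction: {acidic:.2f}")
```

A high R/A/S fraction combined with a low D/E fraction is consistent with, but does not by itself establish, the presence of a mitochondrial targeting peptide.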
The single β-CA of Anopheles darlingi is the first predicted secretory β-CA. Among the various α-CAs, the first secreted form (CA VI) was identified in human saliva in 1987 [68], and in 2011 another α-CA was identified in the salivary gland of Aedes aegypti [69]. Complementary research, such as morphological, biochemical, and spatial mapping of gene expression in Anopheles darlingi, will be needed to clarify the exact expression pattern of β-CA in this mosquito [69,70].
The TargetP predictor defined 43 β-CAs with 'other' cellular localizations. Although it is possible that β-CAs are truly located in different subcellular compartments depending on the species, these results should be interpreted with caution. Both the common errors in full genomic DNA, cDNA, or protein sequences in databases, and the potential inaccuracy of TargetP predictor could contribute to the observed deviations of the results. The highest prediction accuracy, with appropriate selection of specificity and sensitivity, is 90% [59].
Among the species mentioned in Table 1, some have important medical relevance, such as Aedes aegypti, Anopheles darlingi, Anopheles gambiae, Ascaris suum (Ascaris lumbricoides), Culex quinquefasciatus, Entamoeba histolytica, Hirudo medicinalis, Leishmania species, Schistosoma mansoni, Trichinella spiralis, and Trichomonas vaginalis. In the past decade, inhibition profiles of β-CAs of bacteria [24,31,71] and fungi [72][73][74][75] have been investigated with various inhibitors. Our results suggest that various protozoans and metazoans express β-CAs and that these molecules represent protein targets appropriate for inhibitor development. These proteins are not restricted to nematodes, insects, or protozoa causing human diseases, but are also present in many species with relevance to agriculture or veterinary medicine. These species include: Acyrthosiphon pisum, Ancylostoma caninum, Ascaris suum, Caligus clemensi, Camponotus floridanus, Culex quinquefasciatus, Dendroctonus ponderosae, Entamoeba species, Ichthyophthirius multifiliis, Solenopsis invicta, Tribolium castaneum, Trichinella spiralis, and Trichoplax adhaerens. Therefore, our findings also suggest that it might be possible to develop specific β-CA inhibitors as pesticides for the protection of crops and other natural resources against pathogens and pests.

Conclusions
The present data identify β-CA enzymes that are expressed in a number of protozoans and metazoans. Metazoan and protozoan β-CAs represent promising diagnostic and therapeutic targets for parasitic infections, because this CA family is absent from mammalian proteomes. Many of these enzymes are predicted to be present in mitochondria, where they might contribute to cell metabolism by providing bicarbonate for biosynthetic reactions and regulating intra-mitochondrial pH.