Genomic analysis of single nucleotide polymorphisms in malaria parasite drug targets

Malaria is a life-threatening parasitic disease caused by members of the genus Plasmodium. The development and spread of drug-resistant strains of Plasmodium parasites represent a major challenge to malaria control and elimination programmes. Evaluating genetic polymorphism in a drug target improves our understanding of drug resistance and facilitates drug design. Approximately 450 and 19 whole-genome assemblies of Plasmodium falciparum and Plasmodium vivax, respectively, are currently available, and numerous sequence variations have been found due to the presence of single nucleotide polymorphism (SNP). In the study reported here, we analysed global SNPs in the malaria parasite aminoacyl-tRNA synthetases (aaRSs). Our analysis revealed 3182 unique SNPs in the 20 cytoplasmic P. falciparum aaRSs. Structural mapping of SNPs onto the three-dimensional inhibitor-bound complexes of the three advanced drug targets within aaRSs revealed a remarkably low mutation frequency in the crucial aminoacylation domains, low overall occurrence of mutations across samples and high conservation in drug/substrate binding regions. In contrast to aaRSs, dihydropteroate synthase (DHPS), also a malaria drug target, showed high occurrences of drug resistance-causing mutations. Our results show that it is pivotal to screen potent malaria drug targets against global SNP profiles to assess genetic variances to ensure success in designing drugs against validated targets and tackle drug resistance early on.


Introduction
Malaria is caused by Plasmodium parasites and is a lifethreatening disease that remains an important public health concern in developing countries [1]. Eliminating malaria is complicated by the development and spread of resistance in the parasites Plasmodium falciparum and Plasmodium vivax to antimalarial drugs via the emergence of resistant strains [2]. The presence of parasites that are able to evade first-line antimalarial artemisininbased combination therapies (ACTs) has been confirmed in malaria-endemic countries of South East Asia and Africa, posing a threat of disease resurgence [3]. To prevent health emergencies due to this common yet treatable disease, validation of newer drug targets and the design of novel antimalarial scaffolds are urgently needed [4,5]. The aminoacyl-tRNA synthetase (aaRS) family of enzymes are universally distributed as they catalyse the linkage of cognate amino acid to transfer RNA (tRNA) that corresponds to the anti-codon triplet of the tRNA based on the genetic code [6]. During the aminoacylation reaction, ATP activates the amino acid by forming an aminoacyl-adenylate intermediate, following which this intermediate is ligated to the tRNA molecule and the final product is transported to the ribosome to carry out protein translation. Since aaRSs implement the genetic code for protein translation, their inhibition results in ribosome stalling during protein synthesis. The aaRS
Global field isolates of Plasmodium parasites from various locations worldwide are a valuable resource for understanding genetic variances in the parasite genome across populations. A single nucleotide polymorphism (SNP), which is a genomic variant at a single base position in the DNA, is the most common genetic variation that can alter the surface properties of a protein. The Malaria Genomic Epidemiology Network (MalariaGEN), a public data-sharing network with global study partners, integrates epidemiology with population genomics pertaining to malaria parasites [19]. MalariaGEN provides nucleotide-level SNP data for P. falciparum and P. vivax which have been obtained from clinical blood samples collected from malaria patients worldwide [19][20][21][22][23]. Within the framework of MalariaGEN, 9751 unique samples have been sequenced at multiple locations in over 50 countries in various projects dating from 2015 to 2020, resulting in the identification of approximately 8,051,696 SNPs in the genome of the P. falciparum, with the 3D7 strain as reference [20][21][22]. Additionally, 303,616 SNPs have been identified in the genome of P. vivax, with the Salvador 1 (Sal 1) strain as reference [23].
In this work, we have analysed global non-synonymous SNPs (referred to further as SNPs) in the 20 cytoplasmic Plasmodium aaRSs. Our analysis revealed 3182 unique SNPs in these 20 cytoplasmic P. falciparum aaRSs, with an average of 159 SNPs and 9-31% mutation frequency for individual aaRSs. Structural mapping of SNPs onto the inhibitor/substrate-bound 3D structures of three aaRSs, namely the P. falciparum lysyl (PfKRS)-, prolyl (PfPRS)-and P. vivax phenylalanyl (PvFRS)-tRNA synthetases, showed a very low frequency of mutations in their catalytic domains and high conservation in the substrate/drug binding regions. In contrast, a similar analysis of another key malaria drug target, dihydropteroate synthase (DHPS), showed the presence of all drug resistance-causing mutations despite low overall mutation frequency.
Inherent variability in genomic sequences of drug and vaccine targets can be a shortcoming in the design and development of effective therapies as the targets will be naturally resistant (compromised) [24,25]. Any amino acid change, particularly at the active site of a drug target, can induce alterations in the structural features which will alter its interaction with the drug and can lead to weak or no binding. Such polymorphisms can then lead to the rise of subgroups in a population which exhibit drug resistance. Based on the results of this study, the authors recommend investigating region-and population-specific polymorphisms in the ligand/substrate/ inhibitor binding regions of drug targets to ensure success in therapeutic design against malaria.

Plasmodium
6-hydroxy-methyl-7,8-dihydropterin pyrophosphokinase-dihydropteroate synthase (HPPK-DHPS) is a bifunctional fused enzyme that is essential for the biosynthesis of folate. Inhibition of this process results in depletion of dTTP, leading to decreased DNA synthesis in the parasites and ultimately in their death [26][27][28]. The DHPS domain of PfHPPK-DHPS is inhibited by sulfadoxine (Sdx) and sulfa-class of compounds; however, the emergence of Sdx resistance is well-documented in field isolate studies worldwide [26][27][28]. Structures of P. falciparum and P. vivax HPPK-DHPS are available in complex with substrates/analogues (PDB ID: 6JWW and 5Z79) [27,28]. PfHPPK-DHPS is a dimer having HPPK and DHPS domains connected by a linker (Fig. 1a). Our analysis revealed 33 unique SNPs in PfHPPK-DHPS, with an overall mutation frequency of  Table S3). The PfDHPS domain in itself has a lower frequency of 4% as it contains only 14 SNPs in its 320 residues (Fig. 1c). Further, PvHPPK-DHPS contains an equivalent of 12 mutations as well as an additional five from the P. vivax project, of which two are unique. Of the five mutations from P. vivax project, four lie in the DHPS domain, with three of the latter known to be drug resistance-causing mutations. Similar low frequencies of mutations were seen in the P. falciparum and P. vivax DHPS domains (Fig. 1c). SNPs in the PfDHPS domain are primarily hydrophobic/polar/charged residues mutating to hydrophobic/polar residues. No mutations were seen wherein a residue had mutated to a charged residue. A particular SNP in PfDHPS domain was seen in at least one sample and in a maximum of 1892 samples (of the total 9751 unique samples from MalariaGEN projects Pf3k, Pf4.0 and Pf6.0). All five drug resistancecausing mutations in P. falciparum (S436A/F, A437G, K540E, A581G, and A613S/T) are prominently present in PfDHPS domain in up to 1892 samples out of the 9751 MalariaGEN projects samples analysed. At least two drug resistance-causing mutations were found to coexist in 2552 samples out of 9751 (26% samples). Also, three mutations coexist in 97 samples, with the third mutation (E189Q) located in the HPPK domain. Structural mapping of SNPs on PfHPPK-DHPS shows the presence of the drug resistance-causing mutations in the sulfa drug binding site (Fig. 1d), as reported earlier [29]. Moreover, the high occurrence of these mutations in samples confirms the considerable prevalence of Sdx resistance and underscores the importance of global SNP databases as keys to understanding ongoing resistance spread.

SNP profiles of cytoplasmic P. falciparum and P. vivax lysyl-tRNA synthetase
The 3D structures of cytoplasmic PfKRS in complex with the inhibitors cladosporin (and its analogues) and a lymphoma kinase inhibitor and substrate L-Lys are available [11][12][13][14]. The tested inhibitors, particularly the cladosporin analogues, are selective towards the parasite KRS compared to its human counterpart [11][12][13][14]. PfKRS is a homodimer with an aminoacylation domain and an anticodon domain. Cladosporin occupies the substrate ATP-binding site by mimicking adenosine binding while substrate L-Lys is bound to the amino acid binding site [11][12][13][14]30]. Our analysis revealed 64 unique SNPs in PfKRS (in 583 residues), with an overall low mutation frequency of 11% (Figs. 2, 3; Additional file 3: Table S3). An individual SNP in PfKRS occurs in a low number of samples, i.e. at least in one sample and in a maximum of five samples out of a total of 9751. A sole conservative mutation, V78M, occurs in five samples. Individual samples also contain only one SNP, with the exception of two samples from Gambia and Ethiopia which have two SNPs each (K59M and N176S, and N286K/T and L280I respectively), indicating an overall low genetic variation. SNPs in PfKRS constitute different types of residues mutating to either a hydrophobic or polar residue (Fig. 4b). Interestingly, several polar-to-charged mutations are present (S3R, T46R, N286K, N355D, N422D, N441H) and also two mutations in which charge switched from either basic to acidic or vice versa (K209E, E423K) ( Fig. 4b; Additional file 3: Table S3). Structural mapping of SNPs on PfKRS shows 41% and 42% mutations in the N-terminal and aminoacylation domain, respectively, with the remaining mutations in the anticodon domain (PDB ID: 4PG3) (Fig. 5a, b). Sequence alignment of PfKRS and PvKRS revealed 34 corresponding mutations in PvKRS, and the P. vivax project identified six SNPs in PvKRS, of which three are unique from the former 34. Interestingly, disordered residues 1-77 have 20 mutations, and so V78M, occurring in five samples, can not be adjudged to be a core or surface mutation. The overall mutation frequency is lowest for the aminoacylation domain (only 8% in PfKRS and 5% in PvKRS), emphasising that the catalytic domain is structurally very conserved (Fig. 5c). Overall, ~45% of mutations are conservative, and 50% lie in flexible loops. The mutations in PfKRS are predominately basic in terms of charge or polar and, notably, the polar residue that mutates is mostly an asparagine (Fig. 4b).
Residues 287-354 of the aminoacylation domain ( Fig. 5d; "Patch 1") constitute the longest sequence stretch on PfKRS with no mutations, and interestingly covers half of all cladosporin and L-Lys interacting residues. This stretch surrounds the inhibitor-binding pocket and also contains the majority of homodimer interface residues. Only one buried mutation, M277T/I, is present in the dimer interface that makes an H-bond with E341 of the other monomer. Mutation clusters (defined as > 3 mutations that lie in close proximity of each other on the enzyme surface) do not exist in PfKRS. All cladosporin and L-Lys interacting residues are conserved. Curiously, two mutations, V501I and L555I, are present at the base of the inhibitor-binding pocket (Fig. 5e). It should be noted that these mutations are conservative hydrophobic and that they are each seen in only one sample (in samples from Mauritania and Tanzania respectively). Thus, these two mutations have very low frequency. One distant SNP, N286K/T, within 5 Å of the binding pocket appears to have no effect on the pocket's peripheral surface. N286K is one of the mutations coexisting in the only two samples which have two SNPs. Further, residues F342, R559, K499 and F563, which have been shown in a previous study [12] to be involved in conformational changes, are conserved. Residues V328 and S344, shown in a previous study to enable drug entry into the binding pocket [11], are also conserved. Finally, the three conserved class II motifs of Plasmodium KRS with Human KRS [11], 276-PMMLI-281, 329-FRNE-332 and 557-ID-559 are conserved, with the exception of one mutation, M277I/T, in the first motif.

SNP profiles of cytoplasmic P. falciparum and P. vivax prolyl-tRNA synthetase
The 3D structure of cytoplasmic PfPRS in complex with inhibitor halofuginone is available (PDB ID: 4YDQ) [15][16][17]. PfPRS is a dimer consisting of aminoacylation, editing, anticodon and Zn-binding domains (Fig. 6a). PfPRS binds to halofuginone, a well-established dual-site inhibitor that binds to two sites, the 3'-end of the tRNA site and the proline site, while an ATP-analogue binds to the ATP site [15][16][17]. Several halofuginone analogues  15:309 have been shown to have improved selectivity towards the parasite and higher therapeutic indices [17]. Also, halofuginone can be utilised in synergistic drug combination approaches, as has been shown against PRS in the parasite Toxoplasma gondii [31]. Our analysis revealed 93 unique SNPs in PfPRS (in 746 residues), with an overall low mutation frequency of 12%. ( Interestingly, similar to PfKRS, polar-to-charged mutations involving asparagine/glutamine (N109D, Q111H, N138K, Q140R, N261K, Q395K, N416D, Q700R) are present. Also, two mutations of a switch between basic and acidic charge are seen (K92E, D561H) (Fig. 4c).
PfPRS has a higher number of non-conservative mutations compared to PfKRS (Fig. 4). Structural mapping of SNPs on PfPRS shows that 14%, 12% and 11% mutations in the aminoacylation, anticodon and Zn-binding domains respectively (Fig. 6a, b). Curiously, PvPRS has a higher number of mutations in the catalytic domain compared to PfPRS. The domain-wise distribution shows 13 SNPs in aminoacylation domain of PfPRS, i.e. only a 5% frequency, indicating a highly conserved catalytic domain (Fig. 6c). In comparison, 20% of SNPs are seen in the PvPRS aminoacylation domain. PfPRS contains two sequence stretches in its aminoacylation domain with no mutations (Fig. 6d). The first stretch (amino acids 291-368, "Patch 1") ( Fig. 6d) has only one conservative mutation, E312D, and it also comprises residues 316-351, which are involved in the 'collapsed conformation' that imparts a characteristic asymmetric homodimer interface [15] to PfPRS (Fig. 6d). A second stretch (amino acids 418-490, "Patch 2") ( Fig. 6d) contains only the SNP G455A. Intriguingly, both these mutation-free stretches are  (Fig. 6d). With the exception of one cluster of four mutations (P369S, Y375H, R376S, N498T) that is located towards the end of the aminoacylation domain connecting to the anticodon domain, SNPs on the PfPRS surface are mostly singular. Analysis showed high conservation in halofuginone and ATP-analogue interacting residues (Fig. 6e). Additionally, two crucial zinc chelating cysteine residues of human PRS in the C-terminal Zn-binding domain which are replaced by serine (S732) and threonine (T686) in Plasmodium [17] are also conserved.

SNP profiles of cytoplasmic P. falciparum and P. vivax phenylalanyl-tRNA synthetase
Phenylalanyl-tRNA synthetase (FRS) in Plasmodium exists as a heterodimer of α-and β-subunits that differ significantly in their sequence [18]. The α-subunit of Plasmodium FRS contains the aminoacylation domain while the β-subunit contains the editing domain (Fig. 7a) [18]. In the complex of PvFRS and selective inhibitor bicyclic azetidine BRD1389, the drug binding region lies exclusively in the α-subunit, with BRD1389 occupying the substrate L-Phe binding site and an auxiliary site [18]. Our analysis of the PfFRS α-subunit revealed 103 unique SNPs for its 575 residues, with an overall frequency of 18% (Additional file 3: Table S3). Structural mapping of SNPs on the PvFRS α-subunit showed that the maximum number of mutations are located in the N-terminal region (amino acids 1-271), which is disordered in the current structure, and that ~25% of mutations are in the aminoacylation domain (PDB ID: 7BY6) (Fig. 7b, c). A particular SNP occurs in at least one sample and a maximum 266 samples (of a total of 9751 samples), and three mutations, in particular, are seen in a high number of samples: T377I (266 samples), E50K (23 samples) and N11S (23 samples). Further, every sample has only one SNP, with the exception of three samples from Mali, Ghana and Gambia, which have two SNPs each: E28V and I469V; N53K and S81T; N151S and I469V. SNPs in the PfFRS α-subunit are very diverse, and the frequencies of hydrophobicto-hydrophobic, hydrophobic-to-polar, polar-to-hydrophobic and charged to hydrophobic/polar mutations are similar. Remarkably, unlike PfKRS and PfPRS, a higher frequency of polar-to-charged, hydrophobicto-charged, and switch of charge mutations are seen in the PfFRS α-subunit (Fig. 4). The PfFRS α-subunit N-terminal has the highest mutation frequency, and a frequency of 20% is seen in the aminoacylation domain (Fig. 7c). Up to ~50% of mutations in the aminoacylation domain of the α-subunit are conservative, and the majority are located in loops. Further, an equivalent 51 mutations are present in the PvFRS α-subunit, and an additional five mutations are identified from the P. vivax project that are located in the disordered N-terminal. The PfFRS β-subunit, which does not have a catalytic domain, has 135 unique SNPs, with an overall frequency of 22% for 617 residues (Fig. 7b, c). In the PfFRS β-subunit, an individual SNP occurs in the range of one to 99 samples, with A35T and K617Q present at the highest frequency. The PvFRS β-subunit has an equivalent 73 mutations and an additional 15 unique mutations from the P. vivax project. The maximum number of β-subunit mutations lie in β-sheets, unlike the α-subunit mutations which are typically in loops.
Three sequence stretches with no mutations are present on PvFRS, with the longest stretch in the α-subunit (amino acids 313-401) (Fig. 7d, "Patch 1") and two in the β-subunit (amino acids 495-549 and amino acids 567-617) (Fig. 7d, "Patch 2" and "Patch 3"). The α-subunit stretch surrounds the inhibitor-binding pocket from one side and is also a significant contributor to the dimer interface between the α-and β-subunits. No mutation was seen in any of the dimer interface residues. The disordered α-subunit N-terminal (amino acids 1-271) could help decipher the complete heterodimeric interface of PvFRS. Also, all mutations are dispersed separately, with no prominent clustering seen on the surface. Further, the inhibitor BRD1389 interacting residues of PvFRS show very high conservation (Fig. 7e). A loop (amino acids 443-453) that lies in the ATP binding pocket and 'opens' to enable BRD1389 [18] binding is seen to be conserved. A second loop (amino acids 507-515) that lies in the auxiliary pocket and 'closes' to enable binding has one mutation (A453S; hydrophobic to polar) and that is seen in only one sample (from Malawi). This A-to-S mutation retains the small side chain thereby avoiding any possible obstruction to ligand entry. Finally, four residues M310, L544, V339, and G506 near the binding pocket known to confer resistance to bicyclic azetidines [18], are also conserved.

Drug-binding regions of validated aaRSs are conserved compared to P. falciparum DHPS
Drug resistance causing mutations in the DHPS domain of Plasmodium HPPK-DHPS are well-established [26,28,29], and our SNP analysis further confirms the existence of these mutations in the PfDHPS domain in a majority of the analysed samples; however, the overall mutation frequency for the PfDHPS domain was found to be only ~4% (Fig. 8a). In contrast, the drug binding regions in aaRSs show very high conservation. In PfKRS, the inhibitor cladosporin (covering the ATP-binding site) and substrate L-Lys binding regions show high conservation except for two conservative mutations, V500I and L555I -both are present at the cladosporin-binding site but occur in a minuscule number of samples (Fig. 8b). The binding regions of the inhibitor halofuginone (covering the 3'-prime end of the tRNA site and amino acid proline site) of PfPRS are highly conserved, with the closest mutation being at a distance of ~8 Å (Fig. 8c). Similarly, the binding region of the inhibitor bicyclic azetidine BRD1389 (covering the amino acid phenylalanine site and an auxiliary site) on PvFRS is highly conserved, with the closest mutation being at a distance of ~12 Å (Fig. 8d). Sequence and structural evaluation of SNPs of drug target aaRSs and their comparison with the DHPS domain (for which mutations causing resistance are already known) is crucial to substantiate prospects of aaRSs as successful drug targets.
Field isolate data for the PfDHPS domain are available from when resistance mutations were already detected in majority of samples; it is difficult to assess whether other non-active site mutations already existed in this domain previous to the introduction of the drug [29,32]. Resistance against antimalarial drugs remains a major concern. Resistance to inhibitors of parasite aaRSs may eventually develop; however, studies on parasite FRS indicate a low tendency to generate resistance mutations in malaria parasites [33]. The anticipated drug resistance can be delayed or partially prevented by using the drug combination approaches wherein aaRSs inhibitors can be a promising part of drug cocktails.

Conclusions
Assessment of genetic variations caused by SNPs in validated antimalarial drug targets is vital for understanding the possibilities of drug resistance and substantiating their validation to achieve improved drug design. Aminoacyl-tRNA synthetase (aaRS) enzymes, which are essential for protein synthesis in malaria parasites, are validated and potent antimalarial targets. Our SNP analysis of P. falciparum and P. vivax aaRSs from MalariaGEN involving 9751 genome sequences from over 50 countries reveals 3182 unique SNPs in the 20 cytoplasmic P. falciparum aaRSs, with a mutation frequency of 9-31% for different aaRSs. In the present study, structural mapping of SNPs on the three most advanced inhibitor/substrate-bound cytoplasmic aaRSs, namely PfKRS, PfPRS, and PvFRS, showed very high conservation in drug/substrate binding regions, with an overall low mutation frequency in the crucial aminoacylation domain and low SNP occurrence in individual samples. In comparison, the PfDHPS domain of PfHPPK-DHPS, another key parasitic enzyme for which drug resistance is well-established, prominently exhibits mutations that cause drug resistance despite having a low overall SNP frequency. The present study emphasises the significance of screening antimalarial drug targets such as aminoacyl-tRNA synthetases against SNPs from global databases to evaluate global genetic polymorphisms. Understanding these variances is key to strengthening drug design against validated targets and to counter resistance towards antimalarial drugs.