Genetic diversity of Plasmodium vivax reticulocyte binding protein 2b in global parasite populations

Background Plasmodium vivax reticulocyte binding protein 2b (PvRBP2b) plays a critical role in parasite invasion of reticulocytes by binding the transferrin receptor 1. PvRBP2b is a vaccine candidate based on the negative correlation between antibody titers against PvRBP2b recombinant proteins and parasitemia and risk of vivax malaria. The aim of this study was to analyze the genetic diversity of the PvRBP2b gene in the global P. vivax populations. Methods Near full-length PvRBP2b nucleotide sequences (190–8349 bp) were obtained from 88 P. vivax isolates collected from the China–Myanmar border (n = 44) and Thailand (n = 44). An additional 224 PvRBP2b sequences were retrieved from genome sequences from parasite populations worldwide. The genetic diversity, neutral selection, haplotype distribution and genetic differentiation of PvRBP2b were examined. Results The genetic diversity of PvRBP2b was distributed unevenly, with peak diversity found in the reticulocyte binding region in the N-terminus. Neutrality analysis suggested that this region is subjected to balancing selection or population bottlenecks. Several amino acid variants were found in all or nearly all P. vivax endemic regions. However, the critical residues responsible for reticulocyte binding were highly conserved. There was substantial population differentiation according to the geographical separation. The distribution of haplotypes in the reticulocyte binding region varied among regions; even the two major haplotypes Hap_6 and Hap_8 were found in only five populations. Conclusions Our data show considerable genetic variations of PvRBPb in global parasite populations. The geographic divergence may pose a challenge to PvRBP2b-based vaccine development. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13071-022-05296-6.

invasion [7,8]. Compared to P. falciparum, much fewer vaccine candidates have been identified for P. vivax, partially owing to the absence of a long-term in vitro culture system for this parasite [9,10]. Plasmodium vivax requires the Duffy antigen receptor for chemokines (DARC) on the surface of the RBC for invasion. However, solid evidence of P. vivax infection in DARC-negative individuals in Africa suggests that this Plasmodium species may have evolved to explore alternative pathways for invasion [11,12]. Plasmodium vivax shows a restricted tropism for reticulocytes with high levels of transferrin receptor 1 (TfR1 or CD71) [13]. Recently, TfR1 has been identified as the reticulocyte-specific receptor for P. vivax reticulocyte binding protein 2b (PvRBP2b), a ternary complex expressed in the schizont stage [14,15]. PvRBP2b belongs to the PvRBP family, which is composed of at least 11 members with different binding preferences for normocytes or reticulocytes [5,[16][17][18][19]. Studies on the crystal structure of the N-terminal domain of PvRBP2b have revealed a similar structural scaffold to that of P. falciparum reticulocyte binding protein homolog 5 (PfRh5), a well-characterized vaccine candidate for P. falciparum [15,20]. In one study, monoclonal antibodies against PvRBP2b or TfR1 mutants that impede the binding of PvRBP2b to the reticulocytes successfully blocked the entry of P. vivax into the reticulocytes, suggesting that PvRBP2b is a promising vaccine candidate targeting blood-stage infections [15].
The N-terminal domain of PvRBP2b is responsible for reticulocyte binding [14,15], whereas the function of the C-terminal domain is not clear. Studies with the recombinant PvRBP2b N-terminal domain have detected antibodies against PvRBP2b in plasma samples collected from P. vivax-infected patients, supporting the notion that PvRBP2b contains immune recognition epitopes in the N-terminal domain [15,18,21]. In addition, immunoglobulin G (IgG) levels against PvRBP2b have been found to be negatively correlated with parasitemia and the risk of vivax infections [18,21,22]. These studies have highlighted the potential of PvRBP2b as a promising target for P. vivax vaccine development [14,15].
A major challenge for the efficacy of blood-stage malaria vaccines is the extensive genetic diversity of the target antigens. Therefore, understanding the genetic diversity of vaccine candidates is necessary for designing effective vaccines and predicting vaccine efficacy. In this study, we analyzed the genetic diversity, phylogenetic relationship and population differentiation of PvRBP2b from 312 global P. vivax isolates, with the aim to provide the necessary information for PvRBP2b-based vaccine development.

Sample collection and ethics statements
This study used 88 dried blood spots on filter papers collected from P. vivax-infected patients attending the Laiza and Nabang hospitals in the China-Myanmar border area in 2014 (n = 44) and the Tha Song Yang hospital in western Thailand in 2011-2012 (n = 44). Malaria was diagnosed by microscopic examination, and finger-prick blood samples were collected.

PvRBP2b gene amplification and sequencing
Genomic DNA was extracted from the dried blood spots on filter papers using the QIAamp DNA Mini kit (Qiagen, Hilden, Germany). The studied parasites were part of the clinical samples genotyped by PCR-restriction fragment length polymorphism analysis of the polymorphic Msp3α and Msp3β genes in a previous study [23]. Only monoclonal infections were used for PvRBP2b sequencing. An 8160-bp fragment of the full-length protein-coding sequence of the PvRBP2b gene (8421 bp), i.e. PvRBP2b 190-8349 , corresponding to amino acids 64-2783 of the PvRBP2b protein was amplified from all the samples using KOD-Plus-Neo polymerase (Toyobo, Osaka, Japan). Given the large size of the gene, seven overlapping 1.5-kb fragments were amplified from each sample using seven pairs of primers (Additional file 1: Table S1). PCR amplification was carried out in a total reaction volume of 30 μl containing 1× KOD-Plus-Neo buffer, 200 µM dNTPs, 1 mM MgSO 4 , 250 nM primer, 0.4 units KOD Plus polymerase and 2 µl genomic DNA. The following cycling parameters were used: an initial denaturation at 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 15 s, annealing at a determined temperature (Additional file 1: Table S1) for 15 s, and extension at 68 °C for 90 s, with a final final extension at 68 °C for 5 min. The PCR products were separated in 1% agarose gels and then subjected to DNA sequencing using the ABI BigDye ™ Terminator Reaction Ready kit (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA). The nucleotide sequences were deposited in the GenBank database under the accession numbers OM715042-OM715085 and OM715086-OM715129.

Analysis of genetic diversity and tests for detecting selections
A total of 312 PvRBP2b sequences were aligned with the reference sequence from the Salvador I (Sal I) strain (PVX_094255) using the Clustal W program in MEGA7 software. For the evaluation of PvRBP2b genetic diversity, the nucleotide diversity (π), the number of haplotypes (H) and haplotype diversity (Hd) were computed using DnaSP v5.10 software [30]. To test the departure from neutrality, Tajima's D test [31], Fu and Li's F* test [32] and Fu and Li's D* test [32] were computed using DnaSP v5.10 software. Natural selection was determined by calculating the ratio of nonsynonymous (dN) to synonymous (dS) substitutions per nucleotide site (dN-dS), using the Nei-Gojobori method [33] with Jukes-Cantor correction for multiple substitutions. Statistical significance of the difference was estimated using the codon-based Z-test of selection implemented in MEGA 7 [33].
Finally, to determine the existence of specific codons targeted by selection in the global population [34], the FUBAR [35] and SLAC [36] methods, implemented in the Datamonkey webserver (http:// www. datam onkey. org), were performed. Before analysis, putative recombination joints and parental sequences were determined using the RDP4 suite [37]. Recombination breakpoints were detected using seven methods (RDP, GENECONV, BOOTSCAN, MAXCHI, CHIMAERA, SISCAN and 3SEQ) in the RDP4 package. The probability of a putative recombination event was corrected by a Bonferroni procedure with a cutoff of P < 0.01. Only events supported by at least four of the seven methods were considered to be recombinants. Recombinants were then removed from the phylogenetic tree constructed by the maximum likelihood method implemented in MEGA7 using the nucleotide substitution Hasegawa-Kishino-Yano (HKY)+G+I model. As a last step, the prepared dataset was uploaded to the Datamonkey webserver for analysis.

Population differentiation, structure and phylogenetic relationship
To investigate population subdivision, Wright's fixation index (F ST ) representing inter-population variance in allele frequencies was calculated using DnaSP v5.10 [30,38,39]. The genetic structure of all the vivax parasite populations was then elucidated using STRU CTU RE v2.3.2 software based on the Bayesian analysis and admixture model [40,41]. All samples were run at K = 2-7 (10 iterations each) with a burn-in period of 20,000 iterations followed by 1,200,000 Markov Chain Monte Carlo iterations. Then the optimal number of grouping was determined by ΔK using STRU CTU RE HARVESTER v0.6.94 software [42,43]. The partition of the clusters was presented using CLUMPP v1.1.2 [44] and the DISTRUCT 1.1 tools [45]. To determine the relationship among the parasites, phylogenetic analyses were performed using the neighbor-joining (NJ) method implemented in MEGA7 [33], and the phylogenetic tree was then optimized with the online tool ITOL [46]. A haplotype network based on the polymorphic sites in the reticulocyte binding region of PvRBP2b was constructed using the PHYLOVIZ 2.0 software with the NJ method [47].

Prediction of linear B-cell epitopes
The potential linear B-cell epitopes were predicted using the BCPreds prediction tool (http:// ailab-proje cts1. ist. psu. edu: 8080/ bcpred/ predi ct. html) [48]. Antigenicity was predicted using the VaxiJen v2.0 online tool (http:// www. ddgph armfac. net/ vaxij en/ VaxiJ en/ VaxiJ en. html) [49]. BCPreds predicts a peptide length of 12 consecutive amino acids with a threshold of 0.8, whereas VaxiJen sets a threshold at 0.5. The overlapped regions of predicted linear B-cell epitopes by both methods were selected. The predicted three-dimensional (3D) structures of the reticulocyte binding region of PvRBP2b were constructed with the PHYRE2 algorithm [50] and further visualized and modeled using the molecular modeling tool PyMOL V2.3 [51]. The PvRBP2b amino acid sequence in the reference Sal I strain was used for prediction.

Mutations revealed from global PvRBP2b sequences
Sequencing of the 8160-bp PvRBP2b fragment (190-8349 bp) was successful for the 88 P. vivax field isolates collected from the China-Myanmar and Thailand-Myanmar border areas. To gain a global perspective, we retrieved 224 PvRBP2b sequences from the wholegenome sequences of P. vivax isolates collected in multiple P. vivax endemic areas across the world. A preliminary analysis showed that the nucleotide diversity of sequences obtained by Sanger sequencing from the China-Myanmar border (0.00167) and Thailand (0.00217) was comparable to that obtained from the whole-genome sequencing projects (0.00116 and 0.00148, respectively). Therefore, the combined data were used for analysis.
Alignment of all 312 PvRBP2b sequences with the Sal I reference identified 116 SNPs, including 103 nonsynonymous and 20 synonymous mutations. Of note, the 103 nonsynonymous mutations resulted in 96 amino acid changes. Of all nonsynonymous mutations, 75 SNPs showed allele frequencies of > 1% (Additional file 1: Table S3). The distributions of the 75 mutations in different areas are shown in Fig. 1 and Additional file 1: Table S3. The E136K, N349K, K363E, D366V/H,  V395A/T, K412N, Q564R, D917E, N1529K, K1606E, E2265K and E2746G amino acid mutations were found in all or nearly all endemic sites, reflecting the high prevalence of PvRBP2b polymorphisms in the world. Among these, D917E approached fixation (95.5%).

Genetic diversity of PvRBP2b
The population genetic indices were analyzed to assess the nucleotide diversity (π) of PvRBP2b ( Table 1). The overall nucleotide diversity from 312 sequences was 0.00196, with the highest found in the population from Malaysia (0.00203), followed by Thailand (0.00198) and Vietnam (0.00193). The overall haplotype diversity (Hd) was high (0.997) ( Table 1). The sliding window plots of nucleotide diversity revealed an uneven distribution, with the peaks located at nucleotides 1015-1134 within the reticulocyte binding region (Fig. 3a). Similarly, 46.7% of amino acid substitutions were clustered in the reticulocyte binding region (Fig. 3e). This result reflects the relatively high polymorphism in the reticulocyte binding region of PvRBP2b in the worldwide P. vivax populations.

Evidence of potential selections
Neutrality tests were conducted to evaluate whether the PvRBP2b gene followed the neutral equilibrium model of molecular evolution. Although the overall PvRBP2b sequence did not significantly deviate from neutrality, a sliding window analysis identified significant positive values for the 1015-to 1034-bp region by Tajima's D* and Fu and Li's F* tests, which paralleled the peak π value, reflecting balancing selection within the reticulocyte binding region or population bottlenecks in the global populations ( Fig. 3b-d). In addition, some significant negative values for several C-terminal domains suggested population expansion or excess of singletons ( Fig. 3b-d).
The dN-dS statistic was positive for the reticulocytebinding region among the global populations (Table 2), Fig. 2 Three-dimensional structure of the PvRBP2b reticulocyte binding region. a A modified schematic diagram from [14] shows the important residues in the reticulocyte binding region composed of the N-terminal and C-terminal domains. The polymorphic residues previously reported are labeled in magenta, whereas the polymorphic residues newly identified in the present study are labeled in red. Residues interacting with TfR1 and Tf are shown in green and cyan, respectively. b Two orthogonal views of the three-dimensional model structure of the PvRBP2b reticulocyte binding region shows the 14 residues positively selected by both codon-based tests in the Datamonkey webserver and S586 positively selected by FUBAR. The hydrophobic binding region is shown in yellow. Abbreviations: Tf, transferrin; TfR1 transferrin receptor 1 suggesting that polymorphisms found for this region of PvRBP2b were maintained by diversifying selection.
The likelihood-based algorithms were used to determine specific codons targeted by selection. Only one recombinant identified by the RDP, GENECONV, MAXCHI, CHIMAERA, SISCAN and 3SEQ methods implemented in the RDP suite was removed from the phylogenetic and selection analyses (Additional file 1: Table S4). Thirty-two positively selected and 11 negatively selected codons were identified by both the SLAC and FUBAR methods ( Table 3). Fourteen of the positively selected mutations are located in the reticulocytebinding region, presumptively associated with parasite invasion and/or immune recognition (Table 3). K363 and S586 have been shown to be the reticulocyte-binding sites [14]. In contrast to the positive selection at K363 confirmed by all three methods, S586 was supposed to be positively selected only by the FUBAR method (Table 3). Meanwhile, sites under purifying selection were scattered along the PvRBP2b (Table 3).

Mutations within the 3D structure and predicted B-epitopes in the reticulocyte binding region
The 3D model structure with positively selected amino acid mutations mapped in the reticulocyte binding region of PvRBP2b showed that most mutated residues were on the surface of α helices, with the exceptions of V395, which was hidden inside the protein, and K412, which was located in the flexible loop structure (Fig. 2b). Most mutations were close to the hydrophobic binding region. Three peptides (270-281, 519-530, and 566-577 amino acids) in the reticulocyte binding region of PvRBP2b were predicted to be linear B-epitopes with the prediction scores of 0.839, 0.987, and 0.959, respectively, by the BCPreds and VaxiJen software (Table 4). However, only one polymorphic residue, K575, was presented in the predicted B-epitope of 566-577 amino acids. Seven additional polymorphic residues were found in the predicted linear B-epitopes in other regions of PvRBP2b (Table 4).

Haplotype network analysis
The 17 SNPs with minor allele frequencies of > 5% in the PvRBP2b reticulocyte binding region were chosen for haplotype analysis, which yielded a total of 114 haplotypes among the global dataset of 312 sequences (Additional file 1: Table S5; Figs. 4, 5). Of the 114 haplotypes, 58 (50.9%) were region-specific, with a frequency of between 0.32 and 1.60%. There were many rare haplotypes; 92.1% (105/114) of the haplotypes were shared by no more than five parasite isolates, among which 70/114 were represented by single parasite isolates. Even the two major haplotypes, i.e. Hap_6 and Hap_8, with a frequency of 14.74 and 12.82%, respectively, were only found in five regional populations. Hap_8, the same as the Sal I haplotype, was most abundant in the China-Myanmar border (39.7%) and Colombia (42.9%). Hap_6, the predominant haplotype identified in Thailand, was also shared among the parasite populations from the China-Myanmar border, Cambodia, Brazil and Colombia. The haplotypes in Asia were distributed in the parasite populations included in this study, and Thailand harbored the highest haplotype diversity (52/114) (Additional file 1: Table S5; Figs. 4, 5). In contrast, the distribution pattern shown in the haplotype network was different from that of parasite populations from South America (Brazil and Colombia), Africa (Ethiopia) and Oceania (Papua New Guinea [PNG]) (Fig. 5). Of the 20 parasite isolates in PNG, 16 were from a unique haplotype restricted to this specific region (Additional file 1: Table S5; Fig. 4).

Population structure and differentiation
Analysis of the PvRBP2b sequences showed that the global parasites were optimally grouped into three clusters (K = 3) (Fig. 6a, c). These clusters were unevenly distributed among different geographical regions. The parasites from PNG and Indonesia were mainly represented by the purple cluster, whereas the parasites from Ethiopia, Brazil and Colombia occupied the green cluster. The red cluster represents parasites predominantly from the China-Myanmar border and Thailand. Interestingly, the remaining parasites from the China-Myanmar border and Thailand and parasites from other countries of the Greater Mekong Subregion (GMS) (Cambodia, Laos and Vietnam) showed genetic mixing of the green and purple clusters. When structure analysis was conducted using the PvRBP2b reticulocyte binding region, six clusters were identified with overlapping, worldwide distribution (Fig. 6b, d). It is noteworthy that only the parasites from the PNG and Indonesia formed genetically distinct clusters from other geographical regions. The genetic differentiation between two parasite populations was also evaluated via Wright's fixation index (F ST ) using the entire PvRBP2b sequence and reticulocyte binding region, respectively. The heatmap of the F ST values from both analyses revealed population differentiation patterns consistent with the structure analysis ( Fig. 7). Consistent with the principle of isolation by distance, parasite populations were genetically similar within each continent (e.g. Brazil vs Colombia, PNG vs. Indonesia, countries within the GMS). In contrast, considerable differentiation was detected between populations from different continents. Notably, the PNG and Indonesia P. vivax populations had high levels of genetic differentiation from the rest of the global parasite populations. In comparison, parasite populations from the GMS were moderately differentiated from the South American parasite populations. Interestingly, parasites from Africa (represented by Ethiopia) showed little genetic differentiation from the GMS and South American parasites.

Discussion
In the present study, we evaluated the genetic diversity of PvRBP2b as a P. vivax malaria blood-stage vaccine candidate. The overall nucleotide diversity of PvRBP2b from the global samples was modest (0.00196), much lower than that of the highly polymorphic surface antigens, such as PvAMA1 (0.0093, n = 372) and PvMSP1 (0.0023-0.0495 for the conserved blocks and 0.1193-0.2055 for the variable blocks, n = 40) [52,53]. The genetic polymorphisms of PvRBP2b were geographically heterogeneous, with higher diversity found in GMS countries than in South America (Brazil and Colombia), Africa (Ethiopia) and Oceania (PNG). A similar genetic diversity distribution pattern was also observed in PvAMA1, and it seemed to correlate with the larger effective population size in Southeast Asia [52]. The authors of a previous study speculated that P. vivax in Southeast Asia was possibly the source population [54]. The high transmission intensity and the frequent migrations of infected people among Southeast Asian countries might result in the large effective population size in this region. Recent continuous malaria control and elimination strategies successfully reduced  the parasite incidence, but these seemed to have little short-term effect on the P. vivax population size [55,56]. The relatively high levels of asymptomatic vivax infections in endemic populations possibly contributed to the maintenance of the effective population size [57,58]. PvRBP2b-mediated reticulocyte invasion depends on the PvRBP2b-TfR1-Tf ternary complex [14]. Consistent with previous reports [15], we found that the reticulocyte binding region was under balancing selection; it accumulated most of the amino acid substitutions and had the highest diversity. Interestingly, most of the residues on the reticulocyte binding region were conserved. At least three residues (Y542, K600 and Y604) on PvRBP2b were critical to the reticulocyte binding and complex formation: mutations at these sites reduced binding affinity by around 80% [14]. These amino acids were absolutely conserved among all 312 samples, reflecting their essential function in reticulocyte binding. Most of the identified residues making contact with TfR1/Tf based on structure analysis [14] were conserved, with the exception of S586R and K363E, which had low (3.5%) and moderate (40.4%) allele frequencies, respectively. S586R had a limited distribution in Brazil, Ethiopia and Thailand, whereas K363E occurred in all P. vivax endemic areas. Additional nonsynonymous SNPs of low-modest frequencies (1.6-51%) were detected surrounding the interaction sites. Several residues within the reticulocyte binding region were under positive selection, raising the possibility that they resulted from the host immune selection [59]. Moreover, most of the positively selected residues are located on the α helices, some being close to the hydrophobic binding region. The K412N mutation had a frequency of 51.3% and resided in the flexible loop structure. Only one mutation, K575E, was mapped to the predicted linear B-cell epitope. However, it is still unclear whether these specific polymorphic residues would change protein structure by altering protein polarity and hydrophilicity, therefore adapting to the different TfR1 mutants (e.g. L212V, N348A, and S412G) [14,60,61]. More functional investigations are required to answer these questions.
One of the obstacles hindering the development of a successful malaria vaccine is the extensive genetic diversity of blood-stage antigens in P. vivax [62]. Naturally acquired antibodies against the reticulocyte binding region of PvRBP2b showed a strong association with reduced parasitemia [18,21,22], highlighting the vaccine potential of PvRBP2b. Although the overall genetic diversity of PvRBP2b was not as high as that of leading bloodstage vaccine candidates, PvAMA1 and PvMSP1, the testing of which has advanced to clinical trials [63], the reticulocyte binding region of PvRBP2b deserves further attention. This region carried almost 50% of the mutations in the entire PvRBP2b, and analysis of the global parasite samples identified 114 haplotypes. Furthermore, most haplotypes were region-specific and represented by a single parasite isolate, and very few haplotypes were shared worldwide. Even the predominant haplotypes Hap_6 (14.74%) and Hap_8 (12.82%) were distributed in only five regions. Of note, haplotype patterns in PNG were distinct from those of other worldwide populations. This enormous haplotype diversity may present a challenge to developing a PvRBP2b-based vaccine. Since an effective vaccine should include most of the common alleles relevant to the induction of immune responses to ensure sufficient coverage of the genetic diversity [64,65], it is essential to determine whether the major PvRBP2b alleles confer strain-transcending immunity. This study provides several lines of evidence confirming the geographical separation of global P. vivax populations. Pairwise F ST comparison identified the considerably high levels of differentiation between the Oceania (PNG and Indonesia) parasite populations and other P. vivax endemic regions, whereas parasite populations in the GMS were much less differentiated. Interestingly, populations from South America appeared to be closely related to those from Africa (Ethiopia). Population structure analysis further reinforced this finding on population relatedness. The population differentiation pattern identified in the present study is in agreement with that reported in other population studies using individual genes [66], microsatellites [54,67] or whole-genome sequences [24,25,68]. The distinct parasite genetic structure in PNG may reflect the limited gene flow between PNG and the rest of the world, while the unique RBC polymorphisms in the human populations may have also contributed to this difference [52,67,69,70]. In the GMS, however, the high transmission intensity in some border areas and frequent host migrations among the different countries are likely responsible for the panmixia of parasite populations. It should be noted that this study was biased, with most of the samples originating from Southeast Asia and, consequently, the conclusions may not represent global parasite populations. Population genetics can help assess the effects of malaria control strategies, track the source of imported infections, and inform the vaccine design.

Conclusion
In conclusion, this study revealed a remarkably high level of genetic polymorphisms in PvRBP2b among the P. vivax populations in different endemic areas, with
Additional file 1: Table S1. Primers used for PvRBP2b gene amplification and sequencing. Table S2. Accession number, country of origin and referenced study for the 224 P. vivax isolates analyzed in this study. Table S3. Genetic mutations of near-full length PvRBP2b in different areas.