Phylogeography and population differentiation in Hepatozoon canis (Apicomplexa: Hepatozoidae) reveal expansion and gene flow in world populations

Hepatozoon canis is a protozoan transmitted to dogs and other wild carnivores by the ingestion of ticks containing mature oocysts and is considered the principal cause of canine hepatozoonosis in the world. Here, we examined ribosomal RNA 18S gene sequence variation to determine the genetic differences and phylogeographic diversity of H. canis from various geographical areas around the world. We used 550 publicly available sequences of H. canis from 46 countries to assess haplotype relationships, geographical structure, genetic diversity indices, and relationships among populations. We performed neutrality tests and pairwise comparisons of fixation index (FST) values between groups and pairwise comparisons of FST values between populations. To determine whether populations are structured, analyses of molecular variance (AMOVAs) and spatial analysis of molecular variance (SAMOVA) were performed. The dataset of H. canis yielded 76 haplotypes. Differentiation among populations indicated that there is no phylogeographical structure (GST = 0.302 ± 0.0475). Moreover, when samples were grouped by continents a significant FST was obtained, meaning that populations were genetically differentiated. The AMOVA showed that 57.4% of the genetic variation was explained by differences within populations when all locations were treated as a single group and revealed that there is no population structure when populations are grouped into two, three, and four groups (FCT, p > 0.05), suggesting that dispersal between populations is high. SAMOVA revealed significant FCT values for groups K = 5. The Tajima’s D and Fu’s Fs show that populations have undergone recent expansion, and the mismatch distribution analysis showed population expansion (multimodal distribution). The current molecular data confirmed that H. canis does not show phylogeographic or population structure. The haplotypes exhibit low genetic differentiation, suggesting a recent expansion due to gene flow among populations. These results provide pivotal information required for future detailed population genetic analysis or to establish control strategies of this parasite.

few compared with other taxa like mammals, amphibians, birds, and plants. Molecular detection of H. canis is widely reported in different studies in several regions around the world; however, the genetic diversity and phylogeography of these hemoparasites is understudied [1,15,16]. In this article, we examined ribosomal RNA 18S gene sequence variation to determine the genetic differences and phylogeographic diversity of H. canis from various geographical areas around the world. For H. canis, we can infer that dispersion around the world could be related to the movement of dogs associated with human migration; hence, we hypothesized there is no population or phylogeographic structure. Thus, we aimed to test whether the migratory movements in humans is associated with the population structure of eukaryotic parasites, using Hepatozoon as a model of study.

Sequence selection
To define the final dataset of sequences included in this study, several filtering steps were applied as follows: (1) a total of 1170 sequences were downloaded from the ENA/ GenBank database using as searching criteria "Hepatozoon canis" to discard any sequences not identified to species level (Hepatozoon sp.); (2) 55 sequences without complete metadata lacking host and country of origin were discarded; (3) sequences with duplicated accession number (redundant) or missed annotations from parasites different from H. canis were not selected; (4) considering an observed wide range in sequence length between 140 bp and 3.1 kbp, sequences < 500 bp were omitted from further analysis. The remaining sequences had a median of 602 bp (533-677). During this step, sequences from genetic regions other than the ribosomal 18S gene were discarded. (5) Finally, the remaining 618 sequences were aligned, and sequences with coverage < 50% (in the final alignment with length of 589 bp) were discarded. The custom R scripts used for the sequence selection are available at: https:// github. com/ Victo rHJD/ Hepat ozoon_ Phylo geogr aphy. The script for selection was run in R version 4.0.3 [29].

Haplotype relationships
To infer genealogical relationships among H. canis haplotypes, statistical parsimony networks (haplotypes network) were constructed using the program TCS 1.2.1 [32]. The aligned sequences were mapped to estimate gene genealogies with gaps treated as missing data and a 95% connection probability limit. Loops were resolved following the criteria given by Pfenninger and Posada [33]. The full dataset was processed blindly and treated independently from their host of origin. The most likely number of genetically differentiated clusters was estimated using BAPS (Bayesian Analysis of Population Structure, version 6.0) [34]. BAPS assesses the most likely number of genetically different clusters using the module for linked molecular data. The codon linkage model, appropriate for our sequence data, was applied. For the chosen marker, the probability of a different number of genetic clusters (K = 2 to K = 10) under the independent loci model was evaluated. Two independent runs with ten replicates for each K, accepting the partition with the K value that had the highest likelihood and posterior probability (PP) provided the final result.

Geographical structure, genetic diversity indices, and relationships among populations
Population diversity indices for unordered (h S , h T ) and ordered haplotypes (v S , v T ) and differentiation parameters (G ST , N ST ) were estimated using software PER-MUT 1.0 [35]. The within-population diversity (h S ), total diversity (h T ), geographical average haplotype diversity (v S ), geographical total haplotype diversity (v T ), level of population differentiation at the species level (G ST ), and an estimate of population subdivisions for phylogenetically ordered alleles (N ST ) were calculated. G ST and N ST are often used to assess the geographical structure affecting population differentiation. Significant differences between N ST and G ST parameters were tested with 10,000 permutations. If N ST is significantly larger than G ST , this means that two alleles sampled from the same region are phylogenetically more closely related than two alleles sampled from different regions, evidencing the presence of a significant phylogeographical signal in the data [36].
Neutrality tests and pairwise comparisons of F ST values between continents were calculated using ARLEQUIN 3.11 [36]. Pairwise comparisons of F ST values between populations were calculated with 1000 permutations. Populations with three samples or fewer were excluded as required for this analysis.
To determine whether populations were structured, analyses of molecular variance (AMOVAs) [37] were performed based on pairwise differences. The AMOVA was carried out considering all the populations across the sampling area as a single unit. Furthermore, hierarchical AMOVA was performed with populations treated as (i) grouped into two, New world and Old world (America, Europe + Africa + Asia), or (ii) three (America, Africa, Asia + Europe), according to the hypothesis that H. canis arises in Eurasia and spreads to Africa and America, or (iii) four groups (America, Africa, Asia, Europe) similar to the episode where the four geographic regions (continents) have been isolated long enough to differentiate populations. The AMOVAs were performed using ARLEQUIN 3.11 with 16,000 permutations to determine the significance of each AMOVA. Additionally, spatial analysis of molecular variance (SAMOVA) was performed using SAMOVA 1.0 [38] to identify clusters of populations that are geographically homogeneous and genetically differentiated.

Demographic history
Molecular diversity indices, including the number of haplotypes (N H ), gene diversity (h), nucleotide diversity ( π ), and pairwise comparisons of F ST values between populations, were calculated using ARLEQUIN 3.11 [36]. We used mismatch distributions to assess population expansion and time variation in effective population size (Ne). The historical population expansions were analyzed with Tajima's D (D T ) [39] and Fu's (F S ) [40]. Negative values in D T and F S that result from an excess of rare alleles present at low frequencies indicate that populations have undergone recent expansion. Neutrality tests were performed in ARLEQUIN 3.11, with 10,000 permutations [36]. Mismatch distributions [41] were calculated using the sudden expansion model of Schneider & Excoffier [42], with 1000 bootstrap replicates. Mismatch distributions [43] and Harpending's raggedness index (Hri; [41]) were calculated in DnaSP v.6 [44]. The validity of the sudden expansion assumption was determined using the sum of squares differences (SSDs) and Hri [41], both of which are higher in stable, non-expanding populations [43]. Low and non-significant values of Hri and SDD indicated a good fit between the observed and the expected values of the sudden expansion model.
Ramos-Onsins and Rozas R 2 statistics were calculated based on the difference between the number of singleton mutations and the average number of nucleotide differences, where small positive values of R 2 are expected under a scenario of population expansion [45]. Significance was evaluated by comparing observed values with null distributions generated by 10,000 replicates, using the empirical population sample size and the observed number of segregating sites in the "pegas" package of R version 4.0.3 [29].
Finally, the Bayesian skyline plot [46] method implemented in BEAST 2.6.4 [47] was performed to further explore the demographic history and to estimate the timing of the population expansion in H. canis populations (four continents and all continents). The 550 sequences were used to run the analysis using the coalescent Bayesian skyline tree prior. This coalescent-based approach estimates the posterior distribution of effective population size (Ne) at intervals along a phylogeny, thereby allowing one to infer population fluctuations over time. The haplotype sequence alignments were imported in the program BEAUti2 (included in BEAST 2.6.4) in nexus format. The best nucleotide substitution model was estimated using Jmodeltest 2.0 [48]. The HKY + G substitution model was the best model identified for African, American and European H. canis, while that for Asia was TPM3uf. Finally, when populations were treated as a single group, the best substitution model was TPM3uf + G. Since the literature lacks estimations of substitution rates for H. canis, a substitution rate of 1.5% per 100 million years for the 18S rRNA used to study the lineage evolution of Toxoplasma gondii [49] was implemented in the analysis. Three independent runs of 10 million generations each were run, with trees and parameters sampled every 1000 iterations, with a burn-in of 10%. Results of each run were visualized using TRACER to ensure that stationarity and convergence had been reached [50].

A highly abundant haplotype of H. canis is widely spread worldwide
Statistical parsimony retrieved a well-resolved haplotype network that identified 76 haplotypes in the world representing four haplogroups (Fig. 1, Additional file 2: Figure  S1). The aligned 18S rRNA gene dataset for 550 samples of H. canis (mean length: 586 bp) yielded 76 haplotypes with different abundance along the geography (Table 1, Fig. 1a). The haplotype H1 (20.3%, 112/550 sequences) was the most common and widespread of the individuals and in 43.4% (20/46 total populations, considering each set of sequences from the same country as a different population), retrieved in populations located in the four continents. It was followed by haplotypes H7 (18%, 99/550 sequences) in populations located in the Old World and H9 (13.3%, 73/550 sequences) retrieved in populations located in the four continents. Haplotypes H13, H24, and H16 were exclusively found in Europe, and haplotype H41 was only retrieved in Africa and Asia (Fig. 1). The haplotypes did not cluster based on their host of origin, suggesting the lack of variability of the sequences between invertebrate and vertebrate hosts (Additional file 2: Table S1, Figure S2).
BAPS analyses with sequences indicated the existence of four genetic clusters (K = 4) (log marginal likelihood = − 4667.6928, PP = 1.0) (Fig. 1c). The four genetic clusters from the BAPS analysis distributed in different proportions among continents. The yellow cluster includes individuals from the Asian populations, exclusively. The light blue cluster gathers individuals from the African, Asia and European populations exclusively. The pink and magenta clusters include individuals from populations distributed in the four continents, confirming the lack of genetic structure (Fig. 1c). BAPS analysis was in agreement with the shallow genetic divergence and haplotype distribution shown in Fig. 1b.

Lack of phylogeographic structure of H. canis suggests gene flow between populations
Differentiation among populations based on 18S rRNA gene variation (mean ± SE, G ST = 0.302 ± 0.0475) indicated that H. canis is genetically subdivided. Genetic diversity across all populations (h T = 0.913 ± 0.0136; v T = 0.914 ± 0.0616) was higher than the average within-population value (h S = 0.637 ± 0.0463; v S = 0.550 ± 0.0682), indicating that populations are similar among them. PERMUT analysis showed that N ST was not significantly different (PERMUT: N ST = 0.399 ± 0.0578 vs. G ST = 0.302 ± 0.0475, perm: 10,000, p > 0.05) than G ST , indicating no phylogeographical structuring.
Hepatozoon canis is genetically differentiated between populations, as pairwise comparisons of F ST values were significant for the data set when samples were grouped by continents (Table 2). Europe was the continent that most differed from the other continents. The lowest differentiation was observed between Asia and Africa, while the highest difference was observed between Asia and Europe ( Table 2).
The AMOVA results showed that 57.4% of the genetic variation was explained by differences within populations when all locations were treated as a single group ( Table 3). The hierarchical AMOVA revealed that there is no population structure, with the lowest F CT value obtained when populations are grouped into two, three, and four groups with differences that were not statistically significant ( Table 3), suggesting that dispersal of H. canis between populations is high, thus increasing gene flow. SAMOVA results revealed significant F CT values for groups between K = 5. The highest F CT value was for K = 5 (Table 3). When K = 6, FCT was smaller than the F CT of K = 5, and an additional increase in the number of K led to a dissolution of group structure, and single-population groups were formed.
The Bayesian skyline plots suggest that population size effectively increased around 75,000-5000 years ago in  (Fig. 3). The African population was relatively stable over time, showing a slight increase around 5000 years ago. In contrast, the American population and Asia population increased around 75,000-5000 years ago. The Bayesian skyline plots from the European population and World population indicated a marked Ne increase around 25,000-5000 years ago, consistent with a more recent and faster population expansion (Fig. 3d, e).

Discussion
Our results showed no structured genetic patterns in H. canis populations, suggesting high levels of gene flow. Likewise, our results provide evidence of past and recent population expansions that could be due to human migrations carrying their cats and dogs as pets. These results are discussed in detail below. Since its first description in 1905 [51], H. canis has been reported in several countries of America, Europe, Asia, Africa, and Australia, [1,15,[52][53][54]. Most studies lack genetic information [54], reporting the presence of H. canis by histopathological or microscopic diagnosis [55], which complicates the better understanding of the genetic structure and gene flow of their populations. Our survey of 18S rRNA gene sequence data in populations of H. canis identified four haplogroups in the world (Fig. 1). However, haplogroup structure was not geographically congruent nor did it fit previous hypotheses proposed in the past [25,26]. The more frequently found haplotypes (H1, H4, H7, H9, H16, H23, H28, and H41) are distributed in multiple populations in the four continents, and they come from different hosts. This differs from other studies performed with parasites like Corynosoma australe [25] (cytochrome c oxidase subunit I (cox 1) sequences), Angiostrongylus cantonensis and A. malaysiensis [26] (partial 66-kDa protein gene sequence), or Plasmodium knowlesi [27] (cox-1 and 18S RNA sequences), which showed haplogroups defined in the network and with geographic congruence. However, Table 3 Results of AMOVA models on H. canis populations with no groups defined a priori (a) and (b) grouped into two groups, New World and Old World (America, Europe + Africa + Asia) and (c) grouped into three groups (America, Africa, Asia + Europe) or (d) four groups (America, Africa, Asia, Europe) according to geographical distribution in the continents ns not significant (p > 0.05), **p < 001, ***p < 0.0001 these studies used DNA fragments shorter than the data used in our present study. On the other hand, our results agreed with a previous study that reported that dogs carried the same or very similar H. canis haplotypes as red foxes, suggesting that both hosts participate in the same epidemiological cycle [56]. In contrast, in this study, the similarities in the haplotypes between host species suggest cross-species transmission, which could be due to (i) the mobility of diverse tick species and their host across the world and (ii) shared habitats of the different host species, both promoting greater gene flow between populations ( Fig. 1). Haplotype diversity (h), a measure of species evenness, was high for all populations (Table 4), which may indicate a sustained transmission of H. canis in the world over long time periods and high gene flow. Similar high haplotype diversity values have been reported for other parasites such as P. knowlesi [27], widely distributed in Southeast Asia. Nucleotide diversity was low in our study, irrespective of the samples geographic or host origins, suggesting that only minor differences occurred between the haplotypes observed, as has been suggested for other protozoa [57,58]. The haplotype diversity and distribution could be related to multiples factors associated with H. canis life cycle, transmission cycle, and dispersion capacity. Our results showed that genetic diversity across all populations was higher than within-populations and genetic differentiation analyses indicated no phylogeographical structuring. These results could be due to the existence of strong gene flow between populations as shown in other species (Cryptosporidium, Giardia, Fasciola) [59,60]. Inferred patterns of genetic differentiation in H. canis do not appear to be correlated with the geographical regions (Fig. 1) and could be associated to diverse human migration events (travel and trade) as reported with other dog pathogens [24,60].
F ST pairwise comparisons, AMOVAs, and BAPS (Tables 2, 3, Fig. 1C) suggest low levels of genetic structuring and the existence of four genetic groups. Altogether these results suggest that dispersal of H. canis between populations is high. This pattern is commonly reported for other taxa (e.g. Psittacanthus calyculatus, canine distemper virus, Corynosoma austral, Fasciola hepatica) [24][25][26]59]. Our results showed low structure and high genetic flow among domestic and wildlife hosts, and among populations worldwide. However, more studies are required to predict the most probable reservoirs for this parasite.
Human activities have affected the geographical ranges of many host species, shrinking by loss of habitat or increasing through introduction of individuals into new regions and promoted spillover events. In parasitic nematodes, the impact of human activity is well documented, offering an opportunity to study how changes in host population size and connectivity shape the population genetics structure [61]. In this study, we infer that the geographical structure, genetic diversity, and relationships among populations in H. canis are affected by anthropogenic activities.
Our results showed that H. canis populations had a recent expansion (Table 4, Fig. 3). A signature of population expansion was also evident from the multi modal   shape of the pairwise mismatch distribution of the 18S rRNA genes (Fig. 2) and Bayesian skyline plots (Fig. 3). Despite our results confirming the hypothesis of population expansion, these findings cannot define the geographical origin of this expansion. A further analysis dating the evolutionary divergence within Apicomplexa phylum could provide a refined overview regarding the origin of H. canis. Historical demographic expansions were determined by analyzing the frequency distributions of pairwise differences between sequences [36,40]. Neutrality tests with Tajima's D and Fu's Fs statistics estimate the deviation from neutrality, based on the expectation of a constant population size at mutationdrift equilibrium. Here, a negative Tajima's D signifies an excess of low frequency polymorphisms relative to expectation, indicating population size expansion or positive selection [62]. Hepatozoon canis populations displayed a genetic pattern typical of a population that has undergone a recent expansion, and the range of expansion could be a recent phenomenon (5000 years ago), at least that shows the Bayesian skyline plots analysis for African population and when all population were analyzed like a group (Fig. 3). These populations may not have achieved the migration-drift equilibrium, as shown by the lack of phylogeographical structure [39,40,62]. The discrepancy in detecting population expansion, based on the Tajima's D, Fu's F S , and R 2 statistics, reflects different responses to past changes (e.g. population reduction, population subdivision, a recent bottleneck) or high gene flow between populations [63]. However, these changes could be recent in evolutionary times and could be related to anthropogenic factors, as reported in other studies carried out with canid pathogens [24]. Moreover, the ability of H. canis to spillover among host species is feasible given the wide range of carnivores that can be infected by these hemoparasites worldwide [1,16,[18][19][20].
In the mismatch distribution result, H. canis presented a multimodal pattern suggesting recent expansion. Furthermore, non-significant values for SSD mean that the observed data do not deviate from that expected under the model of expansion, and the non-significant raggedness index also indicates population expansion. These results suggest that population expansion occurred recently, except to Africa [43]. Demographic expansion supported by the statistics of demographic analysis and the mismatch distribution as well as the pattern shown by the haplotype network reflects the existence of many different and rare haplotypes (high value of h, Table 4, Fig. 1), possibly because not enough time elapsed to accumulate genetic differences. Metazoonosis, like hepatozoonosis, involves complex interactions among the hosts, parasite, and environmental factors [64]. Owing to the historical relationship between dogs and humans (since domestication in Siberia ∼ 23,000 years ago) [65], high mobility from the H. canis host favors dispersion and gene flow among populations. The latter is supported by several studies done with dogs and wild canids worldwide, which observed the interactions networks among parasites, vectors, and hosts [1,20,52,54].
The detection of H. canis in felids and capybaras highlights the lack of information on the transmission dynamics of H. canis into new hosts [6,8,66]. Moreover, it shows how disregarded the role of other little-studied tick species, such as A. mixum, as potential vectors has been [1]. Although these isolates in non-canid hosts represent incidental findings, this supports the hypothesis that H. canis moves between different host geographic areas and vectors favoring gene flow.

Conclusion
This study provides novel and significant insights into the phylogeography and population differentiation of H. canis in the world. The current molecular data confirmed that H. canis does not show phylogeographic and population structure, which could be due to its wide range of definitive host (domestics and wild canids), lack of clear understanding of the role of other non-canid species in the transmission dynamics, and the impact of human activities. Our results showed identical H. canis haplotypes co-occurring in several geographical regions and host species, indicating wide distribution of the parasite. The haplotypes exhibit low genetic differentiation, suggesting a recent expansion due to gene flow among However, we consider that the role of human migrations might cause the dispersion of vector and host. Finally, using more variable target regions such as mitochondrial and apicoplast genomes might improve resolution. We suggest these more variable regions should be applied in future studies to understand the natural history of H. canis and approach it from an eco-epidemiological vision.
Additional file 1: Dataset S1. GenBank accession numbers for the sequences used on this study.
Additional file 2: Figure S1. Statistical parsimony network of 76 rRNA 18S haplotypes found in 46 populations of H. canis worldwide. Black dots represent unsampled haplotypes. The frequency of each haplotype is represented by the size of the circle. Each haplotype is coded with a different color according to the continent. The numbers inside the haplotypes in the network indicate the number of individuals that share that haplotype. The four rectangles represent the four haplogroups reported in this study. Figure S2. Geographical distribution of host and statistical parsimony network of 76 rRNA 18S haplotypes found of H. canis in the world. a Distribution map of the hosts. Black dots represent sampled populations and pie charts represent hosts found in each sampling population. Section size of pie charts corresponds to the proportion of individuals with given hosts. b Haplotype network. Black dots represent unsampled haplotypes. The frequency of each haplotype is represented by the size of the circle. Each haplotype is coded with a different color according to the hosts. The numbers inside the haplotypes in the network indicate the number of individuals that share that haplotype. Table S1. Number of analyzed samples (n) for molecular marker (18S rRNA gene) and hosts found of H. canis.