North American import? Charting the origins of an enigmatic Trypanosoma cruzi domestic genotype

Background: Trypanosoma cruzi, the agent of Chagas disease, is currently recognized as a complex of six lineages or Discrete Typing Units (DTU): TcI-TcVI. Recent studies have identified a divergent group within TcI, termed TcIDOM. TcIDOM is associated with a significant proportion of human TcI infections in South America and is largely absent from local wild mammals and vectors, yet it is closely related to sylvatic strains in North/Central America. Our aim was to examine hypotheses describing the origin of the TcIDOM genotype. We propose two possible scenarios: an emergence of TcIDOM in northern South America as a sister group of North American strain progenitors, followed by dispersal among domestic transmission cycles; or an origin in North America, prior to dispersal back into South American domestic cycles. To provide further insight we undertook high resolution nuclear and mitochondrial genotyping of multiple Central American strains (from areas of México and Guatemala) and included them in an analysis with other published data.
Findings: Mitochondrial sequence and nuclear microsatellite data revealed a cline in genetic diversity across isolates grouped into three populations: South America, North/Central America and TcIDOM. The greatest diversity was observed in South America (Ar = 4.851, π = 0.00712) and the lowest in TcIDOM (Ar = 1.813, π = 0.00071). Nuclear genetic clustering (genetic distance based) analyses suggest that TcIDOM is nested within the North/Central American clade.
Conclusions: Declining genetic diversity across the populations, together with the corresponding hierarchical clustering, suggests that emergence of this important human genotype most likely occurred in North/Central America before it moved southwards. These data are consistent with early patterns of human dispersal into South America.


Findings
Trypanosoma cruzi, the aetiological agent of Chagas disease, infects 6-8 million people in Latin America, while some 25 million more are at risk of acquiring the disease [1]. Parasite transmission to mammal hosts, including humans, can occur through contact with the faeces of haematophagous triatomine bugs. However, non-vectorial routes are also recognized, including blood transfusion, organ transplantation, congenital transmission, and oral transmission via ingestion of meals contaminated with infected triatomine faeces [2,3].
T. cruzi (family Trypanosomatidae; Euglenozoa: Kinetoplastida) is most closely related to several widely dispersed species of bat trypanosomes [4]. Salivarian trypanosomes, including medically important Trypanosoma brucei subspecies, represent a more divergent group [5]. The split between the T. cruzi-containing and T. brucei-containing trypanosome lineages is thought to have been concurrent with the separation of Africa and South America/Antarctica/Australasia 100 MYA [6], implying that T. cruzi and the other Schizotrypanum species evolved exclusively in South America. Others propose an alternative origin of T. cruzi from an ancestral bat trypanosome potentially capable of long range dispersal [7]. Whilst the precise scenario for the arrival of ancestral Schizotrypanum lineages in South America is a matter for debate, the current continental distribution and genetic diversity of T. cruzi support an origin within South America. Parasite transmission is maintained via hundreds of mammal and triatomine species in different biomes throughout South and Central America, as well as the southern states of the USA [8].
Biochemical and molecular markers support the existence of six lineages or Discrete Typing Units (DTU), TcI-TcVI, agreed by international consensus [9]. Each DTU can be loosely associated with a particular ecological and/or geographical framework [10]. TcI is ubiquitous among arboreal sylvatic foci throughout the geographic distribution of T. cruzi and is the major agent of human Chagas disease in northern South America. Several molecular tools now identify substantial genetic diversity within TcI [11][12][13][14]. Importantly, these new approaches consistently reveal the presence of a genetically divergent and homogeneous TcI group (henceforth TcI DOM, previously TcIa/VEN DOM) associated with human infections from Venezuela to northern Argentina, and largely absent from wild mammals and vectors sampled to date [14]. The origin of this clade is unclear, although recent work supports a sister group relationship with TcI circulating in North America (e.g. [12,13]). In this manuscript we set out to evaluate the genetic diversity of TcI in North/Central America, comparing it with TcI diversity in South America, including TcI DOM. Our aim was to examine hypotheses describing the origin of the TcI DOM clade. We propose two possible scenarios: an emergence of TcI DOM in northern South America as a sister group of North American strains, followed by dispersal among domestic transmission cycles; or an origin in North America, prior to dispersal back into South American domestic cycles, possibly through human activity. To provide further insight into this question we undertook high resolution nuclear and mitochondrial genotyping of multiple Central American strains (from areas of México and Guatemala) and included them in an analysis with other published data [11][12][13].
A panel of 72 TcI isolates and clones was assembled for analysis (Table 1) [11][12][13][14][15][16]. Of these, existing sequences and microsatellite data were available for 46 isolates [11,12]. Isolates were classified into three populations: TcI NORTH-CENT, TcI SOUTH and TcI DOM. TcI NORTH-CENT includes samples from the USA, México, Guatemala and Honduras; TcI SOUTH corresponds to South America (Argentina, Bolivia, Colombia, Venezuela and Brazil); and TcI DOM comprises exclusively domestic isolates from Colombia and Venezuela, already known to correspond to a genotype with restricted genetic diversity: TcIa, as previously described by Herrera et al. (2007) [17], and VEN DOM, as described by Llewellyn et al. (2009) [13]. Additional DTU isolates (TcIII-TcIV) were included as out-groups in the mitochondrial analysis.
Isolates from México and Guatemala were characterized to DTU level via the amplification and sequencing of glucose-6-phosphate isomerase (GPI) as previously described by Lauthier et al. (2012) [18]. Subsequently, nine maxicircle gene fragments were amplified, sequenced and concatenated from the Mexican and Guatemalan strains according to Messenger et al. (2012), excluding ND4 [12]. Phylogenetic analysis was also conducted as in Messenger et al. (2012) [12]. Nineteen nuclear microsatellite loci previously described by Llewellyn et al. (2009) [13] were selected based on their level of TcI intra-lineage resolution. Microsatellite loci were amplified across 21 unpublished biological stocks from México and Guatemala. Reaction conditions were as described previously [13]. Dendrograms based on multilocus allele profiles were also constructed according to Llewellyn et al. (2009) [13].
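As an illustration of how a distance-based dendrogram can start from multilocus allele profiles, the sketch below computes a pairwise allele-sharing distance (one minus the mean proportion of shared alleles across loci). This is an assumption made for illustration: the text only states that dendrograms followed Llewellyn et al. (2009) and does not name the distance measure. The isolate names, locus names and allele sizes are hypothetical toy values, not data from this study.

```python
from itertools import combinations


def shared_allele_distance(g1, g2):
    """Return 1 - mean proportion of shared alleles over loci typed in both samples.

    Each genotype maps a locus name to a tuple of two allele sizes (diploid).
    """
    common_loci = g1.keys() & g2.keys()
    if not common_loci:
        raise ValueError("genotypes share no typed loci")
    total = 0.0
    for locus in common_loci:
        remaining = list(g2[locus])
        shared = 0
        for allele in g1[locus]:
            if allele in remaining:
                remaining.remove(allele)  # each allele copy is matched at most once
                shared += 1
        total += shared / 2  # two allele copies per diploid locus
    return 1.0 - total / len(common_loci)


def distance_matrix(genotypes):
    """Build a symmetric pairwise distance lookup keyed by (name, name)."""
    names = sorted(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = shared_allele_distance(genotypes[x], genotypes[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical toy genotypes (allele sizes in base pairs), for illustration only.
toy_genotypes = {
    "iso_A": {"loc1": (100, 102), "loc2": (200, 200)},
    "iso_B": {"loc1": (100, 102), "loc2": (200, 204)},
    "iso_C": {"loc1": (104, 106), "loc2": (208, 208)},
}
toy_distances = distance_matrix(toy_genotypes)
```

A matrix of this kind would then be passed to a tree-building method such as neighbour-joining; bootstrap support of the sort reported for the microsatellite dendrogram is typically obtained by resampling loci with replacement and rebuilding the tree.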
Maxicircle nucleotide diversity (π) was calculated for TcI NORTH-CENT, TcI SOUTH and TcI DOM in DnaSP v5 [19]. Nuclear allelic diversity was calculated for the same populations using allelic richness (Ar) in FSTAT [20]. The resulting values are shown in Figure 1.
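The two diversity statistics named above can be sketched in a few lines. The functions below are a minimal illustration, not a reimplementation of DnaSP or FSTAT: π is taken as the mean number of pairwise differences per site among aligned sequences (gaps and ambiguity codes are not handled), and allelic richness is taken as the rarefied expected number of alleles in a subsample of g gene copies, which is assumed here to be the sample size correction applied. Function names and toy inputs are hypothetical.

```python
from itertools import combinations
from math import comb


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site among aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts maps each allele at one locus to its count in the population
    sample; g must not exceed the total number of gene copies sampled.
    """
    n = sum(allele_counts.values())
    if g > n:
        raise ValueError("subsample size exceeds the number of gene copies")
    # For each allele: 1 - P(allele is absent from the subsample).
    return sum(1 - comb(n - c, g) / comb(n, g) for c in allele_counts.values())


# Hypothetical toy inputs, for illustration only.
toy_pi = nucleotide_diversity(["ACGT", "ACGA", "TCGA"])
toy_ar = allelic_richness({100: 3, 102: 1}, g=2)
```

Rarefying every population to the same g (usually the smallest sample in the comparison) is what makes richness values comparable between unevenly sampled groups; a per-population value is then the mean across loci.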
Nucleotide sequences per gene fragment are available from GenBank under the accession numbers MURF1 [...] phylogeny was between TcI SOUTH and TcI DOM/TcI NORTH-CENT (98% ML BS/0.98 BPP). However, this division is incomplete, such that a subset of South American strains is also grouped with TcI DOM and TcI NORTH-CENT. Thus, it is not possible to conclude that TcI DOM maxicircle sequences nest uniquely among those from TcI NORTH-CENT strains. Nevertheless, a basal relationship of TcI NORTH-CENT to TcI DOM is suggested at the level of nucleotide diversity by population (Figure 1), whereby TcI DOM < TcI NORTH-CENT < TcI SOUTH. Low standard errors about the mean in all three populations, but especially in TcI DOM and TcI NORTH-CENT, suggest that sample size had little impact on the accuracy of estimation between populations.
Distance-based clustering using the microsatellite dataset indicated the presence of several well defined clades (Figure 3). Importantly, in this case the monophyly of North-Central American isolates was corroborated, and TcI DOM clustered firmly within them (bootstrap 65%). By contrast, South American isolates fall into a divergent but diverse clade. Thus the nuclear data provide stronger support than the maxicircle phylogeny for divergence of TcI DOM from within TcI NORTH-CENT. Sample size-corrected allelic richness estimates are consistent with hierarchical patterns of clustering based on pair-wise genetic distances. As with the maxicircle dataset, there is a pronounced cline in diversity across the populations studied: Ar TcI DOM < Ar TcI NORTH-CENT < Ar TcI SOUTH (Figure 1).
Figure 2. Isolate grouping of 72 Trypanosoma cruzi I strains, as well as outgroups, based on nine concatenated maxicircle sequences. Bayesian consensus topology is displayed. Bayesian posterior probability analysis (BPP) was performed using MrBayes v3.1. Five independent analyses were run using a random starting tree with three heated chains and one cold chain over 10 million generations with sampling every 10 generations (25% burn-in). Decimal values (second number) on nodes indicate Bayesian probabilities for clusters. The first number indicates the Maximum-Likelihood (ML) % bootstrap support for clade topologies, which was estimated following the generation of 1000 pseudo-replicate datasets. Branch colours indicate isolate origin. Isolates that show clear incongruity between nuclear genotype and maxicircle genotype are marked. Outgroup branches were cropped for ease of visualization; full branch lengths are shown inset top right.

TcI dispersion into Central and North America
Using a 100 MYA biogeographic calibration point [6], molecular clock analyses point to the origin of T. cruzi (sensu stricto) 5-1 MYA [21][22][23] and a most recent common ancestor for TcI at 1.3-0.2 MYA [22]. Reduced genetic diversity among North-Central American isolates by comparison with their southern counterparts strongly supports earlier suggestions that TcI originated in South America [13,24]. The emergence of TcI in the South occurred prior to either migration across the Isthmus of Panama alongside didelphid marsupials during the Great American Interchange [25], or perhaps prior to northerly dispersal via volant mammals (e.g. bats).

Origin of TcI DOM
Recent findings based on nuclear and mitochondrial markers indicate a close resemblance between TcI DOM isolates from the northern region of South America and parasite populations from Central and North America [11][12][13]. Indeed, SL-IR genotyping suggests a distribution for TcI DOM that now extends as far south as the Argentine Chaco, where multiple sequences have been identified from human and domestic vector sources [14]. Llewellyn et al. (2009) originally hypothesised that a distinct human/domestic clade could be maintained despite the presence of nearby infective sylvatic strains due to the low parasite transmission efficiency by the vector [13]. In this case multiple feeds by domestic vector nymphs are required to infect individuals; as such, human-human transmission is far more common than reservoir host-human transmission. Originally this hypothesis was developed to explain the epidemiology of Chagas disease in Venezuela. However, TcI DOM is clearly widespread and recent data propose a date for its emergence 23,000 ± 12,000 years ago [11].
Branch colours indicate isolate origin. The three principal populations TcI DOM, TcI SOUTH and TcI NORTH-CENT are shown on both map and tree. Red circles correspond to isolates from TcI DOM. Isolates that show clear incongruity between nuclear genotype and maxicircle genotype are marked with reference to Figure 2.