Remarkable genetic diversity of Trypanosoma cruzi and Trypanosoma rangeli in two localities of southern Ecuador identified via deep sequencing of mini-exon gene amplicons.

BACKGROUND
Trypanosoma cruzi, the causative agent of Chagas disease, and T. rangeli are kinetoplastid parasites endemic to Latin America. Although closely related to T. cruzi and capable of infecting humans, T. rangeli is non-pathogenic. Both parasite species are transmitted by triatomine bugs, and the presence of T. rangeli constitutes a confounding factor in the study of Chagas disease prevalence and transmission dynamics. Trypanosoma cruzi possesses high molecular heterogeneity: seven discrete typing units (DTUs) are currently recognized. In Ecuador, T. cruzi TcI and T. rangeli KP1(-) predominate, while other genetic lineages are seldom reported.


METHODS
Infection by T. cruzi and/or T. rangeli in different developmental stages of triatomine bugs from two communities of southern Ecuador was evaluated via polymerase chain reaction product size polymorphism of kinetoplast minicircle sequences and the non-transcribed spacer region of the mini-exon gene (n = 48). Forty-three mini-exon amplicons were also deep sequenced to analyze single-nucleotide polymorphisms within single and mixed infections. Mini-exon products from ten monoclonal reference strains were included as controls.


RESULTS
Trypanosoma cruzi genetic richness and diversity was not significantly greater in adult vectors than in nymphal stages III and V. In contrast, instar V individuals showed significantly higher T. rangeli richness when compared with other developmental stages. Among infected triatomines, deep sequencing revealed one T. rangeli infection (3%), 8 T. cruzi infections (23.5%) and 25 T. cruzi + T. rangeli co-infections (73.5%), suggesting that T. rangeli prevalence has been largely underestimated in the region. Furthermore, deep sequencing detected TcIV sequences in nine samples; this DTU had not previously been reported in Loja Province.


CONCLUSIONS
Our data indicate that deep sequencing allows for better parasite identification/typing than amplicon size analysis alone for mixed infections containing both T. cruzi and T. rangeli, or when multiple T. cruzi DTUs are present. Additionally, our analysis showed extensive overlap among the parasite populations present in the two studied localities (c.28 km apart), suggesting active parasite dispersal over the study area. Our results highlight the value of amplicon sequencing methodologies to clarify the population dynamics of kinetoplastid parasites in endemic regions and inform control campaigns in southern Ecuador.

Background Chagas disease (CD) is caused by the kinetoplastid hemoflagellate parasite Trypanosoma cruzi and affects 6-7 million people worldwide [1]. This neglected disease is endemic to Latin America, where it poses risk of infection for > 65 million people and kills an estimated 50,000 every year [2]. Transmission in endemic countries primarily occurs via contact with the feces of an infected triatomine bug. However, other secondary mechanisms of transmission, such as blood transfusion, organ transplants, congenital transmission and laboratory accidents exist, and can cause infections in nonendemic regions where vectors are not present [3].
In Ecuador, Triatoma dimidiata, T. carrioni, Panstrongylus chinai, P. rufotuberculatus and Rhodnius ecuadoriensis are epidemiologically relevant vector species [18,19]. Rhodnius ecuadoriensis is widely distributed in the western lowlands and the southern provinces of the country, as well as in northern Peru [20], both in domestic and in peridomestic environments where contact with humans occurs [19,21].
Trypanosoma rangeli, a related kinetoplastid parasite whose geographical distribution overlaps with that of T. cruzi, may also be found in the same triatomine vectors and mammalian hosts. Mixed infections with both species are often reported [18,22,23]. Despite being considered non-pathogenic to humans, T. rangeli is epidemiologically relevant since it may cause false-positive results in microscopical and serological tests used for diagnosis of T. cruzi infection [24,25]. In contrast to that of T. cruzi, transmission of T. rangeli to vertebrate hosts occurs via the salivary route, although the possibility of transmission through infected feces has also been suggested [26,27]. Upon ingestion from the circulation of vertebrate hosts, T. rangeli trypomastigotes accumulate in the digestive tract of triatomines, replicate, transform into epimastigotes, cross the intestinal epithelium into the hemocoel and migrate to the salivary glands. Infective metacyclic trypomastigotes are released with the saliva during a blood meal [28]. Little is known regarding T. rangeli in Ecuador. PCR minicircle amplification of > 3600 samples revealed that 10% of triatomines and mammals from Manabí and Loja provinces were infected with T. rangeli, while 1.25% presented T. cruzi + T. rangeli coinfections [29].
The kinetoplast DNA of trypanosomatid parasites is composed of maxicircles and minicircles, whose number varies between species. Minicircles possess a conserved 100-200 bp repetitive region, whose number of repeats also differs among species. In T. cruzi, there are four copies of the conserved sequence [30]. Trypanosoma rangeli may present one, two or four copies of the mini-repeat. These three different minicircle classes in T. rangeli are known respectively as KP1, KP2 and KP3 [31,32]. Further analysis based on polymorphism of randomly amplified DNA (RAPD), sequencing of the small subunit of ribosomal RNA (SSU rRNA), the internal transcribed spacer of rDNA (ITS rDNA) and the intergenic region of the splice leader, has unveiled a more complex population structure for T. rangeli, with five groups defined as A, B, C, D and E, each of which shows a strong association with vector species and geographical distribution [27,33,34]. More recently, through the analysis of microsatellites and single-nucleotide polymorphisms (SNPs) of the splice leader gene, Sincero et al. [35] suggested the presence of three main groups in T. rangeli: (i) the Amazonian group, associated to Rhodnius brethesi and vertebrate hosts; (ii) the KP1(-) group, linked to the R. pallescens complex; and (iii) the KP1(+) group, related to the R. robustus complex.
Infection by T. cruzi in mammalian reservoirs and invertebrate vectors is frequently multiclonal, involving several intra-specific genotypes with genetically dissimilar profiles [36][37][38][39]. Multiclonal parasitic infections have been suggested to impact host immunity [40], disease transmission rate and population structure [41]. They can also mislead drug resistance evaluation, diagnostics and various other applications important to disease control [42]. localities (c.28 km apart), suggesting active parasite dispersal over the study area. Our results highlight the value of amplicon sequencing methodologies to clarify the population dynamics of kinetoplastid parasites in endemic regions and inform control campaigns in southern Ecuador. Keywords: Chagas disease, Mini-exon, Trypanosoma cruzi, Trypanosoma rangeli In the genus Trypanosoma, mini-exon genes are present in several tandemly arranged copies and encode the splice leader, a 35-nucleotide sequence translocated to the 5'-end of every newly synthesized mRNA [43,44] in a process referred to as discontinuous transcription [45]. Trypanosoma cruzi mini-exon genes include conserved, semi-conserved and highly variable regions, which have been used for phylogenetic analysis [46], discrimination between DTUs [47], population genetic inference [48], and more recently, in diversity analysis within naturally infected mammalian hosts [49]. Nonetheless, it is not clear how mini-exon sequence diversity relates to DTU identity, and restriction fragment length polymorphism (PCR-RFLP) assays targeting the mini-exon locus are not considered reliable for T. cruzi DTU detection and identification [50]. Given its variability, however, sequencing of the intergenic spacer has proven useful in the examination of population dynamics within a single DTU [50][51][52][53][54][55].
Next-generation sequencing (NGS) has helped provide insights into multiclonal T. cruzi infections in humans [56] and to assess relationships between gut microbiota, parasite diversity, and vertebrate feeding sources of triatomine bugs [57]. In the present study, we aimed to evaluate the molecular diversity of T. cruzi and T. rangeli in two localities of southern Ecuador via analysis of the mini-exon gene. We analyzed 48 intestinal DNA extracts from triatomines collected in Loja Province as well as DNA from ten cloned isolates of reference strain DTUs I, III, IV, V and VI in order to identify relationships between mini-exon diversity and DTU identity. Mini-exon amplicon sequence reads generated by NGS were sorted into haplotype clusters to measure genetic richness, diversity, variability and substructure in the landscape. Moreover, the dataset included samples isolated from different developmental stages of the vector to evaluate a potential correlation with mini-exon diversity. We also compared genotyping results from NGS and PCR product size polymorphism analysis in terms of sensitivity for discrimination between T. cruzi and T. rangeli parasites.

Sample panel and PCR typing
Archived DNA samples collected between years 2009-2013 by the Center for Research on Health in Latin America (CISeAL) were employed in the study. Details of entomological searches, intestinal content DNA extractions and corresponding approved protocols and collection permits have been reported elsewhere [19,23]. Samples selected for the study had been previously genotyped by PCR amplification using primers 121 and 122, which anneal to highly conserved regions of the kinetoplast minicircle [58] and are useful to distinguish between T. cruzi (330 bp fragment size expected) and T. rangeli (380 bp). All samples appeared to be infected exclusively with T. cruzi based on visualization of 121/122 amplicons (data not shown).
The mini-exon gene is a multicopy target. DNA from 10 biologically cloned reference strains was therefore included in the analysis to evaluate whether intra-clonal sequence diversity generated from the amplicons via NGS recapitulated the DTU to which the clones had been previously assigned via classical methods [50].

Multiplex PCR
PCR amplification of the non-transcribed spacer region of the mini-exon gene was performed employing five primers in a multiplex reaction to discriminate between zymodemes I, II, III and T. rangeli, as previously described [47]. In the current nomenclature zymodeme I is equivalent to DTU TcI, zymodeme II comprises DTUs II, V and VI, while zymodeme III corresponds to DTUs III and IV. The amplified products were electrophoresed in 2% agarose gels and visualized with SYBR green under UV light. Samples co-infected with T. cruzi and T. rangeli displayed a two-band pattern. In such cases, gel excision and purification were performed for each fragment. Subsequently, amplicons were sequenced by NGS technology as detailed below.

Next-generation sequencing and bioinformatic analysis
Paired-end sequence reads were generated on the Illumina MiSeq platform (600-cycle Reagent Kit v3; Illumina, San Diego, USA) following a custom protocol [59]. Paired-end sequence reads were processed with Cutadapt v1.12 for adapter removal and Sickle v1.33 for quality trimming (-q 27) [60]. Trimmed forward and reverse reads were overlapped using default settings in Pandaseq v2.7 [61]. Eleven of 56 sample barcodes used during multiplexed sequencing in this study were also used for a separate study on 18S gene amplicons. These 18S amplicons were sequenced in the same MiSeq flow cell in order to save sequencing costs. We discarded all sequences containing 18S k-mer matches after the Pandaseq overlap step. Sequences with > 98% sequence similarity were then clustered to consensus haplotypes based on the UPARSE-OTU algorithm implemented by the 'cluster_otus' command in USEARCH v8.1 [62]. Chimeric sequences were discarded using the UCHIME algorithm and clusters represented by < 200 reads were discarded to avoid artefactual SNPs. Samples for which the 'usearch_global' command assigned < 20,000 reads to the remaining set of consensus sequences were also discarded based on rarefaction curves computed from cluster size annotations using 'fasta_rarify' . Finally, parasite species and DTUs were determined for each haplotype by searching the complete nucleotide collection (nr/nt) with the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI). All sequences with ≥ 90% similarity to trypanosomatid mini-exon sequences were kept for further analysis. (Additional file 1: Table S1 shows raw data for cloned reference strains and Ecuadorian samples).

Richness and diversity analysis
Richness was defined as the total number of haplotypes found in each sample. Shannonʼs index was defined as -Ʃ(Piln(Pi)) [63]. As mentioned previously, samples with mixed infections showed a two-band pattern in agarose gels. In these cases, gel excision rendered two sequencing templates and the information obtained was merged (addition of reads from each template) (Additional file 2: Table S2). For each sample, richness and diversity were calculated for both parasite species (T. cruzi and T. rangeli) independently and combined. Data corresponding to samples TBM2823, TBR1487 and TBR1489 were excluded from this analysis because they are the only representatives of instar IV. TBR1446 and TBR1503 were also excluded to avoid bias because the sequence reads from these samples has been merged across four sequencing reactions.

Statistical analysis
An analysis of variance (ANOVA) was performed when data showed normality and homoscedasticity. When data violated normality, a Kruskal-Wallis test was employed, followed by a Bonferroni-adjusted Dunn's post-hoc test. A P-value of ≤ 0.05 was considered significant.

Genotyping by multiplex PCR and NGS
Among the 48 samples analyzed by multiplex PCR, agarose electrophoresis patterns indicated that 37.5% (n = 18) of the samples were co-infected with T. cruzi and T. rangeli, while 52.1% (n = 25) presented infection exclusively with T. cruzi and 6.25% (n = 3) were infected only with T. rangeli. Samples TBM2873 and TBM2983 showed faint bands in agarose gels and no clear pattern was obtained. Meanwhile, NGS was performed only on 43 samples from the Ecuadorian dataset; 5 were not sequenced due to technical issues. An additional 9 samples were excluded from further analysis because they did not reach the 20,000 reads threshold, indicating insufficient sequencing depth. Among the 34 samples for which NGS was successfully performed, taxonomic assignment in GenBank identified sequence types associated with both T. cruzi and T. rangeli in 25, while 8 were deemed to contain exclusively T. cruzi-like sequences, and only one had exclusively T. rangeli-like sequences (TBR1455). Remarkably, sequence types matching DTU TcIV were found in samples TBR1410, TBR1422, TBR1445,  TBR1475, TBR1492, TBR1510, TBM2873, TBM2903 and  TBM2983 (Table 1, Fig. 3).
Mini-exon sequence types distinctively associated with monoclonal strains TcI, TcIII, TcIV and TcV/VI were used for the construction of a neighbor-joining (NJ) tree alongside selected sequence types derived from Ecuadorian strains and T. rangeli. Haplotypes with more than 10,000 reads were selected, although key sequences associated with a specific DTU and Ecuadorian TcIV representatives were included in the alignment (Additional file 3: Alignment S1). The tree topology shows five robustly supported clusters, one corresponding to T. rangeli, separated from four corresponding to T. cruzi. The TcI-like sequences cluster together in one branch, separated from TcIV-like, TcIII-like and TcII/V/VI-like sequences (Fig. 4). Monoclonal samples from reference strains are displayed on the x-axis. The y-axis includes distinctive sequence types identified in this study that showed association with reference strains (haplotypes seen almost exclusively within a single monoclonal sample as representative of a DTU). Color intensity is proportional to the number of reads, here standardized for each row (sequence/ haplotype) and displayed as a z-score  . 3 Sequence types associated with T. cruzi and/or T. rangeli amplified from intestinal content of Ecuadorian triatomines. The y-axis shows the sequence types and their assigned DTU according to their similarity with GenBank reference sequences (> 98%). Samples are shown in the x-axis, indicating their assigned species and DTU. Color intensity is proportional to the number of reads, here standardized for each row (sequence/ haplotype) and displayed as a z-score. * Indicates that the number of reads for these samples is too low to be visualized in the z-score adjusted heatmap. Among these, ** TBM2873 is the only sample where a TcI/TcIV co-infection was detected NJ analysis including Ecuadorian T. rangeli haplotypes yielded a rather uninformative clustering with several polytomies. Therefore, only selected T. rangeli haplotypes were included along with available sequence data retrieved from NCBI (Fig. 5, Additional file 4: Alignment S2, Additional file 5: Table S3). The Ecuadorian haplotypes clustered with KP1(-)/C reference sequences. Although bootstrap values are low for most branches, KP1(+), KP1(−) and groups A, B, C and E [34] are also evident in the tree.

Parasite richness and diversity
Chao richness and Shannonʼs diversity indices were calculated for each parasite species within each sample. No significant differences in richness or diversity were found between samples from the two studied localities (Bramaderos and Bellamaria) for either parasite species. Analyses were performed merging both male and female adults and with adults from each sex taken independently. In both cases, we found significant differences for T. rangeli richness across developmental stages (Kruskal-Wallis H-test: χ 2 = 8.4478, df = 3, P = 0.0376 for male/female independent analysis, and Kruskal-Wallis H-test: χ 2 = 6.8417, df = 2, P = 0.0327 for the merged group). The Bonferroni-adjusted Dunn's post-hoc test showed significant differences between instar V nymphs and adult females (P = 0.0124) and among instar V nymphs and the merged group of adult females and males (P = 0.0138). In both cases, instar V showed significantly higher richness than the remaining developmental stages.

Comparison of parasite identification/genotyping methods
We compared the results from three different types of molecular analysis for 48 intestinal content DNA samples from triatomine bugs collected in two rural communities of Loja Province, southern Ecuador. All samples selected for the study had been previously characterized as infected with T. cruzi based on PCR product size polymorphism of kinetoplast minicircle sequences, with no evidence of co-infection with the sympatric sister species T. rangeli. The sample set was subsequently analyzed by two additional molecular techniques; first by PCR product size polymorphism analysis of the non-transcribed spacer region of the mini-exon gene, then by deep sequencing of these mini-exon amplicons. Incongruences were detected among the results obtained by the three methods, and our results suggest that simple amplicon visualization of kinetoplast minicircle sequences may not suffice for detection of T. cruzi + T. rangeli co-infections.
The PCR-based method for mini-exon gene analysis without sequencing [47] uncovered 18 instances of T. cruzi + T. rangeli co-infections in our sample panel, which the kinetoplast minicircle PCR [58] did not identify. Mixed infections of T. cruzi and T. rangeli in vector and mammalian hosts are frequent [18,23,65,66] and several groups have reported that minicircle kinetoplast PCR product visualization techniques are inadequate to identify them [66][67][68][69]. Previous studies have applied this same method in southern Ecuador and reported ~10% prevalence of T. rangeli infection in sampled triatomines and mammals and just ~1.25% in co-infection with T. cruzi [29]. In stark contrast, NGS revealed T. rangeli-like sequences in 26 out of 34 (76.47%) infected samples characterized in our study.

Trypanosoma rangeli and T. cruzi genetic diversity
Our sample panel was heavily biased toward R. ecuadoriensis (45 out of 48 DNA samples were isolated from this vector species), one of the two most important vectors of CD in the study region and in Ecuador at large [70]. A close association between T. rangeli and Rhodnius spp. has been established in the past, where 12 out of 15 species of the genus have shown vectorial capacity [26]. Our results suggest that previous reports regarding T. rangeli infection in R. ecuadoriensis may underestimate the presence of this parasite in southern Ecuador, where a wider distribution of T. rangeli in Loja, and perhaps the rest of the country, may have been overlooked. Furthermore, although only a ~100-bp fragment was analyzed, a wide T. rangeli genetic diversity was encountered (79 haplotypes in 26 triatomines). Consistent with previous reports, where 99% of Ecuadorian T. rangeli were classified as KP1(−), and 80% as lineage C [29], NJ analysis clustered all Ecuadorian sample haplotypes with KP1(-)/C reference sequences. Low bootstrap values in this analysis may arise from the small size (~100 bp) of the T. rangeli amplicon and the limited number of informative SNPs found within the sequence (some haplotypes differ by only one SNP). Notably, we did not identify any KP1(+) haplotypes, although previous reports indicate KP1(+) and (−) lineages are sympatric and may be found in the intestinal contents of Rhodnius spp. [71]. Co-infection with T. cruzi and T. rangeli occurred in 73.5% of analyzed triatomines. Despite earlier reports suggesting T. rangeli infection is pathogenic to triatomines, more recent studies point to the need of more solid evidence before generalizing this claim [72]. Indeed, recent reports suggest co-infection increases R. prolixus survival, reproduction and fitness, which in turn would favor transmission of both parasite species [73]. The interplay of the infection with these two trypanosomatids in R. ecuadoriensis and its possible impact over the epidemiology of Chagas disease in the south of Ecuador warrants further investigation.
Sequencing of the non-transcribed spacer of the miniexon gene has proven useful in phylogenetic analysis and has provided valuable insights for DTU discrimination while analyzing T. cruzi intra-strain variability [10,48,51,55]. Recently, deep sequencing of the non-transcribed spacer has revealed high levels of T. cruzi diversity in the  Table S3 shows the sequences retrieved from the NCBI database) intestinal content of T. dimidiata [57]. Similarly, Herrera et al. [49] recently employed NGS sequencing of the mini-exon gene to analyze the genetic diversity of T. cruzi naturally infecting a group of captive non-human primates in the southern USA. In our study, 14.5 T. cruzi and 14.4 T. rangeli haplotypes were detected on average per triatomine. However, we do note from our analysis of monoclonal reference samples that mini-exon sequences may not accurately resolve T. cruzi phylogenetic diversity, thus reflecting to an extent the lack of reliability of this locus in identifying DTUs. From monoclonal strain haplotypes, our data showed that some low abundance sequence types are shared by almost all DTUs, with important implications for those using the mini-exon splice leader locus to define intra-host DTU diversity (Additional file 1: Table S1). Therefore, it is also difficult to say with confidence that the high levels of T. cruzi diversity found in our samples result from multiclonal infections, given that multiple divergent copies are present per genome. Nonetheless, a subset of high abundance sequence types could be associated with different DTUs and it is upon these that we base our discrimination between TcI and TcIV in this study.
TcI is the predominant DTU in Ecuador [13][14][15]23]. As expected, TcI-like haplotypes were widely represented in the deep sequencing analysis of the mini-exon gene performed for the Ecuadorian samples. Notably, we identified sequences matching DTU TcIV in nine of the Ecuadorian samples. Two TcIV-like haplotypes displayed a 99% match with an NCBI entry originated from a TcIV isolate. The other two TcIV-like haplotypes found also showed high homology (97% and 98%). Additionally, NJ analysis clustered the Ecuadorian TcIV haplotypes with discriminatory TcIV sequences available from monoclonal strains with known DTU identity. Strong bootstrap values support this finding; therefore, we are confident of the identification of TcIV haplotypes in our sample panel. To the best of our knowledge, this constitutes the first report of TcIV in Loja Province and is a testament to the power of amplicon sequencing to uncover 'hidden' T. cruzi diversity. Only two previous reports of DTUs different from TcI in Ecuador exist. One of them, from the early 2000s, employed multilocus enzyme electrophoresis to identify zymodemes II and III [16]. The second report provided serological evidence using a TcII/V/VI-specific epitope [17]. Our findings expand the distribution of DTU TcIV to the southernmost province of Ecuador, where its presence had not previously been reported.
No major differences were found between the studied localities in terms of parasite genetic richness or diversity.
Both communities are only 28.3 km apart and infected livestock, small mammals, passive transportation of triatomines or human movement may cause active parasite dispersal, homogenizing the parasite populations. In addition, no significant differences in T. cruzi richness or diversity were detected across vector developmental stages. The hostile environment in the vector's anterior midgut has been previously suggested to create a bottleneck [74], which could reduce the diversity of T. cruzi. In contrast, the significantly higher T. rangeli richness found in instar V compared to adult triatomines is consistent with the notion of this parasite being pathogenic for triatomines [75,76], which could affect their development into adulthood.
Dumonteil et al. [57] reported evidence of T. cruzi clone accumulation in T. dimidiata as a result of subsequent feeding events by this vector. The data do not support the occurrence of such accumulation in our sample set, which may be linked to the presence of multiple copies, and variants, of mini-exon sequences within clones, which obscure the signal of accumulating clonal diversity over multiple infected blood meals.

Conclusions
Our results indicate that techniques based solely on PCR amplification of minicircle or mini-exon sequences are not optimal for detection and characterization of mixed infections with kinetoplastid parasites in triatomines. Employing deep sequencing of the non-transcribed spacer region of the mini-exon gene, we have encountered 73.5% (considering the 34 samples with NGS results) of studied triatomines to be co-infected with T. cruzi and T. rangeli, a much higher prevalence for T. rangeli infection than previously reported for vectors in the study region. Haplotypes of T. rangeli found in Ecuadorian samples were characterized as KP1(-)/C, in congruence with previous reports. Additionally, we provide the first report for the presence of DTU TcIV in Loja Province. In conclusion, we demonstrate the power of NGS to scrutinize parasite populations with high resolution. However, we do note that the miniexon marker must be approached with caution, and further deep sequencing analyses targeted at additional markers, potentially conserved house-keeping genes [77], are required for confirmation. Nonetheless, the information presented in this report is of value in the characterization of the dynamics of T. cruzi transmission in southern Ecuador and must be considered in disease control efforts in the region.