Tsetse flies circulating in the Shimba Hills National Reserve are mostly infected with T. congolense
Over the course of 10 days, we collected 3237 live tsetse flies, of which 108 (3.33%) had positive midgut infections. Of those, 62 (57.41%) had detectable metacyclic parasites in their mouthparts. By microscopy, and based on their larger size, we did not detect T. vivax parasites in the proboscis, only T. congolense. However, given the challenge in distinguishing African trypanosome species by microscopy, the possibility of mixed infections cannot be completely excluded. To address this, we performed whole-genome mapping.
We prepared six pools of 10 probosces, extracted RNA and sequenced them on the Illumina MiSeq platform. One of the samples was discarded post-sequencing due to low quality. For each sample, we obtained 7 to 13 million read pairs (mean ± standard error of the mean [SEM], 1.02E7 ± 9.73E05) (Table 1, of which 22.46% ± 3.99 mapped to the IL3000 genome and only 23.48% ± 2.70 mapped to the tsetse genome. As the trypanosome enrichment method performed at the cDNA library preparation stage (spliced-leader sequencing) cannot accurately select between African trypanosome species, it was necessary to check for contamination from T. vivax and T. brucei. We found a negligible number of reads mapping to these species (1.67% ± 0.23 and 0.46% ± 0.13, respectively), consistent with background noise. Therefore, we conclude that the probosces of the flies collected were mostly, if not only, colonised by T. congolense, and not by T. vivax or T. brucei.
To further investigate which T. congolense subspecies were present in our samples, we extracted all T. congolense reads mapping to gapdh gene sequences available from TriTrypDB or NCBI (TcIL3000.A.H_000478300 (savannah), TcIL3000.A.H_000155000 (savannah), TcIL3000.A.H_000478100 (savannah), TcIL3000.A.H_000775800 (savannah), AJ620287.1 (kilifi), AJ620288.1 (kilifi), AJ620289.1 (forest), AJ620286.1 (forest), AJ620285.1 (forest)). We found evidence of all three subspecies in our samples.
Expression profiles of mouthpart parasites display wide variation
We selected reads mapped to the T. congolense IL3000 genome, assembled the transcriptomes of each pool, and compared them. We observed that they were reproducible and correlated well (average r2 = 0.94 ± 0.04, Pearson correlation). This shows that mouthpart parasites from naturally infected tsetse flies collected in the Shimba Hills National Reserve have similar transcriptomes. We then examined how the transcriptomes from parasites collected from wild flies compared to mouthpart parasites from experimental infections and found that such reproducibility is lost. Specifically, our data correlate weakly with the transcriptomes previously published by us [5] and by Awuoche et al. [26] (r2 = 0.11 and r2 = 0.31) (Fig. 1a–c). Nonetheless, variation is large even between experimental infections, as the correlation between these two previously published transcriptomes is also low (r2 = 0.10), despite similar experimental approaches. Correlation analysis of the most-abundant 1000 transcripts from the average read counts of natural infection transcriptomes does not improve correlation with those obtained from experimental infections (r2 = 0.09). This suggests that there is a wide variation in the expressed gene profile of mouthpart transcriptomes.
To further understand the differences between transcriptomes, we searched for differentially expressed genes between parasites collected from experimental and natural infections. We did not detect any differentially expressed genes between natural infection transcriptomes and those from Awuoche et al. [27]. This was surprising given the low correlation between the transcriptomes, and is likely a reflection of the low number of replicates in the latter dataset, which reduces the statistical power of the analysis. In contrast, we identified 757 differentially expressed genes (9.5%, corresponding to 321 upregulated and 436 downregulated genes) between the present natural infections and our previous experimental infections [5] (Additional file 1: Table S1). These differences may be explained by strain diversity, infection mode, tsetse rearing conditions, and number of pooled probosces, among other factors.
To better understand the variation in gene expression profiles between the natural and experimental datasets, we compared the experimental and natural infection transcriptomes to previously published bloodstream-form transcriptomes, using limma-voom [21]. We detected 316 differentially expressed (DE) genes in the mouthpart transcriptome produced by Awuoche et al. [27], 2672 in the sample previously published by us [5], and 1208 genes in the transcriptomes from the natural infections presented in the current study. However, only 110 of the DE genes are shared between all three datasets; 431 are shared between our experimental and natural infection datasets; 39 between Awuoche’s and our natural infection transcriptomes; and 104 are common to both experimental datasets.
Together, the strong correlation between the five samples collected from wild flies, the weak correlation between mouthpart transcriptomes of different sources, and the small overlap between the genes DE in the mouthparts compared to the bloodstream suggest that there is wide variation in the expressed gene profile of mouthpart transcriptomes. However, the weak correlation between the experimental datasets, indeed of the same magnitude as the correlation between natural and experimental datasets, suggests that such variation is not due to the wildness of the flies.
Differential gene expression shows the abundance of hypothetical proteins upregulated in mouthpart parasites
Having established that trypanosomes from natural fly infections have a characteristic, but reproducible, expression profile, we proceeded to investigate how they differ from cultured T. congolense IL3000 bloodstream forms [28]. As previously described, these bloodstream-form transcriptomes were produced from parasites collected from the blood of infected laboratory mice at the first peak of parasitaemia and the ascending parasitaemia phase preceding it [28]. The library size (a proxy for sequencing depth) of mouthpart transcriptomes ranged between 2.48 × 106 and 1.16 × 107 per sample (average of 6.61 × 106 ± 2.48 × 106), whereas for bloodstream-form transcriptomes the library size ranged from 2.99 × 107 to 3.57 × 107 (average of 3.35 × 107 ± 1.90 × 106). Of the 10,315 transcripts annotated in the IL3000 genome, we detected 8835 transcripts with more than 10 read counts. Only these were considered in downstream analyses, as a measure to reduce the noise associated with low read counts (Fig. 2a). We have also normalised read counts for the differences in sequencing depth (Fig. 2b). As expected, we observed a clear separation between the expression profiles of mouthpart- and bloodstream-form parasites. The transcriptomes of mouthpart parasites are more variable than the bloodstream-forms, reflected by longer distances in the multidimensional scaling plot (Fig. 2c). This was predictable, as they potentially represent not only a mixture of parasite strains circulating in the Shimba Hills National Reserve, but also a mixture of parasites in different life stages (i.e. epimastigotes, pre-metacyclics, metacyclics).
We have subjected our samples to differential expression analysis using limma-voom [21] (Fig. 2d) and DESeq2 [22] (log2FC >|1.5| and P-value < 0.01). Limma-voom detected a total of 1208 differentially expressed genes (648 downregulated and 560 upregulated), whereas DEseq2 detected 1276 (Additional file 2: Table S2). Of these, 1062 were identified by both methods, corresponding to 88% and 83% of the total, respectively (Fig. 3a).
Our results suggest a strong contribution of epimastigote parasites to the mouthpart parasite population. The list of genes upregulated in mouthpart parasites is rich in hypothetical proteins, GTPase-related genes, zinc-finger proteins, and histones (Fig. 3b, red), which is consistent with previous observations in experimental tsetse fly infections [26]. Downregulated genes relate to VSG, several other surface genes, including ESAG2-like (most likely transferrin receptors), adenylate cyclases, ISG and RHS genes, and trans-sialidases (Fig. 3b, blue). This is also reflected within the top 25 most downregulated genes (Fig. 3c). Indeed, within the pool of DE genes, we found 111 genes to belong to 32 of the 74 previously described non-VSG cell-surface gene families [29]. Fam50, encoding brucei alanine-rich proteins (BARP), is the most represented cell-surface gene family amongst upregulated genes (N = 13), whereas Fam51, encoding adenylate cyclases, is the most common amongst downregulated genes (N = 20) Fig. 3d). Six gene families had members within both the up- and downregulated gene lists, which may indicate developmental regulation within these gene families. These were Fam46 (major surface protease–gp63), Fam49 (invariant surface glycoproteins), Fam51 (adenylate cyclases), Fam58 (MFS transporters), Fam63 (lipases), and Fam77 (hypothetical proteins conserved in T. brucei, T. congolense, and T. vivax).
We subjected the list of up- and downregulated genes to GO term enrichment analysis using topGO and the GO term annotation of the IL3000 genome available at TriTrypDB. We observed that the biological processes significantly enriched amongst the upregulated gene list are quite varied, including terms relating to mitosis, cell-wall organisation, glutathione catabolic process, or protein retention (Fig. 3e). In contrast, the downregulated processes relate to biosynthetic, metabolic, and signalling processes. In terms of molecular functions, we observed down-regulation of ribosome processes and mostly upregulation of DNA binding, protein heterodimerization activity and enzymatic functions (Fig. 3e). The enrichment analysis of cellular components also reflects lower protein synthesis in mouthpart forms, which is consistent with the quiescent state of metacyclic parasites (Fig. 3e). In contrast, there seems to be an increase in nucleus and nucleosome activity in mouthpart parasites, which was also observed by Awuoche et al. [26] in their comparison with cardia parasites. The GO term analysis confirms that mouthpart forms are less physiologically active than bloodstream parasite forms.
Interestingly, we found that 143 out of the 511 upregulated genes and 144 of the 491 downregulated genes are annotated as hypothetical proteins. Whilst 135 of these hypothetical proteins have already been detected by Awuoche et al. [26] as differentially expressed in mouthpart parasites compared to cardia parasites, we found 83 additional hypothetical proteins that are downregulated and 69 that are upregulated in mouthpart parasites compared to bloodstream forms (Additional file 3: Table S3).
At the infrapopulation level, variant antigen profiles of T. congolense metacyclic parasites show a partly reproducible expression pattern
We then proceeded to investigate the metacyclic VSG repertoire and how it differs from that of bloodstream forms. An infected proboscis contains epimastigotes, metacyclic parasites, and a small percentage of long dividing trypomastigotes migrating from the cardia [30]. Whilst distinguishing between epimastigote and metacyclic transcriptomes is not possible without parasite purification or enrichment, we can analyse metacyclic VSG expression because VSG genes are not expressed by epimastigotes. We estimated the variant antigen profiles of each sample with VAPPER [6] (Fig. 4a). VAPPER allows the comparison of VSG diversity by searching for diagnostic protein motifs that allow grouping of individual VSG genes into 15 conserved phylotypes. By comparing the representation of individual phylotypes, we can detect changes in VSG expression patterns [5]. In the present study, we prepared pools of 10 probosces to ensure that we obtained sufficient material for reliable transcriptomes. As such, we committed to profiling the VSG repertoire across a cohort, rather than a clonal population. We observed that all five samples have an overall similar pattern of VSG expression, which is consistent with previous observations from experimental infections [5]. However, we detected subtle fluctuations in VSG phylotype (P) expression patterns (Fig. 4a). Specifically, we observed the highest variation in the abundance of P1, P8, P4, P15, and P10, whereas P14, P6, P5, P2, and P12 were the least variable between samples. We also estimated VSG diversity in the bloodstream-form parasites. These displayed a very high degree of reproducibility between replicates, reflecting their origin from a single infection with the same blood stabilate [28].
When we compared the expressed VSG profiles of metacyclic parasites to the average genomic VSG profile of T. congolense (estimated from 97 genomes, available from the VAPPER database [6]), quite considerable variation was revealed. We observed that all phylotypes, with the exception of P1, P3, P7, and P13, were statistically different (independent t-test, P-value < 0.01) (Fig. 4b). P2, P5, P6, P12, P14, and P15 were less abundant in the transcriptomes than in their relative frequency in the genomic repertoire (independent t-test, P-value < 0.001), whilst P4, P8, P9, P10, and P11 were more abundant (independent t-test, P-value < 0.01). The differences in abundance of phylotypes P4, P7, P8, P9, P11, P12, and P15 are consistent with previous observations from experimental infections with two different strains (1/148 and TC13) [5, 6].
The contribution of phylotype 8 to the expressed VSG profiles of metacyclic parasites (10.85% ± 1.94) is noteworthy, because it represents a very small proportion of VSG genes in the genome (0.94% ± 0.09), and likewise to the expressed repertoires of bloodstream-form parasites (3.98% ± 0.46). Upon more detailed analysis, we discovered that its abundance is mostly due to the high expression of gene TcIL3000.A.H_000381200 (formerly TcIL3000_0_09520). This VSG transcript is present in all samples from both mouthpart- and bloodstream-form parasites. Despite being slightly more abundant in metacyclic than in bloodstream-form parasites, it is not differentially expressed (log2FC = 2.32, P-value = 0.12).