Prevalence and under-detection of gambiense human African trypanosomiasis during mass screening sessions in Uganda and Sudan

Background: Active case detection through mass community screening is a major control strategy against human African trypanosomiasis (HAT, sleeping sickness) caused by T. brucei gambiense. However, its impact can be limited by incomplete attendance at screening sessions (screening coverage) and diagnostic inaccuracy.

Methods: We developed a model-based approach to estimate the true prevalence and the fraction of cases detected during mass screening, based on observed prevalence, and adjusting for incomplete screening coverage and inaccuracy of diagnostic algorithms for screening, confirmation and HAT stage classification. We applied the model to data from three Médecins Sans Frontières projects in Uganda (Adjumani, Arua-Yumbe) and Southern Sudan (Kiri).

Results: We analysed 604 screening sessions, targeting about 710 000 people. Cases were about twice as likely to attend screening as non-cases, with no apparent difference by stage. Past incidence, population size and repeat screening rounds were strongly associated with observed prevalence. The estimated true prevalence was 0.46% to 0.90% in Kiri depending on the analysis approach, compared to an observed prevalence of 0.45%; 0.59% to 0.87% in Adjumani, compared to 0.92%; and 0.18% to 0.24% in Arua-Yumbe, compared to 0.21%. The true ratio of stage 1 to stage 2 cases was around two to three times higher than that observed, due to stage misclassification. The estimated detected fraction was between 42.2% and 84.0% in Kiri, 52.5% to 79.9% in Adjumani and 59.3% to 88.0% in Arua-Yumbe.

Conclusions: In these well-resourced projects, a moderate to high fraction of cases appeared to be detected through mass screening. True prevalence differed little from observed prevalence for monitoring purposes. We discuss some limitations to our model that illustrate several difficulties of estimating the unseen burden of neglected tropical diseases.


Background
Human African trypanosomiasis (HAT, sleeping sickness) due to Trypanosoma brucei gambiense is a neglected, tsetse-fly borne parasitic disease that affects mainly remote and crisis-affected populations of sub-Saharan Africa [1]. Disease begins in a mildly symptomatic, haemolymphatic stage (stage 1) and within about 1-2 years progresses to the meningo-encephalitic stage 2, which is fatal unless treated and can leave sequelae [2,3].
Active case detection has been a mainstay intervention to control HAT since the 1920s [4]. It consists of cross-sectional mass screenings, whereby entire communities (usually villages or urban neighbourhoods) are targeted for testing. The screening test is usually the Card Agglutination Test for Trypanosomiasis (CATT), though palpation of lymph nodes in the neck is also often performed (enlarged lymph nodes are a prominent sign of HAT). The confirmation and staging components of the complex diagnostic algorithm [5] are carried out either on site or at a fixed HAT treatment centre, depending on proximity and ease of patient transport. Staging and treatment are often done at the treatment centre, but stage 1 cases are increasingly treated at the community level.
Active case detection prevents disease progression to stage 2 through early treatment irrespective of symptoms; reduces mortality of stage 2 cases; decreases transmission intensity by reducing the infectious pool (humans are thought to be the main ecological reservoir [1]); creates community awareness; and generates an estimate of infection prevalence, the key indicator of HAT burden. Mass screening is empirically associated with reduction in transmission in various settings [6][7][8], and its decline in the post-colonial era is heavily implicated in the resurgence of HAT in the 1980s and 1990s [9][10][11].
Active case detection may be indispensable for HAT elimination [6,12]. However, attendance at screening sessions is often low, and diagnostic sensitivity is imperfect [13], limiting its impact. Furthermore, false positives due to imperfect specificity confound prevalence estimates. Here, we use modelling to estimate the fraction of cases detected during mass screening (henceforth referred to as the detected fraction) and the true infection prevalence based on data from three Médecins Sans Frontières (MSF) projects in Uganda and Southern Sudan. Estimates of the detected fraction and true prevalence are critical for evaluating the true impact of control programmes and measuring the unseen burden of this neglected tropical disease.

Data sources
We assembled aggregate data from screening sessions conducted in the Kiri (Kajo-Keji county, Southern Sudan), Adjumani and Arua-Yumbe (north west Uganda) MSF projects, previously described [14][15][16][17]. Data include village population size (estimated through census by home visitors), numbers screened and cases detected by stage. We excluded sessions that yielded zero prevalence in villages where no cases were detected throughout the project duration. The study was approved by the Ethics Committee of the London School of Hygiene and Tropical Medicine.

Conceptual framework
Model states and parameters are listed in Table 1. Let screening coverage c be the number of people screened divided by the total village population N; detected fraction the number of truly positive stage 1 or stage 2 cases detected (S 1,TP , S 2,TP ) out of all cases prevalent (S 1 , S 2 ); and observed prevalence the number of cases diagnosed (including false positives) in either stage (S 1,TP + S 1, FP , S 2,TP + S 2,FP ), divided by the number of people screened (cN).
We hypothesized that cases are more likely than non-cases to attend a screening session, i.e. that the relative probability ρ of attendance (cases versus non-cases) exceeds 1. Accordingly, as screening coverage decreases, the selection bias favouring cases should increase, yielding a higher observed prevalence at coverage c (for c < 1), compared to the prevalence measurable if c = 1. We can thus define a coverage-dependent ratio of observed prevalence for any screening coverage < 1, compared to observed prevalence when everyone is screened:
$$\beta_c = \frac{p_{obs}(c)}{p_{obs}(c = 1)}$$
where p obs (c) denotes the observed prevalence at coverage c. Under this hypothesis, β c should increase exponentially as screening coverage decreases.
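As a rough illustration of this dependence (not the authors' derivation), suppose attendance behaves like sequential weighted sampling in which each case is ρ times as likely as each non-case to be the next person screened, prevalence is low, and diagnosis is perfect. Under those assumptions the share of non-cases screened is approximately c and the share of cases screened is approximately 1 - (1 - c)^ρ, which gives the following approximation:

```latex
\beta_c \;\approx\; \frac{1 - (1 - c)^{\rho}}{c},
\qquad
\lim_{c \to 1} \beta_c = 1,
\qquad
\lim_{c \to 0} \beta_c = \rho .
```

For example, with ρ = 2 the expression reduces to 2 - c, so a session at 25% coverage would be expected to show about 1.75 times the prevalence observed at full coverage. With ρ = 1 the ratio is 1 at every coverage, i.e. no selection bias.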
In addition, observed prevalence is biased upward by false positive tests (incomplete diagnostic specificity), and downward by false negatives (incomplete sensitivity), while the number of stage 1 and stage 2 cases is biased by stage misclassification (Figure 1).
In this paper we develop a static, stochastic mathematical model to predict the relationship between observed prevalence and true prevalence given a specific relative probability ρ of attending a screening session among cases compared to non-cases, which is a parameter we can estimate from field data. This model enabled us to estimate true prevalence and therefore the detected fraction. The different steps in the implementation of the model are outlined in Table 2, and described below.

Description of the mathematical model
The model predicts the number of stage 1 and stage 2 observed cases (S 1,obs and S 2,obs ) and the true cases among these (S 1,TP and S 2,TP ), based on a set of input parameters, including village population N, true number of prevalent cases S 1 and S 2 , screening coverage c, relative risk of attending screening among cases versus non-cases ρ, and accuracy (sensitivity, specificity, probabilities of correct stage 1 and 2 classification) of the diagnostic algorithm, as estimated in previous work [13].
Because the number of prevalent cases in a village is often very small and in order to incorporate uncertainty in several parameters, the model was implemented stochastically. Accordingly, individuals in the population have a given probability of experiencing certain events (e.g. attending screening, being detected if positive); chance determines whether the event occurs. The stochastic variation is then examined over a large number of iterations of the model: best estimates and confidence intervals are generated from the distribution of predicted values. Furthermore, during each iteration fresh random values of certain parameters (e.g. diagnostic accuracy) are drawn from their distributions.

Cases and non-cases screened
The model firstly predicts the number of cases and non-cases screened. If coverage = 1, everyone is screened. If coverage < 1, the situation is akin to sampling without replacement, with sample size = people screened (cN). The probabilities that the i-th person screened will be a stage 1 case, stage 2 case or non-case are the product of ρ and the relative proportions of each type of patient in the remaining unscreened population, which change and thus must be updated after each person is screened. Accordingly, the number of cases predicted to be screened over the entire screening session is computed as follows:

In the above equations, random numbers between 0 and 1 are sampled from a uniform distribution to determine whether an event occurs. The probabilities that the next person screened is a stage 1 or stage 2 case are, respectively:

The number of predicted non-cases screened is the total sample cN minus cases screened:

In cases where c > 1 (as can occur if people from surrounding villages also attend the screening session), we assumed that the entire village population was screened, i.e. c = 1 for the village in question; additional persons screened from outside the village are ignored in the model, as they do not contribute to the prevalence pool (and thus the detected fraction) within the village in question. MSF datasets specify the origin of cases detected and only cases from the village screened were considered in our analysis. However, when computing observed prevalence, all persons screened (including those from outside the village) were considered in the denominator, as MSF data do not contain the origin of persons screened. In both Uganda and Sudan projects, observed prevalence was also calculated in this way.

Figure 1 Illustration of the relationship between true and observed prevalence during mass screening.
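The attendance step described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `screen_population` and its arguments are hypothetical, and the way the weights are normalised (each remaining case weighted by ρ, each remaining non-case by 1) is an assumption, since the original equations are not reproduced in the text.

```python
import random

def screen_population(n_pop, s1, s2, coverage, rho, rng=None):
    """Sequentially draw round(coverage * n_pop) attendees without replacement.

    Assumption: each remaining case carries weight rho and each remaining
    non-case weight 1; the weights are renormalised after every draw.
    Returns (stage 1 cases screened, stage 2 cases screened, non-cases screened).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    rng = rng or random.Random()
    coverage = min(coverage, 1.0)  # c > 1 is treated as c = 1, as in the text
    n_screened = round(coverage * n_pop)
    rem1, rem2 = s1, s2
    rem_non = n_pop - s1 - s2
    got1 = got2 = 0
    for _ in range(n_screened):
        total_weight = rho * (rem1 + rem2) + rem_non
        u = rng.random() * total_weight  # uniform draw decides who comes next
        if u < rho * rem1:
            rem1 -= 1
            got1 += 1
        elif u < rho * (rem1 + rem2):
            rem2 -= 1
            got2 += 1
        else:
            rem_non -= 1
    return got1, got2, n_screened - got1 - got2

# Hypothetical village: 1000 people, 10 stage 1 and 5 stage 2 cases, 40% coverage.
example = screen_population(1000, 10, 5, 0.4, rho=2.0, rng=random.Random(1))
```

Updating the remaining counts after every draw is what makes this sampling without replacement; with ρ = 1 it reduces to simple random sampling of attendees.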

True cases detected
The number of true cases detected among those screened is given by the binomial probability of detection conditional on being screened (diagnostic sensitivity σ), applied to each case screened: However, some cases detected are misclassified in the wrong stage:

False positive cases
Out of non-cases screened, some are classified as false positives due to imperfect specificity: For completeness, we note that some false positives may be classified as stage 1, based on the relative proportion ω of stage 1 cases among all false positives, which is highly dependent on the diagnostic algorithm being used: All other false positives are classified as stage 2: In practice, ω was estimated at zero in the MSF projects we analysed [13].
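The detection, misclassification and false positive steps can be sketched with binomial draws. The function `diagnose` and its argument names are hypothetical, the parameter values in the usage line are invented for illustration, and the stage 2 observed count is written as the mirror image of the stage 1 expression in Equation 14, which is an assumption of this sketch.

```python
import numpy as np

def diagnose(screened1, screened2, screened_non, sens, spec,
             p_correct1, p_correct2, omega=0.0, rng=None):
    """Apply an imperfect diagnostic algorithm to the people screened.

    sens, spec: sensitivity and specificity of the whole algorithm.
    p_correct1, p_correct2: probability that a detected stage 1 (stage 2)
    case is assigned to its true stage.
    omega: share of false positives labelled stage 1.
    Returns observed stage 1 and stage 2 counts plus the true positives.
    """
    rng = rng or np.random.default_rng()
    # True positives: each screened case is detected with probability sens.
    tp1 = rng.binomial(screened1, sens)
    tp2 = rng.binomial(screened2, sens)
    # Detected cases placed in the wrong stage.
    mis1 = rng.binomial(tp1, 1.0 - p_correct1)  # true stage 1 labelled stage 2
    mis2 = rng.binomial(tp2, 1.0 - p_correct2)  # true stage 2 labelled stage 1
    # False positives among non-cases, split by stage using omega.
    fp = rng.binomial(screened_non, 1.0 - spec)
    fp1 = rng.binomial(fp, omega)
    fp2 = fp - fp1
    # Observed counts: Equation 14 for stage 1; the stage 2 line mirrors it
    # (assumption of this sketch).
    obs1 = tp1 - mis1 + mis2 + fp1
    obs2 = tp2 - mis2 + mis1 + fp2
    return {"obs1": obs1, "obs2": obs2, "tp1": tp1, "tp2": tp2}

# Hypothetical accuracy values, for illustration only.
result = diagnose(8, 4, 500, sens=0.9, spec=0.99, p_correct1=0.8,
                  p_correct2=0.95, rng=np.random.default_rng(0))
```

With ω = 0, as estimated for the MSF projects, every false positive is counted as stage 2, which is one reason the observed stage 2 share can exceed the true one.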

Predicted observed prevalence
The predicted numbers of cases observed include true and false positives, with some stage misclassification:
$$S_{1,obs,pred} = S_{1,TP,pred} - S_{1,TP,mis,pred} + S_{2,TP,mis,pred} + S_{1,FP,pred} \tag{14}$$

Table 2 Steps in the implementation of the model
Fitting procedure
Step 1: Predictions fitted against observed β c for the same coverage strata. Observed β c estimated based on a statistical model of field data.
Step 2: Predictions fitted against actual observed cases in screening session (S 1,obs and S 2,obs ). S 1 and S 2 candidate sets resulting in best-fitting S 1,pred and S 2,pred adopted as maximum likelihood estimates of true prevalence. Joint likelihood distribution informs confidence intervals.
Step 1: Estimation of the relative probability of attending screening (ρ)

Estimation of observed to true prevalence ratio (β c ) based on field data
We estimated the actual β c within each MSF project and for four screening coverage strata (5-24%, 25-44%, 45-64% and 65-84%), compared to coverage 85-115% as the reference stratum (while this reference stratum should theoretically consist only of screening sessions with c = 100%, in practice very few screening sessions achieved exactly this coverage, and we therefore adopted a wider range assuming that it was practically equivalent to 100%). We estimated β c based on screening data and a statistical model of the association between screening coverage and observed prevalence.
As observed prevalence distributions featured an excess of zeroes and were over-dispersed, a hurdle model [18,19] was used to estimate β c , consisting of (i) a first complementary log-log binomial component that models the probability of a non-zero prevalence, and (ii) a second negative binomial component (offset by the natural log of the number of people screened) that models the probability of a given discrete number of cases, conditional on prevalence being non-zero (i.e. on the first "hurdle" of zero having been crossed). This model provided a good fit to the data (results not shown).
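For readers unfamiliar with hurdle models, a generic two-part formulation is sketched below. It is a textbook form that assumes a zero-truncated negative binomial for the count part; the symbols Y_i (cases observed in session i), n_i (people screened), x_i (covariates), γ and δ (coefficient vectors), π_i, μ_i and α are introduced here for illustration and are not taken from the paper.

```latex
\begin{aligned}
\Pr(Y_i > 0) &= \pi_i, \qquad \log\{-\log(1 - \pi_i)\} = x_i^{\top}\gamma \\
\Pr(Y_i = y \mid Y_i > 0) &= \frac{f_{NB}(y;\, \mu_i, \alpha)}{1 - f_{NB}(0;\, \mu_i, \alpha)}, \qquad y = 1, 2, \dots \\
\log \mu_i &= \log n_i + x_i^{\top}\delta
\end{aligned}
```

The first line is the complementary log-log component for a non-zero prevalence; the second and third lines are the count component, with the log of the number screened entering as an offset.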
In addition to screening coverage, all potential confounding variables available from the data (screening round [first versus subsequent], village population size, observed incidence rate in the six months before the mass screening, and project) were included in the hurdle model. Coefficient standard errors were adjusted for clustering due to repeated screening sessions within individual villages (to do this, "village" was set as the cluster variable).
So as to verify whether ρ differs in stage 2 versus stage 1 cases, we also stratified the hurdle model by stage, and modelled the association between screening coverage and the proportion of stage 2 diagnoses using an alternative group logit regression. Both these analyses (omitted for brevity) suggested no significant difference in β c according to stage; we thus assumed that ρ is equal for stage 1 and stage 2.

Estimation of ρ for each MSF project
We implemented the stochastic model described above to predict β c for various coverage values and for each MSF project, as a function of different values of ρ. For each candidate value of ρ in a large plausible range, we examined the distribution of β c generated from 10 000 runs of the stochastic model, and adopted the value of ρ whose predicted β c values best fitted those estimated for each site from the available data, i.e. from the hurdle model. The best-fitting value of ρ was selected by minimizing the squared deviation of the predicted β c from the actual β c , with actual values sampled from the uncertainty distribution provided by the coefficients of the hurdle model (Table 1). The model was run using the diagnostic accuracy parameters specific to each project, sampled from their uncertainty distributions as computed in prior work, and input values of N = 10 000, S 1 = Uniform [1-50] and S 2 = Uniform [1-50] (the results were insensitive to input values of N, S 1 and S 2 ). The coverage values at which we predicted β c were also randomly selected from the distribution of screening session coverage values falling within each of the above coverage strata (5-24%, 25-44%, 45-64% and 65-84%).
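The least-squares selection of ρ can be illustrated with a simple grid search. To keep the sketch short and deterministic, the stochastic model is replaced here by a closed-form stand-in, β c ≈ [1 - (1 - c)^ρ] / c, which assumes low prevalence and perfect diagnosis; this substitution, the function names and the target values are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def beta_approx(c, rho):
    """Approximate beta_c for coverage c (0 < c <= 1) and attendance ratio rho.

    Stand-in for the stochastic model: assumes low prevalence and perfect
    diagnosis, so the share of cases screened is about 1 - (1 - c)**rho.
    """
    c = np.asarray(c, dtype=float)
    return (1.0 - (1.0 - c) ** rho) / c

def fit_rho(coverages, target_betas, candidates):
    """Return the candidate rho minimising the summed squared deviation."""
    target = np.asarray(target_betas, dtype=float)
    sse = [np.sum((beta_approx(coverages, r) - target) ** 2) for r in candidates]
    return float(candidates[int(np.argmin(sse))])

# Hypothetical stratum midpoints and beta_c targets, for illustration only.
coverages = [0.15, 0.35, 0.55, 0.75]
targets = [1.85, 1.65, 1.45, 1.25]
candidates = np.linspace(0.5, 10.0, 96)  # grid with a step of 0.1
best_rho = fit_rho(coverages, targets, candidates)
```

In the procedure described above, the predicted β c would instead come from repeated runs of the stochastic model, and the targets would be resampled from the hurdle-model uncertainty distribution to propagate uncertainty into ρ.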
Step 2: Estimation of the number of true prevalent cases
We next inputted into the model, for each screening session, the project-specific ρ estimates derived above, sampled from their uncertainty distribution; the actual values of N, c and diagnostic accuracy specific to the session; and candidate sets of S 1 and S 2 values (from 0 to N). For each screening session, we evaluated each set of S 1 and S 2 values over 10 000 iterations, by computing how frequently the set of values yielded perfect predictions of observed prevalence, i.e.
$$S_{1,obs,pred} = S_{1,obs,data} \;\text{ AND }\; S_{2,obs,pred} = S_{2,obs,data}$$
For each iteration that yielded a perfect fit, we also recorded the predicted true cases detected S 1,TP,pred and S 2,TP,pred if they did not exceed the total cases observed,
$$S_{1,TP,pred} \le S_{1,obs,pred} \;\text{ AND }\; S_{2,TP,pred} \le S_{2,obs,pred}$$
and those among these that were classified in the correct stage (S 1,TP,pred - S 1,TP,mis,pred and S 2,TP,pred - S 2,TP,mis,pred ). The set of S 1 and S 2 most frequently fitting the data was adopted as the best estimate for that screening session. 95% confidence intervals were computed by the method of profiles applied to a two-dimensional joint likelihood distribution [20].
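The exact-match fitting idea can be sketched as a small simulation. This is an illustrative reduction, not the authors' code: the function names are hypothetical, attendance is drawn with an exponential race (equivalent to sequential weighted sampling without replacement), all false positives are assigned to stage 2 (ω = 0), the candidate grid is capped at a small `max_cases` rather than running from 0 to N, and all parameter values are invented.

```python
import numpy as np

def simulate_observed(n_pop, s1, s2, c, rho, sens, spec, pc1, pc2, rng):
    """One stochastic draw of observed (stage 1, stage 2) counts.

    Each person gets an exponential waiting time with rate rho (cases) or
    1 (non-cases); the first round(c * N) people attend the session.
    """
    n_scr = round(min(c, 1.0) * n_pop)
    rates = np.ones(n_pop)
    rates[: s1 + s2] = rho                      # cases are listed first
    order = np.argsort(rng.exponential(1.0 / rates))[:n_scr]
    scr1 = int(np.sum(order < s1))
    scr2 = int(np.sum((order >= s1) & (order < s1 + s2)))
    tp1, tp2 = rng.binomial(scr1, sens), rng.binomial(scr2, sens)
    mis1 = rng.binomial(tp1, 1 - pc1)           # stage 1 labelled stage 2
    mis2 = rng.binomial(tp2, 1 - pc2)           # stage 2 labelled stage 1
    fp2 = rng.binomial(n_scr - scr1 - scr2, 1 - spec)
    return tp1 - mis1 + mis2, tp2 - mis2 + mis1 + fp2

def match_frequencies(obs1, obs2, max_cases, n_iter, rng, **model):
    """Share of iterations reproducing (obs1, obs2) for each candidate set."""
    freq = np.zeros((max_cases + 1, max_cases + 1))
    for s1 in range(max_cases + 1):
        for s2 in range(max_cases + 1):
            hits = sum(
                simulate_observed(s1=s1, s2=s2, rng=rng, **model) == (obs1, obs2)
                for _ in range(n_iter)
            )
            freq[s1, s2] = hits / n_iter
    return freq

# Hypothetical session: 300 people, 60% coverage, 1 stage 1 and 2 stage 2 observed.
rng = np.random.default_rng(42)
model = dict(n_pop=300, c=0.6, rho=2.0, sens=0.9, spec=1.0, pc1=0.8, pc2=0.95)
freq = match_frequencies(obs1=1, obs2=2, max_cases=6, n_iter=200, rng=rng, **model)
best_s1, best_s2 = np.unravel_index(np.argmax(freq), freq.shape)
```

The matrix `freq` plays the role of the joint likelihood surface over candidate (S 1 , S 2 ) sets: its maximum gives the best estimate, and its shape is what a profile-based confidence interval would be read from.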
Best estimates and uncertainty bounds for each project as a whole were computed by two alternative analysis approaches: (i) summing the best-fitting values of S 1 and S 2 or S 1,TP and S 2,TP for each screening session over the project as a whole (no uncertainty bounds could be computed for this approach); and (ii) a bootstrapping routine, whereby we repeatedly sampled from the joint likelihood distributions of S 1 and S 2 or S 1,TP and S 2,TP for each screening session, totalled the randomly sampled values over all sessions in the project, and computed the median and 95% percentile interval of the resulting distribution of random project totals.
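Approach (ii) can be sketched as follows. For brevity the sketch samples a single quantity per session (for example the total number of true cases) from a one-dimensional likelihood, whereas the text describes sampling from joint distributions; the function name, arguments and example numbers are hypothetical.

```python
import numpy as np

def bootstrap_project_total(session_values, session_probs, n_boot=5000, rng=None):
    """Median and 95% percentile interval of the project-wide total.

    session_values[i]: candidate case counts for session i.
    session_probs[i]: their normalised likelihoods (summing to 1).
    """
    rng = rng or np.random.default_rng()
    totals = np.zeros(n_boot)
    for values, probs in zip(session_values, session_probs):
        # One random draw per bootstrap replicate for this session.
        totals += rng.choice(values, size=n_boot, p=probs)
    low, med, high = np.percentile(totals, [2.5, 50, 97.5])
    return med, (low, high)

# Two hypothetical sessions with made-up likelihood distributions.
values = [[0, 1, 2], [3, 4, 5, 6]]
probs = [[0.6, 0.3, 0.1], [0.1, 0.5, 0.3, 0.1]]
median, (lower, upper) = bootstrap_project_total(
    values, probs, rng=np.random.default_rng(7)
)
```

Because each session's distribution is skewed, the median of the summed draws need not equal the sum of the session modes used in approach (i), which is consistent with the two approaches giving different project totals.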
S TP /S is the detected fraction. We could not find a straightforward way to compute uncertainty bounds around this estimate, as it includes error from several sources arising from different statistical processes. However, we present alternative best estimates of detected fraction using either of the above estimation approaches.

Description of mass screening data

Screening output
Altogether, 819 mass screening sessions took place in the three projects over the periods covered by the datasets used in this study. However, population data were missing for 203 sessions; 10 yielded zero prevalence in villages that also reported no cases throughout the project duration; and two had a coverage <5% and were assumed to be data entry errors. This left 604 sessions for the present analysis, performed in 246 villages (Table 3).
Screening coverage was highest in Kiri, where about half of screening sessions reported a coverage > 100%, suggesting people from neighbouring communities may have attended (Table 3).

Exploration of factors associated with observed prevalence
A hurdle model of factors associated with observed prevalence combining data from all projects (Table 4) suggested weak evidence of a trend in the association between screening coverage and occurrence of non-zero prevalence (log-log component): sessions with coverage <15% were about one third as likely to yield any HAT cases as sessions with coverage around 100%. The probability of non-zero prevalence also increased with village population size and previous observed incidence rate, but was lower in repeat screening rounds.
Among screenings that yielded non-zero prevalence (negative binomial component), there was also evidence of a trend in the association of screening coverage and prevalence, with β c increasing as a function of decreasing coverage, as hypothesized. Prevalence increased with previous incidence, but repeat screening rounds were associated with lower prevalence. Unlike in the log-log component, prevalence decreased with increasing population size (see Discussion). There was no evidence of interactions in either model component (data not shown).
Estimates of the detected fraction

Estimated relative risk ρ of attending screening
Table 5 shows adjusted estimates of β c based on a hurdle model of field data for each project, used in further steps of the analysis to estimate ρ. The best estimates of ρ were 1.6 (95%CI 0.7-12.8) for Kiri, 2.5 (1.2-36.6) for Adjumani and 1.9 (0.9-4.0) for Arua-Yumbe, suggesting a consistent pattern across sites. These ρ estimates yielded β c values that provided a good fit to the β c values estimated from field data (Figure 2).

Estimated true prevalence and detected fraction
The estimated true prevalence using the best-fitting estimates from each session (approach i) was very similar to that observed (Table 6). True prevalence using bootstrapping estimates (approach ii) was almost equal to that observed in Adjumani and Arua-Yumbe, but was about double the observed in Kiri, though still below 1% in absolute terms. The proportion of stage 1 cases was estimated to be higher in reality than that observed, as expected given the adjustment for stage misclassification and the fact that most false positives would have been diagnosed as stage 2 (Table 6); observed stage-specific prevalence differed from the true prevalence accordingly.

Discussion
This study outlines a potential method to estimate the extent of under-detection and the true infection burden of gambiense HAT, based only on observed data. Because of the extent of uncertainty as regards model parameters, estimates of detected fraction are quite imprecise, but suggest that between 20-50% of prevalent cases were not detected in the screening sessions analysed. There appears to be no appreciable difference between observed and true prevalence. However, adjustment for incomplete specificity and stage misclassification suggests a higher ratio of stage 1 to stage 2 than that observed by programmes.

Table 5 Adjusted estimates of β c (ratio of observed prevalence at coverage c to observed prevalence at coverage = 100%) for each project, by screening coverage stratum

Figure 2 Predicted versus observed β c (ratio of observed prevalence at coverage c to observed prevalence at coverage = 100%) values, by project, using the best estimate of ρ (relative probability of attending screening among cases versus non-cases). Vertical bars indicate 95% confidence intervals.

Interpretation of findings

Internal validity of findings
The hurdle model is internally consistent: with the exception of population size (see below), associations of explanatory variables and prevalence in the log-log component are mirrored in the negative-binomial component. Furthermore, the log-log component supports the hypothesis of ρ > 1. If ρ = 1, the probability of a village featuring a non-zero observed prevalence should be linearly proportional to screening coverage. However, this probability is higher than expected based on coverage alone, consistent with self-selection of cases even at low coverage.
While increasing village population size was associated with a higher probability of non-zero prevalence, prevalence among non-zero screenings appeared to decrease with higher population. This apparently inconsistent finding may be explained as follows: (i) in fact, the probability of non-zero prevalence increases less than proportionately with increasing population size, meaning that, on a per capita basis, it is lower in large villages than small ones; (ii) in smaller communities, there may be a greater risk of chance extinction of transmission, and thus a greater frequency of zero prevalence, all else being equal; (iii) if cases are present in a small village, their very small number, not divisible below discrete units, affects the prevalence calculation (e.g. if two villages A and B both have one prevalent case, but A's population is 100 and B's 1000, the prevalence will be ten times higher in A); (iv) larger communities are usually administrative and economic centres, and attract infected migrants from rural areas; (v) village population size may not reflect the actual denominator at risk: it is likely that only a fraction of the population has a livelihood-dependent exposure to tsetse [21,22], and that this fraction may be smaller in larger, less rural communities where many people are engaged in trade or services: in other words, when considering the true population at risk, denominators might be more comparable across differently sized villages than it appears.

Under-detection
Overall, this study estimates that about 20-50% of prevalent cases potentially detectable fell through the net of active case detection, and that about a fourth of cases detected were not classified in the correct stage (however, most misclassification would be from stage 1 to stage 2, which would still guarantee effective treatment). Our model did not incorporate the final step of treatment, as our question concerned case detection specifically; furthermore, the MSF projects used a variety of regimens, including second-line regimens for patients with treatment failure. In national programmes without strong funding and technical support, screening coverage could be lower, and our findings thus reflect an optimistic scenario. In the Democratic Republic of Congo (DRC), the estimated detected fraction (including treatment) was <50% in most scenarios, and between 30% and 65% attended and were correctly diagnosed [23]. Screening coverage was 22-98% in other DRC sites (average 70-80%) [7,23], 47-93% in Equatorial Guinea [24], and 70-94% in Angola [25].
In the colonial era, HAT active case detection was successful due to largely coercive measures. Few recent studies discussing the barriers to and facilitators of screening attendance have been published. In the Republic of Congo, villagers reported that biomedical medicine was the main remedy against HAT, and did not trust traditional remedies [26]. In the DRC, communities' knowledge of HAT and its control was very good, but concern with drug toxicity and the stigma of public HAT diagnosis were prominent barriers [27]. Both studies found that cost of treatment was a barrier to service uptake; while MSF projects offered free testing and treatment, patients and families face transport costs, income lost, etc. In both Congo [28] and DRC, stage 2 HAT was often associated with sorcery, especially when the case was fatal: however, there was no evidence that this kept patients from seeking care. In the Ugandan sites we analysed, traditional healers were often a recourse, and working with these providers and communities was suggested as a way to improve screening attendance [29].

Other findings
In communities where a non-zero incidence was observed in the six months prior to the mass screening, there was a doubled probability of finding at least one case during active screening. Furthermore, past incidence was associated with observed prevalence.
There was no evidence that cases in stage 2 have a greater probability of attending mass screening than those in stage 1. This observation is somewhat unexpected: stage 2 cases, being more symptomatic, might be expected to have a greater probability of attending screenings. This finding, however, may not apply to passive case detection. Furthermore, early stage 2 cases may in fact be less prone to present with systemic symptoms like fever, pruritus or arthralgia than stage 1 cases [30].

Programmatic implications
While the uncertainty around the estimates of detected fraction (see below) hampers meaningful interpretation, it is clear that a considerable proportion of HAT infections remain undetected even in a well-resourced active case finding context. These cases would then go on to seed renewed epidemics once mass screening is scaled down, and, where no passive case detection is available, would probably die. Long-term control of HAT through mass screening thus probably requires very high screening coverage, underscoring the need for programmes to work closely with communities to ensure high acceptance and uptake and identify and address barriers to screening attendance. This is also justifiable from an economic standpoint, given that the costs of mass screening are mainly fixed (e.g. transport, human resources, information campaigns, programme overheads) rather than variable (i.e. per person screened).
This study suggests that, for purposes of assessing HAT burden and monitoring trends, calculating the observed prevalence based on detected cases and the number of people screened provides a reasonable approximation to the true prevalence. Furthermore, programmes should continue to use the observed incidence in different communities (as computed based on passive case detection, where available) as a guide for deciding where to focus mass screening efforts.

Study limitations
Estimates were subject to considerable uncertainty, which hampers interpretation of the key findings on detected fraction. The striking differences according to analysis approach are due to the very skewed likelihood distributions arising from the fitting procedures (data not shown): reporting the mode (best-fitting values) or median of these distributions changes the inference considerably. For completeness, we have chosen to report both, and suggest that reality lies somewhere in between. Furthermore, the model does not adequately deal with screening sessions featuring zero observed prevalence (37% of sessions analysed). If screening coverage is < 100%, various possible sets of S 1 and S 2 values could result in S 1,obs = 0 and S 2,obs = 0; in most scenarios, however, the set [S 1 = 0, S 2 = 0], i.e. zero prevalence, will by default yield the best fits and will thus be adopted as the best estimate, potentially resulting in a systematic underestimation of true prevalence in very low transmission villages (and overestimation of the detected fraction) if analysis approach i is used. Approach ii is less affected by this bias.
For screening sessions with coverage > 100%, the model relies on an assumption that the entire population of the village was screened, and that any other persons screened come from neighbouring villages. While this occurred rarely in Adjumani and Arua-Yumbe, in Kiri about half of screening sessions attracted a population greater than that of the village; results for Kiri should thus be considered somewhat less robust.
The association between coverage and observed prevalence was adjusted for all available confounding variables, but these (screening round, population size, project, past incidence) were few, and additional hidden confounding may be present: villages with low coverage may differ systematically from high-coverage ones in other key determinants of prevalence, such as exposure to vectors; low coverage might also be a proxy for remoteness and low security, which could be associated with higher prevalence.