Prevalence and under-detection of gambiense human African trypanosomiasis during mass screening sessions in Uganda and Sudan

Francesco Checchi, Andrew P Cox, François Chappuis, Gerardo Priotto, Daniel Chandramohan and Daniel T Haydon

Parasites & Vectors 2012, 5:157

              DOI: 10.1186/1756-3305-5-157

              Received: 1 August 2011

              Accepted: 1 August 2012

              Published: 7 August 2012



              Active case detection through mass community screening is a major control strategy against human African trypanosomiasis (HAT, sleeping sickness) caused by T. brucei gambiense. However, its impact can be limited by incomplete attendance at screening sessions (screening coverage) and diagnostic inaccuracy.


              We developed a model-based approach to estimate the true prevalence and the fraction of cases detected during mass screening, based on observed prevalence, and adjusting for incomplete screening coverage and inaccuracy of diagnostic algorithms for screening, confirmation and HAT stage classification. We applied the model to data from three Médecins Sans Frontières projects in Uganda (Adjumani, Arua-Yumbe) and Southern Sudan (Kiri).


We analysed 604 screening sessions, targeting about 710 000 people. Cases were about twice as likely to attend screening as non-cases, with no apparent difference by stage. Past incidence, population size and repeat screening rounds were strongly associated with observed prevalence. The estimated true prevalence was 0.46% to 0.90% in Kiri depending on the analysis approach, compared to an observed prevalence of 0.45%; 0.59% to 0.87% in Adjumani, compared to 0.92%; and 0.18% to 0.24% in Arua-Yumbe, compared to 0.21%. The true ratio of stage 1 to stage 2 cases was about two to three times higher than that observed, due to stage misclassification. The estimated detected fraction was between 42.2% and 84.0% in Kiri, 52.5% to 79.9% in Adjumani and 59.3% to 88.0% in Arua-Yumbe.


              In these well-resourced projects, a moderate to high fraction of cases appeared to be detected through mass screening. True prevalence differed little from observed prevalence for monitoring purposes. We discuss some limitations to our model that illustrate several difficulties of estimating the unseen burden of neglected tropical diseases.


Trypanosomiasis, Gambiense, Sleeping sickness, Case detection, Screening, Coverage, Prevalence, Uganda, Sudan, Mathematical model


              Human African trypanosomiasis (HAT, sleeping sickness) due to Trypanosoma brucei gambiense is a neglected, tsetse-fly borne parasitic disease that affects mainly remote and crisis-affected populations of sub-Saharan Africa [1]. Disease begins in a mildly symptomatic, haemo-lymphatic stage (stage 1) and within about 1–2 years progresses to the meningo-encephalitic stage 2, which is fatal unless treated and can leave sequelae [2, 3].

              Active case detection has been a mainstay intervention to control HAT since the 1920s [4]. It consists of cross-sectional mass screenings, whereby entire communities (usually villages or urban neighbourhoods) are targeted for testing. The screening test is usually the Card Agglutination Test for Trypanosomiasis (CATT), though palpation of lymph nodes in the neck is also often performed (enlarged lymph nodes are a prominent sign of HAT). The confirmation and staging components of the complex diagnostic algorithm [5] are carried out either on site or at a fixed HAT treatment centre, depending on proximity and ease of patient transport. Staging and treatment are often done at the treatment centre, but stage 1 cases are increasingly treated at the community level.

Active case detection prevents disease progression to stage 2 through early treatment irrespective of symptoms; reduces mortality of stage 2 cases; decreases transmission intensity by reducing the infectious pool (humans are thought to be the main ecological reservoir [1]); creates community awareness; and generates an estimate of infection prevalence, the key indicator of HAT burden. Mass screening is empirically associated with reduction in transmission in various settings [6–8], and its decline in the post-colonial era is heavily implicated in the resurgence of HAT in the 1980s and 1990s [9–11].

Active case detection may be indispensable for HAT elimination [6, 12]. However, attendance at screening sessions is often low, and diagnostic sensitivity is imperfect [13], limiting its impact. Furthermore, false positives due to imperfect specificity confound prevalence estimates. Here, we use modelling to estimate the fraction of cases detected during mass screening (henceforth referred to as the detected fraction) and the true infection prevalence based on data from three Médecins Sans Frontières (MSF) projects in Uganda and Southern Sudan. Estimates of the detected fraction and true prevalence are critical for evaluating the true impact of control programmes and measuring the unseen burden of this neglected tropical disease.


              Data sources

We assembled aggregate data from screening sessions conducted in the Kiri (Kajo-Keji county, Southern Sudan), Adjumani and Arua-Yumbe (north-west Uganda) MSF projects, previously described [14–17]. Data include village population size (estimated through census by home visitors), numbers screened and cases detected by stage. We excluded sessions that yielded zero prevalence in villages where no cases were detected throughout the project duration. The study was approved by the Ethics Committee of the London School of Hygiene and Tropical Medicine.

              Conceptual framework

              Model states and parameters are listed in Table 1. Let screening coverage c be the number of people screened divided by the total village population N; detected fraction the number of truly positive stage 1 or stage 2 cases detected (S1,TP, S2,TP) out of all cases prevalent (S1, S2); and observed prevalence the number of cases diagnosed (including false positives) in either stage (S1,TP + S1,FP, S2,TP + S2,FP), divided by the number of people screened (cN).

Table 1

Model parameters

| Parameter | Symbol | Value | Source and notes |
|---|---|---|---|
| Village population size | N | | |
| Screening coverage (%) | c | | |
| Relative probability of attending screening (cases versus non-cases) | ρ | Estimate (95% percentiles): Kiri 1.6 (0.7-12.8); Adjumani 2.5 (1.2-36.6); Arua-Yumbe 1.9 (0.9-4.0) | Prediction of step 1 of model. Random values for each iteration sampled from squared deviance distributions of ρ estimates. |
| Probability that the next person screened is S1 or S2 | pS1, pS2 | from 0 to 1 | Updated after each ith person screened. See Equations 4 and 5. |
| Ratio of observed prevalence at coverage c to observed prevalence at coverage = 100% | βc | Computed for various values of c, and for each MSF project as a whole. | Data and model predictions. See Equation 1 and text. |
| Diagnostic accuracy | | Mode (range) | Random values sampled from the likelihood distributions generated by Checchi et al. [13] based on a probabilistic decision model (one random value generated for each iteration). Values for the new Kiri algorithm apply to all screenings conducted since March 2005 (n = 17). |
| Diagnostic sensitivity in stage 1 (%) | σ1 | Kiri (old) 98.0 (83.1-99.5); Kiri (new) 57.4 (41.2-78.2); Adjumani 97.9 (74.1-99.2); Arua-Yumbe 96.5 (74.6-98.8) | |
| Diagnostic sensitivity in stage 2 (%) | σ2 | Kiri (old) 98.0 (83.5-99.6); Kiri (new) 67.5 (53.6-84.0); Adjumani 97.5 (75.1-99.4); Arua-Yumbe 97.7 (75.0-99.3) | |
| Diagnostic specificity (%) | φ | Kiri (old) 100.0 (99.8-100.0); Kiri (new) 100.0 (99.95-100.0); Adjumani 100.0 (99.8-100.0); Arua-Yumbe 100.0 (99.8-100.0) | |
| Probability of being correctly classified into stage 1 (%) | σ1* | Kiri (old) 67.7 (38.5-86.8); Kiri (new) 66.0 (39.0-87.2); Adjumani 70.4 (39.1-88.6); Arua-Yumbe 66.1 (39.2-88.5) | |
| Probability of being correctly classified into stage 2 (%) | σ2* | Kiri (old) 94.7 (82.1-98.6); Kiri (new) 95.1 (81.4-98.4); Adjumani 94.0 (78.7-98.2); Arua-Yumbe 93.1 (78.7-98.2) | |
| Probability that a false positive case will be classified into stage 1 (%) | ω | | Based on the algorithms used in these projects, false positives can only be classified as stage 2 [13]. |
| Binary dummy variables | δ | 0 or 1 | Denote occurrence of event in a given individual. |


              We hypothesized that the relative probability ρ of attending screening during a session is higher for cases than for non-cases. Accordingly, as screening coverage decreases, the selection bias favouring cases should increase, yielding a higher observed prevalence at coverage c (for c < 1), compared to the prevalence measurable if c = 1. We can thus define a coverage-dependent ratio of observed prevalence for any screening coverage < 1, compared to observed prevalence when everyone is screened:
$$\beta_c = \frac{\left(\dfrac{S_{obs,c}}{cN}\right)_{c<1}}{\left(\dfrac{S_{obs,c}}{cN}\right)_{c=1}} \qquad (1)$$

              Under this hypothesis, βc should increase exponentially as screening coverage decreases.
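
As a purely illustrative sketch of the ratio βc, the snippet below computes it from aggregate counts. The function name `beta_c` and all numbers are hypothetical and are not taken from the study data.

```python
def beta_c(cases_c, screened_c, cases_ref, screened_ref):
    """Ratio of observed prevalence at coverage c to that at full coverage."""
    if screened_c <= 0 or screened_ref <= 0 or cases_ref <= 0:
        raise ValueError("counts must be positive")
    prevalence_c = cases_c / screened_c          # observed prevalence, c < 1
    prevalence_ref = cases_ref / screened_ref    # observed prevalence, c = 1
    return prevalence_c / prevalence_ref

# Hypothetical counts: 6 cases among 400 people screened at low coverage,
# versus 10 cases among 1000 people screened at complete coverage.
ratio = beta_c(6, 400, 10, 1000)
print(round(ratio, 2))  # 1.5
```

A value above 1, as in this made-up example, is what the hypothesis predicts when cases are more likely to attend than non-cases.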

              In addition, observed prevalence is biased upward by false positive tests (incomplete diagnostic specificity), and downward by false negatives (incomplete sensitivity), while the number of stage 1 and stage 2 cases is biased by stage misclassification (Figure 1).
              Figure 1

              Illustration of the relationship between true and observed prevalence during mass screening.

              In this paper we develop a static, stochastic mathematical model to predict the relationship between observed prevalence and true prevalence given a specific relative probability ρ of attending a screening session among cases compared to non-cases, which is a parameter we can estimate from field data. This model enabled us to estimate true prevalence and therefore the detected fraction. The different steps in the implementation of the model are outlined in Table 2, and described below.

Table 2

Steps in the implementation of the model

| | Step 1 | Step 2 |
|---|---|---|
| | Estimate ρ (relative probability of attending screening among cases versus non-cases) | Estimate the true prevalence and the detected fraction |
| Geographical resolution | Each MSF project | Each screening session (results then totalled over each project) |
| Model inputs | Project-specific diagnostic accuracy parameters | Diagnostic accuracy parameters |
| | N = 10 000, S1 = Uniform [1–50] and S2 = Uniform [1–50] (hypothetical values) | Observed N, c, S1,obs and S2,obs for the screening session |
| | Observed βc (ratio of observed prevalence at coverage c to observed prevalence at coverage = 100%) for four coverage strata (5-24%, 25-44%, 45-64% and 65-84%) | ρ values estimated in Step 1 for each MSF project, sampled from their deviance distribution |
| | Observed c values sampled from within each coverage stratum and for each project | Various candidate sets of S1 and S2 (true prevalent cases) |
| | Various candidate ρ values | |
| Model predicted outputs | βc for the same coverage strata (5-24%, 25-44%, 45-64% and 65-84%) | Number of observed cases (S1,pred and S2,pred) |
| | | Number of true positive cases among those observed (S1,TP,pred and S2,TP,pred) |
| Number of iterations | 10 000 for each project and for each candidate ρ value | 10 000 for each screening session and for each candidate set of S1 and S2 |
| Fitting procedure | Predictions fitted against observed βc for the same coverage strata. | Predictions fitted against actual observed cases in screening session (S1,obs and S2,obs). |
| | Observed βc estimated based on a statistical model of field data. | S1 and S2 candidate sets resulting in best-fitting S1,pred and S2,pred adopted as maximum likelihood estimates of true prevalence. Joint likelihood distribution informs confidence intervals. |
| | Candidate ρ value resulting in best-fitting βc adopted as point estimate of ρ. Confidence interval based on squared deviance distribution. | |



              Description of the mathematical model

              The model predicts the number of stage 1 and stage 2 observed cases (S1,obs and S2,obs) and the true cases among these (S1,TP and S2,TP), based on a set of input parameters, including village population N, true number of prevalent cases S1 and S2, screening coverage c, relative risk of attending screening among cases versus non-cases ρ, and accuracy (sensitivity, specificity, probabilities of correct stage 1 and 2 classification) of the diagnostic algorithm, as estimated in previous work [13].

              Because the number of prevalent cases in a village is often very small and in order to incorporate uncertainty in several parameters, the model was implemented stochastically. Accordingly, individuals in the population have a given probability of experiencing certain events (e.g. attending screening, being detected if positive); chance determines whether the event occurs. The stochastic variation is then examined over a large number of iterations of the model: best estimates and confidence intervals are generated from the distribution of predicted values. Furthermore, during each iteration fresh random values of certain parameters (e.g. diagnostic accuracy) are drawn from their distributions.

              Cases and non-cases screened

              The model firstly predicts the number of cases and non-cases screened. If coverage = 1, everyone is screened. If coverage < 1, the situation is akin to sampling without replacement, with sample size = people screened (cN). The probabilities that the ith person screened will be a stage 1 case, stage 2 case or non-case are the product of ρ and the relative proportions of each type of patient in the remaining unscreened population, which change and thus must be updated after each person is screened. Accordingly, the number of cases predicted to be screened over the entire screening session is computed as follows:
$$S_{1,sc,pred} = \sum_{i=1}^{cN} \delta_{1,sc,i}, \quad \text{where } \delta_{1,sc,i} = \begin{cases} 1, & \mathrm{Uniform}(0,1) \le p_{S1,i} \\ 0, & \mathrm{Uniform}(0,1) > p_{S1,i} \end{cases} \qquad (2)$$
$$S_{2,sc,pred} = \sum_{i=1}^{cN} \delta_{2,sc,i}, \quad \text{where } \delta_{2,sc,i} = \begin{cases} 1, & \mathrm{Uniform}(0,1) \le p_{S2,i} \\ 0, & \mathrm{Uniform}(0,1) > p_{S2,i} \end{cases} \qquad (3)$$
              In the above equations, random numbers between 0 and 1 are sampled from a uniform distribution to determine whether an event occurs. The probabilities that the next person screened is a stage 1 or stage 2 case are, respectively:
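$$p_{S1,i} = \frac{\rho\left(S_1 - \sum_{j=1}^{i-1}\delta_{1,sc,j}\right)}{\rho\left(S_1 - \sum_{j=1}^{i-1}\delta_{1,sc,j}\right) + \rho\left(S_2 - \sum_{j=1}^{i-1}\delta_{2,sc,j}\right) + N - \left(i - 1 - \sum_{j=1}^{i-1}\delta_{1,sc,j} - \sum_{j=1}^{i-1}\delta_{2,sc,j}\right)} \qquad (4)$$
$$p_{S2,i} = \frac{\rho\left(S_2 - \sum_{j=1}^{i-1}\delta_{2,sc,j}\right)}{\rho\left(S_1 - \sum_{j=1}^{i-1}\delta_{1,sc,j}\right) + \rho\left(S_2 - \sum_{j=1}^{i-1}\delta_{2,sc,j}\right) + N - \left(i - 1 - \sum_{j=1}^{i-1}\delta_{1,sc,j} - \sum_{j=1}^{i-1}\delta_{2,sc,j}\right)} \qquad (5)$$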
              The number of predicted non-cases screened is the total sample cN minus cases screened:
$$H_{sc,pred} = cN - S_{1,sc,pred} - S_{2,sc,pred} \qquad (6)$$
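
The attendance step can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `simulate_screening` is hypothetical, each attendee is assigned to one of three mutually exclusive groups with a single uniform draw, and the number of unscreened non-cases is tracked exactly. All three choices are assumptions of this sketch.

```python
import random

def simulate_screening(N, S1, S2, c, rho, rng):
    """Draw round(c * N) attendees one at a time, without replacement.

    Cases carry sampling weight rho and non-cases weight 1. Returns the
    numbers of stage 1 cases, stage 2 cases and non-cases screened.
    """
    c = min(c, 1.0)                      # coverage above 100% is treated as 100%
    n_screened = round(c * N)
    rem1, rem2, rem_h = S1, S2, N - S1 - S2   # people not yet screened
    sc1 = sc2 = sc_h = 0
    for _ in range(n_screened):
        w1, w2 = rho * rem1, rho * rem2
        u = rng.random() * (w1 + w2 + rem_h)
        if rem1 > 0 and u < w1:
            rem1 -= 1
            sc1 += 1
        elif rem2 > 0 and (u < w1 + w2 or rem_h == 0):
            rem2 -= 1
            sc2 += 1
        elif rem_h > 0:
            rem_h -= 1
            sc_h += 1
        else:                            # only stage 1 cases are left
            rem1 -= 1
            sc1 += 1
    return sc1, sc2, sc_h

# Hypothetical village: 1000 people, 6 stage 1 and 4 stage 2 cases.
rng = random.Random(1)
print(simulate_screening(N=1000, S1=6, S2=4, c=0.5, rho=2.0, rng=rng))
```

At full coverage the function returns every case and every non-case, whatever the value of ρ, which is a convenient sanity check.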

              In cases where c > 1 (as can occur if people from surrounding villages also attend the screening session), we assumed that the entire village population was screened, i.e. c = 1 for the village in question; additional persons screened from outside the village are ignored in the model, as they do not contribute to the prevalence pool (and thus the detected fraction) within the village in question. MSF datasets specify the origin of cases detected and only cases from the village screened were considered in our analysis. However, when computing observed prevalence, all persons screened (including those from outside the village) were considered in the denominator, as MSF data do not contain the origin of persons screened. In both Uganda and Sudan projects, observed prevalence was also calculated in this way.

              True cases detected

              The number of true cases detected among those screened is given by the binomial probability of detection conditional on being screened (diagnostic sensitivity σ), applied to each case screened:
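$$S_{1,TP,pred} = \mathrm{Bin}\left(S_{1,sc,pred},\ \sigma_1\right) \qquad (7)$$
$$S_{2,TP,pred} = \mathrm{Bin}\left(S_{2,sc,pred},\ \sigma_2\right) \qquad (8)$$
However, some detected cases are classified into the wrong stage, where σ1* and σ2* are the probabilities of correct classification into stage 1 and stage 2:
$$S_{1,TP,mis,pred} = \mathrm{Bin}\left(S_{1,TP,pred},\ 1 - \sigma_1^{*}\right) \qquad (9)$$
$$S_{2,TP,mis,pred} = \mathrm{Bin}\left(S_{2,TP,pred},\ 1 - \sigma_2^{*}\right) \qquad (10)$$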

              False positive cases

              Out of non-cases screened, some are classified as false positives due to imperfect specificity:
$$S_{FP,pred} = \mathrm{Bin}\left(H_{sc,pred},\ 1 - \varphi\right) \qquad (11)$$
For completeness, we note that some false positives may be classified as stage 1, based on the relative proportion ω of stage 1 cases among all false positives, which is highly dependent on the diagnostic algorithm being used:
$$S_{1,FP,pred} = \mathrm{Bin}\left(S_{FP,pred},\ \omega\right) \qquad (12)$$
All other false positives are classified as stage 2:
$$S_{2,FP,pred} = S_{FP,pred} - S_{1,FP,pred} \qquad (13)$$

              In practice, ω was estimated at zero in the MSF projects we analysed [13].

              Predicted observed prevalence

              The predicted numbers of cases observed include true and false positives, with some stage misclassification:
$$S_{1,obs,pred} = S_{1,TP,pred} - S_{1,TP,mis,pred} + S_{2,TP,mis,pred} + S_{1,FP,pred} \qquad (14)$$
$$S_{2,obs,pred} = S_{2,TP,pred} - S_{2,TP,mis,pred} + S_{1,TP,mis,pred} + S_{2,FP,pred} \qquad (15)$$
$$S_{obs,pred} = S_{1,obs,pred} + S_{2,obs,pred} \qquad (16)$$
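
The detection, misclassification and false-positive steps above can be chained as in the following sketch. The function and key names are hypothetical, and binomial draws are implemented naively as sums of Bernoulli trials to avoid external dependencies. The usage example plugs in the modes reported in Table 1 for the old Kiri algorithm as fixed values, whereas the paper samples accuracy parameters from their distributions at each iteration.

```python
import random

def binomial(n, p, rng):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def predict_observed(sc1, sc2, sc_h, sens1, sens2, spec, cls1, cls2, omega, rng):
    """Turn numbers screened into predicted observed cases by stage."""
    tp1 = binomial(sc1, sens1, rng)        # true stage 1 cases detected
    tp2 = binomial(sc2, sens2, rng)        # true stage 2 cases detected
    mis1 = binomial(tp1, 1 - cls1, rng)    # stage 1 cases labelled stage 2
    mis2 = binomial(tp2, 1 - cls2, rng)    # stage 2 cases labelled stage 1
    fp = binomial(sc_h, 1 - spec, rng)     # non-cases testing positive
    fp1 = binomial(fp, omega, rng)         # false positives labelled stage 1
    fp2 = fp - fp1
    obs1 = tp1 - mis1 + mis2 + fp1
    obs2 = tp2 - mis2 + mis1 + fp2
    return {"tp1": tp1, "tp2": tp2, "fp": fp, "obs1": obs1, "obs2": obs2}

# Hypothetical numbers screened; accuracy values are the Table 1 modes
# for the old Kiri algorithm, with omega = 0.
rng = random.Random(7)
print(predict_observed(sc1=12, sc2=9, sc_h=800, sens1=0.98, sens2=0.98,
                       spec=1.0, cls1=0.677, cls2=0.947, omega=0.0, rng=rng))
```

Because misclassified cases only move between stages, the total observed always equals true positives plus false positives.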

              The model's predictions can be plugged into Equation 1 so as to predict βc for any screening coverage level, compared to 100% coverage.

              Step 1: Estimation of the relative probability of attending screening (ρ)

              Estimation of observed to true prevalence ratio (βc) based on field data

              We estimated the actual βc within each MSF project and for four screening coverage strata (5-24%, 25-44%, 45-64% and 65-84%), compared to coverage 85-115% as the reference stratum (while this reference stratum should theoretically consist only of screening sessions with c = 100%, in practice very few screening sessions achieved exactly this coverage, and we therefore adopted a wider range assuming that it was practically equivalent to 100%). We estimated βc based on screening data and a statistical model of the association between screening coverage and observed prevalence.

              As observed prevalence distributions featured an excess of zeroes and were over-dispersed, a hurdle model [18, 19] was used to estimate βc, consisting of (i) a first complementary log-log binomial component that models the probability of a non-zero prevalence, and (ii) a second negative binomial component (offset by the natural log of the number of people screened) that models the probability of a given discrete number of cases, conditional on prevalence being non-zero (i.e. on the first “hurdle” of zero having been crossed). This model provided a good fit to the data (results not shown).
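
To make the two-part structure concrete, the sketch below writes out the log-likelihood contribution of a single screening session under a hurdle model of this kind. It is not the authors' code: the function names are hypothetical, the negative binomial is assumed to use the common mean-dispersion form (variance μ + αμ²), and the linear predictors are passed in as plain numbers rather than estimated.

```python
import math

def nb_log_pmf(y, mu, alpha):
    """Log pmf of a negative binomial with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def hurdle_log_lik(y, eta_zero, log_rate, n_screened, alpha):
    """Log-likelihood of y cases observed among n_screened people.

    eta_zero: linear predictor of the complementary log-log component.
    log_rate: linear predictor of the count component (log prevalence).
    """
    if y == 0:
        # cloglog link: P(y > 0) = 1 - exp(-exp(eta)), so log P(y = 0) = -exp(eta)
        return -math.exp(eta_zero)
    p_nonzero = 1.0 - math.exp(-math.exp(eta_zero))
    mu = n_screened * math.exp(log_rate)       # offset: log(n_screened)
    # Condition the count distribution on the hurdle having been crossed.
    log_trunc = math.log(1.0 - math.exp(nb_log_pmf(0, mu, alpha)))
    return math.log(p_nonzero) + nb_log_pmf(y, mu, alpha) - log_trunc

# Hypothetical session: 3 cases among 500 screened.
print(hurdle_log_lik(3, eta_zero=-0.3, log_rate=-5.0, n_screened=500, alpha=0.8))
```

In a full analysis both linear predictors would be functions of coverage stratum and the other covariates, and the summed log-likelihood would be maximised over their coefficients.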

              In addition to screening coverage, all potential confounding variables available from the data (screening round [first versus subsequent], village population size, observed incidence rate in the six months before the mass screening, and project) were included in the hurdle model. Coefficient standard errors were adjusted for clustering due to repeated screening sessions within individual villages (to do this, "village" was set as the cluster variable).

              So as to verify whether ρ differs in stage 2 versus stage 1 cases, we also stratified the hurdle model by stage, and modelled the association between screening coverage and the proportion of stage 2 diagnoses using an alternative group logit regression. Both these analyses (omitted for brevity) suggested no significant difference in βc according to stage; we thus assumed that ρ is equal for stage 1 and stage 2.

              Estimation of ρ for each MSF project

              We implemented the stochastic model described above to predict βc for various coverage values and for each MSF project, as a function of different values of ρ. For each candidate value of ρ in a large plausible range, we examined the distribution of βc generated from 10 000 runs of the stochastic model, and adopted the value of ρ that generated predicted values of βc that best fit those estimated for each site from the available data, i.e. the hurdle model. The value of ρ yielding the best fitting value of βc was selected by minimizing the squared deviation of the predicted βc compared to the actual βc, with actual values sampled from the uncertainty distribution provided by the coefficients of the hurdle model (Table 1). The model was run using the diagnostic accuracy parameters specific to each project, sampled from their uncertainty distributions as computed in prior work, and input values of N = 10 000, S1 = Uniform [1–50] and S2 = Uniform [1–50] (the results were insensitive to input values of N, S1 and S2). The coverage values at which we predicted βc were also randomly selected from the distribution of screening session coverage values falling within each of the above coverage strata (5-24%, 25-44%, 45-64% and 65-84%).
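
The fitting step is a one-dimensional search over candidate ρ values. The sketch below shows the idea but swaps the stochastic model for a closed-form stand-in, βc ≈ (1 − (1 − c)^ρ) / c. That formula is an assumption of this sketch, not part of the paper: it is the large-population limit of weighted sampling without replacement when cases are rare and diagnostic error is ignored. Stratum midpoints are used as coverage values, whereas the paper samples observed coverage values within each stratum. The target values are the first row of Table 5.

```python
def approx_beta(c, rho):
    """Large-population approximation to beta_c (assumption of this sketch)."""
    return (1.0 - (1.0 - c) ** rho) / c

def fit_rho(candidates, coverages, target_betas):
    """Candidate rho minimising the summed squared deviation from the targets."""
    def squared_deviation(rho):
        return sum((approx_beta(c, rho) - b) ** 2
                   for c, b in zip(coverages, target_betas))
    return min(candidates, key=squared_deviation)

candidates = [k / 10 for k in range(5, 101)]     # rho from 0.5 to 10.0
coverages = [0.145, 0.345, 0.545, 0.745]         # midpoints of the four strata
first_row_table5 = [1.64, 1.35, 1.35, 1.22]      # beta_c estimates, Table 5
print(fit_rho(candidates, coverages, first_row_table5))
```

Under this approximation βc tends to ρ as coverage approaches zero and to 1 as coverage approaches 100%, which matches the direction of the selection bias described above.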

              Step 2: Estimation of the number of true prevalent cases

              We next inputted into the model, for each screening session, the project-specific ρ estimates derived above, sampled from their uncertainty distribution; the actual values of N, c and diagnostic accuracy specific to the session; and candidate sets of S1 and S2 values (from 0 to N). For each screening session, we evaluated each set of S1 and S2 values over 10 000 iterations, by computing how frequently the set of values yielded perfect predictions of observed prevalence, i.e.
$$S_{1,obs,pred} = S_{1,obs,data} \ \text{ AND } \ S_{2,obs,pred} = S_{2,obs,data} \qquad (17)$$

For each iteration that yielded a perfect fit, we also recorded the predicted true cases detected S1,TP,pred and S2,TP,pred, provided they did not exceed the total cases observed (S1,TP,pred ≤ S1,obs,pred AND S2,TP,pred ≤ S2,obs,pred), and those among these that were classified in the correct stage (S1,TP,pred – S1,TP,mis,pred and S2,TP,pred – S2,TP,mis,pred). The set of S1 and S2 most frequently fitting the data was adopted as the best estimate for that screening session. 95% confidence intervals were computed by the method of profiles applied to a two-dimensional joint likelihood distribution [20].
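
The candidate search for a single screening session can be sketched as follows. This is a simplified illustration under stated assumptions: function names are hypothetical, ρ and the accuracy parameters are held fixed instead of being resampled at each iteration, ω is set to zero so that all false positives are stage 2, and candidates are limited to a small grid (`max_cases`) rather than running from 0 to N.

```python
import random

def binomial(n, p, rng):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def simulate_observed(N, S1, S2, c, rho, acc, rng):
    """One stochastic run of the screening model; returns (obs1, obs2)."""
    remaining = [S1, S2, N - S1 - S2]          # stage 1, stage 2, non-cases
    screened = [0, 0, 0]
    for _ in range(round(min(c, 1.0) * N)):
        weights = [rho * remaining[0], rho * remaining[1], remaining[2]]
        k = rng.choices((0, 1, 2), weights=weights)[0]
        remaining[k] -= 1
        screened[k] += 1
    tp1 = binomial(screened[0], acc["sens1"], rng)
    tp2 = binomial(screened[1], acc["sens2"], rng)
    mis1 = binomial(tp1, 1 - acc["cls1"], rng)  # stage 1 labelled stage 2
    mis2 = binomial(tp2, 1 - acc["cls2"], rng)  # stage 2 labelled stage 1
    fp2 = binomial(screened[2], 1 - acc["spec"], rng)  # omega = 0
    return tp1 - mis1 + mis2, tp2 - mis2 + mis1 + fp2

def match_frequencies(N, c, rho, acc, obs1, obs2, max_cases, n_iter, rng):
    """Share of runs reproducing (obs1, obs2) exactly, per candidate (S1, S2)."""
    freq = {}
    for S1 in range(max_cases + 1):
        for S2 in range(max_cases + 1):
            hits = sum(
                simulate_observed(N, S1, S2, c, rho, acc, rng) == (obs1, obs2)
                for _ in range(n_iter)
            )
            freq[(S1, S2)] = hits / n_iter
    return freq

# Hypothetical session: 200 people, 50% coverage, 2 stage 1 and 1 stage 2 observed.
acc = {"sens1": 0.98, "sens2": 0.98, "spec": 1.0, "cls1": 0.677, "cls2": 0.947}
freq = match_frequencies(N=200, c=0.5, rho=1.6, acc=acc, obs1=2, obs2=1,
                         max_cases=5, n_iter=100, rng=random.Random(42))
best = max(freq, key=freq.get)
print(best, freq[best])
```

The dictionary of match frequencies plays the role of an unnormalised joint likelihood surface over (S1, S2), from which the best-fitting set is read off.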

Best estimates and uncertainty bounds for each project as a whole were computed by two alternative analysis approaches: (i) summing the best-fitting values of S1 and S2 or S1,TP and S2,TP for each screening session over the project as a whole (no uncertainty bounds could be computed for this approach); and (ii) a bootstrapping routine, whereby we repeatedly sampled from the joint likelihood distributions of S1 and S2 or S1,TP and S2,TP for each screening session, totalled the randomly sampled values over all sessions in the project, and computed the median and 95% percentile interval of the resulting distribution of random project totals.
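
Approach (ii) can be illustrated with a short resampling routine. The sketch below is a simplification under stated assumptions: each session's likelihood distribution is represented as a hypothetical dictionary mapping a candidate number of cases to its relative likelihood (one dimension instead of the joint S1 and S2 distribution), and percentiles are taken by simple rank lookup.

```python
import random
import statistics

def bootstrap_total(session_dists, n_boot, rng):
    """Median and 95% percentile interval of resampled project totals.

    session_dists: one dict per session, mapping candidate case count
    to relative likelihood.
    """
    totals = []
    for _ in range(n_boot):
        total = 0
        for dist in session_dists:
            values = list(dist)
            weights = list(dist.values())
            total += rng.choices(values, weights=weights)[0]
        totals.append(total)
    totals.sort()
    lower = totals[int(0.025 * (n_boot - 1))]
    upper = totals[int(0.975 * (n_boot - 1))]
    return statistics.median(totals), lower, upper

# Three hypothetical sessions with made-up likelihood distributions.
sessions = [
    {0: 0.1, 1: 0.6, 2: 0.3},
    {2: 0.5, 3: 0.4, 4: 0.1},
    {0: 0.7, 1: 0.3},
]
print(bootstrap_total(sessions, n_boot=2000, rng=random.Random(3)))
```

Because each session is resampled independently, the spread of the project total reflects only within-session uncertainty.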

              STP/S is the detected fraction. We could not find a straightforward way to compute uncertainty bounds around this estimate, as it includes error from several sources arising from different statistical processes. However, we present alternative best estimates of detected fraction using either of the above estimation approaches.
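
As a worked example of this ratio, the snippet below uses the Kiri project totals reported in Table 6 (true cases among observed, and true cases overall) under each approach. The function name is hypothetical; the inputs are the published totals.

```python
def detected_fraction(true_cases_detected, true_cases_prevalent):
    """Fraction of all prevalent cases that were truly detected (S_TP / S)."""
    return true_cases_detected / true_cases_prevalent

# Kiri totals from Table 6: (S_TP, S) under each analysis approach.
kiri = {"approach i": (221, 263), "approach ii": (214, 507)}
for label, (s_tp, s) in kiri.items():
    print(label, round(100 * detected_fraction(s_tp, s), 1))
# approach i 84.0
# approach ii 42.2
```

These two values bracket the range of 42.2% to 84.0% quoted for Kiri in the abstract.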


              Description of mass screening data

              Screening output

              Altogether, 819 mass screening sessions took place in the three projects over the periods covered by the datasets used in this study. However, population data were missing for 203 sessions; 10 yielded zero prevalence in villages that also reported no cases throughout the project duration; and two had a coverage <5% and were assumed to be data entry errors. This left 604 sessions for the present analysis, performed in 246 villages (Table 3).

Table 3

Screening coverage of screening sessions included in the analysis, by project. Values are number of sessions (%).

| Coverage stratum (%) | Kiri, Sudan (n = 142) | Adjumani, Uganda (n = 320) | Arua-Yumbe, Uganda (n = 142) |
|---|---|---|---|
| | 1 (0.7) | 13 (4.1) | 2 (1.4) |
| | 9 (6.3) | 26 (8.1) | 3 (2.1) |
| | 5 (3.5) | 34 (10.6) | 5 (3.5) |
| | 13 (9.2) | 38 (11.9) | 14 (9.9) |
| | 9 (6.3) | 49 (15.3) | 16 (11.3) |
| | 7 (4.9) | 42 (13.1) | 15 (10.6) |
| | 8 (5.6) | 40 (12.5) | 18 (12.7) |
| | 6 (4.2) | 38 (11.9) | 22 (15.5) |
| | 7 (4.9) | 14 (4.4) | 23 (16.2) |
| | 4 (2.8) | 12 (3.8) | 6 (4.2) |
| | 31 (21.8) | 12 (3.8) | 16 (11.3) |
| | 42 (29.6) | 2 (0.6) | 2 (1.4) |
| Mean coverage % (IQR†) | 192.9 (51.7-231.0) | 58.7 (37.4-74.0) | 75.3 (52.5-89.5) |
| Mean coverage % (IQR†) considering any coverage > 100% as = 100% | 77.9 (52.4-100.0) | 55.8 (37.6-73.9) | 70.6 (52.7-89.3) |

†Inter-quartile range.

              Screening coverage was highest in Kiri, where about half of screening sessions reported a coverage > 100%, suggesting people from neighbouring communities may have attended (Table 3). Overall, 714 898 people were targeted for screening (with 472 015 actually screened): 56 590 (49 551) in Kiri, 300 406 (158 954) in Adjumani, and 364 902 (263 510) in Arua-Yumbe. Cases diagnosed were 221 (114 in stage 1 or 51.6%) in Kiri, 1419 (692, 48.8%) in Adjumani, and 570 (327, 57.4%) in Arua-Yumbe.

              Exploration of factors associated with observed prevalence

A hurdle model of factors associated with observed prevalence combining data from all projects (Table 4) suggested weak evidence of a trend in the association between screening coverage and occurrence of non-zero prevalence (log-log component): sessions with coverage <15% were about one third as likely to yield any HAT cases as sessions with coverage around 100%. The probability of non-zero prevalence also increased with village population size and previous observed incidence rate, but was lower in repeat screening rounds.

Table 4

Hurdle model exploring factors associated with observed HAT prevalence (all projects combined)

| Variable | Number of observations (number with non-zero prevalence) | Log-log component: probability of non-zero prevalence. Probability ratio (adjusted) | Negative-binomial component: prevalence conditional on prevalence being non-zero. Prevalence ratio (adjusted) |
|---|---|---|---|
| Screening coverage (%) | | | |
| | 16 (5) | | βc |
| | 38 (19) | | |
| | 44 (31) | | |
| | 65 (37) | | |
| | 74 (46) | | |
| | 64 (46) | | |
| | 66 (47) | | |
| | 66 (46) | | |
| | 44 (27) | | |
| | 22 (16) | | |
| | 59 (33) | | |
| | 46 (25) | | |
| Screening round | | | |
| first round | 246 (176) | | |
| subsequent rounds | 358 (202) | | |
| Village population size | | | |
| | 38 (12) | | |
| | 141 (71) | | |
| | 166 (111) | | |
| | 259 (184) | | |
| Observed incidence rate in the past 6 months (cases per 1000 person-months) | | | |
| | 239 (100) | | |
| | 263 (201) | | |
| | 86 (63) | | |
| | 16 (14) | | |
| | | | |
| | 320 (215) | | |
| | 142 (104) | | |
| | 142 (59) | | |
| p (goodness of fit) | | <0.0001 | <0.0001 |

† Test for trend p < 0.001.

              Among screenings that yielded non-zero prevalence (negative binomial component), there was also evidence of a trend in the association of screening coverage and prevalence, with βc increasing as a function of decreasing coverage, as hypothesized. Prevalence increased with previous incidence, but repeat screening rounds were associated with lower prevalence. Unlike in the log-log component, prevalence decreased with increasing population size (see Discussion). There was no evidence of interactions in either model component (data not shown).

              Estimates of the detected fraction

              Estimated relative risk ρ of attending screening

Table 5 shows adjusted estimates of βc based on a hurdle model of field data for each project, which were used in further steps of the analysis to estimate ρ. The best estimates of ρ were 1.6 (95%CI 0.7-12.8) for Kiri, 2.5 (1.2-36.6) for Adjumani and 1.9 (0.9-4.0) for Arua-Yumbe, suggesting a consistent pattern across sites. These ρ estimates yielded βc values that provided a good fit to the βc values estimated from field data (Figure 2).

Table 5

Adjusted estimates of βc (ratio of observed prevalence at coverage c to observed prevalence at coverage = 100%) for each project, by screening coverage stratum

| Project | Screening coverage stratum (%): 5-24 | 25-44 | 45-64 | 65-84 | 85-115 (ref.) |
|---|---|---|---|---|---|
| Kiri | 1.64 (0.57-4.70) | 1.35 (0.63-2.90) | 1.35 (0.50-3.62) | 1.22 (0.64-2.35) | 1 [ref.] |
| Adjumani | 2.76 (1.72-4.43) | 1.50 (0.93-2.41) | 1.41 (0.93-2.13) | 1.05 (0.67-1.66) | 1 [ref.] |
| Arua-Yumbe | 1.81 (1.28-2.55) | 1.25 (0.63-2.49) | 1.47 (1.12-1.93) | 1.02 (0.73-1.43) | 1 [ref.] |

†Number in category.

Quantities in parentheses indicate 95% confidence intervals.

              Figure 2

              Predicted versus observed βc (ratio of observed prevalence at coverage c to observed prevalence at coverage = 100%) values, by project, using the best estimate of ρ (relative probability of attending screening among cases versus non-cases). Vertical bars indicate 95% confidence intervals.
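
              To see why βc would be expected to rise as coverage falls when ρ > 1, the following minimal sketch may help. It is not the fitting procedure used in this study. It assumes, purely for illustration, that non-cases attend with probability `attend_noncase` and cases with probability min(1, ρ × `attend_noncase`). The function name, its parameters and the true prevalence of 0.5% are all hypothetical.

```python
def beta_c(rho, attend_noncase, true_prev):
    """Return (coverage, beta_c) under an assumed capped-attendance rule."""
    # Assumption: cases attend rho times as often as non-cases, capped at 1.
    attend_case = min(1.0, rho * attend_noncase)
    # Screening coverage: share of the whole population that attends.
    coverage = (1.0 - true_prev) * attend_noncase + true_prev * attend_case
    # Observed prevalence: proportion of attendees who are cases.
    observed_prev = true_prev * attend_case / coverage
    # At 100% attendance the observed prevalence equals true_prev,
    # so dividing by true_prev gives the ratio beta_c.
    return coverage, observed_prev / true_prev


if __name__ == "__main__":
    for rho in (1.0, 1.6, 2.5):
        for attend in (0.3, 0.5, 0.7, 0.9, 1.0):
            cov, beta = beta_c(rho, attend, true_prev=0.005)
            print(f"rho={rho:.1f}  coverage={cov:.2f}  beta_c={beta:.2f}")
```

              Under these assumptions βc stays at 1 for every coverage level when ρ = 1, and exceeds 1 at incomplete coverage when ρ > 1, which is the qualitative pattern seen in Table 5.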

              Estimated true prevalence and detected fraction

              The estimated true prevalence using the best-fitting estimates from each session (approach i) was very similar to that observed (Table 6). True prevalence using the median of bootstrap replicates (approach ii) was almost equal to that observed in Adjumani and Arua-Yumbe, but about double the observed prevalence in Kiri, though still below 1% in absolute terms. The proportion of stage 1 cases was estimated to be higher than that observed, as expected given the adjustment for stage misclassification and the fact that most false positives would have been diagnosed as stage 2 (Table 6); observed stage-specific prevalence differed from the true prevalence accordingly.
              Table 6

              Estimated true number of cases and prevalence, by stage, project and overall

              Columns, in order: estimated number of true cases among observed (95% confidence interval); estimated number of true cases overall (95% confidence interval); estimated prevalence in % (95% confidence interval)

              Kiri
              stage 1: 135, 143 (127–158); 177, 315 (255–388); 0.31, 0.56 (0.45-0.69)
              stage 2: 86, 71 (55–86); 86, 189 (145–257); 0.15, 0.33 (0.26-0.45)
              total: 221, 214 (207–219); 263, 507 (429–608); 0.46, 0.90 (0.76-1.07)

              Adjumani
              stage 1: 868, 913 (863–963); 1129, 1628 (1485–1775); 0.38, 0.54 (0.49-0.59)
              stage 2: 551, 463 (410–513); 648, 993 (872–1128); 0.22, 0.33 (0.29-0.38)
              total: 1419, 1375 (1360–1389); 1777, 2618 (2436–2811); 0.59, 0.87 (0.81-0.94)

              Arua-Yumbe
              stage 1: 404, 392 (366–417); 495, 624 (564–693); 0.14, 0.17 (0.15-0.19)
              stage 2: 166, 135 (109–162); 153, 262 (214–321); 0.04, 0.07 (0.06-0.09)
              total: 570, 527 (510–540); 648, 888 (816–974); 0.18, 0.24 (0.22-0.27)

              †Observed cases divided by the total population actually screened. ‡Estimated cases divided by the total population targeted for screening.

              Estimated figures indicate, respectively, the sum of best-fitting values for each screening session (approach i), and the median of bootstrap replicate samples (approach ii) with its 95% percentile interval in parentheses.

              Overall, the estimated detected fraction was relatively high everywhere using analysis approach i, i.e. taking the best-fitting estimates from each screening session (84.0% [221/263] in Kiri, 79.9% [1419/1777] in Adjumani and 88.0% [570/648] in Arua-Yumbe), but much lower (42.2% [214/507] in Kiri, 52.5% [1375/2618] in Adjumani and 59.3% [527/888] in Arua-Yumbe) using approach ii, i.e. taking median estimates from bootstrapping (see Discussion). When considering only cases detected and correctly staged, these percentages declined to 68.1% (179/263), 60.4% (1074/1777) and 61.9% (401/648) for approach i, and 33.1% (168/507), 39.9% (1045/2618) and 47.4% (421/888) for approach ii.
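
              The detected fractions quoted above are simple ratios of true cases among observed to true cases overall. The short sketch below reproduces that arithmetic from the counts given in the text and Table 6; the variable and function names are illustrative only.

```python
# (true cases among observed, true cases overall), taken from Table 6.
# "i" = sum of best-fitting values; "ii" = median of bootstrap replicates.
counts = {
    "Kiri": {"i": (221, 263), "ii": (214, 507)},
    "Adjumani": {"i": (1419, 1777), "ii": (1375, 2618)},
    "Arua-Yumbe": {"i": (570, 648), "ii": (527, 888)},
}


def detected_fraction(detected, total):
    """Percentage of all estimated true cases that were detected."""
    return 100.0 * detected / total


if __name__ == "__main__":
    for project, by_approach in counts.items():
        for approach, (detected, total) in by_approach.items():
            pct = detected_fraction(detected, total)
            print(f"{project}, approach {approach}: {pct:.1f}%")
```

              The same function applied to the counts of cases detected and correctly staged (for example 179/263 for Kiri under approach i) gives the lower percentages reported in the second sentence.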


              Discussion

              This study outlines a potential method to estimate the extent of under-detection and the true infection burden of gambiense HAT, based only on observed data. Because model parameters are highly uncertain, estimates of the detected fraction are quite imprecise, but they suggest that between 20% and 50% of prevalent cases were not detected in the screening sessions analysed. There appears to be no appreciable difference between observed and true prevalence. However, adjustment for incomplete specificity and stage misclassification suggests a higher ratio of stage 1 to stage 2 cases than that observed by programmes.

              Interpretation of findings

              Internal validity of findings

              The hurdle model is internally consistent: with the exception of population size (see below), associations of explanatory variables and prevalence in the log-log component are mirrored in the negative-binomial component.

              Furthermore, the log-log component supports the hypothesis of ρ > 1. If ρ = 1, the probability of a village featuring a non-zero observed prevalence should be linearly proportional to screening coverage. However, this probability is higher than expected based on coverage alone, consistent with self-selection of cases even at low coverage.

              While increasing village population size was associated with a higher probability of non-zero prevalence, prevalence among non-zero screenings appeared to decrease with higher population. This apparently inconsistent finding may be explained as follows: (i) in fact, the probability of non-zero prevalence increases less than proportionately with increasing population size, meaning that, on a per capita basis, it is lower in large villages than small ones; (ii) in smaller communities, there may be a greater risk of chance extinction of transmission, and thus a greater frequency of zero prevalence, all else being equal; (iii) if cases are present in a small village, their very small number, not divisible below discrete units, affects the prevalence calculation (e.g. if two villages A and B both have one prevalent case, but A’s population is 100 and B’s 1000, the prevalence will be ten times higher in A); (iv) larger communities are usually administrative and economic centres, and attract infected migrants from rural areas; (v) village population size may not reflect the actual denominator at risk: it is likely that only a fraction of the population has a livelihood-dependent exposure to tsetse [21, 22], and that this fraction may be smaller in larger, less rural communities where many people are engaged in trade or services: in other words, when considering the true population at risk, denominators might be more comparable across differently sized villages than it appears.


              Overall, this study estimates that about 20-50% of prevalent cases potentially detectable fell through the net of active case detection, and that about a fourth of cases detected were not classified in the correct stage (however, most misclassification would be from stage 1 to stage 2, which would still guarantee effective treatment). Our model did not incorporate the final step of treatment, as our question concerned case detection specifically; furthermore, the MSF projects used a variety of regimens, including second-line regimens for patients with treatment failure. In national programmes without strong funding and technical support, screening coverage could be lower, and our findings thus reflect an optimistic scenario. In the Democratic Republic of Congo (DRC), the estimated detected fraction (including treatment) was <50% in most scenarios, and between 30% and 65% attended and were correctly diagnosed [23]. Screening coverage was 22-98% in other DRC sites (average 70-80%) [7, 23], 47-93% in Equatorial Guinea [24], and 70-94% in Angola [25].

              In the colonial era, HAT active case detection was successful due to largely coercive measures. Few recent studies discussing the barriers to and facilitators of screening attendance have been published. In the Republic of Congo, villagers reported that biomedicine was the main remedy against HAT, and did not trust traditional remedies [26]. In the DRC, communities’ knowledge of HAT and its control was very good, but concern about drug toxicity and the stigma of public HAT diagnosis were prominent barriers [27]. Both studies found that cost of treatment was a barrier to service uptake; while MSF projects offered free testing and treatment, patients and families still face indirect costs such as transport and lost income. In both Congo [28] and DRC, stage 2 HAT was often associated with sorcery, especially when the case was fatal; however, there was no evidence that this kept patients from seeking care. In the Ugandan sites we analysed, traditional healers were often a recourse, and working with these providers and communities was suggested as a way to improve screening attendance [29].

              Other findings

              In communities where a non-zero incidence was observed in the six months prior to the mass screening, the probability of finding at least one case during active screening was doubled. Furthermore, past incidence was positively associated with observed prevalence.

              There was no evidence that cases in stage 2 have a greater probability of attending mass screening than those in stage 1. This observation is somewhat unexpected: stage 2 cases, being more symptomatic, might be expected to have a greater probability of attending screenings. This finding, however, may not apply to passive case detection. Furthermore, early stage 2 cases may in fact be less prone to present with systemic symptoms like fever, pruritus or arthralgia than stage 1 cases [30].

              Programmatic implications

              While the uncertainty around the estimates of detected fraction (see below) hampers meaningful interpretation, it is clear that a considerable proportion of HAT infections remain undetected even in a well-resourced active case finding context. These cases would then go on to seed renewed epidemics once mass screening is scaled down, and, where no passive case detection is available, would probably die. Long-term control of HAT through mass screening thus probably requires very high screening coverage, underscoring the need for programmes to work closely with communities to ensure high acceptance and uptake, and to identify and address barriers to screening attendance. This is also justifiable from an economic standpoint, given that the costs of mass screening are mainly fixed (e.g. transport, human resources, information campaigns, programme overheads) rather than variable (i.e. per person screened).
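
              The economic argument can be written as a simple cost identity. This is an illustrative sketch rather than a costing from this study: the symbols F (fixed cost of a screening session), v (variable cost per person screened) and n (number of people screened) are introduced here as assumptions.

```latex
\[
\text{cost per person screened} \;=\; \frac{F + v\,n}{n} \;=\; \frac{F}{n} + v
\]
```

              When F is large relative to v, increasing n through higher coverage lowers the cost per person screened, since the fixed cost is spread over more attendees.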

              This study suggests that, for purposes of assessing HAT burden and monitoring trends, calculating the observed prevalence based on detected cases and the number of people screened provides a reasonable approximation to the true prevalence. Furthermore, programmes should continue to use the observed incidence in different communities (as computed based on passive case detection, where available) as a guide for deciding where to focus mass screening efforts.

              Study limitations

              Estimates were subject to considerable uncertainty, which hampers interpretation of the key findings on detected fraction. The striking differences according to analysis approach are due to the very skewed likelihood distributions arising from the fitting procedures (data not shown): reporting the mode (best-fitting values) or median of these distributions changes the inference considerably. For completeness, we have chosen to report both, and suggest that reality lies somewhere in between. Furthermore, the model does not adequately deal with screening sessions featuring zero observed prevalence (37% of sessions analysed). If screening coverage is < 100%, various possible sets of S1 and S2 values could result in S1,obs = 0 and S2,obs = 0; in most scenarios, however, the set [S1 = 0, S2 = 0], i.e. zero prevalence, will by default yield the best fits and will thus be adopted as the best estimate, potentially resulting in a systematic underestimation of true prevalence in very low transmission villages (and overestimation of the detected fraction) if analysis approach i is used. Approach ii is less affected by this bias.
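
              The gap between the mode and the median of a strongly right-skewed distribution can be illustrated with a log-normal distribution. This is a stand-in chosen only because its mode, median and mean have closed forms. The fitted likelihood distributions in this study are not claimed to be log-normal, and the parameter values below are hypothetical.

```python
import math


def lognormal_summaries(mu, sigma):
    """Closed-form mode, median and mean of a log-normal(mu, sigma)."""
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2.0)
    return mode, median, mean


if __name__ == "__main__":
    # Hypothetical parameters: a larger sigma means a more skewed distribution.
    for sigma in (0.2, 0.6, 1.0):
        mode, median, mean = lognormal_summaries(mu=math.log(500), sigma=sigma)
        print(f"sigma={sigma:.1f}  mode={mode:.0f}  median={median:.0f}  mean={mean:.0f}")
```

              In such a distribution the mode lies below the median, and the gap widens with skewness. If the estimated number of true cases behaves similarly, taking the mode (as in approach i) gives fewer estimated true cases, and hence a higher detected fraction, than taking the median (as in approach ii).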

              For screening sessions with coverage > 100%, the model relies on an assumption that the entire population of the village was screened, and that any other persons screened come from neighbouring villages. While this occurred rarely in Adjumani and Arua-Yumbe, in Kiri about half of screening sessions attracted a population greater than that of the village; results for Kiri should thus be considered somewhat less robust.

              The association between coverage and observed prevalence was adjusted for all available confounding variables, but these (screening round, population size, project, past incidence) were few, and additional hidden confounding may be present: villages with low coverage may differ systematically from high-coverage ones in other key determinants of prevalence, such as exposure to vectors; low coverage might also be a proxy for remoteness and low security, which could be associated with higher prevalence.


              Conclusions

              The fraction of HAT cases detected during active screening may be relatively high in well-resourced control programmes, providing a considerable immediate public health benefit. However, the minority of cases that remain undetected may play a critical epidemiological role in sustaining transmission.

              The limitations of this study illustrate multiple difficulties in estimating the unseen burden of neglected tropical diseases in settings with low access to health care and limited availability of data. Our modelling approach may be useful for improved HAT burden estimation and programme evaluation, but needs to be improved.

              Determinants of under-detection should also be researched further using both quantitative and qualitative tools, so as to maximise the future impact of this control strategy.



              We are grateful to the MSF and national sleeping sickness programme staff, too numerous to mention, who collected data used in this study, and to two anonymous reviewers for helpful suggestions.

              Authors’ Affiliations

              Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine
              Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine
              Médecins Sans Frontières
              Geneva University Hospitals & University of Geneva
              College of Medical, Veterinary and Life Sciences, University of Glasgow


              1. Brun R, Blum J, Chappuis F, Burri C: Human African trypanosomiasis. Lancet. 2010, 375 (9709): 148-159. 10.1016/S0140-6736(09)60829-1.
              2. Aroke AH, Asonganyi T, Mbonda E: Influence of a past history of Gambian sleeping sickness on physical growth, sexual maturity and academic performance of children in Fontem, Cameroon. Ann Trop Med Parasitol. 1998, 92 (8): 829-835. 10.1080/00034989858862.
              3. Cramet R: [Sleeping sickness in children and its long term after-effects. Apropos 110 personal observations at Fontem Hospital (Cameroon)]. Med Trop (Mars). 1982, 42 (1): 27-31.
              4. Steverding D: The history of African trypanosomiasis. Parasit Vectors. 2008, 1 (1): 3. 10.1186/1756-3305-1-3.
              5. Chappuis F, Loutan L, Simarro P, Lejon V, Buscher P: Options for field diagnosis of human african trypanosomiasis. Clin Microbiol Rev. 2005, 18 (1): 133-146. 10.1128/CMR.18.1.133-146.2005.
              6. Abel PM, Kiala G, Loa V, Behrend M, Musolf J, Fleischmann H, Theophile J, Krishna S, Stich A: Retaking sleeping sickness control in Angola. Trop Med Int Health. 2004, 9 (1): 141-148. 10.1046/j.1365-3156.2003.01152.x.
              7. Lutumba P, Robays J, Miaka mia Bilenge C, Mesu VK, Molisho D, Declercq J, Van der Veken W, Meheus F, Jannin J, Boelaert M: Trypanosomiasis control, Democratic Republic of Congo, 1993–2003. Emerg Infect Dis. 2005, 11 (9): 1382-1388. 10.3201/eid1109.041020.
              8. Simarro PP, Jannin J, Cattand P: Eliminating human african trypanosomiasis: where do we stand and what comes next. PLoS Med. 2008, 5 (2): e55. 10.1371/journal.pmed.0050055.
              9. Louis FJ, Simarro PP, Lucas P: Sleeping sickness: one hundred years of control strategy evolution. Bull Soc Pathol Exot. 2002, 95 (5): 331-336.
              10. Ekwanzala M, Pepin J, Khonde N, Molisho S, Bruneel H, De Wals P: In the heart of darkness: sleeping sickness in Zaire. Lancet. 1996, 348 (9039): 1427-1430. 10.1016/S0140-6736(96)06088-6.
              11. Moore A, Richer M: Re-emergence of epidemic sleeping sickness in southern Sudan. Trop Med Int Health. 2001, 6 (5): 342-347. 10.1046/j.1365-3156.2001.00714.x.
              12. Artzrouni M, Gouteux JP: Control strategies for sleeping sickness in Central Africa: a model-based approach. Trop Med Int Health. 1996, 1 (6): 753-764.
              13. Checchi F, Chappuis F, Karunakara U, Priotto G, Chandramohan D: Accuracy of five algorithms to diagnose gambiense human african trypanosomiasis. PLoS Negl Trop Dis. 2011, 5 (7): e1233. 10.1371/journal.pntd.0001233.
              14. Chappuis F, Stivanello E, Adams K, Kidane S, Pittet A, Bovier PA: Card agglutination test for trypanosomiasis (CATT) end-dilution titer and cerebrospinal fluid cell count as predictors of human African Trypanosomiasis (Trypanosoma brucei gambiense) among serologically suspected individuals in southern Sudan. Am J Trop Med Hyg. 2004, 71 (3): 313-317.
              15. Chappuis F, Udayraj N, Stietenroth K, Meussen A, Bovier PA: Eflornithine is safer than melarsoprol for the treatment of second-stage Trypanosoma brucei gambiense human African trypanosomiasis. Clin Infect Dis. 2005, 41 (5): 748-751. 10.1086/432576.
              16. Paquet C, Castilla J, Mbulamberi D, Beaulieu MF, Gastellu Etchegorry MG, Moren A: [Trypanosomiasis from Trypanosoma brucei gambiense in the center of north-west Uganda. Evaluation of 5 years of control (1987–1991)]. Bull Soc Pathol Exot. 1995, 88 (1): 38-41.
              17. Priotto G, Kaboyo W: Final evaluation of the MSF-France trypanosomiasis control programme in West Nile, Uganda. 2002, Epicentre, Paris
              18. McDowell A: From the help desk: hurdle models. Stata J. 2003, 3 (2): 178-184.
              19. Cameron CA, Trivedi PK: Regression analysis of count data. 1998, Cambridge University Press, Cambridge
              20. Bolker B: Ecological Models and Data in R. 2008, Princeton University Press, Princeton, NJ
              21. Fournet F, Kone A, Traore S, Hervouet JP: Heterogeneity in the risk of sleeping sickness in coffee and cocoa commercial plantations in Ivory Coast. Med Vet Entomol. 1999, 13 (3): 333-335. 10.1046/j.1365-2915.1999.00164.x.
              22. Meda AH, Laveissiere C, De Muynck A, Doua F, Diallo PB: Risk factors for human African trypanosomiasis in the endemic foci of Ivory Coast. Med Trop (Mars). 1993, 53 (1): 83-92.
              23. Robays J, Bilengue MM, Van der Stuyft P, Boelaert M: The effectiveness of active population screening and treatment for sleeping sickness control in the Democratic Republic of Congo. Trop Med Int Health. 2004, 9 (5): 542-550. 10.1111/j.1365-3156.2004.01240.x.
              24. Simarro PP, Franco JR, Ndongo P, Nguema E, Louis FJ, Jannin J: The elimination of Trypanosoma brucei gambiense sleeping sickness in the focus of Luba, Bioko Island, Equatorial Guinea. Trop Med Int Health. 2006, 11 (5): 636-646. 10.1111/j.1365-3156.2006.01624.x.
              25. Ruiz JA, Simarro PP, Josenando T: Control of human African trypanosomiasis in the Quicama focus, Angola. Bull World Health Organ. 2002, 80 (9): 738-745.
              26. Gouteux JP, Malonga JR: Socio-entomologic survey in human trypanosomiasis focus of Yamba (Peoples Republic of Congo). Med Trop (Mars). 1985, 45 (3): 259-263.
              27. Robays J, Lefevre P, Lutumba P, Lubanza S, Kande Betu Ku Mesu V, Van der Stuyft P, Boelaert M: Drug toxicity and cost as barriers to community participation in HAT control in the Democratic Republic of Congo. Trop Med Int Health. 2007, 12 (2): 290-298. 10.1111/j.1365-3156.2006.01768.x.
              28. Hagenbucher-Sacripanti F: Myth reconstruction and health representations in a Southern Congolese therapeutic sect. Sante. 1996, 6 (1): 43-52.
              29. Kovacic V: Health seeking behaviour in relation to sleeping sickness (Human African trypanosomiasis) in West Nile, Uganda (MPhil thesis). 2009, University of Oxford, Oxford
              30. Blum J, Schmid C, Burri C: Clinical aspects of 2541 patients with second stage human African trypanosomiasis. Acta Trop. 2006, 97 (1): 55-64. 10.1016/j.actatropica.2005.08.001.


              © Checchi et al.; licensee BioMed Central Ltd. 2012

              This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.