Back-calculating the incidence of infection of leprosy in a Bayesian framework

Crump, Ronald E.; Medley, Graham F.

doi:10.1186/s13071-015-1142-5

Research
Open access
Published: 22 October 2015

Back-calculating the incidence of infection of leprosy in a Bayesian framework

Parasites & Vectors volume 8, Article number: 534 (2015) Cite this article

2549 Accesses
13 Citations
2 Altmetric
Metrics details

Abstract

Background

The number of new leprosy cases reported annually is falling worldwide, but remains relatively high in some populations. Because of the long and variable periods between infection, onset of disease, and diagnosis, the recently detected cases are a reflection of infection many years earlier. Estimation of the numbers of sub-clinical and clinical infections would be useful for management of elimination programmes. Back-calculation is a methodology that could provide estimates of prevalence of undiagnosed infections, future diagnoses and the effectiveness of control.

Methods

A basic back-calculation model to investigate the infection dynamics of leprosy has been developed using Markov Chain Monte Carlo in a Bayesian context. The incidence of infection and the detection delay both vary with calendar time. Public data from Thailand are used to demonstrate the results that are obtained as the incidence of diagnosed cases falls.

Results

The results show that the underlying burden of infection and short-term future predictions of cases can be estimated with a simple model. The downward trend in new leprosy cases in Thailand is expected to continue. In 2015 the predicted total number of undiagnosed sub-clinical and clinical infections is 1,168 (846–1,546) of which 466 (381–563) are expected to be clinical infections.

Conclusions

Bayesian back-calculation has great potential to provide estimates of numbers of individuals in health/infection states that are as yet unobserved. Predictions of future cases provides a quantitative measure of understanding for programme managers and evaluators. We will continue to develop the approach, and suggest that it might be useful for other NTD in which incidence of diagnosis is not an immediate measure of infection.

Background

Despite a large reduction in prevalence over the last half century, leprosy remains a public health issue in many countries [1]. In 2013 there were 215,000 new cases of leprosy detected and reported worldwide, with 96 % of these being detected in the 18 countries with the highest burden [2]. Leprosy is caused by the bacterium Mycobacterium leprae and this infection can be cured through the use of multidrug therapy (MDT). The drug regimen is dependent upon the classification of the case: paucibacillary cases receive six months of treatment with two-drug MDT while multibacillary cases are treated for 12 months with three-drug MDT. Multibacillary cases are identified as those with more than five skin lesions; paucibacillary cases have five skin lesions or fewer. Outside of the Americas there is little published evidence of transmission to humans from non-human reservoirs of infection [3]. Taking this as evidence that non-human reservoirs are not an important source of transmission, then removing clinical, or sub-clinical, infection from people by treatment should reduce the incidence of infection.

In 1991, the World Health Assembly resolved to eliminate leprosy as a public health problem by the year 2000 [4]. The practical definition of this target was a prevalence of less than 1 registered case per 10,000 of the global population. Progress towards the target was helped by the free provision of multidrug therapy (MDT) from 1995 through the World Health Organisation (WHO), initially by the Nippon Foundation and since 2000 by Novartis. The global target was achieved, but leprosy remains an important public health issue at a national level for a number of countries and in some regions of other (previously high burden) countries [5, 6].

Early diagnosis and use of MDT are the principle tools used to combat leprosy [1]. The social stigma associated with leprosy remains a barrier to early diagnosis, and education is seen as an important step to overcome this. The bacillus Calmette-Guérre (BCG) tuberculosis vaccine also has a protective effect against M. leprae infection [7], although the relative contribution of these interventions to current control has not been quantified.

The latest WHO target for leprosy is a reduction in the number of new cases with grade 2 disability to below one per million of the global population by 2020 [5]. Grade 2 disability involves deformity of the hands or feet, or severe visual impairment. The national number of new cases with grade 2 disability is one of the variables reported annually to the WHO, along with the total number of new cases and the number of new multibacillary cases.

We consider elimination as the process of reducing the incidence of infection until transmission is zero. The aim of modelling in the context of elimination is two-fold. First, to provide quantitative forecasts of future cases based on data and current understanding. Such forecasts (made public) are essential as a test of programme management and evaluation, and are an early warning of changes in programme effectiveness. Given the long and variable time periods between infection and disease in leprosy, the incidence of disease is a biased measure of the incidence of infection (unless the infection rate is constant). For example, when incidence is exponentially decreasing, there is a bias towards individuals who have longer infection times [8]. Consequently, changes in incidence of disease require interpretation before they can be used to infer changes in incidence of infection. The second, and most common, use of modelling for elimination is the prediction of impacts of changes to control programmes, this type of modelling has been applied to leprosy [9]. Here we concentrate on the first of these aims: forecasting and interpretation. The method we have chosen to use is back-calculation [10], which provides estimates of past incidence of sub-clinical infections, current prevalence of (undiagnosed) sub-clinical and clinical infections and short-term forward predictions of the numbers of new detected and reported cases. Note that even if transmission is halted, cases will still arise in the future as infected individuals progress through disease stages.

Back-calculation [10] has been used to study the infection dynamics of a variety of infectious diseases with long and variable incubation period distributions, i.e. the time between infection and diagnosis, for example HIV/AIDS, BSE, vCJD, and also cancer. Each observed diagnosis is the result of a previous infection. Therefore, given information on diagnoses and the incubation period distribution (IPD), empirical estimates can be made of the number of infections per time interval. The diagnoses made during one interval are a mixture of individuals infected at different times in the past; for example, individuals diagnosed in 2015 might include people infected in the years 1990 to 2010, depending on the infection rates in those years, and the IPD.

For discrete time periods, the basic back-calculation equation is:

$$ {D}_i={\displaystyle \sum_{j=1}^i}{I}_j{f}_{i-j} $$

where D _i is the number of new diagnoses in the i ^th time interval, I _j is the expected number of infections in the j ^th time interval and f _i−j is the probability that the time between infection and diagnosis is i-j time intervals. In the simplest case f _i−j comes directly from the IPD. The logic of back-calculation is that given D and f, the convolution above allows estimation of I. Here, we adopt a Bayesian approach to estimate posterior densities, with a concentration on estimation of variability in I as much as central locations, and so include some variability in the time period distributions, f, and estimate some relative factors. It is impossible to estimate both I and f simultaneously because of non-identifiability, but we can use prior knowledge and external data to provide estimates of I given uncertainty in f.

This basic model can be modified to reflect the natural history of the disease under study. For example, AIDS diagnosis was originally modelled as the end-point for disease [10] but this has subsequently been refined to incorporate both AIDS and HIV diagnoses as well as a number of other intermediate disease states based on CD4 count [11].

Previous uses of back-calculation have concentrated on epidemic or endemic situations (i.e. where I is increasing or constant). In contrast, leprosy is globally decreasing, and back-calculation might be more generally useful in disease control situations approaching elimination. In the case of leprosy, the time to diagnosis also depends on the public health system in that more active case finding will result in faster diagnosis. Consequently, we will consider time-varying hazards associated with the distribution of time from onset of symptoms to diagnosis, as well as time-varying rates of infection.

Methods

The endemic nature of leprosy and it’s long incubation period mean that data on individuals’ time of infection are difficult to obtain. Consequently, information on the IPD and the distribution of the time period from onset of clinical symptoms to detection, the detection delay distribution (DDD), must be obtained independently and then incorporated into the back-calculation. Both the fitting of distributions to time periods and the back-calculation were implemented in a Bayesian Markov Chain Monte Carlo (MCMC) setting. The program Just Another Gibbs Sampler (JAGS) [12] was used, with the R statistical environment [13] acting as the front end.

Fitting time period distributions

Data on time of infection, onset and diagnosis

Data on 49 individual cases of leprosy with short exposure periods were extracted from the literature [14–20]. These were mostly military service personnel who spent known, limited periods of time in endemic countries before returning to live in non-endemic communities and subsequently being diagnosed.

Modelling time period distributions

For each individual, the observed endpoints for each time period (onset of symptoms and diagnosis) were modelled as t _o = t _i + p _i:o and t _d = t _i + p _i:o + p _o:d, respectively, where t _i was the unobserved time of infection and p _i:o and p _o:d were the periods from incubation to onset and from onset to diagnosis. Infection time was a uniformly distributed variable between the start and end of exposure and the two time periods were gamma distributed random variables. The observed values, t _o and t _d, were both left and right censored, with onset and diagnosis dates taken as being known to within some time interval. The Gibbs Sampler was used to obtain samples from the posterior probability distributions of the shape and rate parameters of the gamma distributions.

Back-calculation

National incidence of new leprosy cases

The data used consisted of annual numbers of new leprosy cases, new multibacillary leprosy cases and new leprosy cases with grade 2 disability reported at the national level for Thailand. These data were taken from published reports [2, 21–28]. The variables reported to the WHO have changed over time, resulting in the number of new cases detected, number of new cases with grade 2 disability and the number of new multibacillary cases being available from 1965, 1982 and 1984, respectively. Data on the three variables were available up to 2014. Since 1993, cases of multibacillary leprosy have been identified via case classification (with >5 skin lesions indicating a multibacillary case) rather than a bacterial index score. The transition from the use of the bacterial index to case classification took place from 1989 to 1993. From 1984 to 1988 the number of new lepromatous leprosy cases identified under the Madrid classification system [29] was used as the count of new multibacillary cases [21]. The observed values are plotted against year in Fig. 1.

Back-calculation model

The progression from sub-clinical to clinical infection and subsequent diagnosis of leprosy was described by a discrete time period model. Calendar years were used as the discrete time step, in keeping with the annual reporting of national summary information to the WHO. The time period of the analysis was set to start 30 years before the first observed record, to allow the system to generate an appropriate pattern of infection to explain the early observations, leprosy being an endemic disease.

The expected incidence of infections in year b was simulated with a random walk process. The observed number of infections in year b was modelled as:

$$ {I}_b\sim \mathrm{Poisson}\left({\lambda}_b\right) $$

where prior distributions for the λ_b were

$$ {\lambda}_b\sim \left\{\begin{array}{cc}\hfill \mathrm{Uniform}\left(0,10000\right)\hfill & \hfill \mathrm{if}\ b=1\hfill \\ {}\hfill \mathcal{N}\left({\lambda}_{b-1},{\sigma}^2\right),{\lambda}_b>0\hfill & \hfill \mathrm{if}\ b>1\hfill \end{array}\right. $$

The standard deviation, σ, used in the random walk process was set to 30 for the analyses reported in the paper. This was chosen from a number of test runs of the model to provide a smooth downward curve of infections over time while still maintaining some variation. There is no information in the data to allow σ to be estimated within the analyses.

Progress from sub-clinical infection to the onset of symptoms was controlled by the IPD, a gamma distribution with constant parameters within a single MCMC chain. The parameters of the IPD were taken from the analyses reported here. From the numbers of sub-clinical infections, the expected number of new clinical infections in any year can be calculated using the IPD:

$$ \mathrm{E}\left[{O}_c\right]={\displaystyle \sum_{b=1}^c}{I}_b{f}_{c-b} $$

Where f _c-b was the probability that time between sub-clinical infection and onset of clinical symptoms was c-b years. The observed number of clinical infections in year c was then O _c ∼ Poisson (E[O _c]).

There is no test for sub-clinical leprosy. Therefore, diagnosis occurs subsequent to the onset of symptoms in the model. The probability of being diagnosed in any year given that onset occurred in some known previous year was derived from the onset to diagnosis time period distribution; again using a gamma distribution with constant parameters within a single MCMC chain with the parameters taken from the separate time period fitting reported in this paper. We additionally added a parameter reflecting the effort being put into diagnosis at that time (i.e. scaling the diagnosis hazard for that year).

Variations in the effort put into diagnosis programmes, or the effectiveness of these programmes, is handled by scaling the hazard of being diagnosed in year d given that onset of clinical symptoms occurred in year c (h _dc). The probability of an individual being diagnosed in year d given that the onset of clinical symptoms was in year c is:

$$ {g}_{dc}={\gamma}_k{h}_{dc}\left(1-{G}_{dc}\right) $$

Where $ {G}_{dc}={\displaystyle \sum_{i=1}^{d-1}}{g}_{ic} $, diagnosis effort is modelled by γ _k and h _dc comes from the DDD:

$$ {h}_{dc}=\frac{\mathrm{Prob}\left(c-1\le X<c\right)}{1-\mathrm{Prob}\left(X<c-1\right)} $$

For the Thailand data, six different effort parameters were included to correspond to periods in which the diagnosis effort or effectiveness may have differed [21], as set out in Table 1. The effort parameters, γ _k, were modelled using a random walk, such that effort in period k was a change from the effort in the previous period. The prior distributions for γ _k were:

$$ {\gamma}_k\sim \left\{\begin{array}{cc}\hfill \mathrm{Uniform}\left(1,{0.25}^2\right)\hfill & \hfill \mathrm{if}\ k=1\hfill \\ {}\hfill \mathcal{N}\left({\gamma}_{k-1},{0.25}^2\right)\hfill & \hfill \mathrm{if}\ k>1\hfill \end{array}\right. $$

Table 1 Periods for which the effort or effectiveness of diagnosis may have varied in Thailand, corresponds to parameters fitted

Full size table

The expected number of diagnoses in year d was then:

$$ \mathrm{E}\left[{D}_d\right]={\displaystyle \sum_{j=1}^d}\mathrm{E}\left[{O}_j\right]{g}_{dj} $$

The observed number of new diagnoses in year i was a Poisson variable with parameter E[D _i], D _i ∼ Poisson(E[D _i]). The probability of a multibacillary diagnosis being made in year d given that onset of clinical symptoms happened in year c was assumed to be unrelated to both the length of time since onset and calendar time; f ^M_dc = ϕ _Mi f _dc, where ϕ _Mi was the proportion of multibacillary cases expected in the i ^th period (d < 1989, 1989 ≤ d ≤ 1993 and d ≥ 1994) and f _dc was the probability of being diagnosed in year d given that the onset of clinical symptoms occurred in year c. The same prior was used for all three ϕ _Mi, ϕ _Mi ∼ Beta(2, 2).

The existence of a relationship between the proportion of grade 2 disability cases and detection delay has been demonstrated previously [30]. This relationship was incorporated into the back-calculation model as a linear regression on detection delay. The probability of a new case with grade 2 disability being made in year d given that onset of symptoms happened in year c was taken to be f ^D_dc = min(ϕ _D + (d − c + 1)κ _D, 1)f _dc. Uniform priors were assumed for ϕ _D and κ _D, ranging from zero to 0.25 for both parameters.

The uncertainty about the parameters associated with the IPD and DDD was incorporated into the back-calculation via multiple imputation [31] to avoid problems arising from the lack of information in reported case data on the parameters of the IPD and DDD. Multiple MCMC chains were run, with each one using a randomly sampled set of parameters from a multivariate normal approximation to the posterior probability distribution of the IPD and DDD parameters. A total of 100 chains were run, with 50 samples being retained from each. Each chain was run for 250,000 iterations; after 25,000 adaptation and 150,000 burn-in samples one out of every 1,500 samples was retained. The resulting 5,000 samples were merged to provide an overall posterior probability distribution for the parameters of interest. The chains were given equal weight in the merging as they were expected to fit the data equally well.

Results

Time period distributions

The IPD and DDD were fitted as gamma distributions with a common rate parameter. In this way the distribution of the sub-clinical infection to detection period is also a gamma distribution with the same rate parameter and a shape parameter equal to the sum of the shape parameters of the two constituent distributions. Table 2 contains a summary of the posterior probability distributions of the three parameters. The parameters are strongly correlated with each other and these relationships were taken into account in the back-calculation. This was done through the use of a multivariate normal approximation to the joint posterior probability distribution of these parameters from which samples were drawn for the multiple imputation runs.

Table 2 Summary of posterior probability distributions of the shape parameters for the incubation period and detection delay distributions (α _i:o and α _o:d) and the common rate parameter (β)

Full size table

The posterior samples of the time period distribution parameters were used to generate samples of the gamma density for each curve at each time point. These were used to create Fig. 2, which shows the mode and 95 % highest posterior density interval (HPD95) of the two distributions. The modal value of the IPD was 3.8 years (HPD95: 2.10 to 5.64 years). For 95 % of infected individuals, onset occurs by 17.8 years after infection (modal value, HPD95: 15.2 to 22.3 years). Detection occurred for 95 % of clinical infections by 11.4 years post-onset (modal value, HPD95: 9.4 to 14.9 years). The mode of the sub-clinical infection to diagnosis period distribution was at 7.9 years (HPD95: 5.9 to 9.7 years), and 95 % of clinical infections were detected by 23.6 years after sub-clinical infection (modal value, HPD95: 20.4 to 29.2 years).

Trends in cases

Figure 3 shows the predicted trends in the numbers of new sub-clinical and clinical infections and reported cases for 1950–2025. The trends in sub-clinical infections and clinical infections are smooth with the clinical infection curve lagging the sub-clinical infection curve slightly.

The diagnosis hazard in any year is scaled by a diagnostic effort parameter, and this enables the trend in new diagnoses to follow the observed data points reasonably well (Fig. 1). The posterior probability distributions of the six effort parameters are in Fig. 4. During the period where leprosy care was being incorporated into the general health care system and workers were being trained (1972–1977), fewer diagnoses were being made than would be predicted based on the DDD. The modes of the parameter distributions for the other five intervals are above 1, indicating that more diagnoses are made than predicted just using the DDD estimated previously. In particular, the highest diagnostic effort/effectiveness was before 1972, when leprosy care joined the general health care system.

Figure 1 shows the predicted numbers of new cases, new multibacillary cases and new cases with grade 2 disability along with the observed numbers and the cut-offs for different periods of diagnostic effort. The observed values are quite well predicted by the analysis.

The scale which results from the large reduction in the number of new cases over recent decades prevents the variation of the posterior probability distribution being as clear as it would be in close-up, but there is an increase in variation after 2014, when the observations end.

Prevalence of infection

In line with the observed decline in new cases, the cumulative burden of undiagnosed infections, both sub-clinical and clinical, has also reduced over time (Fig. 5). As the incidence of diagnoses falls, so the uncertainty in numbers of undiagnosed sub-clinical and clinical infections reduces. The uncertainty in incidence of sub-clinical infection in the very recent past (see Fig. 3) contributes to slightly increasing uncertainty in prevalence in the very recent past. However, the uncertainty in the recent past is largely determined by the choice of σ in the random walk.

Forecasting of diagnoses

The back-calculation analysis was run excluding data from 2010–2014, in order that these could be predicted (Fig. 6). As with the analysis of the full data, 100 samples were taken from a multivariate normal approximation to the posterior probability distribution of the parameters of the IPD and DDD and included via multiple imputation. The predictions of new cases, new multibacillary cases and new cases with grade 2 disability show the correct trend, but a number of observations lie outside of the HPD95 of their predicted values.

Discussion

We have developed a back-calculation approach to reconstruct the past incidence of infection with leprosy in Thailand. This approach also predicts the numbers of individuals with sub-clinical and clinical leprosy who are yet to be diagnosed, and also future predictions of diagnoses of different types. We believe that these results demonstrate the value of this approach.