Spatio-temporal distribution of soil-transmitted helminth infections in Brazil

Background In Brazil, preventive chemotherapy targeting soil-transmitted helminthiasis is being scaled up. Hence, spatially explicit estimates of infection risks providing information about the current situation are needed to guide interventions. Available high-resolution national model-based estimates either rely on analyses of data restricted to a given period of time, or on historical data collected over a longer period. While efforts have been made to take into account the spatial structure of the data in the modelling approach, little emphasis has been placed on the temporal dimension.
Methods We extracted georeferenced survey data on the prevalence of infection with soil-transmitted helminths (i.e. Ascaris lumbricoides, hookworm and Trichuris trichiura) in Brazil from the Global Neglected Tropical Diseases (GNTD) database. Selection of the most important predictors of infection risk was carried out using a Bayesian geostatistical approach and temporal models that address non-linearity and correlation of the explanatory variables. The spatial process was estimated through a predictive process approximation. Spatio-temporal models were built on the selected predictors with integrated nested Laplace approximation using stochastic partial differential equations.
Results Our models revealed that, over the past 20 years, the risk of soil-transmitted helminth infection has decreased in Brazil, mainly because of the reduction of A. lumbricoides and hookworm infections. From 2010 onwards, we estimate that the infection prevalences with A. lumbricoides, hookworm and T. trichiura are 3.6%, 1.7% and 1.4%, respectively. We also provide a map highlighting municipalities in need of preventive chemotherapy, based on a predicted soil-transmitted helminth infection risk in excess of 20%. The need for treatments in the school-aged population at the municipality level was estimated at 1.8 million doses of anthelminthic tablets per year.
Conclusions The analysis of the spatio-temporal aspect of the risk of infection with soil-transmitted helminths contributes to a better understanding of the evolution of risk over time. Risk estimates provide the soil-transmitted helminthiasis control programme in Brazil with useful benchmark information for prioritising and improving spatial and temporal targeting of interventions.
Electronic supplementary material The online version of this article (doi:10.1186/1756-3305-7-440) contains supplementary material, which is available to authorized users.

, where $B - 1$ groups contain predictors which are considered highly correlated, with a Pearson correlation coefficient > 0.9, while the $B$th group includes predictors that exhibit only moderate correlation with other potential predictors. In addition, potential predictors presenting a non-linear association with the infection risk in exploratory analyses have been categorised, and we define $X^{(b)}_{lj_b}$ as the $l$th categorical form of predictor $X^{(b)}_{j_b}$, where $l = 1, ..., L$ categories (excluding baseline). Our variable selection procedure aims to select the most important predictors while accounting for spatial correlation, addressing non-linearity of the predictors, and forcing the model to choose at most one predictor among those considered highly correlated. To that end, we model a categorical temporal trend $T_{il}$ ($l = 1, ..., L$ categories), the potential predictors $X^{(b)}_{lj_b i}$ and a spatial random effect $\phi_i$ on the logit scale, such that

$$\mathrm{logit}(p_i) = \beta_0 + \sum_{l} \beta_{1l} T_{il} + \sum_{b} \sum_{j_b} \sum_{l} \alpha_{j_b} \xi_{lj_b} X^{(b)}_{lj_b i} + \phi_i,$$

where $p_i$ is the infection risk at location $i$ and the regression coefficients of the potential predictors $X_{j_b}$ are defined as the product of an overall contribution $\alpha_{j_b}$ and the effect $\xi_{lj_b}$ of each of its elements (i.e., categories).
Within a Bayesian framework of inference, we assign a spike and slab prior (Scheipl et al., 2012; Chammartin et al., 2013a,b) [...], where $a_\tau$ and $b_\tau$ are fixed parameters of a non-informative inverse-gamma distribution, set to 5 and 25, respectively, while $\upsilon_0$ is a small constant set to 0.00025, which shrinks $\alpha_{j_b}$ to zero when the predictor is excluded. The product of the two indicators [...], respectively, such that [...]. To allow greater flexibility in estimating model size, these probabilities are considered as hyper-parameters having non-informative beta and Dirichlet distributions; [...], which shrinks $\xi_{lj_b}$ towards $|1|$ (the multiplicative identity). For moderately correlated predictors, $\gamma_{2j_b}$ is fixed to 1, while the effect of linear predictors is defined only by an overall contribution $\alpha$. In addition, non-informative normal priors have been assigned to the constant $\beta_0$ and the effects $\beta_{1l}$ of the temporal trend: $\beta_0, \beta_{1l} \sim N(0, 100)$.
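The behaviour of such a prior can be illustrated with a minimal sketch. It assumes the normal mixture of inverse-gammas form, in which the prior variance of $\alpha_{j_b}$ is an inverse-gamma draw multiplied by 1 when the predictor is included and by $\upsilon_0$ when it is excluded. This form is an assumption made for illustration and may differ in detail from the prior actually used; the function and variable names are hypothetical.

```python
import random

A_TAU, B_TAU = 5.0, 25.0   # inverse-gamma shape and scale quoted in the text
UPSILON_0 = 0.00025        # small constant quoted in the text


def draw_alpha(included, rng):
    """Draw one coefficient from the ASSUMED spike and slab mixture."""
    # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ Inverse-Gamma(a, b).
    tau2 = 1.0 / rng.gammavariate(A_TAU, 1.0 / B_TAU)
    # Slab keeps the full variance; spike multiplies it by upsilon_0.
    scale = 1.0 if included else UPSILON_0
    return rng.gauss(0.0, (scale * tau2) ** 0.5)


def mean_abs(values):
    """Average absolute value, used as a simple measure of spread."""
    return sum(abs(v) for v in values) / len(values)


rng = random.Random(1)
slab = [draw_alpha(True, rng) for _ in range(5000)]
spike = [draw_alpha(False, rng) for _ in range(5000)]
print(mean_abs(slab), mean_abs(spike))
```

Under these assumptions, draws for an excluded predictor concentrate tightly around zero, while draws for an included predictor remain diffuse.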
The large matrix computation cost of estimating the latent spatial process $\phi$ is overcome with a predictive process approximation (Banerjee et al., 2008). In more detail, $\phi$ is estimated from a subset of 200 locations (knots) $\{s^*_k, k = 1, ..., K\}$ with latent observations $\phi^* = (\phi^*_1, ..., \phi^*_K)^T$, $\phi^* \sim MVN(0, \Sigma^*)$. $\Sigma^*$ is the $K \times K$ variance-covariance matrix modelled by an isotropic exponential correlation function of distance, i.e., $\Sigma^*_{cd} = \sigma^2_{sp} \exp(-\rho d_{cd})$, where $d_{cd}$ is the Euclidean distance between locations $c$ and $d$, $\sigma^2_{sp}$ is the geographical variability, and $\rho$ controls the rate of correlation decay. An inverse-gamma distribution, $\sigma^2_{sp} \sim IG(2.01, 1.01)$, is chosen for the variance $\sigma^2_{sp}$, and a gamma distribution is assumed for the spatial decay $\rho$, $\rho \sim G(0.01, 0.01)$. The spatial random effect $\phi$ at the original set of locations is predicted via the conditional mean $Q^T \Sigma^{*-1} \phi^*$, where $Q = \mathrm{Cov}(\phi^*, \phi)$ is the $K \times N$ matrix of covariances between the $K$ knots and the $N$ observed locations. Minimax space-filling sampling (Johnson et al., 1990; Diggle and Lophaven, 2006) is used to select the knots with the cover.design routine in R (The R Foundation for Statistical Computing, R v.3.0.2).
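The conditional mean $Q^T \Sigma^{*-1} \phi^*$ can be sketched on a toy example. The three knots, the knot values and the parameter values below are hypothetical, and the small Gaussian elimination routine only stands in for a proper linear algebra library.

```python
import math


def exp_cov(p, q, sigma2, rho):
    """Isotropic exponential covariance: sigma2 * exp(-rho * distance)."""
    return sigma2 * math.exp(-rho * math.dist(p, q))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def predict(knots, phi_star, sites, sigma2, rho):
    """Conditional mean Q^T Sigma*^{-1} phi* at each site."""
    sigma_star = [[exp_cov(c, d, sigma2, rho) for d in knots] for c in knots]
    weights = solve(sigma_star, phi_star)  # Sigma*^{-1} phi*
    return [
        sum(exp_cov(k, s, sigma2, rho) * w for k, w in zip(knots, weights))
        for s in sites
    ]


# Hypothetical knots, knot-level effects and prediction sites.
knots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
phi_star = [0.5, -1.0, 2.0]
sites = [(1.0, 0.0), (0.5, 0.5)]
pred = predict(knots, phi_star, sites, sigma2=1.0, rho=2.0)
print(pred)
```

A site that coincides with a knot recovers that knot's value, which provides a quick sanity check of the interpolation.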
Geostatistical variable selection was run in JAGS 3.4.0 through the rjags library of R (The R Foundation for Statistical Computing, v.3.0.2), with a one-chain sampler and 40,000 iterations (including a burn-in of 10,000 iterations). The final 10,000 iterations were used to calculate the posterior probabilities of the models, and the subsets of variables included in the models with the highest posterior probabilities identified the final models.
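As an illustration of this last step, posterior model probabilities can be approximated by the relative frequency with which each combination of inclusion indicators appears among the retained draws. The sketch below uses made-up draws for three hypothetical candidate predictors and is not the code used in the analysis.

```python
from collections import Counter


def model_posterior_probabilities(indicator_draws):
    """Relative frequency of each visited model (tuple of 0/1 inclusion flags)."""
    counts = Counter(tuple(draw) for draw in indicator_draws)
    total = sum(counts.values())
    return {model: n / total for model, n in counts.most_common()}


# Hypothetical retained draws: one tuple of indicators per iteration.
draws = [(1, 0, 1)] * 6 + [(1, 0, 0)] * 3 + [(0, 0, 1)] * 1
probs = model_posterior_probabilities(draws)
best_model = max(probs, key=probs.get)
print(best_model, probs[best_model])
```

The model with the highest frequency identifies the subset of predictors carried forward.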

Bayesian spatio-temporal model formulation
Our Bayesian spatio-temporal model formulation follows the approach introduced by Cameletti et al. (2013). In particular, we define $Y_{it}$, $n_{it}$ and $p_{it}$ as the number of infected individuals, the number of individuals screened, and the prevalence of infection at location $i$ ($i = 1, ..., N$) for time $t$ ($t = 1, ..., T$), and we assume $Y_{it}$ to be generated from a binomial distribution, i.e., $Y_{it} \sim \mathrm{Bin}(p_{it}, n_{it})$. Prevalence of infection is then linearly regressed on the logit scale as follows: $\mathrm{logit}(p_{it}) = X_{it}^T \beta + \phi_{it}$, where $X$ is the matrix of explanatory variables (including an intercept, a temporal trend, and the predictors selected by the variable selection), $\beta$ is the regression coefficient vector, and $\phi$ is a spatio-temporally structured random effect. We allow the spatio-temporal process $\phi_{it}$ to change in time with a first-order autoregressive process (AR1), such that

$$\phi_{it} = a \, \phi_{i,t-1} + \omega_{it},$$

with a temporal autoregressive coefficient $a$, $|a| < 1$, and a temporally independent, spatially structured effect $\omega$, which is assumed to be multivariate normal with zero mean and spatio-temporal covariance function of the Matérn family:

$$\mathrm{Cov}(\omega_{it}, \omega_{jt'}) = \begin{cases} 0 & \text{if } t \neq t' \\ \sigma^2_\omega \, C(d_{ij}) & \text{if } t = t' \end{cases}$$

where $\sigma^2_\omega$ is the variance of the structured effect $\omega$ ($\sigma^2_\omega = \mathrm{Var}(\omega_{it})$). The spatial correlation function $C(d_{ij})$ is a function of the Euclidean distance $d_{ij}$ between locations $i$ and $j$ and is defined by the Matérn function:

$$C(d_{ij}) = \frac{1}{\Gamma(\upsilon) \, 2^{\upsilon - 1}} (\kappa d_{ij})^{\upsilon} K_{\upsilon}(\kappa d_{ij}),$$

where $K_\upsilon$ is the modified Bessel function of the second kind and order $\upsilon$ ($\upsilon > 0$), $\upsilon$ is a smoothing parameter fixed to 1, and $\kappa$ ($\kappa > 0$) is a scaling parameter controlling the rate of correlation decay. The spatial range is defined as the distance at which the spatial correlation between locations falls to approximately 10% and is given by $\sqrt{8\upsilon}/\kappa$.
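The Matérn correlation and the associated range can be checked numerically. The sketch below assumes the standard form $C(d) = (\kappa d)^{\upsilon} K_{\upsilon}(\kappa d) / (\Gamma(\upsilon) 2^{\upsilon - 1})$ and the range convention $\sqrt{8\upsilon}/\kappa$ common in the SPDE literature. The Bessel function is evaluated with a simple trapezoid rule on its integral representation, and the value of $\kappa$ is hypothetical.

```python
import math


def bessel_k(order, x, steps=4000, upper=20.0):
    """Modified Bessel function of the second kind, via
    K_order(x) = integral_0^inf exp(-x cosh t) cosh(order t) dt."""
    h = upper / steps
    total = 0.5 * math.exp(-x)  # integrand at t = 0, half weight
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(order * t)
    return total * h


def matern_corr(d, kappa, nu=1.0):
    """Matern correlation at distance d (equals 1 at d = 0)."""
    if d == 0.0:
        return 1.0
    x = kappa * d
    return x ** nu * bessel_k(nu, x) / (math.gamma(nu) * 2.0 ** (nu - 1.0))


kappa = 0.5                                # hypothetical scaling parameter
spatial_range = math.sqrt(8.0 * 1.0) / kappa
corr_at_range = matern_corr(spatial_range, kappa)
print(spatial_range, corr_at_range)
```

With $\upsilon = 1$ the correlation at this distance is on the order of 0.1, regardless of the value chosen for $\kappa$.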