 Research
 Open Access
Bayesian geostatistical modelling of soil-transmitted helminth survey data in the People’s Republic of China
Parasites & Vectors volume 6, Article number: 359 (2013)
Abstract
Background
Soil-transmitted helminth infections affect tens of millions of individuals in the People’s Republic of China (P.R. China). There is a need for high-resolution estimates of at-risk areas and number of people infected to enhance spatial targeting of control interventions. However, such information is not yet available for P.R. China.
Methods
A georeferenced database compiling surveys pertaining to soil-transmitted helminthiasis, carried out from 2000 onwards in P.R. China, was established. Bayesian geostatistical models relating the observed survey data with potential climatic, environmental and socioeconomic predictors were developed and used to predict at-risk areas at high spatial resolution. Predictors were extracted from remote sensing and other readily accessible open-source databases. Advanced Bayesian variable selection methods were employed to develop a parsimonious model.
Results
Our results indicate that the prevalence of soil-transmitted helminth infections in P.R. China considerably decreased from 2005 onwards. Yet, some 144 million people were estimated to be infected in 2010. High prevalence (>20%) of the roundworm Ascaris lumbricoides infection was predicted for large areas of Guizhou province and the southern part of Hubei and Sichuan provinces, while the northern part and the south-eastern coastline areas of P.R. China had low prevalence (<5%). High infection prevalence (>20%) with hookworm was found in Hainan, the eastern part of Sichuan and the southern part of Yunnan provinces. High infection prevalence (>20%) with the whipworm Trichuris trichiura was found in a few small areas of south P.R. China. Very low prevalence (<0.1%) of hookworm and whipworm infections was predicted for the northern parts of P.R. China.
Conclusions
We present the first model-based estimates for soil-transmitted helminth infections throughout P.R. China at high spatial resolution. Our prediction maps provide useful information for the spatial targeting of soil-transmitted helminthiasis control interventions and for long-term monitoring and surveillance in the frame of enhanced efforts to control and eliminate the public health burden of these parasitic worm infections.
Background
Soil-transmitted helminths are a group of parasitic nematode worms causing human infection through contact with parasite eggs (Ascaris lumbricoides and Trichuris trichiura) or larvae (hookworm) that thrive in the warm and moist soil of the world’s tropical and subtropical countries [1]. More than 5 billion people are at risk of soil-transmitted helminthiasis [2]. Estimates published in 2003 suggest that 1,221 million people were infected with A. lumbricoides, 795 million with T. trichiura and 740 million with hookworms [3]. The greatest number of soil-transmitted helminth infections at that time occurred in the Americas, the People’s Republic of China (P.R. China), East Asia and sub-Saharan Africa [4]. Socioeconomic development and large-scale control efforts have lowered the number of people infected with soil-transmitted helminths in many parts of the world [1]. For the year 2010, the global burden due to soil-transmitted helminthiasis has been estimated at 5.2 million disability-adjusted life years [5].
In P.R. China, there have been two national surveys for parasitic diseases, including soil-transmitted helminthiasis. Both surveys used the Kato-Katz technique as the diagnostic approach, based on a single Kato-Katz thick smear obtained from one stool sample per individual. The first national survey was conducted from 1988 to 1992 and the second in 2001-2004. In the first survey, there were a total of 2,848 study sites with approximately 500 people examined per site. The survey indicated overall prevalences of 47.0%, 18.8% and 17.2% for A. lumbricoides, T. trichiura and hookworm infections, respectively, corresponding to 531 million, 212 million and 194 million infected people, respectively [6]. The second survey involved 687 study sites and there were 356,629 individuals examined overall. Analyses of the data revealed considerably lower prevalences for soil-transmitted helminth infections than in the first survey; A. lumbricoides, hookworm and T. trichiura prevalences were 12.7%, 6.1% and 4.6%, respectively [7]. However, interventions were less likely to reach marginalized communities in the poorest areas [8] and the diseases re-emerged whenever control measures were discontinued [9, 10]. To overcome the challenge of parasite infections in P.R. China, in 2005, the Chinese Ministry of Health issued the “National Control Program on Important Parasitic Diseases from 2006 to 2015” with its target to reduce the prevalence of helminth infections by 70% by the year 2015 [8]. The key strategy for control was large-scale administration of anthelminthic drugs in high prevalence areas, especially targeting school-aged children and people living in rural areas [9, 11].
Maps depicting the geographical distribution of the disease risk can aid control programmes to deliver cost-effective interventions and assist in monitoring and evaluation. The Coordinating Office of the National Survey on the Important Human Parasitic Diseases in P.R. China [7] obtained prevalence maps by averaging the data of the second national survey within each province. To our knowledge, high-resolution, model-based maps using available national survey data are not available to date in P.R. China. Model-based geostatistics predict the disease prevalence at places without observed data by quantifying the relation between the disease risk at observed locations with potential predictors such as socioeconomic, environmental, climatic and ecological information, the latter often obtained via remote sensing. Model-based geostatistics have been used before to map and predict the geographical distribution of soil-transmitted helminth infections in Africa [12, 13], Asia and Latin America [14–16]. Model-based geostatistics typically employ regression analysis with random effects at the locations of the observed data. The random effects are assumed to be latent observations from a zero-mean Gaussian process, which models spatial correlation to the data via a spatially structured covariance. Bayesian formulations enable model fit via Markov chain Monte Carlo (MCMC) simulation algorithms [17, 18] or other computational algorithms (e.g. integrated nested Laplace approximations (INLA) [19]). INLA is a computational approach for Bayesian inference and is an alternative to MCMC to overcome computational burden for obtaining the approximated posterior marginal distribution for the latent variables, as well as for the hyperparameters [20].
In this study, we aimed to: (i) identify the most important climatic, environmental and socioeconomic determinants of soil-transmitted helminth infections; and (ii) develop model-based Bayesian geostatistics to assess the geographical distribution and number of people infected with soil-transmitted helminths in P.R. China.
Methods
Ethical considerations
The work presented here is based on soil-transmitted helminth survey data derived from the second national survey and additional studies identified through an extensive review of the literature. All data in our study were extracted from published sources and are aggregated over villages, towns or counties; they therefore do not contain information that is identifiable at individual or household level. Hence, there are no specific ethical considerations.
Disease data
Georeferenced data on soil-transmitted helminth infections from the second national survey conducted in P.R. China from 2001 to 2004 were provided by the National Institute of Parasitic Diseases, Chinese Center for Diseases Control and Prevention (IPD, China CDC; Shanghai, P.R. China). Moreover, an extensive literature search was undertaken in PubMed and China National Knowledge Internet (CNKI) from January 1, 2000 until April 25, 2013 to identify studies reporting village, town and county-level prevalence data of soil-transmitted helminth infections in P.R. China. Data were excluded if (i) they were from hospital surveys, post-intervention surveys, drug efficacy studies or clinical trials; (ii) they were reports on disease infection among travellers, military personnel, expatriates, mobile populations and other displaced or migrating populations; (iii) the geographical coordinates could not be identified; or (iv) the diagnostic technique was not reported [21]. Data were entered into the Global Neglected Tropical Diseases (GNTD) database, which is a georeferenced, open-access source [21]. Geographical coordinates for the survey locations were obtained via Google maps, a free web mapping service application and technology system. As we focus on recent data pertaining to soil-transmitted helminth infections in P.R. China, we only considered surveys carried out from 2000 onwards.
Climatic, demographic and environmental data
Climatic, demographic and environmental data were downloaded from different readily accessible remote sensing data sources, as shown in Table 1. Land surface temperature (LST) and normalized difference vegetation index (NDVI) were summarised as annual averages, and land cover data were summarised to the most frequent category over the period 2001-2004. Moreover, land cover data were regrouped into six categories based on between-class similarities: (i) forest; (ii) shrubland and savanna; (iii) grassland; (iv) cropland; (v) urban; and (vi) wet areas. Monthly precipitation values were averaged to obtain a long-term average for the period 1950-2000. Four climatic zones were considered: (i) equatorial; (ii) arid; (iii) warm; and (iv) snow/polar. The following 13 soil types, which may be related to the viability of parasites or microorganisms living in the soil, were used: (i) percentage of coarse fragments (CFRAG, % >2 mm); (ii) percentage of sand (SDTO, mass %); (iii) percentage of silt (STPC, mass %); (iv) percentage of clay (CLPC, mass %); (v) bulk density (BULK, kg/dm^{3}); (vi) available water capacity (TAWC, cm/m); (vii) base saturation as percentage of ECEsoil (BSAT); (viii) pH measured in water (PHAQ); (ix) gypsum content (GYPS, g/kg); (x) organic carbon content (TOTC, g/kg); (xi) total nitrogen (TOTN, g/kg); (xii) FAO texture class (PSCL); and (xiii) FAO soil drainage class (DRAIN). Human influence index (HII) was included in the analysis to capture direct human influence on ecosystems [22]. Urban/rural extent was considered as a binary indicator. Gross domestic product (GDP) per capita was used as a proxy of people’s socioeconomic status. We obtained GDP per capita for each county from the P.R. China Yearbook full-text database in 2008.
Moderate Resolution Imaging Spectroradiometer (MODIS) Reprojection Tool version 4.1 (EROS; Sioux Falls, USA) was applied to process MODIS/Terra data. All remotely sensed data were aligned over a prediction grid of 5 × 5 km spatial resolution using Visual Fortran version 6.0 (Digital Equipment Corporation; Maynard, USA). Data at the survey locations were also extracted in Visual Fortran. As the outcome of interest (i.e. infection prevalence with a specific soil-transmitted helminth species) is not available at the resolution of the covariates for surveys aggregated over counties, we linked the centroid of those counties with the average value of each covariate within the counties. Distances to the nearest water bodies were calculated using ArcGIS version 9.3 (ESRI; Redlands, USA). For county-level surveys, the distances of all the 5 × 5 km pixel centroids to their nearest water bodies within the county were extracted and averaged. The arithmetic mean was used as a summary measure of continuous data, while the most frequent category was used to summarise categorical variables.
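The county-level summarisation described above can be illustrated with a minimal Python sketch. It is not the authors' Visual Fortran or ArcGIS workflow; the function name `summarise_county`, the covariate names and the pixel values are hypothetical and serve only to show the rule "arithmetic mean for continuous covariates, most frequent category for categorical ones".

```python
from collections import Counter
from statistics import mean


def summarise_county(pixels, continuous_keys, categorical_keys):
    """Collapse the pixel-level covariates of one county into a single record."""
    summary = {}
    for key in continuous_keys:
        # Continuous covariates: arithmetic mean over all pixels in the county.
        summary[key] = mean(p[key] for p in pixels)
    for key in categorical_keys:
        # Categorical covariates: most frequent category among the pixels.
        counts = Counter(p[key] for p in pixels)
        summary[key] = counts.most_common(1)[0][0]
    return summary


# Hypothetical 5 x 5 km pixels falling inside one county.
pixels = [
    {"elevation_m": 400.0, "water_dist_km": 2.0, "land_cover": "cropland"},
    {"elevation_m": 500.0, "water_dist_km": 4.0, "land_cover": "cropland"},
    {"elevation_m": 600.0, "water_dist_km": 6.0, "land_cover": "forest"},
]
county = summarise_county(
    pixels, ["elevation_m", "water_dist_km"], ["land_cover"]
)
print(county)
```

The resulting record would then be attached to the county centroid, in line with the linkage described in the text.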
Statistical analysis
The survey year was grouped into two categories: before 2005 and from 2005 onwards. Land cover, climatic zones, soil texture and soil drainage were included in the model as categorical covariates. Continuous variables were standardised to mean 0 and standard deviation 1 using the command “std()” in Stata version 10 (Stata Corp. LP; College Station, USA). Pearson’s correlation was calculated between continuous variables. Where two variables had a correlation coefficient greater than 0.8, one of them was dropped to avoid collinearity [23]. Preliminary analysis indicated that, for this dataset, three categories were sufficient to capture non-linearity of continuous variables; we therefore constructed 3-level categorical variables based on their distribution. Subsequent variable selection incorporated within the geostatistical model selected the most probable functional form (linear vs. categorical). Bivariate and multivariate logistic regressions were carried out in Stata version 10.
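As an illustration of the standardisation and collinearity screening described above, the following Python sketch reproduces the two steps on made-up data. It is not the Stata code used in the study. Several choices are assumptions made for the example: the sample standard deviation is used for scaling, the absolute value of the correlation is compared with the threshold, and the first variable of a correlated pair is the one retained (the text does not state which variable was dropped).

```python
from math import sqrt
from statistics import mean, stdev


def standardise(values):
    """Rescale a covariate to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(covariates, threshold=0.8):
    """Keep a covariate only if |r| <= threshold with every covariate kept so far.

    Assumption: the earlier variable in the dict is retained.
    """
    kept = []
    for name in covariates:
        if all(
            abs(pearson(covariates[name], covariates[k])) <= threshold
            for k in kept
        ):
            kept.append(name)
    return kept


# Hypothetical covariate values at five survey locations.
covariates = {
    "lst_day": [20.0, 22.0, 24.0, 26.0, 28.0],
    "lst_night": [10.0, 12.1, 13.9, 16.0, 18.2],  # nearly collinear with lst_day
    "precipitation": [900.0, 400.0, 1200.0, 700.0, 1000.0],
}
kept = drop_collinear(covariates)
z_scores = {name: standardise(covariates[name]) for name in kept}
print(kept)
```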
Bayesian geostatistical logistic regression models with location-specific random effects were fitted to obtain spatially explicit soil-transmitted helminth infection estimates. Let $Y_i$, $n_i$ and $p_i$ be the number of positive individuals, the number of those examined and the probability of infection at location $i$ ($i = 1, 2, \ldots, L$), respectively. We assume that $Y_i$ arises from a binomial distribution, $Y_i \sim \mathrm{Bn}(p_i, n_i)$, where $\mathrm{logit}(p_i) = \beta_0 + \sum_{k=1} \beta_k \times X_i^{(k)} + \epsilon_i + \phi_i$. Here $\beta_k$ is the regression coefficient of the $k$th covariate $X_i^{(k)}$, $\epsilon_i$ is a location-specific random effect and $\phi_i$ is an exchangeable non-spatial random effect. To estimate the parameters, we formulate our model in a Bayesian framework. We assumed that $\epsilon = (\epsilon_1, \ldots, \epsilon_L)$ followed a zero-mean multivariate normal distribution, $\epsilon \sim \mathrm{MVN}(0, \Sigma)$, with Matérn covariance function $\Sigma_{ij} = \sigma_{sp}^{2} (\kappa d_{ij})^{\upsilon} K_{\upsilon}(\kappa d_{ij}) / (\Gamma(\upsilon) 2^{\upsilon - 1})$, where $d_{ij}$ is the Euclidean distance between locations $i$ and $j$, $\kappa$ is a scaling parameter, $\upsilon$ is a smoothing parameter fixed to 1 and $K_{\upsilon}$ denotes the modified Bessel function of the second kind and order $\upsilon$. The spatial range, $\rho = \sqrt{8}/\kappa$, is the distance at which spatial correlation becomes negligible (approximately 0.1) [24].
We assumed that $\phi_i$ follows a zero-mean normal distribution, $\phi_i \sim N(0, \sigma_{nonsp}^{2})$. A normal prior distribution was assigned to the regression coefficients, that is $\beta_0, \beta_k \sim N(0, 1000)$, and log-gamma priors were adopted for the precision parameters $\tau_{sp} = 1/\sigma_{sp}^{2}$ and $\tau_{nonsp} = 1/\sigma_{nonsp}^{2}$ on the log scale, that is $\log(\tau_{sp}) \sim \text{log-gamma}(1, 0.00005)$ and $\log(\tau_{nonsp}) \sim \text{log-gamma}(1, 0.00005)$. Furthermore, we assumed the following prior distribution for the range parameter: $\log(\rho) \sim \text{log-gamma}(1, 0.01)$.
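To make the role of the range parameter concrete, the sketch below evaluates the Matérn correlation for the smoothness value 1 used in the model, using only the Python standard library. The Bessel function is obtained by numerically integrating the standard representation K_1(x) = ∫ exp(-x cosh t) cosh t dt over t from 0 to infinity; the integration settings and the value of `kappa` are assumptions chosen for illustration, not estimates from the paper.

```python
import math

NU = 1.0  # smoothing parameter, fixed to 1 as in the model description


def bessel_k1(x, upper=12.0, steps=4000):
    """Modified Bessel function of the second kind, order 1, for x > 0.

    Trapezoid rule on K_1(x) = integral_0^inf exp(-x cosh t) cosh t dt,
    truncated at t = upper (adequate for moderate x, not for x near 0).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * h


def matern_correlation(d, kappa):
    """Matérn correlation (covariance divided by sigma_sp^2) at distance d."""
    if d == 0.0:
        return 1.0
    x = kappa * d
    return x**NU * bessel_k1(x) / (math.gamma(NU) * 2.0 ** (NU - 1.0))


kappa = 0.02  # hypothetical scaling parameter, in 1/km
rho = math.sqrt(8.0) / kappa  # spatial range implied by kappa
for d in (0.0, rho / 2.0, rho, 2.0 * rho):
    print(f"d = {d:7.1f} km  correlation = {matern_correlation(d, kappa):.3f}")
```

Under these assumptions the correlation at distance ρ comes out at roughly 0.14, which is the sense in which ρ marks the distance beyond which spatial correlation is small.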
The most widely used computational approach for Bayesian geostatistical model fit is MCMC simulation. However, large spatial covariance matrix calculations can increase computational time and possibly introduce numerical errors. Hence, we fitted the geostatistical model using the stochastic partial differential equations (SPDE)/INLA [19, 25] approach, readily implemented in the INLA R-package (available at: http://www.r-inla.org). Briefly, the spatial process assuming a Matérn covariance matrix Σ can be represented as a Gaussian Markov random field (GMRF) with mean zero and a symmetric positive definite precision matrix Q (defined as the inverse of Σ) [20]. The SPDE approach constructs a GMRF representation of the Matérn field on a triangulation (a set of non-intersecting triangles where any two triangles meet in at most a common edge or corner) partitioning the domain of the study region [25]. Subsequently, the INLA algorithm is used to estimate the posterior marginal (or joint) distribution of the latent Gaussian process and hyperparameters by Laplace approximation [19].
Bayesian variable selection, using normal mixture of inverse Gammas with parameter expansion (peNMIG) spike-and-slab priors [26], was applied to the model with an independent random effect for each location to identify the best set of predictors (i.e. climatic, environmental and socioeconomic). In particular, we assumed a normal distribution for the regression coefficients, with the variance hyperparameter $\sigma_B^2$ following a mixture of inverse Gamma distributions, that is $\beta_k \sim N(0, \sigma_B^2)$, where $\sigma_B^2 \sim I_k\,\mathrm{IG}(a_\sigma, b_\sigma) + (1 - I_k)\,\upsilon_0\,\mathrm{IG}(a_\sigma, b_\sigma)$ and $a_\sigma$ and $b_\sigma$ are fixed parameters. $\upsilon_0$ is some small positive constant [27] and the indicator $I_k$ has a Bernoulli prior distribution, $I_k \sim \mathrm{bern}(\pi_k)$, where $\pi_k \sim \mathrm{beta}(a_\pi, b_\pi)$. We set $(a_\sigma, b_\sigma) = (5, 25)$, $(a_\pi, b_\pi) = (1, 1)$ and $\upsilon_0 = 0.00025$. The above prior of mixed inverse Gamma distributions is called a mixed spike and slab prior for $\beta_k$, as one component of the mixture, $\upsilon_0\,\mathrm{IG}(a_\sigma, b_\sigma)$ (when $I_k = 0$), is a narrow spike around zero that strongly shrinks $\beta_k$ to zero, while the other component, $\mathrm{IG}(a_\sigma, b_\sigma)$ (when $I_k = 1$), is a wide slab that moves $\beta_k$ away from zero. The posterior distribution of $I_k$ determines which component of the mixture is predominant, contributing to the inclusion or exclusion of $\beta_k$. For categorical variables, we applied a peNMIG prior developed by Scheipl et al. [26], which allows blocks of coefficients to be included or excluded jointly by improving “shrinkage” properties. Let $\beta_{kh}$ be the regression coefficient for the $h$th category of the $k$th predictor; then $\beta_{kh} = a_k \xi_{hk}$, where $a_k$ is assigned the NMIG prior described above and $\xi_{hk} \sim N(m_{hk}, 1)$. Here $m_{hk} = o_{hk} - (1 - o_{hk})$ and $o_{hk} \sim \mathrm{bern}(0.5)$, which shrinks $|\xi_{hk}|$ towards 1.
Hence, $a_k$ models the overall contribution of the $k$th predictor and $\xi_{hk}$ estimates the effects of each element $\beta_{kh}$ of the predictor [27]. In addition, we introduced another indicator $I_d$ for selection of either a categorical or a linear form of a continuous variable. Let $\beta_{kd1}$ and $\beta_{kd2}$ indicate coefficients of the categorical and linear form of the $k$th predictor, respectively; then $\beta_k = I_d \beta_{kd1} + (1 - I_d) \beta_{kd2}$, where $I_d \sim \mathrm{bern}(0.5)$. MCMC simulation was employed to estimate the model parameters for variable selection in OpenBUGS version 3.0.2 (Imperial College and Medical Research Council; London, UK) [28]. Convergence was assessed by the Gelman and Rubin diagnostics [29], using the coda library in R [30]. In Bayesian variable selection, all models arising from any combination of covariates are fitted and the posterior probability for each model to be the true one is calculated. The predictors corresponding to the highest joint posterior probability of indicators $(I_1, I_2, \ldots, I_k, \ldots, I_K)$ were subsequently used as the best set of predictors to fit the final geostatistical model.
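The behaviour of the spike and slab components, and the way a "most probable model" is read off from sampled indicators, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the OpenBUGS implementation used in the study: the inverse Gamma is taken to be parameterised so that the reciprocal of a draw follows a Gamma distribution with shape a_σ and rate b_σ, the function names are hypothetical, and the indicator draws passed to `most_probable_model` are invented.

```python
import random
from collections import Counter

A_SIGMA, B_SIGMA = 5.0, 25.0  # (a_sigma, b_sigma) as stated in the text
V0 = 0.00025                  # small constant scaling the spike component


def draw_inverse_gamma(rng, shape, rate):
    """Draw from IG(shape, rate) as the reciprocal of a Gamma(shape, rate) draw.

    random.gammavariate takes a scale parameter, hence scale = 1 / rate.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def draw_beta_given_indicator(rng, included):
    """Prior draw of a coefficient from the slab (included) or the spike."""
    variance = draw_inverse_gamma(rng, A_SIGMA, B_SIGMA)
    if not included:
        variance *= V0  # the spike: same IG draw, shrunk by v0
    return rng.gauss(0.0, variance ** 0.5)


def most_probable_model(indicator_draws):
    """Return the most frequent indicator vector and its relative frequency."""
    counts = Counter(tuple(draw) for draw in indicator_draws)
    model, n = counts.most_common(1)[0]
    return model, n / len(indicator_draws)


rng = random.Random(1)
slab = [abs(draw_beta_given_indicator(rng, True)) for _ in range(2000)]
spike = [abs(draw_beta_given_indicator(rng, False)) for _ in range(2000)]
print("mean |beta| under slab :", sum(slab) / len(slab))
print("mean |beta| under spike:", sum(spike) / len(spike))

# Invented posterior draws of (I_1, I_2, I_3) from four MCMC iterations.
draws = [(1, 0, 1), (1, 0, 1), (1, 1, 1), (0, 0, 1)]
print(most_probable_model(draws))
```

In a real analysis the indicator draws would come from the MCMC output, and the relative frequency of the winning vector plays the role of the model posterior probabilities reported in the Results.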
A 5 × 5 km grid was overlaid to the P.R. China map, resulting in 363,377 pixels. Predictions for each soil-transmitted helminth species were obtained via INLA at the centroids of the grid’s pixels. An overall soil-transmitted helminth prevalence was calculated assuming independence in the risk between any two species, that is, $p_S = p_A + p_T + p_h - p_A \times p_T - p_A \times p_h - p_T \times p_h + p_A \times p_T \times p_h$, where $p_S$, $p_A$, $p_T$ and $p_h$ indicate the predicted prevalence of overall soil-transmitted helminth, A. lumbricoides, T. trichiura and hookworm, respectively, for each pixel. The number of infected individuals at pixel level was estimated by multiplying the median of the corresponding posterior predictive distribution of the infection prevalence with the population density.
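The combined prevalence formula is the inclusion-exclusion expansion of 1 − (1 − p_A)(1 − p_T)(1 − p_h), i.e. one minus the probability of being free of all three species under independence. A short Python sketch makes the equivalence explicit; the pixel prevalences and the population figure are hypothetical values chosen for the example.

```python
def combined_prevalence(p_a, p_t, p_h):
    """Probability of at least one infection, assuming independent species."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_t) * (1.0 - p_h)


def combined_prevalence_expanded(p_a, p_t, p_h):
    """Same quantity written as the inclusion-exclusion sum given in the text."""
    return (
        p_a + p_t + p_h
        - p_a * p_t - p_a * p_h - p_t * p_h
        + p_a * p_t * p_h
    )


def infected_count(prevalence, population):
    """Expected number of infected individuals in a pixel."""
    return prevalence * population


# Hypothetical pixel: 20% A. lumbricoides, 10% T. trichiura, 5% hookworm.
p_s = combined_prevalence(0.20, 0.10, 0.05)
print(p_s, combined_prevalence_expanded(0.20, 0.10, 0.05))
print(infected_count(p_s, 1000))  # hypothetical pixel population of 1,000
```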
Model validation
Our model was fitted on a subset of the data, including approximately 80% of survey locations. Validation was performed on the remaining 20% by estimating the mean predictive error (ME) between the observed prevalence $\pi_i$ and the predicted prevalence $\hat{\pi}_i$ at location $i$, where $\mathit{ME} = \frac{1}{N} \sum_{i=1}^{N} (\pi_i - \hat{\pi}_i)$ and $N$ is the total number of test locations. In addition, we calculated Bayesian credible intervals (BCI) of various probability and the percentages of observations included in these intervals.
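The two validation summaries can be written down in a few lines of Python. The sketch below uses invented observed and predicted prevalences at four test locations; the function names are hypothetical. With the sign convention of the text (observed minus predicted), a positive ME indicates that the model underestimates prevalence on average.

```python
def mean_error(observed, predicted):
    """Mean predictive error: average of (observed - predicted) prevalence."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


def bci_coverage(observed, lower, upper):
    """Share of test locations whose observed prevalence lies inside the BCI."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)


# Invented values for four test locations.
observed = [0.10, 0.05, 0.20, 0.00]
predicted = [0.08, 0.05, 0.15, 0.01]
lower_95 = [0.05, 0.02, 0.05, 0.00]
upper_95 = [0.12, 0.04, 0.25, 0.03]

print("ME:", mean_error(observed, predicted))
print("95% BCI coverage:", bci_coverage(observed, lower_95, upper_95))
```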
Results
Data summaries
The final dataset included 1,187 surveys for hookworm infection carried out at 1,067 unique locations; 1,157 surveys for A. lumbricoides infection at 1,052 unique locations; and 1,138 surveys for T. trichiura infection at 1,028 unique locations. The overall prevalence was 9.8%, 6.6% and 4.1% for A. lumbricoides, hookworm and T. trichiura infection, respectively. Details about the number of surveys by location type, study year, diagnostic method and infection prevalence are shown in Table 2. The geographical distribution of locations and observed prevalence for each soil-transmitted helminth species are shown in Figure 1. Maps of the spatial distribution of environmental/climatic, soil types and socioeconomic covariates used in Bayesian variable selection are provided in Additional file 1: Figure S1.
Spatial statistical modelling and variable selections
The models with the highest posterior probabilities selected the following covariates: GDP per capita, elevation, NDVI, LST at day, LST at night, precipitation, pH measured in water, and climatic zones for T. trichiura; GDP per capita, elevation, NDVI, LST at day, LST at night, precipitation, bulk density, gypsum content, organic carbon content, climatic zone and land cover for hookworm; and GDP per capita, elevation, NDVI, LST at day and climatic zone for A. lumbricoides. The corresponding posterior probabilities of the respective models were 33.2%, 23.6% and 21.4% for T. trichiura, hookworm and A. lumbricoides, respectively.
The parameter estimates that arose from the Bayesian geostatistical logistic regression fit are shown in Tables 3, 4 and 5. The infection risk of all three soil-transmitted helminth species decreased considerably from 2005 onwards. We found a significant positive association between NDVI and the prevalence of A. lumbricoides. A negative association was found between GDP per capita, arid or snow/polar climatic zones and the prevalence of A. lumbricoides. High precipitation and LST at night are favourable conditions for the presence of hookworm, while high NDVI, LST at day, urban or wet land covers and arid or snow/polar climatic zones are less favourable. Elevation, LST at night, NDVI larger than 0.45 and equatorial climatic zone were associated with higher odds of T. trichiura infection, while LST at day and arid or snow climatic zones were associated with lower odds of T. trichiura infection.
Model validation results
Model validation indicated that the Bayesian geostatistical logistic regression models correctly estimated, within a 95% BCI, the prevalence at 84.2%, 81.5% and 79.3% of test locations for T. trichiura, hookworm and A. lumbricoides, respectively. A plot of coverage for the full range of credible intervals is presented in Additional file 2: Figure S2. The MEs for hookworm, A. lumbricoides and T. trichiura were 0.56%, 1.7% and 2.0%, respectively, suggesting that our model may slightly underestimate the risk of each of the soil-transmitted helminth species.
Predictive risk maps of soiltransmitted helminth infections
Figures 2, 3 and 4 present species-specific predictive risk maps of soil-transmitted helminth infections for the period 2005 onwards. High prevalence of A. lumbricoides (>20%) was predicted in large areas of Guizhou province and the southern part of Sichuan and Hubei provinces. Moderate to high prevalence (5-20%) was predicted for large areas of Hunan, Yunnan, Jiangxi, some southern areas of Gansu and Anhui provinces and Chongqing city. For the northern part of P.R. China and the south-eastern coastline areas, low prevalences were predicted (<5%). The high prediction uncertainty shown in Figure 2B is correlated with high prevalence areas. High infection prevalence (>20%) with T. trichiura was predicted for a few small areas of the southern part of P.R. China. Moderate-to-high prevalence (5-20%) was predicted for large areas of Hainan province. High hookworm infection prevalence (>20%) was predicted for Hainan, eastern parts of Sichuan and southern parts of Yunnan provinces. Low prevalence (0.1-5%) of T. trichiura and hookworm infections was predicted for most areas of the southern part of P.R. China, while close to zero prevalence areas were predicted for the northern part.
Estimates of number of people infected
Figure 5 shows the combined soil-transmitted helminth prevalence and the number of infected individuals from 2005 onwards. Table 6 summarises the population-adjusted predicted prevalence and the number of infected individuals, stratified by province. The overall population-adjusted predicted prevalence of A. lumbricoides, hookworm and T. trichiura infections were, respectively, 6.8%, 3.7% and 1.8%, corresponding to 85.4, 46.6 and 22.1 million infected individuals. The overall population-adjusted predicted prevalence for combined soil-transmitted helminth infections was 11.4%.
For A. lumbricoides, the predicted prevalence ranged from 0.32% (Shanghai) to 27.9% (Guizhou province). Shanghai had the smallest (0.05 million) and Sichuan province the largest number (14.8 million) of infected individuals. For T. trichiura, the predicted prevalence ranged from 0.01% (Tianjin) to 18.3% (Hainan province). The smallest numbers of infected individuals were found in Nei Mongol, Ningxia Hui, Qinghai provinces and Tianjin (<0.01 million), whereas the largest number, 3.7 million, was predicted for Sichuan province. For hookworm, Ningxia Hui and Qinghai provinces had the lowest predicted prevalence (<0.01%), while Hainan province had the highest (22.1%). The provinces of Gansu, Nei Mongol, Ningxia Hui, Qinghai, Xinjiang Uygur and Tibet, and the cities of Beijing, Shanghai and Tianjin each had fewer than 10,000 individuals infected with hookworm. Sichuan province had the largest predicted number of hookworm infections (14.3 million).
The predicted combined soil-transmitted helminth prevalence ranged from 0.70% (Tianjin) to 40.8% (Hainan province). The number of individuals infected with soil-transmitted helminths ranged from 0.07 million (Tianjin) to 29.0 million (Sichuan province). Overall, slightly more than one out of ten people in P.R. China is infected with soil-transmitted helminths, corresponding to more than 140 million infections in the year 2010.
Discussion
To our knowledge, we present the first model-based, nationwide predictive infection risk maps of soil-transmitted helminths for P.R. China. Previous epidemiological studies [7] were mainly descriptive, reporting prevalence estimates at specific locations or visualized at province level using interpolated risk surface maps. We carried out an extensive literature search and collected published georeferenced soil-transmitted helminth prevalence data across P.R. China, alongside the ones from the second national survey that had been completed in 2004. Bayesian geostatistical models were utilised to identify climatic/environmental and socioeconomic factors that were significantly associated with infection risk, and hence, the number of infected individuals could be calculated at high spatial resolution. We derived species-specific risk maps. Additionally, we produced a risk map with any soil-transmitted helminth infection, which is particularly important for the control of soil-transmitted helminthiasis, as the same drugs (mainly albendazole and mebendazole) are used against all three species [31, 32].
Model validation suggested good predictive ability of our final models. In particular, 84.2%, 81.5% and 79.3% of survey locations were correctly predicted within a 95% BCI for T. trichiura, hookworm and A. lumbricoides, respectively. The combined soil-transmitted helminth prevalence (11.4%) is supported by the current surveillance data reported to China CDC, which show infection rates of around 10% in many areas of P.R. China. We found that all MEs were above zero; hence, the predicted prevalence slightly underestimated the true prevalence of each of the three soil-transmitted helminth species. The combined soil-transmitted helminth prevalence estimates assume that the infection of each species is independent of each other. However, previous research reported significant associations, particularly between A. lumbricoides and T. trichiura [33, 34]. Hence, our assumption may overestimate the true prevalence of soil-transmitted helminths. Unfortunately, we do not have co-infection data from P.R. China, and thus we are unable to calculate a correction factor.
Our results indicate that several environmental and climatic predictors are significantly associated with soil-transmitted helminth infections. For example, LST at night was significantly associated with T. trichiura and hookworm, suggesting that temperature is an important driver of transmission. Similar results have been reported by other researchers [2, 35]. Our results suggest that the risk of infection with any of the soil-transmitted helminth species is higher in equatorial or warm zones, compared to the arid and snow/polar zones. This is consistent with earlier findings that extremely arid environments limit the transmission of soil-transmitted helminths [2], while equatorial or warm zones provide temperatures and soil moisture that are particularly suitable for larval development [35]. However, we found a positive association between elevation and T. trichiura infection risk, which contradicts earlier reports [36, 37]. The reason may be the altitude effect, i.e. the negative correlation between altitude and economy in P.R. China [38]. The low socioeconomic development in high altitude or mountainous areas might result in limited access to healthcare services [39, 40].
On the other hand, it is reported that socioeconomic factors are closely related with the behaviour of people, which in turn impacts the transmission of soil-transmitted helminths [41]. Indeed, wealth, inadequate sewage discharge, drinking of unsafe water, lack of sanitary infrastructure, personal hygiene habits, recent travel history, low education and demographic factors are strongly associated with soil-transmitted helminth infections [42–46]. Our results show that GDP per capita has a negative effect on A. lumbricoides infection risk. Other socioeconomic proxies such as sanitation level, number of hospital beds and percentage of people with access to tap water might be more readily able to explain the spatial distribution of infection risk.
Model-based estimates adjusted for population density indicate that the highest prevalence of A. lumbricoides occurred in Guizhou province. T. trichiura and hookworm were most prevalent in Hainan province. Although the overall soil-transmitted helminth infection risk decreased over the past several years, Hainan province had the highest risk in 2010, followed by Guizhou and Sichuan provinces. These results are consistent with the reported data of the second national survey on important parasitic diseases [7], and hence more effective control strategies are needed in these provinces.
The targets set out by the Chinese Ministry of Health in the “National Control Program on Important Parasitic Diseases from 2006 to 2015” are to reduce the prevalence of soil-transmitted helminth infections by 40% until 2010 and up to 70% until 2015 [8]. The government aims to reach these targets by a series of control strategies, including anthelminthic treatment, improvement of sanitation, and better information, education and communication (IEC) campaigns [47]. Preventive chemotherapy is recommended for populations older than 3 years in areas where the prevalence of soil-transmitted helminth infection exceeds 50%, while targeted drug treatment is recommended for children and rural population in areas where infection prevalences range between 10% and 50% [48]. Our models indicate that the first step of the target, i.e. reduction of prevalence by 40% until 2010, has been achieved. Indeed, the prevalence of T. trichiura, hookworm and A. lumbricoides dropped from 4.6%, 6.1% and 12.7% in the second national survey between 2001 and 2004 [7] to 1.8%, 3.7% and 6.8% in 2010, which corresponds to respective reductions of 60.9%, 39.3% and 46.5%. The combined soil-transmitted helminth prevalence dropped from 19.6% to 11.4% in 2010, a reduction of 41.8%. These results also suggest that, compared to T. trichiura and A. lumbricoides, more effective strategies need to be tailored for hookworm infections.
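The reductions quoted above are relative reductions, (before − after) / before. A brief Python check, using only the prevalence figures given in the text (the function name is hypothetical), reproduces them:

```python
def relative_reduction(before, after):
    """Relative reduction in prevalence, expressed in percent."""
    return (before - after) / before * 100.0


# Prevalence (%) in the second national survey versus model-based 2010 estimates.
prevalence = {
    "T. trichiura": (4.6, 1.8),
    "hookworm": (6.1, 3.7),
    "A. lumbricoides": (12.7, 6.8),
    "combined": (19.6, 11.4),
}
for species, (before, after) in prevalence.items():
    print(f"{species}: {relative_reduction(before, after):.1f}% reduction")
```

The hookworm figure sits just below the 40% target, which is consistent with the observation that hookworm needs more tailored strategies.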
The data of our study stem largely from community-based surveys. However, the information extracted from the literature is not disaggregated by age, and hence we were not able to obtain age-adjusted predictive risk maps. In addition, more than 96% of the surveys used the Kato-Katz technique [49, 50]. We assumed that diagnostic sensitivity was similar across survey locations. However, sensitivity depends on the intensity of infection and hence varies in space [51]. These data limitations are well known in geostatistical meta-analyses of historical data [27], and we are currently developing methods to address them.
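To illustrate why spatially varying sensitivity matters, the sketch below uses the standard relation between true and observed prevalence for an imperfect diagnostic test. It is not the method used in this study. The sensitivity values, the perfect-specificity default and the function name are hypothetical choices for illustration only.

```python
def apparent_prevalence(true_prev, sensitivity, specificity=1.0):
    """Expected observed prevalence for an imperfect diagnostic test.

    observed = Se * p + (1 - Sp) * (1 - p)
    Perfect specificity is an assumption made for this illustration.
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


true_prev = 0.20  # same underlying prevalence in both hypothetical areas

# Hypothetical sensitivities: higher where infection intensity is high,
# lower where most infections are light.
scenarios = {"high-intensity area": 0.90, "low-intensity area": 0.60}

for label, se in scenarios.items():
    observed = apparent_prevalence(true_prev, se)
    print(f"{label}: observed prevalence {observed:.2f} (true {true_prev:.2f})")
```

With identical true prevalence, the two hypothetical areas yield different observed values, so assuming a constant sensitivity can distort the apparent spatial pattern.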
Conclusion
The work presented here is the first major effort to present model-based estimates of the geographical distribution of soil-transmitted helminth infection risk across P.R. China, and to identify the associated climatic, environmental and socioeconomic risk factors. Our prediction maps provide useful information for identifying priority areas where interventions targeting soil-transmitted helminthiasis are most urgently required. As a next step, we plan to further develop our models to address data characteristics and improve model-based predictions.
Abbreviations
BCI: Bayesian credible interval
BSAT: Base saturation as percentage of ECEsoil
BULK: Bulk density
CFRAG: Percentage of coarse fragments
China CDC: Chinese Center for Disease Control and Prevention
CLPC: Percentage of clay
CNKI: China National Knowledge Infrastructure
DRAIN: FAO soil drainage class
GDP: Gross domestic product
GMRF: Gaussian Markov random field
GNTD database: Global neglected tropical diseases database
GYPS: Gypsum content
HII: Human influence index
IEC: Information, education, and communication
INLA: Integrated nested Laplace approximations
IPD: National Institute of Parasitic Diseases
LST: Land surface temperature
MCMC: Markov chain Monte Carlo
MODIS: Moderate Resolution Imaging Spectroradiometer
NDVI: Normalized difference vegetation index
P.R. China: People’s Republic of China
peNMIG: Normal mixture of inverse Gammas with parameter expansion
PHAQ: pH measured in water
PSCL: FAO texture class
SPDE: Stochastic partial differential equations
TAWC: Available water capacity
TOTC: Organic carbon content
TOTN: Total nitrogen
SDTO: Percentage of sand
STPC: Percentage of silt.
References
 1.
Bethony J, Brooker S, Albonico M, Geiger SM, Loukas A, Diemert D, Hotez PJ: Soil-transmitted helminth infections: ascariasis, trichuriasis, and hookworm. Lancet. 2006, 367: 1521-1532. 10.1016/S0140-6736(06)68653-4.
 2.
Pullan RL, Brooker SJ: The global limits and population at risk of soil-transmitted helminth infections in 2010. Parasit Vectors. 2012, 5: 81. 10.1186/1756-3305-5-81.
 3.
de Silva NR, Brooker S, Hotez PJ, Montresor A, Engels D, Savioli L: Soil-transmitted helminth infections: updating the global picture. Trends Parasitol. 2003, 19: 547-551. 10.1016/j.pt.2003.10.002.
 4.
Hotez PJ, Bundy DAP, Beegle K, Brooker S, Drake L, de Silva N, Montresor A, Engels D, Jukes M, Chitsulo L: Helminth infections: soil-transmitted helminth infections and schistosomiasis. Disease Control Priorities in Developing Countries. Edited by: Jamison DT, Breman JG, Measham AR, Alleyne G, Claeson M, Evans DB, Jha P, Mills A, Musgrove P. 2006, Washington, DC: World Bank, 467-482. 2
 5.
Murray CJL, Vos T, Lozano R, Naghavi M, Flaxman AD, Michaud C, Ezzati M, Shibuya K, Salomon JA, Abdalla S: Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990-2010: a systematic analysis for the global burden of disease study 2010. Lancet. 2012, 380: 2197-2223. 10.1016/S0140-6736(12)61689-4.
 6.
Xu LQ, Yu SH, Jin ZX, Yang JL, Lai CQ, Zhang XJ, Zheng CQ: Soil-transmitted helminthiases - nationwide survey in China. Bull World Health Organ. 1995, 73: 507-513.
 7.
Coordinating Office of the National Survey on the Important Human Parasitic Diseases: A national survey on current status of the important parasitic diseases in human population. Chin J Parasitol Parasit Dis. 2005, 23: 332-340. (in Chinese)
 8.
Zheng Q, Chen Y, Zhang HB, Chen JX, Zhou XN: The control of hookworm infection in China. Parasit Vectors. 2009, 2: 44. 10.1186/1756-3305-2-44.
 9.
Li T, He SY, Zhao H, Zhao GH, Zhu XQ: Major trends in human parasitic diseases in China. Trends Parasitol. 2010, 26: 264-270. 10.1016/j.pt.2010.02.007.
 10.
Wang XB, Zhang LX, Luo RF, Wang GF, Chen YD, Medina A, Eggleston K, Rozelle S, Smith DS: Soil-transmitted helminth infections and correlated risk factors in preschool and school-aged children in rural southwest China. PLoS One. 2012, 7: e45939. 10.1371/journal.pone.0045939.
 11.
Zhou XN, Bergquist R, Tanner M: Elimination of tropical disease through surveillance and response. Infect Dis Poverty. 2013, 2: 1. 10.1186/2049-9957-2-1.
 12.
Raso G, Vounatsou P, Gosoniu L, Tanner M, N’Goran EK, Utzinger J: Risk factors and spatial patterns of hookworm infection among schoolchildren in a rural area of western Côte d’Ivoire. Int J Parasitol. 2006, 36: 201-210. 10.1016/j.ijpara.2005.09.003.
 13.
Pullan RL, Gething PW, Smith JL, Mwandawiro CS, Sturrock HJW, Gitonga CW, Hay SI, Brooker S: Spatial modelling of soil-transmitted helminth infections in Kenya: a disease control planning tool. PLoS Negl Trop Dis. 2011, 5: e958. 10.1371/journal.pntd.0000958.
 14.
Pullan RL, Bethony JM, Geiger SM, Cundill B, Correa-Oliveira R, Quinnell RJ, Brooker S: Human helminth co-infection: analysis of spatial patterns and risk factors in a Brazilian community. PLoS Negl Trop Dis. 2008, 2: e352. 10.1371/journal.pntd.0000352.
 15.
Chammartin F, Scholte RGC, Guimarães LH, Tanner M, Utzinger J, Vounatsou P: Soil-transmitted helminth infection in South America: a systematic review and geostatistical meta-analysis. Lancet Infect Dis. 2013, 13: 507-518. 10.1016/S1473-3099(13)70071-9.
 16.
Scholte RGC, Schur N, Bavia ME, Carvalho EM, Chammartin F, Utzinger J, Vounatsou P: Spatial analysis and risk mapping of soil-transmitted helminth infections in Brazil, using Bayesian geostatistical models. Geospat Health. 2013, 8: 97-110.
 17.
Gelfand AE, Hills SE, Racine-Poon A, Smith AFM: Illustration of Bayesian inference in normal data models using Gibbs sampling. J Am Statist Assoc. 1990, 85: 972-985. 10.1080/01621459.1990.10474968.
 18.
Diggle PJ, Tawn JA, Moyeed RA: Model-based geostatistics. J R Stat Soc Ser C: Appl Stat. 1998, 47: 299-326.
 19.
Rue H, Martino S, Chopin N: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B: Stat Methodol. 2009, 71: 319-392. 10.1111/j.1467-9868.2008.00700.x.
 20.
Cameletti M, Lindgren F, Simpson D, Rue H: Spatio-temporal modeling of particulate matter concentration through the SPDE approach. Adv Stat Anal. 2013, 97: 109-131. 10.1007/s10182-012-0196-3.
 21.
Hürlimann E, Schur N, Boutsika K, Stensgaard AS, Laserna de Himpsl M, Ziegelbauer K, Laizer N, Camenzind L, Di Pasquale A, Ekpo UF: Toward an open-access global database for mapping, control, and surveillance of neglected tropical diseases. PLoS Negl Trop Dis. 2011, 5: e1404. 10.1371/journal.pntd.0001404.
 22.
Sanderson EW, Jaiteh M, Levy MA, Redford KH, Wannebo AV, Woolmer G: The human footprint and the last of the wild. Bioscience. 2002, 52: 891-904. 10.1641/0006-3568(2002)052[0891:THFATL]2.0.CO;2.
 23.
Dormann CF, Elith J, Bacher S, Buchmann C, Carl G, Carre G, Marquez JRG, Gruber B, Lafourcade B, Leitao PJ: Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography. 2013, 36: 27-46. 10.1111/j.1600-0587.2012.07348.x.
 24.
Karagiannis-Voules DA, Scholte RGC, Guimarães LH, Utzinger J, Vounatsou P: Bayesian geostatistical modeling of leishmaniasis incidence in Brazil. PLoS Negl Trop Dis. 2013, 7: e2213. 10.1371/journal.pntd.0002213.
 25.
Lindgren F, Rue H, Lindstrom J: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J R Stat Soc Ser B: Stat Methodol. 2011, 73: 423-498. 10.1111/j.1467-9868.2011.00777.x.
 26.
Scheipl F, Fahrmeir L, Kneib T: Spike-and-slab priors for function selection in structured additive regression models. J Am Statist Assoc. 2012, 107: 1518-1532. 10.1080/01621459.2012.737742.
 27.
Chammartin F, Hürlimann E, Raso G, N’Goran EK, Utzinger J, Vounatsou P: Statistical methodological issues in mapping historical schistosomiasis survey data. Acta Trop. 2013, 128: 345-352. 10.1016/j.actatropica.2013.04.012.
 28.
Lunn D, Spiegelhalter D, Thomas A, Best N: The BUGS project: evolution, critique and future directions. Stat Med. 2009, 28: 3049-3067. 10.1002/sim.3680.
 29.
Gelman A, Rubin DB: Inference from iterative simulation using multiple sequences. Stat Sci. 1992, 7: 457-511. 10.1214/ss/1177011136.
 30.
Plummer M, Best N, Cowles K, Vines K: CODA: convergence diagnosis and output analysis for MCMC. R News. 2006, 6: 7-11.
 31.
WHO: Prevention and control of schistosomiasis and soil-transmitted helminthiasis: report of a WHO expert committee. WHO Tech Rep Ser. 2002, 912: 1-57.
 32.
Keiser J, Utzinger J: Efficacy of current drugs against soil-transmitted helminth infections: systematic review and meta-analysis. JAMA. 2008, 299: 1937-1948.
 33.
Booth M, Bundy DAP: Comparative prevalences of Ascaris lumbricoides, Trichuris trichiura and hookworm infections and the prospects for combined control. Parasitology. 1992, 105: 151-157. 10.1017/S0031182000073807.
 34.
Tchuem Tchuenté LA, Behnke JM, Gilbert FS, Southgate VR, Vercruysse J: Polyparasitism with Schistosoma haematobium and soil-transmitted helminth infections among school children in Loum, Cameroon. Trop Med Int Health. 2003, 8: 975-986. 10.1046/j.1360-2276.2003.01120.x.
 35.
Tchuem Tchuenté LA: Control of soil-transmitted helminths in sub-Saharan Africa: diagnosis, drug efficacy concerns and challenges. Acta Trop. 2011, 120: S4-S11.
 36.
Flores A, Esteban JG, Angles R, Mas-Coma S: Soil-transmitted helminth infections at very high altitude in Bolivia. Trans R Soc Trop Med Hyg. 2001, 95: 272-277. 10.1016/S0035-9203(01)90232-9.
 37.
Gunawardena K, Kumarendran B, Ebenezer R, Gunasingha MS, Pathmeswaran A, de Silva N: Soil-transmitted helminth infections among plantation sector schoolchildren in Sri Lanka: prevalence after ten years of preventive chemotherapy. PLoS Negl Trop Dis. 2011, 5: e1341. 10.1371/journal.pntd.0001341.
 38.
Zhai S, Sun A: On the relationship between altitude and economy–the inspiration of altitude effects to the economic development of the Qinghai-Tibet plateau region. Nationalities Res Qinghai. 2012, 23: 152-159. (in Chinese)
 39.
Schratz A, Pineda MF, Reforma LG, Fox NM, Le AT, Tommaso Cavalli-Sforza L, Henderson MK, Mendoza R, Utzinger J, Ehrenberg JP: Neglected diseases and ethnic minorities in the Western Pacific Region: exploring the links. Adv Parasitol. 2010, 72: 79-107.
 40.
Yap P, Du ZW, Wu FW, Jiang JY, Chen R, Zhou XN, Hattendorf J, Utzinger J, Steinmann P: Rapid reinfection with soil-transmitted helminths after triple-dose albendazole treatment of school-aged children in Yunnan, People’s Republic of China. Am J Trop Med Hyg. 2013, 89: 23-31. 10.4269/ajtmh.13-0009.
 41.
Brooker S, Clements ACA, Bundy DAP: Global epidemiology, ecology and control of soil-transmitted helminth infections. Adv Parasitol. 2006, 62: 221-261.
 42.
Norhayati M, Oothuman P, Fatmah MS: Some risk factors of Ascaris and Trichuris infection in Malaysian aborigine (Orang Asli) children. Med J Malaysia. 1998, 53: 401-407.
 43.
Hohmann H, Panzer S, Phimpachan C, Southivong C, Schelp FP: Relationship of intestinal parasites to the environment and to behavioral factors in children in the Bolikhamxay province of Lao PDR. Southeast Asian J Trop Med Public Health. 2001, 32: 4-13.
 44.
Escobedo AA, Canete R, Nunez FA: Prevalence, risk factors and clinical features associated with intestinal parasitic infections in children from San Juan y Martínez, Pinar del Río, Cuba. West Indian Med J. 2008, 57: 377-382.
 45.
Knopp S, Mohammed KA, Stothard JR, Khamis IS, Rollinson D, Marti H, Utzinger J: Patterns and risk factors of helminthiasis and anemia in a rural and a peri-urban community in Zanzibar, in the context of helminth control programs. PLoS Negl Trop Dis. 2010, 4: e681. 10.1371/journal.pntd.0000681.
 46.
Pinheiro ID, de Castro MF, Mitterofhe A, Pires FAC, Abramo C, Ribeiro LC, Tibirica SHC, Coimbra ES: Prevalence and risk factors for giardiasis and soil-transmitted helminthiasis in three municipalities of Southeastern Minas Gerais State, Brazil: risk factors for giardiasis and soil-transmitted helminthiasis. Parasitol Res. 2011, 108: 1123-1130. 10.1007/s00436-010-2154-x.
 47.
Bergquist R, Whittaker M: Control of neglected tropical diseases in Asia Pacific: implications for health information priorities. Infect Dis Poverty. 2012, 1: 3. 10.1186/2049-9957-1-3.
 48.
Ministry of Health: Notice of the Ministry of Public Health concerning publishing “National Control Program on Important Parasitic Diseases in 2006-2015”. Gazette of the Ministry of Health of People’s Republic of China. 2006, 33: 41-44. (in Chinese)
 49.
Katz N, Chaves A, Pellegrino J: A simple device for quantitative stool thick-smear technique in schistosomiasis mansoni. Rev Inst Med Trop São Paulo. 1972, 14: 397-400.
 50.
Speich B, Knopp S, Mohammed KA, Khamis IS, Rinaldi L, Cringoli G, Rollinson D, Utzinger J: Comparative cost assessment of the Kato-Katz and FLOTAC techniques for soil-transmitted helminth diagnosis in epidemiological surveys. Parasit Vectors. 2010, 3: 71. 10.1186/1756-3305-3-71.
 51.
Booth M, Vounatsou P, N’Goran EK, Tanner M, Utzinger J: The influence of sampling effort and the performance of the Kato-Katz technique in diagnosing Schistosoma mansoni and hookworm co-infections in rural Côte d’Ivoire. Parasitology. 2003, 127: 525-531. 10.1017/S0031182003004128.
Acknowledgements
We thank two anonymous referees for a series of useful comments and suggestions. This study received financial support from the China Scholarship Council (CSC) to YSL, the UBS Optimus Foundation (project no. 5879) and the Swiss National Science Foundation (PDFMP3_137156).
Additional information
Competing interests
The authors have declared that no competing interests exist.
Authors’ contributions
YSL and PV analyzed the data. YSL, JU and PV wrote the paper. PV, JU and XNZ conceptualized the project. XNZ provided data. YSL did the literature review and processed the data. PV, JU and XNZ provided important intellectual content. All authors read and approved the originally submitted and the revised manuscript.
Keywords
 Soil-transmitted helminths
 Ascaris lumbricoides
 Trichuris trichiura
 Hookworm
 Bayesian geostatistics
 People’s Republic of China