
Bayesian geostatistical modelling of soil-transmitted helminth survey data in the People’s Republic of China



Soil-transmitted helminth infections affect tens of millions of individuals in the People’s Republic of China (P.R. China). There is a need for high-resolution estimates of at-risk areas and number of people infected to enhance spatial targeting of control interventions. However, such information is not yet available for P.R. China.


A geo-referenced database compiling surveys pertaining to soil-transmitted helminthiasis, carried out from 2000 onwards in P.R. China, was established. Bayesian geostatistical models relating the observed survey data with potential climatic, environmental and socioeconomic predictors were developed and used to predict at-risk areas at high spatial resolution. Predictors were extracted from remote sensing and other readily accessible open-source databases. Advanced Bayesian variable selection methods were employed to develop a parsimonious model.


Our results indicate that the prevalence of soil-transmitted helminth infections in P.R. China considerably decreased from 2005 onwards. Yet, some 144 million people were estimated to be infected in 2010. High prevalence (>20%) of the roundworm Ascaris lumbricoides infection was predicted for large areas of Guizhou province, the southern part of Hubei and Sichuan provinces, while the northern part and the south-eastern coastal-line areas of P.R. China had low prevalence (<5%). High infection prevalence (>20%) with hookworm was found in Hainan, the eastern part of Sichuan and the southern part of Yunnan provinces. High infection prevalence (>20%) with the whipworm Trichuris trichiura was found in a few small areas of south P.R. China. Very low prevalence (<0.1%) of hookworm and whipworm infections were predicted for the northern parts of P.R. China.


We present the first model-based estimates for soil-transmitted helminth infections throughout P.R. China at high spatial resolution. Our prediction maps provide useful information for the spatial targeting of soil-transmitted helminthiasis control interventions and for long-term monitoring and surveillance in the frame of enhanced efforts to control and eliminate the public health burden of these parasitic worm infections.


Soil-transmitted helminths are a group of parasitic nematode worms causing human infection through contact with parasite eggs (Ascaris lumbricoides and Trichuris trichiura) or larvae (hookworm) that thrive in the warm and moist soil of the world’s tropical and subtropical countries [1]. More than 5 billion people are at risk of soil-transmitted helminthiasis [2]. Estimates published in 2003 suggest that 1,221 million people were infected with A. lumbricoides, 795 million with T. trichiura and 740 million with hookworms [3]. The greatest number of soil-transmitted helminth infections at that time occurred in the Americas, the People’s Republic of China (P.R. China), East Asia and sub-Saharan Africa [4]. Socioeconomic development and large-scale control efforts have lowered the number of people infected with soil-transmitted helminths in many parts of the world [1]. For the year 2010, the global burden due to soil-transmitted helminthiasis has been estimated at 5.2 million disability-adjusted life years [5].

In P.R. China, there have been two national surveys for parasitic diseases, including soil-transmitted helminthiasis. Both surveys used the Kato-Katz technique as the diagnostic approach, based on a single Kato-Katz thick smear obtained from one stool sample per individual. The first national survey was conducted from 1988 to 1992 and the second in 2001-2004. In the first survey, there were a total of 2,848 study sites with approximately 500 people examined per site. The survey indicated overall prevalences of 47.0%, 18.8% and 17.2% for A. lumbricoides, T. trichiura and hookworm infections, respectively, corresponding to 531 million, 212 million and 194 million infected people, respectively [6]. The second survey involved 687 study sites and there were 356,629 individuals examined overall. Analyses of the data revealed considerably lower prevalences for soil-transmitted helminth infections than in the first survey; A. lumbricoides, hookworm and T. trichiura prevalences were 12.7%, 6.1% and 4.6%, respectively [7]. However, interventions were less likely to reach marginalized communities in the poorest areas [8] and the diseases re-emerged whenever control measures were discontinued [9, 10]. To overcome the challenge of parasite infections in P.R. China, in 2005, the Chinese Ministry of Health issued the “National Control Program on Important Parasitic Diseases from 2006 to 2015” with its target to reduce the prevalence of helminth infections by 70% by the year 2015 [8]. The key strategy for control was large-scale administration of anthelminthic drugs in high prevalence areas, especially targeting school-aged children and people living in rural areas [9, 11].

Maps depicting the geographical distribution of the disease risk can aid control programmes to deliver cost-effective interventions and assist in monitoring and evaluation. The Coordinating Office of the National Survey on the Important Human Parasitic Diseases in P.R. China [7] obtained prevalence maps by averaging the data of the second national survey within each province. To our knowledge, high-resolution, model-based maps using available national survey data are not available to date in P.R. China. Model-based geostatistics predict the disease prevalence at places without observed data by quantifying the relation between the disease risk at observed locations with potential predictors such as socioeconomic, environmental, climatic and ecological information, the latter often obtained via remote sensing. Model-based geostatistics have been used before to map and predict the geographical distribution of soil-transmitted helminth infections in Africa [12, 13], Asia and Latin America [14-16]. Model-based geostatistics typically employ regression analysis with random effects at the locations of the observed data. The random effects are assumed to be latent observations from a zero-mean Gaussian process, which models spatial correlation to the data via a spatially structured covariance. Bayesian formulations enable model fit via Markov chain Monte Carlo (MCMC) simulation algorithms [17, 18] or other computational algorithms (e.g. integrated nested Laplace approximations (INLA) [19]). INLA is a computational approach for Bayesian inference and is an alternative to MCMC to overcome computational burden for obtaining the approximated posterior marginal distribution for the latent variables, as well as for the hyperparameters [20].

In this study, we aimed to: (i) identify the most important climatic, environmental and socioeconomic determinants of soil-transmitted helminth infections; and (ii) develop model-based Bayesian geostatistics to assess the geographical distribution and number of people infected with soil-transmitted helminths in P.R. China.


Ethical considerations

The work presented here is based on soil-transmitted helminth survey data derived from the second national survey and additional studies identified through an extensive review of the literature. All data in our study were extracted from published sources and are aggregated over villages, towns or counties; they therefore do not contain information that is identifiable at individual or household level. Hence, there are no specific ethical considerations.

Disease data

Geo-referenced data on soil-transmitted helminth infections from the second national survey conducted in P.R. China from 2001 to 2004 were provided by the National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (IPD, China CDC; Shanghai, P.R. China). Moreover, an extensive literature search was undertaken in PubMed and China National Knowledge Internet (CNKI) from January 1, 2000 until April 25, 2013 to identify studies reporting village, town and county-level prevalence data of soil-transmitted helminth infections in P.R. China. Data were excluded if (i) they were from hospital surveys, post-intervention surveys, drug efficacy studies or clinical trials; (ii) they reported on infection among travellers, military personnel, expatriates, mobile populations or other displaced or migrating populations; (iii) the geographical coordinates could not be identified; or (iv) the diagnostic technique was not reported [21]. Data were entered into the Global Neglected Tropical Diseases (GNTD) database, which is a geo-referenced, open-access source [21]. Geographical coordinates for the survey locations were obtained via Google maps, a free web mapping service application and technology system. As we focus on recent data pertaining to soil-transmitted helminth infections in P.R. China, we only considered surveys carried out from 2000 onwards.

Climatic, demographic and environmental data

Climatic, demographic and environmental data were downloaded from different readily accessible remote sensing data sources, as shown in Table 1. Land surface temperature (LST) and normalized difference vegetation index (NDVI) were averaged annually and land cover data were summarised to the most frequent category over the period of 2001-2004. Moreover, land cover data were re-grouped into six categories based on between-class similarities: (i) forest; (ii) shrubland and savanna; (iii) grassland; (iv) cropland; (v) urban; and (vi) wet areas. Monthly precipitation values were averaged to obtain a long-term average for the period 1950-2000. Four climatic zones were considered: (i) equatorial; (ii) arid; (iii) warm; and (iv) snow/polar. The following 13 soil types, which may be related to the viability of parasites or microorganisms living in the soil, were used: (i) percentage of coarse fragments (CFRAG, % >2 mm); (ii) percentage of sand (SDTO, mass %); (iii) percentage of silt (STPC, mass %); (iv) percentage of clay (CLPC, mass %); (v) bulk density (BULK, kg/dm3); (vi) available water capacity (TAWC, cm/m); (vii) base saturation as percentage of ECEsoil (BSAT); (viii) pH measured in water (PHAQ); (ix) gypsum content (GYPS, g/kg); (x) organic carbon content (TOTC, g/kg); (xi) total nitrogen (TOTN, g/kg); (xii) FAO texture class (PSCL); and (xiii) FAO soil drainage class (DRAIN). Human influence index (HII) was included in the analysis to capture direct human influence on ecosystems [22]. Urban/rural extent was considered as a binary indicator. Gross domestic product (GDP) per capita was used as a proxy of people’s socioeconomic status. We obtained GDP per capita for each county from the P.R. China Yearbook full-text database in 2008.

Table 1 Remote sensing data sources

Moderate Resolution Imaging Spectroradiometer (MODIS) Reprojection Tool version 4.1 (EROS; Sioux Falls, USA) was applied to process MODIS/Terra data. All remotely sensed data were aligned over a prediction grid of 5 × 5 km spatial resolution using Visual Fortran version 6.0 (Digital Equipment Corporation; Maynard, USA). Data at the survey locations were also extracted in Visual Fortran. As the outcome of interest (i.e. infection prevalence with a specific soil-transmitted helminth species) is not available at the resolution of the covariates for surveys aggregated over counties, we linked the centroid of those counties with the average value of each covariate within the counties. Distances to the nearest water bodies were calculated using ArcGIS version 9.3 (ESRI; Redlands, USA). For county-level surveys, the distances of all the 5 × 5 km pixel centroids to their nearest water bodies within the county were extracted and averaged. The arithmetic mean was used as a summary measure of continuous data, while the most frequent category was used to summarise categorical variables.
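The county-level summary rule described above (arithmetic mean for continuous covariates, most frequent category for categorical ones) can be sketched as follows. This is only an illustration in Python, not the Visual Fortran or ArcGIS workflow used in the study; the function name, covariate keys and pixel values are hypothetical.

```python
from collections import Counter
from statistics import mean


def summarise_county(pixels, continuous_keys, categorical_keys):
    """Collapse the pixel-level covariates of one county into a single record."""
    summary = {}
    for key in continuous_keys:
        # Continuous covariates: arithmetic mean over the county's pixels.
        summary[key] = mean(p[key] for p in pixels)
    for key in categorical_keys:
        # Categorical covariates: most frequent category (first seen wins ties).
        summary[key] = Counter(p[key] for p in pixels).most_common(1)[0][0]
    return summary


# Hypothetical 5 x 5 km pixels falling inside one county.
county_pixels = [
    {"ndvi": 0.40, "water_dist_km": 2.0, "land_cover": "cropland"},
    {"ndvi": 0.50, "water_dist_km": 4.0, "land_cover": "cropland"},
    {"ndvi": 0.60, "water_dist_km": 6.0, "land_cover": "forest"},
]
county_summary = summarise_county(
    county_pixels,
    continuous_keys=["ndvi", "water_dist_km"],
    categorical_keys=["land_cover"],
)
```

The resulting record would then be attached to the county centroid, in line with the linkage described in the text.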

Statistical analysis

The survey year was grouped into two categories: before 2005 and from 2005 onwards. Land cover, climatic zones, soil texture and soil drainage were included in the model as categorical covariates. Continuous variables were standardised to mean 0 and standard deviation 1 using the command “std()” in Stata version 10 (Stata Corp. LP; College Station, USA). Pearson’s correlation was calculated between continuous variables. Where two variables had a correlation coefficient greater than 0.8, one of them was dropped to avoid collinearity [23]. Preliminary analysis indicated that, for this dataset, three categories were sufficient to capture non-linearity of continuous variables; we therefore constructed 3-level categorical variables based on their distribution. Subsequent variable selection incorporated within the geostatistical model selected the most probable functional form (linear vs. categorical). Bivariate and multivariate logistic regressions were carried out in Stata version 10.
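The standardisation and collinearity screen described above can be illustrated with a short Python sketch. It is not the Stata code used in the study. Two assumptions are made here: the standard deviation is the sample standard deviation, and the 0.8 threshold is applied to the absolute correlation, with the variable listed first being the one retained (the text does not say which member of a correlated pair was dropped). Covariate names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardise(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(columns, threshold=0.8):
    """Keep variables in order, skipping any whose absolute correlation
    with an already kept variable exceeds the threshold (assumed rule)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariate values at five survey locations.
covariates = {
    "lst_day": [20.0, 22.0, 25.0, 27.0, 30.0],
    "lst_night": [10.0, 12.1, 14.9, 17.2, 20.0],  # nearly collinear with lst_day
    "precipitation": [900.0, 400.0, 1200.0, 300.0, 800.0],
}
z = {name: standardise(vals) for name, vals in covariates.items()}
selected = drop_collinear(z)
```

In this toy example the night-time temperature is removed because it is almost a linear function of the day-time temperature, while precipitation is retained.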

Bayesian geostatistical logistic regression models with location-specific random effects were fitted to obtain spatially explicit soil-transmitted helminth infection estimates. Let $Y_i$, $n_i$ and $p_i$ be the number of positive individuals, the number of those examined and the probability of infection at location $i$ ($i = 1, 2, \ldots, L$), respectively. We assume that $Y_i$ arises from a binomial distribution $Y_i \sim \mathrm{Bn}(p_i, n_i)$, where $\mathrm{logit}(p_i) = \beta_0 + \sum_{k=1}^{K} \beta_k X_{ik} + \epsilon_i + \phi_i$. $\beta_k$ is the regression coefficient of the $k$th covariate $X_{ik}$, $\epsilon_i$ is a location-specific random effect and $\phi_i$ is an exchangeable non-spatial random effect. To estimate the parameters, we formulate our model in a Bayesian framework. We assumed $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_L)$ followed a zero-mean multivariate normal distribution, $\boldsymbol{\epsilon} \sim \mathrm{MVN}(\mathbf{0}, \Sigma)$, with Matérn covariance function $\Sigma_{ij} = \sigma_{sp}^2 (\kappa d_{ij})^{\upsilon} K_{\upsilon}(\kappa d_{ij}) / (\Gamma(\upsilon) 2^{\upsilon - 1})$. $d_{ij}$ is the Euclidean distance between locations $i$ and $j$. $\kappa$ is a scaling parameter, $\upsilon$ is a smoothing parameter fixed to 1 and $K_{\upsilon}$ denotes the modified Bessel function of second kind and order $\upsilon$. The spatial range $\rho = \sqrt{8}/\kappa$ is the distance at which spatial correlation becomes negligible (<0.1) [24]. We assumed that $\phi_i$ follows a zero-mean normal distribution $\phi_i \sim N(0, \sigma_{nonsp}^2)$. A normal prior distribution was assigned to the regression coefficients, that is $\beta_0, \beta_k \sim N(0, 1000)$, and loggamma priors were adopted for the precision parameters, $\tau_{sp} = 1/\sigma_{sp}^2$ and $\tau_{nonsp} = 1/\sigma_{nonsp}^2$, on the log scale, that is $\log(\tau_{sp}) \sim \mathrm{log\,gamma}(1, 0.00005)$ and $\log(\tau_{nonsp}) \sim \mathrm{log\,gamma}(1, 0.00005)$. Furthermore, we assumed the following prior distribution for the range parameter: $\log(\rho) \sim \mathrm{log\,gamma}(1, 0.01)$.
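To make the Matérn covariance concrete, the following Python sketch evaluates it for a given distance. It is an illustration only and plays no part in the INLA fit. The Bessel function is approximated here with a simple trapezoidal rule applied to its standard integral representation, which is a choice made for this example so that it needs no external library; the helper names and the value of the scaling parameter are hypothetical.

```python
from math import cosh, exp, gamma, sqrt


def bessel_k(nu, x, upper=20.0, steps=4000):
    """Modified Bessel function of the second kind, K_nu(x), for x > 0.

    Uses K_nu(x) = integral_0^inf exp(-x cosh t) cosh(nu t) dt, evaluated with
    the trapezoidal rule on [0, upper]. Not suitable for extremely small x,
    where the integrand has not yet decayed at t = upper.
    """
    h = upper / steps
    total = 0.5 * exp(-x)  # integrand at t = 0, with endpoint weight 1/2
    for i in range(1, steps + 1):
        t = i * h
        weight = 0.5 if i == steps else 1.0
        total += weight * exp(-x * cosh(t)) * cosh(nu * t)
    return total * h


def matern_cov(d, sigma2_sp, kappa, nu=1.0):
    """sigma2_sp * (kappa d)^nu * K_nu(kappa d) / (Gamma(nu) * 2^(nu - 1))."""
    if d == 0.0:
        return sigma2_sp  # limit of the expression as d -> 0
    kd = kappa * d
    return sigma2_sp * kd ** nu * bessel_k(nu, kd) / (gamma(nu) * 2.0 ** (nu - 1.0))


kappa = 0.05                 # hypothetical scaling parameter (per km)
rho = sqrt(8.0) / kappa      # spatial range, rho = sqrt(8) / kappa
corr_at_range = matern_cov(rho, sigma2_sp=1.0, kappa=kappa)
```

Evaluating `corr_at_range` shows how far the correlation has decayed at the distance the text calls the spatial range; with the variance set to 1 the covariance equals the correlation.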

The most widely used computational approach for Bayesian geostatistical model fit is MCMC simulation. However, large spatial covariance matrix calculations can increase computational time and possibly introduce numerical errors. Hence, we fitted the geostatistical model using the stochastic partial differential equations (SPDE)/INLA [19, 25] approach, readily implemented in the INLA R-package. Briefly, the spatial process assuming a Matérn covariance matrix Σ can be represented as a Gaussian Markov random field (GMRF) with mean zero and a symmetric positive definite precision matrix Q (defined as the inverse of Σ) [20]. The SPDE approach constructs a GMRF representation of the Matérn field on a triangulation (a set of non-intersecting triangles where any two triangles meet in at most a common edge or corner) partitioning the domain of the study region [25]. Subsequently, the INLA algorithm is used to estimate the posterior marginal (or joint) distribution of the latent Gaussian process and hyperparameters by Laplace approximation [19].

Bayesian variable selection, using normal mixture of inverse Gammas with parameter expansion (peNMIG) spike-and-slab priors [26], was applied on the model with an independent random effect for each location to identify the best set of predictors (i.e. climatic, environmental and socioeconomic). In particular, we assumed a normal distribution for the regression coefficients with a hyperparameter for the variance $\sigma_B^2$ to be a mixture of inverse Gamma distributions, that is $\beta_k \sim N(0, \sigma_B^2)$, where $\sigma_B^2 \sim I_k \, \mathrm{IG}(a_\sigma, b_\sigma) + (1 - I_k) \, \upsilon_0 \, \mathrm{IG}(a_\sigma, b_\sigma)$ and $a_\sigma$, $b_\sigma$ are fixed parameters. $\upsilon_0$ is some small positive constant [27] and the indicator $I_k$ has a Bernoulli prior distribution $I_k \sim \mathrm{bern}(\pi_k)$, where $\pi_k \sim \mathrm{beta}(a_\pi, b_\pi)$. We set $(a_\sigma, b_\sigma) = (5, 25)$, $(a_\pi, b_\pi) = (1, 1)$ and $\upsilon_0 = 0.00025$. The above prior of mixed inverse Gamma distributions is called a mixed spike and slab prior for $\beta_k$, as one component of the mixture, $\upsilon_0 \, \mathrm{IG}(a_\sigma, b_\sigma)$ (when $I_k = 0$), is a narrow spike around zero that strongly shrinks $\beta_k$ to zero, while the other component, $\mathrm{IG}(a_\sigma, b_\sigma)$ (when $I_k = 1$), is a wide slab that moves $\beta_k$ away from zero. The posterior distribution of $I_k$ determines which component of the mixture is predominant, contributing to the inclusion or exclusion of $\beta_k$. For categorical variables, we applied a peNMIG prior developed by Scheipl et al. [26], which allows blocks of coefficients to be included or excluded by improving “shrinkage” properties. Let $\beta_{kh}$ be the regression coefficient for the $h$th category of the $k$th predictor, then $\beta_{kh} = a_k \xi_{hk}$, where $a_k$ is assigned the NMIG prior described above and $\xi_{hk} \sim N(m_{hk}, 1)$. Here $m_{hk} = o_{hk} - (1 - o_{hk})$ and $o_{hk} \sim \mathrm{bern}(0.5)$, which shrinks $|\xi_{hk}|$ towards 1. Hence, $a_k$ models the overall contribution of the $k$th predictor and $\xi_{hk}$ estimates the effects of each element $\beta_{kh}$ of the predictor [27]. In addition, we introduced another indicator $I_d$ for selection of either a categorical or a linear form of a continuous variable. Let $\beta_{kd_1}$ and $\beta_{kd_2}$ indicate coefficients of the categorical and linear form of the $k$th predictor, respectively, then $\beta_k = I_d \beta_{kd_1} + (1 - I_d) \beta_{kd_2}$, where $I_d \sim \mathrm{bern}(0.5)$. MCMC simulation was employed to estimate the model parameters for variable selection in OpenBUGS version 3.0.2 (Imperial College and Medical Research Council; London, UK) [28]. Convergence was assessed by the Gelman and Rubin diagnostics [29], using the coda library in R [30]. In Bayesian variable selection, all models arising from any combination of covariates are fitted and the posterior probability for each model to be the true one is calculated. The predictors corresponding to the highest joint posterior probability of indicators $(I_1, I_2, \ldots, I_k, \ldots, I_K)$ were subsequently used as the best set of predictors to fit the final geostatistical model.
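The final selection step, choosing the indicator vector with the highest joint posterior probability, can be sketched as a simple tally over MCMC draws. The Python snippet below is an illustration under the assumption that the joint posterior probability of a model is estimated by the relative frequency with which its indicator configuration is visited; the draws and predictor names are hypothetical, and this is not the OpenBUGS code used in the study.

```python
from collections import Counter


def best_model(indicator_draws):
    """Return the most frequently visited indicator vector and its
    estimated joint posterior probability (relative frequency)."""
    counts = Counter(tuple(draw) for draw in indicator_draws)
    config, n = counts.most_common(1)[0]
    return config, n / len(indicator_draws)


# Hypothetical MCMC draws of (I_1, I_2, I_3) for three candidate predictors.
draws = [
    (1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1),
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0), (1, 0, 1),
]
config, prob = best_model(draws)

predictors = ["gdp", "soil_ph", "ndvi"]  # hypothetical predictor names
selected = [name for name, keep in zip(predictors, config) if keep == 1]
```

Here the configuration (1, 0, 1) is visited in six of ten draws, so the first and third predictors would be carried forward to the final model.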

A 5 × 5 km grid was overlaid on the P.R. China map, resulting in 363,377 pixels. Predictions for each soil-transmitted helminth species were obtained via INLA at the centroids of the grid’s pixels. An overall soil-transmitted helminth prevalence was calculated assuming independence in the risk between any two species, that is, $p_S = p_A + p_T + p_h - p_A p_T - p_A p_h - p_T p_h + p_A p_T p_h$, where $p_S$, $p_A$, $p_T$ and $p_h$ indicate the predicted prevalence of overall soil-transmitted helminth, A. lumbricoides, T. trichiura and hookworm, respectively, for each pixel. The number of infected individuals at pixel level was estimated by multiplying the median of the corresponding posterior predictive distribution of the infection prevalence with the population density.
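The combined prevalence formula above is the inclusion-exclusion expansion of the probability of being infected with at least one species, which under independence equals one minus the product of the three probabilities of being uninfected. A minimal Python sketch, with hypothetical pixel values and a per-pixel population count standing in for the population density layer, is:

```python
def combined_prevalence(p_a, p_t, p_h):
    """Probability of infection with at least one species, assuming independence."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_t) * (1.0 - p_h)


def inclusion_exclusion(p_a, p_t, p_h):
    """The same quantity written out term by term, as in the text."""
    return (p_a + p_t + p_h
            - p_a * p_t - p_a * p_h - p_t * p_h
            + p_a * p_t * p_h)


# Hypothetical pixel: median predicted prevalences and population count.
p_a, p_t, p_h = 0.20, 0.05, 0.10
population = 1500

p_s = combined_prevalence(p_a, p_t, p_h)
n_infected = p_s * population
```

Both functions return the same value, so the shorter product form can be used when checking pixel-level results.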

Model validation

Our model was fitted on a subset of the data, including approximately 80% of survey locations. Validation was performed on the remaining 20% by estimating the mean predictive error (ME) between the observed prevalence $\pi_i$ and predicted prevalence $\hat{\pi}_i$ at location $i$, where $\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} (\pi_i - \hat{\pi}_i)$ and $N$ is the total number of test locations. In addition, we calculated Bayesian credible intervals (BCI) of various probability levels and the percentages of observations included in these intervals.
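The two validation measures can be sketched in Python as follows. This is an illustration with hypothetical numbers, and it assumes equal-tailed credible intervals computed from posterior predictive draws by linear-interpolation quantiles; the text does not specify how the intervals were constructed.

```python
def mean_error(observed, predicted):
    """ME = (1/N) * sum(observed_i - predicted_i); positive means under-prediction."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def coverage(observed, posterior_samples, level=0.95):
    """Share of test locations whose observed prevalence lies inside the
    equal-tailed credible interval of the given probability level."""
    alpha = (1.0 - level) / 2.0
    hits = 0
    for obs, samples in zip(observed, posterior_samples):
        s = sorted(samples)
        if quantile(s, alpha) <= obs <= quantile(s, 1.0 - alpha):
            hits += 1
    return hits / len(observed)


# Hypothetical test set: observed prevalence, posterior median and posterior draws.
observed = [0.10, 0.30, 0.02]
predicted = [0.08, 0.25, 0.03]
samples = [
    [0.05, 0.07, 0.08, 0.09, 0.12],
    [0.20, 0.22, 0.25, 0.27, 0.28],  # 0.30 lies outside this interval
    [0.01, 0.02, 0.03, 0.04, 0.05],
]
me = mean_error(observed, predicted)
cov95 = coverage(observed, samples, level=0.95)
```

Repeating the coverage calculation over a range of `level` values gives the kind of coverage curve that is plotted against the nominal interval probability.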


Data summaries

The final dataset included 1,187 surveys for hookworm infection carried out at 1,067 unique locations; 1,157 surveys for A. lumbricoides infection at 1,052 unique locations; and 1,138 surveys for T. trichiura infection at 1,028 unique locations. The overall prevalence was 9.8%, 6.6% and 4.1% for A. lumbricoides, hookworm and T. trichiura infection, respectively. Details about the number of surveys by location type, study year, diagnostic method and infection prevalence are shown in Table 2. The geographical distribution of locations and observed prevalence for each soil-transmitted helminth species are shown in Figure 1. Maps of the spatial distribution of environmental/climatic, soil types and socioeconomic covariates used in Bayesian variable selection are provided in Additional file 1: Figure S1.

Table 2 Overview of the number of soil-transmitted helminth surveys
Figure 1

Survey locations and observed prevalence across P.R. China. The maps show the survey locations and observed prevalence for (A) A. lumbricoides, (B) T. trichiura and (C) hookworm.

Spatial statistical modelling and variable selections

The models with the highest posterior probabilities selected the following covariates: GDP per capita, elevation, NDVI, LST at day, LST at night, precipitation, pH measured in water, and climatic zones for T. trichiura; GDP per capita, elevation, NDVI, LST at day, LST at night, precipitation, bulk density, gypsum content, organic carbon content, climatic zone and land cover for hookworm; and GDP per capita, elevation, NDVI, LST at day and climatic zone for A. lumbricoides. The corresponding posterior probabilities of the respective models were 33.2%, 23.6% and 21.4% for T. trichiura, hookworm and A. lumbricoides, respectively.

The parameter estimates that arose from the Bayesian geostatistical logistic regression fit are shown in Tables 3, 4 and 5. The infection risk of all three soil-transmitted helminth species decreased considerably from 2005 onwards. We found a significant positive association between NDVI and the prevalence of A. lumbricoides. A negative association was found between GDP per capita, arid or snow/polar climatic zones and the prevalence of A. lumbricoides. High precipitation and LST at night are favourable conditions for the presence of hookworm, while high NDVI, LST at day, urban or wet land covers and arid or snow/polar climatic zones are less favourable. Elevation, LST at night, NDVI larger than 0.45 and equatorial climatic zone were associated with higher odds of T. trichiura infection, while LST at day and arid or snow climatic zones were associated with lower odds of T. trichiura infection.

Table 3 Posterior summaries (median and 95% BCI) of the geostatistical model parameters for A. lumbricoides
Table 4 Posterior summaries (median and 95% BCI) of the geostatistical model parameters for T. trichiura
Table 5 Posterior summaries (median and 95% BCI) of the geostatistical model parameters for hookworm

Model validation results

Model validation indicated that the Bayesian geostatistical logistic regression models correctly predicted, within a 95% BCI, 84.2%, 81.5% and 79.3% of test locations for T. trichiura, hookworm and A. lumbricoides, respectively. A plot of coverage for the full range of credible intervals is presented in Additional file 2: Figure S2. The MEs for hookworm, A. lumbricoides and T. trichiura were 0.56%, 1.7% and 2.0%, respectively, suggesting that our model may slightly under-estimate the risk of each of the soil-transmitted helminth species.

Predictive risk maps of soil-transmitted helminth infections

Figures 2, 3 and 4 present species-specific predictive risk maps of soil-transmitted helminth infections for the period 2005 onwards. High prevalence of A. lumbricoides (>20%) was predicted in large areas of Guizhou province and the southern part of Sichuan and Hubei provinces. Moderate-to-high prevalence (5-20%) was predicted for large areas of Hunan, Yunnan, Jiangxi, some southern areas of Gansu and Anhui provinces and Chongqing city. For the northern part of P.R. China and the south-eastern coastal-line areas, low prevalence was predicted (<5%). The high prediction uncertainty shown in Figure 2B is correlated with high prevalence areas. High infection prevalence (>20%) with T. trichiura was predicted for a few small areas of the southern part of P.R. China. Moderate-to-high prevalence (5-20%) was predicted for large areas of Hainan province. High hookworm infection prevalence (>20%) was predicted for Hainan, eastern parts of Sichuan and southern parts of Yunnan provinces. Low prevalence (0.1-5%) of T. trichiura and hookworm infections was predicted for most areas of the southern part of P.R. China, while close-to-zero prevalence was predicted for the northern part.

Figure 2

The geographical distribution of A. lumbricoides infection risk in P.R. China. The maps show the situation from 2005 onwards based on the median and standard deviation of the posterior predictive distribution. Estimates of (A) infection prevalence, (B) prediction uncertainty and (C) number of infected individuals.

Figure 3

The geographical distribution of T. trichiura infection risk in P.R. China. The maps show the situation from 2005 onwards based on the median and standard deviation of the posterior predictive distribution. Estimates of (A) infection prevalence, (B) prediction uncertainty and (C) number of infected individuals.

Figure 4

The geographical distribution of hookworm infection risk in P.R. China. The maps show the situation from 2005 onwards based on the median and standard deviation of the posterior predictive distribution. Estimates of (A) infection prevalence, (B) prediction uncertainty and (C) number of infected individuals.

Estimates of number of people infected

Figure 5 shows the combined soil-transmitted helminth prevalence and the number of infected individuals from 2005 onwards. Table 6 summarises the population-adjusted predicted prevalence and the number of infected individuals, stratified by province. The overall population-adjusted predicted prevalences of A. lumbricoides, hookworm and T. trichiura infections were 6.8%, 3.7% and 1.8%, respectively, corresponding to 85.4, 46.6 and 22.1 million infected individuals. The overall population-adjusted predicted prevalence for combined soil-transmitted helminth infections was 11.4%.
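One plausible reading of "population-adjusted" prevalence is a population-weighted mean of the pixel-level predictions, that is, the total predicted number of infected individuals divided by the total population of the area. The Python sketch below illustrates that reading with hypothetical pixel values; the exact aggregation used to build Table 6 is not spelled out in the text, so this is an assumption.

```python
def population_adjusted(prevalence, population):
    """Population-weighted mean prevalence and total number infected."""
    infected = sum(p * n for p, n in zip(prevalence, population))
    return infected / sum(population), infected


# Hypothetical pixels of one province: predicted prevalence and population count.
prevalence = [0.20, 0.05, 0.01]
population = [1000, 4000, 15000]

adj_prev, n_infected = population_adjusted(prevalence, population)
```

In this toy province the unweighted mean prevalence is about 8.7%, whereas the population-adjusted value is 2.75%, because most people live in the low-prevalence pixel.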

Figure 5

The geographical distribution of soil-transmitted helminth infection risk in P.R. China. The maps show the situation from 2005 onwards based on the median and standard deviation of the posterior predictive distribution. Estimates of (A) infection prevalence, (B) prediction uncertainty and (C) number of infected individuals.

Table 6 Population-adjusted predicted prevalence (%) and number of individuals (×10^6) infected with soil-transmitted helminths, stratified by province

For A. lumbricoides, the predicted prevalence ranged from 0.32% (Shanghai) to 27.9% (Guizhou province). Shanghai had the smallest (0.05 million) and Sichuan province the largest number (14.8 million) of infected individuals. For T. trichiura, the predicted prevalence ranged from 0.01% (Tianjin) to 18.3% (Hainan province). The smallest number of infected individuals were found in Nei Mongol, Ningxia Hui, Qinghai provinces and Tianjin (<0.01 million) whereas the largest number, 3.7 million, was predicted for Sichuan province. For hookworm, Ningxia Hui and Qinghai province had the lowest predicted prevalence (<0.01%), while Hainan province had the highest (22.1%). The provinces of Gansu, Nei Mongol, Ningxia Hui, Qinghai, Xinjiang Uygur and Tibet, and the cities of Beijing, Shanghai and Tianjin each had less than 10,000 individuals infected with hookworm. Sichuan province had the largest predicted number of hookworm infections (14.3 million).

The predicted combined soil-transmitted helminth prevalence ranged from 0.70% (Tianjin) to 40.8% (Hainan province). The number of individuals infected with soil-transmitted helminths ranged from 0.07 million (Tianjin) to 29.0 million (Sichuan province). Overall, slightly more than one out of ten people in P.R. China is infected with soil-transmitted helminths, corresponding to more than 140 million infections in the year 2010.


To our knowledge, we present the first model-based, nation-wide predictive infection risk maps of soil-transmitted helminths for P.R. China. Previous epidemiological studies [7] were mainly descriptive, reporting prevalence estimates at specific locations or visualized at province level using interpolated risk surface maps. We carried out an extensive literature search and collected published georeferenced soil-transmitted helminth prevalence data across P.R. China, alongside the ones from the second national survey that had been completed in 2004. Bayesian geostatistical models were utilised to identify climatic/environmental and socioeconomic factors that were significantly associated with infection risk, and hence, the number of infected individuals could be calculated at high spatial resolution. We derived species-specific risk maps. Additionally, we produced a risk map with any soil-transmitted helminth infection, which is particularly important for the control of soil-transmitted helminthiasis, as the same drugs (mainly albendazole and mebendazole) are used against all three species [31, 32].

Model validation suggested good predictive ability of our final models. In particular, 84.2%, 81.5% and 79.3% of survey locations were correctly predicted within a 95% BCI for T. trichiura, hookworm and A. lumbricoides, respectively. The combined soil-transmitted helminth prevalence (11.4%) is supported by the current surveillance data reported to China CDC, which show infection rates of around 10% in many areas of P.R. China. We found that all MEs were above zero; hence the predicted prevalence slightly under-estimated the true prevalence of each of the three soil-transmitted helminth species. The combined soil-transmitted helminth prevalence estimates assume that infections with the three species are independent of each other. However, previous research reported significant associations, particularly between A. lumbricoides and T. trichiura [33, 34]. Hence, our assumption may over-estimate the true prevalence of soil-transmitted helminths. Unfortunately, we do not have co-infection data from P.R. China, and thus we are unable to calculate a correction factor.

Our results indicate that several environmental and climatic predictors are significantly associated with soil-transmitted helminth infections. For example, LST at night was significantly associated with T. trichiura and hookworm, suggesting that temperature is an important driver of transmission. Similar results have been reported by other researchers [2, 35]. Our results suggest that the risk of infection with any of the soil-transmitted helminth species is higher in equatorial or warm zones, compared to the arid and snow/polar zones. This is consistent with earlier findings that extremely arid environments limit the transmission of soil-transmitted helminths [2], while equatorial or warm zones provide temperatures and soil moisture that are particularly suitable for larval development [35]. However, we found a positive association between elevation and T. trichiura infection risk, which contradicts earlier reports [36, 37]. The reason may be the altitude effect, i.e. the negative correlation between altitude and economy in P.R. China [38]. The low socioeconomic development in high altitude or mountainous areas might result in limited access to healthcare services [39, 40].

On the other hand, it is reported that socioeconomic factors are closely related with the behaviour of people, which in turn impacts the transmission of soil-transmitted helminths [41]. Indeed, wealth, inadequate sewage discharge, drinking of unsafe water, lack of sanitary infrastructure, personal hygiene habits, recent travel history, low education and demographic factors are strongly associated with soil-transmitted helminth infections [42-46]. Our results show that GDP per capita has a negative effect on A. lumbricoides infection risk. Other socioeconomic proxies such as sanitation level, number of hospital beds and percentage of people with access to tap water might be more readily able to explain the spatial distribution of infection risk.

Model-based estimates adjusted for population density indicate that the highest prevalence of A. lumbricoides occurred in Guizhou province. T. trichiura and hookworm were most prevalent in Hainan province. Although the overall soil-transmitted helminth infection risk decreased over the past several years, Hainan province had the highest risk in 2010, followed by Guizhou and Sichuan provinces. These results are consistent with the reported data of the second national survey on important parasitic diseases [7], and hence more effective control strategies are needed in these provinces.

The targets set out by the Chinese Ministry of Health in the “National Control Program on Important Parasitic Diseases from 2006 to 2015” are to reduce the prevalence of soil-transmitted helminth infections by 40% by 2010 and by up to 70% by 2015 [8]. The government aims to reach these targets through a series of control strategies, including anthelminthic treatment, improvement of sanitation, and better information, education and communication (IEC) campaigns [47]. Preventive chemotherapy is recommended for populations older than 3 years in areas where the prevalence of soil-transmitted helminth infection exceeds 50%, while targeted drug treatment is recommended for children and rural populations in areas where infection prevalence ranges between 10% and 50% [48]. Our models indicate that the first step of the target, i.e. reduction of prevalence by 40% by 2010, has been achieved. Indeed, the prevalence of T. trichiura, hookworm and A. lumbricoides dropped from 4.6%, 6.1% and 12.7% in the second national survey between 2001 and 2004 [7] to 1.8%, 3.7% and 6.8% in 2010, which corresponds to respective reductions of 60.9%, 39.3% and 46.5%. The combined soil-transmitted helminth prevalence dropped from 19.6% to 11.4% in 2010, a reduction of 41.8%. These results also suggest that, compared to T. trichiura and A. lumbricoides, more effective strategies need to be tailored for hookworm infections.
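The reductions quoted above follow from simple arithmetic, which can be reproduced with the short Python sketch below. The function and variable names are illustrative assumptions; the prevalence figures and the 40% target are those given in the text, and the treatment thresholds encode only the two rules stated there (the text does not say what applies below 10%).

```python
def relative_reduction(baseline, current):
    """Percentage drop from baseline prevalence to current prevalence."""
    return 100.0 * (baseline - current) / baseline


# Prevalence (%) in the second national survey (2001-2004) and in 2010,
# as quoted in the text.
prevalence = {
    "T. trichiura": (4.6, 1.8),
    "hookworm": (6.1, 3.7),
    "A. lumbricoides": (12.7, 6.8),
    "combined": (19.6, 11.4),
}
TARGET_2010 = 40.0  # targeted reduction (%) by 2010

reductions = {
    name: round(relative_reduction(before, after), 1)
    for name, (before, after) in prevalence.items()
}
below_target = [name for name, r in reductions.items() if r < TARGET_2010]


def recommended_strategy(prevalence_pct):
    """Map a local prevalence (%) to the treatment rule stated in the text."""
    if prevalence_pct > 50:
        return "preventive chemotherapy (population older than 3 years)"
    if prevalence_pct >= 10:
        return "targeted drug treatment (children and rural population)"
    return "not specified in the text"


for name, r in reductions.items():
    print(f"{name}: {r}% reduction")
print("below the 40% target:", below_target)
```

Only hookworm falls short of the 40% mark in this calculation, which is the basis for the remark that hookworm control needs more tailored strategies.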

The data of our study stem largely from community-based surveys. However, the information extracted from the literature is not disaggregated by age, and hence we were not able to obtain age-adjusted predictive risk maps. In addition, more than 96% of observed surveys used the Kato-Katz technique [49, 50]. We assumed that the diagnostic sensitivity was similar across survey locations. However, the sensitivity depends on the intensity of infection, and hence varies in space [51]. The above data limitations are known in geostatistical meta-analyses of historical data [27] and we are currently developing methods to address them.


The work presented here is the first major effort to present model-based estimates of the geographical distribution of soil-transmitted helminth infection risk across P.R. China, and to identify the associated climatic, environmental and socioeconomic risk factors. Our prediction maps provide useful information for identifying priority areas where interventions targeting soil-transmitted helminthiasis are most urgently required. As a next step, we plan to further develop our models to address data characteristics and improve model-based predictions.



Bayesian credible interval


Base saturation as percentage of ECEsoil


Bulk density


Percentage of coarse fragments

China CDC: Chinese Center for Disease Control and Prevention


Percentage of clay


China national knowledge internet


FAO soil drainage class


Gross domestic product


Gaussian Markov random field

GNTD database: Global neglected tropical diseases database


Gypsum content


Human influence index


Information, education, and communication


Integrated nested Laplace approximations


National Institute of Parasitic Diseases


Land surface temperature


Markov chain Monte Carlo


Moderate Resolution Imaging Spectroradiometer


Normalized difference vegetation index

P.R. China: People’s Republic of China


Normal mixture of inverse Gammas with parameter expansion


pH measured in water


FAO texture class


Stochastic partial differential equations


Available water capacity


Organic carbon content


Total nitrogen


Percentage of sand


Percentage of silt.


  1. Bethony J, Brooker S, Albonico M, Geiger SM, Loukas A, Diemert D, Hotez PJ: Soil-transmitted helminth infections: ascariasis, trichuriasis, and hookworm. Lancet. 2006, 367: 1521-1532. 10.1016/S0140-6736(06)68653-4.

  2. Pullan RL, Brooker SJ: The global limits and population at risk of soil-transmitted helminth infections in 2010. Parasit Vectors. 2012, 5: 81. 10.1186/1756-3305-5-81.

  3. de Silva NR, Brooker S, Hotez PJ, Montresor A, Engels D, Savioli L: Soil-transmitted helminth infections: updating the global picture. Trends Parasitol. 2003, 19: 547-551. 10.1016/

  4. Hotez PJ, Bundy DAP, Beegle K, Brooker S, Drake L, de Silva N, Montresor A, Engels D, Jukes M, Chitsulo L: Helminth infections: soil-transmitted helminth infections and schistosomiasis. Disease Control Priorities in Developing Countries. Edited by: Jamison DT, Breman JG, Measham AR, Alleyne G, Claeson M, Evans DB, Jha P, Mills A, Musgrove P. 2006, Washington, DC: World Bank, 467-482. 2nd edition.

  5. Murray CJL, Vos T, Lozano R, Naghavi M, Flaxman AD, Michaud C, Ezzati M, Shibuya K, Salomon JA, Abdalla S: Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990-2010: a systematic analysis for the global burden of disease study 2010. Lancet. 2012, 380: 2197-2223. 10.1016/S0140-6736(12)61689-4.

  6. Xu LQ, Yu SH, Jin ZX, Yang JL, Lai CQ, Zhang XJ, Zheng CQ: Soil-transmitted helminthiases - nationwide survey in China. Bull World Health Organ. 1995, 73: 507-513.

  7. Coordinating Office of the National Survey on the Important Human Parasitic Diseases: A national survey on current status of the important parasitic diseases in human population. Chin J Parasitol Parasit Dis. 2005, 23: 332-340. (in Chinese)

  8. Zheng Q, Chen Y, Zhang HB, Chen JX, Zhou XN: The control of hookworm infection in China. Parasit Vectors. 2009, 2: 44. 10.1186/1756-3305-2-44.

  9. Li T, He SY, Zhao H, Zhao GH, Zhu XQ: Major trends in human parasitic diseases in China. Trends Parasitol. 2010, 26: 264-270. 10.1016/

  10. Wang XB, Zhang LX, Luo RF, Wang GF, Chen YD, Medina A, Eggleston K, Rozelle S, Smith DS: Soil-transmitted helminth infections and correlated risk factors in preschool and school-aged children in rural southwest China. PLoS One. 2012, 7: e45939. 10.1371/journal.pone.0045939.

  11. Zhou XN, Bergquist R, Tanner M: Elimination of tropical disease through surveillance and response. Infect Dis Poverty. 2013, 2: 1. 10.1186/2049-9957-2-1.

  12. Raso G, Vounatsou P, Gosoniu L, Tanner M, N’Goran EK, Utzinger J: Risk factors and spatial patterns of hookworm infection among schoolchildren in a rural area of western Côte d’Ivoire. Int J Parasitol. 2006, 36: 201-210. 10.1016/j.ijpara.2005.09.003.

  13. Pullan RL, Gething PW, Smith JL, Mwandawiro CS, Sturrock HJW, Gitonga CW, Hay SI, Brooker S: Spatial modelling of soil-transmitted helminth infections in Kenya: a disease control planning tool. PLoS Negl Trop Dis. 2011, 5: e958. 10.1371/journal.pntd.0000958.

  14. Pullan RL, Bethony JM, Geiger SM, Cundill B, Correa-Oliveira R, Quinnell RJ, Brooker S: Human helminth co-infection: analysis of spatial patterns and risk factors in a Brazilian community. PLoS Negl Trop Dis. 2008, 2: e352. 10.1371/journal.pntd.0000352.

  15. Chammartin F, Scholte RGC, Guimarães LH, Tanner M, Utzinger J, Vounatsou P: Soil-transmitted helminth infection in South America: a systematic review and geostatistical meta-analysis. Lancet Infect Dis. 2013, 13: 507-518. 10.1016/S1473-3099(13)70071-9.

  16. Scholte RGC, Schur N, Bavia ME, Carvalho EM, Chammartin F, Utzinger J, Vounatsou P: Spatial analysis and risk mapping of soil-transmitted helminth infections in Brazil, using Bayesian geostatistical models. Geospat Health. 2013, 8: 97-110.

  17. Gelfand AE, Hills SE, Racine-Poon A, Smith AFM: Illustration of Bayesian inference in normal data models using Gibbs sampling. J Am Stat Assoc. 1990, 85: 972-985. 10.1080/01621459.1990.10474968.

  18. Diggle PJ, Tawn JA, Moyeed RA: Model-based geostatistics. J R Stat Soc Ser C: Appl Stat. 1998, 47: 299-326.

  19. Rue H, Martino S, Chopin N: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B: Stat Methodol. 2009, 71: 319-392. 10.1111/j.1467-9868.2008.00700.x.

  20. Cameletti M, Lindgren F, Simpson D, Rue H: Spatio-temporal modeling of particulate matter concentration through the SPDE approach. Adv Stat Anal. 2013, 97: 109-131. 10.1007/s10182-012-0196-3.

  21. Hürlimann E, Schur N, Boutsika K, Stensgaard AS, Laserna de Himpsl M, Ziegelbauer K, Laizer N, Camenzind L, Di Pasquale A, Ekpo UF: Toward an open-access global database for mapping, control, and surveillance of neglected tropical diseases. PLoS Negl Trop Dis. 2011, 5: e1404. 10.1371/journal.pntd.0001404.

  22. Sanderson EW, Jaiteh M, Levy MA, Redford KH, Wannebo AV, Woolmer G: The human footprint and the last of the wild. Bioscience. 2002, 52: 891-904. 10.1641/0006-3568(2002)052[0891:THFATL]2.0.CO;2.

  23. Dormann CF, Elith J, Bacher S, Buchmann C, Carl G, Carre G, Marquez JRG, Gruber B, Lafourcade B, Leitao PJ: Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography. 2013, 36: 27-46. 10.1111/j.1600-0587.2012.07348.x.

  24. Karagiannis-Voules DA, Scholte RGC, Guimarães LH, Utzinger J, Vounatsou P: Bayesian geostatistical modeling of leishmaniasis incidence in Brazil. PLoS Negl Trop Dis. 2013, 7: e2213. 10.1371/journal.pntd.0002213.

  25. Lindgren F, Rue H, Lindstrom J: An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J R Stat Soc Ser B: Stat Methodol. 2011, 73: 423-498. 10.1111/j.1467-9868.2011.00777.x.

  26. Scheipl F, Fahrmeir L, Kneib T: Spike-and-slab priors for function selection in structured additive regression models. J Am Stat Assoc. 2012, 107: 1518-1532. 10.1080/01621459.2012.737742.

  27. Chammartin F, Hürlimann E, Raso G, N’Goran EK, Utzinger J, Vounatsou P: Statistical methodological issues in mapping historical schistosomiasis survey data. Acta Trop. 2013, 128: 345-352. 10.1016/j.actatropica.2013.04.012.

  28. Lunn D, Spiegelhalter D, Thomas A, Best N: The BUGS project: evolution, critique and future directions. Stat Med. 2009, 28: 3049-3067. 10.1002/sim.3680.

  29. Gelman A, Rubin DB: Inference from iterative simulation using multiple sequences. Stat Sci. 1992, 7: 457-511. 10.1214/ss/1177011136.

  30. Plummer M, Best N, Cowles K, Vines K: CODA: convergence diagnosis and output analysis for MCMC. R News. 2006, 6: 7-11.

  31. WHO: Prevention and control of schistosomiasis and soil-transmitted helminthiasis: report of a WHO expert committee. WHO Tech Rep Ser. 2002, 912: 1-57.

  32. Keiser J, Utzinger J: Efficacy of current drugs against soil-transmitted helminth infections: systematic review and meta-analysis. JAMA. 2008, 299: 1937-1948.

  33. Booth M, Bundy DAP: Comparative prevalences of Ascaris lumbricoides, Trichuris trichiura and hookworm infections and the prospects for combined control. Parasitology. 1992, 105: 151-157. 10.1017/S0031182000073807.

  34. Tchuem Tchuenté LA, Behnke JM, Gilbert FS, Southgate VR, Vercruysse J: Polyparasitism with Schistosoma haematobium and soil-transmitted helminth infections among school children in Loum, Cameroon. Trop Med Int Health. 2003, 8: 975-986. 10.1046/j.1360-2276.2003.01120.x.

  35. Tchuem Tchuenté LA: Control of soil-transmitted helminths in sub-Saharan Africa: diagnosis, drug efficacy concerns and challenges. Acta Trop. 2011, 120: S4-S11.

  36. Flores A, Esteban JG, Angles R, Mas-Coma S: Soil-transmitted helminth infections at very high altitude in Bolivia. Trans R Soc Trop Med Hyg. 2001, 95: 272-277. 10.1016/S0035-9203(01)90232-9.

  37. Gunawardena K, Kumarendran B, Ebenezer R, Gunasingha MS, Pathmeswaran A, de Silva N: Soil-transmitted helminth infections among plantation sector schoolchildren in Sri Lanka: prevalence after ten years of preventive chemotherapy. PLoS Negl Trop Dis. 2011, 5: e1341. 10.1371/journal.pntd.0001341.

  38. Zhai S, Sun A: On the relationship between altitude and economy–the inspiration of altitude effects to the economic development of the Qinghai-Tibet plateau region. Nationalities Res Qinghai. 2012, 23: 152-159. (in Chinese)

  39. Schratz A, Pineda MF, Reforma LG, Fox NM, Le AT, Tommaso Cavalli-Sforza L, Henderson MK, Mendoza R, Utzinger J, Ehrenberg JP: Neglected diseases and ethnic minorities in the Western Pacific Region: exploring the links. Adv Parasitol. 2010, 72: 79-107.

  40. Yap P, Du ZW, Wu FW, Jiang JY, Chen R, Zhou XN, Hattendorf J, Utzinger J, Steinmann P: Rapid re-infection with soil-transmitted helminths after triple-dose albendazole treatment of school-aged children in Yunnan, People’s Republic of China. Am J Trop Med Hyg. 2013, 89: 23-31. 10.4269/ajtmh.13-0009.

  41. Brooker S, Clements ACA, Bundy DAP: Global epidemiology, ecology and control of soil-transmitted helminth infections. Adv Parasitol. 2006, 62: 221-261.

  42. Norhayati M, Oothuman P, Fatmah MS: Some risk factors of Ascaris and Trichuris infection in Malaysian aborigine (Orang Asli) children. Med J Malaysia. 1998, 53: 401-407.

  43. Hohmann H, Panzer S, Phimpachan C, Southivong C, Schelp FP: Relationship of intestinal parasites to the environment and to behavioral factors in children in the Bolikhamxay province of Lao PDR. Southeast Asian J Trop Med Public Health. 2001, 32: 4-13.

  44. Escobedo AA, Canete R, Nunez FA: Prevalence, risk factors and clinical features associated with intestinal parasitic infections in children from San Juan y Martínez, Pinar del Río, Cuba. West Indian Med J. 2008, 57: 377-382.

  45. Knopp S, Mohammed KA, Stothard JR, Khamis IS, Rollinson D, Marti H, Utzinger J: Patterns and risk factors of helminthiasis and anemia in a rural and a peri-urban community in Zanzibar, in the context of helminth control programs. PLoS Negl Trop Dis. 2010, 4: e681. 10.1371/journal.pntd.0000681.

  46. Pinheiro ID, de Castro MF, Mitterofhe A, Pires FAC, Abramo C, Ribeiro LC, Tibirica SHC, Coimbra ES: Prevalence and risk factors for giardiasis and soil-transmitted helminthiasis in three municipalities of Southeastern Minas Gerais State, Brazil: risk factors for giardiasis and soil-transmitted helminthiasis. Parasitol Res. 2011, 108: 1123-1130. 10.1007/s00436-010-2154-x.

  47. Bergquist R, Whittaker M: Control of neglected tropical diseases in Asia Pacific: implications for health information priorities. Infect Dis Poverty. 2012, 1: 3. 10.1186/2049-9957-1-3.

  48. Ministry of Health: Notice of the Ministry of Public Health concerning publishing “National Control Program on Important Parasitic Diseases in 2006-2015”. Gazette of the Ministry of Health of People’s Republic of China. 2006, 33: 41-44. (in Chinese)

  49. Katz N, Chaves A, Pellegrino J: A simple device for quantitative stool thick-smear technique in schistosomiasis mansoni. Rev Inst Med Trop São Paulo. 1972, 14: 397-400.

  50. Speich B, Knopp S, Mohammed KA, Khamis IS, Rinaldi L, Cringoli G, Rollinson D, Utzinger J: Comparative cost assessment of the Kato-Katz and FLOTAC techniques for soil-transmitted helminth diagnosis in epidemiological surveys. Parasit Vectors. 2010, 3: 71. 10.1186/1756-3305-3-71.

  51. Booth M, Vounatsou P, N’Goran EK, Tanner M, Utzinger J: The influence of sampling effort and the performance of the Kato-Katz technique in diagnosing Schistosoma mansoni and hookworm co-infections in rural Côte d’Ivoire. Parasitology. 2003, 127: 525-531. 10.1017/S0031182003004128.


We thank two anonymous referees for a series of useful comments and suggestions. This study received financial support from the China Scholarship Council (CSC) to YSL, the UBS Optimus Foundation (project no. 5879) and the Swiss National Science Foundation (PDFMP3_137156).

Author information

Corresponding author

Correspondence to Penelope Vounatsou.

Additional information

Competing interests

The authors have declared that no competing interests exist.

Authors’ contributions

YSL and PV analyzed the data. YSL, JU and PV wrote the paper. PV, JU and XNZ conceptualized the project. XNZ provided data. YSL did the literature review and processed the data. PV, JU and XNZ provided important intellectual content. All authors read and approved the originally submitted and the revised manuscript.

Electronic supplementary material


Additional file 1: Figure S1: Spatial distribution of environmental/climatic, soil types and socioeconomic factors in P.R. China. (TIFF 19 MB)


Additional file 2: Figure S2: Model validation results. Percentage of survey locations with observed prevalence included within the Bayesian credible interval (BCI) of various probability coverage cut-offs (bar plots) calculated from the posterior predicted distribution. Solid lines indicate the corresponding width of BCI. (TIFF 921 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Lai, YS., Zhou, XN., Utzinger, J. et al. Bayesian geostatistical modelling of soil-transmitted helminth survey data in the People’s Republic of China. Parasites Vectors 6, 359 (2013).
