### Data sources and data selection

Data on the prevalence of LF infection were identified from searches of the formal and informal literature and direct communication with LF control programmes. Details of the search strategies, inclusion criteria, data abstraction and geolocation procedures are provided by Cano et al. [12]. In brief, only population-based survey data based on random sampling were included, whereas data from non-random data, including surveys from hospitals, prisons, mental institutions or military facilities, were excluded. In the current analysis, only surveys conducted prior to the implementation of countrywide, population-based MDA were included. Infection prevalence was defined as either (i) the proportion of surveyed individuals with detectable microfilaraemia or (ii) the proportion of surveyed individuals with detectable antigenaemia. A table with a brief description of the surveys compiled and used in this work is provided as supplemental information (see Additional file 1).

Microfilaraemia (mf) was estimated from examination of thick blood films collected during night blood surveys to detect the presence of microfilariae using microscopy [14, 15]. It is generally assumed that the sensitivity of methods for detecting microfilaraemia is influenced by the volume of blood sampled and previous authors have adjusted prevalence estimates according to blood volume [16]. We therefore investigated this issue further but found that although estimates of infection prevalence varied according to blood volume, we were unable to derive consistent adjustment factors and therefore did not adjust prevalence estimates according to blood volume (see Additional file 1). It is also known that concentration and counting chamber methods have greater sensitivity [17] and therefore where estimates were derived using both concentration methods and thick blood smears (*n* = 10 surveys), we used only the thick smear data.

*Wuchereria bancrofti* antigenaemia was typically estimated using an immuno-chromatographic card test (ICT) [14, 18]. These tests are more sensitive than mf detection and can be conducted on blood collected at any time of the day and therefore since 2000 have been the diagnostic method of choice for mapping the distribution of LF caused by *W. bancrofti*. Recent work has highlighted potential cross-reactivity of the ICT test with *Loa loa* [19], therefore we excluded ICT-based surveys conducted in areas of *L. loa* transmission (*n* = 314), as defined by Zouré et al. [20].

Data derived from parasitological blood examinations during day time (*n* = 71), ELISA (*n* = 70), clinical examinations (*n* = 24) and other molecular or antibody-based tests (i.e. PCR, IFI, new rapid test) (*n* = 17) were excluded because of the lack of comparable data. Finally, we excluded data from Egypt since this country is no longer considered endemic for LF and has a different transmission ecology from SSA [21, 22]. Figure 1 summarises the survey data selection and Fig. 2 shows the geographical distribution of survey data by diagnostic method.

### Covariate variables

There are a number of climatic, environmental and demographic factors that influence the transmission and epidemiology of LF, including temperature, humidity, elevation and population density [12, 23–25]. We generated a suite of climatic and environmental covariates from surfaces interpolated from meteorological stations or remote sensing imagery, including estimates of mean, minimum and maximum temperature and precipitation at 1 km^{2} resolution [26], averaged long-term estimates of land surface temperature [27], elevation at 1 km^{2} resolution [28], and aridity index [28], averaged enhanced vegetation index for the period 2000 to 2012 [27], and land cover data [29]. Estimates of population density were obtained from the Gridded Population of the World [30] and the United Nations Environment Programme [31], which we used to classify areas as urban, peri-urban or rural areas, based on the assumption that urban extents (UE) have a population densities ≥1000 persons/km^{2}, peri-urban >250 persons/km^{2} within a 15 km distance from the UE edge, and rural <250 persons/km^{2} and/or >15 km from the UE edge. From these datasets of population density, population growth rate for the period 1960 to 2010, as a measure of the change rate in population over this period, was also calculated. A detailed description of each covariate and source is provided in the supplemental file (Additional file 2: Table S3). Each covariate grid was resampled to a 10 km spatial resolution using bilinear interpolation for continuous surface and a majority approach for categorical data [32]. Covariate data were extracted to survey locations using ArcGIS Desktop v10.2 (Environmental Systems Research Institute Inc., Redlands CA, USA).

### Vector distribution maps

LF is transmitted mainly by mosquito species belonging to the *Anopheles, Culex,* and *Mansonia* genera and to lesser extent *Aedes*, *Coquillettidia* and *Ochlerotatus* genera [33, 34]. *Anopheles* mosquitoes are the main vectors of LF through much of west and central Africa and inland east Africa, whereas *Culex* species are the main vectors in coastal east and southern Africa [33]. Studies have shown that the survival, parasite uptake and development of infective L3 stages and overall transmission potential varies by vector species [35–37]. In order to capture these species differences, we developed maps of the distribution of each dominant LF vector species: *Anopheles, Culex* and *Mansonia*.

Maps of the distribution of mosquitoes belonging to the *An. gambiae* complex and *An. funestus* complex were obtained from the Malaria Atlas Project project (http://www.map.ox.ac.uk/) [38, 39] and a binary map displaying the distribution of each complex was created. Maps of *Culex* and *Mansonia* mosquitoes were obtained from the VectorMap project (http://www.vectormap.org/), which collates collection records of major vector insects and uses maximum entropy ecological niche modelling to develop occurrence maps [40, 41]. The maps present the probability of occurrence of *Cx. quinquefasciatus, Cx. pipiens, Cx. univittatus,* and *Mansonia Africana*, and we used a 90 % probability threshold to indicate dominance of a particular species according to a set of presence records obtained from the Global Biodiversity Information Facility database [42]. Each of the species-specific binary maps were combined to produce gridded maps of mosquito distribution by genus which, in turn, were combined into a single map of the different LF vector species at a 5 km spatial resolution (Fig. 3). The developed maps of *Anopheles, Culex* and *Mansonia* genera were subsequently incorporated as covariates in the goestatistical and mathematical modelling.

### Geostatistical modelling approach

The prevalence of mf and prevalence of antigenaemia were modelled separately due to a poor correlation between outcomes and the inability to predict one from another, as detailed elsewhere [43]. Bayesian geostatistical models, which included both fixed and random effects, were used to predict the spatial distribution of each outcome. Fixed effects quantify the effects of the covariates on LF infections, whereas random effects account for unexplained spatial variation whose structure can eventually help identify anomalous areas of high or low risk that can be further investigated.

The ICT-based model accounted for environmental and demographic covariates and spatial random effects as data spanned only 15 years and no temporal trend was observed in the data. From ICT-based data, we fitted a model in which conditional on the true prevalence *P*(*x*
_{
i
}) at location *x*
_{
i
}, *i* = 1, …, *n*, the number of positive results *Y*
_{
i
} out of *N*
_{
i
} individuals sampled at location *x*
_{
i
}, follows a binomial distribution *Y*
_{
i
}
*|P* (*x*
_{
i
}) *~ Binomial(N*
_{
i
}
*,P* (*x*
_{
i
})). The log-odds of *P*(*x*
_{
i
}) is modelled as logit(*P*(*x*
_{
i
})) = *z*
_{
i
}
*β* + *S*(*x*
_{
i
}) + *u*
_{
i
}, where *z*
_{
i
} = (1, *z*
_{
i1}, …, *z*
_{
ip
}) denotes the vector of the intercept and the *p* covariates considered, and *β* = (*β*0, *β*1, . . ., *β*p)*’* is the coefficient vector. *S*(*x*
_{
i
}) is a zero-mean Gaussian process with Matérn covariance function that represents any residual spatial variation which is not explained by included covariates. The Mátern covariance function is expressed as

$$ \mathrm{C}\mathrm{o}\mathrm{v}\left(S\left({x}_i\right),S\left({x}_j\right)\right) = \frac{\upsigma^2}{2^{\upsilon -1}\Gamma \left(\upsilon \right)}{\left(k\parallel {x}_i-{\mathrm{x}}_{\mathrm{j}}\parallel \right)}^{\upsilon }{K}_{\upsilon}\left(k\parallel {x}_i-{\mathrm{x}}_{\mathrm{j}}\parallel \right), $$

where *Κ*
_{
υ
} is the modified Bessel function of second kind and order *υ* > 0. The integer value of *υ* determines the mean square differentiability of the underlying process. *υ* is usually fixed since typically it is poorly identified in applications. *σ*
^{2} is the marginal variance and *k* > 0 is a scaling parameter related to the range *ρ*, the distance at which the *S*(*x*
_{
i
}) and *S*(*x*
_{
j
}) are almost independent. In particular, we used the empirically derived definition \( \rho = \sqrt{8\upupsilon}/k \), with *ρ* corresponding to the distance at which the spatial correlation is approximately 0.1. We also include in the model a spatially unstructured random effect, *u*
_{
i
}, that captures the effects of unmeasured characteristics that affectall members in the survey. *u*
_{
i
} are assumed to be independent zero-mean Gaussian distributed with precision *τ*
_{
u
}.

The model for mf prevalence incorporated a temporal fixed effect to capture changes in mf prevalence over time and a spatio-temporal random effect under the assumption that fitted temporal correlations exist only with the preceding year [44]. Here, the number of positive results *Y*
_{
it
} out of *N*
_{
it
} people sampled at location *x*
_{
i
}, *i* = 1, …, *n* and time *t* = 1, 2, 3, follows a binomial distribution

$$ {Y}_{it}\Big|P\left({x}_i,t\right)\sim Binomial\left({N}_{it},P\left({x}_i,t\right)\right), $$

$$ logit\left(P\left({x}_i,t\right)\right)={z}_{it}\beta +\xi \left({x}_i,t\right)+{u}_{it}, $$

Here *ξ*(*x*
_{
i
}, *t*) denotes the between-location-time random effect at location *x*
_{
i
} and time *t.*

$$ \xi \left({x}_i,t\right)=a\xi \left({x}_i,t-1\right)+w\left({x}_i,t\right), $$

where |*a*| <1, and *ξ*(*x*
_{
i
}, 1) follows the stationary distribution of a first-order autoregressive process, namely N(0, σ
^{2}_{w}
/(1 − *a*
^{2})). *W*(*x*
_{
i
}, *t*) is a spatial correlation term. Each *w*(*x*
_{
i
}, *t*) follows a zero-mean Gaussian distribution. The *w*(*x*
_{
i
}, *t*) is temporally independent but spatially dependent at each time t, with Matérn covariance function. *u*
_{
it
} denotes the unstructured random effect and are assumed to be independent zero-mean Gaussian distributed with precision *τ*
_{
u
}.

### Variable selection and model development

We followed a model selection procedure to identify an optimal suite of covariates to include in the fixed effects part of the geostatistical models. In order to reduce any potential collinearity and confounding effects, we first grouped the variables and use a formal model selection criterion to select one variable within each of the groups (Additional file 2: Table S4). Continuous variables with an absolute value of correlation coefficient higher than 0.8, were part of the same group. Land cover categories formed another group. Within each group, we investigated the relationship between infection prevalence and each potential explanatory variable by fitting univariate generalized linear models relating the logit of infection to each of the variables (Additional file 2: Table S5). We compared the univariate models in terms of the Akaike Information Criterion (AIC) and selected the variables which had the lowest AIC in the univariate analysis. AIC is defined as

$$ \mathrm{A}\mathrm{I}\mathrm{C} = -2l\left(\widehat{\theta}\right) + 2\mathrm{k}, $$

where \( l\left(\widehat{\theta}\right) \) is the maximum log-likelihood function and *k* is the number of parameters. After that, we explored further simplification of the model by backward elimination of selected variables until it was no longer possible to reduce AIC by elimination of any of the remaining variables.

The age group (either below or above 15 years of age) of individuals studied on the prevalence surveys was considered at every step of the modelling process. In addition, population density was considered for the periods at which the surveys were undertaken. Since ICT-based data were available for surveys conducted from 1990 onwards, estimates of population density at 2000 were used. For mf data, in order to take into account temporal changes in population density, gridded maps of population density at 1960, 1980 and 2000 were used accordingly with the period of survey (1950–1969, 1970–1989, 1990–onwards).

The selection of covariates at this stage in the overall model-fitting process ignores spatial and temporal correlation, and as a result is likely to over-state the statistical significance of covariate effects. In the final stage of the model-fitting process we re-assess the covariate effects and their significance within a spatial mixed model for the ICT data and a spatiotemporal mixed model for the Mf data that take into account the spatial and spatiotemporal correlation, respectively.

### Model validation

We assessed the predictive ability of the model using a leave-one-out cross-validation procedure. In this approach, a single observation is retained as the validation data, and the spatial model is fitted to the remaining data. Then, the observation in the validation data is predicted using the fitted model. The validation data needs to spatially represent the whole region where the prevalence is predicted. Therefore, instead of repeating the cross-validation procedure using each observation once as the validation data, we used each of the locations of a spatially representative sample of the prediction surface. To obtain a valid data set, 20 % of the observations were sampled without replacement where each observation had a probability of selection proportional to the area of the Thiessen polygon surrounding its location, that is, the area closest to the location relative to the surrounding points.

The performance of the model was assessed by comparing the observed and the predicted prevalences at each location, and by calculating the coverage probabilities of 95 % confidence interval (CI), that is, the proportion of times that the observed prevalence rates are within 95 % CIs. Specifically, we computed the correlation between the predicted and the observed prevalences, the mean prediction error (ME) defined as

$$ \mathrm{ME}=\frac{1}{\mathrm{m}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{m}}}\left({\mathrm{p}}^{*}\left({\mathrm{x}}_{\mathrm{i}}\right)-\mathrm{p}\left({\mathrm{x}}_{\mathrm{i}}\right)\right), $$

and the mean absolute error (MAE) defined as

$$ \mathrm{M}\mathrm{A}\mathrm{E}=\frac{1}{\mathrm{m}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{m}}}\left|{\mathrm{p}}^{*}\left({\mathrm{x}}_{\mathrm{i}}\right)-\mathrm{p}\left({\mathrm{x}}_{\mathrm{i}}\right)\right|, $$

where m is the number of observations in the validation data, p*(x_{i}) is the predicted prevalence, and p(x_{i}) is the observed prevalence at location x_{i}, i = 1, …, m.

### Implementation and spatial prediction

The models were fitted using the Integrated Nested Laplace Approximation (INLA) [45, 46] and the Stochastic Partial Differential Equation (SPDE) [47] approaches. INLA is a computationally less-intensive alternative to MCMC designed to perform fast Bayesian inference on a large class of latent Gaussian models. By using a combination of analytical approximation and numerical integration, INLA allows to fit models represented as follows,

$$ \left\langle {Y}_i|S,\theta \right\rangle \sim p\left\langle {Y}_i|{\eta}_i,\theta \right\rangle, $$

$$ {\eta}_i={\displaystyle \sum_j{c}_{ij}}{S}_j, $$

$$ \left\langle S|\theta \right\rangle \sim N\left(0,Q{\left(\theta \right)}^{-1}\right), $$

$$ \left(\theta \right)\sim p\left(\theta \right), $$

where *Y* denotes the observation variable that assume independence conditional on some underlying latent Gaussian field *S* and a vector of hyperparameters *θ*, and *η* is a linear predictor based on known covariate values *c*
_{
ij
}. The analysis of spatial point process data is possible by combining INLA and the SPDE approach. This consists in representing the continuously indexed Gaussian field *S*(*x*), as a discretely indexed Gaussian Markov random field (GMRF) by means of a basis function representation defined on a triangulation of the domain of interest. Thus,

$$ S(x)={\displaystyle \sum_{g=1}^G{\psi}_g}(x){S}_g, $$

where *G* is the number of vertices in the triangulation, *ψ*
_{
g
}(⋅) are piecewise polynomial basis functions on each triangle, and {*S*
_{
g
}} are zero-mean Gaussian distributed weights. The INLA and SPDE approaches are easily applied thanks to their implementation in the R-INLA software package [5].

We assigned a flat improper prior for the intercept, \( {\beta}_0\sim N\left(0,{\tau}_{\beta_0}^{-1}\right) \) with \( {\tau}_{\beta_0}=0 \), and independent vague Gaussian priors with fixed precision for all other components of the fixed effects, *β*
_{
i
} ∼ *N*(0, 1/0.001), *i* = 1, … *p*. The smoothness parameter *ν* was considered fixed to 1 implying a continuous domain Markov field. In the spatial model, the vector of weights *S* = (*S*
_{1,.}.., *S*
_{
G
})′ is assigned a Gaussian distribution, *S* ∼ *N*(0, *Q*
^{− 1}) where *Q* is a sparse precision matrix depending on the Matérn covariance function parameter *κ* and variance *σ*
^{2}. In the spatiotemporal model, for *ξ* = (*ξ*
_{1}, …, *ξ*
_{
n
}) ', we use a Gaussian prior with zero mean and precision matrix depending on the autocorrelation hyperparameter *a, k* and σ
^{2}_{w}
. To complete the models, we assign *τ*
_{
u
} a vague Gamma prior.

The model output provided the posterior distribution for each of the model parameters. We summarized these distributions to obtain the posterior mean and 95 % CI of the fixed effects and hyperparameters. Predictions of infection prevalence were provided at a spatial resolution of 10 km. Maps showing the predicted LF prevalence as well as its uncertainty were generated by summarizing the posterior distributions of the prevalence obtained at each of the prediction locations. Specifically, the predicted prevalence was represented as the posterior mean, and its uncertainty was represented using the 95 % CI. Predictions were standardized to provide estimates for entire communities (children and adults combined) and for the period 2000-onwards.

### Mathematical modelling of the intensity of LF transmission

The prevalence of LF infection provides a useful metric to guide the planning of control, but provides limited insight to the dynamics of transmission. Instead the intensity of infection transmission is best quantified by the basic reproduction number, R_{0}, which for macroparasites such as LF can be defined as the average number of female offspring per adult female worm surviving to reproduction (in the absence of density-dependence i.e. phenomena such as host immunity or worm mating probability, which accelerate or curtail the production of parasite life stages) [48]. Existing analytic expressions for R_{0} for LF, and helminths with similar natural history such as onchocerciasis, are defined in terms of average worm burden or microfilarial load in a community [49–51], yet the majority of LF studies only present data on the prevalence of infection. In the framework published by Gambhir et al. a simple, differential equation transmission model of LF is used to estimate R_{0} for a given prevalence value [6]. This model assumes various density-dependent functions which alter the rates of transition between state variables in the model (for example, the rate of parasite establishment and rate of parasite fecundity). The model parameter prior distributions were defined using data from the literature and parameter posteriors were found by fitting to baseline mf prevalence data from low, moderate and high transmission settings in Tanzania [52] – settings which are representative of much of SSA.

For a setting with a particular endemic prevalence, there is an underlying bite rate and force of infection which leads to this endemic equilibrium. This parameter contributes to R_{0} (as explained in detail below), but can only be calculated indirectly through the method described in detail by Gambhir et al. [53] and outlined here.

The first stage in estimating R_{0} is to calculate the effective reproduction number, R_{eff} in terms of the constant terms in the mathematical model and the density-dependent functions of the state variables. The effective reproductive number, R_{eff}, is equal to the basic reproductive number when a parasite is newly introduced to a population, but as parasites become established in the population, density dependent effects, such a density dependent fecundity, can both limit and facilitate transmission, increasing or decreasing R_{eff}. When the system is at equilibrium, the effective reproductive number is one (the number of parasites is neither increasing nor decreasing) and the parasite load in the population is at its equilibrium value. Therefore, we want to calculate the value of the biting rate parameters for which this expression equals 1 for a particular population parasite load. The expression for R_{eff} is an implicit expression which cannot be solved analytically, and therefore has to be solved numerically for particular settings using the following method. When the function R_{eff} is plotted against a population state variable (such as mf prevalence or community mf load), the resulting plot increases with parasite density in the population for low levels of parasite load, but for large levels of parasite load in the host population density dependent processes decrease R_{eff}, leading to a “humped” function (Fig. 1 of Gambhir et al. [53]), i.e. as mf load (or prevalence) is varied, the function rises or falls corresponding to increasing or decreasing LF transmission. This humped function has a maximum. When this peak value of R_{eff} is more than 1, the effective reproduction number intersects the R_{eff} = 1 line twice, meaning that there are two equilibria [53]. The higher of these is the stable endemic equilibrium, and the lower is an unstable extinction ‘breakpoint’ [54]. Therefore in the absence of treatment, we assume that the higher point represents LF prevalence at stable endemic equilibrium (i.e. pre-intervention). Specifically, in order to fit the R_{eff} function to the mf prevalence data, so that its upper equilibrium corresponds to the observed endemic prevalence value, we alter the mosquito-human biting rate parameter – the rate at which humans are bitten by mosquitoes – until this correspondence occurs. The model parameter values denoting a given endemic prevalence value, are used in Gambhir et al., equation 6 [53] to estimate the value of *R*
_{
0
}:

$$ {R}_0=\frac{\lambda {f}_1(0){f}_2(0)\alpha {f}_3(0)\beta {f}_4(0)}{\mu \delta \sigma } $$

where *λ*, *α* and *β* are the ‘immigration’ rates of each of the life stages (adult worms, microfilariae, and L3 infective larvae respectively; *λ* is a composite parameter, incorporating the mosquito-human biting rate which is estimated using the methodology describe above; *μ*, *δ* and *σ* are the death rates of each of the life stages respectively; and *f*
_{1}(*I*), *f*
_{2}(*W*), *f*
_{3}(*W*), *f*
_{4}(*M*) represent the modifying density-dependent functions acting on each of the respective parasite life-stage intensities (*W*, the number of adult worms per definitive host, *M*, the mf load per host, refer to parasite life stages within the definitive host population, whereas the L3 infective larval load per mosquito (larvae develop through L1 and L2 stages but only become infective once they reach the L3 stage), *L*, refers to the parasite life-stage within the vector host. The host immunity variable, *I*, increases in magnitude at a rate equal to the adult worm burden but can also decay over time, allowing the model to capture the waning of immunity). Note that for *R*
_{
0,} host immunity levels and worm burdens are assumed to be zero as this expression calculates the number of offspring in a wholly susceptible population. The above mathematical model was initially fitted separately for settings where either *Anopheles* or *Culex* species are the predominant vector, based on the develop vector distribution maps (Fig. 3). However, the relationship between the mf prevalence and R_{0} is practically identical for *Anopheles* or *Culex* species, as the model assumes similar mosquito life expectancy for each species, and therefore we present a single relationship between prevalence of mf and R_{0}. Finally, the threshold mf prevalence value above which LF transmission can persist, which is higher than the prevalence at which *R*
_{0} = 1 due to the density dependencies assumed in the modelling framework [53], was calculated.