Forecasting United States heartworm Dirofilaria immitis prevalence in dogs

Background This paper forecasts next year’s canine heartworm prevalence in the United States from 16 climate, geographic and societal factors. The forecast’s construction and an assessment of its performance are described. Methods The forecast is based on a spatial-temporal conditional autoregressive model fitted to over 31 million antigen heartworm tests conducted in the 48 contiguous United States during 2011–2015. The forecast uses county-level data on 16 predictive factors, including temperature, precipitation, median household income, local forest and surface water coverage, and presence/absence of eight mosquito species. Non-static factors are extrapolated into the forthcoming year with various statistical methods. The fitted model and factor extrapolations are used to estimate next year’s regional prevalence. Results The correlation between the observed and model-estimated county-by-county heartworm prevalence for the 5-year period 2011–2015 is 0.727, demonstrating reasonable model accuracy. The correlation between 2015 observed and forecasted county-by-county heartworm prevalence is 0.940, demonstrating significant skill and showing that heartworm prevalence can be forecasted reasonably accurately. Conclusions The forecast presented herein can a priori alert veterinarians to areas expected to see higher than normal heartworm activity. The proposed methods may prove useful for forecasting other diseases.


Background
Heartworm disease, caused by the mosquito-borne filarial nematode Dirofilaria immitis, is arguably the most medically important parasitic infection of domestic dogs in the United States (US), affecting at least 115,000 dogs in 2015. Beyond the US, heartworm disease is a global veterinary healthcare problem, with D. immitis affecting dogs in many parts of South America, Europe, Asia, and Australia [1,2]. Infection is associated with life-threatening complications and significant financial burden, costing millions in veterinary care annually for disease treatment [3][4][5][6][7]. Although less common and less studied, heartworm disease is also a health concern for other mammals such as domestic cats, domestic ferrets, and some wildlife species [8]. Clinical signs of heartworm disease in domestic dogs include exercise intolerance, coughing, dyspnea, cachexia, anorexia, epistaxis and ascites. Dogs with a high burden of adult heartworms can suffer from pulmonary arterial occlusion and inflammation, leading to pulmonary hypertension and potentially right heart failure. Cats and ferrets may experience similar signs but acute death may occur, even with very low worm burdens. Humans can also be infected with D. immitis, but infections are rare, with fewer than 100 cases reported in the US over the last 60 years [9]. Human infection is most commonly asymptomatic, with people considered dead-end hosts for the parasite. While rare, human D. immitis infection is highly problematic in that it most often manifests as "coin lesions" in the lungs that may be mistaken for a neoplasm on chest radiographs; surgical excision is necessary to differentiate the two entities [10].
Heartworm disease in dogs is most commonly diagnosed through the detection of circulating D. immitis antigen in the blood [3,11]. The prevalence of heartworm infection in the US varies considerably by geographical region. Two nationwide surveillance studies of D. immitis infection seroprevalence (henceforth prevalence) in domestic dogs found the highest prevalence in the Southeast and the lowest in the Northeast [11]. For unknown reasons, a noted decrease in the prevalence of D. immitis occurred between the 2001-2007 and 2010-2012 in these studies. Importantly, regardless of time period and even within areas where heartworm infection is considered common, there can be considerable local variation, with prevalence reaching as high as 13 % [12,13].
Numerous factors are purported to be associated with regional and local variations in D. immitis prevalence in domestic dogs. Highly effective commercially available anthelmintics (e.g. macrocyclic lactones (ML) [3], including the avermectins (ivermectin, selamectin) and the milbemycins (moxidectin, milbemycin oxime) can be administered monthly to prevent the development of immature stages into adult worms. Year-round preventive use is recommended throughout the US, yet the majority of dogs only receive seasonal treatment [14]. Even within highly endemic regions, anthelminthic use varies based on client compliance, knowledge, or dog owner's demographics. In addition, resistance of D. immitis to ML has been recently documented and is a growing concern in the Gulf States, but the current extent of resistant phenotypes remains unknown [7,15]. Dirofilaria immitis can be transmitted by over 70 species of mosquitoes, although certain species (e.g. Aedes trivittatus, Aedes sierrensis and Culex quinquefasciatus) are considered more important vectors [16]. Because the density of mosquitoes and community composition of competent vector species is influenced greatly by habitat use and climate, these factors should be considered when investigating factors influencing heartworm disease. In support of this, a previous study found that temperature, median household income, population density, precipitation, elevation, relative humidity, forestation coverage, and surface water coverage all significantly influence D. immitis prevalence in dogs [16].
Clearly, it would be advantageous to accurately forecast D. immitis prevalence on a local scale, providing an a priori alert to veterinarians in problem areas where immediate remediation measures could be taken. Annual forecasts of emergent infection will also inform veterinary and public health officials to shifting areas of infection, particularly in temperate regions of the US where D. immitis is generally absent, rare, or prevalence is highly influenced by annual variation in biotic or abiotic factors.

Data structure
The data studied here contain 31,345,244 heartworm antigen test results from dogs in the conterminous United States from 2011 to 2015, and various climate, geographic and socio-economic factors purported to influence heartworm prevalence. The raw tests were obtained from the Antech and IDEXX laboratories [17,18]. Over all 5 years in the study, 384,905 of the tests were positive (1.23 %). The test data contain the county/ parish of the testing clinic and the month when the tests were conducted; however, no measure of uncertainty is given with the individual test results.
The test data were aggregated into the number of positive and negative tests for each year in each conterminous United States county/parish. Table 1 lists 16 explanatory factors that are purportedly related to dog heartworm prevalence, as well as their time period of record and geographic scale of collection. These 16 factors include the climatic variables of annual temperature, precipitation and relative humidity, the geographic variables of county elevation, forestation coverage and surface water coverage, the socio-economic variables of county population density and median household income, and the presence of eight mosquito vectors. For more details on these factors, see [19]. Figure 1 displays county-level raw heartworm prevalences obtained by dividing the number of positive tests by the number of tests over all 5 years in the study. The raw prevalences exhibit a large degree of spatial correlation in that neighboring counties tend to report similar prevalences. Significant temporal dependence is also present in the data: the current prevalence is similar to past prevalence. Therefore, this data set requires a statistical model with both spatial and temporal dependence. Figure 2 provides a spatially smoothed prevalence map, using a head-banging smoothing procedure, based on the Fig. 1 prevalences. In the head-banging smoothing procedure, 45 triples were employed. The smoothing was also weighted proportionally to the number of tests in each county over the 5-year period. This prevents the map from signaling a high/low prevalence that is more likely attributed to a small sample (one positive out of three tests has the same prevalence as one hundred positive in 300 tests, though the latter is more indicative of infection risk). Details on head-banging smoothing and its uses in disease mapping are contained in [16]. Figure 2 serves as a contemporary depiction of the "baseline" heartworm risk for dogs in the United States.

Statistical modeling
The model and methods used to statistically analyze the heartworm tests are now described. The goal here is to assess the significance of the 16 factors and accurately estimate regional heartworm prevalence.
Let Y s (t) and n s (t) denote the number of positive and total tests conducted in county s during year t, respectively, for counties s ∈ {1, …, S} and years t ∈ {1, …, T}. Methodologies for modeling spatial and temporal dependence have received much recent attention in the statistics literature [20][21][22][23][24][25][26][27]. Among many choices, Bayesian hierarchical models have been prominent due to their flexibility. In a Bayesian hierarchical model, spatial and temporal dependence is modelled in a hierarchy via a series of random effect terms with prescribed structures; see [20,21] for a modern review of spatiotemporal models. Typically, when modeling the spatial or spatial-temporal dependent count data via parametric models, a Poisson distribution is preferred [21][22][23][24]. The following hierarchical regression model is used here:  For further discussion, including the source of each factor, see [16] where log(⋅) denotes natural logarithm, X s (t) = (X s,1 (t), …, X s,16 (t)) ' is a vector of covariate information for county s at time t ('denotes matrix transpose), β = (β 0 , …, β p ) ' is a vector of regression coefficients, p s (t) denotes the heartworm prevalence of county s at time t, the symbolm eans has the distributional type, and | indicates given quantities. Equation (1) indicates that Y s (t) has a Poisson distribution with mean n s (t)p s (t). In addition, it is assumed that the positive test counts (i.e. Y s (t)) are conditionally independent of each other given the number of tests, factor information, and random effects. This does not imply that Y s (t) is independent across varying space s or time t.
To relate prevalence to the factors and build spatial and temporal dependence, the model in (2) is proposed, as is common in Poisson regressions [21][22][23][24]. There are many ways to induce spatial and temporal dependence from the random effects {ξ s (t)}. One natural and popular choice is the conditional autoregressive (CAR) structure where ξ t = (ξ 1 (t), …, ξ S (t)) ' and ϕ t = (ϕ 1 (t), …, ϕ S (t)) ' are random vectors. Equation (4) indicates that the spatial effects (i.e. ϕ t , for t = 1,…,T) are independent and identically distributed random vectors that follow a conditional autoregressive (CAR) model [25], which is a popular choice for modeling spatial dependence [26]. More specifically, let ϕ = (ϕ 1 , …, ϕ S ) ' denote a random vector which follows a CAR model. There are several varieties of CAR models. Typically, the CAR model is specified via a series of univariate conditional distributions. Spatial dependence is induced through a neighboring system involving geographically adjacent counties. Our version of the CAR model, taken from [26], uses Here, ϕ − k = (ϕ 1 , …, ϕ k − 1 , ϕ k + 1 , …, ϕ S ) ' is a vector that contains county effects for all counties except the kth one and N(μ, σ 2 ) denotes a normally distributed quantity with mean μ and variance σ 2 . In addition, W = {w k,i } is an S × S dimensional matrix that describes the neighborhood structure of all counties. Specifically, the entries of W are either zero or unity; w k,i = 1 if and only if the ith and kth counties share some common border. The parameter τ 2 in (5) is a scaling variance parameter. In fact, it can be seen that the conditional variance of ϕ k given its neighbor's random effects is inversely proportional to the number of counties bordering county k. Hence, counties with more neighboring counties tend to have a smaller variance, which is reasonable since data from the bordering counties helps predict the prevalence in the said county.
In Eq. (5), ρ ∈ [0, 1] is an autocorrelation parameter that governs correlation between bordering counties. Notice that the conditional expectation of ϕ k is the weighted arithmetic average of the neighboring random effects, multiplied by ρ. When ρ = 0, the conditional expectations of ϕ s are zero and all random effects are independent of each other; antipodally, ρ close to unity indicates strong spatial dependence between bordering counties.
In (3), time-dependence is modeled through a temporal autoregressive model of order one (AR(1)), which is a time series staple [28]. Here, it describes prevalence for a fixed county across different years. The parameter φ is the temporal correlation between consecutive years and lies within (−1, 1). This ensures a causal and stationarity solution to the time series model [28], which is needed in estimation.
From (5), it is possible to explicitly identify the joint distribution of ϕ, which is multivariate normal: where W is the previously mentioned neighborhood matrix and D = {d i,j } is an S × S diagonal matrix whose i th diagonal element is the number of neighboring counties for county i. Bayesian techniques are used to estimate the model parameters, which are β, φ, ρ, and τ 2 . Thus, to complete the Bayesian hierarchical model, the following prior distributions for these parameters are introduced: β k ∼ Nð0; 1000Þ; for k ¼ 0; …; 16; φ ∼ Uniformð−1; 1Þ; ρ ∼ Uniformð0; 1Þ; τ −2 ∼ Gammað0:5; 0:05Þ: Prior distributions for φ and ρ are taken as uninformative in that all admissible possibilities are equally likely. Priors for the regression coefficients β 0 , …, β 16 are taken as diffuse so that inferences for these parameters are based primarily on the data. The prior for τ − 2 is chosen as a conjugate prior (the posterior and prior distributions are from the same distributional family) for ease of computation. The random effects and model parameters are estimated based on posterior samples from a Markov chain Monte Carlo (MCMC) simulation. The MCMC simulation for our model uses a combination of Gibbs and Metropolis-Hastings steps. In the implementation of the algorithm, the test data for non-reporting counties was viewed as being latent and was subsequently sampled along with the model parameters. To run our MCMC simulation and assess significance of model parameters, a program was developed and implemented in R and C++.

Model assessment
The spatio-temporal Poisson regression model in (1) has 16 explanatory factors, all of which may not have predictive power. To assess this issue, a full model with all 16 factors was first fitted. Credible intervals, Bayesian analogs to confidence intervals in frequentist statistics, were then created for the parameters of interest. Table 1 summarizes our full model findings, showing 16 regression coefficients estimates (posterior median) and their 95 % highest posterior density (HPD) intervals; for further details about credible and HPD intervals, see [29,30]. Table 2 implies that not all factors are significant, e.g. 95 % HPD intervals of annual precipitation, elevation, percentage surface water coverage, and all mosquito presence factors except A. albopictus contain zeroes. To develop a parsimonious model with only significant factors, all explanatory variables whose 95 % HPD intervals contain zeroes were removed and the model was refitted. This leaves a "reduced model" with the six explanatory factors: annual temperature, annual relative humidity, percentage forest coverage, population density, median household income and A. albopictus absence/presence. Parameter estimates (posterior median) and 95 % HPD intervals for the regression parameters for the reduced model are shown in Table 3. The estimates (posterior median) of the other model parameters are φ = 0.914, ρ = 0.998, and τ 2 = 0.802.
Most of the significant factors have an intuitive interpretation. For example, the positive regression coefficient for the temperature and relative humidity factors implies that heartworm is more prevalent in warmer and humid locations. On the other hand, as is seen by negative regression coefficients, heartworm prevalence decreases with increasing population densities and median household incomes. Given the presence of relative humidity in the model, it is not overly surprising that precipitation drops out of the model fit. Finally, the negative regression coefficient on A. albopictus presence is not a contradiction: in the presence of all other factors (which include space and time prevalence histories), presence of this mosquito is associated with lessened heartworm prevalence. It is worthwhile to note that in a separate analysis (results not shown) a model with only A. albopictus was fitted, and the accompanying regression coefficient was non-negative.
To assess the overall performance of our reduced model, Fig. 3 graphically portrays our fitted model by plotting the average (over all 5 years) of modelestimated prevalence for each county after smoothing (standard Kriging with default parameters were used here). The model-estimated prevalence in Fig. 3 compares well to the head-banging smoothed baseline in Fig. 2 , and n s is the number of tests conducted in county s. Since the correlation here is between smoothed and model-estimated prevalence (these are non sample size dependent quantities), the weights were taken as n s ≡1 (and not the county-by-county sample sizes). The 0.727 correlation achieved indicates that the regression model has explained most of the data structure.
The fitted model has a number of uses. In the next section, it is used to construct annual heartworm prevalence forecasts. The model could also be used to estimate how climate change could alter heartworm disease risk.

Forecasting
This section shows how to use our model to forecast next year's regional heartworm prevalence. For this, all six significant explanatory factors and the spatialtemporal effects will need to be forecasted for the forthcoming year. To see how our forecast performs, the 2015 test and factor data was removed from the analysis, and the proposed six-factor model was refitted using data from 2011 to 2014 only. Our forecasts simply "plug in" 2015 forecasted factors for A. albopictus, annual temperature, annual relative humidity, percent forest coverage, population density, median household income, and a randomly generated random effect component into our model; for further details see [29,30].
Two of the six factors are relatively stable over time: county forestation and the presence of A. albopictus. For these two factors, the most recent observations are used as 2015's forecasted values.
To forecast annual temperature, historical temperature records were collected from 1895 to 2014 for each county and modeled as an autoregressive model of order one. The AR(1) model for an annual temperature series {F t } (previously denoted by {X s,1 (t)} in Section 3 for county s) obeys the difference equation where {ω t } is zero mean white noise; for further time series forecasting information, see [28]. The AR(1) model can be fitted to the temperature observations  using practically any statistical software package. Letδ andγ denote estimates of δ and γ, respectively. A prediction of the annual temperature at year t + 1 from temperatures from year 1 to year t iŝ F tþ1 ¼δ þγ F t : In our forecast,F tþ1 is used as next year's forecasted temperature factor. Figures 4 and 5 compare forecasted and observed annual temperatures for 2015. The correlation between these two figures in (6) is r = 0.996, which is very good (n s ≡1 here).
A simple linear regression was used to forecast next year's relative humidity (previously denoted by {X s,3 (t)}) and median household income (previously denoted by {X s,8 (t)}) in each county. Historical relative humidities from 2006 to 2014 and median household incomes from 1997 to 2014 were used to fit a regression model of form for each county. Here, {I t } denotes the relative humidity (previously denoted by {X s,3 (t)}) or median household income (previously denoted by {X s,8 (t)}), {η t } is zeromean random noise. Least squares estimators of α and κ, denoted by α andκ, respectively, were computed from the data at each county. The forecasted value for year t + 1 is simplyÎ tþ1 ¼α þκ t þ 1 ð Þ: Forecasting the county population density for next year requires the county areas and their recent population counts. The US Census provides good county population estimates for 2010, but not in years since 2010. Estimated state populations were obtained for each state between 1969 and 2014. A simple linear regression was fitted to these data for each state and 2015 state populations were forecasted. This forecasted state population was then partitioned to the counties within the state at a proportion that agrees with 2010 Census proportions.
To forecast the next year's spatial and temporal random effects, formula (3) is applied. Since the ϕ t s are independent and identically distributed over various years, based on the values of τ 2 and ρ (available from the posterior samples), for the current time t, ϕ t + 1 can be generated from the multivariate normal distribution N(0, τ 2 (D − ρW) − 1 ). Then ξ t + 1 is calculated by ξ t + 1 = φξ t + ϕ t + 1 . This process is repeated for each value of (ρ, τ 2 , φ), which are available from the posterior sample, thus yielding predictive posterior samples of the next year's random effect ξ t + 1 . See [29,30] for additional detail on obtaining predictive posterior samples. Figures 6 and 7 compare observed and forecasted heartworm prevalence during 2015. One can discern where heartworm is forecasted to be higher/lower than normal by comparing Figs. 2 and 7. The correlation  (6), is 0.940. Hence, the model is accurately forecasting in locations that report more tests. Finally, Fig. 8 presents our heartworm forecast for 2016. When 2016 concludes, we will be able to compare this forecast to 2016 test results.

Discussion
Generally, the management of emerging infectious diseases is approached reactively, with efforts focused on managing outbreaks after onset. The ability to reliably forecast transmission risk, particularly for diseases influenced by dynamic factors such as climate, could shift  our paradigm from reaction to prevention. This is particularly true for vector-borne diseases, as specific environmental needs for vector survival are well documented [31]. One approach to infectious disease modeling is to use these factors to predict transmission and model the data in both space and time. This has been used successfully to estimate the incidence of malaria during eradication campaigns in Namibia and cutaneous leishmaniasis in high-risk areas of Columbia [32,33].
Although preventable, heartworm disease is a relatively common and serious vector-borne disease of domestic dogs. Annual disease incidence, as reported by IDEXX and Antech, averages greater than 100,000 new cases annually. Annual data likely represent the true annual incidence of heartworm infection in domestic dogs: when diagnosed with heartworm, most dogs are either treated, or in some cases euthanized, due to poor outcome or financial constraints [34]. While fulminant infection with D. immitis may be due to lack of owner compliance in use of preventatives, it also may be due to misunderstanding the disease risk; mosquito vectors are known to be dynamic in their range and survival under changing climatic conditions.
To enhance veterinary client education and illuminate the benefits of preventatives, factors associated with D. immitis transmission [16,35] were identified and used to develop a spatial-temporal conditional autoregressive forecast model of heartworm prevalence. A comparison of observed versus forecasted heartworm prevalence was made in 2015 and was quite accurate. This may be attributed, in part, to the fact that many of the factors influencing heartworm prevalence do not change significantly from year to year (e.g. forest coverage, population density, household income). While temperature and humidity change annually and are important disease risk factors [36], these factors are still reasonably predictable; however, environmental or climate catastrophes (e.g. regional climate shifts, flooding, hurricanes) could impact heartworm incidence. Finally, mosquito populations can fluctuate greatly from year to year as they depend on numerous local land-use and environmental factors. Some competent vectors of D. immitis are still expanding their range in the US [37].
Several of the mosquito presence/absence factors were not included in our final model; this may be because our currently available data are only presence/absence, whereas mosquito abundance, a purportedly more powerful factor, varies annually at a local level. More accurate mosquito counts would likely yield more accurate forecasts. In addition, human activities such as treatment abatement programs may impact mosquito abundances. Since the introduction of West Nile virus into the US, localities have developed or expanded mosquito control programs, including reducing breeding habitats and application of pesticides. With increased concern over the Chikungunya and Zika viruses, it is possible that increased mosquito control may be initiated in the coming year(s). If such programs are initiated, mosquito abundance counts should take these programs into account.

Conclusion
In conclusion, our 2016 heartworm disease forecast (Fig. 8) has some noteworthy implications for veterinary practitioners, including an increased prevalence in northern California, eastern Montana, and central New Mexico. A relatively small increase in risk is predicted in some areas where heartworm is likely under appreciated, such as parts of the Dakotas and Nebraska. Importantly, our data indicate that all regions of the lower 48 United States have some risk for heartworm infection. Our maps and forecasts provide veterinarians with evidencebased recommendations for use of preventive in non- endemic regions of the US and support the recommendation of year-round use of preventive in high risk areas. Ultimately, we believe that these methods can be used to forecast multiple vector-borne diseases with veterinary and human health impacts, including Lyme disease, ehrlichiosis and anaplasmosis. Currently, the Companion Animal Parasite Council (CAPC, www.capcvet.org) provides monthly updates of heartworm prevalence on a county by county scale. Through a combination of realtime updates and forecasting efforts, we hope to see fewer cases of heartworm disease in dogs and cats in the future.