### Research flowchart

We hypothesized that a particular threshold temperature in winter is highly correlated with the spatial distribution of *Ae. aegypti* and that this low critical temperature limits the distribution of *Ae. aegypti* in Taiwan.

The research flowchart is shown in Fig. 1. The mosquito dataset was transformed and assigned as a dichotomous dataset, which was based on the surveillance of *Ae. aegypti* in each of 348 townships in Taiwan. Even in MODIS monthly composites, 0–15% of the pixels could be missing because of clouds or other reasons. Therefore, the seasonal MODIS LST data in our study were processed to fill the missing data by using four interpolation methods [i.e. inverse distance weighting (IDW), local polynomial interpolation (LPI), radial basis function (RBF), and ordinary kriging (OK)]. According to the Taiwan Central Weather Bureau, minimum air temperatures during the period of 1981–2010 ranged between 12.4–18.1 °C in January and 25.1–26.4 °C in July in the plain areas of the main island of Taiwan [23]. Therefore, we assumed that the search for the low critical temperature should include a range from 12.4 to 26.4 °C. Consequently, we chose to include temperatures ranging from 8 to 28 °C. The interpolated LSTs are continuous data and must be converted into a dichotomous format for evaluating spatial similarity. The degree of the correlation between the geographical distribution of *Ae. aegypti* and LSTs can be evaluated by using phi coefficient calculations. The critical low temperature was defined by determining the highest phi coefficient at which the assumed temperature demonstrates a strong correlation with the spatial distribution of *Ae. aegypti*.

### Surveillance of *Aedes* mosquitoes on the main island of Taiwan

The mosquito dataset used in this study was a part of a national routine entomological surveillance of dengue, and the original data were collected in a 3-year project of the Taiwan CDC from 2009 to 2011 [22]. Briefly, 99.2% of a total of 368 townships and 91.1% of a total of 7835 villages completed this survey. The mean number of villages per township was 21 (range, 6–42 villages). Moreover, 3401, 2671 and 2423 villages were inspected in 2009, 2010 and 2011, respectively. The average area in each village was determined to be 4.6 km^{2} (range, 0.3–26.2 km^{2}). A stratified cluster sampling technique was used to sample 50 residential premises in each village by randomly selecting and surveying the first house and then inspecting surrounding residences, checking water-filled containers indoors and outdoors. The containers included water buckets, pottery pots, urns, flower vases, flower saucers, used tires, tanks, gutters, discarded containers, natural containers, and other water-holding containers. *Aedes* larvae or pupae in containers were collected until the number reached 100. Then, samples were preserved in 70% alcohol and sent to the CDC for species identification. If the number after identification did not reach 100 due to misidentification of immature mosquitoes on site or because of low density, additional visits were conducted for that village. All mosquito data from the villages were categorized into 365 townships. In this study, we focused on the main island of Taiwan to detect spatial contiguity and pattern consistency; hence, townships belonging to outlying islands were excluded, and a total 348 townships were investigated (Wutai Township, which is located in a remote mountainous area, was the only township in the main island of Taiwan that was not investigated). We then mapped the distribution of *Ae. aegypti* by using the binary principle. A township was denoted as having “presence of *Ae. aegypti*” (defined as “1”) if it had one or more than one confirmed cases of *Ae. aegypti*; by contrast, a township was denoted as having the “absence of *Ae. aegypti*” (defined as “0”) if it had no confirmed cases of *Ae. aegypti*. Of the 7019 villages surveyed, 973 were identified as having *Ae. aegypti* presence. These villages were then transformed such that they constituted 73 of the 348 townships investigated in this study (Fig. 2).

### Satellite-derived LST products

The MODIS sensors onboard the Terra and Aqua satellites, two Earth Observing System platforms, can provide daily observations of terrestrial, atmospheric, and oceanographic variables on a global scale. Thus, MODIS products are widely used in meteorological studies worldwide, including LST data and images from MODIS thermal bands distributed by the Land Processes Distributed Active Archive Center of the US Geological Survey [27]. The MODIS LST and emissivity products can provide per-pixel temperature and emissivity values in a sequence of swath-to-grid-based global products. Both Aqua and Terra (MYD11 and MOD11) products contain Level 2 and 3 LST and emissivity retrieved from MODIS data at spatial resolutions of 0.01° [27] and 0.05° over global land surfaces under clear-sky conditions. The nominal accuracy of the MODIS LST product has been reported to be ±1 K. Some validation studies have reported accuracy statistics smaller than 1 K under clear-sky conditions within the temperature range of -10 to 50 °C [28]. In this study, nighttime LST retrievals on a 0.05° latitude/longitude climate modeling grid were collected and analyzed to determine covariates that were exhaustively known over the spatial-temporal domain in order to obtain complete prediction over a land mass. In MODIS monthly composites, 0–15% of the pixels are missing because of clouds or other factors. In this study, these missing pixels were replaced using four spatial interpolation models (i.e. IDW, LPI, RBF, and OK), which filled gaps in areas with sparse or irregularly spaced data points. The units in the original monthly MODIS LST images were also converted from Kelvin to degrees Celsius. The nighttime MODIS LST data were used to calibrate the entire study area by using spatial interpolation models (Fig. 3).

### Spatial interpolation

Spatial interpolation is the procedure of estimating the values of an environmental variable (e.g. nighttime LST) at locations with unknown values by using a sample of locations with known values. We employed spatial interpolation involving the application of four interpolation methods to a set of points with known values to create a continuous surface. These points are also known as sample points or observations.

### Inverse distance weighting

IDW estimates a value of each location by taking the distance-weighted average of the values of sample points in its neighborhood. The closer a sample point is to the location being estimated, the more influence or weight it has in the averaging process [29]. Mathematically, IDW can be expressed as follows:

$$ {Z}_0={\sum}_{i=1}^n{w}_i{Z}_i $$

$$ {w}_i=\frac{d_i^r}{\sum_{j=1}^n\left(1/{d}_i^r\right)} $$

where *Z*_{0} is the interpolated value of the point being estimated, *n* is the number of sample points in the neighborhood, *Z*_{
i
}(*i* = 1, …, *n*) is the value of the *i*th sample point in the neighborhood, *d*_{
i
}(*i* = 1, …, *n*) is the distance from the point being estimated to the *i*th sample point in the neighborhood, and *γ* is the power.

The power parameter determines the significance of sample points on the interpolated value. When a higher power is defined, more emphasis is placed on nearby points, producing a more varying and less smooth surface. Specifying a lower power places more emphasis on distant points, resulting in a smoother surface. A power of 2 is most commonly used with IDW [29].

### Local polynomial interpolation

A trend surface interpolation entails fitting a smooth surface defined by a polynomial function to a set of sample points and then using the polynomial function to estimate the values of unsampled locations. The general equation of a trend surface is as follows:

$$ f\left(x,y\right)={\sum}_{i,j=0}^p{b}_{i,j}{x}^i{y}^i $$

where *p* is the degree of the polynomial. As the polynomial order increases, the surface being fitted has an increasing number of curvatures and becomes progressively more complex. Although polynomial orders as high as 10 are accepted, numerical instability in the analysis often creates artifacts in the trend surfaces of orders higher than 5 [29].

We considered the simplest form of a trend surface to be a planar surface with no curvature, which is defined by a local linear or first-order polynomial:

$$ f\left(x,y\right)={b}_{0,0}+{b}_{1,0}x+{b}_{0,1}y $$

where *x* and *y* are coordinates, *b*_{i, j}(*i*, *j* = 0, 1) are polynomial coefficients, and *f*(*x*, *y*) is the value of an environmental variable at the location (*x*, *y*). The coefficients of the polynomial function can be determined from sample points by minimizing

$$ {\sum}_{i=1}^n{w}_i{\left\{{z}_i-f\left({x}_i,{y}_i\right)\right\}}^2 $$

This is called a local least-squares method, which ensures that the sum of the squared deviations of observed values at sample points from the trend surface is a minimum. Suppose that there exist *n* sample points, whose values are *z*_{1}, *z*_{2}, …, *z*_{
n
} and the corresponding coordinates are (*x*_{1}, *y*_{1}), (*x*_{2}, *y*_{2}), …, (*x*_{
n
}, *y*_{
n
}); *w*_{
i
}is a kernel smoothing weight and is given by a negative exponential function of the following form:

$$ {w}_i\left\{\begin{array}{c}{e}^{-{d}_i},{d}_i\le {d}_0\\ {}0,{d}_i>{d}_0\end{array}\right. $$

where *d*_{
i
}(*i* = 1, …, *n*)is the distance from the point being estimated to the *i*th sample point; *d*_{0} is the bandwidth [30].

### Radial basis function

RBFs are conceptually similar to the approach of fitting a rubber membrane through measured sample values while minimizing the total curvature of the surface. An RBF is a real-valued function whose value depends only on the distance from the origin; the basic idea is to choose a radially symmetric function, φ(*r*_{
i
}). The basis function selected determines how a rubber membrane would fit between the values [31]. The general form of an RBF can be written as

$$ {Z}_p={\sum}_{i=1}^n{w}_i\varphi \left({r}_i\right)+m $$

where *Z*_{
p
}is the estimated value for the surface at the grid point *p*; φ(*r*_{
i
}) is the RBF selected, with *r*_{
i
} being the radial distance from point *p* to the *i*th data point; and *w*_{
i
} is the weight and *m* the bias value (or Lagrangian multiplier), which are estimated from data points [32].

The form of the RBF in this study was a weighted regularized spline, which is expressed as follows:

$$ \upvarphi (r)=\mathit{\ln}{\left( cr/2\right)}^2+{E}_1{(cr)}^2+\gamma $$

where *γ* is the distance between the point and the sample, *c* is a smoothing parameter, and *γ* is Euler’s constant (*γ* = 0.577).

*E*_{1}() is an exponential integral function given by

$$ {E}_1(x)=\underset{1}{\overset{\infty }{\int }}\frac{e^{- tx}}{t} dt $$

In the regularized type, the predicted surface becomes increasingly smooth as the weight value increases [31].

### Ordinary kriging

Kriging assumes that in most cases, spatial variations observed in environmental phenomena are random yet spatially correlated, and data values characterizing such phenomena conform to Tobler’s law of geography (data values at locations that are close to each other generally exhibit lower variability than data values at locations that are farther away from each other). This is called spatial autocorrelation. The exact nature of spatial autocorrelation varies with dataset and each dataset has its own unique function of variability and distance between sample points. This variability is represented by a semivariogram. Kriging uses semivariance values obtained from the fitted semivariogram to estimate the weights used in the interpolation and variance of interpolated values [29].

Semivariance measures the variability of observed values at sample points that are separated by a certain distance. It is calculated using the following equation:

$$ \gamma (h)=\frac{1}{2n}{\sum}_{i=1}^n{\left[z\left({x}_i+h\right)-z\left({x}_i\right)\right]}^2 $$

where *γ*(*h*) is the semivariance for a distance *h* separating two sample points *z*(*x*_{
i
}) and *z*(*x*_{
i
} + *h*), and *n* is the number of pairs of sample points separated by *h*.

OK assumes that no trend exists in the data and that the mean of the dataset is unknown. The weights are derived by solving a system of linear equations, which minimizes the expected variance of data values:

$$ {\sum}_{j=1}^k{w}_j\gamma \left({h}_{ij}\right)+\mu =\gamma \left({h}_{i0}\right)\ \mathrm{for}\ \mathrm{all}\ i=1,\dots, n $$

$$ {\sum}_{i=1}^k{w}_i=1 $$

where *k* is the number of sample points within the neighborhood, *w*_{
i
} is the weight for the *i*th sample point to be estimated, *γ*(*h*_{
ij
}) is the semivariance between sample points *i* and *j*, *γ*(*h*_{i0}) is the semivariance between sample point *i* and the point to be estimated, and *μ* is a Lagrange multiplier, which is added to ensure the minimum possible estimation error. Once *w*_{
i
}(*i* = 1, …, *k*) are found, the equation \( \left({w}_i=\frac{d_i^r}{\sum_{j=1}^n\left(1/{d}_i^r\right)}\right) \) is used to estimate values at unsampled locations.

We selected the spherical semivariogram model to estimate the semivariogram that was fitted with a continuous curve.

The error variance for each interpolated point can be estimated using the following equation:

$$ {\sigma}^2={\sum}_{i=1}^k{w}_i\gamma \left({h}_{i0}\right)+\mu $$

The square root of the variance provides the standard error at the interpolated point, which yields an error estimate and the confidence interval for the unknown point.

Suppose the interpolated value is *z*_{0}. If interpolation errors have a normal distribution, the real value at the interpolated point is within \( {z}_0\pm \left(\sqrt{\sigma^2}\times 2\right) \) with a probability of 95% [29].

### Cross-validation

The cross-validation technique was adopted in this study to evaluate and compare the performance of different interpolation methods. The sample points were arbitrarily divided into two datasets, with one set used to train a model and the other used to validate the model. To reduce variability, the training and validation sets must cross over in successive rounds such that each data point could be validated against. The root mean square error (RMSE) for error measurement was estimated to evaluate the accuracy of the interpolation methods [33].

$$ \mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^N{\left({O}_i-{P}_i\right)}^2}{N}} $$

where *O*_{
i
} is the observed value, *P*_{
i
} was the predicted value, and *N* is the number of samples.

Seasonal data derived from MODIS raster images were evaluated using the four spatial interpolation methods (i.e. IDW, LPI, RBF, and OK). The interpolation methods were used to transform point data into zonal data by using ArcMap 10, and each model automatically generated the minimum, mean, and maximum estimates. The spatial interpolation models and error measurements (i.e. RMSE) applied in this study were mapped using ArcMap 10.

### Transformation of the interpolated LST into 348-township maps by using the binary principle

We assumed that a critical low temperature limits the biogeographical distribution of *Ae. aegypti* to Southern Taiwan. Designing criteria for creating a binary map of 384 townships for LSTs was necessary; from a binary map, the extent of spatial similarity could be evaluated by comparing geographical distributions between *Ae. aegypti* and LSTs. The threshold temperatures were estimated to range from 8 to 28 °C, encompassing the seasonal temperatures in the plain areas of Taiwan. Criteria were designed as follows: (i) if the LST of a township was higher than an assumed temperature, then this location was considered a favorable habitat for the survival of *Ae. aegypti* (optimal survival conditions); (ii) if the LST of a township was lower than a given temperature assumption, then this location was considered harmful for the survival of *Ae. aegypti* (nonoptimal survival conditions); and (iii) according to the binary principle, optimal and nonoptimal survival conditions were defined as “1” and “0,” respectively. Scenarios from each of the given temperature assumptions were generated, which were then mapped to a dichotomous map of the 348 townships.

### Phi coefficient

The phi coefficient (∅) is the version of Pearson’s *γ* that is used when both X and Y are true dichotomous variables [34]. It can be calculated from general Pearson’s *γ* by using score values of 0 and 1 or 1 and 2 for group membership variables. Alternatively, the phi coefficient can be computed from cell frequencies in a 2×2 table that summarizes the number of cases for each combination of X and Y scores. The frequencies of cases in the four cells of a 2×2 table are labeled in order to compute the phi coefficient from the cell frequencies. Assuming that the cell frequencies are from a to d (i.e. *a* and *d* correspond to “discordant” outcomes and *b* and *c* correspond to “concordant” outcomes), then the following formula can be used to compute the phi coefficient directly from the cell frequencies:

$$ \varnothing =\frac{bc- ad}{\sqrt{\left(a+b\right)\times \left(a+c\right)\times \left(b+d\right)\times \left(c+d\right)}} $$

where *b* and *c* are the numbers of cases in the concordant cells of a 2×2 table and *a* and *d* are the numbers of cases in the discordant cells of a 2 × 2 Table.

A formal significance test for the phi coefficient can be obtained by converting it into a chi-square; in the following equation, *N* represents the total number of scores in the contingency table:

$$ \varnothing =\sqrt{\frac{{\mathrm{X}}^2}{N}} $$

This is a chi-square statistic with a degree of freedom (*df*) of 1. This can be used as one of the many possible statistics to describe relationships between categorical variables based on the tables of cell frequencies. For *X*^{2} with *df* = 1 and α = 0.05, the critical value of *X*^{2} is 3.84; thus, if the obtained *X*^{2} value exceeds 3.84, then the phi coefficient is statistically significant at the 0.05 level [34].

In this study, a correlation statistic was computed using the phi coefficient to evaluate the correlation of the binary variables, namely *Ae. aegypti* geographical distribution and LST, under a given temperature assumption (details described in “Transformation of the interpolated LST into 348-township maps by using the binary principle” in the Methods section). Phi coefficient statistics were calculated using SPSS 12 (SPSS, Chicago, IL, US).