### Sampling

Samples were taken in 10 countries so as to cover most of the range of *Indoplanorbis exustus* (see Table 1 for details of taxa and sampling). No samples were taken in Africa, as these populations were assumed to be represented by the samples from Arabia. Identification of taxa followed published accounts[17]. Whole snails were fixed in 10% ethanol directly in the field. A small subsample of each population was taken in 1% neutral formalin to aid identification. DNA was extracted from individual snails by a standard method[76]. Sequence variation was assessed at two loci, being partial sequences of the mitochondrial (mt) cytochrome oxidase subunit I gene (*cox* 1) and the mt large ribosomal-RNA gene (*rrn* L or 16S). Sequences of the oligonucleotide primers used in the PCR for the amplification of these loci are published elsewhere, for the *cox* 1 amplifications the primers HCO-2198 and LCO-1490 were used[77]. For the *rrn* L locus the primers of Palumbi *et al.* (1991) were used[78]. 2 to 8 snails were sequenced for each locus/population (mode 4 *rrn* L and 2 *cox* 1). Mitochondrial loci were chosen because with their maternal pattern of inheritance and therefore smaller effective population size, these loci may be expected to evolve more rapidly and be better suited to estimation of intraspecific phylogenies and more recent (i.e., post-Miocene radiations). In addition, these loci have proven useful in such studies of gastropods and molecular clock rate data are available[24, 42, 43, 46, 79, 80]. Total genomic DNA was used as a template for PCR amplification on a Progene thermal cycler (MWG) employing standard PCR conditions[81]. Unincorporated primers and nucleotides were removed from PCR products using the QIAQuick PCR purification kit (QIAGEN). Sequences were determined bidirectionally, directly from the products by thermal- cycle-sequencing using Big Dye fluorescent dye terminators and an ABI 377 automated sequencer (Perkin-Elmer), following procedures recommended by the manufacturers. Sequences were assembled and aligned using Sequencher (version 3.1 Gene Codes Corp. Ann Arbor, Michigan). DNA sequences for both strands were aligned and compared to verify accuracy. Controls without DNA template were included in all PCR runs to exclude any cross-over contamination.

A multiple sequence alignment was generated for all taxa for each locus and the *cox* 1 and *rrn* L sequences were then concatenated to form a combined data set. No intrapopulation variation was found. Corresponding sequences were obtained for *Radix auricularia rubiginosa* (Michelin, 1831) (Lymnaeoidea: Lymnaeidae) in this laboratory, as outgroup sequences. Additional outgroup data were obtained for *Biomphalaria glabrata* (Say, 1818) (Planorbidae: Planorbinae) from the GenBank [GenBank:NC_005439][82].

### Preparation of data

As an initial test for phylogenetic signal in the data, a Permutation Tail Probability test (PTP)[83, 84] with 50000 replicates (each of 100 random additions), was performed on the ingroup. Although often criticised for being too liberal[85], the PTP-test is useful as a first step in phylogenetic analysis[86], as rejection of the null hypothesis here suggests that the real data may be better than randomly permuted (phylogenetically uninformative) data. In addition Tajima's test for neutrality (based on the total number of mutations) was performed using DNAsp (v. 5.10.01)[87]. The data were tested for substitution saturation using plots of the numbers of transitions (Ts) and transversions (Tv) against the maximum likelihood (ML) genetic distance[88]. The indications of these plots were further evaluated using the entropy-based test[35] as found in the DAMBE (v. 4.5.20) software package[89], which provides a statistical test for saturation. Aside from the parsimony analyses, gaps were coded as a binary matrix, and all characters were run unordered and equally weighted. Indels for the sequence alignment appeared to be homologous (they were short and of equal length in all taxa) and alignment errors were unlikely (the sequences aligned readily in Sequencer under stringent parameters). In addition, recurrent mutation is less likely to significantly affect the analysis of intraspecific phylogenies and there was no reliably tested approach for choosing a particular model. Consequently, gaps were coded simply in a binary, presence/absence, manner rather than using SIC or other more complex coding methods[90].

### Phylogeny reconstruction: parameters and priors

Phylogenetic estimation used three different approaches; this strategy was adopted to look for resilience of the hypothesis to changes in method and associated assumptions. The rationale was that any clade that was represented in phylogenies found by all methods (and well supported) would be a reliable reconstruction of phylogenetic history for these taxa. The use of a direct ML method and Bayesian methods also allowed comparison of estimated branch lengths, as Bayesian methods (as implemented in MrBayes) have been known to erroneously converge on LT solutions in cases where partitioned data are described by complex models with many parameters close to nonidentifiability[36, 97]. The ILD test comparing the *cox* 1 and *rrn* L loci was significant (*P* = 0.02), so the two partitions were modeled separately. The starting models indicated for the ML analyses were as follows:

All sites: TIM + G

*cox* 1 first codon positions: TrN + I

*cox* 1 second codon positions: F81

*cox* 1 third codon positions: HKY

*rrn* L: HKY + G

Indels: F81

The F81 model for gaps was not chosen by goodness of fit tests, but it is the only appropriate model suitable for indels. Also, it is a monotone model fitting the similarly monotone substitution of a binary character evolving along a phylogeny[98]).

The following details the methods particular to each phylogenetic method. Analyses were time consuming and performed on several multiprocessor machines (4 × 2.44 GHz PCs), with output files pooled from multiple simultaneous runs where appropriate.

#### ML with heuristic search

Heuristic searches were performed (under the respective model and starting parameters indicated by Modeltest) using PAUP* initially with 1000 replicates, random addition of sequences (10 replicates) and tree-bisection-reconnection branch-swapping (TBR) options in effect. The TIM+G model was used, gaps were treated as missing data. The outgroup was forced to be monophyletic. The initial search rapidly converged on a single tree which was hit 100% of the time. Consequently, the search was repeated with nchuck = 10 and chuckscore = 3210 in an attempt to improve exploration of parameter space and find trees of better score. This was repeated with the monophyly constraint both on and off. A final search of 5000 replicates was then performed to increase the chance that the ML tree had been found. A likelihood ratio test for a strict molecular clock hypothesis was also made possible by repeating the final run with a clock constraint. Nodal support was assessed by bootstrap with 5000 replicates.

#### Bayesian method (BM) with McMC applied by P4

Bayesian phylogenetic methods are becoming increasingly popular in molecular systematics with the belief that they are not only faster in terms of computing time (for analyses with an equivalent level of confidence) but are also statistically superior to a solely ML method[99]. For example, such methods do not assume approximate normality or large sample sizes as would general ML methods[100]. BMs differ from direct ML methods in that the former consider the posterior probability of the model (with parameters) and tree after observing the data. The posterior probability of a hypothesis is proportional to the product of its prior probability and the probability of observing the dataset given the hypothesis (i.e., the likelihood of the hypothesis). Consequently, unlike direct ML, a BM allows incorporation of prior information about the phylogenetic process into the analysis. P4 (v. 0.87) was used as the primary method to apply the BM; this employs the same method as MrBayes[101] but allows consideration of unresolved trees (i.e. polytomies) and can implement all commonly used substitution models as well as any custom model the user might need to specify[102].

The priors specified for the BM generally followed the default values found in MrBayes; a flat Dirichlet distribution was set as the prior for the state frequency and for the rate set priors (*e.g*., revmat, tratio), an exponential prior was set on the branch lengths. A polytomy proposal was set as either zero (*i.e*., no favouring of multifurcations) or as *e, e*^{1.11}*e2* to examine the effect this has on the posterior probabilities of the clades found; this implements a move (proposed by Lewis et al., 2005) to counter the problem of the spuriously high posterior clade probabilities returned by MrBayes relative to corresponding ML analyses[103]. Model parameters and relative rates were set to be freely variable and unlinked; there were four discrete rate categories for the Г-distribution. Outgroup monophyly was enforced. The McMC was tuned to give proposal acceptance rates between 10 and 70% for each data partition.

Four simultaneous Markov chains were run (one cold, three heated) and trees were sampled every 10 generations, two such runs were performed simultaneously. Convergence of the McMC was assessed by plotting split support (at the Indonesia/Borneo node) for consensus trees over different generation time windows; the generation of convergence was considered to be that at which the support reached a plateau. In addition, a plot of the cumulative split frequencies from two simultaneous runs (which should cluster along the line y = x when both runs are converging on the same region of parameter space) was also used to determine the point of stationarity (see literature for details of rationale[104]). Generations prior to the inferred point of convergence (i.e., prior to stationarity) were discarded as a burnin.

#### Bayesian method with McMC applied using MrBayes

To afford comparison with more standard analyses and to confirm the findings with P4, the BM was reapplied using the popular package MrBayes (v. 3.1.2). Run conditions followed those of the P4 analyses, where possible. Unlike P4, the polytomy proposal for MrBayes is effectively fixed at zero and model specification was more limited than in P4. In order to help assess if runs had converged the average standard deviation of the split frequencies (ASDSF) was used to gauge the similarity of trees sampled by 2 independent runs and the Potential Scale Reduction Factor (PSRF) was used to compare the final mean parameter estimates (for the 2 runs), including the posterior probabilities of nodes[105, 101]. The McMC was considered complete if this ASDSF was <0.01 otherwise the run (and burnin) was lengthened until the ASDSF was adequate. PSRF values close to 1.000 were also considered as indicating that convergence might have been achieved.

As noted above, it is possible for MrBayes (and P4) to converge on an incorrect LT solution where partitioned data are used with complex models. This can occur because the default starting tree used by MrBayes often has relatively long branch lengths for most data sets (this can draw the chain towards long-trees in the early stages of the run, such that convergence on the correct solution is never achieved). This problem may not be detected by either the ASDSF or PSRF[36]. To reduce the risk of this source of systematic error the following precautions were taken. Branch lengths of the final trees were compared with and without a user specified starting tree (the ML tree from PAUP was used for this). The relative rates for the partitions were examined because as the chain is drawn to LT solutions the relative rate of the partition with fewest variable sites is often inflated (and alpha falls to accommodate this). In the present study the expectation was that third codon positions should have the highest rates and *rrn* L among the lowest - deviations from this would be considered symptomatic of an LT error. Although the mean branch length prior in MrBayes was left at the default 0.1, that in P4 was lowered to 0.001 and the branch lengths of the final trees compared (following published suggestions[106]). There was an exponential prior on the branch lengths in MrBayes. In P4 Pinvar was also removed from the model for *Cox* 1 first codon positions; this move was to reduce or detect any parameter correlation affecting convergence[107].

#### Parsimony method using POY

To provide an analysis free of the assumptions underlying ML methods, and any effects particular to the alignment inferred by Sequencher, the topology for the phylogeny of the snail populations was also estimated using a maximum parsimony approach implemented by POY (v. 4.1.1). POY was used to effect a direct optimization criterion, which performs simultaneous tree searching and sequence alignment. Direct optimization finds a set of parsimonious homologies for a set of sequences of different length by treating the full homologous nucleotide string as a character and each sequence as a state (i.e., the data are treated as dynamic homology characters)[108]. The data are partitioned in the sense that the homologies are not moved between them, but are combined during optimization in that each has the potential to influence the "alignment" of any other partition (strictly alignment is not performed, only implied). The data partitioning scheme and Tv/Ts weighting scheme and gap cost were determined by a MRI[109] and "Navajo rug" approach (a graphic plot where the parameter space is represented as a grid)[110]. The MRI provides only a rough measure of congruence among data partitions; however, the approach has performed adequately in empirical tests[111]. The Tv:Ts weightings considered were 1, 2, 4, 8, 16 and 32, and the gap costs similarly ranged from 1 to 64 (a total of 42 weight matrices). The gap opening cost was set to one. One Navajo rug was produced for each data partition (*cox* 1, *cox* 1 first and second codon positions, *cox* 1 third codon positions, *rrn* L and *cox* 1 + *rrn* L) and weighting scheme was plotted against clade, with each clade sored as monophyletic, paraphyletic or polyphyletic. Clades were taken from the ML tree found in the PAUP analyses as well as additional clades found on trees in the most frequent 5% of trees from the posterior distribution in MrBayes. The Navajo rugs were used to detect any unusual sensitivity of a particular data partition to changes in character weighting. The MRI was used to guide the choice of data partitioning strategy and weighting scheme.

The search strategy was as follows. Initially 250 random-addition (Wagner) trees were built. Alternate sub-tree pruning and TBR rounds of branch-swapping were then performed on all 250 trees until a local optimum was achieved (using default options in POY 4). In addition to the most optimal trees, all the suboptimal trees found within 5% of the best cost were also considered. Then all but the best 100 of all optimal and topologically unique trees were excluded from the analysis. The remaining trees were then subjected to 15 rounds of ratcheting, where 20% of the characters were re-weighted by a factor of 2. During ratcheting, the dynamic homology characters were transformed into static homology characters, so that the fraction of nucleotides (rather than of sequence fragments) could be re-weighted. 200 swappings of subtrees identical in terminal composition but different in topology, were then performed between pairs of best trees from the ratcheting step . These two steps were intended to increase the efficiency of the search of tree space. Each resulting tree was then refined further by local branch swapping under the default parameters in POY for swapping . The optimal and topologically unique trees were then reported. The resulting tree was then used as a start point for 2 rounds of optimization using iterative pass; this constitutes an extremely effective means of determining parsimonious cladogram costs[112]. Clade support was estimated by jackknifing of static transformed characters (5000 replicates, 50% of characters deleted).

### Hypothesis testing

#### Topological method

The posterior probability of a particular phylogeographical hypothesis was tested by determining the proportion of trees with this topology in the posterior probability distribution output from a tree search using P4. This was done by writing a constraint tree describing the hypothesis of interest and reading this into PAUP*.

#### SH-test [113]

Again a constraint tree describing the hypothesis of interest was written and used in the SH-test as implemented in PAUP* with full optimization, branch lengths used in likelihood calculation, and 1000 bootstrap replicates. The test compares between and among topologies from constrained and unconstrained searches; the set of "best" trees (from the 5% confidence interval of unique trees sampled from the posterior of the P4 search) and the ML tree (unconstrained) were used as the set of trees for this comparison.

#### Relaxed molecular clock methods (BEAST)

The program BEAST (v. 1.5.3)[114, 115] was used to estimate the molecular clock rates and divergence dates (TMRCA of clades). BEAST implements a Bayesian method for the simultaneous estimation of divergence times, tree topology and clock rates; this method is currently considered superior to other approaches (e.g., non-parametric methods such as NPRS[116] or penalized likelihood methods[117], particularly for phylogenies with a low time depth, because it can allow for uncertainty in dates assigned to calibration points and does not require untested assumptions about the pattern of clock rate variation among lineages[118]. The greatest benefit of using a Bayesian method for dating is that the specification of prior distributions can be used to ensure that the analysis realistically incorporates the uncertainty associated with the calibration points used[119]. The procedure involves the user specifying both a phylogenetic model (a model of population history; the tree prior) and a clock model (of substitution rate variation); however, the likelihood calculation is based on the clock model only. In addition, as in any BM, one must specify a substitution model for each data partition. Rate variation between adjacent branches is assumed to be uncorrelated, as these rates did not show autocorrelation in recent studies[120]. BEAST can implement several combinations of tree and clock models; the present study used the following criteria to compare and choose tree prior and clock models.

Beginning with a Yule tree prior and an uncorrelated log-normal (relaxed) clock model, and with a prior on the tree height and mean substitution rate, each available model was evaluated. Firstly, any models estimating implausible TMRCAs on the outgroup were discounted. Implausible TMRCAs are, for example, those dated to the pre-Cambrian. For those models which gave stable results, the ratio of the marginal likelihoods (with respect to the prior) of alternative models (i.e., the Bayes Factor, BF) was used to choose between them[121] (here importance sampling and the harmonic mean of the sampled likelihoods are used as estimators of the marginal likelihoods). Where there was a tie between two models both were run and the model giving the higher ESS values (most efficient run) was chosen; there was no marked difference in parameter estimates for such ties. Choice of nucleotide substitution model followed that for the BM described above (based on MrModeltest), followed by use of BF tests to look for simpler, equally suitable, models for use in BEAST.