Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database

Background Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. Methods We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. Results We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium, (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. Conclusions About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences conserved among species), as well as other markers that are specific to species or species-groups (i.e., using DNA sequences that differ among species). Full genome-sequencing of additional species and specimens of S. haematobium, S. japonicum, and S. mansoni is desirable to better characterize differences within and among these species, to develop additional genetic markers, and to examine genes as well as conserved non-coding elements associated with drug resistance.


Background
Schistosomiasis is caused by blood flukes of the genus Schistosoma, affecting about 210 million people inhabiting 76 countries with tropical and sub-tropical climates [1]. Urogenital schistosomiasis caused by Schistosoma haematobium affects 112 million people, with cases widely distributed across Africa [2]. Several efforts are underway to control S. haematobium, including the Schistosomiasis Consortium for Operational Research and Evaluation (SCORE) program (http://score. uga.edu/index.html), which was established in 2008 to answer strategic questions about schistosomiasis control and elimination. New tools are needed to monitor changes in population structure of the Schistosoma parasite subjected to various levels of drug pressure [3]. Molecular tools, in particular, will allow researchers to determine whether changes in gene frequencies provide insight into the effectiveness of treatment, understand the impacts of treatment on the gene pool and population structure of Schistosoma parasites, and establish whether movement of humans from refugia or non-treated areas introduces new parasites into local populations [4].
Currently, only a single drug, praziquantel, treats schistosomiasis [5]. Cure rates vary but rarely reach 100%, and some infections may persist even with frequent treatment [6]. One goal of the SCORE program is to understand and document whether changes in drug tolerance arise during different drug administration programs, and specific genetic assays are needed to better monitor population structure and potential changes in Schistosoma populations under praziquantel treatment.
Genetic resources exist for studying these aspects of S. mansoni and S. japonicum, due to the availability of the full genome sequence for each [7,8] and the relatively high level of research devoted to these species [4]. The recently published genome sequence for S. haematobium [9] was not available when we initiated this research, but even with this important new resource, relatively few validated genetic markers exist to quantify and analyze genetic variation of S. haematobium populations, creating a significant knowledge gap for this species. Because several large-scale SCORE projects focus on S. haematobium, obtaining larger-scale genetic and genomic resources enabling the study of S. haematobium at the population-level is desirable. Thus, we produced a large-scale database with a validated subset of DNA markers for S. haematobium to create the resources needed for future work by both the SCORE program and the larger research community.
Here, we focused on characterizing a type of highly variable DNA marker in widespread use for more than 20 years -microsatellite DNA loci [10] -which are the most actively used genetic marker within the schistosomiasis research community reviewed by Rollinson et al. [11].
Although 16 microsatellite loci have been developed for S. haematobium and seven have been used in large-scale studies (Gower et al. [3,12]), only three have proven robust in our hands. We sought to develop numerous microsatellite loci from throughout the S. haematobium genome for long-term use, and to identify a superior small subset of 10-20 loci for immediate, routine use. Thus, our general strategy was to collect DNA sequence data from relatively long-reads of random DNA fragments totaling about 1x genome coverage using a Roche 454 FLX Genome Sequencer and to identify microsatellite DNA markers from the sequences using two different techniques. We created a database containing thousands of potential microsatellite markers, and we validated a random selection of 32 markers from the database using DNA collected from 6 widely dispersed populations of S. haematobium. We show that the markers in the database can be used to obtain highly variable microsatellite DNA genotypes of S. haematobium and enable population-level analyses across the range of S. haematobium in Africa.

Schistosoma haematobium samples Genome sequencing sample
Because it is best to use DNA derived from single genotypes for genome sequencing, 90 laboratory bred Bulinus wrighti snails were individually exposed to single S. haematobium miracidia collected from urine samples of school children in Zanzibar. A large number of snails were used because of the expected low proportion of successful infections from single miracidia and the possibility of snail death prior to sufficient clonal cercariae being collected. The infected snails were maintained in the laboratory for 33 days in standard conditions (water-filled trays at 27°C, 12 h/12 h light/dark), and then transferred to individual vessels and placed in the dark. The individual snails were placed in fresh snail water and brought into the light every 2-3 days to stimulate shedding of the cercariae. Eight snails were found shedding cercariae and over a period of 28 days, cercariae from each individual snail were collected using a pipette to capture them in a minimal water volume. Cercariae from each snail were collected in individual 1.5 ml microcentrifuge tubes, centrifuged and stored at −20°C after excess water was removed. Cercariae from each individual snail collected over multiple days were combined for the extraction procedure. The results presented here are based on DNA purified from 3703 cercariae (the total cercariae collected from a single snail over a 28 day period) using DNeasy blood and tissue kits (Qiagen, Valencia, CA). We eluted DNA from the DNeasy columns using two aliquots of 50 μL of Buffer AE, according to manufacturer instructions, and the DNA purification procedure yielded approximately 3.1 μg of DNA, measured with a Nanodrop ND1000 spectrophotometer.

Genetic survey samples
To identify polymorphic microsatellite DNA loci in the random subset and survey genetic variation among S. haematobium populations, adult worms from six African countries (Table 1) were obtained from the schistosomiasis collection at the Natural History Museum (SCAN) [13]. These isolates had been passaged through golden hamsters and B. wrightii in the Natural History Museum (NHM) laboratory and stored in liquid nitrogen. DNA was extracted individually from six adult male and six adult female worms from each of the six localities (Table 1) using the DNeasy blood and tissue kit (Qiagen, Inc.), as described above. To conserve the limited supply of DNA, we performed whole genome amplifications (WGA) of 1 μL (~10 ng) purified DNA using the REPLI-g UltraFast mini Kit (Qiagen, Crawley, UK), and we diluted the resulting WGA amplifications 1:24 prior to using diluted, WGA DNA as a template for microsatellite locus genotyping.

sequencing
To provide low coverage genome sequences of S. haematobium and to identify microsatellite loci, we converted the DNA extracted from the single, pooled sample of 3703 cercariae shed from an individual snail into a 454 sequencing library by ligating Roche 454 titanium blunt MID-tagged adaptors (RL kits were not yet available) to the DNA according to the manufacturer's instructions. Due to limited amounts of genomic DNA, we reduced the stringency of the 454 size-selection conditions, prior to sequencing, to retain smaller fragments in the resulting sequencing library. We sequenced the sizeselected library using titanium chemistry on a Roche 454 FLX Genome Sequencer according to the manufacturer's instructions.

Sequence filtering, assembly, and microsatellite identification
Following sequencing, we trimmed low quality sequence data using default parameters of the Roche v2.3 analysis pipeline, and we assembled trimmed reads to contigs using gsAssembler v.2.3 software. We used RepeatMasker (http://www.repeatmasker.org) and MsatCommander [14] to identify independently microsatellite loci within resulting contigs, and we merged results of the two searches to generate a general feature format (GFF) file containing the microsatellite information.
Comparative mapping of S. haematobium contigs to the S. mansoni genome Because the S. haematobium genome sequence was not publicly available at the time and because we wanted to (1) determine the proximity of microsatellite loci to annotated or putative genes, (2) select putatively unlinked microsatellite loci for testing, and (3) compare the frequency and size distribution of microsatellite motifs across species, we mapped the S. haematobium contigs to the S. mansoni genome using SHRiMP [15] and BWA [16]. This approach assumes significant synteny between the two species, which the mapping results suggest is true -an observation also reported by Young et al. [9]. We then designed primers adjacent to microsatellites having sufficient flanking sequence (> 20 bp) using Pri-mer3 [17] and a modified version of MsatCommander (http://github.com/faircloth-lab/msatcommander-gs), which designs tagged primers for polymorphism testing using a customized Primer3 workflow. We converted this information to the GFF format and put all the information together in a single GFF file.
We selected a random subset of 32 tagged primers from the database (Additional file 1: Table S1) for polymorphism testing by choosing primers designed from the subset of contigs containing only one microsatellite locus and mapping onto the S. mansoni genome (using BWA) with a map quality score of ≥ 30.
Testing polymorphism and surveying genetic variation in S. haematobium  We amplified all loci using one of three touchdown PCR protocols [18], TD65, TD60, or TD55 (Table 2). Each touchdown protocol included a 10°C span of annealing temperatures (65-55°C, 60°C-50°C, and 55-45°C respectively), and touchdown cycling parameters consisted of 20 cycles of 95°C for 30s, highest annealing temperature (decreased by 0.5°C per cycle) for 30s, and 72°C for 30s; and 20 cycles of 95°C for 30s, lowest annealing temperature for 30s, and 72°C for 30s. We ran all PCR products on an ABI-3730xl sequencer with Naurox size standard prepared as described in DeWoody et al. [19] except that all Naurox reverse primers started with GTTT. We used Genemapper version 4.0 (Applied Biosystems) to analyze the results and determine genotypes. Fifteen primer pairs (47%) amplified high quality, unambiguous, PCR products and displayed polymorphisms. Using these 15 loci, we genotyped 72 individuals (12 individuals sampled from 6 locations) using the reaction conditions described above with WGA DNA R: †CGACGTCAAGCACTGTAC * indicates CAG tag (5′-CAGTCGGGCGTCATCA-3′). † indicates pigtail (5′-GTTT-3′). The annealing temperature (T A ) where TD65, TD60, and TD55 indicates touchdown protocols with a highest annealing temperature of 65°C, 60°C, and 55°C, respectively; size indicates the range of sequenced alleles in base pairs plus the length of the CAG tag; the number of individuals genotyped out of 72 is N; k is the number of alleles observed.
( Table 2). To test for potentially negative effects of using WGA DNA, we also screened genomic DNA from 24 individuals at 3 loci (Shae_03, Shae_13, and Shae_15). We did not detect differences between genotypes derived from genomic DNA or WGA DNA.

Analysis of Microsatellite diversity within and among populations
To measure the degree of genetic polymorphism within populations, we calculated expected heterozygosity (H E ), observed heterozygosity (H O ), and mean number of alleles per locus (MNA) using MICROSATELLITE TOOLKIT [20]. We tested for deviations from Hardy-Weinberg equilibrium (HWE) in H E and H O using tests of heterozygote deficiency (U-test) at an α of 0.05 in GENEPOP web version 4.0.10 [21,22]. To estimate the degree of inbreeding within populations, we calculated mean F IS [23] for each population in GENEPOP, and we used GenAlEx 6 [24] to calculate the number of private alleles within each population. Although many population differentiation statistics have been proposed for use with microsatellites, we chose to calculate and compare two measurements of genetic differentiation: F ST and D est . F ST is relatively common in the literature but is often criticized because of its tendency to underestimate genetic differentiation when markers are highly variable, which is often the case with microsatellites [25,26]. R ST [27] is also commonly reported in studies that use microsatellite markers because it is not affected by the amount of genetic variability within populations. However, R ST does not perform well when sample sizes are small [28], and we opted not to calculate this parameter in our study. In place of R ST, we chose to calculate D est [29], which does not underestimate differentiation when markers are highly variable [29]. We used GenAlEx 6 to calculate pairwise F ST using analysis of molecular variance (AMOVA) with 999 permutations, and we used SMOGD [30] to calculate pairwise, harmonic mean D est values. To test for a pattern of isolation by distance (IBD), we used Isolde in GENEPOP to examine the correlation between Slatkin's linearized [27] and the respective geographic distance between sampling locations.
We used GenAlEx 6 to determine the probability of identity (PI), which describes how well a locus can distinguish among genetically different individuals (a low PI indicates powerful genetic markers). To identify loci with potential null alleles and estimate null allele frequencies we used MICROCHECKER v2.2.3 [31] in combination with the Brookfield estimator [32], which assumes that there are no homozygote null alleles and that the absence of a genotype is caused by degraded DNA, genotyping error, etc. To determine which loci provided the most information about population differentiation, we used a log-likelihood ratio (G) based exact test in GENEPOP, and we compared the statistical significance (α = 0.001) of pairwise F ST values for each locus to the respective statistical significance of pairwise F ST values across all loci. Because all across-loci pairwise F ST values were statistically significant, we counted the number of populations with pairwise F ST P > 0.001 (α = 0.001) to measure disagreement between locus-specific estimates of population differentiation and populationspecific estimates of differentiation. Across all populations we calculated mean allelic richness using MICROSATEL-LITE ANALYZER (MSA) v.4.05 [33].
To provide a relative rank of the utility of each locus to all other loci, we selected several desirable characteristics for ranking, which we weighted equally: low PI, low number of populations having null alleles, low number of disagreements with statistical significance of population structure, and high mean number of alleles. We then scored all loci, by ranking each locus within all categories from most desirable (1) to least desirable (15). We added rank values together to create a final score for each locus, and we ordered loci from lowest values of this final score (most informative) to highest (least informative).

sequences
We obtained 1,058,114 reads totaling 299,226,826 bases from 454 sequencing. Average read length was 283 bases, which is lower than usual for titanium sequencing but is expected due to our reduction of size-selection stringency during library preparation. The S. haematobium genome is about 385 megabases (Mb; [9]), which is between S. mansoni at 363 Mb [7] and S. japonicum at 397 Mb [8]. Thus, the 454 sequences we obtained represent about 0.78x coverage. This low-level of coverage is ideal for discovering a large proportion of unique loci (i.e., each read has a high probability of sampling part of the genome for the first time), but is not, on its own, intended for genome assembly because only about half of the genome will be included in the sequences obtained. Even at this low coverage, some DNA fragments will be sequenced multiple times, especially repetitive elements. After read filtering, sequence assembly yielded 11,166 contigs >500 bases in length with an average size of 682 bases, an N50 of 642 bases, a maximum contig size of 5,900 bases, and 89% of bases ≥ Q40.

Comparison of microsatellites among Schistosoma genomes
We determined the frequency distribution of microsatellites by motif and repeat lengths for all microsatellites found in the S. haematobium 454 reads (this study) and the S. haematobium genome (obtained using Illumina sequencing [9]). In addition, we examined all di-, tri-, and tetra-nucleotide repeats in detail and compared them with S. japonicum and S. mansoni (Figure 1; Additional file 2: Table S2). Schistosoma species differed markedly in the frequency of AT repeats with AT repeats being the most common in S. japonicum, second most common in S. mansoni, and second or third most common in S. haematobium, depending on which data (454 or Illumina) are used. In comparison to the published S. haematobium genome (from a strain in long term culture originally isolated in Egypt and sequenced on the Illumina platform [9]), our sequences (from an individual deriving from Zanzibar and sequenced on the 454 platform), there are markedly more ATC repeats and fewer AAGT repeats in the 454 sequences. The frequency of AAGT repeats found in the 454 sequences is more similar to the frequencies found in S. mansoni and S. japonicum genomes. Thus, it is possible that the current full genome assembly of S. haematobium overestimates AAGT frequency. In contrast, the 454 sequences show far more ATC repeats than any of the full genome assemblies. Thus, the ATC repeats are either systematically overestimated in the 454 reads or underestimated in the Illumina reads used for S. haematobium and potentially all three Schistosoma genome assemblies produced so far.
SchistoDB [34]; http://SchistoDB.net provides a rich resource for comparisons among the three schistosomes with genome sequences available. Our sequences add to the possible comparisons, but our results illustrate that some caution should be used in these comparisons. There are many differences among sequencing platforms. Some of these differences are known from the chemistry and technologies employed within the instruments, but it is often difficult to estimate how these will impact specific research projects [35]. Multiple studies have, therefore, been conducted to directly compare sequencing results from multiple platforms e.g., [36][37][38]. Sequencing of additional specimens and sequencing with other chemistries and platforms is needed to determine whether the differences we observed are artifacts of the chemistry and platforms used or reflect biological differences among or within species.

Microsatellite primers and database
We used two different approaches to identify microsatellites and design primers for those loci. In the first approach, the programs RepeatMasker and MsatCommander identified 18,235 and 17,739 microsatellites, respectively. SHRiMP mapping resulted in 144,450 reads mapped from the S. haematobium data to the S. mansoni genome. For each microsatellite within the mapped reads, we extracted 500 bp upstream and downstream of its location from the S. mansoni genome to identify primers adjacent to each locus. Primer3 yielded 13,790 pairs of primers for these loci.
In the second approach, microsatellite loci were identified and primers were designed from the S. haematobium 454 sequences using a modified version of MsatCommander that is tuned to genome-scale datasets. A total of 113,757 microsatellite loci were identified but collapsed into 84,754 loci that were >50 bp apart. Of those, 35,691 had >20 bp of preceding and trailing sequence (i.e., enough for potential primer design). The 454 reads were then mapped onto the S. mansoni genome (v 3.1), resulting in 114,015 reads with a BWA mapQ score ≥30. We then filtered those reads to include only those that had one matching location in the S. mansoni genome, contained a microsatellite, and had >20 bases of flanking sequence on each side of the repeat, resulting in 3,656 reads. From those, 3,148 primer pairs were designed, of which 1,479 were compatible with 5' CAG-tags (Table 2). We further examined the map locations in S. mansoni to retain only one primer pair per scaffold, resulting in 403 primer pairs (Additional file 3: Table S3). We then randomized the list of 403 primer pairs and empirically tested the first 32.
We compiled results from analyses above into lists and searchable databases for distribution. Primers for microsatellite loci, their location within the genome and a BLAST [Alschul et al. 1997] server are available at: http://www. cebio.org/projetos/schistosoma-haematobium-genome.
The different approaches and lists of primer pairs may be useful for a variety of purposes. For example, the first approach demonstrates that large numbers of microsatellite repeats can be obtained using commonly available bioinformatics tools and that many of the S. haematobium reads can be mapped directly to the S. mansoni genome. Because the S. haematobium genome is more similar to the S. mansoni than S. japonicum genome [9,39], the 13,790 loci identified from the SHRiMP alignment represent a large potential pool of markers that could be mapped to S. mansoni, which is 4 times larger than the 3,148 primer pairs that were mapped to S. mansoni from strategy 2. The major advantage of strategy 2, however, is that 1,479 primer pairs could easily be used with CAG-tags for low-cost, 3-primer PCR [40], and 403 of those could be used for linkage mapping in S. haematobium or S. mansoni. Some may find it curious why linkage mapping would be necessary when a genome sequence is available, but the Schistosoma genomes are not yet finished (i.e., not all sequences are assembled into chromosomes) and finishing generally relies upon data that are independent of the sequences to validate the computational assembly of the short DNA sequence reads. Finally, these primers provide potential markers for diagnosis, epidemiology and phylogenetic analysis of Schistosoma species cf., [41][42][43].
Population genetic survey DNA sample quality Fifteen of the 32 primer pairs we screened yielded highquality polymorphic markers with the initial set of 8 DNA samples. We used these 15 polymorphic markers to screen DNA from 72 S. haematobium individuals (six male worms and six female worms each from six locations; Table 1, Table 2, Additional file 4: Table S4). Three worms appeared to have low-quality DNA or contained significant amounts of PCR inhibitors (we obtained genotypes from zero, four, and four loci in each these three individuals). All other samples yielded data from at least 10 loci (one individual each producing amplicons at 10, 11 and 12 loci, all others having genotypes from 14 or all 15 loci). We concluded that the three DNA samples that yielded few or no genotypes were of low quality, and we excluded them from further analysis. We retained the three samples scored at 10-12 loci because it is possible the failure of loci to amplify in these samples results from the presence of null alleles. However, it is also possible that the failure of loci to amplify in these samples results only from their lower quality, potentially inflating null allele estimates. Thus, primer pairs with reduced numbers of individuals scored ( Table 2) may reflect a higher frequency of null alleles or reduced robustness of the PCR conditions to amplify lower-quality DNA. WGA of Schistosoma DNA has been demonstrated to provide reliable templates for genotyping microsatellite loci [44]. WGA is useful because it increases the small amount of DNA obtained from individual worms, eggs, cercariae or miracidia to quantities that allow multiple PCRs, replicate PCRs and thus estimation of genotyping error rates [44]. However, in the current study, while using WGA DNA resulted in more template for PCRs, and thus more PCRs per sample, it did not increase the number of usable genotypes. In fact, we had complete congruence in genotypes called from WGA and genomic DNA suggesting WGA was unnecessary, and only helped us in confirming that the genotypes were repeatable. It is possible that inhibitors were carried through WGA for the 6 samples that yielded relatively few genotypes, but it is also possible that the starting DNA was not sufficient in these samples and yielded amplification of only part of the genome instead of the whole genome. In the future, if there is concern that genomic DNA is of low quality and may result in null alleles and poor amplification we would recommend initially trying WGA on a small subset of samples to see if amplification is improved. Further work on the specific conditions where WGA will improve genotyping for various Schistosoma samples and species is needed.

Genetic variability of Schistosoma haematobium
We analyzed genotypes of 69 individuals across the six locations sampled ( Table 1). The total number of alleles per locus ranged from 5 for Shae_11 to 15 for Shae_03 ( Table 2). The largest number of alleles was in the Zanzibar population (average = 5.7, Table 3), whereas the lowest number of alleles occurred in the South African population (average = 1.3, Table 3). South Africa was also an outlier for expected heterozygosity (6%, Table 3); all other populations had an H E between 51% and 70%. We did not find differences between expected and observed heterozygosities (P = 0.356), and no populations showed evidence of departures from HWE. We identified private alleles in all six populations (Table 3), occurring at a minimum frequency of 0.10. Four populations had slightly negative F IS values whereas two had slightly positive F IS values (Table 3), but none were significantly different from zero. The limited sample sizes used here limit the power to detect deviations from HWE, but the lack of trends in one direction or the other suggest that neither the markers nor samples were systematically biased.
The variability that we find with these 15 polymorphic loci is likely to be as high or higher than that found with loci developed previously e.g., [3,45]. It is important to note that our sampling is broader and shallower than previous studies of microsatellite loci in S. haematobium [3,45]. Direct comparisons of these two primer sets genotyped in the same samples will be needed to determine definitively which are more powerful. Of critical importance, however, is that the loci we describe are not just variable, but the loci are also easy to amplify and score (see below). We are now routinely using multiplexes of these 15 loci to genotype S. haematobium for a variety of ongoing studies and all 15 loci are proving robust for all S. haematobium life cycle stages and are easily scored. Thus, the primers identified in Table 2 provide a significant new resource for researchers of S. haematobium.

Population structure of Schistosoma haematobium
Pairwise F ST values suggest strong population structure ( Table 4 These results are consistent with previous findings [3,34,35,45]. Gower et al. [45] found an average F ST of 0.173 among the five countries they investigated (Cameroon, Kenya, Mali, Niger, and Tanzania). Our F ST values are higher than those found by Gower et al. [45], but it is important to note that the markers (loci) used and sampling intensity heavily influence resulting measures of differentiation [26]. Additionally, highly polymorphic microsatellite loci constrain F ST [26], which makes measures such as D est more appropriate [29]. Despite these constraints, F ST and D est values are quite similar in this study (Table 4). Indeed, we would draw the same conclusions from either measurethere is very high differentiation among all populations, but differentiation is especially high between South Africa and all other populations. We would also draw the same conclusion of significant population differentiation among S. haematobium populations in different African countries from the results of Gower et al. [45] or our study.

Ranking loci by information content
Although all of the loci we tested are useful, some researchers may choose to use only a subset of these 15 loci. Thus, we sought to compute relative utility of each locus. We measured four parameters across the 15 microsatellite loci (

Comparison to other studies and primers
We have developed microsatellite loci from literally hundreds of species using a variety of approaches during the past 20 years [46][47][48]. It is common for about half of the primer pairs we tested to yield amplifiable, easily genotyped, and polymorphic loci. It is not unusual, however, for the proportion to be substantially lower than 50%, but it is somewhat uncommon for the proportion to be substantially higher than 50%. Here, we focused on single-copy loci which are likely to have been sequenced only a single time. Because 454 sequences have an average error rate of about 1% per base, this means that about 18% (1-0.99 20 ) of 20-base primer sequences will contain at least one error. Errors in primer sequences will be enhanced when they are at the ends of a sequence read (where quality is lowest), but the lowquality bases are mostly mitigated by matching the low-coverage S. haematobium 454 sequences to high-  PI is the probability of identity, the number of populations with expected null alleles present, the number of populations with non-significant (p > 0.001)F ST values (significance determined by G-based exact test), A is the mean number of alleles, Score is determined from rank-sum of desirable characteristics (e.g., allelic diversity, lack of predicted null alleles, etc.; see text for details), Rank of Usefulness is the rank order for loci based on Score (lowest Score and rank is best), loci with the same rank indicate loci with identical rank sums.
coverage S. mansoni genomic data. Indeed, most primers we design typically amplify (~75%), but it is well-known that the proportion of variable loci varies greatly among taxa and that mutation rates vary among loci and is correlated with allele length [49]. We focused on tetranucleotide repeats because they are generally easy to genotype [50]. Tetranucleotide repeats also have the advantage of having the lowest mutation rate [51] and thus the lowest level of homoplasy.
We developed a customized pipeline to design primers and identify primer pairs from regions of interest. This pipeline builds on the easily implemented MSATCOMMANDER [14] program, but includes customization to help speed searches on genomescale data and maintains the output in an SQL database. We did not attempt multiple PCR conditions that could optimize the amplification of each locus. Instead, we focused on applying a small number of broadly useful PCR conditions on a moderate number of primer pairs. This strategy has the advantage of quickly focusing on primers that are likely to work well in multiplexes. It also provides a realistic assessment of the proportion of loci that are likely to work under normal lab conditions, rather than the largest possible proportion that could ever be made to work.
We have only tested 32 of the many thousands of primer pairs in our databases. If researchers need additional loci to address health-care challenges posed by urogenital schistosomiasis [52], we recommend investigating additional primers in the list of 403 loci mapped to S. mansoni scaffolds (Additional file 3: Table S3 cf. Additional file 1: Table S1) as the most productive avenue.

Conclusion
The highly variable DNA markers we have developed can be used to investigate the genetic epidemiology of S. haematobium within both definitive mammalian (worms, eggs and miracidia) and intermediate snail hosts (cercariae) from widespread geographical locations. These markers demonstrate that S. haematobium from five of the sampling locations are genetically diverse, whereas S. haematobium from South Africa has much lower diversity ( Figure 2). Schistosoma haematobium from Zanzibar had the highest levels of genetic diversity, complementing the differences in mtDNA diversity found within and between S. haematobium populations from across mainland Africa and the Indian Ocean Islands [53,54]. An elimination programme is now ongoing in Zanzibar, and it will be interesting to see how the strong selection pressures imposed on parasite populations by praziquantel treatments impacts this diversity [55]. The sweep of markers in the panel of loci identified provides a novel microsatellite panel with the potential to be used to identify and track genetic determinants of drug resistance in S. haematobium. Thus, these markers are a valuable resource to SCORE and the wider community of researchers and healthcare professionals researching with an ultimate goal to eliminate urogenital schistosomiasis. Primers for microsatellite loci, their location within the genome and a BLAST server are available at: http://www. cebio.org/projetos/schistosoma-haematobium-genome.