Development of genome-wide polymorphic microsatellite markers for Trichinella spiralis

Background Trichinella nematodes are globally distributed food-borne pathogens, of which Trichinella spiralis is the most common species in China. Microsatellites are a powerful tool in population genetics and phylogeographic analysis. However, only a few microsatellite markers have been reported in T. spiralis. Thus, there is a need to develop and validate genome-wide microsatellite markers for T. spiralis. Methods Microsatellites were selected from shotgun genomic sequences using the MIcroSAtellite identification tool (MISA). The identified markers were validated in 12 isolates of T. spiralis in China. Results A total of 93,140 microsatellites were identified by MISA from 9267 contigs in T. spiralis genome sequences, of which 16 polymorphic loci were selected for validation by PCR with single larvae from 12 isolates of T. spiralis in China. There were 7–19 alleles per locus (average 11.25 alleles per locus). The observed heterozygosity (HO) and expected heterozygosity (HE) ranged from 0.325 to 0.750 and 0.737 to 0.918, respectively. The polymorphism information content (PIC) ranged from 0.719 to 0.978 (average 0.826). Among the 16 loci, markers for 10 loci could be amplified from all 12 international standard strains of Trichinella spp. Conclusions Sixteen highly polymorphic markers were selected and validated for T. spiralis. Primary phylogenetic analysis showed that these markers might serve as a useful tool for genetic studies of Trichinella parasites.


Background
Human trichinellosis is caused by eating raw or undercooked meat infected with Trichinella parasites [1]. Trichinella parasites have a broad geographical distribution on all continents except Antarctica, and can infect > 150 animal species, including mammals, birds and reptiles [2]. The genus Trichinella contains nine species and three genotypes that can be separated into two clades by the ability to form encapsulated and non-encapsulated larvae [3][4][5]. There are genetic variations in Trichinella spp. based on geographical distributions and host species [6,7]. In China, Trichinella spp. have been reported in a range of animals, including foxes, bears, wild boar, weasels, raccoon dogs, rats, bamboo rats and civets [8]. Only two Trichinella species (i.e. T. spiralis and T. nativa) have been identified in China [8][9][10][11][12]. However, little is known about the genetic variations among the Trichinella species in China.
Genetic variability in T. spiralis was first reported in 1992, with three allozyme patterns at the loci of glucose 6-phosphate dehydrogenase and glucose phosphate isomerase detected in 61 isolates of T. spiralis from zoogeographical regions [6]. Genetic polymorphisms in T. spiralis have also been studied using different molecular tools, such as restriction fragment length polymorphism and single-strand conformational polymorphism (RFLP-SSCP) [13,14], non-isotopic single-strand conformation polymorphism ('cold' SSCP) [15], and deep resequencing of the mitochondrial genomes [16]. Compared with other molecular markers, microsatellites exist throughout the genome. In addition, microsatellites are relatively easy to score, since their gel band patterns can provide unambiguous results. Thus, they have been widely used in studies of genetic diversity, population genetic structure, genome mapping, parentage analysis and phylogeography [17][18][19]. However, only a few microsatellites have been reported in T. spiralis [12][20][21][22]. The present study aimed to identify and characterize microsatellites in T. spiralis and to obtain polymorphic microsatellite markers for further study.

Microsatellite identification and primer design
All 9267 contigs of T. spiralis were retrieved from the GenBank database (https://www.ncbi.nlm.nih.gov/nuccore/ABIR00000000) and used to search for microsatellite sequences with the MIcroSAtellite Identification Tool (MISA), which was configured with strict minimum motif repeat requirements [25]. The search criteria were mono- to hexanucleotide repeats with a minimum length of 12 bp and a minimum of two repeat units. The maximum length of sequence between two simple sequence repeats (SSRs) to register as a compound SSR was 100 bp [19]. The number of microsatellites, motif, number of repeats, length of the repeat sequence, repeat type, start and end position of the repeat sequence, and microsatellite sequence were analyzed using MISA.
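The search criteria above can be illustrated with a minimal Python sketch. It is not MISA itself and is not the authors' pipeline: the function names (`min_repeats`, `is_reducible`, `find_ssrs`) are hypothetical, and the sketch assumes that the 12 bp and two-unit thresholds apply jointly, so that, for example, a dinucleotide needs six units and a hexanucleotide needs two. Compound SSRs and lowercase or ambiguous bases are not handled.

```python
import math
import re

MIN_LENGTH_BP = 12      # minimum total repeat length stated in the text
MIN_REPEAT_UNITS = 2    # minimum number of repeat units stated in the text


def min_repeats(motif_len):
    """Smallest unit count that satisfies both thresholds (assumed to apply jointly)."""
    return max(MIN_REPEAT_UNITS, math.ceil(MIN_LENGTH_BP / motif_len))


def is_reducible(motif):
    """True if the motif is itself a repetition of a shorter unit, e.g. 'ATAT' -> 'AT'."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq):
    """Return (motif, repeat_units, start, end) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for motif_len in range(1, 7):  # mono- to hexanucleotide motifs
        extra_units = min_repeats(motif_len) - 1
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, extra_units))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # counted under the shorter motif instead
            units = len(match.group(0)) // motif_len
            hits.append((motif, units, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])
```

Under these assumptions, a 12 bp (AT)6 tract is reported once as a dinucleotide repeat rather than again as a tetra- or hexanucleotide repeat, because reducible motifs are skipped.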

Screening of microsatellites by PCR
A total of 1000 SSR primer pairs were selected for preliminary screening by PCR using DNA from a pool of ~4000 muscle larvae (~350 larvae from each of the 12 T. spiralis isolates in China). To isolate DNA, all larvae were homogenized in 500 μl of extraction buffer containing 500 mM NaCl, 10 mM Tris-Cl (pH 8.0), 50 mM EDTA (pH 8.0), 2% (w/v) SDS and 10 mM β-mercaptoethanol, followed by incubation with 5 μl of proteinase K (20 mg/ml) at 60 °C for 0.5-2 h, phenol-chloroform extraction (50:50%, v/v), precipitation with 70% ethanol, and resuspension in 30-50 μl of sterile water. DNA samples were stored at −20 °C. PCR reactions were carried out in a final volume of 20 μl, consisting of ~50 ng of DNA, 2 μl of 10× Ex Taq buffer (20 mM Mg2+ Plus; TaKaRa, Kusatsu, Japan), 1.6 μl of dNTP mixture (2.5 mM each), 0.2 μl of Ex Taq DNA polymerase (5 U/μl) (TaKaRa), and 0.4 μl of each primer (10 pmol/μl). PCR amplifications were performed in a thermal cycler (Applied Biosystems, California, USA) using the following program: 98 °C for 5 min; followed by 35 cycles of 98 °C for 10 s, a specified annealing temperature for each primer pair for 30 s, 72 °C for 30 s; and a final extension step at 72 °C for 7 min. PCR products were electrophoresed on 1% agarose gels, stained with ethidium bromide and visualized under UV illumination. Microsatellite markers producing single bands were selected as candidate loci for further validation.
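As a quick check on the reaction mix, the final concentrations implied by the stated stock concentrations and volumes can be worked out with simple dilution arithmetic. The short Python sketch below is illustrative only; the helper name `final_concentration` is hypothetical, and the only inputs are the values given in the paragraph above (10 pmol/μl is taken as equal to 10 μM).

```python
REACTION_VOLUME_UL = 20.0  # final reaction volume stated in the text


def final_concentration(stock_conc, stock_volume_ul, reaction_volume_ul=REACTION_VOLUME_UL):
    """Dilution arithmetic: C_final = C_stock * V_stock / V_final (units follow the stock)."""
    return stock_conc * stock_volume_ul / reaction_volume_ul


# dNTP mixture: 1.6 ul of a 2.5 mM (each) stock
dntp_each_mM = final_concentration(2.5, 1.6)

# Each primer: 0.4 ul of a 10 pmol/ul stock (10 pmol/ul = 10 uM)
primer_uM = final_concentration(10.0, 0.4)

# Ex Taq polymerase: 0.2 ul of a 5 U/ul stock, expressed as total units per reaction
taq_units = 5.0 * 0.2

print(f"dNTP: {dntp_each_mM:.2f} mM each")
print(f"Primer: {primer_uM:.2f} uM each")
print(f"Ex Taq: {taq_units:.1f} U per reaction")
```

With these inputs the screening reaction contains about 0.2 mM of each dNTP, 0.2 μM of each primer and 1 U of polymerase per 20 μl.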

Verification of microsatellite polymorphism
Each of the selected primers was validated with 40 single larvae of T. spiralis from seven regions in China. Each single larva was digested with proteinase K for DNA extraction using a Tissue and Hair Extraction Kit and a DNA IQ™ System Extraction Kit (Promega, Madison, USA) with magnetic beads following the manufacturer's instructions. DNA was eluted in 25 μl of elution buffer. Whole genome amplification was performed using an Illustra™ Ready-To-Go™ GenomiPhi V3 DNA Amplification Kit (GE Healthcare, Pittsburgh, USA) to increase the quantity of DNA. Concentrations of DNA were measured in a NanoDrop 2000 photometer (Thermo Fisher Scientific, Waltham, USA).
PCR amplifications were performed in a 20 μl reaction using a primer mixture which contained three primers: a sequence-specific forward primer with an M13 tail at its 5′-end, a sequence-specific reverse primer, and the universal fluorescent-labeled M13 primer (FAM-M13 primer) [27]. Each 20 μl reaction contained 0.05 μM forward primer, 0.25 μM reverse primer, 0.2 μM FAM-M13 primer, 0.16 mM dNTP, 1 U of Ex Taq DNA polymerase (TaKaRa), and ~50 ng of DNA from a single larva [27]. The PCR program was run as follows: 98 °C for 5 min; 32 cycles of 98 °C for 10 s, the annealing temperature specified for each primer pair for 30 s, and 72 °C for 30 s; eight additional cycles of 98 °C for 10 s, 53 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 7 min. PCR products were subjected to capillary electrophoresis analysis (CEA) with a 96-capillary 3730XL DNA Analyzer (Applied Biosystems). Data were analyzed with GeneMapper 4.0 (Applied Biosystems). A negative control with sterile water was included in each PCR run.
Finally, the microsatellite loci with high polymorphism were selected for further validation by PCR using DNA samples isolated from individual larvae from 12 isolates of T. spiralis in China (10 larvae per isolate; total 120 samples). PCR amplification and analysis followed the protocols described above.

Polymorphism analysis
For each locus, the number of alleles (Na), the effective number of alleles (Ne), the expected heterozygosity (HE) and the observed heterozygosity (HO) were estimated using GENEPOP version 4.2 (http://genepop.curtin.edu.au/) [28]. The same software was used to estimate the polymorphism information content (PIC) and to test for possible deviations from Hardy-Weinberg equilibrium (HWE) with Bonferroni correction [29].
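For readers unfamiliar with these per-locus statistics, the following Python sketch shows how they are commonly defined from diploid genotype data. It is not GENEPOP's implementation and may differ from it in detail (for instance, it uses the uncorrected HE = 1 − Σp² rather than a sample-size-corrected estimator). The function name `locus_statistics` and the example genotypes are hypothetical; PIC follows the usual Botstein-style formula, PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j².

```python
from collections import Counter
from itertools import combinations


def locus_statistics(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid individual.
    """
    n_individuals = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n_individuals
    freqs = [count / total_alleles for count in allele_counts.values()]

    sum_sq = sum(p * p for p in freqs)
    na = len(allele_counts)                     # number of alleles
    ne = 1.0 / sum_sq                           # effective number of alleles
    he = 1.0 - sum_sq                           # expected heterozygosity (uncorrected)
    ho = sum(1 for a, b in genotypes if a != b) / n_individuals  # observed heterozygosity
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {"Na": na, "Ne": ne, "HE": he, "HO": ho, "PIC": pic}


# Hypothetical fragment sizes (bp) for four larvae at one locus
example = [(100, 100), (100, 104), (104, 104), (100, 104)]
stats = locus_statistics(example)
```

By construction PIC never exceeds HE, so comparing the two values is a useful sanity check on any table of per-locus statistics.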

Cross-amplification
DNA samples were isolated from the 12 Trichinella international standard strains as described in section "Screening of microsatellites by PCR" above. Cross-amplifications at selected polymorphic loci were performed and analyzed by capillary electrophoresis using the same PCR protocols as described in section "Verification of microsatellite polymorphism" above.

Phylogenetic analysis
The PCR products amplified from 15 international standard strains at the TsMs03 locus were analyzed by 8% denaturing urea-polyacrylamide gel electrophoresis. The homozygous individuals were selected for sequencing. Multiple sequence alignments of nucleotide sequences at the TsMs03 locus were performed using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) [30]. The phylogenetic tree was inferred by MEGA X using the Neighbor-Joining method with 1000 bootstrap replicates [31,32].

Abundance and microsatellite characteristics
A total of 93,140 microsatellites were identified from 9267 contigs of the T. spiralis genome by MISA (Table 1). The microsatellite density was 1591 loci per Mb. Among the mono- to hexanucleotide motifs, hexanucleotides were the most abundant, accounting for 49.51% of the total, followed by trinucleotides (19.61%) and tetranucleotides (17.44%). The di-, penta-, and mononucleotide motifs accounted for 8.77%, 3.69%, and 0.98% of the total motifs, respectively. Microsatellite abundance decreased markedly as the number of motif repeats increased. Of the hexanucleotide repeats, 97.81% consisted of two repeat units and 1.81% consisted of three. For the pentanucleotide repeats, 68.29% consisted of three repeats, 19.12% consisted of four repeats, 8.18% consisted of five repeats, and 1.63% consisted of six repeats (Fig. 2). The 20 most frequent repeat types are listed in Fig. 3. The most common motifs in each type of repeats were A/T (59.43%), AT/AT (61.84%), AAT/ATT (39.28%), AAAT/ATTT (37.30%), AAAAT/ATTTT (18.07%) and AAAAAT/ATTTTT (10.87%). The longest repeat was (TATAA)98, which belonged to the pentanucleotide group (Table 2).
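The reported figures can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated above; the total searched sequence length is back-calculated from the count and density rather than taken from the text, and the per-class counts are approximations derived from the rounded percentages, so both should be read as rough derived values, not reported results.

```python
TOTAL_SSRS = 93_140        # microsatellites identified
DENSITY_PER_MB = 1591      # loci per Mb

# Share of each motif class (% of all microsatellites), as reported
CLASS_PERCENT = {
    "hexa": 49.51,
    "tri": 19.61,
    "tetra": 17.44,
    "di": 8.77,
    "penta": 3.69,
    "mono": 0.98,
}

# Sequence length implied by count / density (derived, not stated in the text)
implied_length_mb = TOTAL_SSRS / DENSITY_PER_MB

# The same density expressed per kb
density_per_kb = DENSITY_PER_MB / 1000

# The class shares should add up to 100%
percent_total = round(sum(CLASS_PERCENT.values()), 2)

# Approximate counts per class, limited by rounding of the percentages
approx_counts = {name: round(TOTAL_SSRS * pct / 100) for name, pct in CLASS_PERCENT.items()}

# Classes ranked from most to least abundant
ranked = sorted(CLASS_PERCENT, key=CLASS_PERCENT.get, reverse=True)

print(f"Implied sequence length: {implied_length_mb:.2f} Mb")
print(f"Density: {density_per_kb:.3f} loci per kb")
print("Rank order:", " > ".join(ranked))
```

The class shares sum to 100%, and the rank order is hexa > tri > tetra > di > penta > mono.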

Polymorphic microsatellite screening
Among the 1000 microsatellite loci selected for primary screening, 676 loci generated PCR products of the expected sizes. A total of 120 loci producing a single bright band in gel electrophoresis were selected as candidate loci.
Among them, 47 microsatellite loci were homozygous, while 57 loci showed low polymorphism. Finally, we selected 16 highly polymorphic loci that produced distinct bands among individual larvae originating from different regions in China for further analysis (Table 3). (Table 4).

Cross-amplification
Among the final 16 loci, 10 produced PCR amplicons for all tested Trichinella spp. Four (i.e. TsMs01, TsMs04, TsMs10 and TsMs14) yielded PCR products only from the Trichinella spp. with encapsulated larvae. Most of these loci were homozygous in T. britovi (encapsulated larvae) and in species with non-encapsulated larvae (Table 5). In addition, the TsMs07 and TsMs08 loci were amplified from species with encapsulated and non-encapsulated larvae, except for T. pseudospiralis. The average number of amplified alleles in each of the Trichinella spp. ranged from 1.300 (T. papuae and T. zimbabwensis) to 2.938 (Trichinella T9). A maximum of six alleles was observed in the Trichinella T9 strain at the TsMs03 locus. Allelic size varied among taxa at a given locus, and a given allele was commonly shared by two or three taxa. Trichinella T9 had specific alleles at three loci (i.e. TsMs12, TsMs14 and TsMs16) that differed in allelic size from those of other Trichinella taxa. None of the alleles at a given locus were shared by all Trichinella spp.

Phylogenetic analysis
Primary phylogenetic analysis showed that all Trichinella spp. clustered into two clades: an encapsulated larvae group and a non-encapsulated larvae group (Fig. 4). A sister relationship was observed between T. spiralis and T. nelsoni relative to the other species with encapsulated larvae. Trichinella papuae and T. zimbabwensis were more closely related to each other than to T. pseudospiralis.

Discussion
Microsatellites have been used in genetic diversity and genetic mapping studies in various organisms [33][34][35], partly because of their high polymorphism and the ability to detect alleles at a given locus in individual organisms [36,37]. In previous studies, most microsatellites in T. spiralis were designed based on expressed sequence tag (EST) databases [20][21][22]. The present study identified 93,140 microsatellites in the T. spiralis genome using MISA, which accounted for 2.25% of the total genome sequence. The relative abundance of microsatellite sequences was estimated at 1.591 loci per kb of the T. spiralis genome.
Generally, microsatellites decrease in abundance with increasing repeat length [38,39], and this trend has been observed in many organisms [40]. Previous comparative studies of microsatellites from eukaryotic genomes have found that the composition characteristics and distribution patterns varied significantly by species [39,41]. Caenorhabditis elegans has a low frequency of microsatellites in its genome, even lower than Saccharomyces cerevisiae and other fungi [19,42,43]. In general, eukaryotic genomes are characterized by the prevalence of mononucleotide repeat motifs [19,44]. For instance, mononucleotide repeats are the most abundant class of microsatellites in C. elegans [19] and Meloidogyne incognita [45]. However, dinucleotide repeats are the most abundant type of motif in rodents [19] and most dicot plant species [46]. Moreover, trinucleotide repeats are dominant in some algae and fungi species [44,47], potentially indicating their genomic structural similarity with prokaryotes [48]. In contrast, tetra- to hexanucleotide repeats are less abundant in eukaryotic genomes [49,50]. Intriguingly, our results suggested a different distribution pattern for T. spiralis: hexa- > tri- > tetra- > di- > penta- > mononucleotide repeats. The repeat frequency of hexanucleotides (49.51%) was higher than that of the other repeat classes. This may be a characteristic that is unique to T. spiralis. It is also possible that the abundance of repeats is influenced by secondary structures and DNA replication [49].
Among mononucleotide repeats, the motif (A/T)n is predominant, while (C/G)n repeats are rare [45,48]. Our results for the most dominant motif type in mono- to hexanucleotide repeat classes of T. spiralis showed a similar (A + T)-rich pattern. The possible reasons for this (A + T)-rich motif pattern may be as follows. First, (A + T)-rich motifs can decrease the annealing temperature and accelerate strand separation, and the AT content increases through DNA replication and slippage [49]. Secondly, DNA methylation can generate regions with high mutagenic rates, where the cytidine monophosphate becomes transformed into thymine. This type of mutation results from the deamination of methylation sites, leading to a combination of (A + T)-rich repeats. DNA methylation has been confirmed in the three life-cycle stages of T. spiralis, making it the only nematode species known to date with epigenetic modification of its genome [51]. In addition, these repeats may be favored because the order of bases can directly influence chromatin structure, protein coding and gene function [50].
Previous studies have suggested that Trichinella spp. have low intraspecific genetic diversity and genetic differentiation between populations [6][21][52][53][54][55][56][57][58]. The unique life-cycle of Trichinella species can often promote sibling inbreeding and reduced population size [58]. Therefore, successful selection of microsatellite markers with relatively high abundance and polymorphism might be very difficult. Although the microsatellites of T. spiralis were detected in 12% of the 1000 EST sequences by La Rosa et al. [21], only seven microsatellite markers were suitable for genetic subgroup analysis.
In the present study, 16 microsatellite markers with high polymorphism were selected and identified from 1000 candidate microsatellite loci.

Conclusions
We report the identification of microsatellite sequences from the genome sequence data of T. spiralis with MISA. Among them, 16 microsatellites with high polymorphism among 12 isolates of T. spiralis from various geographical regions in China were identified, and 10 microsatellites could be amplified successfully from all 12 Trichinella spp. The primary phylogenetic analysis suggested that the newly selected microsatellite markers could be applied to the analysis of genetic relationships among Trichinella spp. These microsatellite markers might serve as an important resource for the further study of Trichinella spp.