Genetic structure of Spirometra mansoni (Cestoda: Diphyllobothriidae) populations in China revealed by a Target SSR-seq method
Parasites & Vectors volume 15, Article number: 485 (2022)
In China, the plerocercoid of the cestode Spirometra mansoni is the main causative agent of human and animal sparganosis. However, the population genetic structure of this parasite remains unclear. In this study, we genotyped S. mansoni isolates with the aim to improve current knowledge on the evolution and population diversity of this cestode.
We first screened 34 perfect simple sequence repeats (SSRs) using all available omic data and then constructed target sequencing technology (Target SSR-seq) based on the Illumina NovaSeq platform. Next, a series of STRUCTURE. clustering, principal component, analysis of molecular variance and TreeMix analyses were performed on 362 worm samples isolated from 12 different hosts in 16 geographical populations of China to identify the genetic structure.
A total of 170 alleles were detected. The whole population could be organized and was found to be derived from the admixture of two ancestral clusters. TreeMix analysis hinted that possible gene flow occurred from Guizhou (GZ) to Sichuan (SC), SC to Jaingxi (JX), SC to Hubei (HB), GZ to Yunnan (YN) and GZ to Jiangsu (JS). Both neighbor-joining clustering and principal coordinate analysis showed that isolates from intermediate hosts tend to cluster together, while parasites from definitive hosts revealed greater genetic differences. Generally, a S. mansoni population was observed to harbor high genetic diversity, moderate genetic differentiation and a little genetic exchange among geographical populations.
A Target SSR-seq genotyping method was successfully developed, and an in-depth view of genetic diversity and genetic relationship will have important implications for the prevention and control of sparganosis.
Parasitic tapeworms of humans and animals cause diseases of major socio-economic importance around the world . Several of these remain neglected in terms of research and control, such as Spirometra mansoni (Cestoda: Diphyllobothriidae) . The plerocercoid larvae of S. mansoni can lead to a serious parasitic zoonosis in humans known as sparganosis by invading the subcutaneous tissues, abdominal cavity, eye and central nervous system, causing local tissue damage, blindness, paralysis, and even death . Human infection results mainly from the ingestion of uncooked frogs or snakes infected with plerocercoids, drinking water contaminated with copepods that have been infected with procercoids or direct contact of frog or snake flesh with an open wound . Most cases of sparganosis have reported in Eastern and Southeastern Asia . On a country basis, China has the largest number of sparganosis cases in the world, with > 1300 reported cases of human sparganosis in 26 out of 34 provinces, autonomous regions and municipal districts . In recent years, the number of local cases have increased in several districts of China, and sparganosis has even been termed as an emerging zoonosis [4, 7, 8].
Knowledge of the extent of intraspecific genetic diversity of S. mansoni is not only useful for understanding the dynamics of individual infections but also valuable for the prevention and control of sparganosis. Several pioneer studies have investigated the genetic diversity of Spirometra tapeworms. Jeon and Eom  performed a genetic variability analysis of Spirometra species from Korea, China and Japan using two mitochondrial gene sequences (cytochrome c oxidase subunit 1 [cox1] and cytochrome b [cytb]). The study identified 148 (cox1) and 83 (cytb) haplotypes within 239 and 213 isolates, indicating that mitochondrial haplotypes of Spirometra erinaceieuropaei and Spirometra decipiens were found in the three Asian countries . Based on cox1 sequences, Yamasaki et al.  suggested two distinct Spirometra species, Type I and Type II, are present in Asia, neither of which is close to the likely European “S. erinaceieuropaei.” Kołodziej-Sobocińska et al.  found distinct genetic separation of the Polish and Chinese populations of Spirometra isolates based on cox1 and cytb, and suggested that isolates from Polish and Chinese populations should be two species. Using a global full-length DNA sequence dataset of cox1, Kuchta et al.  performed a comprehensive phylogenetic analysis of Spirometra tapeworms and suggested that there are at least six distinct lineages of the genus. In China, the genetic diversity of isolates of Spirometra tapeworms from different geographical locations and different hosts (frogs and snakes) have been investigated in recent years [12,13,14,15,16]. Although previous studies have provided valuable information that contribute to a better understanding of the genetic diversity of Spirometra tapeworms, there is room for improvement: (i) previous studies have relied on relatively small sample sizes with few representative isolates; (ii) the isolates used were collected mainly from intermediate hosts (frogs or snakes), with few samples from final hosts included; (iii) previous estimates of genetic diversity were based on one or only a few traditional mitochondrial markers (mostly cox1 and cytb); and (iv) no high-throughput sequencing method was applied. Therefore, our knowledge of the population diversity of S. mansoni obtained from different geographical areas of China is still fragmented.
In the study reported here, we explored the genetic diversity of S. mansoni using a high-throughput simple sequence repeats (SSRs) genotyping method. SSRs, also known as microsatellites, are highly variable repetitive elements with nucleotide motifs of 1–6 bp and are ubiquitous in most eukaryotic organisms . SSRs have been verified as suitable markers for inferring genetic variance and population differences in cestode species [14, 18,19,20,21,22]. However, traditional analysis based on gel electrophoresis cannot distinguish base differences or changes correctly in SSR amplicons, often causing false positive or false negative results in SSR detection . Recently, an increasing number of studies have validated that genome-wide SSRs, especially perfect SSRs, exhibit stable motifs and conserved corresponding flanking sequences [17, 24,25,26]. Therefore, the identification of perfect SSRs is critical in microsatellite analysis, such that amplification of the appropriate PCR products can be ensured in genetic research applications. In this study, we used a genotyping method named Target SSR-seq, which combines high-throughput sequencing with genome-wide perfect SSRs that exhibit stable motifs and flanking sequences. Target SSR-seq enables the genotyping of targeted SSR loci simultaneously in a large number of samples with high coverage, using a single Illumina lane . Moreover, by adding sequencing adapters and dual barcode tags, the SSR genotypes were determined directly from the deep sequencing of PCR products .
In order to conduct a comprehensive genetic diversity analysis of S. mansoni isolates, we collected a total of 368 isolates from 76 different geographical locations in China, which encompassed 16 out of 26 endemic regions of human sparganosis in the country. Specifically, the following objectives were addressed: (i) development of a high-throughput SSR genotyping (Target SSR-seq) method sensitive enough to genotype S. mansoni; and (ii) exploration of the genetic structure of S. mansoni populations with broad representative samples.
This study was performed strictly based on the recommendations of the Guide for the Care and Use of Laboratory Animals of the National Health Commission of China. The protocol was approved by the Life Science Ethics Committee of Zhengzhou University (Permission No. 2020-0704).
Human sparganosis is endemic in 26 provinces/autonomous regions/municipalities in China . From July 2013 to September 2020, we collected S. mansoni isolates in 16 of these regions. For the 10 endemic regions where no samples were collected, we surveyed wild frogs for S. mansoni infections in Heilongjiang, Jilin, Liaoning, Beijing, Hebei, Shandong and Qinghai (excluding Taiwan, Hong Kong and Macao), but found no parasite-positive frog. In total, 368 samples isolated from 12 different hosts in 76 geographical locations were included in this study (Fig. 1). Among the 368 samples, 318 were plerocercoids collected from frogs (Pelophylax nigromaculatus, Sylvirana latouchii, Fejervarya limnocharis, Odorrana margaretae and Boulengerana guentheri), 42 were plerocercoids collected from snakes (Zaocys dhumnades, Elaphe carinata and Elaphe taeniura) and eight were adults collected from felids (Panthera tigris tigris, P. tigris altaica, Prionailurus bengalensis and Felis catus). The precise locality, origin and date of collection of each sample are presented in Additional file 1: Table S1. The methods used to collect plerocercoids in frogs and snakes have been described previously [14, 15]. In brief, frogs or snakes were euthanized using ethyl-ether anesthesia and then weighed and skinned. The presence of plerocercoids in muscles and subcutaneous tissues was carefully observed visually. Once identified, the worms were isolated using small scissors and forceps and placed in phosphate-buffered saline to count and observe their shapes and movements. The plerocercoid was identified as S. mansoni by molecular genotyping with bidirectional sequencing of the mitochondrial cox1 gene (Additional file 1: Table S2) [3, 29]. The definitive hosts (cats and tigers) for collecting adult worms came from a wildlife zoo park in Changsha City, Hunan Province. All animals died of natural causes, and necropsies were carried out under approved protocols by the wildlife zoo park technical office. The intestinal tracts of the analyzed carnivores were carefully removed from each carcass and subsequently isolated by ligatures (pylorus and rectum). Examination of the intestinal content was performed as described by Arrabal et al. . All collected samples were washed extensively in physiological saline, snap-frozen in liquid nitrogen, and then stored at – 80 ℃ for further study.
Development of perfect SSR markers
A method for SSR genotyping using a target sequencing technology called Target SSR-seq was used to explore the genetic structure of S. mansoni populations in China (Fig. 2). The first step was to analyze the transcriptome data (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA761840)  to uncover genome-wide SSRs using the MISA-web tool . The search was carried out using the following parameters: (i) > 9 repeat units for mononucleotide microsatellites; (ii) ≥ 4 repeat units for di-, tri-nucleotide microsatellites; (iii) ≥ 3 repeat units for tetra-, penta- and hexa-nucleotide microsatellites; and (iv) repeat length up to 100 bp. The poly-A and poly-T sequences at the terminal regions of the unique transcripts (UTs) were removed before searching. Second, the publicly available expressed sequence tags (ESTs) were searched and retrieved from the NCBI EST database. MISA (MIcroSAtellite Identification Tool) was also used to search for microsatellite loci in the EST sequences. The following parameters were used to identify perfect SSRs (i.e. those that exhibit stable motifs and have conserved corresponding flanking sequences): (i) SSR motif length < 50 bp; (ii) no INDELs, poly regions or SSR loci in the 150-bp flanking sequence; and (iii) even distribution in chromosomes . Primers were designed for SSR loci by Primer3Plus with the following parameters: (i) primer lengths ranging from 18 to 27 bp; (ii) product lengths ranging from 100 to 300 bp; (iii) melting temperature (Tm) ranging from 57 °C to 63 °C; and (iv) GC-content ranging from 40 to 60%, with an optimal value of 50%. Before library construction, we needed to optimize these SSR markers twice. First, each SSR was independently tested and optimized by a single PCR amplification procedure consisting of: 1 cycle of 95 °C for 2 min; 11 cycles of 94 °C for 20 s, 63–58 °C (− 0.5 °C per cycle) for 40 s, 72 °C for 1 min; then 24 cycles of 94 °C for 20 s, 65 °C for 30 s, 72 °C for 1 min; with a final cycle at 72 °C for 2 min. SSR markers which amplified clear bands were selected for further multiplex PCR optimization. According to the multiplex PCR results, the composition and concentration of SSR markers in the panel of multiple PCR were adjusted and optimized to ensure that each SSR marker in the multiplex system could be amplified efficiently and specifically.
Library construction and high-throughput sequencing
Two rounds of PCR were performed to construct the library. In the first round, the optimized panels were used for multiple PCR amplification using the genomic DNA of each sample as a template. After quality control, the amplified products were mixed, and the quantity of amplified products of each SSR marker was ensured to be equal. In the second round, primers with index sequence were used to add 8 bp of specific label sequences compatible with the Illumina NovaSeq 6000 platform (Illumina Inc., San Diego, CA, USA) to the end of the library. After that, the amplified products of all samples were mixed in equal quantities, and the final FastTarget™ (Genesky Biotechnologies Inc., Shanghai, China) sequencing library was recovered by gumming. The length distribution of fragments in the library was verified using an Agilent 2100 BioAnalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA). After the molar concentration of the library was accurately quantified, the Illumina NovaSeq 6000 platform was finally used for high-throughput sequencing in a 2 × 150-bp double-ended sequencing mode, and each SSR region was sequenced for at least 1000× coverage. In addition, positive and negative controls of PCR amplification were set to assay the repeatability of the Target SSR-seq method. Specifically, in the Target SSR-seq experiment, 3% of the total samples were selected using a double-blind method to set up as positive control, and pure water was used as a negative control. To ensure quality, raw reads must be filtered to get clean reads with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and subsequent analyses were based on clean reads. The total number of reads was calculated, and the samples were classified. A Perl script “SSRSeq count” str_count.pl (https://github.com/ccoo22/SSRseq_count) was used to process the clean reads to generate an SSR read count table. A Perl script str_type.pl (https://github.com/ccoo22/SSRseq_count) and the SSRSeq V1.1 online SSR genotyping tool (http://bioinfo.geneskybiotech.com/software/ssrseq_type/v1.1/en/) were used to output SSR genotypes.
Genetic diversity analysis
Genetic information statistics, including observed number of alleles (Na), the effective number of alleles (Ne), Shannon’s information index (I), observed heterozygosity (Ho), and expected heterozygosity (He), were calculated by using the software POPGENE32 . Polymorphic information content (PIC) was calculated by using a Perl script . The formulas are as follows:
where N is the number of samples, NHet is the number of homozygous samples, NHom is the number of heterozygous samples, l is the allele locus and Pi and Pj are the population frequencies of the ith and jth alleles, respectively.
Genetic structure analysis
The STRUCTURE v2.3 software program was used to infer the population structure . The population number (K) was evaluated from 1 to 10. The optimal K value was determined by comparing the LnP (D) and ΔK based on the rate of change in LnP (D) . Polysat was used to calculate the genetic distance between pairs of samples and principal coordinate analysis (PCoA) . Nei’s genetic distance was calculated for pairs of subpopulations based on allele frequency. The genetic similarity matrix and genetic distance matrix between subpopulations were obtained. Neighbor-joining (NJ) and ‘ggtree’ in the phangorn package in the R packet (R Foundation for statistical computing, Vienna, Austria) were respectively applied to construct and draw the evolutionary tree . The genetic differentiation coefficient between subpopulations was calculated using the software GenAlEx using allele frequency , and population differentiation was measured together with an analysis of molecular variance (AMOVA). The software Treemix was used to infer multiple population splitting and mixing events based on genome-wide allele frequency data . The results were based on the provided materials to draw the maximum likelihood tree of the population and to calibrate migration events in the phylogenetic tree.
Exploration of core SSR sets for species identification
To screen a minimal number of SSRs for distinguishing the maximal number of S. mansoni isolates, we used a script in Perl to calculate the discrimination power for SSRs with the following formula: discrimination power = the number of isolates showing unique genotypes/total isolates. High discrimination power refers to high saturation value and high SSR discernibility. A core set of SSRs with the best discernibility ability was obtained, and the saturation curve was plotted by discrimination power for SSRs. Core Hunter software was used to construct a core collection based on MR (modified Rogers Distance) . MR treats each allele as a separate dimension, which is an improvement on the standard Euclidean distance. The sampling ratios of the software were set as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2 and 0.3, respectively.
Identification of perfect SSRs
A total of 12,481 ESTs related to S. mansoni were obtained based on the public EST database, of which 2326 ESTs remained after a redundancy analysis to reduce overestimation. Of these 2326 ESTs, 2059 overlapped and 267 were singletons. Further analysis revealed that the 2059 overlapped ESTs could be clustered into 255 clusters; thus, a total of 522 non-redundant ESTs that contained 915 SSRs were identified after screening the EST database. In addition, combined with 3365 SSRs generated from the transcriptome data, a total of 4280 SSRs were identified by MISA analysis. Of these 4280 SSRs, 1024 (23.93%) were di-nucleotide repeats, 2720 (63.55%) were tri-nucleotide repeats and 101 (2.36%) were tetra-nucleotide repeats (Additional file 1: Table S3). The distribution of SSR motifs revealed that the CT repeat was the most widespread di-nucleotide SSR motif (329/1024), and that the TCT repeat was the most widespread trimer among all tri-nucleotide SSR motifs (217/2760). Based on these identified SSRs, we obtained 98 SSRs after SSR filtering and perfect SSR search. Finally, 39 evenly distributed perfect SSRs were selected to test in Target SSR-seq after primer detection and optimization of the library enrichment system. Detailed information on SSR loci and primers is shown in Table 1.
Genotyping analysis of SSR-seq
During the high-throughput sequencing of 368 S. mansoni isolates, six samples failed due to the low quality of the specimens. The average sequencing depth per SSR capture in 309 samples (85.4%) was > 1000× (Fig. 3a). The clean ratio was > 0.8 for almost all of the samples (97%) (Fig. 3b), indicating the high quality and accuracy of the sequencing results. Moreover, four of the 39 SSRs exhibited high miss rates (> 20%), probably due to unstable flanking sequences. One SSR was observed as monomorphism in 362 varieties. Ultimately, a total of 34 polymorphic SSRs were obtained for the genotyping of 362 S. mansoni varieties.
Genetic diversity of S. mansoni populations
Target SSR-seq captured 170 alleles of 34 target SSR loci in 362 samples. Two motif types, trinucleotide and tetranucleotide, accounted for 88.2% and 11.8% of the 170 alleles, respectively. The number of SSR motif repeats ranged from three to seven, and 55 alleles (32.4%) contained four repeat units. The number of alleles per SSR locus varied from two to 18, with an average of five; however, the average effective number of alleles (Ne) was 2.4763, indicating a non-uniform distribution of alleles in the population (Table 2). The Shannon’s information index (I) varied from 0.1064 to 2.5105, with an average of 0.9455. The observed heterozygosity Ho ranged from 0.0417 to 0.9611, with a mean of 0.527, and most of the SSRs (25) exhibited higher Ho (> 0.4), indicating a high level of genetic variability in 362 S. mansoni isolates. Furthermore, the genetic diversity estimated by expected heterozygosity (He) varied from 0.0409 to 0.9064 (mean = 0.5004), while the PIC value ranged from 0.0401 to 0.8971 (mean = 0.4508). Overall, the 34 target SSR loci showed various alleles and high polymorphism rates, which were proven to be suitable for the identification of the varieties of S. mansoni isolates.
For the genetic diversity analysis, we first explored the genetic differences among the 16 geographical populations. The NJ tree based on Nei’s unbiased genetic distance demonstrated that the 16 geographical populations were clustered into three genetic groups (Fig. 4a). The first group mainly included populations from central (Hubei and Henan provinces) and eastern (Anhui, Jiangsu, Jiangxi, and Shanghai) China, with the exception of the Chongqing and Guizhou populations. The second group contained three southwestern populations (Guangxi, Yunnan and Sichuan), three southern populations (Guangdong, Hainan and Hunan) and one population from eastern China (Zhejiang). The third group only consisted of the Fujian (FJ) population from southeastern China. PCoA showed a similar clustering pattern to that of phylogenetic inference (Fig. 4b). Within the PCoA plot, the first principal coordinate accounted for 52.77% of the total variation, while the second principal coordinate accounted for 26.57%. Populations from geographically close regions showed similar distributions in the PCoA plot. However, the FJ population is located far away from the other populations, suggesting larger genetic variation within the FJ population. Next, genetic differences of tapeworms isolated from different hosts were explored (Fig. 4c). Clustering analysis showed that the samples from different hosts were clustered into three genetic groups. The first group included isolates from four frogs (P. nigromaculatus, F. limnocharis, O. margaretae and B. guentheri) and two cats (F. catus and P. bengalensis). The second group contained isolates from one frog (S. latouchii), two snakes (Z. dhumnades and E. carinata) and one tiger (P.t. tigris). The third group included isolates from one snake (E. taeniura) and one tiger (P.t. altaica). The PCoA analysis showed 54.21% and 30.5% explained variance of PCoA1 and PCoA2, respectively (Fig. 4d). The locations of the definitive hosts (cats and tigers) were scattered and far from each other, while the intermediate hosts (frogs and snakes) were concentrated in the middle of the coordinate axis and close to each other, indicating greater genetic variety among the definitive hosts and less genetic variety among intermediate hosts. Moreover, in order to investigate the detailed genetic diversity of each isolate, we performed a phylogenetic analysis using all 362 samples. The unrooted NJ tree revealed three main clusters (Fig. 5). Cluster 1 was the main clade and included 303 samples; cluster 2 and cluster 3 included 33 and 26 samples, respectively.
Genetic structure of S. mansoni in China
The STRUCTURE software program and Evanno’s correction results indicated that 362 isolates were divided into two main populations (Pop1 and Pop2) based on the optimal number of K = 2 (Fig. 6a). In general, 245 samples (67.7%) were assigned to Pop1, and the remaining 117 samples (32.3%) were assigned to Pop2 (Fig. 6b). Samples in Pop1 isolated from AH (n = 17), CQ (17), FJ (5), GD (5), GX (3), GZ (17), HB (17), HeN (20), HN (4), HuN (37), JS (17), JX (19), SC (36), SH (3), YN (6) and ZJ (22), and samples in Pop2 were collected from FJ (n = 4), GD (12), GX (21), GZ (4), HeN (1), HN (5), HuN (39), JX (4), SC (11), YN (8) and and ZJ (8) (see Fig. 1 for abbreviations of geographical locations). Clearly, the STRUCTURE analysis did not divide the 16 populations according to their geographical origin. In contrast, each population showed a mixed membership to the inferred clusters. Isolates from AH (n = 17), CQ (17), HB (17), JS (17) and SH (3) revealed one prominent genotype, whereas the remaining isolates indicated the admixture of two genotypes (Fig. 6c). Moreover, the PCoA analysis of 362 samples showed a similar clustering pattern to the STRUCTURE analysis (Fig. 7). All isolates are generally clustered into two main groups (Pop1 and Pop2) with partial overlap among several samples.
Population differentiation of S. mansoni in China
The AMOVA analysis of 34 SSR genotypes in 362 samples indicated that the maximum variation of 94.40621% resulted from differences within populations, while the remaining variation came from “among groups” and “among populations within groups,” accounting for 2.13143% and 3.46236% of the total variance, respectively (Table 3). The pairwise fixation index (Fst) values between specified regions were estimated to measure the population differentiation. Except for Fst values between several geographical populations (such as ZJ and AH, ZJ and CQ, ZJ and GZ, SC and GZ, ZJ and HuN, SC and HuN), most were within the range of 0.05–0.15, indicating moderate genetic differentiation (Table 4; Additional file 1: Fig. S1). To infer migration events between populations, a maximum likelihood tree was built using TreeMix (Fig. 8). Consistently, the tree topology indicated that all of the geographical populations clustered into three main branches. A large number of migration events were identified among these S. mansoni populations. One migration that was detected began from the GZ population to the SC population with a strong migratory signal (migration weight: 0.6). In addition, four migrations with weak signals were detected, including one event that began from SC to JX (migration weight: 0.25), one that began from SC to HB (migration weight: 0.3), one that began from GZ to YN (migration weight: 0.5) and one that began from GZ to JS (migration weight: 0.45).
Core SSR set identification and core samples analysis
At the end of the study, we intended to explore a core SSR set that will be more suitable and precise for use in the identification of Spirometra isolates through further screening of all identified 34 polymorphic SSRs. As a result, a set of 23 core SSRs that could distinguish 91.1% of 362 S. mansoni varieties was identified (Fig. 9; Additional file 1: Table S4). Consistent with the above finding, structure analysis based on 23 core SSRs classified the 362 isolates into two populations (Additional file 1: Fig. S2). The PCoA analysis also distinguished the two populations, with PCoA1 explaining 15.37% and PCoA2 explaining 13.39%, respectively (Additional file 1: Fig. S3). The AMOVA analysis showed that the maximum variation mainly came from within populations. (Additional file 1: Table S5). The pairwise Fst between Pop1 and Pop2 was 0.052. Therefore, the set of 23 core SSRs was sufficient in representing the genetic diversity and identifying Chinese S. mansoni varieties.
In addition, considering the huge geographical areas of China, it is hard to sample enough isolates to represent the whole genetic background of Chinese S. mansoni populations. Therefore, we want to screen core sample sets in each geographical population based on our collected isolates so that these core samples can represent core or backbone varieties of each geographical population with minimum number of samples. Finally, a total of 76 strains were selected as the core samples of Chinese S. mansoni under the sampling ratio of 0.2 and MR of 0.43 (Additional file 1: Table S6).
Simple sequence repeats exist extensively throughout eukaryotic genomes and are widely used for estimating genetic diversity and spatial structure of populations . Traditional microsatellite analysis based on gel electrophoresis usually cannot distinguish base differences or changes correctly in SSR amplicons . In addition, the experimental process of the traditional SSR genotyping method is extremely complicated and time-consuming. Therefore, the research community needs the development of novel methods for SSR discovery and genotyping that require less effort while delivering high accuracy and efficiency. To our knowlegde, our study is the first to use Target SSR-seq in genetic research on medical tapeworm populations. In detail, most of the selected Spirometra samples (85.4%) had a > 1000× average sequencing depth, indicating high sequencing depth in this population genetics study. Only six of the 368 samples failed to be genotyped due to the quality of the samples, suggesting high genotyping efficiency. Additionally, the number of PCR cycles in this study was 11, while traditional genotyping methods usually reach up as high as 40 cycles; taking into account that the higher the number of PCR cycles, the higher the possibility of incurring DNA polymerase slippage, the low number of PCR cycles in our study indicate that the PCR quality was highly reliable. Moreover, the Target SSR-seq method processed our multi-target SSR loci (34 SSRs) in a large number of samples (368 samples) in a short time, confirming that this method has the advantages of higher accuracy regarding genotyping results and shorter time consumption (Table 5). Therefore, the Target SSR-seq method succeeds in providing high-throughput SSR genotyping with high accuracy and efficiency. It not only achieved accurate genotyping of S. mansoni populations in the present study, but it also can be widely extended to genetic diversity analyses of other parasites.
We explored genetic differences among S. mansoni isolates from different geographical populations as well as from different host populations. Clustering analysis supported three genetic groups in the 16 geographical populations; one group mainly included populations from central and eastern China, the second group mainly contained populations from southern and southwestern China and the last group only consisted of the FJ population. When considering the detailed genetic diversity of each isolate, although three main clusters were revealed in the phylogenetic tree, no obvious geographical clusters were found, indicating that mixed memberships exist in these geographical populations. In fact, as for most parasitic helminths, the geographical distance is rarely the only determinant of the degree of genetic similarity or difference between populations, and the migration of hosts must also be considered . Both the NJ clustering and PCoA analysis showed that isolates from intermediate hosts (snakes and frogs) tended to cluster together, probably because certain snakes and frogs have similar habitats with overlapping scopes of activities, thus resulting in less geographical isolation and genetic difference of their parasitizing tapeworms . In contrast, parasites isolated from definitive hosts (cats and tigers) have fewer opportunities for gene exchange, thus leading to greater genetic difference. The influence of hosts on parasite genetic polymorphism is likewise limited by other conditions, such as habitat, activity range and whether the migration time of definitive hosts coincides with the period of infection . Therefore, still more numerous factors need to be considered to infer the actual causes of gene exchange and genetic polymorphism of S. mansoni.
STRUCTURE analysis divided all S. mansoni isolates into two main groups (Pop1 and Pop2), which is consistent with previous studies using multi-gene markers (cytb, cox1, rrnS, and 28S rDNA D1) and traditional SSRs [13, 14]. However, the STRUCTURE analysis did not divide the 16 populations according to their geographical origin; in contrast, each population showed a mixed membership to the inferred clusters. With the exception of isolates from five geographical populations (AH, CQ, HB, JS and SH) that revealed one prominent genotype, the remaining isolates showed two main genotypes. In summary: (i) there are two genotypes in the Chinese S. mansoni population, and both genotypes have sympatric distribution in most of geographical populations, with genotype 1 being a prevalent genotype; and (ii) there are two sub-populations: Group 1 and Group 2 are verified, with Group 1 mainly containing isolates from east and central China, and Group 2 containing isolates from south and southwest China.
The Fst value was used to measure genetic differentiation among geographical populations . Generally, an Fst value within the range of 0–0.05 indicates little genetic differentiation; 0.05–0.15, moderate differentiation; 0.15–0.25, large differentiation; and > 0.25, very large genetic differentiation . The pairwise Fst between Pop1 and Pop2 was estimated at 0.066, indicating moderate genetic differentiation between the two sub-populations. When considering the 16 geographical populations, except for Fst values between ZJ and AH, ZJ and CQ, ZJ and GZ, SC and GZ, ZJ and HuN, and SC and HuN, respectively, most of the Fst values were fell within the range of 0.05–0.15, also indicating moderate genetic differentiation. In addition, we observed higher pairwise Fst values between HN and the other 15 geographical populations, possibly due to the isolation of HN island from the mainland, leading to difficulties in gene exchange. Consistent with the results of estimated Fst, the AMOVA analysis estimated moderate genetic variation of S. mansoni in China. Generally, genetic variation among populations is highly affected by genetic drift, gene flow, mutations, selection and long-term evolution . Of the 16 geographical populations identified in this study, five migration events were detected in the TreeMix analysis, namely migration began from GZ to SC, SC to JX, SC to HB, GZ to YN and GZ to JS, indicating that there was some small amount of genetic exchange among the geographical populations. Usually, the populations with frequent gene exchange have a lower degree of genetic differentiation, whereas populations with low frequent gene exchange will exhibit greater genetic differentiation . Gene exchange between isolates of S. mansoni is not very frequent, which may also account for the moderate genetic differentiation between populations.
Last but not least, a core set of 23 SSRs were screened further, and a core set of S. mansoni samples in each geographical population was successfully generated using the core set of 23 SSRs. Finally, a total of 76 isolates were collected thatcan represent 91.1% genetic diversity of all 362 samples. The establishment of core SSRs and core samples is essential for the optimal management and use of genetic resources of S. mansoni, and they provide a strong foundation for establishing core SSRs and core samples for the Spirometra species.
In this study, we successfully established a Target SSR-seq method containing 34 perfect SSR loci for genotyping S. mansoni isolates. The Target SSR-seq method provides high-throughput SSR genotyping with high accuracy and efficiency. It enables accurate genotyping of S. mansoni populations and can be widely extended to genetic diversity analyses of other parasites. The results of our comprehensive genetic diversity study based on this newly established method suggest that S. mansoni samples in China have high genetic diversity and moderate genetic differentiation, and that little genetic exchange exists among geographical populations. In future studies, more populations of Spirometra isolates across worldwide geographical ranges are needed to reveal the exact genetic patterns of this medical tapeworm and discover the underlying mechanisms generating such genetic variation patterns.
Availability of data and materials
The raw data reported in this study have been deposited in the Genome Sequence Archive1 under accession numbers CRA006630. The data supporting the conclusions of this article are included within the article.
Analysis of molecular variance
- cox1 :
Cytochrome c oxidase subunit 1
Principal coordinate analysis
Simple sequence repeats
Wen H, Vuitton L, Tuxun T, Li J, Vuitton DA, Zhang W, et al. Echinococcosis: advances in the 21st century. Clin Microbiol Rev. 2019;32:e00075–118.
Scholz T, Kuchta R, Brabec J. Broad tapeworms (Diphyllobothriidae), parasites of wildlife and humans: recent progress and future challenges. Int J Parasitol Parasites Wildl. 2019;9:359–69.
Kuchta R, Kołodziej-Sobocińska M, Brabec J, Młocicki D, Sałamatin R, Scholz T. Sparganosis (Spirometra) in Europe in the molecular era. Clin Infect Dis. 2021;72:882–90.
Li MW, Lin HY, Xie WT, Gao MJ, Huang ZW, Wu JP, et al. Enzootic sparganosis in Guangdong, People’s Republic of China. Emerg Infect Dis. 2009;15:1317–8.
Liu Q, Li MW, Wang ZD, Zhao GH, Zhu XQ. Human sparganosis, a neglected food borne zoonosis. Lancet Infect Dis. 2015;15:1226–35.
Cui J, Wang Y, Zhang X, Lin XM, Zhang HW, Wang ZQ, et al. A neglected risk for sparganosis: eating live tadpoles in central China. Infect Dis Poverty. 2017;6:8.
Cui J, Lin XM, Zhang HW, Xu BL, Wang ZQ. Sparganosis, Henan Province, Central China. Emerg Infect Dis. 2011;17:146–7.
Hong Q, Feng J, Liu H, Li X, Gong L, Yang Z, et al. Prevalence of Spirometra mansoni in dogs, cats, and frogs and its medical relevance in Guangzhou. China Int J Infect Dis. 2016;53:41–5.
Jeon HK, Eom KS. Mitochondrial DNA sequence variability of Spirometra species in Asian countries. Korean J Parasitol. 2019;57:481–7.
Yamasaki H, Sanpool O, Rodpai R, Sadaow L, Laummaunwai P, Un M, et al. Spirometra species from Asia: genetic diversity and taxonomic challenges. Parasitol Int. 2021;80:102181.
Kołodziej-Sobocińska M, Stojak J, Kondzior E, Ruczyńska I, Wojcik JM. Genetic diversity of two mitochondrial DNA genes in Spirometra erinaceieuropaei (Cestoda: Diphyllobothridae) from Poland. J Zool Syst Evol Res. 2019;57:764–77.
Zhang X, Cui J, Liu LN, Jiang P, Wang H, Qi X, et al. Genetic structure analysis of Spirometra erinaceieuropaei isolates from central and southern China. PLoS ONE. 2015;10:e0119295.
Zhang X, Wang H, Cui J, Jiang P, Lin ML, Zhang YL, et al. The phylogenetic diversity of Spirometra erinaceieuropaei isolates from southwest China revealed by multi genes. Acta Trop. 2016;156:108–14.
Zhang X, Hong X, Duan JY, Han LL, Hong ZY, Jiang P, et al. Development of EST-derived microsatellite markers to investigate the population structure of sparganum—the causative agent of zoonotic sparganosis. Parasitology. 2019;146:947–55.
Liu W, Tan L, Huang Y, Li WC, Liu YS, Yang LC. Prevalence and molecular characterization of Spirometra erinaceieuropaei spargana in snakes in Hunan Province. China J Helminthol. 2020;94:e131.
Gong T, Su X, Li F, He J, Chen S, Li W, et al. Epidemiology and genetic diversity of Spirometra tapeworm isolates from snakes in Hunan Province, China. Animals. 2022;12:1216.
Liu SX, Hou W, Zhang XY, Peng CJ, Yue BS, Fan ZX, et al. Identification and characterization of short tandem repeats in the tibetan macaque genome based on resequencing data. Zool Res. 2018;39:291–300.
Králová-Hromadová I, Minárik G, Bazsalovicsová E, Mikulíček P, Oravcová A, Pálková L, et al. Development of microsatellite markers in Caryophyllaeus laticeps (Cestoda: Caryophyllidea), monozoic fish tapeworm, using next-generation sequencing approach. Parasitol Res. 2015;114:721–6.
Brabec J, Scholz T, Štefka J. Development of polymorphic microsatellites for the invasive Asian fish tapeworm Schyzocotyle acheilognathi. Parasitol Int. 2018;67:341–3.
Bazsalovicsová E, Minárik G, Šoltys K, Radačovská A, Kuhn JA, Karlsbakk E, et al. Development of 14 microsatellite markers for zoonotic tapeworm Dibothriocephalus dendriticus (Cestoda: Diphyllobothriidea). Genes. 2020;11:782.
Umhang G, Grenouillet F, Bastid V, M’Rad S, Valot B, Oudni-M’Rad M, et al. Investigating the genetic diversity of Echinococcus granulosus sensu stricto with new microsatellites. Parasitol Res. 2018;117:2743–55.
Pajuelo MJ, Eguiluz M, Roncal E, Quiñones-García S, Clipman SJ, Calcina J, et al. Genetic variability of Taenia solium cysticerci recovered from experimentally infected pigs and from naturally infected pigs using microsatellite markers. PLoS Negl Trop Dis. 2017;11:e0006087.
Li L, Fang Z, Zhou J, Chen H, Hu Z, Gao L, et al. An accurate and efficient method for large-scale SSR genotyping and applications. Nucleic Acids Res. 2017;45:e88.
Ding S, Wang S, He K, Jiang M, Li F. Large-scale analysis reveals that the genome features of simple sequence repeats are generally conserved at the family level in insects. BMC Genomics. 2017;18:848.
Lu Q, Hong Y, Li S, Liu H, Li H, Zhang J, et al. Genome-wide identification of microsatellite markers from cultivated peanut (Arachis hypogaea L.). BMC Genomics. 2019;20:799.
Maduna SN, Vivian-Smith A, Jónsdóttir ÓDB, Imsland AKD, Klütsch CFC, Nyman T, et al. Genome- and transcriptome-derived microsatellite loci in lumpfish Cyclopterus lumpus: molecular tools for aquaculture, conservation and fisheries management. Sci Rep. 2020;10:59.
Yang J, Zhang J, Han R, Zhang F, Mao A, Luo J, et al. Target SSR-seq: a novel SSR genotyping technology associate with perfect SSRs in genetic analysis of cucumber varieties. Front Plant Sci. 2019;10:531.
Campbell NR, Harmon SA, Narum SR. Genotyping-in-thousands by sequencing (GT-seq): a cost effective SNP genotyping method based on custom amplicon sequencing. Mol Ecol Resour. 2015;15:855–67.
Yanagida T, Matsuoka H, Kanai T, Nakao M, Ito A. Anomalous segmentation of Diphyllobothrium nihonkaiense. Parasitol Int. 2010;59:268–70.
Arrabal JP, Pérez MG, Arce LF, Kamenetzky L. First identification and molecular phylogeny of Sparganum proliferum from endangered felid (Panthera onca) and other wild definitive hosts in one of the regions with highest worldwide biodiversity. Int J Parasitol Parasites Wildl. 2020;13:142–9.
Liu SN, Su XY, Chen WQ, Yu JW, Li JR, Jiang P, et al. Transcriptome profiling of plerocercoid and adult developmental stages of the neglected medical tapeworm Spirometra erinaceieuropaei. Acta Trop. 2022;232:106483.
Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33:2583–5.
Yeh FC, Yang RC, Boyle T. Popgene version 1.31: Microsoft Window-based freeware for population genetic analysis. Edmonton: University of Alberta. 1999.
Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980;32:314–31.
Hubisz MJ, Falush D, Stephens M, Pritchard JK. Inferring weak population structure with the assistance of sample group information. Mol Ecol Resour. 2009;9:1322–32.
Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005;14:2611–20.
Bruvo R, Michiels NK, D’Souza TG, Schulenburg H. A simple method for calculation of microsatellite genotypes irrespective of ploidy level. Mol Ecol. 2004;13:2101–6.
Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27:592–3.
Peakall R, Smouse PE. GenAlEx 6.5: genetic analysis in excel. population genetic software for teaching and research—an update. Bioinformatic. 2012;28:2537–9.
Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967.
Thachuk C, Crossa J, Franco J, Dreisigacker S, Warburton M, Davenport GF. Core hunter: an algorithm for sampling genetic resources based on multiple genetic measures. BMC Bioinformatics. 2009;10:243.
Carrel M, Patel J, Taylor SM, Janko M, Mwandagalirwa MK, Tshefu AK, et al. The geography of malaria genetics in the democratic republic of Congo: a complex and fragmented landscape. Soc Sci Med. 2015;133:233–41.
Gao D, Perry G. Species-area relationships and additive partitioning of diversity of native and nonnative herpetofauna of the West Indies. Ecol Evol. 2016;6:7742–62.
Nyman T, Papadopoulou E, Ylinen E, Wutke S, Michell CT, Sromek L, et al. DNA barcoding reveals different cestode helminth species in northern European marine and freshwater ringed seals. Int J Parasitol Parasites Wildl. 2021;15:255–61.
Peng LP, Cai CF, Zhong Y, Xu XX, Xian HL, Cheng FY, et al. Genetic analyses reveal independent domestication origins of the emerging oil crop Paeonia ostii, a tree peony with a long-term cultivation history. Sci Rep. 2017;7:5340.
Balloux F, Lugon-Moulin N. The estimation of population differentiation with microsatellite markers. Mol Ecol. 2002;11:155–65.
Schield DR, Scordato ESC, Smith CCR, Carter JK, Cherkaoui SI, Gombobaatar S, et al. Sex-linked genetic diversity and differentiation in a globally distributed avian species complex. Mol Ecol. 2021;30:2313–32.
Wang M, Du W, Tang R, Liu Y, Zou X, Yuan D, et al. Genomic history and forensic characteristics of Sherpa highlanders on the tibetan plateau inferred from high-resolution InDel panel and genome-wide SNPs. Genet. 2022;56:102633.
We thank all students who participated in this project and collected valuable specimens used in this study.
This work was supported by National Natural Science Foundation of China (81971956 and U1704189) and Natural Science Foundation of Henan Province of China (212300410070).
Ethics approval and consent to participate
This study was approved by the Life Science Ethics Committee of Zhengzhou University (No. 2020–0704).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
: Figure S1. Estimates of pairwise Fst between S. mansoni populations. Figure S2. Estimated genetic structure of S. mansoni in China as inferred by the STRUCTURE software on the basis of the data on a core set of 23 SSRs. Figure S3. Principal coordinate analysis (PCoA) describing the relationships of 362 S. mansoni isolates on the basis of the data on a core set of 23 SSRs. Table S1. Information of origin, locality and date of collection for 368 S. mansoni isolates. Table S2. Primers of cytochrome c oxidase subunit 1 (cox1) gene. Table S3. SSR mining in S. mansoni. Table S4. Characteristics of a core set of 23 SSR markers. Table S5. Analysis of molecular variance (AMOVA) of the populations of S. mansoni based on a core set of 23 SSRs. Table S6. Core samples of S. mansoni identified in each geographical population of China.
About this article
Cite this article
Xu, F.F., Chen, W.Q., Liu, W. et al. Genetic structure of Spirometra mansoni (Cestoda: Diphyllobothriidae) populations in China revealed by a Target SSR-seq method. Parasites Vectors 15, 485 (2022). https://doi.org/10.1186/s13071-022-05568-1