All the experimental procedures involving animals were conducted in strict accordance with the Institutional Animal Care Guidelines and approved by the Ethical Committee for Animal Experimentation of Dokkyo Medical University under number 1307.
Origin of the parasitic material
Adult worms of S. mansoni (Puerto Rican strain) and S. japonicum (Japanese Yamanashi strain) used in this study were obtained from experimentally infected BALB/c mice. The infected animals were maintained at the animal facility of the Laboratory of Tropical Medicine and Parasitology of Dokkyo Medical University. For the current investigation, a total of 62 adult Schistosoma spp. worms were collected from either the mesenteric or the portal veins of the experimental animals and washed several times with PBS. They were then morphologically identified and stored in one of two media: 70% (v/v) ethanol or RNAlater (Invitrogen, USA), an aqueous, nontoxic tissue and cell collection reagent that stabilizes and protects RNA and proteins in intact, unfrozen tissue and cell samples. Sample identification was further confirmed by DNA sequencing (see the “Molecular identification of Schistosoma spp. samples” section). Forty samples were identified as S. mansoni and the remaining 22 as S. japonicum. Of the 40 S. mansoni isolates, 19 (seven mixed males/females, six males, six females) were placed in 70% (v/v) ethanol and 21 (11 mixed males/females, five males, five females) in RNAlater. Of the 22 S. japonicum isolates, 11 (five mixed males/females, three males, three females) were stored in 70% (v/v) ethanol and 11 (five mixed males/females, three males, three females) in RNAlater. All samples were stored at −40 °C before being transferred to the Institute of Medical Microbiology and Hygiene (Homburg, Germany), where they were kept at −20 °C pending further examination.
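The storage allocation above can be cross-checked with a few lines of Python. The dictionary simply transcribes the counts reported in the text; the data structure and names are illustrative only and are not part of the original workflow.

```python
# Counts transcribed from the text: (mixed males/females, males, females)
ALLOCATION = {
    ("S. mansoni", "70% ethanol"): (7, 6, 6),
    ("S. mansoni", "RNAlater"): (11, 5, 5),
    ("S. japonicum", "70% ethanol"): (5, 3, 3),
    ("S. japonicum", "RNAlater"): (5, 3, 3),
}


def totals_by_species(allocation):
    """Sum all worms per species across both storage media."""
    totals = {}
    for (species, _medium), counts in allocation.items():
        totals[species] = totals.get(species, 0) + sum(counts)
    return totals


species_totals = totals_by_species(ALLOCATION)
grand_total = sum(species_totals.values())
print(species_totals, grand_total)
```

The per-species totals (40 and 22) and the grand total (62) agree with the numbers stated in the text.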
Molecular identification of Schistosoma spp. samples
For molecular confirmation of the individual Schistosoma species obtained from the mice, genomic DNA of two adult worms from each experimental infection was extracted using a commercially available DNA extraction kit (DNeasy Blood & Tissue Kit; QIAGEN, USA) according to the manufacturer’s instructions. Genomic DNA from morphologically identified S. mansoni or S. japonicum was used to amplify the cytochrome c oxidase subunit 1 (COX1) gene by PCR, using forward primers specific for S. mansoni (TCCTTTATCAATTTGAGAGG) and S. japonicum (CCGTTTTTTTTGAGTATGAG), each paired with the reverse primer CR (CCAACCATAAACATATGATG), with expected amplicon lengths of 479 and 614 base pairs, respectively. The reactions were carried out in a final volume of 50 μL using KOD One PCR Master Mix (Toyobo, Japan) with 10 μmol each of the forward and reverse primers and 1.0 μL (approximately 10 ng/μL) of genomic DNA. The cycling conditions consisted of an initial 2-min denaturation step at 94 °C, followed by 35 cycles of denaturation at 98 °C for 10 s, annealing at 58 °C for 30 s, and extension at 68 °C for 30 s, and a final extension at 72 °C for 7 min. PCR products were visualized on a 2% agarose gel stained with 1% ethidium bromide in Tris–borate–ethylenediaminetetraacetic acid buffer. The PCR products were purified using a commercial DNA purification kit (QIAquick Gel Extraction Kit; QIAGEN, Hilden, Germany) following the manufacturer’s protocol and sequenced on a 3130xl Genetic Analyzer (Applied Biosystems, USA). Sequences were assembled using Molecular Evolutionary Genetics Analysis (MEGA) version 10, and a Nucleotide Basic Local Alignment Search Tool (BLASTn) search (https://blast.ncbi.nlm.nih.gov) was performed to confirm the identity of the generated consensus sequences. A phylogenetic tree based on COX1 gene sequences was constructed using the maximum likelihood method and the Hasegawa–Kishino–Yano model, with 1000 bootstrap replicates. The S. haematobium sequence used as the outgroup (accession ID: ON237718), as well as other COX1 sequences of S. mansoni (accession IDs: MK171834, MF919418, and MG562513) and S. japonicum (accession IDs: KU196387, KU196397, and KU196417), were retrieved from GenBank and included in the analysis.
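As a rough illustration of the primer sequences listed above, their length, GC content, and a Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C) can be computed directly. This is not part of the authors' workflow. The Wallace rule is a crude estimate intended mainly for short oligonucleotides and ignores the buffer chemistry of the KOD One master mix, so the values below are a coarse sanity check, not a recommendation for the annealing temperature.

```python
# Primer sequences as given in the text; the dictionary keys are illustrative labels
PRIMERS = {
    "S. mansoni forward": "TCCTTTATCAATTTGAGAGG",
    "S. japonicum forward": "CCGTTTTTTTTGAGTATGAG",
    "CR reverse": "CCAACCATAAACATATGATG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in °C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, Tm ≈ {wallace_tm(seq)} °C")
```

All three primers are 20-mers with seven G/C bases each, so they share the same GC content (35%) and the same Wallace estimate (54 °C).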
MALDI-TOF MS analysis
Adult worm samples were removed from the storage solution and dried at room temperature in a biosafety cabinet for about 5 min, to allow for the evaporation of organic solvents prior to the subsequent analyses.
A previously employed protocol was adapted for protein extraction from the adult Schistosoma samples. In brief, adult worms were manually crushed in 300 µL of liquid chromatography-mass spectrometry (LC–MS) grade water (Merck, Darmstadt, Germany). Then, 900 µL of absolute (100%) ethanol (Merck, Darmstadt, Germany) was added, and the mixture was vortexed. The mixture was centrifuged at 18,312 × g for 2 min, and the supernatant was discarded. After the pellet had dried completely, it was resuspended in 20 µL of 70% (v/v) formic acid and vortexed. Finally, 20 µL of acetonitrile was added, followed by further vortexing.
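The solvent proportions produced by the extraction steps above follow from simple volume bookkeeping. The sketch below assumes ideal volume additivity (the slight contraction of ethanol–water mixtures is ignored), and the helper name `mix` is hypothetical.

```python
def mix(*components):
    """Combine (volume_ul, {solvent: volume_fraction}) components.

    Returns the total volume and the final volume fraction of each solvent,
    assuming volumes are simply additive.
    """
    total = sum(volume for volume, _ in components)
    final = {}
    for volume, composition in components:
        for solvent, fraction in composition.items():
            final[solvent] = final.get(solvent, 0.0) + volume * fraction / total
    return total, final


# Wash step: 300 µL LC-MS grade water + 900 µL absolute ethanol
wash_volume, wash = mix((300, {"water": 1.0}), (900, {"ethanol": 1.0}))

# Extraction step: 20 µL 70% (v/v) formic acid + 20 µL acetonitrile
extract_volume, extract = mix(
    (20, {"formic acid": 0.70, "water": 0.30}),
    (20, {"acetonitrile": 1.0}),
)

print(wash_volume, wash)
print(extract_volume, extract)
```

Under these assumptions, the worms are washed in about 75% (v/v) ethanol, and the final 40 µL extract contains roughly 35% formic acid and 50% acetonitrile.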
MALDI target plate preparation and measurements
The protein extracts in formic acid and acetonitrile (described above) were centrifuged at 18,312 × g for 2 min. One microliter of the clear supernatant was spotted onto the MALDI-TOF MS target plate (Bruker Daltonics, Bremen, Germany) and allowed to dry completely before being overlaid with 1 µL of α-cyano-4-hydroxycinnamic acid matrix solution (Bruker Daltonics) composed of saturated α-cyano-4-hydroxycinnamic acid in 50% (v/v) acetonitrile, 2.5% (v/v) trifluoroacetic acid, and 47.5% (v/v) LC–MS grade water. The protein extract of each sample was spotted onto eight different spots on the target plate, and each spot was measured four times to ensure reproducibility. Hence, a total of 32 raw spectra per sample were generated using FlexControl® software version 3.4 (Bruker Daltonics). The bacterial test standard (Bruker Daltonics), an extract of Escherichia coli spiked with two high-molecular-weight proteins, was used to calibrate the mass spectrometer. After drying at room temperature, the MALDI target plate was placed into a Microflex LT mass spectrometer (Bruker Daltonics) for the measurements.
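The replicate scheme above (eight spots per sample, four measurements per spot) can be enumerated explicitly. The sketch assumes, for illustration only, that every one of the 62 samples received the full scheme; the text does not state how many raw spectra were actually acquired, so the total is an upper bound, not a reported figure.

```python
from itertools import product

SPOTS_PER_SAMPLE = 8
MEASUREMENTS_PER_SPOT = 4
N_SAMPLES = 62  # ASSUMPTION: all samples measured with the full replicate scheme

# Each raw spectrum is identified by a (spot, measurement) pair within a sample
replicates = list(
    product(range(1, SPOTS_PER_SAMPLE + 1), range(1, MEASUREMENTS_PER_SPOT + 1))
)

spectra_per_sample = len(replicates)
max_raw_spectra = spectra_per_sample * N_SAMPLES
print(spectra_per_sample, max_raw_spectra)
```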
MALDI-TOF MS parameters
Measurements were performed using the AutoXecute algorithm implemented in FlexControl software version 3.4. For each spot, a total of 240 laser shots (40 shots at each of six random positions) were fired automatically to generate protein mass profiles in linear positive ion mode, with a laser frequency of 60 Hz, a voltage of 20 kV, and a pulsed ion extraction delay of 180 ns. Spectra were acquired over a mass range of 2 to 20 kDa (m/z 2,000–20,000 for singly charged ions).
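The acquisition parameters above imply a few simple quantities. The sketch computes the number of shots per spot and the pure firing time at 60 Hz (stage movement between positions is ignored). It also uses the idealized linear time-of-flight relation, in which an ion accelerated through a fixed potential has a flight time proportional to the square root of m/z. The extraction delay and instrument geometry are not modeled, so the ratio is only indicative.

```python
import math

SHOTS_PER_POSITION = 40
POSITIONS_PER_SPOT = 6
LASER_FREQUENCY_HZ = 60

shots_per_spot = SHOTS_PER_POSITION * POSITIONS_PER_SPOT
# Pure firing time; time spent moving between positions is not included
firing_time_s = shots_per_spot / LASER_FREQUENCY_HZ


def relative_flight_time(mz: float, mz_ref: float) -> float:
    """Idealized linear TOF: flight time scales with sqrt(m/z) at fixed acceleration voltage."""
    return math.sqrt(mz / mz_ref)


print(shots_per_spot, firing_time_s)
# Ions at the top of the range arrive about sqrt(10) times later than those at the bottom
print(relative_flight_time(20000, 2000))
```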
Spectra inspection and creation of reference spectra
Raw spectra were visualized using FlexAnalysis software version 3.4 (Bruker Daltonics). The spectra were edited: all flatline spectra and outlier peaks were removed, intensities were smoothed, and the tolerance for peak shifts within replicate spectra was set at 300 p.p.m. After this editing step, spectra of four mixed (male/female) adult worm samples (two S. mansoni, two S. japonicum) from both storage media (RNAlater and ethanol), each retaining at least 27 spectra, were randomly selected for the creation of reference spectra (main spectra profiles; MSPs). These MSPs were created using the automatic function of MALDI Biotyper Compass Explorer® software version 3 (Bruker Daltonics). The newly created MSPs of both Schistosoma species were added to a previously developed in-house MALDI-TOF MS database for helminth identification, which already contained MSPs from other helminths, such as cestodes (e.g., Taenia saginata) and trematodes (e.g., Fasciola spp.) [10, 17].
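A relative tolerance such as the 300 p.p.m. peak-shift setting above corresponds to an absolute window that grows with mass: about 0.6 Da at m/z 2,000 and 6 Da at m/z 20,000. The sketch below converts p.p.m. to Da and checks whether two replicate peaks fall within that window. Using the mean of the two peaks as the reference mass is an assumption made for illustration, not necessarily how FlexAnalysis applies the setting.

```python
def ppm_window(mz: float, ppm: float = 300.0) -> float:
    """Absolute tolerance in Da (for z = 1) corresponding to a relative tolerance in p.p.m."""
    return mz * ppm / 1e6


def within_shift(mz_a: float, mz_b: float, ppm: float = 300.0) -> bool:
    """True if two replicate peaks differ by at most `ppm` relative to their mean m/z.

    ASSUMPTION: the mean of the two peaks is used as the reference mass.
    """
    reference = (mz_a + mz_b) / 2
    return abs(mz_a - mz_b) <= ppm_window(reference, ppm)


for mz in (2000, 10000, 20000):
    print(mz, ppm_window(mz))

print(within_shift(10000.0, 10002.5))  # 2.5 Da apart: inside the ~3 Da window
print(within_shift(10000.0, 10004.0))  # 4.0 Da apart: outside the window
```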
To check the spectra for contamination, i.e., whether any of them matched bacterial spectra, all acquired spectra were tested against the commercially available, official BDAL database released by Bruker Daltonics for the identification of bacteria and fungi. The newly expanded in-house helminth database was then subjected to two validation procedures. First, in an internal validation, all raw spectra of Schistosoma spp. used to create the MSPs were tested to verify whether they could be identified from the spectra in the database. Second, in an external validation, spectra from the 58 remaining, independent adult Schistosoma specimens were tested to assess whether they could be reliably identified using the database. For this purpose, spectra were queried against a combination of the official BDAL database (Bruker Daltonics) and our in-house helminth database. The reliability of each identification was interpreted from the log score value (LSV) generated for each result, using the scoring system recommended by the manufacturer for bacterial identification: an LSV of 1.70 was considered the threshold for reliable identification, LSVs between 1.70 and 1.99 indicated reliable identification at the genus level, and LSVs of 2.00 or higher were interpreted as indicating reliable species identification.
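The LSV interpretation rules quoted above can be written as a small lookup function, which makes the boundary handling explicit (for example, 1.995 still counts as genus level). The scores in the usage example are invented for illustration and are not results from this study.

```python
from collections import Counter


def interpret_lsv(lsv: float) -> str:
    """Map a log score value to the interpretation described in the text."""
    if lsv >= 2.00:
        return "species"
    if lsv >= 1.70:
        return "genus"
    return "not reliable"


def summarize(scores):
    """Count identification results per interpretation category."""
    return Counter(interpret_lsv(score) for score in scores)


# Hypothetical scores, for illustration only
example_scores = [2.31, 2.00, 1.995, 1.85, 1.70, 1.52]
print(summarize(example_scores))
```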
Classification and comparison analysis
A total of 1657 edited spectra were exported into the free online software Clover MS Data Analysis (https://platform.clovermsdataanalysis.com/, Clover BioSoft, Granada, Spain; last accessed May 2022) for further analysis. Default parameters were used for pre-processing. A Savitzky–Golay filter (window length, 11; polynomial order, 3) was applied to smooth the spectra, and the baseline was removed using the top-hat filter method (factor, 0.02). To obtain one average spectrum per sample for use in the classification and comparison analysis, replicate spectra were aligned using the following parameters: allowed shift, medium; constant tolerance, 0.2 Da; linear tolerance, 2000 p.p.m.
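The two pre-processing steps named above can be sketched with NumPy alone. Savitzky–Golay smoothing fits a local polynomial in each window and keeps its value at the window centre. Top-hat baseline removal subtracts a morphological opening (a running minimum followed by a running maximum) from the signal. This is not Clover's implementation. In particular, interpreting the top-hat "factor 0.02" as a structuring-element half-width equal to 2% of the number of data points, and the edge-padding strategy, are assumptions. Replicate alignment is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def savgol_smooth(y: np.ndarray, window_length: int = 11, polyorder: int = 3) -> np.ndarray:
    """Savitzky–Golay smoothing via a least-squares local polynomial fit."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the weights for the fitted value at offset 0
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(y, half, mode="edge")  # ASSUMPTION: edge values repeated at the borders
    return np.correlate(padded, weights, mode="valid")


def _running(y: np.ndarray, half: int, reducer) -> np.ndarray:
    """Apply `reducer` (np.min or np.max) over a centred window of width 2*half+1."""
    padded = np.pad(y, half, mode="edge")
    return reducer(sliding_window_view(padded, 2 * half + 1), axis=1)


def tophat_baseline_removal(y: np.ndarray, factor: float = 0.02) -> np.ndarray:
    """Subtract the morphological opening (erosion, then dilation) from the signal."""
    # ASSUMPTION: half-width of the flat structuring element = factor * number of points
    half = max(1, int(factor * len(y)))
    opened = _running(_running(y, half, np.min), half, np.max)
    return y - opened


# Synthetic spectrum (invented): decaying baseline + two Gaussian peaks + noise
rng = np.random.default_rng(0)
mz = np.linspace(2000, 20000, 2000)
baseline = 500 * np.exp(-mz / 4000)
peaks = 300 * np.exp(-0.5 * ((mz - 6000) / 30) ** 2) + 150 * np.exp(-0.5 * ((mz - 11000) / 40) ** 2)
raw = baseline + peaks + rng.normal(0, 5, mz.size)
processed = tophat_baseline_removal(savgol_smooth(raw))
```

Because the opening never exceeds the signal it is computed from, the baseline-corrected spectrum is non-negative. The Savitzky–Golay weights reproduce any polynomial up to the chosen order exactly away from the edges, which is a convenient check of the implementation.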
Classification using ML algorithms
Peak matching was performed to generate a peak matrix from the pre-processed spectra, which was used for the comparison analysis. Total ion current (TIC) normalization was applied, followed by a threshold method (factor, 0.01), in which peaks with an intensity below 1% of the maximum intensity were discarded; the constant tolerance was 0.5 Da and the linear tolerance was 500 p.p.m.
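The normalization, thresholding, and peak-matching steps described above can be illustrated with a small pure-Python sketch. It assumes that the constant and linear tolerances are added together (0.5 Da plus 500 p.p.m. of the m/z value, i.e., 3.0 Da at m/z 5,000), which the text does not specify. Each bin is anchored at its lowest-mass peak, which prevents chains of closely spaced peaks from merging into one ever-growing bin. The function names and the toy peak lists are hypothetical.

```python
from typing import Dict, List, Tuple


def tic_normalize(peaks: Dict[float, float]) -> Dict[float, float]:
    """Divide each intensity by the total ion current (sum of all intensities)."""
    tic = sum(peaks.values())
    return {mz: inten / tic for mz, inten in peaks.items()}


def apply_threshold(peaks: Dict[float, float], factor: float = 0.01) -> Dict[float, float]:
    """Discard peaks below `factor` times the most intense peak."""
    cutoff = factor * max(peaks.values())
    return {mz: inten for mz, inten in peaks.items() if inten >= cutoff}


def tolerance(mz: float, const_da: float = 0.5, linear_ppm: float = 500.0) -> float:
    # ASSUMPTION: constant and linear tolerances are additive
    return const_da + mz * linear_ppm / 1e6


def peak_matrix(samples: List[Dict[float, float]]) -> Tuple[List[float], List[List[float]]]:
    """Group peaks of all samples into shared m/z bins; return bin centres and a sample x bin matrix."""
    all_peaks = sorted(
        (mz, idx, inten) for idx, peaks in enumerate(samples) for mz, inten in peaks.items()
    )
    bins = []  # each bin is a list of (mz, sample_index, intensity)
    for peak in all_peaks:
        anchor = bins[-1][0][0] if bins else None
        if anchor is not None and peak[0] - anchor <= tolerance(anchor):
            bins[-1].append(peak)
        else:
            bins.append([peak])
    centres = [sum(p[0] for p in b) / len(b) for b in bins]
    matrix = [[0.0] * len(bins) for _ in samples]  # absent peaks stay at 0
    for col, b in enumerate(bins):
        for _mz, idx, inten in b:
            matrix[idx][col] = max(matrix[idx][col], inten)
    return centres, matrix


# Toy peak lists (invented): the 9000 peak in sample 1 falls below the 1% threshold
raw_samples = [
    {5000.0: 50.0, 7000.0: 49.8, 9000.0: 0.2},
    {5002.0: 80.0, 8000.0: 20.0},
]
processed = [apply_threshold(tic_normalize(s)) for s in raw_samples]
centres, matrix = peak_matrix(processed)
print(centres)
print(matrix)
```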
Classification analysis was carried out at two levels. First, at the interspecies level, all isolates were analyzed to distinguish S. mansoni from S. japonicum. Second, at the intraspecies level, samples of the same species were compared to assess whether they could be discriminated according to the storage solution [70% (v/v) ethanol or RNAlater]. Unsupervised methods [principal component analysis (PCA) and hierarchical clustering] and supervised machine learning (ML) algorithms were used for classification. PCA is a dimensionality reduction technique: it projects a high-dimensional dataset onto a small set of coordinates that capture most of its variance, which makes clusters, relationships among specimens, and subgroups easier to visualize, and thus provides information about the underlying structure of the dataset. Hierarchical clustering was performed using the Chebyshev distance and complete linkage. For the supervised ML methods, four algorithms widely used for MALDI-TOF mass spectra analysis were evaluated: linear support vector machine (SVM), partial least squares-discriminant analysis (PLS-DA), random forest (RF), and k-nearest neighbors (KNN). Ten-fold cross-validation (k = 10) was used for internal validation. The performance of the supervised ML algorithms was assessed using a confusion matrix (yielding accuracy, specificity, sensitivity, F1 score, positive predictive value or precision, and negative predictive value), the area under the receiver operating characteristic curve, and the area under the precision–recall curve.
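The confusion-matrix metrics and the area under the receiver operating characteristic curve listed above can be computed as follows for a two-class problem. This is a minimal sketch, not the Clover MS Data Analysis implementation. It assumes label 1 marks the positive class (for example, one of the two species), and it computes the ROC AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The labels and scores in the example are invented, and the precision–recall AUC and cross-validation loop are not reproduced.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics; label 1 is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num: float, den: float) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),        # negative predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann–Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 3 positive and 3 negative samples
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```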