The models we implemented identified E. multilocularis infection with high accuracy at both the tile level and the slide level. The slide-level classification reached an AUC of 1.0 in the test dataset. The number of tiles classified as echinococcus positive varied greatly among the test set slides, and a few outlier slides had lower accuracy. Inspecting the predictions for these slides did not reveal any reason for the lower performance. The control slides with normal liver had far fewer echinococcus tiles when we applied the high tile-level probability threshold (mainly < 10 per WSI, in contrast to the several hundreds or thousands of echinococcus tiles in the echinococcus WSIs). Although we made no attempt to identify a cut-off value, this large difference in the number of echinococcus tiles between control and echinococcus slides is a promising finding.
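The slide-level reasoning above, counting high-probability tiles per WSI and checking how well that count separates echinococcus slides from controls, can be sketched as follows. This is a minimal illustration and not the study's code: the threshold of 0.9, the slide names and all tile probabilities are made-up assumptions, and the AUC is computed with a simple pairwise (Mann-Whitney) formulation.

```python
from typing import Dict, List


def count_positive_tiles(tile_probs: List[float], threshold: float = 0.9) -> int:
    """Count tiles whose echinococcus probability is at or above the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return sum(1 for p in tile_probs if p >= threshold)


def auc_from_scores(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) slide pairs ranked correctly.

    Ties between a positive and a negative slide count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up tile probabilities per slide (illustration only, not study data).
slides: Dict[str, List[float]] = {
    "ae_1": [0.95] * 300 + [0.20] * 100,
    "ae_2": [0.99] * 1200,
    "ctrl_1": [0.92] * 4 + [0.10] * 500,
    "ctrl_2": [0.05] * 400,
}
labels = {"ae_1": 1, "ae_2": 1, "ctrl_1": 0, "ctrl_2": 0}

# Slide-level score = number of high-probability tiles on that slide.
counts = {name: count_positive_tiles(probs) for name, probs in slides.items()}
names = sorted(slides)
slide_auc = auc_from_scores([counts[n] for n in names], [labels[n] for n in names])
print(counts, slide_auc)
```

In this toy setting every echinococcus slide has more high-probability tiles than every control slide, so any cut-off between the two groups would separate them. Choosing such a cut-off for real data would require a dedicated analysis.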
The applied models were classifiers, not segmentation models. However, the generated slide-level heatmaps can also be used to segment echinococcus lesions, as shown with the heatmaps of the validation and test datasets. Using them to describe safety margins and resection status would be a promising avenue for future research. It should be noted, however, that the model learned to classify not just the laminated and germinative layers but also the surrounding fibrous capsule and the granulomatous tissue reaction. Thus, a low probability was sometimes also assigned to non-echinococcus structures, such as somewhat fibrotic portal fields and non-specific inflammatory alterations (Fig. 4).
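To illustrate how tile-level classifier outputs can serve as a coarse segmentation, the sketch below places tile probabilities on a slide grid and thresholds the resulting heatmap into a binary lesion mask. The grid layout, the `(row, col, probability)` tuple format, the function names and the 0.5 threshold are all assumptions made for this example. The source does not describe how its heatmaps were generated.

```python
from typing import List, Tuple

# Hypothetical format: (grid row, grid column, echinococcus probability).
TilePrediction = Tuple[int, int, float]


def build_heatmap(
    tile_preds: List[TilePrediction], n_rows: int, n_cols: int
) -> List[List[float]]:
    """Place tile probabilities on a grid; positions without a tile stay at 0.0."""
    heatmap = [[0.0] * n_cols for _ in range(n_rows)]
    for row, col, prob in tile_preds:
        heatmap[row][col] = prob
    return heatmap


def lesion_mask(heatmap: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Binary mask: 1 where the tile probability reaches the threshold, else 0."""
    return [[1 if prob >= threshold else 0 for prob in row] for row in heatmap]


# Made-up predictions for a 3 x 2 tile grid (illustration only).
tile_preds = [(0, 0, 0.05), (0, 1, 0.30), (1, 0, 0.97), (1, 1, 0.88), (2, 1, 0.10)]
heatmap = build_heatmap(tile_preds, n_rows=3, n_cols=2)
mask = lesion_mask(heatmap, threshold=0.5)
print(mask)
```

Because each mask cell covers a whole tile, the spatial precision of such a segmentation is limited by the tile size. Low-probability responses in fibrotic or inflamed tissue fall below the threshold here, but only because of the cut-off chosen for the example.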
Among the models, SqueezeNet [13] was the most time-efficient to train (Table 1). Since no relevant differences in model performance were observed on the test set, we would favor the SqueezeNet architecture over the other models. It could also be trained efficiently on CPU-only machines, which are probably more widely available than machines with graphics processing units (GPUs). This could be an important point, since parasitic diseases are generally more frequent in developing countries, where such resources are probably scarcer.
An important difference from neoplastic diseases is that echinococcus structures are usually much larger than human cells, so using a lower resolution and a larger tile size is reasonable. This could decrease the amount of data to store and pre-process, which in turn reduces the training time. We believe that the configuration of the major parasitic structures and the extent of the inflammatory response are probably more important than the chromatin morphology of individual cells. This could also be confirmed to some degree by the GradCAM method [15].
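The effect of resolution and tile size on data volume can be made concrete with a small calculation. The sketch below counts how many tiles are needed to cover a slide. The slide dimensions, tile sizes and downsampling factors are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def tile_count(width_px: int, height_px: int, tile_px: int, downsample: int = 1) -> int:
    """Number of tiles needed to cover a slide.

    width_px and height_px are the slide dimensions at full resolution.
    Each tile of tile_px x tile_px pixels, read at the given downsampling
    factor, covers (tile_px * downsample) full-resolution pixels per side.
    """
    footprint = tile_px * downsample
    return math.ceil(width_px / footprint) * math.ceil(height_px / footprint)


# Hypothetical 100,000 x 80,000 pixel slide (illustration only).
fine = tile_count(100_000, 80_000, tile_px=256, downsample=1)
coarse = tile_count(100_000, 80_000, tile_px=512, downsample=4)
print(fine, coarse, round(fine / coarse, 1))
```

The number of tiles falls with the square of the tile footprint, so even a moderate reduction in resolution combined with a larger tile shrinks the set of tiles to store and pre-process by more than an order of magnitude in this example.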
To our knowledge, this is the first study to evaluate DL methods for the histological identification of a tissue-invasive parasitic disease. There is an extensive literature on applying DL methods to parasite detection and classification, but it mainly concerns apicomplexan organisms. Successful recognition of Plasmodium species in red blood cells (even smartphone-based) was recently reported [17,18,19,20]. A fuzzy cycle generative adversarial network was also successfully implemented to recognize Toxoplasma gondii parasites [17]. Regarding metazoa, DL image recognition is mainly applied to parasite-egg identification in stool samples [21, 22].
The only study we are aware of that applied DL to echinococcus recognition was conducted by Wu et al. using ultrasound images [23]. The authors used architectures similar to ours (VGG19, ResNet18 and Inception-v3) and achieved a relatively high accuracy of 68.2–96% in classifying different types of ultrasound appearance of cystic echinococcosis. These values are quite similar to our results, and the approach offers a promising avenue for further investigation.
DL methods have a wide range of applications in histology, ranging from classification to object detection and segmentation. The leading field for such studies is oncology, most notably the frequent cancer types, including prostate [7], breast [8] and colon carcinoma [9]. A popular direction of such studies is metastasis detection, such as the recognition of metastasis in (sentinel) lymph nodes [8, 10, 24], which is a time-consuming task in routine pathological diagnostics. While these tools are mostly used only as decision support systems to aid diagnosis, they can achieve a high accuracy comparable to that of a trained pathologist. It seems intuitive that similar methods could be applied to other diseases as well, such as AE, which also exhibits an infiltrative growth pattern and metastatic capacity. As our results showed, DL methods can achieve a high predictive performance similar to that of models trained for oncologic tasks. Given the relative rarity of this disease, pathologists are not confronted with AE on a daily basis, and the lack of experience may result in delays in diagnosis or other diagnostic errors. In cases of high pre-test probability, such as radiological and clinical suspicion of AE, a well-trained model could probably aid the diagnosis in such a setting.
The main strength of our study was the successful application of the DL pipeline of Berman et al. [11], including data pre-processing, to a tissue-invasive parasitic disease, with the addition of GradCAM to identify decision-relevant regions and structures. This also reveals a limitation: fibrotic tissue apparently also gained attention for the classification task, so other processes with excessive fibrosis could cause false positives. Furthermore, this could cause problems when defining resection margins, since a positive prediction does not necessarily indicate vital parasitic structures.
Given the relatively low number of patients, we used multiple slides per patient to provide a reasonably high data volume for training data-hungry architectures like VGG19. Slides from the same patient may be more similar histologically to each other than to slides from other patients. This could bias the results towards an overestimation of model performance.
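One common way to limit this kind of bias is to split the data by patient rather than by slide, so that no patient contributes slides to both the training and the test set. The sketch below illustrates the idea. It is not a description of how the split was made in this study, and the function name, slide identifiers, patient identifiers and test fraction are all hypothetical.

```python
import random
from typing import Dict, List, Tuple


def patient_level_split(
    slide_to_patient: Dict[str, str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split slides so that all slides of a patient land in the same subset."""
    patients = sorted(set(slide_to_patient.values()))
    if len(patients) < 2:
        raise ValueError("need at least two patients to form a split")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patients)
    # Hold out at least one patient, but always leave at least one for training.
    n_test = min(len(patients) - 1, max(1, round(len(patients) * test_fraction)))
    test_patients = set(patients[:n_test])
    train = sorted(s for s, p in slide_to_patient.items() if p not in test_patients)
    test = sorted(s for s, p in slide_to_patient.items() if p in test_patients)
    return train, test


# Made-up mapping of slides to patients (illustration only).
slide_to_patient = {
    "s01": "p1", "s02": "p1", "s03": "p1",
    "s04": "p2", "s05": "p2",
    "s06": "p3", "s07": "p3", "s08": "p3",
    "s09": "p4", "s10": "p4",
    "s11": "p5", "s12": "p5",
}
train_slides, test_slides = patient_level_split(slide_to_patient, test_fraction=0.2)
print(train_slides, test_slides)
```

With a split of this kind, the test performance reflects generalization to unseen patients rather than to unseen slides of already known patients.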
Our results should be further validated, if possible, with external data. A promising avenue for further research would be to include E. granulosus cases as well and train the classifier to separate them from cases of E. multilocularis. This can sometimes be a very difficult histological task, with no single reliable morphologic parameter in light microscopy slides. A previous multivariate analysis identified several discriminating factors, such as the thickness and striation of the laminated layer and the number and size of cysts [25]. Attention maps like GradCAM or other methods of identifying morphological markers could provide valuable input here. Immunohistochemistry can also separate the two species; however, the required antibodies are only available in highly specialized laboratories. Thus, a simple classification based on hematoxylin/eosin-stained slides would be desirable. Furthermore, training the applied models to classify other tissue-invasive parasitic diseases can also be listed as a future direction.