Pooling as a strategy for the timely diagnosis of soil-transmitted helminths in stool: value and reproducibility

Background The strategy of pooling stool specimens has been extensively used in the field of parasitology in order to facilitate the screening of large numbers of samples whilst minimizing the prohibitive cost of single sample analysis. The aim of this study was to develop a standardized reproducible pooling protocol for stool samples, validated between two different laboratories, without jeopardizing the sensitivity of the quantitative polymerase chain reaction (qPCR) assays employed for the detection of soil-transmitted helminths (STHs). Two distinct experimental phases were recruited. First, the sensitivity and specificity of the established protocol was assessed by real-time PCR for each one of the STHs. Secondly, agreement and reproducibility of the protocol between the two different laboratories were tested. The need for multiple stool sampling to avoid false negative results was also assessed. Finally, a cost exercise was conducted which included labour cost in low- and high-wage settings, consumable cost, prevalence of a single STH species, and a simple distribution pattern of the positive samples in pools to estimate time and money savings suggested by the strategy. Results The sensitivity of the pooling method was variable among the STH species but consistent between the two laboratories. Estimates of specificity indicate a ‘pooling approach’ can yield a low frequency of ‘missed’ infections. There were no significant differences regarding the execution of the protocol and the subsequent STH detection between the two laboratories, which suggests in most cases the protocol is reproducible by adequately trained staff. Finally, given the high degree of agreement, there appears to be little or no need for multiple sampling of either individuals or pools. Conclusions Our results suggest that the pooling protocol developed herein is a robust and efficient strategy for the detection of STHs in ‘pools-of-five’. There is notable complexity of the pool preparation to ensure even distribution of helminth DNA throughout. Therefore, at a given setting, cost of labour among other logistical and epidemiological factors, is the more concerning and determining factor when choosing pooling strategies, rather than losing sensitivity and/or specificity of the molecular assay or the method.

Background Pooling of faeces [1][2][3][4][5], urine [6,7], serum [8] or disease vectors [9] have all been used as a cost-effective strategy to screen for infection present in the given substrate/ matrix. Such an approach has been shown to provide accurate results, while reducing time and labour requirements. Additionally, but perhaps more so in the veterinary world than in any clinical mass drug administration (MDA) programme, 'pooling' as a strategy may allow for a rapid estimation of drug efficacy or infection prevalence present in the herd based on microscopy results and subsequent faecal egg counts (FECs) [10][11][12][13].
As previous goals to reduce the intestinal worm burden and morbidity in school-aged children have been extended and enriched with new programmes to achieve universal coverage of at-risk populations by 2030, new monitoring methods need to be implemented. Novel, precise and robust diagnostic tools that measure prevalence reduction and detect interruption of transmission are key to enable de-implementation of MDA programmes [14,15]. Soil-transmitted helminths transmitted via the faecal-oral route (Ascaris lumbricoides, Trichuris trichiura, Necator americanus, Ancylostoma duodenale, An. ceylanicum and Strongyloides stercoralis) and/or via skin penetration (N. americanus, An. duodenale, An. ceylanicum and S. stercoralis) are amongst the neglected tropical pathogens drawing increased attention as targets for transmission interruption and possible elimination. Even though preventable, they affect almost a third of the world's population [16]. However, surveillance of ongoing MDA programmatic efforts that aim to reduce the worm burden include thousands or tens of thousands of samples to be screened and analysed for STH-related prevalence, especially in low-prevalence areas where large sample sizes are required to accurately detect changes in infection. Previous attempts to evaluate pooling as a means of scaling soil-transmitted helminth diagnosis have yielded poor results. Such studies have relied upon microscopy as the diagnostic strategy [13,17,18], which lacks the sensitivity of molecular tools, such as quantitative polymerase chain reaction (qPCR); caveats and disadvantages of this approach have been thoroughly described previously [19,20].
Such tools would ideally retain their sensitivity when samples from multiple individuals are combined, whilst minimizing the reagent cost implicated. More recent studies report additional cost granularity, including operational and logistical costs, concluding that a 'pooling approach' might not be as worthwhile as hoped [5]. These studies, however, have neither taken into account predicted pool sizes as optimal nor have they incorporated an adequately sensitive diagnostic tool; thus, such conclusions are yet to be confirmed. Modelling studies followed by experimental validations have suggested an optimal pooled sample range where pooling tends to be more cost-effective, whilst maintaining robustness and precision with minimal variation [12] but the decision whether to proceed with pooling or not will likely be based on a number of additional factors. Cost (determined by reagents, labour required, logistical and operational considerations), time (sample transportation and pool preparation) and the need for a sensitive enough diagnostic tool are not the only determinants which must be considered when deciding in favour, or in opposition, of pooling. The sample size of the study (n) and existing STH prevalence may also influence decision making [21].
Quantitative PCR has emerged as an effective molecular diagnostic tool to fill the need of heightened sensitivity compared to microscopy when infection levels drop considerably. Some of the advantages of qPCR include the theoretical ability to detect single numbers of eggs present in the faeces due to its analytical sensitivity, to distinguish between species [22,23] and to achieve accurate results rapidly. Given these factors, qPCR may be the most likely currently available method to enable STH detection in pools in low-prevalence areas, especially when prevalence is close to the breakpoint of transmission [24]. For this reason, the use of PCR as part of a viable pooling strategy should be evaluated [25].
In settings with low-intensity of infections, the majority of samples screened are expected to be negative [26]. The sensitivity of a given method might increase or decrease when pooling is recruited; increasing, when multiple 'weak' infections are combined in a single pool, so collectively the target of interest is detectable by qPCR and decreasing, when a single infected sample is 'buried' among uninfected ones, and subsequently diluted, hence undetectable by qPCR [11].
A need for 'spin-outs' (subsequent tests) after testing the pools and the identification of the STH infection at an individual level may increase the cost of the 'pooling approach' substantially if required too often. This negates any advantages of the approach. Also, the risk of contamination is higher as testing larger pools of samples extends the handling and processing period and increases the risk for contamination, leading to false positive results, thus driving the cost higher, especially when re-extractions are needed to confirm individual infections [27]. When the sensitivity of an STH assay is decreasing, a very 'weak' infection might be missed in a pool of negatives. This could reduce the cost since collectively that pool would identify as negative so no added labour (or cost) for 'spinouts' would be needed. As mentioned, any pool sizes higher than between 5 and 8 increases cost and time to prepare the pools and requires additional equipment.
Building on preliminary unpublished data gathered by members of our group, and taking into account the pool sizing predictive models, we examined the recruitment of pools of 5 as a tool for screening samples with low STH infection levels, aiming not to compromise either sensitivity or specificity of the qPCR. Additionally, the reproducibility of the protocol and agreement in two different laboratory settings was interrogated, and the necessity for multiple replicates obtained from each pool or individual samples was also evaluated. A basic cost exercise was performed through direct comparison of processing samples individually or as parts of pools. Also, without any prior knowledge regarding the distribution of the positive samples in a screened population, two scenarios were included in the cost-analysis based on different prevalence levels given; a 'best-' and a 'worst-case' scenario. Acknowledging that this analysis does not represent a mathematical cost model, we accounted simply for prevalence in a given sample population, labour time based on wages in different income settings and consumable costs based on standard list prices. Our results show that choosing whether to 'pool or not to pool' can only be determined effectively after considerable scrutiny of each of the component processes, which may be more problematic or prohibitory than loss of granular sensitivity of the diagnostic method used to detect the target of choice. Each process component should be taken into consideration before deciding in favour of pooling strategies.

Study design (phases I and II)
During phase I ('seeding' experiment) a series (n = 20) of infection-naïve stool samples purchased commercially (BioIVT; Westbury, NY, USA) were spiked with known numbers of N. americanus eggs mimicking low levels of infection as classified by the World Health Organisation (WHO) guidelines [28] and were mixed with four additional infection-naïve samples of equal volume to create pools of 5.
During phase II (field-samples experiment) of the study, aliquots from a series of field samples with known STH infection status, collected as part of an unrelated study, were mixed with four additional field samples (of equal volume) that had been tested and verified to be negative for all the five STH species of interest (see 'strategic pooling') to also create pools of five.
DNA extractions performed during phase I, and part of phase II, were conducted in different laboratories by different technicians to explore reproducibility of the developed protocol. Individual component samples were extracted alongside their pools throughout the process, and all extractions of both individual samples and pools were performed in duplicate (i.e. 1A, 1B, P1A and P1B). DNA from each pool was also extracted twice (PA 1&2 and PB 1&2 ). The sensitivity and specificity of the established protocol was evaluated by real-time PCR for each particular target helminth, and by all STH assays for the samples previously identified as negatives. Reproducibility of the protocol's performance and agreement of results between the two different laboratories were also analysed.

Phase I: 'seeding' experiment-Smith College (SC)
For use during 'seeding' experiments, performed at the Smith College (SC; Northampton, MA, USA), a suspension of hookworm eggs, utilized to spike the infectionnaïve stool, was prepared as previously described [29]. In brief, hamster stool pellets with known infection levels expressed as eggs per gram (epg) were diluted in nuclease-free water such that 178 µl contained 50 eggs for a final infection-load of 100 epg (50 eggs in 500 mg of stool) (Fig. 1). The level of hookworm infection chosen was based on preliminary experiments where medium and high hookworm infection loads (based on WHO guidelines [28]) were employed, but showed abundancy of the target and early amplification detected by qPCR [30]; a primary concern of pooling is loss of sensitivity through dilution in low infection settings, so we chose a moderately low final concentration of 100 epg to detect potential dilution effects.

Phase II: field-samples experiment-SC and Natural History Museum (NHM)
At SC, a 34-sample panel was created for use in a proofof-concept study. Thirty of these samples were positive for a single helminth (A. lumbricoides, T. trichiura, An. ceylanicum, S. stercoralis) and the remaining four were identified as negative. The volume of each sample (1.5 ml; 500 mg of stool suspended in 1 ml of ethanol) was split, homogenised and mixed with four infection-naïve stool aliquots of equal volume (Fig. 2). Another panel of 150 samples of human stool extracts, variously infected with the same species of STH (at least 500 mg of stool), was prepared at SC and was shipped to the Natural History Museum (NHM; London, UK). All samples utilized during phase II of this study were collected in Bangladesh as part of the WASH Benefits Bangladesh trial [31]. All samples were previously screened at SC via real-time PCR and the results for each individual sample were available. Amongst these samples, 130 were identified as negative for all species (N. americanus, T. trichiura, A. lumbricoides, An. duodenale, An. ceylanicum and S. stercoralis). The rest of the samples (n = 20) were identified as positive for at least one STH, with low/moderate intensity infections reported based on Kato-Katz/individual PCR data. For the generation of each positive pool, one sample identified as positive for at least one species of STH was mixed with four samples identified as negative. For the generation of negative pools, equal volumes of five negative samples were mixed (Fig. 2).

Pool formation and DNA extraction
The total volume of each sample (1.5 ml stool in suspension) was divided into two aliquots and was homogenized    Fig. 2 Schematic representation of the field-samples experiment. Previously screened faecal samples positive for one or more soil-transmitted helminths (STHs) were combined with four additional samples (of equal volume) identified as negative for all STHs to create pools of five (individual samples identified as negatives were also included in the study, as contamination controls). The DNA from every individual sample was extracted twice, each pool was formed twice and the DNA from each pool was also extracted twice. All the samples underwent qPCR for the target STH using a high-speed bead beater (Fast Prep 5G, MP Biomedicals; Santa Ana, CA, USA) with Lysing Matrix E tubes (containing silica, glass bead and ceramic particles). The homogeneous suspensions were recombined into a single tube after the first lysis. Two ~ 300 µl aliquots of the suspension were transferred into two new Lysing Matrix E tubes for individual extractions (A and B) and two additional 300 µl were transferred to separate tubes designated for use in the constitution of pools (PA and PB). The same procedure was followed for all five samples that would form a single pool. After a pool was formed, the volume was split again, and a second homogenization following the same procedure occurred (second lysis). Following the second lysis step, two aliquots (300 µl each) from the pool (PA 1&2 and PB 1&2 ) were also subjected to DNA extraction. For all pools and individual samples, the same DNA extraction protocol was followed. All extractions began with an additional bead-beating step (the second homogenization step for individual samples and the third homogenization step for pooled samples). Extractions were then completed using the MP Bio Fast DNA SPIN kit for Soil (MP Biomedicals; Santa Ana, CA, USA) as previously described [29] (Figs. 1, 2). Following extraction, all samples were stored at -20 °C until analysed via real-time PCR.

Real-time PCR analysis
The cycling conditions, information on sequences from primers and probes and master mix used have all been previously described [22,23,29].

Data and statistical analysis
To assess the diagnostic performance of the 5-sample pools, we calculated sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) in Excel v. 2016. Accuracy of the pooling method was also calculated using the formula: (true positives + true negatives)/number of pools. Confidence intervals (CI) for sensitivity, specificity, PPV and NPV were calculated using the Clopper-Pearson exact binomial method [32]. For these calculations qPCR results for the individual aliquots were considered as the 'gold-standard' . Results for NHM and SC were calculated and presented separately and stratified by helminth species. Chi-square tests were conducted to determine whether there was statistical evidence of a difference in the sensitivity and specificity estimates between the two laboratories. To better understand how pooling impacted the (delayed) detection of the target compared to the individuals, Pearson's correlation coefficient was used to quantitate the relationship between the qPCR outcome of the individual sample and that of the pooled one.
To investigate whether multiple extractions are required for each individual aliquot and/or 5-sample pool, Cohen's kappa statistic [33] was calculated. This determines the degree of agreement in qPCR results (positive/negative) between A/B aliquots and between the 5-sample pool duplicates (PA 1 and A 2 , PB 1 and B 2 ). Finally, for direct demonstration of agreement between the results obtained at NHM for the individual extracts and the ones originally screened as part of the independent study at SC (Bangladesh, WASH Benefits Bangladesh trial, see above), Cohen's kappa statistic was also calculated.

Cost exercise computation
Costs based on 1000 samples requiring processing (individually or as part of 5-sample pools) were calculated; the sample size was small enough for easy analysis and large enough to represent a case where pooling might be justified. For consistency and accurate reporting, the present protocol included all the extractions in duplicate and the formation and subsequent extraction of the same pool twice; these components were also part of the cost model and comparison. This cost exercise included labour and consumable costs (for plasticware and reagents per sample per assay run, based on list prices), tailored to a theoretically optimized version of the developed protocol (i.e. a protocol that would not process individual samples along with the pools simultaneously), as mentioned earlier.
Two separate case-scenarios were plotted for this exercise. In the simple case scenario, all the individual samples are negative (thus, so are the pools), and there is no need for 'spin outs'; hence, only costs for labour and consumables (based on list prices online) are included. As part of a more complicated scenario, two different prevalence rates-with a single STH present for simplicity-were factored in; 2% which reflects the defined transmission breakpoint, and 15% as an indicator of prevalence when control programmes are needed and when pooling could be considered above individual sampling. In a 'best-case' complicated scenario, all positive samples would cluster together (e.g. 5 positive samples in a 5-sample pool). Whereas, in a 'worst-case' complicated scenario only one positive sample would be part of a 5-sample pool (e.g. mixed with four 'negatives').

Results
Pooling was evaluated in terms of consistency, robustness, reproducibility and cost-effectiveness with comparisons made against individual sample results and between replicate pools. Sensitivity of the 5-sample pooling technique differed between helminth species for both the samples tested at NHM and SC. T. trichiura had the lowest sensitivity for both NHM (0.65, 95% CI: 0.50-0.79) and SC (0.80, 95% CI: 0.64-0.91). All other helminth species from SC had absolute sensitivity (1.00, 95% CI: 0.40-1.00) whilst for NHM the highest sensitivity was obtained for An. ceylanicum (0.82, 95% CI: 0.60-0.95). For T. trichiura and S. stercoralis there was no evidence of a difference in sensitivity between NHM and SC (P = 0.13 and P = 0.22, respectively), whilst for An. ceylanicum there was weak evidence of a difference (P = 0.07) and for A. lumbricoides there was very strong evidence of a difference in sensitivity between the two laboratories (P < 0.001) ( Table 1).
Estimates of specificity were consistently higher than those for sensitivity, suggesting the pooling approach has a low rate of false positives. Both N. americanus and A. lumbricoides had perfect specificity from NHM (1.00, 95% CI: 0.90-1.00 and 1.00, 95% CI: 0.92-1.00, respectively), whilst the same was true for An. ceylanicum, A. lumbricoides and T. trichiura at SC. All other estimates from both laboratories were above 0.90 except for S. stercoralis at SC (0.81, 95% CI: 0.64-0.93). There was no evidence of a difference in specificity estimates between NHM and SC for A. lumbricoides (P = 1.00), T. trichiura (P = 0.76) or An. ceylanicum (P = 0.64), but there was strong evidence of a difference for S. stercoralis (P = 0.03) ( Table 1).
Pearson's correlation coefficient (r) values between the individual aliquot qPCR results and the pooled qPCR results were generally consistent for the NHM and SC samples for each species with strong, positive correlations obtained from the A. lumbricoides samples (NHM: r = 0.75, P < 0.001; SC: r = 0.86, P < 0.001) and the An. ceylanicum samples (NHM: r = 0.93, P < 0.001; SC: r = 0.92, P < 0.001). The one exception was with regards to S. stercoralis, for which a strong positive correlation was identified for the NHM samples (r = 0.97, P < 0.001) but a very weak, and statistically insignificant, negative correlation was identified from the SC samples (r = − 0.07, P = 0.93) ( Table 2).
For the NHM samples, agreement in qPCR findings between both the 5-sample pool replicates and the A/B individual aliquots was moderate to high for all species, with Cohen's kappa ranging from 0.66 to 1.00. Similarly, with the SC samples, A. lumbricoides and An. ceylanicum showed perfect agreement for both aliquots and 5-sample pools, whilst a strong agreement was found for T. trichiura 5-sample pool results. However, only weak evidence of agreement occurring more often than would be expected by chance was identified for the 5-sample pools for S. stercoralis (k = 0.44, P = 0.07) ( Table 3).
Lastly, for all species, Cohen's kappa found a very strong degree of agreement in qPCR findings (translated as positivity for that particular target) between the isolates originally obtained at SC and the pools subsequently created at NHM (k ≥ 0.77, P < 0.001) except for N. americanus, where a slightly weaker degree of agreement was identified (k = 0.51, P = 0.02) ( Table 4). The raw numbers used for the analyses (number of true/false positives/negatives per set of pools) are provided in Additional file 1: Table S1.

Cost exercise
In all graphs shown (Figs. 3 and 4) no absolute numbers are reported as this cost exercise would differ significantly based on income (wage), currency and technician competency which would affect labour time invested. Instead, we report relative proportions of total cost.

Simplest scenario: all samples are negative for the STH to be screened
In the simplest case where all the individual samples are negative (and thus, so are the pools), there is no need for 'spin outs'; hence, only costs for labour and consumables (based on list prices online) are included (Fig. 3). In both low-income and high-income settings, labour is a slightly more expensive element than the consumables needed to process the samples in pools compared to the same samples processed individually (low-income setting: labour 9% and consumables 91% versus labour 7% and consumables 93%, high-income setting: labour 41% and consumables 59% versus labour 45% and consumable 55%, respectively). So, when all the samples are negative-or expected to be-there is no significant cost-savings when a pooling strategy is implemented compared to processing all the samples individually.

More complicated scenarios: impact of prevalence and its distribution to the pools
In this cost exercise, two scenarios including STH prevalence rates were considered; 2% and 15% prevalence of a particular STH. Taking the example of 1000 samples and a prevalence of 2% or 15%, this would result in 20 and 150 positive samples, respectively. Out of those pools, in the 'best-case' scenario ( Fig. 4), 4 and 30 positive pools would have to be revisited, for extraction and processing.  However, for the same number of samples and under the same prevalence rates, the 'worst-case' scenario would require 20 and 150 pools to be processed, for 2% and 15% prevalence respectively. In Fig. 4, for the positive pools alone, the additional cost for labour and consumables needed for the 'spin-outs' was also estimated and incorporated to the graphs. In the 'worst-case' scenario, as the prevalence increases the labour cost also increases in both low and high-income settings. In the 'best-case' scenario, for the same parameters (low to high prevalence) only for the low-income settings is the consumable cost slightly higher, whereas in the high-income settings the labour drivers are higher as the prevalence increases.

Discussion
The strategy of pooling has been considered an attractive way of screening multiple samples simultaneously for a particular target/pathogen, both in research and veterinary settings, potentially lowering the cost of labour or consumables needed [4, 10-12, 18, 27]. At the SC laboratory, some preliminary work on screening 'pools of 10' was conducted, and even though no dramatic loss of sensitivity was observed, the practicality of the process was deemed more challenging due to lack of sufficient equipment. For this reason, and upon initial cost assessment of consumable and reagent costs involved in 'pooling' , we focused on assessing a strategy of using 5-sample pools.
The main query of this study was whether pooling is an appropriate strategy for the qualitative detection of STHs in a post-treatment population, where most individuals are expected to be identified as 'negative' (based on the diagnostic test chosen). In a setting with most samples being negative, most pooled samples will also be negative thus, potentially, reducing labour and consumable costs and the lower likelihood of having to re-examine individual   samples when pools are found to be positive. Moreover, we aimed to show that pooling does not dramatically reduce the chances of the target detection by PCR (given the fact that it is further diluted as part of the pool). These questions are widely relevant for both veterinary [10] and clinical trials and epidemiological studies where large numbers of infected stool samples must be processed to assess infection presence and intensity [15,26]. Our study focused on a qualitative assessment of the infection levels (presence/ absence). The correlation of eggs found in a stool sample to worm burden and subsequently to intensity of infection is of paramount importance in epidemiological studies. A recent review of Papaiakovou et al. [34], addresses the concerns around quantitation of qPCR outputs and their subsequent correlation to egg numbers and, therefore, intensity of infection with confidence. We believe that qPCR has yet to achieve its potential for quantitative purposes given the limitations of PCR target selected, cell numbers present in eggs, and extraction efficiency. Additionally, the dilution of target through pooling will further hinder such quantitation. Thus, we decided to assess presence/absence of the target in both individuals and pools.
Our main objectives were to evaluate the successful formation of the pool, the potential for single sampling of the pool (to avoid reagent and labour cost inflation due to multiple sampling) and the subsequent detection of the diluted target with precision and accuracy. To our knowledge, this is the first time such queries have been interrogated to assist in strategic planning.

Method development
Given prior research on the need to blend stool samples sufficiently [35], and the importance of STH egg disruption by utilizing a high-speed bead-based homogenizer [36][37][38] we acknowledged that any method developed to form pools would be critical, and the subsequent accurate detection of the evenly distributed targets upon dilution in the pool, would be challenging.
The development of a 'pooling' protocol that overcomes known limitations and meets all of the aforementioned expectations was relatively trivial for the 'seeding experiment' , where only N. americanus eggs were recruited and tested. However, mixing or stirring the faecal pool with a sterile loop or low-power vortexer, was insufficient for the field-samples experiment, where the stool samples being recruited were positive for additional STH helminth species. The different consistencies of the stool samples involved, along with the low load of the infection in each one of the samples recruited, showed that adequate mixing was required. Furthermore, the need for both additional buffer and a bead-based beating step both to facilitate the homogeneous blending of the helminth eggs (or DNA) was also critical.

Precision and reproducibility
A working protocol that showed overall statistically significant and acceptable agreement between individuals and pools (through kappa values) was developed. The protocol presented no apparent technical errors for any of the helminths tested. However, due to the complexity and handson time, the need to test protocol reproducibility between different technicians and laboratory settings also emerged. Sequentially, our study aimed to show that the protocol is duplicable by any adequately trained and competent technician. Hence, the same pooling workflow (Fig. 2) was compared at two different laboratories (SC and NHM).
Utilizing the pooling strategy as described herein, a generally low rate of false negatives is expected. Also, specificity does not seem to be an issue overall but of interest remains the lower PPV for the S. stercoralis which is discussed in a separate section below.
Last but not least, the list of samples chosen to be pooled had originally been extracted and tested at SC (using the same extraction protocol and the same qPCR assays). Aliquots from the same stool samples were selected to be extracted independently (individually, and as part of pools) at NHM. Almost absolute agreement was shown between individual samples originally and independently tested with qPCR at SC with the results (individual and pool) obtained from NHM.

Single replicates versus duplicates
The Kappa estimates, comparing both individual aliquots and the pooled aliquots, showed a high degree of  Fig. 4 Cost analysis on pooling in both low-and high-wage settings in two different scenarios and for two levels of prevalence (2% and 15%) for a single soil-transmitted helminth species. Scenarios represent 'best' and 'worst' cases of positive sample distributions across 5-sample pools; see main text. Dashed white line separates consumable (extraction, qPCR and 'spin-out' reagents) from labour costs agreement, which suggests conducting the test twice may be unnecessary. For all species, agreement between 1 and 2 pool replicates was moderate to high for both laboratories. This provides strong statistical evidence that there is little need for multiple sampling. When processing large numbers of samples, the need for rapid and simple detection of the infection by single sampling is important due to the costs involved (reagents and labour). Using our developed protocol, with sufficient mixing and homogenization, there is clearly no need for multiple sampling (A and B in individuals, 1 and 2 in pools), since the infection/target seems to be evenly distributed following the workflow presented here.
For direct comparison of the individual samples forming the pool with the 5-sample pools per se, the individual samples constituting a pool were tested in duplicate, each pool was formed twice and the DNA from each pool was also extracted twice. Our study/protocol demonstrates that a thorough homogenization is critical for even distribution of the target present in stool samples. In that way, there is no reason or need for extracting DNA from the same sample/pool twice, and even in its most demanding format the protocol can be learned, implemented and reproducibly performed by suitably skilled technicians, as suggested by kappa values. Given the overall high degree of agreement, a conclusion that a single pool per 5 samples would be sufficient, can also be made.

Paradoxes
Even though the specificity for S. stercoralis was not significantly different at SC compared to NHM, the PPV was slightly lower (individual samples identified as negatives when screened by PCR were deemed positive for S. stercoralis as part of the pools). However, this can be attributed to the lower prevalence of S. stercoralis in the SC samples (10%) as compared to other parasites (at around 40-50%). As a worked example demonstrating the impact of prevalence on PPV, if the sensitivity and specificity for S. stercoralis calculated at SC remained constant (1.00 and 0.625, respectively) but prevalence was increased to 30%, the "new" PPV would be calculated as 0.79, i.e. more consistent with findings from NHM.
Moreover, the presence of larvae instead of eggs and the additional beating steps in the pool (versus individual samples), may have contributed to the infection being 'missed' at certain individual samples. It is suspected that further homogenization of larvae facilitated target detection in the pool, but not in the aliquot from the individual. Another possible explanation would be that 'weak' infections, unable to be detected in the individuals due to limits of detection of the qPCR assay, were collectively surpassing the detection threshold as part of the pool. All the individual samples had been previously screened independently, as mentioned earlier. Since all the samples previously reported as negatives were indeed negatives when tested in the laboratory, we are ruling out the chance of contamination than can lead to 'false positive' results. These samples were 'true positive' for S. stercoralis, hence we believe the respective pools were not 'false positive' . However, a higher prevalence of S. stercoralis in a given dataset would be needed in order to draw any further conclusions.
In the case of N. americanus and A. lumbricoides, since there was almost perfect agreement between individuals and respective pools, the slightly weaker agreement between original extracts and aliquots run at NHM may indicate a lack of adequate homogenization in the original sample.

Cost and time savings with pooling
The authors acknowledge that a viable and cost-effective protocol must not be too complicated or too laborious to set-up. Additionally, any protocol established as a time-saving strategy cannot be less cost-effective than processing the same number of samples individually. For this reason, a broad indicative cost analysis was carried out by our team. We calculated costs based on 1000 samples requiring processing; small enough for easy analysis, large enough to represent a case where pooling might be justified. For consistency and accurate reporting, the current protocol included all the extractions in duplicate and the formation and subsequent extraction of the same pool twice; these components were also part of the costmodel and comparison.
For every pool positive for a single parasite, there is the need to 're-visit' the individual samples that originally formed the pool, repeat the extraction step for each component sample and test each extract for the parasite of interest. For every additional parasite detected in the pooled sample, the additional cost increase is translated to consumables and the time to perform qPCR. However, pooling in the presence of positives adds to the overall cost of this alternative strategy relative to single sample processing. However, there remains room for further optimization of the current workflow (larger capacity homogenizers, purification and liquid handling systems). With a streamlined protocol in place capable of eliminating 'redundant' steps (three versus two rounds of homogenization for the pool) further simplifying the protocol may be possible, providing additional time and cost savings even when low percentages of STH prevalence are expected. Also, in cases where microscopy data may be available for individual samples, a 'strategic pooling' approach could be to use the samples identified as negatives for forming the pools and process the rest individually.
We acknowledge that our cost estimates based on list prices might not accurately reflect potential cost-saving with bulk or similar discounted purchasing, but the relative costs are likely indicative of broader trends. In our cost exercise, we included a simple case, where all samples are expected to be negative and a more complicated case with the infection present in a population. In the latter, we included only a 'worst-' and a 'best-case' scenario, along with only two levels of prevalence (2% and 15%) for a single STH species, based on low-and high-income countries. We understand that a realistic situation of the prevalence and distribution of any helminth present will lie somewhere in between. A more comprehensive mathematical cost model will include coefficients such as prevalence rates for a single STH species or more, cost from 'spin-outs' of 'false positives' or 'penalty' of false negatives in the long term, along with tailored wages to suggest a few.

To pool or not to pool
The main drive for developing and testing a pooling protocol has always been the potential savings in labour and consumables, but the additional dilution of the target and subsequent loss of sensitivity of the diagnostic method employed, has been of major concern. Recent research has challenged and augmented those concerns; pooling, might not be the cost-effective technique once hoped for.
Logistical and operational costs [18], special equipment or additional consumables needed (this study), the necessity of reproducibility (this study) and single-sample granularity in the infection present (revealing the 'positive' individuals that contribute to a 'positive' pool; this study), or generally prevalence in a given population [21], labour cost and study size are amongst the pivotal factors that will determine whether a pooling protocol will actually be beneficial and worthwhile.

Conclusions
We describe a successful pooling strategy that lessens the presence of false negative results, demonstrates reproducibility and minimizes the need for multiple replicates as long as there is sufficient mixing in the individual stools forming the pool. Such a methodology is yet to be simplified and tailored to the needs of any interventions. Even though pooling is more likely a better fit for low STH prevalence or surveillance areas and clusters where interruption of transmission is approached (< 2%), the findings and approach of this study will facilitate future protocol developments and optimizations. Our hope is that this study will assist in decision-making on single versus pooling implementation when considering end-to-end processes, budgeting and time considerations in diagnosing STH in faecal samples.
Additional file 1: Table S1. Raw numbers of true positives/negatives for both ‛seedingʼ experiment (Smith College) using N. americanus egg counts and field-sample testing (Smith College and Natural History Museum) for both individual samples and pools.