BioRssay: an R package for analyses of bioassays and probit graphs

Dose–response relationships reflect the effects of a substance on organisms, and are widely used in broad research areas, from medicine and physiology, to vector control and pest management in agronomy. Furthermore, reporting on the response of organisms to stressors is an essential component of many public policies (e.g. public health, environment), and assessment of xenobiotic responses is an integral part of World Health Organization recommendations. Building upon an R script that we previously made available, and considering its popularity, we have now developed a software package in the R environment, BioRssay, to efficiently analyze dose–response relationships. It has more user-friendly functions and more flexibility, and proposes an easy interpretation of the results. The functions in the BioRssay package are built on robust statistical analyses to compare the dose/exposure–response of various bioassays and effectively visualize them in probit-graphs. Graphical Abstract


Background
Bioassays aim at evaluating the potency of a compound. They usually consist in measuring the response of a "population" (e.g. organisms, populations, tissues, cells lines, strains, etc.) to increasing doses (or intensities or exposure times) of a stimulus (most of the time a xenobiotic or a chemical) to quantify specific dose-response relationships (also known as exposure-response relationships). Because of their effectiveness, dose-response relationship analyses are widely used in a large spectrum of scientific disciplines (e.g. epidemiology, microbiology, toxicology, environment quality monitoring, vector and pest control, and parasite biology). A prime example of such analyses is the monitoring of resistance to xenobiotics. Since the 1950s, xenobiotics (e.g. insecticides, pesticides, fungicides) have been widely used to control populations of vectors or pests [1]. However, as a counter-result, resistance mechanisms to such substances have been selected in targeted populations, undermining their efficiency [2]. Therefore, establishing and comparing the resistance levels of various populations to various xenobiotics is at the core of the World Health Organization (WHO) recommendations in order to define/adjust vector control strategies. These resistance assessments are usually done by exposing batches of individuals (adults or larvae) to varying doses of the xenobiotic to assess their responses (e.g. mortality or knockdown effect).
Despite bioassays being used in many fields, there has been a substantial lack of easily accessible statistical infrastructure for their analysis. In 2013, we developed an R script with a robust statistical background to describe and compare dose-mortality relationships [3], and it has since been used in several similar studies (e.g. [4][5][6][7][8][9][10]). In order to make it more user-friendly and easily accessible to the scientific community, we have now developed it

Open Access
Parasites & Vectors *Correspondence: pierrick.labbe@umontpellier.fr; pascal.milesi@scilifelab. uu.se into an R package called BioRssay, with more flexibility and an improved presentation of the results.

Workflow
BioRssay is a comprehensive compilation of scripts in R language [11] designed to analyze dose-response relationships (or exposure-response: mortality, knockdown effect, etc.) from bioassays of one or more strains, lines, tissues, cells, etc. [hereafter referred to as "population(s)"]. This package provides a complete analytic workflow, from data quality assessment to statistical analyses and data visualization, as follows (see also Fig. 1).
a. In the first step, the base mortality in the controls (i.e. a sample of the population not exposed to the tested stimulus) is taken into account to control for the mortality (or effect) associated with the experiment itself, regardless of the exposure (this can be critical when the exposure requires a long period). Data are adjusted following Abbott's formula using a maximum likelihood approach (optim function, method "L-BFGS-B", stats R package [11]), to estimate correction factors (Eq. 1), while taking into account heterogeneity in the mortality rates between replicates (see example in Table 1; NB: users should be cautious in their use of the data if the heterogeneity is high) [12]: where m T and m C are the survival rates in the treatments and the controls, respectively. By default, the correction is applied if mortality in the control is higher than 5%, but any threshold can be specified. b. Then, for each population independently, a generalized linear model (GLM) is fitted to the probittransformed mortality rates, with the quasi-binomial family to account for possible overdispersion (Eq. 2); if Abbott's correction has been applied, the model is fitted to the adjusted mortality rates (see above, point a).
where −1 is the probit function, Y is the mortality rates, X is a design matrix, and β the vector of effects of the covariables included in the design matrix (intercept and dose). The estimated effects (i.e. slope and intercept, with their standard errors) are reported. The slope and the intercept are used to test the linearity of the log-dose response using a Chi-square test of homogeneity (Eq. 2) between the model predictions and the probit-transformed data; significant deviations from linearity may for example reflect mixed populations, or a threshold effect.
c. The parameters estimated using Eq. 2 are then used to compute lethal doses (LD, also lethal concentrations, LC) with their corresponding 95% confidence intervals (CI), following the approach developed by Johnson et al. [13] from Finney [14], which allows us to take the heterogeneity of the data (h) into account, with larger h leading to larger CI. Heterogeneity- related parameters (h and g in Table 2) are reported; according to Finney's recommendation: "With almost all good sets of data, g will be substantially smaller than 1.0 and seldom greater than 0.4. " By default, LD are computed at 25%, 50%, and 95% of mortality; alternatively, any level of LD and CI can be specified by the user. Resistance ratios (RR), with their 95% CI, are calculated according to Robertson and Preisler [15]. They measure the magnitude of the difference in dose-responses of two populations when exposed to the same stimulus. For a given LD, the LD of a focal population is divided by the LD of the population with the lowest one (usually a susceptible reference, Table 2). d. When several populations (or strains, lines, etc.) are exposed to the same stimulus, we offer the possibility to test whether they present significant differences in Table 1 A subset of bioassays conducted on three strains of Anopheles gambiae mosquitoes exposed to increasing doses of temephos insecticide (data from [5]) The four columns "Strain", "Dose", "Total", "Dead" are the mandatory input format.  their responses (slope and/or magnitude). We use a likelihood ratio test (LRT) to compare a null model, assuming that all the data come from the same population (i.e. where the dose is the only covariable in the design matrix, X in Eq. 2), with the complete model where the dose, the population effect, and their interaction are included in the design matrix X (Eq. 2). As in point b, the effects are fitted with the glm function in R, with the quasi-binomial family to account for overdispersion.
With only two populations (e.g. comparison between an unknown population and a reference), a significant difference between the two models indicates that these populations differ in the magnitude of the response and/or slope. With more than two populations, it means that at least two populations show differing responses in the magnitude of the response and/or slope. In this case, pairwise comparisons (post hoc LRT) are implemented to identify the differing populations, testing for statistical significance of the difference between all population pairs; the Holm-Bonferroni method is then used to control for the family-wise error inherent in multiple testing [16]. Note that only populations that pass the linearity test should be compared (see above point b): If at least one of the populations failed the linearity test, the validity of comparing the slope or magnitude of their dose-mortality response is, at best, highly questionable (not recommended). These populations should be presented, but only as such (accordingly, no regression is fitted to these populations by BioRssay, see point e). Nonetheless, it is still valid to compare the slope and magnitude between the remaining populations, i.e. those that show a linear log-dose response. e. Customizable (confidence levels, colors, symbols, etc.) plots of the probit-transformed regressions are drawn in the last step (Fig. 2a, b). By default, if the probit-transformed response significantly deviates from linear regression, the data are connected by segments; the (invalid) regression and associated CI are not plotted (e.g. Fig. 2b, DZOU population, red squares). Confidence intervals around the regressions can be removed or added, and users can specify any levels of CI.

Benchmarking BioRssay
What makes BioRssay stand out for dose-response analyses and how can it complement other similar R packages and functions? We discuss three R packages that provide dose-response analysis: drc [17], protti [18] and lava [19].
-The core of the drc package is to provide the user with a comprehensive set of model fitting followed by dose-response analysis [17], while BioRssay has been designed primarily to facilitate the analysis of bioassay data in a ready-to-go approach. As such, BioRssay is optimized to compare dose-response between many different populations exposed to the same stimulus. Relevant quantities, as LD and RR, along with their CI, are automatically generated, and a function (model.signif) is dedicated to the comparison of dose-responses. Producing these results would require combining the output of several functions from drc packages (e.g. drm, ED, ED.comp, comped, comParm, predict.drc, anova.drc). Another main difference stems from the use of a quasi-binomial for model fitting in BioRssay to account for possible data overdispersion, which is not possible in drc, despite the variety of models implemented. Finally, BioRssay also includes data preprocessing by implementing Abbott's correction for mortality in controls before fitting the dose-response. The drm function in drc is much more thorough in terms of error model selection, allowing a dose-response analysis using for example binomial, Poisson, four-and five-parameter log-logistic models, and Weibull models. The mselect function of drc is handy in identifying the correct dose-response model for populations that fail the linearity test. -The function fit_drc_4p from the package protti [18] is a wrapper function for the drc package's drm function implementing a four-parameter log-logistic model. -The function PD in the lava package [19] allows for dose-response calculation for binomial regression models.

Package accessibility and concluding remarks
The package is available freely on GitHub at https:// github. com/ miles ilab/ BioRs say (https:// doi. org/ 10. 5281/ zenodo. 51720 72). We also provide a comprehensive workflow and tutorials, from data preparation to the interpretation of the results, in a dedicated GitHub page https:// miles ilab. github. io/ BioRs say/. Further, the package carries example data sets for self-tests, and more data can be downloaded at https:// github. com/ miles ilab/ DATA. The code is maintained by P. Karunarathne  Fig. 2 Probit graphs generated by the BioRssays. a Linear relationships between probit-transformed mortality rates and log-dose of bendiocarb insecticide for different mosquito populations (data from [3]). Kisumu (blue triangles) is the susceptible reference strain. AcerKis (red square) and AgRR5 (green circles) show resistance levels significantly higher than that of the reference population, AgRR5 showing the strongest resistance level ( Fig. 1 and Table 2). b Same as a but for KIS (green dots) and DZOU (red squares) populations exposed to temephos insecticide. Note that the relation is not linear for the DZOU population, and dots are connected by segments (unpublished data)