Statistical Methodology

Dealing with concentrations below the limit of detection

A problem commonly faced by scientists working with chemical samples is the presence of trace elements at concentrations below the limit of detection. A limit of detection is the threshold below which a particular analytical instrument cannot distinguish a genuine chemical signal from background noise. Such limits may consequently vary between laboratories and technologies, and may also depend on more mundane considerations, such as how clean an instrument is at the time of operation. The adequate handling of nondetectable chemical concentrations in data analysis is therefore of substantial practical concern.

Plot of additive log-ratios for data from a three-component mixture [Zn, V, Ba]. The red dashed line indicates the transformed limit of detection, and the red points indicate the imputed values for samples with measured concentrations of Zn below the limit of detection.
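As a rough illustration of the transformation behind such a plot, the sketch below computes additive log-ratios for one sample and shows how a limit of detection on Zn carries over to the log-ratio scale. The concentrations, the limit of detection and the choice of Ba as the divisor are all hypothetical and are not taken from the data shown in the figure.

```python
import math

def alr(parts, divisor_index=-1):
    """Additive log-ratio: log of every part relative to one chosen divisor part."""
    idx = divisor_index % len(parts)
    divisor = parts[idx]
    return [math.log(p / divisor) for i, p in enumerate(parts) if i != idx]

# Hypothetical concentrations (same units) for the parts [Zn, V, Ba].
sample = [2.0, 8.0, 40.0]
zn_lod = 1.0  # hypothetical limit of detection for Zn

y = alr(sample)  # [ln(Zn/Ba), ln(V/Ba)]

# If Zn had been below its limit of detection in this sample, all that would
# be known on the log-ratio scale is that ln(Zn/Ba) < ln(LOD/Ba).
lod_alr = math.log(zn_lod / sample[-1])

print(y, lod_alr)
```

An imputed value for a nondetect would have to fall below this transformed bound.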

Depending on the proportion of observations below the limits of detection, simple estimates derived from the observed data may fail to reflect the characteristics of the underlying distributions of concentrations. Unfortunately, over-simplistic ad hoc methods have been in widespread use in recent decades. These procedures lack any rigorous statistical basis and have been shown to be prone to producing biased estimates and obscuring genuine patterns.

BioSS has been working to develop statistically sound methods for analysing data on chemical concentrations that include values below a known limit of detection, with a particular emphasis on compositional analysis, in which interest is focussed on the relative values of different concentrations rather than on the concentrations themselves. Our preferred approach is based on (1) a modified version of the Expectation-Maximization algorithm, which exploits the available multivariate information to obtain unbiased parameter estimates, and (2) the log-ratio methodology, which deals with the peculiarities of compositional data. We have produced a package of computer routines for the R statistical language to facilitate the practical use of these methods.
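To give a flavour of why likelihood-based methods help, the following is a minimal sketch of an Expectation-Maximization fit for a single log-concentration that is normally distributed and left-censored at a known limit of detection. It is a deliberately simplified univariate illustration, not the modified multivariate algorithm or the R routines described above; the function name, the simulated data and the comparison against substituting half the limit of detection are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def em_left_censored_normal(observed, n_censored, log_lod, n_iter=200):
    """EM estimates of (mu, sigma) for normal data left-censored at log_lod."""
    n = len(observed) + n_censored
    sum_obs = sum(observed)
    sum_sq_obs = sum(x * x for x in observed)
    # Start from the detected values only (biased, but a workable start).
    mu = sum_obs / len(observed)
    sigma = math.sqrt(sum_sq_obs / len(observed) - mu * mu)
    for _ in range(n_iter):
        # E-step: first two moments of a nondetect, i.e. of x given x < log_lod.
        alpha = (log_lod - mu) / sigma
        lam = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
        m1 = mu - sigma * lam
        m2 = sigma**2 * (1 - alpha * lam - lam**2) + m1**2
        # M-step: usual normal estimates from the expected sufficient statistics.
        mu = (sum_obs + n_censored * m1) / n
        sigma = math.sqrt((sum_sq_obs + n_censored * m2) / n - mu * mu)
    return mu, sigma

# Simulated (hypothetical) log-concentrations with a known truth.
random.seed(1)
true_mu, true_sigma, lod = 0.0, 1.0, 1.0
log_lod = math.log(lod)
logs = [random.gauss(true_mu, true_sigma) for _ in range(5000)]
detected = [x for x in logs if x >= log_lod]
n_censored = len(logs) - len(detected)

# Ad hoc alternative: replace every nondetect by half the limit of detection.
substituted = detected + [math.log(lod / 2)] * n_censored
naive_mu = sum(substituted) / len(substituted)
naive_sigma = math.sqrt(
    sum((x - naive_mu) ** 2 for x in substituted) / len(substituted)
)

em_mu, em_sigma = em_left_censored_normal(detected, n_censored, log_lod)
print("substitution:", naive_mu, naive_sigma)
print("EM:          ", em_mu, em_sigma)
```

In this simulated setting, where about half the values are censored, substitution understates the spread of the data because every nondetect is given the same value, whereas the EM estimates account for the unobserved variation below the limit.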

Centered log-ratio biplot representation of chemical samples (points) and analytes (arrows). The red filled points represent samples that required adjustment because they had values below the limit of detection.
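For readers unfamiliar with the centered log-ratio transformation underlying such a biplot, here is a minimal sketch for a single sample. The concentrations are hypothetical. Because the transformation takes logarithms of every part, it needs strictly positive values, which is one reason values below the limit of detection have to be adjusted beforehand.

```python
import math

def clr(parts):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    log_parts = [math.log(p) for p in parts]
    mean_log = sum(log_parts) / len(log_parts)
    return [lp - mean_log for lp in log_parts]

# Hypothetical concentrations for three analytes in one sample.
sample = [2.0, 8.0, 40.0]
z = clr(sample)

# Rescaling the whole sample (for example, changing units) leaves the result
# unchanged: only the relative values of the parts matter.
z_rescaled = clr([10.0 * p for p in sample])

print(z, z_rescaled)
```

The transformed values of a sample always sum to zero, so they carry only relative information about the parts.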

Further details from:
Javier Palarea

Article date 2013
