Values below detection limit in compositional chemical data

Abstract

Samples representing parts of a whole, usually called compositional data in statistics, are commonplace in analytical chemistry (for example, chemical data expressed in percentages, ppm, or similar units). Their distinctive feature is that all the analytes constituting a chemical sample are inherently related, as they convey only relative information. Some principles of compositional data analysis and the log-ratio methodology are outlined here in practical terms. In addition, some analytes are often not present in a sample in sufficient concentration for the measuring instruments to detect them effectively. These non-detects are usually labelled as < DL (less-thans) in the data set, indicating that the values are below known detection limits. Many data analysis techniques require complete data sets, so sensible replacement strategies for less-thans are needed. The peculiar nature of compositional data conditions any data analysis and demands a specialised treatment of less-thans that, unfortunately, is not usually covered in chemometrics. Some well-founded statistical methods are revisited in this paper, with the aim of preventing practitioners from relying on popular but untrustworthy approaches. A new proposal to estimate less-thans, combining a log-normal probability model and a multiplicative modification of the samples, is also introduced. The performance of these methods is illustrated and compared on a real data set, and guidelines are provided for practitioners. Matlab and R code implementing the methods is made available to the reader.
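
To make the idea of a multiplicative modification concrete, the following is a minimal Python sketch of a simple multiplicative replacement of less-thans. It is not the log-normal proposal announced above nor the Matlab and R code that accompanies the paper. The function name, the use of None to mark a non-detect, the closure constant of 100 (percentages) and the imputation fraction of 0.65 of the detection limit are all assumptions made for illustration.

```python
import math


def multiplicative_replacement(x, dl, frac=0.65, total=100.0):
    """Replace non-detects in one composition (hypothetical helper).

    x     : list of parts, with None marking a value below detection limit;
            the observed parts are assumed to sum to `total`
    dl    : list of detection limits, one per part
    frac  : assumed fraction of the detection limit used as replacement
    total : assumed closure constant (100 for percentages)
    """
    # Imputed value for each non-detect, 0 for observed parts.
    delta = [frac * d if v is None else 0.0 for v, d in zip(x, dl)]
    # Shrink the observed parts so that the composition still sums to `total`.
    scale = 1.0 - sum(delta) / total
    return [d if v is None else v * scale for v, d in zip(x, delta)]


sample = [70.0, 30.0, None]   # third analyte reported as < DL
limits = [0.1, 0.1, 1.0]      # illustrative detection limits
replaced = multiplicative_replacement(sample, limits)

# The log-ratio between the two observed parts is left unchanged.
before = math.log(sample[0] / sample[1])
after = math.log(replaced[0] / replaced[1])
print(replaced, sum(replaced), before, after)
```

Because every observed part is rescaled by the same factor, the ratios among them, and hence their log-ratios, are preserved, while the replaced composition keeps the same total. This is the property that makes a multiplicative adjustment compatible with log-ratio analysis.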

Year

2013