Palarea-Albaladejo, J. and Martín-Fernández, J.A.
Analytica Chimica Acta 764 (2013), 32-43.
compositional data, detection limits, EM algorithm, log-normal distribution, log-ratio approach, imputation methods
Samples representing parts of a whole, usually called compositional data in statistics, are commonplace in
analytical chemistry (for example, chemical data in percentages, ppm, or similar). Their distinctive feature is that
there is an inherent relationship between all the analytes constituting a chemical sample as they only convey
relative information. Some compositional data analysis principles and the log-ratio based methodology are
outlined here in practical terms. In addition, one often finds that some analytes are not present in sufficient
concentration in a sample to allow the measuring instruments to effectively detect them. These non-detects
are usually labelled as < DL (less-thans) in the data set, indicating that the values are below known
detection limits. Many data analysis techniques require complete data sets. Thus, there is a need for
sensible replacement strategies for less-thans. The peculiar nature of compositional data determines any
data analysis and demands a specialised treatment of less-thans that, unfortunately, is not usually
covered in chemometrics. Some well-founded statistical methods are revisited in this paper aiming to
prevent practitioners from relying on popular but untrustworthy approaches. A new proposal to estimate
less-thans combining a log-normal probability model and a multiplicative modification of the samples is also
introduced. Their performance is illustrated and compared on a real data set, and guidelines are provided
for practitioners. MATLAB and R code implementing the methods is made available to the reader.
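
To make the idea of a multiplicative modification concrete, the following is a minimal Python sketch of a simple multiplicative replacement followed by a centred log-ratio (clr) transformation. It is an illustration only, not the log-normal proposal introduced in the paper and not the authors' MATLAB or R code. The function names, the use of NaN to mark less-thans, the 0.65 fraction of the detection limit, and the example values are all assumptions made for this sketch.

```python
import numpy as np


def multiplicative_replacement(x, dl, frac=0.65, total=100.0):
    """Replace less-thans (marked as NaN) in one compositional sample.

    Each non-detect is set to frac * DL (frac = 0.65 is an assumed default),
    and the observed parts are shrunk by a common factor so that the sample
    still sums to `total` and the ratios between observed parts are unchanged.
    """
    x = np.asarray(x, dtype=float)
    dl = np.asarray(dl, dtype=float)
    censored = np.isnan(x)
    # Imputed value for each non-detect; zero contribution elsewhere.
    delta = np.where(censored, frac * dl, 0.0)
    # Common multiplicative adjustment applied to the observed parts.
    adjusted = x * (1.0 - delta.sum() / total)
    return np.where(censored, delta, adjusted)


def clr(x):
    """Centred log-ratio transformation of a strictly positive composition."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()


# Hypothetical sample in percentages; the fourth analyte is reported as < 0.2.
sample = np.array([60.0, 30.0, 10.0, np.nan])
detection_limits = np.array([0.1, 0.1, 0.1, 0.2])

completed = multiplicative_replacement(sample, detection_limits)
clr_scores = clr(completed)
```

Because the observed parts are all multiplied by the same factor, their mutual ratios (the relative information the composition conveys) are preserved, which a plain substitution followed by re-closure of the whole sample would also achieve only if done consistently across all parts.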