A parametric-multiplicative treatment of count zeros in compositional data sets

Publication Name

Book of Abstracts of the 8th International Workshop on Compositional Data Analysis (CoDaWork2019)

Publisher

Universitat Politècnica de Catalunya-BarcelonaTECH

ISBN

978-84-947240-1-5

Abstract

Count data are collections of non-negative integer values commonly referring to the number of occurrences or observations across a fixed set of mutually exclusive categories, for example, counts across a given range of species in a biological sample or ecological environment. Compositional techniques based on the log-ratio methodology are appropriate in those cases where the total sum of the vector of counts is not of interest, that is, when the analyst assumes the scale invariance property. However, such compositional count data sets can contain zero values, which are often the result of insufficiently large samples or of the sampling experiment design. That is, they refer to unobserved positive values that might have been observed with a larger or more sensitive sampling process. Because compositional analysis is applied to log-ratio coordinates of the data, count compositions require a proper replacement of the zeros. Martín-Fernández et al. (2015) introduced a treatment that combines Bayesian estimation with a multiplicative modification of the non-zero values. This modification causes only minor distortion in the covariance structure, which makes it an appropriate treatment for count zeros. The treatment, based on compounding the Dirichlet and multinomial distributions, is implemented in the package zCompositions (Palarea-Albaladejo and Martín-Fernández 2015). The analyst must select the parameterization of the prior distribution from a number of possibilities that yield different zero replacement results. Formal analysis and practical experiments showed that the so-called geometric Bayesian-multiplicative (GBM) prior provides the most satisfactory results when compared with the other alternatives considered. In this work we introduce a new treatment of zeros by compounding the log-ratio normal and multinomial distributions recently introduced in Comas-Cufí et al. (2018). Importantly, with this new approach the selection of the prior is avoided.
Following the EM algorithm for rounded zeros implemented in zCompositions, estimates of the proportions are obtained and a multiplicative adjustment is applied to the non-zero parts. To accelerate the expectation step (E) of the algorithm when used on large data sets, a new approach based on the mode of the distribution is proposed. The performance of this new approach is illustrated using real and simulated data sets.
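
As an illustration of the Bayesian-multiplicative idea mentioned in the abstract (the earlier Dirichlet-multinomial treatment, not the new log-ratio normal approach), the following is a minimal sketch in Python. It is not the zCompositions implementation. The function name `bayes_mult_replace`, the uniform prior expectation `t = 1/D`, and the strength `s = D` are assumptions chosen for simplicity, and they differ from the GBM prior. Zero parts receive a posterior estimate, and non-zero parts are rescaled multiplicatively so that their ratios are preserved and the result sums to one.

```python
import numpy as np

def bayes_mult_replace(counts, t=None, s=None):
    """Hypothetical helper: zero replacement for a single count vector.

    counts : sequence of non-negative integers (one sample)
    t      : prior expectation per part (assumed uniform if omitted)
    s      : prior strength (assumed equal to the number of parts if omitted)
    """
    x = np.asarray(counts, dtype=float)
    n = x.sum()            # total count of the sample
    D = x.size             # number of parts
    # Assumption: uniform prior with strength D, used only for illustration.
    t = np.full(D, 1.0 / D) if t is None else np.asarray(t, dtype=float)
    s = float(D) if s is None else float(s)

    zero = x == 0
    r = np.empty(D)
    # Zero parts: posterior estimate of the proportion given a zero count.
    r[zero] = t[zero] * s / (n + s)
    # Non-zero parts: observed proportions shrunk multiplicatively so the
    # whole vector sums to one and ratios between non-zero parts are kept.
    r[~zero] = (x[~zero] / n) * (1.0 - r[zero].sum())
    return r

example = bayes_mult_replace([10, 0, 5, 0, 85])
```

Because the adjustment is multiplicative, the ratio between any two non-zero parts of `example` equals the ratio of the original counts, which is why this type of replacement causes only minor distortion of the log-ratio covariance structure.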

Year

2019