A hierarchical model for compositional data analysis

Abstract

This paper introduces a hierarchical model for the analysis of compositional data. Our approach models both the source and mixture data simultaneously, and accounts for several different types of variation: these include measurement error on both the mixture and source data; variability in the sample from the source distributions; and variability in the mixing proportions themselves, generally of main interest. The method is an improvement on some existing methods in that estimates of mixing proportions (including their interval estimates) are guaranteed to lie in the range [0,1]; in addition, it is shown that our model can help in situations where identification of appropriate source data is difficult, especially when we extend our model to include a covariate. We first study the likelihood surface of a base model for a simple example, and then include prior distributions to create a Bayesian model which allows analysis of more complex situations via Markov chain Monte Carlo sampling from the posterior distribution. Application of the model is illustrated with two examples using real data: one concerning chemical markers in plants, and another on water quality.

Year

2005