Document details for 'Zero problems with compositional data of physical behaviors: a comparison of three zero replacement methods'

Authors Rasmussen, C.L., Palarea Albaladejo, J., Johansson, M.S., Crowley, P., Stevens, M.L., Gupta, N., Karstad, K. and Holtermann, A.
Publication details International Journal of Behavioral Nutrition and Physical Activity 17, 126. BMC.
Publisher details BMC
Keywords Physical activity, sedentary behaviour, compositional data analysis
Abstract Background Researchers applying compositional data analysis to time-use data (e.g., time spent in physical behaviors) often face the problem of zeros, that is recordings of zero time spent in any of the studied behaviors. Zeros hinder the application of compositional data analysis as it is based on log-ratios. One way to overcome this challenge is to replace the zeros with sensible small values. The aim of this study was to compare the performance of three existing replacement methods used within physical behavior epidemiology: simple replacement, multiplicative replacement, and log-ratio expectation-maximization (lrEM) algorithm. Moreover, we assessed the consequence of choosing replacement values higher than the lowest observed value for a behavior.
Method Using a complete dataset based on accelerometer data from 1,310 Danish adults as reference, multiple datasets were simulated across six scenarios of zeros (5-30% zeros in 5% increments). Moreover, four examples were produced based on real data, in which 10% and 20% zeros were imposed and replaced using a replacement value of 0.5 minutes, 65% of the observation threshold or an estimated value below the observation threshold. For the simulation study and the examples, the zeros were replaced using the three replacement methods and the degree of distortion introduced was assessed by comparison with the complete dataset.
Results The lrEM method outperformed the other replacement methods as it had the smallest influence on the structure of relative variation of the datasets. Both the simple and multiplicative replacements tended to introduce higher distortion, particularly in scenarios with more than 10% zeros; although the second, like the lrEM, does preserve the ratios between behaviors with no zeros as desirable. The examples revealed that replacing zeros with a value higher than the observation threshold, apart from being incoherent, severely affected the structure of relative variation.
Conclusions Given our findings, we encourage the use of replacement methods that preserve the relative scale of physical behavior data such as the multiplicative and lrEM replacements do, and to avoid simple replacement. Moreover, we recommend not to replace zeros with values higher than the lowest observed value for a behavior.
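Illustration The abstract notes that multiplicative replacement preserves the ratios between behaviors with no zeros. The sketch below is not the authors' code; it is a minimal Python illustration of the standard multiplicative replacement rule, in which each zero is set to a small value and the non-zero parts are shrunk by a common factor so that the total is unchanged. The function name, the example day, and the 1-minute observation threshold are hypothetical and chosen only for illustration; the 65% factor echoes one of the replacement values mentioned in the abstract.

```python
def multiplicative_replacement(composition, delta, total=None):
    """Replace zeros with `delta` and rescale non-zero parts to keep the total.

    composition: list of non-negative parts (e.g. minutes per behavior)
    delta: small positive replacement value used for every zero part
    total: closure constant; defaults to the sum of the composition
    """
    if total is None:
        total = sum(composition)
    n_zeros = sum(1 for part in composition if part == 0)
    imputed_mass = n_zeros * delta
    if imputed_mass >= total:
        raise ValueError("replacement values use up the whole total")
    # Common shrink factor applied to every non-zero part.
    scale = 1 - imputed_mass / total
    return [delta if part == 0 else part * scale for part in composition]


# Hypothetical 24-hour day in minutes:
# sleep, sedentary, light activity, vigorous activity (recorded as zero).
day = [480.0, 600.0, 360.0, 0.0]

threshold = 1.0            # hypothetical observation threshold, in minutes
delta = 0.65 * threshold   # 65% of the threshold, one option named in the abstract

replaced = multiplicative_replacement(day, delta)

# Ratios between non-zero behaviors, before and after replacement.
ratio_before = day[0] / day[1]
ratio_after = replaced[0] / replaced[1]
```

Because every non-zero part is multiplied by the same factor, `ratio_after` matches `ratio_before` up to floating-point rounding, and the replaced day still sums to 1,440 minutes. The lrEM method instead estimates the replacement values from the covariance structure of the data, which this sketch does not attempt.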
Last updated 2020-10-07
