Postgraduate Research & Training

Standard errors in systematic sampling


Suppose we want to know how many acorns lie on the ground under an oak tree. We may not have time to count them all, and an approximate answer may be sufficient. So instead we count the acorns under only part of the tree, for example by dividing the ground into 50 cm × 50 cm squares and then counting the acorns in a randomly chosen 1/9th of the squares:

[Figure: random sampling of acorns under the oak tree]

We can estimate the total number of acorns by multiplying the number in our sample by 9. This is an unbiased estimate, but we should also indicate how accurate it is by calculating a standard error or confidence interval. Because we sampled using a random design, there is a simple equation for doing this. However, random designs are often very inefficient: we could estimate the total number of acorns a lot more precisely if we had instead used a systematic design such as:

[Figure: systematic sampling of acorns under the oak tree]

Again we count the number of acorns in 1/9th of the squares, but now the sampled squares are laid out in a regular grid rather than chosen at random. As before, we obtain an unbiased estimate of the total number of acorns by multiplying the number in our sample by 9. But, surprisingly and unfortunately, there is no generally accepted way to calculate a standard error for this sampling scheme!
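The contrast between the two designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 18 × 18 grid of squares, the acorn density that falls off with distance from the trunk, and the function names are all hypothetical, and the standard error shown for the random design is the usual textbook formula for simple random sampling without replacement, which is presumably the kind of "simple equation" meant above.

```python
import numpy as np

def srs_total_and_se(counts, n_sampled, rng):
    """Estimate the total from a simple random sample of squares
    (without replacement) and return (estimate, standard error)."""
    flat = np.asarray(counts, dtype=float).ravel()
    n_squares = flat.size
    sample = rng.choice(flat, size=n_sampled, replace=False)
    total_hat = n_squares * sample.mean()
    # Textbook design-based SE, including the finite population correction.
    fpc = 1 - n_sampled / n_squares
    se = n_squares * np.sqrt(fpc * sample.var(ddof=1) / n_sampled)
    return total_hat, se

def systematic_totals(counts, step=3):
    """Return the estimated total for each of the step*step possible
    positions of a regular grid that takes every step-th row and column."""
    counts = np.asarray(counts, dtype=float)
    return np.array([
        step * step * counts[i::step, j::step].sum()
        for i in range(step)
        for j in range(step)
    ])

rng = np.random.default_rng(0)

# Hypothetical data: 18 x 18 squares, more acorns near the trunk (assumption).
y, x = np.mgrid[0:18, 0:18]
density = 20 * np.exp(-((x - 8.5) ** 2 + (y - 8.5) ** 2) / 40)
counts = rng.poisson(density)

# Random design: 1/9th of the squares, with an estimated standard error.
total_hat, se = srs_total_and_se(counts, n_sampled=counts.size // 9, rng=rng)

# Systematic design: the nine possible estimates, one per grid position.
sys_est = systematic_totals(counts)

print("true total:", counts.sum())
print("random design: estimate", round(total_hat), "SE", round(se))
print("systematic design: spread of the nine estimates", round(sys_est.std()))
```

Because every square belongs to exactly one of the nine grid positions, the nine systematic estimates average to the true total, which is the sense in which the estimate is unbiased when the grid position is chosen at random. The spread of those nine estimates can only be computed here because the synthetic data are fully known. In practice only one grid position is observed, which is why no comparable standard error is available from the sample itself.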

Because the data are essentially a sample of size one, no design-based standard error can be derived; and the alternative, a model-based approach, is time-consuming to implement and relies on subjective assumptions that are difficult to validate. Each branch of statistics takes a different approach to this problem: geostatisticians model the covariances and use kriging (Aubry and Debouzie, 2001); stereologists appeal to asymptotics, with the variance of the estimator proportional to T^q, where T is the spacing between samples and q is a measure of the smoothness of the sampled body, the 1D case being known as Cavalieri sampling (Garcia-Finana and Cruz-Orive, 2004); and survey statisticians use randomisation-based approximations, such as assuming that blocks of size two had been used (Ripley, 1981).
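As a rough illustration of the last of these ideas, the sketch below treats consecutive, non-overlapping pairs of sampled units along a 1D transect as if each pair were a random sample of two from its own block. This is one common reading of a "blocks of size two" approximation, not necessarily the construction in Ripley (1981); the function name `paired_block_se` and the 1D setting are assumptions made for the example.

```python
import numpy as np

def paired_block_se(sample, k):
    """Approximate SE of the estimated total k * sum(sample).

    Assumes the sample holds every k-th unit in spatial order and treats
    each consecutive pair as a stratum of 2k units with 2 units sampled.
    """
    y = np.asarray(sample, dtype=float)
    if y.size < 2 or y.size % 2 != 0:
        raise ValueError("this sketch needs an even number of sampled units")
    d = y[0::2] - y[1::2]          # within-pair differences
    f = 1.0 / k                    # sampling fraction
    # Stratified-sampling variance with two units per stratum:
    # sum over pairs of (2k)^2 * (1 - f) * (d^2 / 2) / 2 = k^2 (1 - f) d^2.
    var_total = k ** 2 * (1 - f) * np.sum(d ** 2)
    return float(np.sqrt(var_total))

# Hypothetical counts from every 9th square along a transect.
sample = [3, 5, 4, 4, 6, 7, 2, 1]
print("estimated total:", 9 * sum(sample))
print("approximate SE:", paired_block_se(sample, k=9))
```

The approximation uses only differences between neighbouring sampled units, so a smooth trend across the transect inflates it far less than it would inflate the simple random sampling formula. It is still an approximation: the pairs were not in fact chosen at random within blocks.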

The aims of this PhD project are:

  1. compare these approaches for obtaining standard errors, and explore syntheses, such as constructing confidence intervals after first modelling data by a smooth trend plus independent observation errors;
  2. consider whether there are better estimators of the total than simply rescaling the sample total;
  3. determine when designs intermediate between systematic and random sampling are preferable because they permit quantification of the precision of results.

References

For further details, contact Chris Glasbey
