GENERALIZED ADDITIVE MODELS : INTRODUCTION
Generalized additive models (GAMs) represent a method of fitting a smooth relationship between two or more variables through a scatterplot of data points.
GAMs are useful where :
* The relationship between the variables is expected to be of a complex form, not easily fitted by standard linear or non-linear models;
* There is no a priori reason for using a particular model;
* We would like the data to suggest the appropriate functional form.
ILLUSTRATION OF GAMs
One of the main reasons for using GAMs is that they do not involve strong
assumptions about the relationship that is implicit in standard parametric
regression. Such assumptions may force the fitted relationship away from its
natural path at critical points.
The following example is taken from Snell and Simpson (1991). The graph shows how survival time (in weeks) is related to the log(initial white blood cell count) in leukaemia patients.
[Figure: A fit using a linear model]
[Figure: A fit using a generalized additive model]
HOW DO GAMs WORK ?
* GAMs work by replacing the coefficients found in parametric models with a smoother, so that each linear term becomes a smooth function of its predictor.
* A smoother is a tool for summarising the trend of a response variable (Y) as a function of one or more predictors (X1...Xp).
* It produces an estimate of the trend that is less variable, i.e. smoother, than Y.
* Smoothing takes place by local averaging, that is, averaging the Y-values of observations having predictor values close to a target value.
* A simple example of a smoother is a running mean (or moving average).
REGRESSION MODELS
CLASSICAL LINEAR MODEL
GENERALIZED LINEAR MODEL
ADDITIVE MODEL
GENERALIZED ADDITIVE MODEL
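For reference, the four models named above are usually written in the following standard textbook forms. The notation is an assumption and may differ from that of the original formulas: g is a link function, \mu = E(Y) is the mean response, \beta_0 is an intercept, \varepsilon is an error term and S_1, ..., S_p are smooth functions.

```latex
% Classical linear model
Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon

% Generalized linear model
g(\mu) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p

% Additive model
Y = \beta_0 + S_1(X_1) + \dots + S_p(X_p) + \varepsilon

% Generalized additive model
g(\mu) = \beta_0 + S_1(X_1) + \dots + S_p(X_p)
```

Reading down the list, each additive form is obtained from the corresponding linear form by replacing every term \beta_j X_j with a smooth function S_j(X_j).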
* As well as coping with continuous predictors such as age and height, GAMs can also deal with categorical predictors such as sex and colour.
* These are often easier to deal with than continuous predictors. To smooth Y, the Y values in each category can simply be averaged.
* This satisfies the requirements for a smoother: it captures the trend of Y on X and is smoother than the Y-values themselves.
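The category-averaging idea in the list above can be sketched in a few lines of Python. The function name `category_means` and the small data set are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def category_means(x, y):
    """Smooth y on a categorical predictor x by averaging y within each category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(x, y):
        totals[xi] += yi
        counts[xi] += 1
    means = {level: totals[level] / counts[level] for level in totals}
    # The fitted value for each observation is the mean of its own category.
    return [means[xi] for xi in x]

# Hypothetical data: a two-level predictor and a numeric response.
sex = ["F", "M", "F", "M", "F"]
y = [10.0, 20.0, 14.0, 24.0, 12.0]
fitted = category_means(sex, y)
```

The fitted values take only one value per category, so they vary less than the original Y values while still following the difference between the categories.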
SMOOTHING METHODS
There are various methods that can be used to estimate the smooth functions S1, ..., Sp (one for each predictor) that are included in additive models.
The functions can be estimated one at a time, by a scatterplot smoother.
Scatterplot smoothers include :
RUNNING MEANS
The running mean is calculated by finding the mean of all the Y values in a neighbourhood Ni of Xi, as shown by the following formula
$$ S(X_i) = \frac{1}{|N_i|} \sum_{j \in N_i} Y_j $$
where |Ni| is the size of the neighbourhood, i.e. the number of points it contains.
The mean is the value of the smoothing function at the point Xi.
The disadvantages of the running mean are that it tends to flatten trends near the endpoints and that the fitted curve is often still rather wiggly in the middle of the data.
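A minimal Python sketch of a running mean follows. It assumes the observations are already sorted by X and uses a symmetric neighbourhood of up to k points on each side, truncated at the ends. The function name, the parameter k and the data are illustrative assumptions.

```python
def running_mean(y, k=1):
    """Running mean of y, assuming observations are already sorted by x.

    The neighbourhood of point i is the point itself plus up to k points
    on each side; it is truncated at the ends of the data.
    """
    n = len(y)
    fitted = []
    for i in range(n):
        window = y[max(0, i - k):min(n, i + k + 1)]
        fitted.append(sum(window) / len(window))  # mean of the Y values in the neighbourhood
    return fitted

# Hypothetical response values, ordered by x.
y = [2.0, 4.0, 9.0, 8.0, 12.0]
fitted = running_mean(y, k=1)

# On an exactly linear trend the end values are pulled towards the middle.
linear_fit = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], k=1)
```

In the linear case the first fitted value is the mean of 1.0 and 2.0 rather than 1.0, which is the flattening near the endpoints mentioned above.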
RUNNING MEDIANS
The running median is calculated by finding the median of all the Y values in a neighbourhood of Xi. It is less affected by outlying Y values than the running mean.
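As a sketch, a running median can be written in the same way as a running mean, with the median taking the place of the mean. The code assumes observations sorted by X and a neighbourhood of up to k points on each side. The names and data are hypothetical.

```python
from statistics import median

def running_median(y, k=1):
    """Running median of y, assuming observations are already sorted by x."""
    n = len(y)
    # Neighbourhood: the point itself plus up to k points on each side.
    return [median(y[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]

# Hypothetical data in which 90.0 plays the role of an outlier.
y = [2.0, 4.0, 90.0, 8.0, 12.0]
fitted = running_median(y, k=1)
```

With this neighbourhood size the outlying value never appears among the fitted values, because it is never the middle value of its window.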
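RUNNING LINES
A least-squares line is fitted to the points in a neighbourhood Ni of Xi. The value on the line corresponding to Xi can then be found.
The formula for calculating the running line is as follows
$$ S(X_i) = \hat{\alpha}_i + \hat{\beta}_i X_i $$
where \hat{\alpha}_i and \hat{\beta}_i are the least-squares intercept and slope estimated from the points in Ni.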
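A running-line smoother can be sketched as follows, assuming that a least-squares line is fitted within each neighbourhood and evaluated at Xi. The code further assumes observations sorted by X, distinct X values and a neighbourhood of up to k points on each side. All names are illustrative.

```python
def running_line(x, y, k=1):
    """Running-line smoother: fit a least-squares line in each neighbourhood.

    Assumes x is sorted in increasing order with distinct values.
    """
    n = len(x)
    fitted = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        xs, ys = x[lo:hi], y[lo:hi]
        m = len(xs)
        x_bar = sum(xs) / m
        y_bar = sum(ys) / m
        sxx = sum((xj - x_bar) ** 2 for xj in xs)
        sxy = sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(xs, ys))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        fitted.append(intercept + slope * x[i])  # value on the local line at x[i]
    return fitted

# Hypothetical, exactly linear data: y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
fitted = running_line(x, y, k=1)
```

On exactly linear data the running line reproduces every point, including the two end points, whereas a running mean would pull the end values towards the middle.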
KERNEL
This smoother is calculated by finding a weighted average of all the Y values in a neighbourhood Ni of Xi.
It can be calculated by the formula
$$ S(X_i) = \sum_{j \in N_i} W_{ij} Y_j $$
which has the constraint that
$$ \sum_{j \in N_i} W_{ij} = 1 $$
The weights Wij are calculated by a kernel function. The choice of kernel function varies, but in each case the weight decreases as the distance |Xi - Xj| between the target point Xi and the point in question Xj increases.
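The following Python sketch illustrates a kernel smoother. It assumes a Gaussian kernel, which is only one possible choice and is not specified in the text, and it gives every point a weight rather than cutting off at a fixed neighbourhood. The names `kernel_weights`, `kernel_smooth` and `bandwidth` are hypothetical.

```python
import math

def kernel_weights(x, target, bandwidth):
    """Gaussian kernel weights for a target point, normalised to sum to 1."""
    raw = [math.exp(-0.5 * ((xj - target) / bandwidth) ** 2) for xj in x]
    total = sum(raw)
    return [w / total for w in raw]

def kernel_smooth(x, y, bandwidth=1.0):
    """Weighted average of y at each x[i], with weights from kernel_weights."""
    fitted = []
    for xi in x:
        w = kernel_weights(x, xi, bandwidth)
        fitted.append(sum(wj * yj for wj, yj in zip(w, y)))
    return fitted

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 9.0, 8.0, 12.0]
weights = kernel_weights(x, 3.0, 1.0)
fitted = kernel_smooth(x, y, bandwidth=1.0)
```

The weights for the target value 3.0 sum to one, are largest at the target itself and fall off with distance. In this sketch `bandwidth` plays the role of the neighbourhood size.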
SPLINES
A cubic spline is a collection of polynomials of degree less than or equal to 3, defined on subintervals. A separate polynomial is fitted for each neighbourhood (subinterval), and the pieces join smoothly where they meet, thus enabling the fitted curve to join all of the points.
The order of splines is not limited to three, although cubic splines are the most common.
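As an illustration, the sketch below builds a natural cubic spline that passes through every data point, using the standard tridiagonal algorithm. The "natural" end condition (zero second derivative at both ends) is an assumption of this sketch. It assumes strictly increasing X values, and the function names are hypothetical.

```python
from bisect import bisect_right

def natural_cubic_spline(x, y):
    """Return a function evaluating the natural cubic spline through (x, y).

    Assumes x is strictly increasing and has at least two points.
    """
    n = len(x) - 1                       # number of subintervals
    h = [x[i + 1] - x[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (y[i + 1] - y[i]) / h[i] - 3.0 * (y[i] - y[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution gives one cubic polynomial per subinterval.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (y[j + 1] - y[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(t):
        # Pick the subinterval containing t (the end pieces are used outside the data).
        j = min(max(bisect_right(x, t) - 1, 0), n - 1)
        dt = t - x[j]
        return y[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3

    return evaluate

# Hypothetical data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 0.0, 1.0, 0.0]
curve = natural_cubic_spline(x, y)
line = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

This is an interpolating spline, so it joins all of the points exactly. Smoothing splines, which trade closeness of fit against roughness, are a separate technique not shown here.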
LOCALLY WEIGHTED REGRESSION SMOOTHER
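A locally weighted regression smoother, or Loess smoother, fits a weighted regression in the neighbourhood of each point Xi, with weights based on the distance from Xi.
For each point in the neighbourhood of Xi the weights are calculated by a function called the tri-cube weight function, W(u) = (1 - |u|^3)^3 for |u| < 1 and 0 otherwise, where u is the distance of the point from Xi divided by the largest distance in the neighbourhood. The weights therefore decrease as the distance from Xi increases.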
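A simplified Python sketch of a locally weighted regression at a single target point follows. It assumes a neighbourhood made of the q nearest points, tri-cube weights scaled by the largest distance in that neighbourhood, and a weighted least-squares straight line. It omits the robustness iterations that some Loess implementations add. The names `tricube`, `loess_point` and `q` are illustrative.

```python
def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def loess_point(x, y, target, q):
    """Locally weighted linear fit at `target` using the q nearest points.

    Assumes distinct x values and q >= 3.
    """
    nearest = sorted(range(len(x)), key=lambda j: abs(x[j] - target))[:q]
    d_max = max(abs(x[j] - target) for j in nearest)
    w = [tricube((x[j] - target) / d_max) for j in nearest]
    xs = [x[j] for j in nearest]
    ys = [y[j] for j in nearest]
    sw = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, xs)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, ys)) / sw
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, xs))
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, xs, ys))
    if sxx == 0.0:
        return y_bar  # only one point has positive weight: fall back to its value
    slope = sxy / sxx
    return y_bar + slope * (target - x_bar)

# Hypothetical, exactly linear data: y = 1 + 2x.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
fitted = [loess_point(x, y, xi, q=5) for xi in x]
```

Because the local fit is a straight line, exactly linear data are reproduced at every point, including the ends. Points at the edge of the neighbourhood receive weight zero under this scaling.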
MULTIPLE PREDICTORS
The simplest GAMs involve only one predictor (X) variable.
GAMs can be extended to involve multiple predictors X1...Xp, using the same ideas. Smoothers can also be extended and can be seen as acting in p-dimensional space. Although less easy to picture, the same methods apply.
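The text notes that the smooth functions can be estimated one at a time by a scatterplot smoother. One common way of organising this for several predictors is backfitting, which the text above does not name, so the following is an illustrative sketch rather than the method used for the examples here. It assumes an additive model with an identity link, a running-mean smoother and a fixed number of passes. All names are hypothetical.

```python
def running_mean_on(x, r, k=2):
    """Running mean of r against x (up to k neighbours each side), in original order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    out = [0.0] * n
    for pos, i in enumerate(order):
        window = order[max(0, pos - k):min(n, pos + k + 1)]
        out[i] = sum(r[j] for j in window) / len(window)
    return out

def backfit(predictors, y, k=2, n_iter=20):
    """Fit y = alpha + S1(X1) + ... + Sp(Xp) by cycling over the predictors."""
    n = len(y)
    alpha = sum(y) / n
    funcs = [[0.0] * n for _ in predictors]
    for _ in range(n_iter):
        for j, xj in enumerate(predictors):
            # Partial residuals: remove alpha and every smooth term except the j-th.
            partial = [
                y[i] - alpha - sum(f[i] for m, f in enumerate(funcs) if m != j)
                for i in range(n)
            ]
            s = running_mean_on(xj, partial, k)
            centre = sum(s) / n
            funcs[j] = [v - centre for v in s]  # centre each function at zero
    return alpha, funcs

# Hypothetical data: a curved effect of x1 plus a linear effect of x2.
x1 = [float(i) for i in range(20)]
x2 = [float((7 * i) % 20) for i in range(20)]
y = [(a - 9.5) ** 2 / 10.0 + 0.5 * b for a, b in zip(x1, x2)]

alpha, (s1, s2) = backfit([x1, x2], y)
fitted = [alpha + u + v for u, v in zip(s1, s2)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - alpha) ** 2 for yi in y)
```

Each pass smooths the partial residuals against one predictor at a time, so any of the one-dimensional scatterplot smoothers described earlier could be substituted for the running mean.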
DEGREES OF SMOOTHING
The smoothness of a fitted curve depends on the size of the neighbourhood (the span) that is used to calculate the smoothed value at a particular point.
If the neighbourhood is small, the smoothing function will be rough and may have a high variance.
If the neighbourhood is large, the estimate may be too smooth and will not pick up curvature in the underlying function, i.e. it might be biased.
Statistical software usually either
* Provides a default span, or
* Calculates an optimum span by cross-validation.
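Choosing a span by cross-validation can be sketched as follows. The code assumes leave-one-out cross-validation with a running-mean smoother, observations sorted by X, and a span expressed as k neighbours on each side. These choices and all names are assumptions and do not describe how any particular package works.

```python
def loo_running_mean(y, i, k):
    """Predict y[i] from up to k neighbours on each side, leaving y[i] itself out."""
    lo, hi = max(0, i - k), min(len(y), i + k + 1)
    neighbours = [y[j] for j in range(lo, hi) if j != i]
    return sum(neighbours) / len(neighbours)

def cv_score(y, k):
    """Mean squared leave-one-out prediction error for neighbourhood size k."""
    n = len(y)
    return sum((y[i] - loo_running_mean(y, i, k)) ** 2 for i in range(n)) / n

def choose_span(y, candidates):
    """Return the candidate k with the smallest cross-validation score."""
    scores = {k: cv_score(y, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical data: an exactly linear trend, ordered by x.
y = [float(i) for i in range(10)]
best, scores = choose_span(y, candidates=[1, 2, 4])
```

For this noise-free trend the smallest neighbourhood gives the lowest score, because larger ones only add bias at the ends. With noisy data a larger span can score better, since averaging more points reduces variance.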
USES OF GENERALIZED ADDITIVE MODELS
Possible uses of generalized additive models are :
* To investigate the shape of a relationship with a view to a later parametric fit;
* To remove the effect of nuisance variables in order to concentrate on the variables of interest.
ADVANTAGES / DISADVANTAGES OF GAMs
ADVANTAGES
APPLICATION TO SPINAL SURGERY DATA
This example is taken from Hastie and Tibshirani (1990).
Lumbar laminectomy is a corrective spinal surgery commonly performed in children for tumours or for various developmental abnormalities. Following this surgery, spinal deformities can occur, one of which is kyphosis. Kyphosis is defined as a forward flexion of the spine of at least 40 degrees from vertical.
The available predictors are as follows :
STAGES IN FITTING A GAM
STAGES IN FITTING A PARAMETRIC MODEL
If we are interested in prediction, then it is more useful to have a parametric model.
APPLICATION SOFTWARE
Generalized additive models can be fitted using several packages, including Splus and Genstat.
To give all the Splus code to produce the results and graphs used in the kyphosis example would be tiresome. Included are segments of code which can be altered to produce the appropriate results/graphs.
GENSTAT
To run the kyphosis example you can obtain the data.
To give all the Genstat code to produce the results and graphs used in the kyphosis example would be tiresome. Included are segments of code which can be altered to produce the appropriate results/graphs.