Milestone 3: Learning parameters from complete data
First half of November 2003
Theory
The objective of the next two milestone sections is
to learn the model parameters from the observed data.
Recall that your data consists of a sequence of observations,
but you usually don't have any information about
the hidden states (that's why they are called
"hidden").
The general problem of learning the model parameters
when the hidden states are not observed will be
the subject of the next milestone section.
For the time being (to make things easier and
to approach the problem in little steps), let us
assume that we
have a chance to peek at what is happening in the background
and that we
do know the hidden state sequences.
So your data consist of the sequence of observations
and the sequence of hidden states.
Your task is the following:
- Describe the maximum likelihood method
for learning model parameters from data.
- Derive the maximum likelihood equations for the
parameters of your hidden Markov model.
Recall that the set of parameters consists of your
(1) initial, (2) transition, and (3) emission
probabilities.
Tip: Apply the
method of Lagrange multipliers to enforce the constraint
that each probability distribution sums to one.
- Do you see a possible problem with this approach
when the training set is sparse?
Can you think of a possible solution?
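As a rough guide to where the derivation can lead, here is a sketch for the transition probabilities. The notation is an assumption of this sketch and is not fixed by the assignment: pi_i is the initial probability of state i, a_ij the probability of a transition from state i to state j, b_j(k) the probability that state j emits symbol k, and n_i, n_ij, m_jk are the corresponding counts in the training data (sequences starting in i, transitions from i to j, emissions of k from j). The pseudocount alpha in the last line is one common remedy for sparse data, not necessarily the one you will come up with.

```latex
% Complete-data log-likelihood: every factor of the joint probability is observed.
\log L = \sum_i n_i \log \pi_i
       + \sum_{i,j} n_{ij} \log a_{ij}
       + \sum_{j,k} m_{jk} \log b_j(k)

% Lagrangian for row i of the transition matrix (constraint: \sum_j a_{ij} = 1).
\Lambda_i = \sum_j n_{ij} \log a_{ij} + \lambda_i \Bigl(1 - \sum_j a_{ij}\Bigr)

% Stationarity plus the constraint fix the multiplier and the estimate.
\frac{\partial \Lambda_i}{\partial a_{ij}} = \frac{n_{ij}}{a_{ij}} - \lambda_i = 0
\;\Rightarrow\; \lambda_i = \sum_{j'} n_{ij'}, \qquad
\hat{a}_{ij} = \frac{n_{ij}}{\sum_{j'} n_{ij'}}

% The same argument applies to the other two parameter sets.
\hat{\pi}_i = \frac{n_i}{\sum_{i'} n_{i'}}, \qquad
\hat{b}_j(k) = \frac{m_{jk}}{\sum_{k'} m_{jk'}}

% Sparse data: events never seen in training get probability zero.
% Adding a pseudocount \alpha > 0 avoids this (S = number of states).
\hat{a}_{ij} = \frac{n_{ij} + \alpha}{\sum_{j'} n_{ij'} + \alpha S}
```

Note that the unsmoothed estimate is undefined (0/0) for a state that never occurs in the training data, which is a second symptom of the sparse-data problem.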
Practice
Implement the maximum likelihood equations
(and any sparse-data fix you might have come
up with) in software,
and add this module to the
program you have written for
Milestone 1.
Take the sequences you have generated
for Milestone 1,
and estimate the parameters (whose true values you know)
from the data.
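A minimal sketch of what such a module might look like, assuming states and symbols are encoded as integers starting at 0 and that the training data is a list of state sequences paired with a list of observation sequences. The names `estimate_hmm` and `normalize`, the `pseudocount` parameter, and the uniform fallback for rows without data are all choices made for this illustration, not part of the assignment.

```python
def normalize(counts, pseudocount=0.0):
    """Turn a list of counts into probabilities, adding `pseudocount` to each count."""
    total = sum(counts) + pseudocount * len(counts)
    if total == 0:
        # No data at all for this row: fall back to uniform (an arbitrary choice).
        return [1.0 / len(counts)] * len(counts)
    return [(c + pseudocount) / total for c in counts]


def estimate_hmm(state_seqs, obs_seqs, n_states, n_symbols, pseudocount=0.0):
    """Estimate (pi, A, B) by counting events in fully observed sequences."""
    init = [0] * n_states
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_symbols for _ in range(n_states)]
    for states, obs in zip(state_seqs, obs_seqs):
        init[states[0]] += 1                      # first state of each sequence
        for prev, nxt in zip(states, states[1:]):
            trans[prev][nxt] += 1                 # consecutive state pairs
        for s, o in zip(states, obs):
            emit[s][o] += 1                       # (state, symbol) pairs
    pi = normalize(init, pseudocount)
    A = [normalize(row, pseudocount) for row in trans]
    B = [normalize(row, pseudocount) for row in emit]
    return pi, A, B
```

For example, with `state_seqs = [[0, 0, 1], [1, 0, 0]]` and `obs_seqs = [[0, 1, 1], [1, 0, 0]]`, the call `estimate_hmm(state_seqs, obs_seqs, 2, 2)` assigns probability zero to the transition from state 1 to state 1, because it never occurs in the data. With `pseudocount=1.0` the same entry becomes 1/3.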
Recall that the maximum likelihood estimator is consistent.
This means that as your data set grows
very large, the parameters you estimate from the
data converge to the true values.
Test it! If it doesn't work, then there is something
wrong either with your equations, or with your software
implementation.
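One way to run such a check is sketched below, under several assumptions: the two-state, two-symbol model `PI`, `A`, `B` is made up for this illustration, `sample_sequence` is a stand-in for your own Milestone 1 generator, and the estimator is a bare counting estimator without any sparse-data fix. Only the transition and emission probabilities are checked, because a single long sequence contains just one initial state. Testing the initial probabilities would require many separate sequences.

```python
import random


def sample_sequence(pi, A, B, length, rng):
    """Draw one (states, observations) pair from an HMM."""
    states, obs = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(length):
        states.append(state)
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return states, obs


def row_normalize(counts):
    """Divide each row of a count matrix by its row sum (uniform if the row is empty)."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([c / total for c in row] if total else [1.0 / len(row)] * len(row))
    return out


def estimate_A_B(states, obs, n_states, n_symbols):
    """Counting estimates of the transition and emission matrices."""
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_symbols for _ in range(n_states)]
    for prev, nxt in zip(states, states[1:]):
        trans[prev][nxt] += 1
    for s, o in zip(states, obs):
        emit[s][o] += 1
    return row_normalize(trans), row_normalize(emit)


def max_abs_error(true, est):
    """Largest absolute difference between two matrices of equal shape."""
    return max(abs(t - e) for tr, er in zip(true, est) for t, e in zip(tr, er))


# Hypothetical model, used only for this check.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]


def consistency_errors(lengths, seed=0):
    """Worst-case estimation error for each sequence length in `lengths`."""
    rng = random.Random(seed)
    errors = []
    for n in lengths:
        states, obs = sample_sequence(PI, A, B, n, rng)
        A_hat, B_hat = estimate_A_B(states, obs, 2, 2)
        errors.append(max(max_abs_error(A, A_hat), max_abs_error(B, B_hat)))
    return errors


if __name__ == "__main__":
    lengths = [100, 10_000, 100_000]
    for n, err in zip(lengths, consistency_errors(lengths)):
        print(n, round(err, 4))
```

If the printed error does not shrink towards zero as the length grows, that points to a bug in either the estimator or the generator.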
In practice, however, we usually don't have very large data
sets available, and it is therefore of interest
to investigate how the
accuracy of parameter estimation deteriorates
as your training set size decreases.
For this reason, repeat your parameter estimation
- for different problems (recall that
for Milestone 1,
you have generated data for three different
scenarios of different complexity), and
- for different training set sizes.
Plot how much the parameter estimates you get
from your program
differ from the true parameter values
as a function of the training set size.
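To plot a difference you first need to reduce it to a number. The sketch below shows one possible choice, the mean absolute error over all entries of a parameter vector or matrix, together with a helper that averages this error over repeated runs for each training set size. The function names and the choice of error measure are assumptions of this sketch, and other measures (for example the maximum absolute error) would serve equally well.

```python
from statistics import mean, stdev


def flatten(params):
    """Flatten a probability vector or matrix (nested lists) into one list of floats."""
    flat = []
    for item in params:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(float(item))
    return flat


def mean_abs_error(true_params, est_params):
    """Mean absolute difference between true and estimated parameters."""
    t, e = flatten(true_params), flatten(est_params)
    if len(t) != len(e):
        raise ValueError("parameter sets have different shapes")
    return mean(abs(a - b) for a, b in zip(t, e))


def summarize(errors_by_size):
    """Map {training set size: [error per repetition]} to sorted (size, mean, spread) rows."""
    rows = []
    for size in sorted(errors_by_size):
        errs = errors_by_size[size]
        spread = stdev(errs) if len(errs) > 1 else 0.0
        rows.append((size, mean(errs), spread))
    return rows
```

The rows returned by `summarize` can be passed to whatever plotting tool you prefer, with the training set size on the horizontal axis. Logarithmic axes are a reasonable choice, since the error of a well-behaved estimator typically shrinks roughly like one over the square root of the training set size.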
Outlook:
The subject of the next milestone section is to
look into how
to learn the parameters for the more realistic
scenario in which we don't know the hidden state
sequences.