GMCD is a free software package, written in C, for modelling conditional probability densities with Gaussian mixture networks of the type discussed in Neural Networks 10 (3), 479-497 and Neural Networks 11 (1), 89-116. The software package was developed solely as a model exploration tool, and is provided without guarantee of maintenance or support, and without warranty. The copyright holder is not liable for any damages which may result in any manner from the use of this software.
Downloading and Extracting the Files
The programs and documentation files are contained in a zipped tar file called EM.tar.gz. To extract them under UNIX, give the commands
% gunzip EM.tar.gz
% tar xvf EM.tar
If you use Windows, use a program such as WinZip. A successful extraction of the files creates the following directories:
| Prog96 | C programs (Version Nov. 1996) |
| Manual | Documentation (in HTML) |
| Examples | Input files for an example application |
| Examples_output | Output files, obtained from running the gcc-compiled programs on a SUN ULTRA-10, using the input files from the directory Examples. |
| Matlab | MATLAB files for plotting the results, that is, the data in the output files. If you do not have a MATLAB licence, you can use any other software for plotting graphs, such as EXCEL or S-PLUS. |
Compiling and Installing the Software
Decide on a name for the directory in which you want to keep the C programs (the relative path, by default, is Prog96), and add this name to your command path. For example, if you use UNIX with the C shell, put the following command into your .login file:
setenv PATH ${PATH}:new_path
where new_path
is the name of the directory into which you have copied
the program files.
Then, log out and log in again, or just give the
command
% source .login
If you have the GNU compiler gcc, you should be able to compile the software under UNIX by typing:
% em.make
This creates a file em.exe, which needs to be made
executable by typing
% chmod +x em.exe
Edit the Makefile em.make
if you are using a different compiler.
Input Files
When you run the program with
% em.exe
it reads in the following input files:
Note that, except for x.in, each row in the input files ends with a string describing the parameters read in from that row. This string must not contain any blanks, so use underscores to separate words.
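An initial network configuration can be read in from w_0.net (this requires flag_load_net=1).
Output Files
The program writes the following output files: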
| topo.out | Information about the network topology |
| data.out | Information about the data format |
| simu.out | Information about the training simulation |
| E.out | Evolution of the energy E (the negative normalised log likelihood) during training. 1st column: epoch number; 2nd column: E on the training set; 3rd column: E on the training set plus penalty term; 4th column: E on the cross-validation set. The energies are printed out every printout_E_step-th epoch. |
| priors.out | Evolution of the mixing coefficients (the prior on the kernels, a_k) during training. 1st column: epoch number. The remaining columns show the mixing coefficients, a_1,...,a_K. |
| sigmas.out | Evolution of the kernel standard deviations (sigma_k) during training. 1st column: epoch number. Remaining columns: sigma_1,...,sigma_K |
| predict.out | Targets (1st column) and kernel centres (columns 2 to K+1, where K is the number of kernel nodes in the second hidden layer), from which you can obtain a state-space plot. Note that this file is only created if flag_end=1. |
| condpro.out | Cross-section of the conditional probability density P(y|x) for the input vector x specified in x.in. 1st column: target variable y; 2nd column: P(y|x). Note that this file is only created if flag_end=2. |
| w_1.net | Trajectory of the network weights. The weights are printed out every printout_w_step-th epoch. |
Example
The directory Examples demonstrates an application of the programs to a synthetic time series generated from the logistic-kappa map. This time series has a bimodal conditional probability distribution and was used as a benchmark problem in Neural Networks 10 (3), 479-497 and Neural Networks 11 (1), 89-116. The training data are contained in train.in, the cross-validation data in cross.in. The function and format of the other input files were explained above.
Under UNIX, you
run the program with the following command
(recall that the directory with the executable program
must have been added to the command path):
% em.exe
If you have not changed the input files, this generates a Gaussian mixture RVFL network with 10 kernel nodes and optimises the parameters in a training simulation over 50 EM steps. The output files that I obtained when running this simulation on a SUN ULTRA-10 are kept in the directory Examples_output.
If you run the programs on a different machine or with a different compiler, your results may differ. This is mainly due to the stochasticity inherent in the RVFL approach, in which the weights feeding into the first hidden layer are drawn at random and kept fixed during training.
Note that changing the random number seed also leads to different results. The general strategy is to run the training simulation several times and to select the model with the lowest cross-validation error. A more sophisticated approach is model averaging, as discussed in my book 'Neural Networks for Conditional Probability Estimation', Perspectives in Neural Computing, Springer-Verlag. This can be done with the program emshow.c. The current manual does not contain any information on this option, so consult the comments in emshow.c for further documentation.
The directory Matlab
contains MATLAB programs for visualising the results.
Copy these programs into the same directory
as the output files, and then run them under MATLAB.
Edit the file E.out and delete the last line,
which should read something like
Epot_0= -1.07705.
Then, give the command
MATLAB> plot_E
This reads in E.out and shows the evolution of the energy E on the training and the cross-validation sets during training.
With
MATLAB> plot_predictions
you obtain a state-space plot of the targets and kernel centres (read in from predict.out).
The command
MATLAB> plot_priors
plots the evolution of the mixing coefficients during training (read in from priors.out).
To produce a cross-section of the conditional probability density after a completed training simulation, copy w_1.net to w_0.net and set
number_of_epochs=0
(because you do not want to do any further training),
flag_load_net=1
(to read in the weights from file), and
flag_end=2
(to compute the cross-section for the input vector specified in x.in).
Then run em.exe. The results are written to the file condpro.out, which can be plotted with the command:
MATLAB> plot_condpro
Last modified: 19 May 2000