| 1 1 1.0 1.0 | flag_EM_1_ML_0____do_EM_step____lr_sigma_prior |
| 0 1 | flag_CG_____flag_hessian |
| 10 1.0 0.0001 0.0001 | EMIN_CG__newstart__initstep__tol_linesearch__tol_convergence |
| 0.00 0.9 0.0 | EMIN_STANDARD______lr___momentum___w_perturb_sigma |
| 0 0.0 | flag_and_prob_pattern_rejection |
| 0 0.0 | flag_weight_decay______weight_decay_parameter |
| 51 | number_of_epochs |
| 2 | printout_E_step |
| 50 | printout_w_step |
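For readers who want to inspect or generate such a file from a script, the following is a minimal sketch of a reader for the layout shown above. It assumes that each row holds its numeric values first, followed by the descriptive names, and that any vertical bars are only separators. The function names parse_parameter_line and parse_parameter_file are hypothetical and not part of the program described here.

```python
def _number(token):
    """Convert a token to int if possible, otherwise to float."""
    try:
        return int(token)
    except ValueError:
        return float(token)  # raises ValueError for non-numeric tokens


def parse_parameter_line(line):
    """Return (values, label) for one row: leading numbers, then the names."""
    tokens = line.replace("|", " ").split()  # assumption: bars are separators only
    values = []
    for i, token in enumerate(tokens):
        try:
            values.append(_number(token))
        except ValueError:
            # First non-numeric token starts the descriptive label.
            return values, " ".join(tokens[i:])
    return values, ""


def parse_parameter_file(text):
    """Return one (values, label) pair per non-empty line, in row order."""
    return [parse_parameter_line(ln) for ln in text.splitlines() if ln.strip()]


example = """\
| 0 1 | flag_CG_____flag_hessian |
| 10 1.0 0.0001 0.0001 | EMIN_CG__newstart__initstep__tol_linesearch__tol_convergence |
| 51 | number_of_epochs |
"""
rows = parse_parameter_file(example)
for values, label in rows:
    print(values, label)
```

The label is kept as a single string because the names are joined by runs of underscores and cannot be split reliably.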
Row 1
The first number is a flag that toggles between
gradient descent (0) and the
EM algorithm (1). EM only improves on
gradient descent if you
choose an RVFL network, that is, if you set the first
number in the last row of topo.in
to 1. The remaining parameters are best left unchanged.
Row 2
Flag variables for second-order schemes.
0 0) No second-order scheme is applied; the parameters are
optimised with gradient descent.
1 0) Parameter optimisation with
conjugate gradients.
0 1) Parameter optimisation with the Newton method.
This optimisation scheme only makes sense in an RVFL network,
that is, if you set the first
number in the last row of topo.in
to 1.
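The three documented flag combinations can be summarised as a small lookup. This is only an illustrative sketch: the function name select_optimiser is hypothetical, and because the combination 1 1 is not described here, the sketch rejects it instead of guessing its meaning.

```python
def select_optimiser(flag_cg, flag_hessian):
    """Map the two flags of row 2 to the optimisation scheme they select."""
    schemes = {
        (0, 0): "gradient descent",
        (1, 0): "conjugate gradients",
        (0, 1): "Newton method",
    }
    try:
        return schemes[(flag_cg, flag_hessian)]
    except KeyError:
        # Assumption: any other combination is treated as an input error.
        raise ValueError(f"undocumented flag combination: {flag_cg} {flag_hessian}")


print(select_optimiser(0, 1))
```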
Row 3
These parameters concern training with conjugate gradients.
The first parameter defines how often the search directions
are reset. The second parameter specifies the initial step
size for the line search. The two remaining numbers
set the tolerance for the line search and the overall convergence.
These parameters are ignored if training is not performed
with conjugate gradients.
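To show how four parameters of this kind typically interact, here is a minimal sketch of nonlinear conjugate gradients. Several choices in it are assumptions, since the text above does not specify them: the Polak-Ribiere update, the golden-section line search on the interval from 0 to initstep, the reading of newstart as "restart every newstart iterations", and the use of the gradient norm as the convergence test. The program described here may well differ in each of these points.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def golden_section(phi, upper, tol):
    """Minimise a unimodal function phi on [0, upper] to within tol."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, upper
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def conjugate_gradients(f, grad, w, newstart, initstep,
                        tol_linesearch, tol_convergence, max_iter=1000):
    g = grad(w)
    d = [-gi for gi in g]                      # start with steepest descent
    for it in range(1, max_iter + 1):
        if math.sqrt(dot(g, g)) < tol_convergence:
            break
        # Line search along d (assumed bracket: [0, initstep]).
        alpha = golden_section(
            lambda a: f([wi + a * di for wi, di in zip(w, d)]),
            initstep, tol_linesearch)
        w = [wi + alpha * di for wi, di in zip(w, d)]
        g_new = grad(w)
        if it % newstart == 0:
            beta = 0.0                         # periodic reset of the direction
        else:
            diff = [gn - go for gn, go in zip(g_new, g)]
            beta = max(0.0, dot(g_new, diff) / dot(g, g))   # Polak-Ribiere
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(d, g_new) >= 0.0:               # safeguard: not a descent direction
            d = [-gn for gn in g_new]
        g = g_new
    return w


# Toy quadratic energy with its minimum at the origin.
energy = lambda w: w[0] ** 2 + 10.0 * w[1] ** 2
energy_grad = lambda w: [2.0 * w[0], 20.0 * w[1]]
w_min = conjugate_gradients(energy, energy_grad, [1.0, 1.0],
                            newstart=10, initstep=1.0,
                            tol_linesearch=0.0001, tol_convergence=0.0001)
print(w_min)
```

The parameter values in the call are those of row 3 in the file shown above; the quadratic energy is only a stand-in for a network error function.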
Row 4
Training with simple gradient descent.
The three parameters define the learning rate,
the momentum term, and the standard deviation
of an additive Gaussian noise term (for Langevin
updating, possibly useful for avoiding local minima).
These parameters are ignored if training is not performed
with gradient descent.
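One common form of such an update is sketched below: the step is the momentum-weighted previous step minus the learning rate times the gradient, and zero-mean Gaussian noise with the given standard deviation is added to each weight. The exact update rule used by the program is not stated here, so this form, the function name gradient_descent_step, and the toy energy are assumptions for illustration.

```python
import random


def gradient_descent_step(w, velocity, grad, lr, momentum, w_perturb_sigma, rng):
    """One gradient-descent update with momentum and optional Langevin noise."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    new_w = []
    for wi, v in zip(w, new_velocity):
        # Additive Gaussian perturbation; skipped when the sigma is zero.
        noise = rng.gauss(0.0, w_perturb_sigma) if w_perturb_sigma > 0 else 0.0
        new_w.append(wi + v + noise)
    return new_w, new_velocity


# Toy example: minimise E(w) = 0.5 * w^2, whose gradient is w itself.
rng = random.Random(0)
w, velocity = [1.0], [0.0]
for _ in range(300):
    w, velocity = gradient_descent_step(w, velocity, grad=w, lr=0.1,
                                        momentum=0.9, w_perturb_sigma=0.0, rng=rng)
print(w)
```

With a non-zero w_perturb_sigma the weights no longer settle exactly but keep fluctuating around a minimum, which is what can let them leave a shallow local minimum.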
Row 5
The first parameter is a flag variable. If it is set to 1,
each training exemplar is discarded with the probability given
by the second number. This decreases the effective size of the
training set, thereby accelerating training and implicitly
adding noise that might help to avoid local minima.
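The effect can be sketched as a random filter over the training set. Whether the program draws a fresh subset in every epoch is not stated here; the sketch assumes it does, and the function name subsample_patterns is hypothetical.

```python
import random


def subsample_patterns(patterns, flag, reject_prob, rng):
    """Return the patterns kept for one epoch under pattern rejection."""
    if flag != 1:
        return list(patterns)          # rejection switched off: keep everything
    # Each pattern is discarded independently with probability reject_prob.
    return [p for p in patterns if rng.random() >= reject_prob]


rng = random.Random(42)
training_set = list(range(1000))
for epoch in range(3):
    kept = subsample_patterns(training_set, flag=1, reject_prob=0.5, rng=rng)
    print(epoch, len(kept))
```

Because a different subset is drawn each time, the gradient estimate varies from epoch to epoch, which is the implicit noise mentioned above.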
Row 6
Weight decay. If the first number (a flag variable) is set to 1,
a weight-decay penalty term is added, with a weight-decay
constant given by the second number.
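A common form of the penalty is half the weight-decay constant times the sum of squared weights, which adds the constant times each weight to the gradient. The precise form used by the program is not given here, so the quadratic penalty and the function names below are assumptions.

```python
def penalised_energy(energy, weights, flag, decay):
    """Energy plus an assumed quadratic weight-decay penalty."""
    if flag != 1:
        return energy
    return energy + 0.5 * decay * sum(w * w for w in weights)


def penalised_gradient(grad, weights, flag, decay):
    """Gradient of the penalised energy with respect to the weights."""
    if flag != 1:
        return list(grad)
    return [g + decay * w for g, w in zip(grad, weights)]


weights = [1.0, 2.0]
print(penalised_energy(1.0, weights, flag=1, decay=0.1))
print(penalised_gradient([0.0, 0.0], weights, flag=1, decay=0.1))
```

The extra gradient term pulls every weight towards zero in proportion to its size, so larger constants give smoother, more strongly regularised networks.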
Row 7
Number of epochs, that is, complete passes through the training set.
Row 8
Energies are printed to the file E.out every Nth step,
where N is the number given in this row.
Row 9
Weights are printed to the file w_1.net every Nth step,
where N is the number given in this row.