simu.in

This input file sets the parameters for the training simulation. Here is an example, taken from the kappa problem:

1 1 1.0 1.0 flag_EM_1_ML_0____do_EM_step____lr_sigma_prior
0 1 flag_CG_____flag_hessian
10 1.0 0.0001 0.0001 EMIN_CG__newstart__initstep__tol_linesearch__tol_convergence
0.00 0.9 0.0 EMIN_STANDARD______lr___momentum___w_perturb_sigma
0 0.0 flag_and_prob_pattern_rejection
0 0.0 flag_weight_decay______weight_decay_parameter
51 number_of_epochs
2 printout_E_step
50 printout_w_step
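
The row layout of the example above can be read with a short script. The following Python sketch is only a reading aid, not part of the simulator. It assumes that every row consists of whitespace-separated numbers followed by one descriptive label; the function names parse_simu_row and parse_simu are hypothetical.

```python
def parse_simu_row(line):
    """Split one row into its leading numeric values and its trailing label."""
    values = []
    tokens = line.split()
    for i, token in enumerate(tokens):
        try:
            values.append(float(token))
        except ValueError:
            # The first non-numeric token starts the descriptive label.
            return values, " ".join(tokens[i:])
    return values, ""


def parse_simu(text):
    """Parse all non-empty rows of a simu.in-style text (assumed layout)."""
    return [parse_simu_row(line) for line in text.splitlines() if line.strip()]


# The example from above, copied verbatim.
EXAMPLE = """\
1 1 1.0 1.0 flag_EM_1_ML_0____do_EM_step____lr_sigma_prior
0 1 flag_CG_____flag_hessian
10 1.0 0.0001 0.0001 EMIN_CG__newstart__initstep__tol_linesearch__tol_convergence
0.00 0.9 0.0 EMIN_STANDARD______lr___momentum___w_perturb_sigma
0 0.0 flag_and_prob_pattern_rejection
0 0.0 flag_weight_decay______weight_decay_parameter
51 number_of_epochs
2 printout_E_step
50 printout_w_step
"""

rows = parse_simu(EXAMPLE)
for number, (values, label) in enumerate(rows, start=1):
    print("Row", number, values, label)
```

The values are returned as floats for simplicity; flags and counts would need converting to integers before use.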

Row 1
The first number is a flag that toggles between gradient descent (0) and the EM algorithm (1). The EM algorithm improves on gradient descent only if you choose an RVFL network, that is, if you set the first number in the last row of topo.in to 1. The remaining parameters are best left unchanged.

Row 2
Flag variables for second-order schemes.
0 0) No second-order scheme is applied; the parameters are optimised with gradient descent.
1 0) The parameters are optimised with conjugate gradients.
0 1) The parameters are optimised with the Newton method. This scheme only makes sense in an RVFL network, that is, if you set the first number in the last row of topo.in to 1.
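
The flag combinations in Row 2 amount to a small lookup table. The Python sketch below illustrates this; the function name optimiser_from_flags is hypothetical. The combination 1 1 is not described here, so the sketch rejects it. That is an assumption of the sketch, not documented behaviour of the program.

```python
def optimiser_from_flags(flag_cg, flag_hessian):
    """Map the two Row 2 flags to the optimisation scheme described in the text."""
    schemes = {
        (0, 0): "gradient descent",
        (1, 0): "conjugate gradients",
        (0, 1): "Newton method",
    }
    key = (flag_cg, flag_hessian)
    if key not in schemes:
        # Assumption: undocumented combinations such as (1, 1) are treated as errors.
        raise ValueError("undocumented flag combination: %r" % (key,))
    return schemes[key]


# Row 2 of the example file reads "0 1".
print(optimiser_from_flags(0, 1))
```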

Row 3
These parameters concern training with conjugate gradients. The first parameter defines how often the search directions are reset. The second parameter specifies the initial step size for the line search. The two remaining numbers set the tolerance for the line search and the overall convergence. These parameters are ignored if training is not performed with conjugate gradients.

Row 4
Training with simple gradient descent. The three parameters define the learning rate, the momentum term, and the standard deviation of an additive Gaussian noise term (for Langevin updating, possibly useful for avoiding local minima). These parameters are ignored if training is not performed with gradient descent.
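
To show how the three Row 4 parameters interact, the Python sketch below uses one common form of a gradient descent update with momentum and additive Gaussian noise. The exact update rule used by the program is not stated here, so the formula and the name gradient_step are assumptions.

```python
import random


def gradient_step(w, grad, velocity, lr, momentum, sigma, rng=random):
    """One update of all weights (assumed form, see the comments below).

    w, grad and velocity are lists of floats of equal length.
    """
    # Assumed momentum rule: the new step blends the previous step
    # with the negative gradient scaled by the learning rate.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    # Assumed Langevin-style perturbation: zero-mean Gaussian noise
    # with standard deviation sigma is added to every weight.
    new_w = [
        wi + vi + (rng.gauss(0.0, sigma) if sigma > 0.0 else 0.0)
        for wi, vi in zip(w, new_velocity)
    ]
    return new_w, new_velocity


# Noise switched off (sigma = 0.0), as in the example file.
w, v = gradient_step([1.0], [2.0], [0.5], lr=0.1, momentum=0.9, sigma=0.0)
print(w, v)
```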

Row 5
The first parameter is a flag variable. If it is set to 1, training exemplars are discarded with a probability given by the second number. This decreases the effective size of the training set, thereby accelerating training and implicitly adding noise that might help to avoid local minima.
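
Pattern rejection as described for Row 5 can be sketched in a few lines of Python. The sketch assumes that each exemplar is discarded independently; whether the program draws a new subset in every epoch is not stated here. The name reject_patterns is hypothetical.

```python
import random


def reject_patterns(patterns, flag, prob, rng):
    """Return the exemplars that survive rejection.

    Each exemplar is dropped independently with probability prob
    (assumption); with flag != 1 the training set is left untouched.
    """
    if flag != 1:
        return list(patterns)
    # rng.random() lies in [0, 1), so prob = 0.0 keeps everything
    # and prob = 1.0 discards everything.
    return [p for p in patterns if rng.random() >= prob]


rng = random.Random(0)
training_set = list(range(1000))
kept = reject_patterns(training_set, flag=1, prob=0.3, rng=rng)
print(len(kept), "of", len(training_set), "exemplars kept")
```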

Row 6
Weight decay. If the first number (a flag variable) is set to 1, a weight-decay penalty term is added, with a weight-decay constant given by the second number.
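
As an illustration of the Row 6 penalty, the Python sketch below adds a quadratic weight-decay term to an error value. The quadratic form with a factor of 1/2 is a common convention and an assumption here; the exact penalty used by the program is not stated in this text. The name penalised_error is hypothetical.

```python
def penalised_error(error, weights, flag, decay):
    """Add an (assumed) quadratic weight-decay penalty to the error.

    The penalty 0.5 * decay * sum(w**2) is applied only if flag == 1.
    """
    if flag != 1:
        return error
    return error + 0.5 * decay * sum(w * w for w in weights)


# With the flag set to 0, as in the example file, the error is unchanged.
print(penalised_error(1.0, [1.0, 2.0], flag=0, decay=0.0))
# With the flag set to 1, large weights are penalised.
print(penalised_error(1.0, [1.0, 2.0], flag=1, decay=0.1))
```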

Row 7
Number of epochs, that is, the number of passes through the training set.

Row 8
Energies are printed to the file E.out every Nth step, where N is the number given in this row.

Row 9
Weights are printed to the file w_1.net every Nth step, where N is the number given in this row.


Last modified: 19 May 2000