Your data

Your own data must be saved in the file mydata.dat. This file must contain a rectangular matrix of numbers (every row has the same number of entries, with no empty positions), with the nucleotides encoded as follows: 1->A, 2->C, 3->G, 4->T. Missing values are represented by 0s. If a zero is detected, the respective column in the alignment will be discarded. The first number in the first line must be the number of species in the alignment. The rest of that line needs to be padded with dummy values (which are required to read the matrix into MATLAB, but which are not used by the program). If you have an alignment of m species, the matrix must contain (n*m+1) rows, where n is a positive integer. The program first reads the first line to identify the number of species. It then reads lines 2->(m+1) and assigns them to species 1,...,m. It then continues with lines (m+2)->(2*m+1), which are again assigned to species 1,...,m, and so on. Here is a simple example for the hypothetical case where a user has an alignment of 4 species that is 18 base pairs long. The simplest format is:

4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 2 1 4 2 2 1 1 3 3 2 3 2 3
1 2 1 1 1 2 2 4 2 4 1 1 3 3 2 3 2 1
1 1 1 1 3 2 2 1 2 4 1 1 3 3 1 3 2 3
1 3 1 1 1 2 1 4 2 3 1 1 3 3 3 3 2 3
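
If your alignment is stored as strings of letters, a small script can produce the numeric matrix. The following Python sketch is not part of the program; the function name encode_alignment is hypothetical, and the sketch assumes that all sequences have the same length and that any character other than A, C, G or T should be written as 0 (missing).

```python
# Hypothetical helper, not part of the program described on this page.
CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode_alignment(sequences):
    """Return the text lines of a mydata.dat-style matrix (one row per species)."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    # First row: number of species, padded with dummy zeros to the full width.
    rows = [[len(sequences)] + [0] * (length - 1)]
    for seq in sequences:
        # Assumption: anything that is not A, C, G or T becomes 0 (missing).
        rows.append([CODES.get(base, 0) for base in seq.upper()])
    # Blanks between the numbers separate the nucleotides.
    return [" ".join(str(value) for value in row) for row in rows]


if __name__ == "__main__":
    # The four sequences of the example above, written as letters.
    alignment = [
        "AAAAACATCCAAGGCGCG",
        "ACAAACCTCTAAGGCGCA",
        "AAAAGCCACTAAGGAGCG",
        "AGAAACATCGAAGGGGCG",
    ]
    print("\n".join(encode_alignment(alignment)))
```

Redirecting the printed lines into mydata.dat gives a file in the simplest format.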

Note that the blanks between the numbers are necessary to separate the nucleotides. The first line indicates the number of species, 4 in this example. The program then reads in four lines, which represent the DNA sequences of the species of interest. If these sequences are very long, it might be more convenient to use line-wrapping. Here is an alternative format of the input data:

4 0 0 0 0 0 0 0 0 0
1 1 1 1 1 2 1 4 2 2
1 2 1 1 1 2 2 4 2 4
1 1 1 1 3 2 2 1 2 4
1 3 1 1 1 2 1 4 2 3
1 1 3 3 2 3 2 3 0 0
1 1 3 3 2 3 2 1 0 0
1 1 3 3 1 3 2 3 0 0
1 1 3 3 3 3 2 3 0 0

Lines 2-5 belong to species 1-4, respectively. Then comes the line wrap: line 6 belongs to species 1, line 7 to species 2, and so on. Note that the matrix must not contain any empty entries; therefore the last columns must be padded with zeros. Because these padding columns contain zeros, they are discarded like any other column with a missing value.
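
To check a file before running the program, the reading rule described here can be reproduced outside MATLAB. The following Python sketch is an illustration only, not the program's own code; the function name decode_matrix is hypothetical. It assumes the file content is passed in as text, reads the number of species from the first line, joins the line-wrapped blocks per species, and drops every column that contains a zero.

```python
# Hypothetical checker, not part of the program described on this page.
LETTERS = {1: "A", 2: "C", 3: "G", 4: "T"}


def decode_matrix(text):
    """Return one letter sequence per species from mydata.dat-style text."""
    rows = [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty input")
    m = rows[0][0]          # number of species; rest of the first line is padding
    body = rows[1:]
    if m < 1 or not body or len(body) % m != 0:
        raise ValueError("expected n*m + 1 rows for m species")
    # Row k within each block of m rows belongs to species k.
    species = [[] for _ in range(m)]
    for index, row in enumerate(body):
        species[index % m].extend(row)
    # Discard every alignment column in which any species has a 0.
    kept = [col for col in zip(*species) if all(value != 0 for value in col)]
    # Values outside 1-4 raise a KeyError here, which flags a malformed file.
    return ["".join(LETTERS[col[k]] for col in kept) for k in range(m)]


if __name__ == "__main__":
    wrapped = """4 0 0 0 0 0 0 0 0 0
1 1 1 1 1 2 1 4 2 2
1 2 1 1 1 2 2 4 2 4
1 1 1 1 3 2 2 1 2 4
1 3 1 1 1 2 1 4 2 3
1 1 3 3 2 3 2 3 0 0
1 1 3 3 2 3 2 1 0 0
1 1 3 3 1 3 2 3 0 0
1 1 3 3 3 3 2 3 0 0"""
    for name, seq in enumerate(decode_matrix(wrapped), start=1):
        print("species", name, seq)
```

Applied to the line-wrapped example, this recovers the same four 18-base sequences as the simplest format, since the two zero-padded columns are dropped.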


Last modified: Wed May 3 14:17:36 BST