javac RemoveGaps.java
To run the program, just type:
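java RemoveGaps
The program reads a DNA sequence alignment in BAMBE format
from the file dna.dat.
It discards every site (alignment column) that contains a symbol
other than a nucleotide, such as a gap. The results are written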
out to file dna_no_gaps.dat, which
is also in BAMBE format.
The program also writes
out a file called gapSites.out,
which contains a list of all the sites in the original
alignment that have been discarded.
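The filtering step described above can be sketched as follows. This is a minimal illustration, not the actual RemoveGaps source: the class and method names are hypothetical, and it assumes that only A, C, G and T (in either case) count as nucleotides and that all sequences have the same length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the column filter; not the actual RemoveGaps source. */
class GapSiteFilterSketch {

    /** Assumption: only A, C, G and T (either case) count as nucleotides. */
    static boolean isNucleotide(char c) {
        return "ACGTacgt".indexOf(c) >= 0;
    }

    /** Returns the 1-based numbers of all columns containing a non-nucleotide symbol. */
    static List<Integer> findGapSites(List<String> sequences) {
        List<Integer> sites = new ArrayList<>();
        int length = sequences.get(0).length();
        for (int col = 0; col < length; col++) {
            for (String seq : sequences) {
                if (!isNucleotide(seq.charAt(col))) {
                    sites.add(col + 1);
                    break; // one offending symbol is enough to discard the column
                }
            }
        }
        return sites;
    }

    /** Returns copies of the sequences with the given 1-based columns removed. */
    static List<String> removeSites(List<String> sequences, List<Integer> sites) {
        List<String> result = new ArrayList<>();
        for (String seq : sequences) {
            StringBuilder kept = new StringBuilder();
            for (int col = 0; col < seq.length(); col++) {
                if (!sites.contains(col + 1)) {
                    kept.append(seq.charAt(col));
                }
            }
            result.add(kept.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy alignment: columns 2 and 4 each contain a gap in some sequence.
        List<String> aligned = List.of("ACG-T", "ACGTT", "A-GTT");
        List<Integer> sites = findGapSites(aligned);
        System.out.println(sites);
        System.out.println(removeSites(aligned, sites));
    }
}
```

In this sketch the list returned by findGapSites plays the role of gapSites.out, and the list returned by removeSites plays the role of the sequences in dna_no_gaps.dat.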
Example: suppose you have the following DNA sequence alignment:
4 50
strain1 TGGGGCAAAA -TTTAGTCAA TTTTCGTAAC TTTTTTATTT TGAAAAATTC
strain2 TGGGGCAAAA TTTTAGTCAA TTTTCGTAAC TTTTTTATTT TGAAAAATTT
strain3 TGGGGCAAAA TTTTAGTCAA TTCTCATAAC TTTTTTATTT TGAAAAATTT
strain4 TGGGGCAAAA -TTTAGTCAA TTTTCATAAC TTTTTTATTT TGAAAAAT-C
First, transform this file into
BAMBE format, and call this new file
dna.dat:
4 50
strain1
TGGGGCAAAA-TTTAGTCAATTTTCGTAACTTTTTTATTTTGAAAAATTC
strain2
TGGGGCAAAATTTTAGTCAATTTTCGTAACTTTTTTATTTTGAAAAATTT
strain3
TGGGGCAAAATTTTAGTCAATTCTCATAACTTTTTTATTTTGAAAAATTT
strain4
TGGGGCAAAA-TTTAGTCAATTTTCATAACTTTTTTATTTTGAAAAAT-C
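The transformation shown above can also be done programmatically. The following is a rough sketch that infers the BAMBE layout only from the example listing (header line unchanged, then each name on its own line followed by the unbroken sequence). The class and method names are hypothetical, and it assumes that every input line holds one complete sequence and that names contain no spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter, inferred from the example listing only. */
class PhylipToBambeSketch {

    /** Converts lines of the blocked format into name/sequence line pairs. */
    static List<String> toBambe(List<String> inputLines) {
        List<String> out = new ArrayList<>();
        out.add(inputLines.get(0).trim()); // header: number of sequences and sites
        for (int i = 1; i < inputLines.size(); i++) {
            String line = inputLines.get(i).trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = line.split("\\s+");
            out.add(tokens[0]); // sequence name on its own line
            StringBuilder sequence = new StringBuilder();
            for (int t = 1; t < tokens.length; t++) {
                sequence.append(tokens[t]); // join the blocks of 10
            }
            out.add(sequence.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "2 20",
            "strain1 TGGGGCAAAA -TTTAGTCAA",
            "strain2 TGGGGCAAAA TTTTAGTCAA");
        for (String line : toBambe(input)) {
            System.out.println(line);
        }
    }
}
```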
Now give the command:
java RemoveGaps
This gives you two output files.
The file dna_no_gaps.dat
contains the ungapped DNA sequence alignment
(in BAMBE format):
4 48
strain1
TGGGGCAAAATTTAGTCAATTTTCGTAACTTTTTTATTTTGAAAAATC
strain2
TGGGGCAAAATTTAGTCAATTTTCGTAACTTTTTTATTTTGAAAAATT
strain3
TGGGGCAAAATTTAGTCAATTCTCATAACTTTTTTATTTTGAAAAATT
strain4
TGGGGCAAAATTTAGTCAATTTTCATAACTTTTTTATTTTGAAAAATC
On transforming this into PHYLIP format,
you get:
4 48
strain1 TGGGGCAAAA TTTAGTCAAT TTTCGTAACT TTTTTATTTT GAAAAATC
strain2 TGGGGCAAAA TTTAGTCAAT TTTCGTAACT TTTTTATTTT GAAAAATT
strain3 TGGGGCAAAA TTTAGTCAAT TCTCATAACT TTTTTATTTT GAAAAATT
strain4 TGGGGCAAAA TTTAGTCAAT TTTCATAACT TTTTTATTTT GAAAAATC
Note that the second number in the first row, which
gives the number of nucleotides, has been corrected
since two columns with gaps have been removed.
The numbers of the columns that have been discarded are
written out to file gapSites.out:
11
49
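The step of transforming the ungapped BAMBE file back into the blocked PHYLIP-style layout can be sketched in the same way. This is only an illustration: the class and method names are hypothetical, and the layout (name, then space-separated blocks of 10 nucleotides) is copied from the listings in this example. Strict PHYLIP readers may additionally expect names padded to a fixed width, which this sketch does not do.

```java
/** Hypothetical formatter, modelled on the listings in this example only. */
class BambeToPhylipSketch {

    /** Builds one output line: the name followed by blocks of blockSize symbols. */
    static String toPhylipLine(String name, String sequence, int blockSize) {
        StringBuilder line = new StringBuilder(name);
        for (int start = 0; start < sequence.length(); start += blockSize) {
            int end = Math.min(start + blockSize, sequence.length());
            line.append(' ').append(sequence, start, end); // last block may be shorter
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String sequence = "TGGGGCAAAATTTAGTCAATTTTCGTAACTTTTTTATTTTGAAAAATC";
        System.out.println(toPhylipLine("strain1", sequence, 10));
    }
}
```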