G Exhaustive


Aim:    To search through all possible tree topologies in order to find the most
parsimonious trees.

In this practical, we shall examine all possible tree topologies for a
collection of sequences.  An exhaustive search evaluates every topology, so it
is guaranteed to find the most parsimonious solutions.  However, for large
datasets it is impractical, because the number of trees grows very rapidly with
the number of sequences.  For n sequences, the number of possible unrooted,
bifurcating tree topologies is:

    T = (2n-5)! / (2^(n-3) (n-3)!)

The growth in tree numbers proceeds as follows:

        n       T    
        4       3
        5       15
        6       105
        7       945
        8       10,395
        9       135,135
        10      2,027,025
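
The table can be reproduced directly from the formula.  The following is a
minimal Python sketch, not part of PAUP; the function name num_unrooted_trees is
made up for this illustration, and the formula is taken to count unrooted,
fully bifurcating topologies.

```python
from math import factorial


def num_unrooted_trees(n: int) -> int:
    """Count unrooted, fully bifurcating tree topologies for n sequences.

    Implements T = (2n-5)! / (2^(n-3) * (n-3)!), which is defined for n >= 3.
    The division is always exact, so integer division is safe here.
    """
    if n < 3:
        raise ValueError("the formula needs at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Print the same rows as the table above (n = 4 to 10).
for n in range(4, 11):
    print(f"{n:>2}  {num_unrooted_trees(n):>12,}")
```

Calling the function with larger values of n (for example 20) shows how
quickly an exhaustive search becomes impractical.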


However, for small numbers of sequences, we can search through all tree
topologies relatively easily.

1. Take a look at the data file called BigHIV.nex.  This file contains a total
of eight HIV sequences isolated from patients in a variety of different
countries.  Read the data file into PAUP's memory.

2. Confirm that you have successfully read the file into memory by printing the
matrix to the screen.

3. Take a look at the options for searching through "all trees".  You can ask
the program to search through all these trees by simply issuing the relevant
command without any options.  Do this now.

4. Note the number of trees that were evaluated and the scores of the shortest
and the longest trees that were encountered.

5. The output will include a printout of the tree scores that were encountered
during the search.  It should look something like this:

================================================================================

Frequency distribution of tree scores:

    mean=24.572872 sd=2.034705 g1=-0.787999 g2=0.198534
   /---------------------------------------------------------------------
16 | (5)
17 | (13)
18 |# (34)
19 |##### (151)
20 |####### (205)
21 |############### (449)
22 |################################# (978)
23 |########################## (771)
24 |##################################################################### (2056)
25 |##################################################### (1595)
26 |##################################################################### (2070)
27 |##################################################################### (2068)
   \---------------------------------------------------------------------



================================================================================

This frequency distribution shows how many trees of each length were encountered
during the search.  We expect well-structured data, with low levels of homoplasy
and strong phylogenetic signal, to produce a left-skewed distribution (negative
g1), in which most of the trees are much longer than the most parsimonious
solutions and only a few trees are nearly as short.
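
As a cross-check, the summary line of the printout can be recomputed from the
bar counts.  The Python sketch below is not part of PAUP.  It assumes the
standard moment definitions (sd as the square root of the second central
moment, g1 as the third central moment divided by the second raised to the
power 1.5); PAUP's own formulas may differ slightly in the final decimals.

```python
from math import sqrt

# Tree lengths and their frequencies, copied from the printout above.
counts = {
    16: 5, 17: 13, 18: 34, 19: 151, 20: 205, 21: 449,
    22: 978, 23: 771, 24: 2056, 25: 1595, 26: 2070, 27: 2068,
}

total = sum(counts.values())  # number of trees evaluated
mean = sum(length * k for length, k in counts.items()) / total

# Second and third central moments (assumed population definitions).
m2 = sum(k * (length - mean) ** 2 for length, k in counts.items()) / total
m3 = sum(k * (length - mean) ** 3 for length, k in counts.items()) / total

sd = sqrt(m2)
g1 = m3 / m2 ** 1.5  # skewness; negative means a tail towards short trees

print(f"trees={total} mean={mean:.6f} sd={sd:.6f} g1={g1:.6f}")
```

The total should equal the number of topologies for eight sequences (10,395),
which confirms that every tree was evaluated.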

6. Run the exhaustive search again, this time changing the Frequency
Distribution display to "Histogram" and printing the output to a file called
"BigHIV.freq".  Take a look at the output file and note the number of trees.

7. Quit the program.
