B Characters

README for tracing characters on phylogenetic trees.

Aim:
    In this practical we shall look at how different characters require
different numbers of steps on different trees.  We shall use the program
PAUP*4.0.  This program is being developed by Dr. David Swofford, originally at
the Smithsonian Institution, Washington DC and now at Florida State University,
Tallahassee, Florida.
    The dataset we shall use is a vertebrate morphological dataset.  We shall
use only four animals in order to show that characters do not always agree with
one another.  In the end, we shall use one of Charles Darwin's dictums, "...the
aggregate of characters", to decide which tree topology (branching order) is
the preferred one.
    In this folder, you will find a dataset in "NEXUS" format.  This file has a
number of parts that are all equally relevant.  The first line of the file
begins with:
        #NEXUS
    
    This tells the program that the file is in NEXUS-format.  A file in this
format must conform to a very strict set of guidelines.  Specifically, the file
must be in discrete "blocks" of information.
    The next thing we encounter in the file is a short commentary.  In the same
way as comments are entered into programming code, we can enter some information
in an input file:

-------------------------------------------------------------------------------

[!Data from: Lizard, Dog, Human and Frog                                      ]
[!This practical shows how data sometimes contain homoplastic characters      ]
[!Data are not always completely congruent                                    ]
[!However, the aggregate of characters is usually used to infer relationships ]

-------------------------------------------------------------------------------


The square brackets are used to indicate a comment; the exclamation mark (!)
indicates that this comment will be printed to the screen by the PAUP program.

The next part of the file is the taxa block:

-------------------------------------------------------------------------------
begin taxa;
        dimensions ntax=4;
        taxlabels
                Lizard Dog Human Frog
        ;
end;
-------------------------------------------------------------------------------
  This block indicates that there are 4 taxa and it also indicates their names. 
Like all NEXUS blocks, the block begins with the word 'begin' and ends with the
word 'end'.  All statements end in a semi-colon.
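
  The begin/end rule can be illustrated with a short Python sketch.  This is not
part of PAUP or of the practical's files; the helper names (list_blocks,
is_nexus) and the cut-down file text are assumptions made for illustration, and
the comment stripping does not handle nested comments.

```python
import re

# A cut-down copy of a NEXUS file; only the block structure matters here.
NEXUS_TEXT = """#NEXUS
[!Data from: Lizard, Dog, Human and Frog]
begin taxa;
        dimensions ntax=4;
        taxlabels
                Lizard Dog Human Frog
        ;
end;
begin characters;
        dimensions nchar=6;
end;
"""


def is_nexus(text):
    """True when the file starts with the #NEXUS token."""
    return text.lstrip().upper().startswith("#NEXUS")


def list_blocks(text):
    """Return the block names in the order they appear (hypothetical helper)."""
    # Remove [ ... ] comments first so that words inside comments are ignored.
    without_comments = re.sub(r"\[[^\]]*\]", "", text)
    # Every block opens with 'begin <name>;'.
    names = re.findall(r"\bbegin\s+(\w+)\s*;", without_comments, flags=re.IGNORECASE)
    return [name.lower() for name in names]


print(is_nexus(NEXUS_TEXT))
print(list_blocks(NEXUS_TEXT))
```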
  
The next block is the 'characters' block.

-------------------------------------------------------------------------------
begin characters;
        dimensions 
            nchar=6
            ;
        format 
            symbols = "01"
            ;
        charlabels
            AMNION
            HAIR
            LACTATION
            TAIL
            ONE_JAWBONE
            PLACENTA
            ;
[!
         The first character is the Amnion
         The second character is hair
         The third character is lactation
         The fourth character is tail
         The fifth character is single bone in the lower jaw
         The sixth character is the placenta
]

matrix
        Lizard 0 0 0 1 0 0
        Dog    1 1 1 1 1 1
        Human  1 1 1 0 1 1
        Frog   0 0 0 0 0 0
        ;
end;
-------------------------------------------------------------------------------

    Again, the block begins with the word 'begin' and ends with the word 'end'. 
There is a short commentary, telling the user what each of the characters means. 
Then we see the word 'matrix' which indicates that the scientific data is about
to follow. The data is arranged with all homologous characters present in the
same column.
    As the commentary says, the first column represents the observations
regarding the amnion.  As we can see, this character is absent (indicated by a
zero) in the Lizard and the Frog.  It is present in the Human and Dog.  Can you
figure out the distribution of character states among the rest of the
characters?
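
    One way to check your reading of the columns is the Python sketch below.  It
is an illustration only, not something PAUP provides: the function names
(parse_matrix, taxa_with_state) are hypothetical, and the matrix rows are copied
by hand from the characters block above.

```python
CHARLABELS = ["AMNION", "HAIR", "LACTATION", "TAIL", "ONE_JAWBONE", "PLACENTA"]

# Rows copied from the 'matrix' statement of the characters block.
MATRIX_TEXT = """
Lizard 0 0 0 1 0 0
Dog    1 1 1 1 1 1
Human  1 1 1 0 1 1
Frog   0 0 0 0 0 0
"""


def parse_matrix(text):
    """Map each taxon name to its list of character states."""
    rows = {}
    for line in text.strip().splitlines():
        name, *states = line.split()
        rows[name] = states
    return rows


def taxa_with_state(rows, labels, label, state="1"):
    """List the taxa that show the given state for the named character."""
    column = labels.index(label)
    return [taxon for taxon, states in rows.items() if states[column] == state]


rows = parse_matrix(MATRIX_TEXT)
for label in CHARLABELS:
    present = taxa_with_state(rows, CHARLABELS, label)
    print(label, "is present in:", ", ".join(present))
```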

    The last part of the file contains three alternative tree topologies:

-------------------------------------------------------------------------------
begin trees;
        tree best   = [&U] (1,((2,3),4));
        tree second  = [&U] (1,2,(3,4));
        tree worst = [&U] ((1,3),(2,4));
;
end;
-------------------------------------------------------------------------------

    The trees have been given three different names: 'best', 'second' and
'worst'.  The four numbers used in the nested-parentheses tree descriptions
indicate the four taxa in the order in which they appear in the data matrix (1 =
Lizard, 2 = Dog, 3 = Human, 4 = Frog).  [&U] means that the trees are
unrooted.
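
    The numbering can be made concrete with a small Python sketch that swaps
each taxon number for its name.  The function name with_names is an assumption
for illustration; the tree descriptions are copied from the trees block above.

```python
import re

# Taxa in the order in which they appear in the data matrix.
TAXA = ["Lizard", "Dog", "Human", "Frog"]

# Tree descriptions copied from the trees block (without the [&U] flag).
TREES = {
    "best": "(1,((2,3),4))",
    "second": "(1,2,(3,4))",
    "worst": "((1,3),(2,4))",
}


def with_names(description, taxa):
    """Replace each 1-based taxon number with the matching taxon name."""
    return re.sub(r"\d+", lambda m: taxa[int(m.group()) - 1], description)


for name, description in TREES.items():
    print(name, "=", with_names(description, TAXA))
```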


EXERCISE


1.  Start the PAUP program.  This can be done in two different ways.  You can
either type the program name followed by the NEXUS file:

    linux$ paup Hair_tail.nex

or alternatively, you can start the program by typing paup and then reading the
datafile into memory using the 'execute' command:

    linux$ paup

    paup> execute Hair_tail.nex;
    
2.  When the PAUP program starts, you will see a 'splash page' that looks
something like this:


-------------------------------------------------------------------------------

P A U P *
Portable version 4.0b10 for Unix
Sat Oct  5 20:05:34 2002

      -----------------------------NOTICE-----------------------------
        This is a beta-test version.  Please report any crashes,
        apparent calculation errors, or other anomalous results.
        There are no restrictions on publication of results obtained
        with this version, but you should check the WWW site
        frequently for bug announcements and/or updated versions.  
        See the README file on the distribution media for details.
      ----------------------------------------------------------------

paup> 

-------------------------------------------------------------------------------


If you read a datafile into memory at the same time as starting the program, you
should see a little more information:



-------------------------------------------------------------------------------

Processing of file "Hair_tail.nex" begins...

Data from: Lizard, Dog, Human and Frog                                       
This practical shows how data sometimes contain homoplastic characters       
Data are not always completely congruent                                     
However, the aggregate of characters is usually used to infer relationships
 
     The first character is the Amnion
     The second character is hair
     The third character is lactation
     The fourth character is tail
     The fifth character is single bone in the lower jaw
     The sixth character is the placenta

Data matrix has 4 taxa, 6 characters
Valid character-state symbols: 01
Missing data identified by '?'

3 trees read from TREES block
   Time used = <1 sec (CPU time = 0.00 sec)

Processing of file "Hair_tail.nex" completed.

paup> 

-------------------------------------------------------------------------------


3. Read the data into memory now.

4. The most important command for PAUP is the 'help' command.  Type this command
now.  You should see something like the following list of available commands:

-------------------------------------------------------------------------------
The following commands are always available:

 !              Edit           Help           Quit          
 CD             Execute        Leave          Set           
 Defaults       Factory        Log            Time          
 DSet           FStatus        LSet           ToNEXUS       

The following commands require data from a DATA (or TAXA and CHARACTERS) or
DISTANCES block (* = requires only TAXA block):

*Agree         *DerootTrees   *LoadConstr     Reweight       SurfCheck     
 AllTrees      *DescribeTrees  LScores       *RootTrees     *TaxPartition  
 AncStates      DScores       *MatrixRep      SaveAssum     *TaxSet        
 Assume         Exclude        MPRSets        SaveDist      *TreeDist      
 BandB          Export         NJ            *SaveTrees     *TreeInfo      
 BaseFreqs      ExSet         *Outgroup       ShowAnc       *TreeWts       
 Bootstrap     *Filter         PairDiff      *ShowConstr    *TStatus       
 CharPartition  GammaPlot      Permute        ShowDist       TypeSet       
 CharSet       *GenerateTrees  PScores        ShowMatrix    *Undelete      
*ClearTrees    *GetTrees       PSet           ShowCharParts  UPGMA         
 Condense       HomPart        Puzzle         ShowRateSets   UserType      
*Constraints    HSearch        RandTrees      ShowTaxParts   Weights       
*ConTree        Include       *RateSet       *ShowTrees      Wts           
 CStatus       *Ingroup        Reconstruct    ShowUserTypes  WtSet         
 CType          Jackknife     *Restore       *SortTrees     
*Delete         Lake           RevFilter      StarDecomp    

Type "HELP COMMANDS" or "HELP CMDS" for a one-line description of each
command.

Type "<cmdname> ?" to see brief usage and current default settings.
-------------------------------------------------------------------------------


5. The first command we shall use is the 'showmatrix' command.  Type this
command now.  You should see a column containing the names of the taxa and a
column containing the data.  If you do not see this, then the data has not been
successfully read into memory.  Modify the showmatrix command so that you can
see the "Character Matrix Labels" and so that the width of each column in the
character matrix is 5 spaces.  Re-issue the command with the modifiers.

HINT: type the showmatrix command followed by a question mark.

6.  The next command is the 'showtrees' command.  This command will print trees
to the screen.  This command needs to know which trees to print
to the screen (by default it just prints the first one).  You could type
'showtrees 1' if you wanted to see the first tree.  However, in this case, we
wish to see all the trees, so you should type 'showtrees all'.

NOTE: The showtrees command takes a slightly different format from other
commands.  Because we wish to find out some information relating to a tree in
memory, we must specify which tree we are interested in.  As a result, the
format of the command is:

Usage: ShowTrees [tree-list] [/ options...] ;

e.g.

showtrees 1 3-7 9 / showtaxnum=yes;


7. Draw each of these trees in your lab notebook.  These trees are displayed as
if they were rooted using an outgroup.  In reality, they are unrooted trees;
they have just been drawn in this way for simplicity.

NOTE: In some circumstances, PAUP recognises the word 'all' in place of a list,
as in 'showtrees all'.

8.  We would now like to see the scores each of these trees would receive
using the parsimony criterion for evaluating trees. The command for printing the
parsimony scores to the screen is 'pscores'.  Can you figure out how to use this
command?

NOTE: the 'pscores' command has the same format as the 'showtrees' command
above.

If you have successfully issued the pscores command, paup will calculate the
fit of each character to the tree.  It will then add these scores together and
give you the 'tree length'.

9.  Write down the tree length for each tree.

10. We would like to see the parsimony score for each individual character. 
This can be achieved using the command:
    
    pscores all /single=all;
    
Type this now and record the answers.  In your practical book, explain why
you see the results on this screen.
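
To check your explanation by hand, the counting can be sketched in Python using
Fitch's algorithm for unordered characters.  This is an independent sketch, not
PAUP's own code.  It assumes all characters are unordered and equally weighted,
and it writes each unrooted tree as a rooted binary tree, which does not change
the Fitch step count.  The names fitch and character_steps are hypothetical.

```python
CHARLABELS = ["AMNION", "HAIR", "LACTATION", "TAIL", "ONE_JAWBONE", "PLACENTA"]

# One string of states per taxon, copied from the data matrix.
MATRIX = {
    "Lizard": "000100",
    "Dog": "111111",
    "Human": "111011",
    "Frog": "000000",
}

# Rooted binary drawings of the three unrooted trees in the trees block.
# 'second' is (1,2,(3,4)) in the file; ((1,2),(3,4)) is the same unrooted tree.
TREES = {
    "best": ("Lizard", (("Dog", "Human"), "Frog")),
    "second": (("Lizard", "Dog"), ("Human", "Frog")),
    "worst": (("Lizard", "Human"), ("Dog", "Frog")),
}


def fitch(node, column):
    """Return (possible states, steps so far) for one character on a subtree."""
    if isinstance(node, str):
        # A tip: its state is observed, so no steps are needed yet.
        return {MATRIX[node][column]}, 0
    left_states, left_steps = fitch(node[0], column)
    right_states, right_steps = fitch(node[1], column)
    shared = left_states & right_states
    if shared:
        # The two subtrees can agree on a state: no extra step.
        return shared, left_steps + right_steps
    # No state in common: one change is required at this node.
    return left_states | right_states, left_steps + right_steps + 1


def character_steps(tree):
    """Steps required by each character, in matrix column order."""
    return [fitch(tree, column)[1] for column in range(len(CHARLABELS))]


for name, tree in TREES.items():
    steps = character_steps(tree)
    print(name, dict(zip(CHARLABELS, steps)), "tree length =", sum(steps))
```

The tree length is simply the sum of the per-character steps, so the totals
should agree with the output of the plain 'pscores all' command.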

11.  We need to find the parsimony-uninformative characters.  Often it is useful
to exclude these characters from a dataset, since they contribute equally to the
tree score for all possible trees.  You can use the exclude command to remove
the uninformative characters.  This can be achieved using the command:
 
     paup> exclude uninf;
     
     How many characters were removed?  Why?
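
The usual rule is that a character is parsimony-informative when at least two
of its states each occur in at least two taxa.  The Python sketch below applies
that rule to the data matrix.  It is an illustration only: the function name
is_informative is hypothetical, and missing data ('?') is not handled.

```python
from collections import Counter

# One string of states per taxon, copied from the data matrix.
MATRIX = {
    "Lizard": "000100",
    "Dog": "111111",
    "Human": "111011",
    "Frog": "000000",
}


def is_informative(states):
    """True when at least two states are each found in at least two taxa."""
    counts = Counter(states)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# Transpose the matrix so that each tuple holds one character (column).
columns = list(zip(*MATRIX.values()))
uninformative = [number for number, column in enumerate(columns, start=1)
                 if not is_informative(column)]
print("Uninformative characters:", uninformative)
```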



Questions: 
1.  Which is the preferred tree using the parsimony criterion?

2.  How many steps are required to describe the character 'amnion' on the first
tree?

3.  Which character requires the most steps on the first tree?


When you are finished, you may quit the program.





