


A PAUP

Aim: To familiarise the user with the PAUP program.

PAUP (Phylogenetic Analysis Using Parsimony) is one of the most versatile
programs for the comparative evolutionary analysis of DNA sequences.  In this
exercise, you will become familiar with how to use the software.  PAUP offers
hundreds of commands and options; in this practical you will use only a small
subset of them.


To illustrate some of the commands, we shall use a dummy dataset of protein
sequences.


1. Read the file 'garfield.nex' (use the unix command 'more' to view it).  The
data file is heavily commented.  Note the structure of the data file.  This
file is in NEXUS format, one of the most popular formats for comparative
datasets.  A NEXUS file consists of a series of 'blocks'.  Each block begins
with the word 'begin' followed by the block name, and ends with the word
'end;'.  NEXUS files always begin with a hash mark followed by the word NEXUS
(#NEXUS).
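
The block structure can be illustrated with a short Python sketch (Python is
our choice for illustration; PAUP itself does not use it).  The hypothetical
helper `list_blocks` checks for the #NEXUS header and returns the names of the
blocks it finds.  It assumes that comments are enclosed in square brackets and
are not nested, and that every block ends with 'end;'.  A full NEXUS reader
handles many more cases.

```python
import re

def list_blocks(text):
    """Hypothetical helper (not part of PAUP): return the names of the
    NEXUS blocks in `text`, in the order they appear."""
    # Remove [ ... ] comments; assumes comments are not nested.
    text = re.sub(r"\[[^\]]*\]", "", text)
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("missing #NEXUS header")
    # Each block runs from 'begin <name>;' to the next 'end;'.
    # NEXUS keywords are case-insensitive, hence re.IGNORECASE.
    pattern = re.compile(r"\bbegin\s+(\w+)\s*;.*?\bend\s*;",
                         re.IGNORECASE | re.DOTALL)
    return [m.group(1).lower() for m in pattern.finditer(text)]

example = """#NEXUS
[ This comment is ignored. ]
Begin data;
    Dimensions ntax=5 nchar=21;
end;
"""
print(list_blocks(example))  # ['data']
```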


2.  Note the line that begins:
    
        Begin data;
    
This line indicates that the 'data' block follows.  The statement ends in a
semi-colon, the standard way to end a statement in the NEXUS format.


3.  The next three lines indicate the size of the dataset:
    
    Dimensions                   
        ntax=5                   
        nchar=21;
        
ntax means "number of taxa"
nchar means "number of characters"

NOTE 1: This statement takes the format:
    Keyword
        option=value
        option=value;

NOTE 2: The statement ends in a semi-colon.
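
Because every statement follows this layout, it can be split mechanically into
its keyword and its option=value pairs.  The Python sketch below is a
simplified illustration, not how PAUP reads files; the helper
`parse_statement` is hypothetical, and it assumes that values contain no
spaces or quotes and that the input holds exactly one statement.

```python
import re

def parse_statement(statement):
    """Hypothetical helper (not part of PAUP): split a NEXUS statement such as
    'Dimensions ntax=5 nchar=21;' into its keyword and a dict of options."""
    body = statement.strip()
    if not body.endswith(";"):
        raise ValueError("statement must end in a semi-colon")
    # Remove spaces around '=' so that 'ntax = 5' and 'ntax=5' are treated alike.
    tokens = re.sub(r"\s*=\s*", "=", body[:-1]).split()
    keyword, options = tokens[0].lower(), {}
    for token in tokens[1:]:
        name, _, value = token.partition("=")
        # An option given without '=' (e.g. 'interleave') is recorded as True.
        options[name.lower()] = value if value else True
    return keyword, options

print(parse_statement("""Dimensions
    ntax=5
    nchar=21;"""))
# ('dimensions', {'ntax': '5', 'nchar': '21'})
```

Values come back as strings; ntax and nchar would need int() before being
used as counts.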


4.  Information concerning the format of the dataset follows:

    Format                        
        datatype=protein          
        gap=-                    
        missing=?                 
        matchchar=.              
        interleave;              

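Once again, the information is in the same format and the statement ends in a
semi-colon.  The options declare that the sequences are amino acids
(datatype=protein), that a hyphen marks an alignment gap (gap=-), that a
question mark marks missing data (missing=?), that a full stop means 'same
state as the first taxon' (matchchar=.), and that the matrix is written in
interleaved format, i.e. the sequences may be broken into successive blocks
(interleave).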
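
The garfield matrix does not use the match character, but a matrix that does
is easy to misread.  In NEXUS, a match character in any row stands for the
state of the first taxon at the same position.  The Python sketch below
expands such rows; the helper `expand_matchchar` and the two example rows are
made up for illustration.

```python
def expand_matchchar(rows, matchchar="."):
    """Hypothetical helper: replace each match character with the state of
    the first taxon at the same position. `rows` is a list of
    (name, sequence) pairs; the first row must not contain matchchar."""
    first = rows[0][1]
    expanded = []
    for name, seq in rows:
        if len(seq) != len(first):
            raise ValueError(f"{name}: length differs from the first taxon")
        expanded.append((name, "".join(
            ref if ch == matchchar else ch for ch, ref in zip(seq, first))))
    return expanded

# Made-up rows, the second written with the match character.
rows = [("Dog", "GARFIELD"), ("Rat", "...M...E")]
print(expand_matchchar(rows))
# [('Dog', 'GARFIELD'), ('Rat', 'GARMIELE')]
```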


5.  The next part of the file includes a statement instructing the program to
treat the INDELs in the data matrix as an additional character state (a 21st
amino acid):

    Options                              
        gapmode=newstate;               
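
To see what gapmode=newstate changes, the Python sketch below collects the
distinct states in one column of the garfield matrix (column 16: Chick -,
Kangaroo -, Human F, Chimp F, Dog F), first counting the gap as a state and
then ignoring it as though it were missing data.  The helper `column_states`
is hypothetical and is not part of PAUP.

```python
def column_states(column, gap="-", gap_is_state=True):
    """Hypothetical helper: return the set of states seen in one column."""
    states = set(column)
    if not gap_is_state:
        # Treat gaps like missing data: they contribute no state.
        states.discard(gap)
    return states

# Column 16 of the garfield matrix: Chick, Kangaroo, Human, Chimp, Dog.
col16 = ["-", "-", "F", "F", "F"]
print(sorted(column_states(col16)))                      # ['-', 'F']
print(sorted(column_states(col16, gap_is_state=False)))  # ['F']
```

With the gap counted as a 21st state the column varies between taxa; if gaps
were ignored it would look constant.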


6.  The next part of the file contains the sequence data.  In this case, the
data is protein.  Note that once again this part of the file ends with a
semi-colon:

    Matrix                              
        Chick     GARFIELDTHELAZY------
        Kangaroo  GARFIELDTHELAZY---CAT
        Human     GARFIELDTHE----FATCAT
        Chimp     GARFIELD-------FATCAT
        Dog       GARFIELDTHELAZYFATCAT
    ;

These are protein sequences written in the single-letter amino-acid code.  The
sequences vary in length, so INDELs (shown as '-') have been introduced in
order to construct a sensible alignment.  As a result, all the sequences in
this matrix are now the same length and homologous positions are aligned with
each other.  If they were not all the same length, PAUP would complain and
would not read the sequences into memory.  The INDELs in these sequences will
be treated as though they were a 21st amino acid.

NOTE: Because of evolutionary change, not all organisms have the same
sequences.  The Chick sequence, for instance, is missing the FAT and CAT
domains.
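
A check in the same spirit as the length check PAUP applies when reading the
matrix can be written in a few lines of Python.  This sketch assumes the
matrix has already been split into (name, sequence) pairs and is held as a
single block; the helper `check_matrix` is hypothetical.

```python
def check_matrix(rows, ntax, nchar):
    """Hypothetical helper: raise ValueError unless `rows` holds ntax
    sequences of exactly nchar characters each."""
    if len(rows) != ntax:
        raise ValueError(f"expected {ntax} taxa, found {len(rows)}")
    for name, seq in rows:
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, found {len(seq)}")

garfield = [
    ("Chick",    "GARFIELDTHELAZY------"),
    ("Kangaroo", "GARFIELDTHELAZY---CAT"),
    ("Human",    "GARFIELDTHE----FATCAT"),
    ("Chimp",    "GARFIELD-------FATCAT"),
    ("Dog",      "GARFIELDTHELAZYFATCAT"),
]
check_matrix(garfield, ntax=5, nchar=21)  # passes silently

try:
    # Dropping the gaps from the Chick row makes it too short.
    check_matrix([("Chick", "GARFIELDTHELAZY")] + garfield[1:], ntax=5, nchar=21)
except ValueError as err:
    print(err)  # Chick: expected 21 characters, found 15
```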


7.  The last part of the file contains the word:

    end;

This indicates that the data block has ended.


8.  Read the data into the program's memory.  This can be achieved in one of
two ways.  You can either specify the file name at the command line:

    paup garfield.nex

or you can start the program (type paup) and, once it has started, type:

    execute garfield.nex
    

9.  Note what has been printed to screen.  Does this make sense?  If not,
discuss it with your demonstrator.


10. You should now be presented with the 'paup prompt':

    paup>

This indicates that you are now working within the PAUP environment.


11. The most important command within this environment is the help command. 
This can be used in two ways.  You can either issue the help command without any
trailing qualifiers:

    paup> help;

or you can type:
    
    paup> help commands;

which prints a one-line description of each command.  Try out both of these
options now.

NOTE: It is good practice to finish each command with a semi-colon.  This tells
the PAUP software that you have finished issuing a command.  It also means that
you can issue a number of commands on one line, each one terminated by a
semi-colon e.g.:

    paup> help; help commands;
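
Because the semi-colon terminates a command, a line holding several commands
can be split into separate commands at each semi-colon.  The Python sketch
below shows the idea; the helper `split_commands` is hypothetical, and it
ignores complications such as semi-colons inside quoted strings or comments.

```python
def split_commands(line):
    """Hypothetical helper: split a line of PAUP input into individual
    commands at each semi-colon."""
    # Drop the empty pieces left by the trailing semi-colon (or by ';;').
    return [part.strip() for part in line.split(";") if part.strip()]

print(split_commands("help; help commands;"))  # ['help', 'help commands']
```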


12. There is a general format for commands.  You can see this format if you type
a question mark (?) after a command name.  Try this with the command for
logging all output to a file:

    paup> log ?;
    
You should see the following:


Usage: Log [options...] ;

Available options:

Keyword ---- Option type ------------------------ Current default setting --
File         <log-file-name>                      garfield.log
Replace      No|Yes                              *No
Append       No|Yes                              *No
Start        No|Yes                              *No
Stop         No|Yes                              *No
FlushLog     No|Yes                               No
                                                 *Option is nonpersistent


The first column contains the keywords that can be used to modify the way in
which the command 'log' works.

The second column contains the possible options for this keyword.

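The third column contains the current setting for this option.  An asterisk
marks a nonpersistent option: its setting applies only to the current use of
the command and is not remembered the next time the command is issued.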

You can start logging your PAUP session to a file called 'blah.log' by issuing
the command:

    paup> log File = blah.log start = yes;

Everything you type from here on, and everything that appears on the screen,
will also be written to this file.  The general format of commands is:

    paup> command keyword = option;


13. Can you figure out which command you can use to 'show the character-data
matrix'?  If you find the correct command, it will print the data matrix to the
screen.


14. To remove the Chick sequence from the analysis, type:

    paup> delete ?;

After looking at the options, type:

    paup> delete Chick;

You can now display the data matrix again to check that the sequence has been
removed.


15. To restore this sequence, type:

    paup> help;

and figure out the correct command to restore the sequence.


16. To exclude the parsimony-uninformative sites, type:

    paup> exclude uninf;

Look at the data matrix now and write down the new data matrix that results
from issuing this command.  Can you see why the remaining sites are
parsimony-informative?
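
The usual definition is that a site is parsimony-informative when at least two
different states each occur in at least two taxa.  The Python sketch below
applies that definition to the garfield matrix, counting the gap as a state to
match gapmode=newstate and never counting '?' (missing data) as a state.  It
illustrates the definition and is not PAUP's own code; the helper
`informative_sites` is hypothetical.  You can use it to check the matrix you
wrote down.

```python
from collections import Counter

def informative_sites(rows):
    """Hypothetical helper: return the 1-based positions at which at least
    two states each occur in at least two taxa (gaps count as a state)."""
    seqs = [seq for _, seq in rows]
    informative = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        counts.pop("?", None)  # missing data never counts as a state
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(pos)
    return informative

garfield = [
    ("Chick",    "GARFIELDTHELAZY------"),
    ("Kangaroo", "GARFIELDTHELAZY---CAT"),
    ("Human",    "GARFIELDTHE----FATCAT"),
    ("Chimp",    "GARFIELD-------FATCAT"),
    ("Dog",      "GARFIELDTHELAZYFATCAT"),
]
print(informative_sites(garfield))      # [12, 13, 14, 15, 16, 17, 18]
print(informative_sites(garfield[1:]))  # [12, 13, 14, 15]  (Chick deleted)
```

The second call shows why deleting a taxon can change which sites are
informative: without Chick, only Kangaroo has a gap at positions 16 to 18, so
those positions no longer have two states shared by two or more taxa.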


17. Can you figure out how to include the sites you have previously excluded?
Write the command into your lab book.


TEST

1. Ask paup for the time.  Write the exact answer in your lab book.

2. Ask paup for the current character status.  Write down the output.

3. Ask paup for the current file status for the data, log and tree files.

Quit the program.

Examine the log file.




