mb_burnin.py

A Python script to run MrBayes, check for convergence and calculate a sensible burn-in.
Copyright Simon R Harris, Newcastle University, Newcastle Upon Tyne, UK. 2008

Installation:
Unpack the program using the command:
	tar -xvzf mb_burnin.tgz
Move into the mb_burnin folder:
	cd mb_burnin
mb_burnin.py uses MrBayes version 3.2cvs. This must be installed in the mb_burnin folder using the following commands:
	cvs -d:pserver:anonymous@mrbayes.cvs.sourceforge.net:/cvsroot/mrbayes checkout mrbayes
	cd mrbayes
	./configure
	make
If there is an error message when trying to run cvs, you may need to use the following commands first:
	CVS_RSH=ssh
	export CVS_RSH
	
Running and Analysis:

mb_burnin.py Usage:

	mb_burnin.py [options]

	MrBayes options:
		-i	Input file in any format readable by readseq
		-l	Log file name
		-o	Output file name
		-d	Data type	[protein, DNA, RNA, nucleotide, standard, continuous]
				(default=protein)
		-m	Evolutionary model	[if datatype=protein: WAG, Poisson, Dayhoff, Mtrev, Mtmam, Rtrev, Cprev, Vt, Blosum62]
						(default=WAG)
						[if datatype=DNA/RNA/nucleotide: 4by4, doublet, codon]
						(default=4by4)
		-t	Number of states	[if datatype=DNA/RNA/nucleotide: 6, 2, 1]
						(default=6)
		-v	Use covarion model
		-g	Number of generations in initial analysis (default=1000000)
		-e	Estimate gamma
		-n	Number of gamma categories (default=4)
		-p	Estimate proportion of invariant sites
		-f	Print frequency (default=5000)
		-s	Sampling frequency (default=5000)
		-r	Number of runs (default=2)
		-c	Number of chains per run (default=4)
	Other options:
		-a	Average standard deviation of split frequencies (ASDSF) convergence cutoff (default=0.1)
		-z	Maximum number of generations to run to try to reach convergence (default=5000000)
		-b	Minimum number of samples/trees to use for consensus calculation (default=1000)
		-x	Round burnin up to the nearest x generations (default=50000)
		-h	Show this help screen
		-I	Enter interactive mode


The MrBayes options are described in detail in the MrBayes manual.

Options a, z, b and x are mb_burnin specific options that allow the program to check for convergence between chains and calculate a sensible burn-in for calculation of a consensus tree.
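
As an illustration of how the options combine, the sketch below assembles an example command line in Python. The file names (alignment.phy, analysis.log, analysis_out) are hypothetical. It also assumes that each option is followed by its value as a separate argument and that the script is started through the python interpreter; check the help screen (-h) for the exact syntax before relying on it.

```python
import shlex

# Option letters come from the usage screen; all file names are hypothetical.
command = [
    "python", "mb_burnin.py",
    "-i", "alignment.phy",   # input alignment (hypothetical name)
    "-l", "analysis.log",    # log file (hypothetical name)
    "-o", "analysis_out",    # output file (hypothetical name)
    "-d", "DNA",             # data type
    "-m", "4by4",            # evolutionary model for nucleotide data
    "-g", "1000000",         # generations in the initial analysis
    "-s", "5000",            # sampling frequency
    "-a", "0.1",             # ASDSF convergence cutoff
    "-z", "5000000",         # maximum generations to try for convergence
    "-b", "1000",            # minimum samples for the consensus
    "-x", "50000",           # burn-in rounding unit
]

# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

All values other than the data type and model are the documented defaults, so they could equally be left out.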

Option a defines the average standard deviation of split frequencies (ASDSF) required to assume convergence of the chains in the analysis has been reached. Convergence is assumed if the ASDSF has dropped below the value defined in option a (default = 0.1, as this is the value MrBayes uses to identify potential lack of convergence).
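
A minimal sketch of the convergence test, assuming the split frequencies of each run are already available as dictionaries mapping a split to its frequency. The function names, the input layout and the use of the population standard deviation are assumptions made for illustration. MrBayes reports the ASDSF itself and may restrict the average to splits above a minimum frequency, so this is not necessarily how mb_burnin obtains the value.

```python
from statistics import pstdev


def asdsf(split_freqs_by_run):
    """Average, over all splits, of the standard deviation of each
    split's frequency across runs. A split missing from a run counts as 0."""
    splits = set().union(*split_freqs_by_run)
    deviations = [
        pstdev([run.get(split, 0.0) for run in split_freqs_by_run])
        for split in splits
    ]
    return sum(deviations) / len(deviations)


def has_converged(asdsf_value, cutoff=0.1):
    """Option a: convergence is assumed once the ASDSF is below the cutoff."""
    return asdsf_value < cutoff


# Two hypothetical runs with the frequencies of two splits each.
runs = [
    {"AB|CD": 0.95, "AC|BD": 0.05},
    {"AB|CD": 0.85, "AC|BD": 0.15},
]
value = asdsf(runs)
print(value, has_converged(value))
```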

Option z is the maximum number of generations the program will run to try to reach convergence. An initial run will be completed with the number of generations defined by option g. Using the ASDSF between the two (or more) runs, mb_burnin will assess whether or not the analysis has converged after this initial run (see option a). If convergence has not been reached a second run of the same number of generations will be carried out. This process is repeated until convergence is reached or the number of generations run has reached the maximum defined by option z.
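
The repeat-until-converged logic described above can be sketched as follows. The function run_block is a hypothetical stand-in for "run MrBayes for this many more generations and return the current ASDSF"; it is not part of mb_burnin. The default values mirror options g, z and a.

```python
def run_until_converged(run_block, block_gens=1_000_000,
                        max_gens=5_000_000, cutoff=0.1):
    """Run blocks of block_gens generations until the ASDSF drops below
    the cutoff or max_gens generations have been run.

    Returns (total generations run, whether convergence was reached).
    """
    total = 0
    while total < max_gens:
        current_asdsf = run_block(block_gens)  # hypothetical MrBayes call
        total += block_gens
        if current_asdsf < cutoff:
            return total, True
    return total, False


# Simulated ASDSF values after each block, for demonstration only.
fake_asdsf = iter([0.30, 0.15, 0.06])
print(run_until_converged(lambda gens: next(fake_asdsf)))
```

With the simulated values the third block is the first to fall below 0.1, so the sketch stops after 3,000,000 generations.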

Option b defines the minimum number of trees (samples) desired in the calculation of the consensus tree. Normally in a MrBayes run, the number of trees (samples) to discard from the start of the analysis is defined. Instead, mb_burnin calculates the burn-in based on two criteria.
1: The ASDSF. The ASDSF must have dropped below 0.1 and remained at that level for the remainder of the run.
2: The likelihood of samples. The mean likelihood of the final 100 samples from each of the MCMC runs is calculated. For each run, the burn-in must be greater than the first point in the MCMC chain where the likelihood becomes better than this average.
The burn-in removed from the beginning of the run for consensus calculation is the maximum of these two values rounded to the nearest x generations (see option x below). Instead of defining the length of the burn-in, option b allows you to define how long you want the chain to run after the burn-in has been removed. If the total number of samples remaining in each run after removal of the burn-in is less than the value defined by option b, the number of additional samples needed will be estimated and the analysis run for the corresponding number of additional generations.
NOTE: Option b is measured in samples, not generations. To convert the value to generations, multiply by the sampling frequency (option s).
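
The two burn-in criteria and the sample check can be sketched as below. This is an illustration under stated assumptions, not the script's actual code: the function names and input lists are hypothetical, "better" is taken to mean a higher log likelihood, and the number of remaining samples is estimated simply as the generations left after the burn-in divided by the sampling frequency. Rounding (option x) is left out here.

```python
def asdsf_burnin(gens, asdsf_values, cutoff=0.1):
    """First generation from which the ASDSF stays below the cutoff until
    the end of the run. Returns None if the final value is not below it."""
    point = None
    for gen, value in zip(reversed(gens), reversed(asdsf_values)):
        if value >= cutoff:
            break
        point = gen
    return point


def likelihood_burnin(gens, log_likelihoods, tail=100):
    """First generation whose log likelihood is better (higher) than the
    mean of the final `tail` samples. Returns None if there is none."""
    last = log_likelihoods[-tail:]
    tail_mean = sum(last) / len(last)
    for gen, lnl in zip(gens, log_likelihoods):
        if lnl > tail_mean:
            return gen
    return None


def extra_generations(total_gens, burnin, min_samples=1000, sample_freq=5000):
    """Generations still needed so that min_samples remain after burn-in."""
    remaining = (total_gens - burnin) // sample_freq
    return max(0, min_samples - remaining) * sample_freq


# Small made-up example: 10 samples, with tail=5 instead of 100.
gens = list(range(0, 50000, 5000))
asdsf = [0.5, 0.3, 0.09, 0.15, 0.08, 0.07, 0.06, 0.05, 0.05, 0.04]
run1 = [-1200, -1100, -1020, -1001, -1000, -999, -1001, -1000, -999, -1001]
run2 = [-1500, -1300, -1100, -1010, -1000, -1002, -998, -1000, -1001, -999]

# The unrounded burn-in is the maximum over the ASDSF point and every run.
burnin = max(asdsf_burnin(gens, asdsf),
             likelihood_burnin(gens, run1, tail=5),
             likelihood_burnin(gens, run2, tail=5))
print(burnin)
```

In this example the ASDSF briefly dips below 0.1 at generation 10000 but rises again, so only the final stretch from generation 20000 counts. The second run is the slowest to reach its plateau, so it sets the burn-in.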

Option x defines a number of generations to round the burn-in to. For example, the default (50,000) will take the value calculated as the burn-in (as described above) and round it to the nearest 50,000 generations. This value can be set to 1 if the exact burn-in value is preferred.
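
A small sketch of the rounding step. It assumes the burn-in is rounded up to the next multiple of x, as the help text for option x states; the function name is hypothetical and the script's actual rounding rule may differ.

```python
def round_burnin_up(burnin, x=50_000):
    """Round burnin up to the next multiple of x using integer arithmetic.
    Values that are already a multiple of x are returned unchanged."""
    return -(-burnin // x) * x


print(round_burnin_up(120_000))        # next multiple of 50,000
print(round_burnin_up(100_000))        # already a multiple
print(round_burnin_up(123_456, x=1))   # x=1 keeps the exact value
```

Rounding up rather than down is the conservative choice, since it can only discard more samples from the start of the run, never fewer.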


