Module 8.
Constructing Phylogenetic trees
An unrooted tree has no node that is designated as the ancestral node or root.
An unrooted tree of life
For unrooted trees the root could be on any of the terminal or internal branches, so there are always more rooted trees than unrooted trees. The number of possible trees grows faster than exponentially with the number of sequences, so for 10 sequences (often called OTUs, operational taxonomic units) there are about 34 million rooted and 2 million unrooted trees. To root an unrooted tree, a node must be designated as the root. To do this, a new sequence can be added that is known to be distantly related to all the other sequences in the tree. For example, a beta globin sequence might be added to a tree of alpha globins. The divergence of alpha and beta globin is known to be older than the divergence of any alpha globin from any other. Therefore, the point where the beta globin branch joins the tree of alpha globin sequences is the root. This type of sequence is called an outgroup. Rooting a tree in this way always requires an outgroup. The UPGMA method produces rooted trees, with the most distant branch being treated as the outgroup. Because the UPGMA method uses mathematical averaging of distances, UPGMA assumes a constant rate of evolution in all branches. This is usually not true, so UPGMA is criticized as being too simple a method. Neighbor joining makes unrooted trees and does not assume constant rates of change. The longest branch is not necessarily the oldest branch on a neighbor joining tree.
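The tree counts quoted above can be checked with a short calculation. The sketch below assumes strictly bifurcating trees with labeled tips and uses the standard double-factorial formulas, (2n-5)!! for unrooted and (2n-3)!! for rooted trees; the function names are made up for this illustration.

```python
def double_factorial(k: int) -> int:
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n: int) -> int:
    """Bifurcating unrooted trees for n labeled sequences: (2n-5)!!"""
    if n < 3:
        raise ValueError("need at least 3 sequences for an unrooted tree")
    return double_factorial(2 * n - 5)


def count_rooted_trees(n: int) -> int:
    """Bifurcating rooted trees for n labeled sequences: (2n-3)!!"""
    if n < 2:
        raise ValueError("need at least 2 sequences for a rooted tree")
    return double_factorial(2 * n - 3)


print(count_unrooted_trees(10))  # 2027025
print(count_rooted_trees(10))    # 34459425
```

Each unrooted tree of n sequences has 2n-3 branches, and a root can be placed on any of them, which is why the rooted count is 2n-3 times larger.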
(Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268,
Human:0.11927):0.08386):0.06124):0.15057):0.54939,Mouse:1.21460);
The area of phylogenetics is mathematically rich. There are many different algorithms for computing trees and inferring phylogeny. As in any field, there are experts, and they do not always agree on the best methods and they like to promote their favorites. This led to a running battle between cladistics and phenetics, two different phylogenetic approaches. A nice discussion of this is given by Fred Opperdoes here. A simplification is that cladistics relies on parsimony methods and phenetics relies on distance methods. The question is what do you use and how do you decide? One site puts this very nicely as "...anyone with a copy of, say, PAUP* can learn how to "point and shoot" to make a neighbor-joining tree, or a cladogram or a likelihood tree, but
this doesn't really mean you know what you're doing..." This page tries to give in-depth explanations of the different methods, so you do know what you are doing. It suffers from a lack of explanations for the uninitiated. It does link to another page which is the manual for the package
MEGA. This seems to be a more detailed resource.
Parsimony adherents rather look down on distance methods as being too simple and unreliable, but here is a quote from the MEGA manual about this.
"Some authors (e.g., Farris 1981, Penny 1982) have argued that distance methods are inherently inferior to
discrete-character methods (e.g., parsimony methods), but their arguments are apparently based on misconceptions of distance methods (Felsenstein 1986, Nei
1987). Actually, some distance methods can be superior to discrete character methods in obtaining the correct tree, depending on the situation."
The cladistic approach is to define clades, groups of organisms that are descended from a common ancestor. This is done by identifying shared derived characters called synapomorphies (characters that are shared by all members of the clade, but not by organisms outside the clade). For example, butterflies and moths both have scales on their wings, so they are grouped together in the Lepidoptera. A fly would not be in this group, so this is an informative character. Flies, moths and butterflies all have six legs, which is not informative, so cladistics would ignore that character. Phenetics is based on grouping organisms by similarity, and phenetics considers every character, so six legs is something that unites flies with moths and butterflies. In terms of sequence data, a cladistic analysis of a sequence alignment would ignore all 100% conserved amino acids as not informative. A phenetic approach would count them as part of the similarity between the sequences, and they would go into computing the difference matrix. Technically, the tree generated by a cladistic approach is called a cladogram and it tries to be a true account of the history of the organisms, a genealogy. A phenetics-based tree is called a phenogram. The phenogram and the cladogram may look the same, but they are made from different treatments of the data. Cladistics has forced revisions to nomenclature. The term reptiles is no longer acceptable as a phylogenetic classification because it excludes birds. The term Reptilia has replaced it and this includes the birds.
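To make the idea of informative and uninformative alignment columns concrete, here is a minimal sketch. It assumes the usual textbook definition of a parsimony-informative site (at least two different residues, each present in at least two sequences), which is not spelled out in this module, and it treats "-" and "?" as missing data. The function name and the toy alignment are invented for the illustration.

```python
from collections import Counter

MISSING = {"-", "?"}  # assumption: gaps and unknowns are skipped


def classify_columns(alignment):
    """Label each alignment column 'constant', 'variable' or 'informative'."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    labels = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c not in MISSING)
        if len(counts) <= 1:
            # 100% conserved: ignored by a cladistic analysis
            labels.append("constant")
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            # two or more residues, each shared by two or more sequences
            labels.append("informative")
        else:
            # varies, but does not favor one grouping over another
            labels.append("variable")
    return labels


toy = ["AAGT", "AAGC", "ACCT", "ACCG"]
print(classify_columns(toy))
# ['constant', 'informative', 'informative', 'variable']
```

A distance method would use all four columns when computing the difference matrix, while a parsimony method would effectively decide between trees using only the two informative ones.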
The following data set is a test set for this program. ? means the amino acid that belongs there is not known. A . means the same as the first sequence.
This module discusses how to make phylogenetic trees from multiple sequence alignments.
A sequence alignment can be useful for identifying conserved amino acids or motifs shared in common among many sequences. This may be useful in designing degenerate PCR primers to be used to find additional members of the family or it can be used to choose amino acids to mutate in experimental work on a particular gene. A sequence alignment contains much more information than that. All the sequences in a sequence alignment are assumed to be descended from a common ancestor, that is why they share sequence relatedness in the first place. From the time of the last common ancestor, the sequences have diverged. The sequence alignment has within it an approximate history of those sequences. This record can be extracted by making a tree showing the relationships between the sequences. If the sequences are from different species, this is called a phylogenetic tree. If the sequences are from a single species (like our 49 mouse mitochondrial carrier sequences), the tree does not represent phylogeny but the history of the protein family. This type of tree is often called a gene tree or a dendrogram.
This module will show you how to make several types of trees using a single sequence alignment. This area has been a controversial one, because the tree results can be used to infer the evolutionary history of groups of organisms. There have been very heated debates in the literature often running for years over such things as are humans evolutionarily closer to the chimp or the gorilla? This is not an issue now because more data became available and the question is resolved. When the data were sparse, debate raged over whose tree building method was the best and which results could you believe. Trees using different genes would often support opposite conclusions. More modern debates concern whether there was a single origin of modern humans in Africa about 200,000 years ago (mitochondrial Eve hypothesis) or are there other explanations (the multiregional hypothesis). Much of this argument hinges on the building and interpreting of trees of human mitochondrial DNA.
There are two main ways to build trees from sequence data. These are called distance methods and character-state methods. The distance methods use a difference matrix (or distance matrix) containing all the pairwise distances between all the sequences. We will talk about the UPGMA (Unweighted Pair Group Method using Arithmetic averages) and Neighbor Joining methods, which are both distance methods. Character-state methods keep track of the amino acid or nucleotide at a given site in a sequence. They start with the known sequences and attempt to reconstruct the history of changes that had to take place from a common ancestor. There are usually large numbers of ways to do this, so parsimony methods try to minimize the number of changes required to go from the common ancestor to the present day sequences. Each branch on a tree of this kind has a length equal to the number of substitutions (or mutations) required to get from one node to the next. The program looks at large numbers of possible trees and chooses those that have the smallest total number of steps. It is not uncommon that there may be a few hundred trees all with the same number of steps. In that case a consensus tree can be made. The character-state methods tend to be more computer intensive. These include parsimony methods such as PAUP (Phylogenetic Analysis Using Parsimony), and PROTPARS and DNAPARS in PHYLIP. Maximum likelihood methods include ProtML in the MOLPHY package and DNAML or DNAMLK in PHYLIP. There are more in the molecular evolution packages listed below (see MEGA and MacClade).
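The step counting that parsimony methods rely on can be sketched with Fitch's algorithm, which finds the minimum number of changes a given tree requires at each site. This is only an illustration under simple assumptions: every change counts as one step, trees are written as nested pairs, and the names and toy data are invented. PROTPARS itself counts steps differently (it takes the genetic code into account), so this is not a reimplementation of that program.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is either a sequence name (a tip) or a pair (left, right).
    states maps each sequence name to its residue at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # the two subtrees can agree: no extra change needed here
        return shared, left_cost + right_cost
    # no state in common: one change is required on this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Total number of steps the tree needs over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


toy = {"A": "AAGA", "B": "AACA", "C": "ACGT", "D": "ACCT"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidates:
    print(tree, parsimony_length(tree, toy))
```

For four sequences there are only three unrooted topologies, so all of them can be scored. The tree with the smallest total is the most parsimonious one; with many sequences a program has to search rather than enumerate.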
Distance methods can be used with any set of pairwise distances. These can be from non-sequence data. If you can measure anything between sets of related objects, like flowers, skeletons, cars, baseball players or gene expression on DNA chips, you can make a distance matrix and a tree. Character state methods can also be applied to discrete data like Restriction Fragment Length Polymorphisms (RFLPs) and skeletal features in fossils. Rob Edwards has recently counted the presence and absence of genes in 100 phage genomes and made a phylogeny of phage. This type of analysis is even applied to old manuscripts to detect the history of copying and editing changes over hundreds of years. Existing manuscripts of books like the Tale of Genji by Murasaki Shikibu (written approximately 1000 years ago) and the 84 different manuscript copies of Chaucer (see link1 and link2) are said to have lineages. For an explicit discussion of Maximum parsimony applied to the Letter of James see
Phylogenetic analysis of the Greek New Testament
Biologists and molecular systematists are interested in the history of life and that is what they want to know from sequences. How can these sequences help in identifying the relationships among living things? There are layers and layers to this question, because it can apply to all life, just eukaryotic life or just artiodactyls in the mammals. I am interested in the evolution of eukaryotic life (see my pages on the Molecular History of Eukaryotic Life, MHEL).
Before we can go further, we need to know some basics, such as what is a rooted tree or an unrooted tree.
A tree can be rooted or unrooted. A rooted tree has one node that is designated as the root and all other branches and nodes trace back to that node. This implies there is an ancestral relationship of that node to the other parts of the tree. There is a timeline to such a tree.
diagrams of tree nomenclature
The UPGMA method starts with a matrix of pairwise distances between all sequences. The smallest distance is found and these two sequences are then treated as a unit. The matrix is recomputed, with the distances to the pair replaced by the average of the two distances to the separate sequences (in later rounds, when clusters are joined, the average is taken over all the sequences in each cluster). This results in an array that is smaller by one. The process is repeated until all the sequences are joined in clusters. The order of clustering is kept in the computer's memory and this is the data used to make the tree.
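The clustering loop just described can be sketched in a few lines. This is a minimal illustration, not the code used by NEIGHBOR: the function name, the nested-tuple tree representation and the five-sequence distance matrix are all made up for the example. Averages are weighted by cluster size, so each new distance is the mean over all pairs of member sequences.

```python
def upgma(names, dist):
    """Cluster by UPGMA.

    names: list of sequence labels.
    dist:  symmetric matrix (list of lists) of pairwise distances.
    Returns (tree, merges): tree is a nested tuple, merges lists each
    joined pair together with the height (half the distance) of the join.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # size-weighted average distance from the new cluster to the rest
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # drop the rows for a and b, add the row for the merged cluster
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append(((tree_a, tree_b), d_ab / 2))
        next_id += 1
    ((tree, _size),) = clusters.values()
    return tree, merges


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree, merges = upgma(labels, matrix)
print(tree)  # (('a', 'b'), ('c', ('d', 'e')))
```

Because every join is placed at half the distance between the two clusters, both sides of a join end up the same distance from the tips. That is the constant-rate assumption built into UPGMA.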
The Neighbor Joining method is another distance method. It does not average the branch lengths at each step, so the branches from the same node do not have to be the same length. This feature allows for uneven rates of evolution. This algorithm seeks to cluster sequences in such a way that the total branch length of the tree is minimized. The tree starts out as a star, with all branches coming from a single point like spokes of a wheel. In the first step, the pair of sequences whose joining gives the smallest total branch length is joined. This is often, but not always, the pair with the smallest distance between them. These are then treated as a single unit (a separate spoke on the wheel). The process is repeated, and the measure that is used at each step is minimization of the total branch length. Neighbor Joining does not make a rooted tree, but designation of a root can be made if an outgroup sequence is included. The root is then the point where the outgroup joins the rest of the tree.
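The selection rule in Neighbor Joining is easiest to see in code. The sketch below shows only the first step, choosing which pair to join, using the standard criterion Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r(i) is the sum of all distances from sequence i. The function names and the small distance matrix are invented for the illustration, and the rest of the algorithm (branch lengths, matrix update) is left out.

```python
def q_matrix(dist):
    """Neighbor Joining selection criterion for every pair (i, j), i < j."""
    n = len(dist)
    row_totals = [sum(row) for row in dist]
    return {
        (i, j): (n - 2) * dist[i][j] - row_totals[i] - row_totals[j]
        for i in range(n)
        for j in range(i + 1, n)
    }


def pick_neighbors(dist):
    """Pair with the smallest Q value: the pair Neighbor Joining joins first."""
    q = q_matrix(dist)
    return min(q, key=q.get)


def closest_pair(dist):
    """Pair with the smallest raw distance: the pair UPGMA joins first."""
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: dist[p[0]][p[1]])


matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(pick_neighbors(matrix))  # (0, 1)
print(closest_pair(matrix))    # (3, 4)
```

In this toy matrix the two methods start with different pairs. Subtracting the row totals corrects for sequences that are far from everything else, which is how Neighbor Joining copes with unequal rates of change.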
We will be using the PHYLIP package to make these types of trees. PHYLIP is a set of 30 programs that are all related to molecular evolution. It has been loaded on your computers in the teaching lab and can be downloaded for PC, Mac or UNIX machines free of charge (see link in the references). The first step in going from a sequence alignment to a tree is generating the distances for the difference matrix. This is done with the programs PROTDIST for protein alignments or DNADIST for DNA alignments. The alignments have to be in PHYLIP format that we discussed in Module 7. This is an output option from the CLUSTALW package. The alignment must be in the PHYLIP folder along with the programs that will manipulate it.
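To show what a difference matrix contains, here is a simplified sketch that computes the fraction of differing positions between each pair of aligned sequences. This is an assumption-laden stand-in: PROTDIST uses an evolutionary model (such as PAM) rather than a raw fraction, and the function names here are hypothetical. Positions where either sequence has a gap or an unknown residue are skipped.

```python
def p_distance(seq_a, seq_b, missing="-?"):
    """Fraction of compared positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(seq_a, seq_b):
        if x in missing or y in missing:
            continue  # skip gaps and unknown residues
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        # two partial sequences that never overlap have no defined distance
        raise ValueError("sequences do not overlap")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(alignment)
    matrix = [
        [p_distance(alignment[a], alignment[b]) for b in names]
        for a in names
    ]
    return names, matrix


toy = {
    "Alpha": "ABCDEFGHIK",
    "Beta": "AB--EFGHIK",
    "Delta": "CIKDEFGHIK",
    "Epsilon": "DIKDEFGHIK",
}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(f"{name:<10}" + " ".join(f"{value:.3f}" for value in row))
```

The matrix is symmetric with zeros on the diagonal, which is the same overall shape as the file that PROTDIST writes and NEIGHBOR reads.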
I have prepared the PHYLIP format alignment for the 51 trimmed human mitochondrial carriers. This is available at this
link. Copy the file, paste it in a Word document and save it as text only to the PHYLIP folder. Now in the PHYLIP folder start the PROTDIST program. It will ask you for a file name. Type in the file name for the alignment and hit return. (I noticed that this program on a PC needs you to type a .txt at the end of your filename, even if it does not show in the name of your file. Windows does something odd and adds on hidden extensions to filenames.) You will get a menu that says: Settings for this run. Accept the settings by typing Y for yes then return. The program should run and compute the distance matrix for the alignment. The results will be written to outfile. You should change the name of this file, since many programs in this package write the results to outfile and that will erase your previous results if you do not change the name. Since this is a difference matrix I usually name these files dfmat.car51, where the car51 part tells me what is in the file (51 carriers) and the dfmat tells me it is a difference matrix. I have saved the difference matrix for you to look at
DFMAT.CAR51.
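Renaming outfile after every run is easy to forget. As one possible safeguard, the sketch below renames the result file and refuses to overwrite an earlier result. The helper name is hypothetical and the file names simply follow the naming scheme used above; it is a convenience script, not part of PHYLIP.

```python
from pathlib import Path


def keep_result(folder, produced="outfile", new_name="dfmat.car51"):
    """Rename a PHYLIP result file so that the next run cannot overwrite it."""
    source = Path(folder) / produced
    target = Path(folder) / new_name
    if not source.exists():
        raise FileNotFoundError(f"{source} not found; did the program run?")
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose another name")
    source.rename(target)
    return target
```

For example, `keep_result("phylip", "outfile", "dfmat.car51")` would be called after PROTDIST finishes, assuming the programs live in a folder named phylip.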
Once you have the difference matrix you can quit PROTDIST and start NEIGHBOR. This program will calculate either UPGMA or Neighbor Joining trees. When you start the program you are prompted for a file name. Type in dfmat.car51 and hit return. There is a menu called Settings for this run: with several options that you can select. The first is Neighbor Joining or UPGMA. You can switch between the two by typing an N and return. This will redraw the menu with the change made. You can do this for any item on the menu. By typing Y for yes, you select the options and start the program. If you choose Neighbor joining, there is a selection choice called o for setting the outgroup. Please type in 41 as the sequence for the outgroup. We will talk about this later. If all is well, the program will begin to calculate your tree. Once it is done it will write the output to the output file and the tree to the treefile. You should rename these files as output.car51 and treefile.car51 or some similar scheme to preserve them from being overwritten next time. You may find an error allocating memory message. This happened to me when I did not change the name of my outfile and ran two programs in a row. The new data was written on at the end of the last outfile, so the file was useless.
The treefile is written in the accepted standard text format for trees. It is called Newick format or New Hampshire format. Newick is the name of a restaurant where the format was devised. The format is a series of nested parentheses, sequence names and distances, ending with a semicolon.
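As a way of seeing how the parentheses, names and distances fit together, here is a minimal sketch of a reader for simple Newick strings, applied to the primate tree shown in this module. It assumes unquoted names and no comments, which covers treefiles like the ones discussed here but not the full format. The function names and the dictionary layout are choices made for this illustration.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dictionaries.

    Each node is {"name": str, "length": float or None, "children": list}.
    """
    text = "".join(text.split())  # drop spaces and line breaks
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with a semicolon")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # step over "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # step over ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaf_names(node):
    """Names of the tips under a node, in the order they are written."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


primates = """(Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268,
Human:0.11927):0.08386):0.06124):0.15057):0.54939,Mouse:1.21460);"""
tree = parse_newick(primates)
print(leaf_names(tree))
print(len(tree["children"]))  # 3
```

The outermost pair of parentheses holds three items, which is how an unrooted tree is normally written. Each closing parenthesis followed by a colon and a number gives the length of the internal branch leading to that cluster.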
You have the option now to view a rough version of the tree. You can open the output.car51 file in Word. I have linked an HTML version of it OUTFILE.CAR51. This shows a text version of the tree. In Word, I recommend that you select all and set the font to 9 point courier, so the tree does not wrap on the screen. Now you can scroll down and see the tree. Each branch is labeled with the 10 character name from each sequence in the PHYLIP alignment. This output file is important later since the actual drawing of the tree may not contain legible labels. I recommend printing just the pages that have the tree on them. Below the tree there are more pages with numerical data about distances. You do not need to print that.
To draw the tree, start the program DRAWGRAM. This is going to ask you for a file name for the treefile. Type in treefile.car51. It will then ask you for a fontfile. Type in font2 and return. Then you will get a long list of options. These are dependent on your computer, so you have to choose differently for Mac and PC. The settings I use for Mac are M, M, 1, 2, P, 4, 90, Y. You have to hit return after each and the list is redrawn. For PC use M, I, 1, 2, P, 4, 90, Y. Once the tree is drawn, it asks you if you want to view the result. Say Y and return. The tree will be drawn for you on the screen. If you like it, hit return and then it will ask you to save it. Type Y and return. The program will save the file as plotfile. Please change the name to plotfile.car51. This file can be opened by most drawing programs and labeled and colored and made suitable for publication. I suspect you will have to replace the font2 labels with your own and that is why it was a good idea to print out the outfile.car51. If you do not have access to a fancy drawing program like Illustrator, you can still view the tree in Word. Just use the insert picture from file command and select the plotfile.car51 file.
There are other tree drawing programs you can download that are more user friendly. One that works well and gives nice output with labels you will not have to replace is NJplot. You can download this at
NJplot. This program also allows you to swap nodes at will to redraw the tree in the most pleasing visual way for you. I have used it with a 92 sequence tree and it worked very well. The picture can be copied from the edit menu and pasted into a Word document or other drawing program. Here is a plot of the UPGMA tree made from the treefile.car51
Trees are adversely affected by having partial sequences included in the alignment. These can skew the difference matrix and cause the clustering of sequences to be off. If two partial sequences are in the alignment and they do not overlap, the distance between them will be 100% even though that is not realistic. It is best to delete such short sequences.
Parsimony and Cladistics
Running the parsimony program PROTPARS for protein sequences
    5   10
Alpha     ABCDEFGHIK
Beta      AB--EFGHIK
Gamma     ?BCDSFG.??
Delta     CIKDEFGHIK
Epsilon   DIKDEFGHIK
If you start PROTPARS and type in the name for this file it will give you a menu with options.
Setting for this run:
  U  Search for best tree?  Yes
  J  Randomize input order of sequences?  No. Use input order
  O  Outgroup root?  No, use as outgroup species 1
  T  Use Threshold parsimony?  No, use ordinary parsimony
  C  Use which genetic code?  Universal
  M  Analyze multiple data sets?  No
  I  Input sequences interleaved?  Yes
  0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
  1  Print out the data at start of run  No
  2  Print indications of progress of run  Yes
  3  Print out tree  Yes
  4  Print out steps in each site  No
  5  Print sequences at all nodes of tree  No
  6  Write out trees onto tree file?  Yes
If you leave all the settings the same and type Y return, the program will run and make a set of trees with the smallest number of steps (see trees). The output shows three trees of length 14. To decide which is correct would take additional information. This is a computer-intensive process for many sequences, so we will do only a small alignment file here. I have taken the last two pages of your assignment from last week and done some editing. I have trimmed the gap-filled regions off to give blocks of sequence that are without dashes on the ends. These two blocks have been joined together to make a single block of 58 amino acids by 47 sequences (seq 114a and 138 are removed).
   47   58
50B       LVAGGGAGAVSRTCTAPLDRLKVLMQVHCIVGGFTQMIREGGAKSLWRGNGINVLKIA
51        LVAGGGAGAVSRTCTAPLDRLKVLMQVHCIVGGFTQMIREGGAKSLWRGNGINVLKMP
52C       LLSGAMAGAVSRTGTAPLDRARVYMQVYNLLSGLRSLVQEGGVRSLWRGNGINVLKIA
54        LVAGAVAGAVSRTGTAPLDRLKVFMQVHNILGGLRNMIQEGGVLSLWRGNGINVLKIA
56        LLAGGVAGAVSRTSTAPLDRLKVMMQVHNIFGGFRQMVKEGGIRSLWRGNGTNVIKIA
27        FLAGGIAAAVSKTAVAPIERVKLLLQVQIIDCVVRIPKEQ-GFLSFWRGNLANVIRYF
34        FLAGGVAAAISKTAVAPIERVKLLLQVQIIDCVVRIPKEQ-GVLSFWRGNLANVIRYF
44A       FLCGSISGTCSTLLFQPLDLLKTRLQALMLAVFLKVVRTE-SLLGLWKGMSPSIVRCV
45        FLCGSISGTCSTLLFQPLDLLKTRLQALMLAVFLKVVRTE-SLLGLWKGMSPSIVRCV
46D       YLCGYCAAFNNVAITYPVQKILFRQQLYKTRDAVLQLRKD-GFRNLYRGILPPLMQKT
59A       FLAGGIAGCCAKTTVAPLDRVKVLLQAHVLSTLRAVPQKE-GYLGLYKGNGAMMIRIF
70A       WYFGGLASCGAACCTHPLDLLKVHLQTQMTGMALQVVRTD-GFLALYNGLSASLCRQM
62B       LLSGALAGALAKTAVAPLDRTKIIFQVSAFRLLYFTYLNE-GFLSLWRGNSATMVRVI
73F       LTAGAAGGTACVLTGQPFDTMKVKMQTFLTDCCLKTYSQV-GFRGFYKGTSPALIANI
73G       LTAGAAGGGACVLTGQPFDTIKVKMQTFLADCFLKTYNQV-GIRGLYRGTSPALLAYV
75        LLAGGFGGMCLVFVGHPLDTVKVRLQTQTLDCFRKTLMRE-GITGLYRGMAAPIIGVT
78I       FVAGWIGGVASVIVGYPLDTVKTRLQAGTFNCIRMVYKRE-RVFGFFKGMSFPLASIA
78E       FVAGWISGAVGLVLGHPFDTVKVRLQTQIVDCVVKTYRHE-SVLGFFKGMSFPIASVA
78F       FLAGCAGGVAGVIVGHPFDTVKVRLQVQTLHCFQSIIKQE-SVLGLYKGLGSPLMGLT
78G       FVAGAIGGVCGVAVGYPLDTVKVRIQTEIWHCIRDTYRQE-RVWGFYRGLSLPVCTVS
90        FTLGSVAGAVGATAVYPIDLVKTRMQNQSFDCFKKVLRYE-GFFGLYRGLIPQLIGVA
90C       VAAGGSAGLVEICLMHPLDVVKTRFQVQVRGSFQMIFRTE-GLFGFYKGIIPPILAET
92        FGLGSIAGAVGATAVYPIDLVKTRMQNQSFDCFKKVLRYE-GFFGLYRGLLPQLLGVA
94B       LINGGIAGLIGVTCVFPIDLAKTRLQNQMSDCLIKTIRSE-GYFGMYRGAAVNLTLVT
96        LINGGIAGLVGVTCVFPIDLAKTRLQNQMTDCLMKTARAE-GFLGMYRGAAVNLTLVT
96D       YVFGVATTMMIRVSVYPFTLIRTRLQVQTFDAFVKILRAD-GVAGLYRGFLVNTFTLI
102       AVAGSVSGFVTRALISPLDVIKIRFQLQIFQAAKQILQEE-GPRAFWKGHVPAQILSI
103E      FLMSGVAACGACVFTNPLEVVKTRMQLQVFHAFFTIGKVD-GLAALQKGLGPALLYQF
103F      LVLGASACCLACVFTNPLEVVKTRLQLQFVSSVAAVARAD-GLWGLQKGLAAGLLYQG
105D      MIASCTGAVLTSLMVTPLDVVKIRLQAQTLDAFLKILRNE-GIKSLWSGLPPTLVMAI
107       MVASGAGAVVTSLFMTPLDVVKVRLQSQTLDAFVKIVRHE-GTRTLWSGLPATLVMTV
112       LFAGGCGGTVGAILTCPLEVVKTRLQSSPLHCLKAILEKE-GPRSLFRGLGPNLVGVA
114       LFAGGCGGTVGAIFTCPLEVIKTRLQSSLLQVLKSILEKE-GPKSLFRGLGPNLVGVA
124       LVAGVSGGVLSNLALHPLDLVKIRFAVSILHCLATIWKVD-GLRGLYQGVTPNVWGAG
126       AVAGAVGSVTAMTVFFPLDTARLRLQVDTHAVLLEIIKEE-GLLAPYRGWFPVISSLC
133       MTAGXXAGILEHSIMYPVDSVKTRMQSLIYGALKRIMHTE-GFWRPLRGLNVMMMGAG
135       MVAGAVAGILEHCVMYPIDCVKTRMQSLVLEALWRIMRTE-GLWRPMRGLNVTATGAG
152       ILAGGLAGGIEICITFPREYVKTQLLSDIGDCVRQTVRSH-GVLGLYRGLSSLLYGSI
166       GFGGVLSCGLTHTAVVPLDLVKCRMQVDIFNGFSITLKED-GVRGLAKGWAPTLIGYS
167A      GLGGIISCGTTHTALVPLDLIKCRMQVDIFNGFSITLKED-GVRGLAKGWAPTLIGYS
179       FLFGGLAGMGATVFVQPLDLVKNRMQLSSFHALTSILKAE-GLRGIYTGLSAGLLRQA
184D      FLLSGCAATVAELATFPLDLTKTRLQMQMVRTALGIVQEE-GFLKLWQGVTPAIYRHV
190       IFSAGVSACLADIITFPLDTAKVRLQIQVLGTITTLAKTE-GLPKLYSGLPAGIQRQI
193A      FLGAGTAACFADLLTFPLDTAKVRLQIQVLGTILTMVRTE-GPRSPYSGLVAGLHRQM
195       FLGAGTAACIADLITFPLDTAKVRLQIQVLGTILTMVRTE-GPRSLYNGLVAGLQRQM
198B      FVYGGLASIVAEFGTFPVDLTKTRLQVQMFHALFRIYKEE-GILALYSGIAPALLRQA
198C      FVYGGLASITAECGTFPIDLTKTRLQIQMLHALMRIGREE-GLKALYSGIAPAMLRQA
This is in PHYLIP format and it can be copied and pasted into a Word document and saved as text only for use by the PROTPARS program. The program will execute and produce four trees of length 1070 (see trees). I have drawn these trees in NJplot, since the text version saved in outfile is not readable because the alignment wraps (it is too wide to show on one page width). At the top of each tree I have included the treefile output in Newick format. If you paste this into a word processor and save as text, you can open it in NJplot and it will draw the tree for you. The four trees have very few differences between them. These differences are limited to the placement of seq 56 and the placement of seq 46D. 56 is either above or below the 52C, 54 cluster and 46D is either with 102 or with the adjacent cluster that has 103E in it. There are four trees since all possible combinations of these two differences are drawn.
Notice that the top sequence in the tree is 50B. The tree was drawn this way because of a default setting in the PROTPARS program that says use sequence #1 (50B) as the outgroup. This choice is probably not a good one. If a UPGMA tree is made from the same sequence data using NEIGHBOR, it looks like this: UPGMA tree. Note that sequence 46D is the deepest branch and is probably the best choice for an outgroup sequence, if we assume a molecular clock, which is what UPGMA does.
If we now rerun PROTPARS with outgroup 46D (choose option O in the menu and type in 10 as the sequence number for seq 46D) we get a very different tree. We still get 4 trees of 1070 steps, but now seq 46D is at the bottom and seq 50B clusters with its closest neighbors, as it should (see 46D outgroup trees). There are two differences between the trees. Seq 56 again is either above or below the seq 52C, 54 cluster and seq 102 is either above or below the 103E cluster.
The tree that is created by the PROTPARS algorithm is dependent on the chosen outgroup. You can see that very different trees can result. You should also go back and compare the new parsimony trees to the UPGMA tree made from the same data set. There are numerous differences. The UPGMA tree is based on distances between the sequences (related to percent sequence difference based on a PAM scoring matrix). The parsimony tree is based on minimizing the number of steps to evolve from the outgroup. Because the sequences are so short in this case (only 58 amino acids) one can expect the two methods will not agree. One might hope that as longer sequences are used the two might approach a similar overall result.
Assignment 8 Making Trees
Save the sequence alignment in PHYLIP format either from CLUSTALW or from Seaview.
Compute a UPGMA tree and a neighbor joining tree from the PHYLIP format alignment. This should contain 99 sequences. (Remember to rename your outfile and treefiles after each run.) Download NJplot to your home, office or lab computer and open the treefiles with NJplot. [I really don't think it is worthwhile to try the DRAWGRAM program in PHYLIP.] Select Helvetica font 9 point so the names will not all run together. Copy the picture from the edit menu and paste it in a Word document. Paste both the Neighbor Joining tree and the UPGMA tree in the same file. Look for alternating human and mouse sequences in the alignment. This is what would be expected if there were a 1:1 correspondence between the two species. Where does this alternation break down? Note major differences between the two trees. Email me your trees and your analysis of the differences. This is not a trivial assignment with a test set of data. This is real data that has never been published, because we just found some of these sequences in this class. This result will be a new observation. Good luck.
We will not try this with parsimony.
The Neighbor Joining method: Saitou and Nei (1987), Mol. Biol.Evol. 4:406
PAUP: Phylogenetic Analysis Using Parsimony a sophisticated parsimony program.
PAUP
PHYLIP: PHYLogeny Inference Package
PHYLIP
MEGA: Molecular Evolutionary Genetics Analysis
MEGA
MacClade discrete-state parsimony methods (including DNA and protein parsimony)
For Macintosh computers
MacClade
Take the two files mouse mito carriers and human mito carriers from Module 3 and paste them both into the CLUSTALW server to make an alignment of all the carrier sequences from mouse and human. You may use the trimmed human carrier file from Module 7 instead, and I recommend trimming the mouse carriers also and removing the sequence 114a, which is short. You can trim the mouse sequences before or after you make the alignment. It might be easier afterwards. You can do the mouse trimming in the original FASTA sequence file or in SeaView. If you do it in the original FASTA file you will need to recompute the CLUSTALW alignment with the revised file.
References
The UPGMA method: Sneath, P. H. A. and Sokal, R. R. (1973) in Numerical Taxonomy, pp. 230-234, W. H. Freeman and Company, San Francisco, CA, USA
Packages for phylogenetic inference
MOLPHY: A Computer Program Package for Molecular Phylogenetics
ProtML is the main program in MOLPHY for inferring evolutionary trees from PROTein (amino acid) sequences by using the Maximum Likelihood
method.
MOLPHY