Module 3.

This module continues our study and use of BLAST searching. Now that you have used BLAST to find identical sequence matches and to extend sequences into new territory or across sequence gaps, it is time to look at the process behind the technique. Rob Edwards will go over the fundamentals of scoring matrices and the search algorithm; see the Blast Searching I link below. This material is covered in some detail in your textbook on pages 172-190. After that we will visit two more NCBI sites that we did not get a chance to see last time: OMIM and Unigene. Then we will return to the mitochondrial carriers and learn how to find mouse orthologs of human carriers that have not yet been identified in mouse. The human set of mitochondrial carriers has 49 sequences, but the mouse set has only 28, so about 20 are missing. Files of the complete mouse and human sequence sets are linked below. This will form the basis for Assignment 2.
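To make the search algorithm a little more concrete before the lecture, here is a minimal Python sketch of BLAST's first step: finding short "words" shared by a query and a database sequence. It is a simplification under stated assumptions. It finds only exact word matches (real BLAST also accepts similar "neighborhood" words that score above a threshold on the scoring matrix), and it stops before the extension and scoring steps. The function names and the word length k=3 (a word size commonly used for protein searches) are illustrative choices, not NCBI's code.

```python
from collections import defaultdict


def index_words(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly.

    Simplification (assumption): only identical words count as seeds here;
    BLAST proper also seeds on high-scoring similar words and then extends them.
    """
    query_index = index_words(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        # .get() avoids adding empty entries to the defaultdict
        for i in query_index.get(subject[j:j + k], []):
            hits.append((i, j))
    return sorted(hits)
```

For example, `find_word_hits("MKVLAT", "AAMKVLG")` returns `[(0, 2), (1, 3)]`. Both hits lie on the same diagonal (subject position minus query position is 2 in each case), and a diagonal with seed hits is where a BLAST-style search would try to extend the match and score it with the matrix.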

Unigene is a collection of ESTs (Expressed Sequence Tags) that have been clustered and assembled into contigs. A contig is a set of overlapping DNA (or protein) sequences that together make up one CONTIGUOUS sequence. A gene that is highly expressed will probably be represented by a large number of ESTs, so the Unigene cluster for that gene should be very large. Other ESTs are solo sequences that have been found only once; these are called singletons. Most researchers feel that singletons are probably errors and do not really represent genes. This is more likely to be true in human, where there are about 4 million ESTs and a real gene would usually be expected to turn up more than once. In smaller data sets, genuine singletons would be expected, because a rarely expressed gene may have been sequenced only once.
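The idea of grouping ESTs into clusters and spotting singletons can be illustrated with a toy Python sketch. This is not how Unigene actually builds its clusters. As a stated assumption, two ESTs are simply placed in the same cluster whenever they share an exact stretch of k bases (k is an arbitrary illustrative parameter), and clusters are joined transitively with a union-find structure. In this toy, a large cluster corresponds to a frequently sampled transcript, and clusters of size one are the singletons.

```python
from collections import defaultdict


def cluster_ests(ests: dict[str, str], k: int = 8) -> list[list[str]]:
    """Group ESTs that share at least one exact k-base word (transitively)."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: dict[str, str] = {}  # k-base word -> first EST that contained it
    for name, seq in ests.items():
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in first_seen:
                union(first_seen[word], name)
            else:
                first_seen[word] = name

    clusters: dict[str, list[str]] = defaultdict(list)
    for name in ests:
        clusters[find(name)].append(name)
    # Largest clusters (most frequently sampled transcripts) first.
    return sorted(clusters.values(), key=len, reverse=True)


def singletons(clusters: list[list[str]]) -> list[str]:
    """ESTs that ended up alone in their cluster."""
    return [c[0] for c in clusters if len(c) == 1]
```

With three made-up reads, `{"est1": "ACGTACGTTTGA", "est2": "CGTTTGACCAGT", "est3": "GGGCCCAAATTT"}` and `k=6`, the first two share the word `CGTTTG` and fall into one cluster, while `est3` is reported as a singleton.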

Click on the Unigene link below to go to the Unigene page. Once there, select human in the left frame. On the human page, click on chromosome 14. This lists the Unigene entries for chromosome 14. Note that about 12 lines down, heterogeneous nuclear ribonucleoprotein C (C1/C2) has 1882 sequences. A little farther down there is an entry called EST with only one sequence.

Click on the first entry in the list. Notice that the entry has a Unigene ID that starts with Hs. followed by a number; Hs. stands for Homo sapiens. Mouse Unigene entries begin with Mm. for Mus musculus, and so on. The Unigene page has links to other resources with other types of data; see Locus Link, OMIM and Homologene at the top. We will visit OMIM later today and the others on another day. Below these links is a list of the best matches to the sequence in several model organisms, so you can find the ortholog of your sequence in another species. There is also mapping information for human, and the mRNAs and genes in Genbank for this gene are listed. Finally, the individual ESTs are listed at the bottom. These can be used to construct a whole mRNA sequence even if no mRNAs are listed above. This may be more important in species like zebrafish, where the annotation is less complete.
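The last point, building an mRNA sequence from overlapping ESTs, can be sketched as a naive greedy merge in Python: repeatedly join the two reads with the longest suffix/prefix overlap. The sketch assumes error-free reads that are all on the same strand; real EST assembly must also cope with sequencing errors, reverse-complemented reads and repeats. The function names and the `min_len` cutoff are hypothetical choices for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Greedily merge reads by their longest overlaps; returns the resulting contigs.

    Assumes error-free, same-strand reads (a simplification, not a real assembler).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Drop any read that lies entirely inside another one.
        contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: each remaining string is a contig
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For instance, `greedy_assemble(["ATGGCGTACG", "GTACGTTAGC", "TTAGCAAATAG"])` joins the three made-up reads through their 5-base overlaps into the single contig `["ATGGCGTACGTTAGCAAATAG"]`. Greedy merging is easy to follow but can go wrong on repeated sequence, which is one reason real assemblers are more careful.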

LINKS FOR THIS MODULE

Blast Searching I

Revised carrier alignment with all gaps for mouse and human filled in

Local carrier/P450 BLAST server

Assignment 2

Answers to Assignment 1

Answers to Assignment 2

Mouse Mito Carriers

Human Mito Carriers

Bioinformatics links

OMIM

Unigene

Quick Protein Translator

Protein Machine