17 Missing pieces of mouse and human mitochondrial carriers

To be used with the carrier sequence alignment for reference to adjacent sequences.  
[Note that this is a newer alignment than the one used in the Blast Searching II example, 
so use this link].

The goal is to take a sequence with a missing fragment and find the missing parts in the databases.
These are real data and the missing sequences may not all be in the databases yet.
The file was last modified Feb 22, 2001, so 11 months have passed since I last
tried to find these sequences.  Many of them are probably in the databases by now.

Since these are mostly mouse sequences, the RIKEN full length cDNA project may have filled in 
some of them.  These sequences are in the mouse section of nr and in the mouse EST section of 
the database, so I would search there first.

Navigating the sequence alignment.  To find your sequence gap, go to the sequence 
alignment page and search for your sequence number (for 2-digit numbers like 52, precede 
and follow the number with a space so you won't find numbers like 152 or 252).
This should take you to the first page of the alignment.  To get to the next page (there are 
six pages in the alignment), do a find next.  Step through the alignment this way until you 
find your missing sequence gap.  Alternatively, you can manually scroll through the pages.
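
The space-padded search described above can also be done outside the browser.  The 
following is a minimal sketch, assuming the alignment has been saved as plain text with 
the sequence number at the start of each row.  The function name find_sequence_rows and 
the sample rows are hypothetical and are not taken from the real alignment.

```python
def find_sequence_rows(alignment_text, number):
    """Return 1-based line numbers whose text contains the number padded by spaces."""
    needle = f" {number} "
    return [
        line_no
        for line_no, line in enumerate(alignment_text.splitlines(), start=1)
        if needle in line
    ]


# Made-up rows for illustration only.
sample = "\n".join([
    " 52 SFLKDFLAGG   10",
    "152 SFAKDFLAGG   10",
    "252 SFLKDFMAGG   10",
    " 52 IAAAVSKTAV   20",
])

print(find_sequence_rows(sample, 52))  # [1, 4]
```

Note that a 3-digit number that starts a line has no space in front of it, so for those 
a search for the number followed by a single space is the safer choice.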

Use the procedure given in the Blast Searching II link.  Search mouse ESTs, or nr limited to 
mouse (Mus musculus) with the pull-down menu.  You do not need to limit by species when 
using the mouse EST option.  

If the nr and EST sections do not give a match, try the HTGS section limited to mouse.  If 
that still does not give a good match, try the GSS section limited to mouse.

Helpful hints: If your sequence has a human ortholog that is an almost 100% match, use the 
human sequence to extend your fragment in the search (see 94 human and 94B mouse).  Add the 
human sequence to the mouse fragment to help with the search.  

In the output, look at the frame given above the alignment.  This tells you which strand the 
coding sequence is on.  This is helpful if you need to translate the DNA sequence, because 
you then only need to translate one strand.  
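
To illustrate how the reported frame narrows the work down to one strand, here is a minimal 
sketch of a single-frame translation.  It assumes the standard genetic code and the usual 
convention that frames +1 to +3 read the sequence as given and frames -1 to -3 read its 
reverse complement.  The names translate_frame and reverse_complement are hypothetical, and 
the short test sequences are made up.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (N stays N)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_frame(dna, frame):
    """Translate one frame: +1..+3 on the given strand, -1..-3 on the other."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    seq = dna.upper() if frame > 0 else reverse_complement(dna)
    start = abs(frame) - 1
    # Codons containing N or any other ambiguous base become X.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(start, len(seq) - 2, 3)
    )


print(translate_frame("ATGAAAGGT", 1))    # MKG
print(translate_frame("ACCTTTCAT", -1))   # MKG (same coding sequence, other strand)
```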

Translators:  There are two web-based translators you can use.  
The MBS Quick protein Translator
is a good option, but it ignores Ns and other non-GATC bases, and that results in frameshifts.
If your sequence has Ns it may be better to try the Protein Machine.  
The Protein Machine does not do all three or all six frames at once, but it does not skip Ns.
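
The frameshift caused by skipping Ns can be seen by splitting a sequence into codons with 
and without the N.  This is only an illustrative sketch of the effect, not a description of 
how either web translator is implemented.  The helper names codons and strip_non_gatc and 
the example sequence are made up.

```python
def codons(dna):
    """Split a DNA string into complete codons, starting at the first base."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def strip_non_gatc(dna):
    """Drop every character that is not G, A, T or C."""
    return "".join(base for base in dna.upper() if base in "GATC")


seq = "ATGAANGGTCTG"  # made-up sequence with one N

print(codons(seq))                  # ['ATG', 'AAN', 'GGT', 'CTG']
print(codons(strip_non_gatc(seq)))  # ['ATG', 'AAG', 'GTC']
```

Keeping the N leaves the downstream codons (GGT, CTG) in frame and costs one unknown 
residue.  Dropping it moves every later codon boundary by one base.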

When you have found an extension that fills a gap, send the following information to me 
(and Rob Edwards) via email.

The accession numbers of the new sequences that fill part of the missing sequence.
The sequence alignment with the sequence above and the sequence below your gap,
both before and after you added the sequence to the alignment.  You will need to use 
Courier font, probably at 9 point size, to make it fit.

Some of the best matches are not adjacent, so include the best matching 
sequence, probably human, in this alignment block. 

When submitting the alignment you can use Word and color the new sequence red 
instead of a before and after alignment, but then you will have to add the Word 
doc as an attachment to your email.

If you try a big gap, you do not have to completely fill it.  A reasonable extension into the 
gap is OK (50-60 amino acids).  If you fill a gap on the first try, do another.  Some will be 
easy and others may be impossible.  Just pick one at random.  See the list below to select
a gap to fill.

This assignment is due by email Wednesday Jan. 23 so we can look at the results before the next class.  


example

accession numbers AJ234567, AV982367

before 
 28 SFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQH---ASKQISAEKQY-KGIIDCVVRIPKE----QGFL   68
 29 -------------------------------------------------KGIIDCVVRIPKE----QGFL   68
 30 SFAKDFLAGGIAAAISKTAVAPIERVKLLLQVQH---ASKQIAADKQY-KGIVDCIVRIPKE----QGVL   68

after
 28 SFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQH---ASKQISAEKQY-KGIIDCVVRIPKE----QGFL   68
 29 SFLKDFLAGGVAAAISKTAVAPIERVKLLLQVQH---ASKQISAEKQY-KGIIDCVVRIPKE----QGFL   68
 30 SFAKDFLAGGIAAAISKTAVAPIERVKLLLQVQH---ASKQIAADKQY-KGIVDCIVRIPKE----QGVL   68
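
To check how far an extension reaches into a gap (the 50-60 amino acid guideline above), 
the "before" and "after" rows can be compared column by column.  This is a minimal sketch 
that assumes both rows are copied without the leading sequence number and trailing position, 
and that "-" marks a gap.  The function name filled_columns and the short rows below are 
hypothetical.

```python
def filled_columns(before, after, gap="-"):
    """Count alignment columns that were a gap before and hold a residue after."""
    if len(before) != len(after):
        raise ValueError("rows must have the same number of alignment columns")
    return sum(1 for b, a in zip(before, after) if b == gap and a != gap)


# Shortened, made-up rows for illustration.
before = "-----KGIIDCVV"
after = "AEKQYKGIIDCVV"

print(filled_columns(before, after))  # 5
```

Columns that are still gaps in the "after" row are not counted, so alignment gaps inside 
the new fragment do not inflate the number.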



The human carriers are missing only 2 small segments

1) 52A is missing the extreme N-terminus, which runs off the end of the alignment.

2) 78B is missing an internal fragment


The mouse carriers are missing more pieces

3,4) 52C is missing both the N- and C-termini

5) 54 is missing an internal fragment

6) 56A is missing most of the sequence

7) 90 is missing the N-term that runs off the end

8) 94B is missing the first 2/3 of the sequence

9,10) 112 is missing the N-term and the C-term

11) 114 is missing the middle

12,13) 124 is missing the first half and the C-term

14) 133 has the N-term but is missing the rest

15) 135 is missing the N-term

16) 138 is missing the N-term

17) 166 is missing the N-term