Chromatograms

Chromatograms are the nuts and bolts of sequencing, so we will spend a little time looking at them and at how we can use them.

On the last page we were in the nucleotide sequence view. Click on the Show chromatograms button to view the chromatogram for this sequence.

You will see something like this:

sequencher chromatogram view

The first thing that you will notice is all the lines. Their peaks represent the bases in the sequence. The red line represents every time a T went past the detector. The black line represents G's, the blue line represents C's, and the green line represents A's. Each base does not form a discrete line, but rather a peak. The peak arises because the molecules in a band do not all move through the gel matrix at exactly the same speed during electrophoresis, so each band passes the detector spread over a short interval.
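As a rough illustration of how the four traces relate to a base call, the sketch below picks the tallest trace at a single peak position. The colour mapping is the one described above; the signal heights and the `call_base` helper are made up for illustration and are not how Sequencher itself calls bases.

```python
# Trace colours as described in the text above.
TRACE_COLOURS = {"A": "green", "C": "blue", "G": "black", "T": "red"}


def call_base(intensities):
    """Return the base whose trace is tallest at one peak position.

    `intensities` is a hypothetical dict mapping each base to its
    signal height at that position.
    """
    return max(intensities, key=intensities.get)


# Made-up signal heights at one clean, well-resolved peak.
peak = {"A": 40, "C": 12, "G": 950, "T": 25}
base = call_base(peak)
print(base, TRACE_COLOURS[base])
```

A clean peak like this one has a single dominant trace. The problem regions discussed later are the ones where two traces have comparable heights at the same position.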


If you look at the bottom two lines of the above chromatogram (a portion of which is shown below) you can see that the peaks are very well defined and are spaced quite evenly along the chromatogram. In this region we can be confident that the sequence is correct, as there is very little overlap or background.

good chromatogram sequence

However, compare that region to this region of 6 bases from the sequence:

bad chromatogram sequence

These are 6 bases from the region of base #15 to #20. Sequencher has correctly identified the first two bases as A and G, although there is potentially some C in the mix here (see the lower blue line). However, the next two calls are not right. Sequencher has called two A's, but there is only one A peak: the next peak is a T (red line), and there is not another A peak. Also, the A peak and the T peak are very close to each other compared to the distance between all the other peaks. Sequencher has realized this and put the two A's very close to each other in the colored text line. The next two bases, G and C, are correctly called.
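The spacing argument above can be sketched in a few lines: compare each gap between neighbouring peaks with the typical gap and flag the ones that are much smaller. This is only an illustration. The peak positions, the `flag_crowded_peaks` helper, and the 0.6 cut-off are all assumptions, not values taken from Sequencher.

```python
from statistics import median


def flag_crowded_peaks(positions, min_ratio=0.6):
    """Return indices i where the gap between peak i and peak i + 1
    is less than `min_ratio` times the median gap."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    typical = median(gaps)
    return [i for i, gap in enumerate(gaps) if gap < min_ratio * typical]


# Made-up x-positions for six called bases; the third and fourth
# peaks sit much closer together than the rest.
positions = [100, 112, 124, 128, 140, 152]
print(flag_crowded_peaks(positions))  # [2]
```

A flagged index marks a pair of neighbouring calls that is worth checking by eye in the chromatogram, as we did for the two A's.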

We can edit these bases manually by clicking on the top line of the sequence. We can then change the second A to a T by typing a T on the keyboard:

editing bases in sequencher


Now we will look at a section a little later in the chromatogram. This region is not shown in the picture above, but a small region is shown below. This is the standard view from Sequencher, and it is not clear what the exact base order should be.

weak sequence

There is a small icon on the top left of the window that looks like this: slider

Moving the position of this slider will alter the magnification of the peaks. For example, moving the slider up and down will alter the peaks like this:

moving chromatogram

We can optimize the peak height to discern the peaks like this:

weak sequence with big peaks

When the peak heights have been increased you can clearly see that most of the bases have been called correctly. There are probably two T's because there is a very broad peak. The G is clear, and there are three A peaks and another G. After the G, the next two bases have been called as N's because Sequencher was not sure what they should be. The first is probably a T (red line), but the next one could be either a C (blue line) or an A (green line). The next two peaks are correctly called as A and T, though there may be a C in there too; that is probably an artefact.

Sequencher has one other feature on this icon slider that helps to view secondary peaks. By clicking on the A, G, C, or T you can hide the corresponding trace. With only the T trace showing, the same view looks like this:

sequence with only T trace

Notice that all the other bases are shown in italics to emphasize that those traces are hidden. You can see that there probably should be a T at the first of the two N's.

These ambiguous peaks are sometimes called secondary peaks. Sequencher has a facility for calling the bases for secondary peaks. If you pull down the sequence menu and choose "Call secondary peaks..." you will see the following dialog appear:

secondary peaks dialog

The slider allows you to vary the sensitivity with which secondary peaks are detected, while the three checkboxes allow you to control which peaks are called. Try this with sequence one, and you should be warned that 64 bases will be changed. Now take a look at the sequence bases view (by closing the chromatogram). You will see that the sequence looks like this:

altered sequence

Notice three things:

  1. Some bases are highlighted in pink. These are the bases that have been changed by the secondary peaks call.
  2. The base that we edited by hand (base 18, an A->T change) is also shown highlighted in pink.
  3. Some of the bases are no longer just A, G, C, T, or N. The ambiguous DNA codes have been introduced.

The following table (also in the glossary) shows the accepted usage of ambiguous DNA codes.

symbol  base                 symbol  base
A       adenosine            M       A C (amino)
C       cytidine             S       G C (strong)
G       guanine              W       A T (weak)
T       thymidine            B       G T C
U       uridine              D       G A T
R       G A (purine)         H       A C T
Y       T C (pyrimidine)     V       G C A
K       G T (keto)           N       A G C T (any)
-       gap of indeterminate length
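To make the link between secondary peaks and these codes concrete, here is a minimal sketch that turns the trace heights at one position into an ambiguity code. The lookup follows the table above. The `call_with_secondary` helper, the example heights, and the `threshold` parameter (a stand-in for the sensitivity slider) are assumptions for illustration, not Sequencher's actual method.

```python
# Ambiguity codes from the table above, keyed by the set of bases
# each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("GA"): "R", frozenset("TC"): "Y", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GTC"): "B", frozenset("GAT"): "D",
    frozenset("ACT"): "H", frozenset("GCA"): "V",
    frozenset("AGCT"): "N",
}


def call_with_secondary(intensities, threshold=0.5):
    """Call one position, folding in secondary peaks.

    `intensities` maps A, C, G and T to trace heights. Any base whose
    height is at least `threshold` times the tallest peak is included
    in the call (an assumed rule, chosen for illustration).
    """
    top = max(intensities.values())
    if top <= 0:
        return "N"  # no signal at all
    present = frozenset(b for b, h in intensities.items()
                        if h >= threshold * top)
    return IUPAC[present]


# Made-up heights: a strong T with a weaker C underneath it.
position = {"A": 30, "C": 400, "G": 10, "T": 900}
print(call_with_secondary(position, threshold=0.5))  # T
print(call_with_secondary(position, threshold=0.4))  # Y
```

Lowering the threshold lets weaker peaks into the call, so the same position goes from a plain T to the pyrimidine code Y. In this sketch, that is the kind of trade-off the sensitivity slider controls.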
You can call the secondary peaks in batch. Switch to the main Sequencher window by closing the trace and sequence windows. You should see this:

Image of first sequencer view with traces

Now highlight all the sequences and choose Call secondary peaks again. It will warn you that 723 bases will be changed (note that if you haven't already called the secondary peaks in chromatogram one, this number will be 787, because it then includes the 64 bases from that sequence).

All of the peaks have now been called. In the next part of the tutorial we will trim vector contamination from the sequence and start assembling it.