This is an ongoing attempt to build a glossary of terms that are frequently used in bioinformatics. If you would like an explanation of a term that is not on this list, please email Rob Edwards.


Index

  1. BLAST
  2. redundant DNA codes
  3. Standard genetic code
  4. FASTA format
  5. NCBI Databases


BLAST

Return to index


DNA codes
symbol  base                 symbol  base
A       adenosine            M       A C (amino)
C       cytidine             S       G C (strong)
G       guanine              W       A T (weak)
T       thymidine            B       G T C
U       uridine              D       G A T
R       G A (purine)         H       A C T
Y       T C (pyrimidine)     V       G C A
K       G T (keto)           N       A G C T (any)
-       gap of indeterminate length
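
As an illustration of how these codes can be used, the following is a minimal Python sketch that stores the table as a dictionary and expands an ambiguous sequence into every sequence it could represent. The names IUPAC_CODES, expand, and matches are hypothetical and chosen only for this example. The gap symbol "-" is deliberately left out, so a sequence containing a gap raises a KeyError.

```python
from itertools import product

# Each symbol maps to the bases it can stand for, copied from the table above.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def expand(pattern):
    """Return every unambiguous sequence that an ambiguous pattern can stand for."""
    choices = [IUPAC_CODES[symbol] for symbol in pattern.upper()]
    return ["".join(bases) for bases in product(*choices)]


def matches(pattern, sequence):
    """Return True if each base of sequence is allowed by the code at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_CODES[symbol]
        for symbol, base in zip(pattern.upper(), sequence.upper())
    )


print(expand("GAY"))        # ['GAT', 'GAC']
print(matches("RY", "GC"))  # True: G is a purine, C is a pyrimidine
```

Note that the number of expansions grows quickly: a run of n N symbols expands to 4^n sequences, so matches is usually the better choice for long patterns.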

Return to index


Standard Genetic Code

This is the standard genetic code. Alternative genetic codes and modifications to this code are available from the NCBI.
In the standard code, the potential initiation codons are TTG, CTG, and ATG.

                          Second base
First                                                        3rd
base    T            C            A            G            base
T       TTT F Phe    TCT S Ser    TAT Y Tyr    TGT C Cys    T
        TTC F Phe    TCC S Ser    TAC Y Tyr    TGC C Cys    C
        TTA L Leu    TCA S Ser    TAA * Ter    TGA * Ter    A
        TTG L Leu    TCG S Ser    TAG * Ter    TGG W Trp    G
C       CTT L Leu    CCT P Pro    CAT H His    CGT R Arg    T
        CTC L Leu    CCC P Pro    CAC H His    CGC R Arg    C
        CTA L Leu    CCA P Pro    CAA Q Gln    CGA R Arg    A
        CTG L Leu    CCG P Pro    CAG Q Gln    CGG R Arg    G
A       ATT I Ile    ACT T Thr    AAT N Asn    AGT S Ser    T
        ATC I Ile    ACC T Thr    AAC N Asn    AGC S Ser    C
        ATA I Ile    ACA T Thr    AAA K Lys    AGA R Arg    A
        ATG M Met    ACG T Thr    AAG K Lys    AGG R Arg    G
G       GTT V Val    GCT A Ala    GAT D Asp    GGT G Gly    T
        GTC V Val    GCC A Ala    GAC D Asp    GGC G Gly    C
        GTA V Val    GCA A Ala    GAA E Glu    GGA G Gly    A
        GTG V Val    GCG A Ala    GAG E Glu    GGG G Gly    G
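
The table can be turned into a lookup for translating DNA into protein. The following Python sketch is one possible way to do that, not a reference implementation. It builds the 64 codons in the same T, C, A, G order as the table and pairs them with the one-letter amino acid codes. The names BASES, AMINO_ACIDS, CODON_TABLE, and translate are hypothetical. Two choices here are assumptions of the example rather than part of the genetic code: U is treated as T, and any codon that is not in the table (for example one containing N) is reported as "X".

```python
from itertools import product

BASES = "TCAG"

# One-letter amino acid codes in table order: first base varies slowest,
# third base fastest. "*" marks a termination codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA sequence in frame 1; trailing partial codons are ignored."""
    dna = dna.upper().replace("U", "T")  # assumption: accept RNA input as well
    protein = []
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        protein.append(CODON_TABLE.get(codon, "X"))  # assumption: "X" for unknown codons
    return "".join(protein)


print(translate("ATGGCCTGA"))  # MA*
```

The sketch does not stop at the first termination codon; it reports each one as "*" and keeps translating, which leaves the decision about where the protein ends to the caller.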


FASTA format

Return to index


NCBI Databases

These are the databases available from NCBI:

  1. nr: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). The name stands for "non-redundant", although the database is no longer non-redundant. It used to be the single unified database for everything.
  2. SWISS-PROT: SWISS-PROT protein sequence database
  3. month: All new or revised GenBank, EMBL (European), DDBJ (Japanese), and PDB (Protein Data Bank) sequences released in the last 30 days.
  4. Drosophila genome: Drosophila genome provided by Celera and Berkeley Drosophila Genome Project (BDGP).
  5. dbest: Database of GenBank, EMBL, DDBJ, and PDB sequences from EST Divisions. Expressed Sequence Tags.
  6. dbsts: Database of GenBank, EMBL, DDBJ, and PDB sequences from STS Divisions. Sequence Tagged Sites.
  7. htgs: Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr)
  8. gss: Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
  9. yeast: Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences
  10. E. coli: Escherichia coli genomic nucleotide sequences
  11. pdb: Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank
  12. Patent: Nucleotide sequences derived from the Patent division of GenBank
  13. vector: Vector subset of GenBank
  14. mito: Database of mitochondrial sequences
  15. alu: Select Alu repeats from REPBASE (the repeat database), suitable for masking Alu repeats from query sequences.
  16. Kabat: Kabat's database of immunologically interesting protein sequences
  17. ESTs: ESTs from mouse, human, and other projects maintained as separate databases
  18. EPD: Eukaryotic promoter database.

Return to index