Computer Based Training for Bioinformatics
The definition of Bioinformatics is somewhat variable. However, by looking at different course lecture content as posted on the web and by examining textbooks, it is possible to identify what most courses include. A list of 18 components that will be addressed is given below. Depending on the audience, more or less emphasis can be placed on the actual algorithms.
Introduction to sequence databases with emphasis on NCBI
This module is done. It includes statistics about the contents of the different sections of GenBank and a brief introduction to the BLAST page, including discussion of nucleotide BLAST, protein BLAST and translated BLAST searches, with an example search. Developed by David Nelson, UT Dept. of Molecular Sciences.
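Translated BLAST searches compare sequences at the protein level by translating nucleotide sequence in all six reading frames. The following is a minimal Python sketch of that translation step only, assuming the standard genetic code; the function names (reverse_complement, translate, six_frames) are illustrative and are not part of BLAST or of the module described above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the six conceptual translations keyed by frame (+1..+3, -1..-3)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(dna[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames
```

For example, six_frames("ATGGCCTAA") gives "MA*" in frame +1, where "*" marks a stop codon. A real translated search then scores each frame against the protein database, which this sketch does not attempt.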
Basic BLAST searching of the databases, with some explanation of the significance of the output scores, what is in each section of the databases (nr, EST, GSS, STS, HTGS), and why one would search each section.
This module is done. It covers beginning BLAST skills with examples that lead to more sophisticated searching. Details of the output are discussed and strategies for successful gene hunting are described. Real data, including a 265-sequence alignment, are used. Developed by David Nelson, UT Dept. of Molecular Sciences.
Fundamentals of assembling protein sequences from genomic DNA, and how to find intron-exon boundaries by comparison to mRNA sequences from the same gene or related genes.
Detection of alternative splicing. The benefits of cross-species comparison. Hunting for helpful sequences in GSS (bovine and Tetraodon nigroviridis fragments).
Sequence alignment, scoring matrices, multiple sequence alignment, Clustal W, PileUp
This module is under construction. It presents an introduction to Clustal W, including the input and output formats, the parameters set by the user, and how the alignment is generated from the individual sequences. Examples will be included. Developed by Rob Edwards, UT Dept. of Molecular Sciences.
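Multiple alignment programs such as Clustal W start from pairwise comparisons of the input sequences. As a rough illustration of how a pairwise alignment is scored, here is a minimal Needleman-Wunsch sketch in Python. It is not Clustal W's implementation: it assumes a simple match/mismatch score and a linear gap penalty, whereas real programs use substitution matrices and separate gap-opening and gap-extension penalties. The function name and default scores are illustrative choices.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (Needleman-Wunsch).

    Assumes a linear gap penalty; only the score is returned, not the alignment.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

With the default scores, global_alignment_score("ACGT", "AGT") is 1: three matches and one gap. Keeping only two rows of the table is enough when the score alone is needed; recovering the alignment itself requires storing the full table or traceback pointers.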
Phylogenetic tree construction. Comparison of the major methods and their advantages and disadvantages: distance methods (UPGMA, Neighbor Joining), Maximum Parsimony, and Maximum Likelihood.
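To make the distance-based approach concrete, the following is a minimal Python sketch of UPGMA clustering. It assumes a symmetric distance matrix given as a list of lists, returns only the tree topology as nested tuples, and omits branch lengths; the function name and the small matrix in the usage note are hypothetical and chosen for illustration.

```python
def upgma(names, matrix):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names  : list of taxon labels
    matrix : symmetric list-of-lists of pairwise distances (same order as names)
    """
    trees = list(names)
    sizes = [1] * len(names)
    d = [row[:] for row in matrix]  # work on a copy
    while len(trees) > 1:
        n = len(trees)
        # Find the closest pair of clusters (i < j).
        i, j = min(
            ((x, y) for x in range(n) for y in range(x + 1, n)),
            key=lambda p: d[p[0]][p[1]],
        )
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        total = sizes[i] + sizes[j]
        merged = [(sizes[i] * d[i][k] + sizes[j] * d[j][k]) / total for k in range(n)]
        # Store the merged cluster in slot i and remove slot j.
        trees[i] = (trees[i], trees[j])
        sizes[i] = total
        for k in range(n):
            d[i][k] = d[k][i] = merged[k]
        d[i][i] = 0.0
        del trees[j], sizes[j]
        del d[j]
        for row in d:
            del row[j]
    return trees[0]
```

For four hypothetical taxa with distances A-B = 2, A-C = B-C = 6 and all distances to D = 10, upgma(["A", "B", "C", "D"], matrix) returns ((("A", "B"), "C"), "D"). UPGMA assumes a constant rate of change along all lineages, which is one of the disadvantages usually contrasted with Neighbor Joining.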
Survey of protein motif databases and protein family classification databases: Prosite, Pfam, Blocks, Prints, InterPro.
Specialized databases, what they contain, and how they can be used: UniGene, dbSNP, human mutation database.
Secondary structure prediction. Discussion of success rates of different methods.
Search techniques that use comparison of secondary structure predictions rather than sequence.
How to build profiles of protein families or motifs from sequence alignments.
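As a rough illustration of what a profile is, the Python sketch below turns a gap-free alignment into a position-specific log-odds matrix and scores a candidate sequence against it. It makes several simplifying assumptions that are not taken from any particular package: a DNA alphabet for brevity (a protein profile would use the 20 amino acids), a uniform background frequency, and a fixed pseudocount. The function names are illustrative.

```python
import math
from collections import Counter


def build_profile(alignment, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds profile (one dict per column) from aligned sequences.

    Assumes equal-length sequences and a uniform background distribution.
    Characters outside the alphabet (e.g. gaps) are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            r: math.log2(((counts[r] + pseudocount) / total) / background)
            for r in alphabet
        })
    return profile


def score_sequence(profile, seq):
    """Sum the per-column log-odds scores for a sequence of profile length."""
    return sum(column[residue] for column, residue in zip(profile, seq))
```

A sequence that resembles the aligned family scores higher than an unrelated one. The pseudocount keeps residues that were never observed in a column from receiving a score of minus infinity.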
Use of Hidden Markov Models (HMMs) and of the related software packages HMMER and HMMpro.
Comparative genomics, clusters of orthologous groups, comparison of genomes at the level of gene order, synteny, application to the mammalian radiation.
Annotation-geared databases for model organisms: ACEDB software (A C. elegans Database, and its many offshoots), FlyBase (Drosophila), TAIR (The Arabidopsis Information Resource), YPD (Saccharomyces cerevisiae), and the Proteome Inc. set of databases on C. elegans and zebrafish.
Gene mapping: radiation hybrid mapping, mapping disease genes in humans, mapping genes in the mouse.
Genome-scale sequencing: strategies and assembly of data into a contiguous sequence.
Shotgun vs. mapped BAC clone approaches. Contigs, Bactigs and Scaffolds. Sequencher.
Microbial genomics: a look at what is in microbial genomes, with emphasis on the ERGO system from Integrated Genomics Inc. Tutorials are presented on the following items. Tutorials developed by Rob Edwards, UT Dept. of Molecular Sciences.
- How to find the annotation for an ORF
- Finding comparisons among closely related organisms
- Comparing an ORF with similar sequences
- Adding or changing ORF annotations
- Finding the DNA sequence adjacent to an rRNA
- Search for a sequence and view the annotation around that sequence
Microarrays and DNA chip technology
Three dimensional structural databases, comparing 3D structures, SCOP database
Threading sequences to known structures (a more advanced topic, possibly more appropriate in a structural biology class)
The CBT (Computer Based Training) component of the proposal will address each of these 18 subjects with summaries of methods and procedures, including tutorials with links to web-based tools for use on real data sets. Details of some of the algorithms will be included. Dr. Nelson will draw on his extensive collection of sequence data on cytochrome P450 and mitochondrial carriers to prepare some of these examples. The goal is to pass on the experience of practised users, with tips and hints that a beginner would not know. Some of these modules are already developed for the Bioinformatics course at UTHSC Memphis. Other segments are available from other courses taught at Knoxville/ORNL and the University of Memphis. We will select and combine the best material from these existing sources and create new materials as needed.
Once these units are in place, they will be posted on the Bioinformatics servers at the participating institutions. These CBT materials will be open to the world without password protection, so that they are freely accessible.