Synopsis
In genomic data analysis and computational biology, a good understanding of the probabilistic nature and statistical modeling of biological sequences, such as nucleotide and protein sequences, is key to understanding existing algorithms and tools and to developing new ones. This course provides statistical foundations and an in-depth overview of the core algorithms of sequence analysis. The algorithms covered include alignment and motif finding (pairwise local alignment, both heuristic, such as BLAST, and optimal, i.e. Smith-Waterman; pairwise global alignment; and multiple alignment), gene finding (Glimmer), protein structure prediction, and phylogenetic trees. Background topics include probability (including conditional probabilities and Bayes' rule), Markov models, and hidden Markov models.
Instructors: Sining Chen (sichen@jhsph.edu) and Guests
Course time and location: Monday 3:30 – 4:50pm at Wolfe 4013; Thursday 4:25 – 5:30pm at Wolfe 4013 (new time & location!)
Office hours: Monday 2:30 – 3:30 (before class) in W7033A, and by appointment
Required text:
Supplementary texts: Bioinformatics, by David Mount (more biology oriented); Statistical Methods in Bioinformatics, by W. Ewens and G. Grant (more quantitative).
Prerequisites: basic probability; some programming
Grades: Student grades will be based 60% on homework, 30% on a presentation and written critique of a paper, and 10% on attendance.
Homework assigned by each Monday is due the following Monday.
For information on computing in R, please click here.
N = lecture Notes; R = References; P = Problems;
Date | N | R | P | Topic
Oct 28 | | | | Overview of the course; basic molecular biology terminology. Review of useful probability concepts: random variables, conditional probability, expectation & variance.
Oct 31 | | | | Pairwise alignment: dot matrix; Needleman-Wunsch (global alignment); Smith-Waterman (local alignment)
Nov 3 | | | | Significance of alignment scores; development of scoring matrices
Nov 7 | | | | Multiple sequence alignment
Nov 10 | | | | Phylogenetic trees: UPGMA, neighbor-joining
Nov 14 | | | | Phylogenetic trees (continued): parsimony, likelihood approach, comparison to other methods
Nov 17 | | | | Guest lecture: Ingo Ruczinski on protein structure prediction from amino acid sequences.
Nov 21 | | | | Continued from last week: Ingo Ruczinski on protein structure prediction from amino acid sequences.
Nov 24 | | | | Thanksgiving Break
Nov 28 | | | | Database search: BLAST. Jon: Bayesian bootstrap in evaluating alignment matrix performance
Dec 1 | | | | Alex: likelihood approach for morphological phylogenetics. Deepti: selection of oligos
Dec 5 | | | | HMM. Tao: protein function prediction using phylogenomics
Dec 8 | | | | HMM, continued. Euiju: selection strength
Dec 12 | | | | HMM in gene finding; GLIMMER. Matt: first generation HMM
Dec 15 | | | | Gene finding, general. No student presentation.
Dec 19 | | | | No class. Homework and paper summary due.