Statistics for Gene Expression

The notes have been updated, and equations can now be read in the black and white version.


Class outline (subject to change). Each topic lists its slides, the slides in handout format, and the reading.

Introduction to molecular biology: slides PDF, handout PDF
  Reading:
  1) The WWW Virtual Library of Cell Biology
  2) Nature Genetics Special Issue: the Chipping Forecast

Introduction to DNA Array Technology: slides PDF, handout PDF
  Reading:
  1) A Concise Guide to cDNA Microarray Analysis
  2) Expression profiling using cDNA microarrays

Pre-processing cDNA Array Data: slides PDF, handout PDF
  Reading:
  1) Normalization for cDNA Microarray Data

Pre-processing Affymetrix GeneChip Data: slides PDF, handout PDF
  Reading:
  1) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection
  2) Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data

Statistical considerations: slides PDF, handout PDF
  Reading:
  1) H. Wainer (1984). How to display data badly. American Statistician 38(2):137-147.
  2) Multiple hypothesis testing in microarray experiments

Differential Expression in Two Populations: slides PDF, handout PDF
  Reading:
  1) Significance Analysis of Microarrays: Tusher, Tibshirani and Chu (2001), PNAS.

Classification: slides PDF, handout PDF
  Reading:
  1) S. Dudoit, J. Fridlyand, and T. P. Speed (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97(457):77-87.
  2) Tibshirani, Hastie, Narasimhan and Chu (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99:6567-6572.

Clustering: slides PDF, handout PDF

Clustering, extra notes: PDF

Dimension Reduction: slides PDF, handout PDF

Annotation: PDF

Project: The class project is described in full detail here [pdf file].


Data sets and R scripts, should you wish to try your hand:

Golub data: a very famous training/test microarray dataset. The training data include 38 samples representing two types of leukemia, AML and ALL. The test data include 33 additional samples.

Affymetrix spike-in data: Affymetrix prepared a large "spike-in" experiment. Several transcripts, both human and bacterial, were added in specific amounts to background human RNA that did not contain any of these transcripts. The amounts vary from chip to chip according to a Latin square design described in the pheno data. Only the spiked-in transcripts are differentially expressed; the samples are otherwise identical. Three replicates of each of 3 spike-in levels are included in the data here.

Zebrafish data: this is the data for the zebrafish experiment included as an example in the Maffy package available from Bioconductor.
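To make the Latin square design of the Affymetrix spike-in experiment concrete, here is a minimal Python sketch of a cyclic Latin square assigning spike-in concentrations to transcripts across chips. The concentration values and the number of transcripts and chips are hypothetical placeholders, not the actual Affymetrix design (that is described in the pheno data); the function names are also illustrative only.

```python
# Sketch: a cyclic Latin square assigning spike-in concentrations to
# transcripts (columns) across chips (rows).
# The concentrations below are HYPOTHETICAL placeholders, not the values
# used in the Affymetrix spike-in experiment.

def cyclic_latin_square(levels):
    """Row i is `levels` rotated left by i positions."""
    n = len(levels)
    return [[levels[(i + j) % n] for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """True if every row and every column contains each level exactly once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


# Hypothetical concentrations (pM) for 4 spiked-in transcripts on 4 chips.
concentrations = [0.5, 2.0, 8.0, 32.0]
design = cyclic_latin_square(concentrations)

for chip, row in enumerate(design, start=1):
    print(f"chip {chip}: {row}")
```

Because each concentration appears exactly once on every chip and exactly once for every transcript, concentration effects are not confounded with chip effects or transcript effects, which is what makes such a design useful for assessing how expression measures track known amounts.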
References
There is a great list of microarray references here. Some of the key references from the list are collected here.
Resources
Class General Info
Acknowledgements
Thanks to Elizabeth Garrett, Giovanni Parmigiani and Rafael Irizarry, from whom I shamelessly borrowed most of the course content, including the design of this web page.