To understand evolutionary relationships between DNA sequences, we would like to build a model of point accepted mutations in nucleic acids (A, G, T, C), similar to the PAM model for protein sequences. We collected some nucleic acid sequences and constructed phylogenetic relationships among them. The numbers of observed mutations are tabulated below:
|   | A   | G   | T   | C   |
|---|-----|-----|-----|-----|
| A | 270 | 10  | 10  | 10  |
| G | 10  | 270 | 10  | 10  |
| T | 10  | 10  | 270 | 10  |
| C | 10  | 10  | 10  | 270 |
Construct a PAM1 matrix from the mutation table above (denoted A in class) under the expected model of evolution, i.e., each base can change into any other base and the overall rate of change in the sequences is 1%. Show your steps in deriving the PAM1 matrix, and note what each meaningful quantity stands for. The final matrix should be given in bits (take the base-2 logarithm). No need to round off. (10 points)
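One possible way to organize the derivation is sketched below in Python, following a Dayhoff-style recipe. It is a sketch under stated assumptions, not the reference solution: it assumes that the diagonal counts (270) are positions observed unchanged, that background frequencies are estimated from the column totals of the table, and that `M[i][j]` is the probability that base `j` is replaced by base `i`. The names `pam1_from_counts` and `log_odds_bits` are hypothetical helpers introduced only for illustration.

```python
import math

BASES = ["A", "G", "T", "C"]

# Observed counts from the table (rows and columns ordered A, G, T, C).
counts = [
    [270, 10, 10, 10],
    [10, 270, 10, 10],
    [10, 10, 270, 10],
    [10, 10, 10, 270],
]


def pam1_from_counts(counts, rate=0.01):
    """Build a PAM1 mutation probability matrix scaled to an overall change rate.

    Assumption: diagonal entries count unchanged positions, so a column total
    is the number of occurrences of that base.
    """
    n = len(counts)
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(col_totals)
    # Background frequency f_j of each base.
    freqs = [t / grand_total for t in col_totals]
    # Number of observed changes away from base j (off-diagonal column sum).
    changes = [col_totals[j] - counts[j][j] for j in range(n)]
    # Relative mutability m_j: fraction of occurrences of j that changed.
    mutability = [changes[j] / col_totals[j] for j in range(n)]
    # Scale factor so that sum_j f_j * scale * m_j equals the target rate.
    scale = rate / sum(freqs[j] * mutability[j] for j in range(n))

    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i == j:
                M[i][j] = 1.0 - scale * mutability[j]
            else:
                M[i][j] = scale * mutability[j] * counts[i][j] / changes[j]
    return M, freqs


def log_odds_bits(M, freqs):
    """Log-odds score in bits: log2(M[i][j] / f_i)."""
    n = len(M)
    return [[math.log2(M[i][j] / freqs[i]) for j in range(n)] for i in range(n)]


M, freqs = pam1_from_counts(counts)
S = log_odds_bits(M, freqs)

for base, row in zip(BASES, S):
    print(base, row)
```

Because the table is symmetric, every base ends up with the same mutability and the same background frequency under these assumptions, so the scaling step simply fixes the diagonal of `M` at 0.99 and spreads the remaining 0.01 evenly over the three alternative bases.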
Extrapolate the PAM1 matrix to PAM10, PAM25, PAM50, PAM100, and PAM125. (5 points)
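The extrapolation can be sketched as repeated multiplication of the PAM1 probability matrix (not the log-odds matrix) by itself. The Python sketch below assumes a PAM1 probability matrix with 0.99 on the diagonal and 0.01/3 elsewhere, which is an assumption consistent with a 1% overall rate spread evenly over the three alternative bases. `mat_mul` and `mat_pow` are hypothetical helper names.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(M, k):
    """Raise square matrix M to a non-negative integer power by repeated squaring."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in M]
    while k > 0:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


# Assumed PAM1 mutation probability matrix (bases ordered A, G, T, C).
pam1 = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]

# PAMk probability matrices for the requested distances.
pam = {k: mat_pow(pam1, k) for k in (10, 25, 50, 100, 125)}

for k, Mk in pam.items():
    print(f"PAM{k}: conserved = {Mk[0][0]:.4f}, each substitution = {Mk[0][1]:.4f}")
```

Each PAMk probability matrix can then be converted to bits entry by entry as log2 of the probability divided by the background frequency (0.25 under the uniform assumption).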
A PAM1 mutation matrix represents 99% sequence conservation and one PAM of evolutionary distance (1% mutations). A PAM25 matrix represents 79% sequence conservation and 25 PAM of evolutionary distance (21% mutations). Recall that the score in bits is the log-odds ratio of the alignment under the given evolutionary distance vs. alignment by chance. The evolutionary distance between a pair of aligned sequences is therefore given by the PAM matrix that produces the highest score for the alignment. Consider the following alignment:
ACTAA GCCAG GTCAC
CCGGA GCCTC GTGTC
Which of the following best describes the evolutionary distance between the two sequences: 10 PAM, 25 PAM, 50 PAM, 100 PAM or 125 PAM? (5 points)
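The comparison can be sketched as follows: score the alignment column by column under each candidate PAM matrix and keep the distance with the highest total. This Python sketch assumes a uniform background frequency of 0.25 and a PAM1 probability matrix with 0.99 on the diagonal and 0.01/3 elsewhere. Both are assumptions, and the helper names (`pam_n`, `alignment_score_bits`) are hypothetical.

```python
import math

BASES = "AGTC"
FREQ = 0.25  # assumed uniform background frequency of each base


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pam_n(pam1, n):
    """PAMn probability matrix: PAM1 multiplied by itself n times."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


def alignment_score_bits(seq_a, seq_b, probs):
    """Sum of per-column log-odds scores, log2(P(pair) / chance), in bits."""
    idx = {b: i for i, b in enumerate(BASES)}
    return sum(
        math.log2(probs[idx[x]][idx[y]] / FREQ) for x, y in zip(seq_a, seq_b)
    )


# Assumed PAM1 mutation probability matrix (bases ordered A, G, T, C).
pam1 = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]

# The alignment from the question, with the grouping spaces removed.
seq_a = "ACTAA GCCAG GTCAC".replace(" ", "")
seq_b = "CCGGA GCCTC GTGTC".replace(" ", "")

scores = {
    n: alignment_score_bits(seq_a, seq_b, pam_n(pam1, n))
    for n in (10, 25, 50, 100, 125)
}
for n, s in scores.items():
    print(f"PAM{n}: {s:.3f} bits")

best = max(scores, key=scores.get)
print("highest-scoring distance: PAM", best)
```

Summing per-column scores treats alignment positions as independent, which is the usual simplification behind log-odds substitution matrices.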
Hint for R users: matrix multiplication for two matrices A and B is A %*% B