Hidden Markov model homework, due Dec 12, 2005
(There is no paper-reading homework this week!)
1. A hidden Markov model for a pairwise alignment of two DNA sequences is as follows:
At any position along the aligned sequence pair, you see either a match/mismatch, a gap in sequence 1, or a gap in sequence 2.
The probability of a gap following a match/mismatch is 20%; the probability of a gap extension is 10%. It is not possible to have a gap in seq1 followed by a gap in seq2, or vice versa.
In match/mismatch mode, you are 80% likely to see a match and 20% likely to see a mismatch. All matches have the same probability. All mismatches have the same probability. In gap mode, all nucleotides are equally likely.
You are 50% likely to start with a match/mismatch, 25% likely to start with a gap in seq1, and 25% likely to start with a gap in seq2.
Please write down all the elements you need to specify such an HMM. (This includes all hidden states, all observable states, the transition probability matrix, the emission probabilities, etc.)
(10pts)
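As one illustration of how the emission rule in question 1 turns into numbers, the sketch below spreads the 80% match mass evenly over the identical nucleotide pairs and the 20% mismatch mass evenly over the differing pairs. It assumes a four-letter alphabet A, C, G, T and that an observation in match/mismatch mode is an ordered pair (base in seq1, base in seq2); the names `match_state_emissions` and `gap_state_emissions` are made up for this example. It covers only the emission part, not the transition matrix.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed alphabet; the assignment only says "DNA"

def match_state_emissions(p_match=0.8):
    """Emission probabilities for ordered pairs (x, y) in match/mismatch mode."""
    pairs = list(product(NUCLEOTIDES, repeat=2))       # 16 ordered pairs
    n_match = sum(1 for x, y in pairs if x == y)       # 4 identical pairs
    n_mismatch = len(pairs) - n_match                  # 12 differing pairs
    return {
        (x, y): p_match / n_match if x == y else (1 - p_match) / n_mismatch
        for x, y in pairs
    }

def gap_state_emissions():
    """Emission probabilities for the single visible base in gap mode."""
    return {x: 1 / len(NUCLEOTIDES) for x in NUCLEOTIDES}

match_emit = match_state_emissions()
gap_emit = gap_state_emissions()
```

Under these assumptions each of the 4 matching pairs gets 0.8 / 4 = 0.2 and each of the 12 mismatching pairs gets 0.2 / 12, so the 16 entries sum to 1.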
2. Write a piece of code to randomly generate an alignment of 30 bases according to the above specification. (5pts)
Hint: if you are using R, the sample() function is very useful. For example, sample(c(1,2,3), 1, prob=c(0.50, 0.25, 0.25)) returns 1 with probability 0.50, and 2 or 3 with probability 0.25 each.
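If you are working in Python rather than R, a rough equivalent of the sample() call in the hint is `random.choices` from the standard library. The sketch below is only an illustration of weighted sampling, not a solution to question 2; the helper name `sample_one` and the seed value are arbitrary choices made for this example.

```python
import random
from collections import Counter

def sample_one(values, probs, rng=random):
    """Draw one element of `values` with the given probabilities."""
    # random.choices returns a list of k draws; take the single element.
    return rng.choices(values, weights=probs, k=1)[0]

# Seeded generator so the run is repeatable (seed chosen arbitrarily).
rng = random.Random(2005)

# Draw many times and compare empirical frequencies with 0.50 / 0.25 / 0.25.
draws = [sample_one([1, 2, 3], [0.50, 0.25, 0.25], rng) for _ in range(10000)]
freqs = {v: c / len(draws) for v, c in Counter(draws).items()}
```

The same helper can be reused for any step that picks one outcome from a discrete distribution, such as choosing the starting state or the next state.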