GS 540 Discussion section 7

HW6: Baum-Welch
• Goal: learn HMM parameters taking into account all paths.
• Expectation maximization:
  – Forward-backward algorithm.
  – Re-estimate parameter values based on expected counts.

Likelihood function for Viterbi and EM
• Likelihood function for Viterbi: max over paths π of P(S, π | parameters), i.e. only the single most probable path counts.
• Likelihood function for EM: sum over paths π of P(S, π | parameters), i.e. every path contributes.

Baum-Welch
1. Use the forward algorithm to find the log likelihood of the sequence (i.e. the sum over all paths).
2. Use forward-backward to get fractional counts for each edge type:
  – (total prob of paths passing through edge) / (total prob of all paths)
3. Re-estimate transition and emission probs by calculating the expected number of each edge type.

Forward-backward algorithm
(Trellis figure with axes labeled "sequence" (A C T A C T T …) and "states".)

Store at each node:
• Forward: sum of probabilities of paths ending at position i, state k.
• Backward: sum of probabilities of paths starting at position i, state k.

Here transition(a, b) is the probability of moving from state a to state b, and neither sum includes the emission at position i itself:

forward(i, k) = sum_k' [ forward(i-1, k') * transition(k', k) * emission(S_i-1, k') ]

backward(i, k) = sum_k' [ backward(i+1, k') * transition(k, k') * emission(S_i+1, k') ]

Total probability of paths passing through position i, state k:
forward(i, k) * backward(i, k) * emission(S_i, k)
Use this to update emission(S_i, k).

Total probability of paths passing through position i-1, state k' to position i, state k:
forward(i-1, k') * backward(i, k) * emission(S_i-1, k') * emission(S_i, k) * transition(k', k)
Use this to update transition(k', k).

Numeric stability in Baum-Welch
• Baum-Welch involves addition of probabilities, so using log space is not trivial.
• Two strategies for numeric stability:
  – Multiply probabilities by a constant factor at each position (not recommended).
  – Use logs, and implement log(A+B) carefully (recommended).
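The forward and backward recursions, combined with the recommended log-space strategy, might be sketched as follows. This is an illustration under stated assumptions, not the homework solution: the two-state model (`STATES`, `INIT`, `TRANS`, `EMIT`), the initial-state distribution, and the helper names `safe_log`, `log_add`, and `log_mul` are all hypothetical and do not come from the slides. The code keeps the convention that forward(i, k) and backward(i, k) both exclude the emission at position i.

```python
import math

LOGZERO = float("-inf")  # assumed sentinel standing in for log(0)

def safe_log(p):
    """log(p), with LOGZERO for p == 0."""
    return math.log(p) if p > 0 else LOGZERO

def log_add(a, b):
    """Return log(exp(a) + exp(b)) without leaving log space."""
    if a == LOGZERO:
        return b
    if b == LOGZERO:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))  # exp argument is <= 0

def log_mul(a, b):
    """Return log(exp(a) * exp(b)); LOGZERO absorbs the product."""
    if a == LOGZERO or b == LOGZERO:
        return LOGZERO
    return a + b

def forward_backward(seq, states, log_init, log_trans, log_emit):
    """Log-space forward/backward tables plus the sequence log likelihood."""
    n = len(seq)
    fwd = [{k: LOGZERO for k in states} for _ in range(n)]
    bwd = [{k: LOGZERO for k in states} for _ in range(n)]
    for k in states:
        fwd[0][k] = log_init[k]   # assumption: explicit initial distribution
        bwd[n - 1][k] = 0.0       # log(1): nothing left to emit after the end
    for i in range(1, n):
        for k in states:
            for kp in states:     # kp is the previous state k'
                term = log_mul(log_mul(fwd[i - 1][kp], log_emit[kp][seq[i - 1]]),
                               log_trans[kp][k])
                fwd[i][k] = log_add(fwd[i][k], term)
    for i in range(n - 2, -1, -1):
        for k in states:
            for kp in states:     # kp is the next state k'
                term = log_mul(log_mul(bwd[i + 1][kp], log_emit[kp][seq[i + 1]]),
                               log_trans[k][kp])
                bwd[i][k] = log_add(bwd[i][k], term)
    loglik = LOGZERO              # sum over all paths, closed at the last position
    for k in states:
        loglik = log_add(loglik, log_mul(fwd[n - 1][k], log_emit[k][seq[n - 1]]))
    return fwd, bwd, loglik

# Hypothetical two-state toy model, for illustration only.
STATES = ("H", "L")
INIT = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.2, "L": 0.8}}
EMIT = {"H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

log_init = {k: safe_log(p) for k, p in INIT.items()}
log_trans = {a: {b: safe_log(p) for b, p in row.items()} for a, row in TRANS.items()}
log_emit = {k: {s: safe_log(p) for s, p in row.items()} for k, row in EMIT.items()}

fwd, bwd, loglik = forward_backward("ACTACTT", STATES, log_init, log_trans, log_emit)
```

With these tables, the log of the fractional count for position i, state k is `fwd[i][k] + log_emit[k][seq[i]] + bwd[i][k] - loglik`. Summed over states at any fixed position the fractional counts should come to 1, which is a cheap sanity check.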
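Addition in log space
Given a = log(A) and b = log(B), compute log(A+B):
• Not stable: log(A+B) = log(exp(a) + exp(b))
• More stable: log(A+B) = a + log(1 + exp(b - a)), with a the larger of the two so that exp(b - a) cannot overflow.

Be careful with log(0)
• Use a special value for log(0), here called LOGZERO.
• Addition: log(0 + p) = log(p), so adding LOGZERO to log(p) in log space returns log(p).
• Multiplication: log(0 * p) = log(0), so multiplying LOGZERO by anything in log space returns LOGZERO.

HW6 Tips
• Carry out the algorithm by hand for a few steps.
• Compute the likelihood at each Baum-Welch iteration. If it goes down, you have a bug.

HW7
• HMM to predict genes
  – 11 states
  – Each emits trinucleotides (codons)
• Viterbi parse
• 5 iterations of training

UCSC Genome Browser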
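The HW6 tip about watching the likelihood, together with the "5 iterations of training" in HW7, could be exercised with a loop like the one below. It is a rough sketch only: the two-state model, the training sequence, the fixed initial distribution, and the function name `baum_welch_step` are hypothetical. It also works with plain probabilities rather than logs, which is tolerable for a sequence this short but would underflow on real data.

```python
import math

def baum_welch_step(seq, init, trans, emit):
    """One EM iteration. Returns (new_trans, new_emit, log likelihood of the
    parameters passed in). forward/backward exclude the emission at i."""
    n, K = len(seq), len(init)
    fwd = [[0.0] * K for _ in range(n)]
    bwd = [[0.0] * K for _ in range(n)]
    fwd[0] = list(init)
    bwd[n - 1] = [1.0] * K
    for i in range(1, n):
        for k in range(K):
            fwd[i][k] = sum(fwd[i - 1][j] * emit[j][seq[i - 1]] * trans[j][k]
                            for j in range(K))
    for i in range(n - 2, -1, -1):
        for k in range(K):
            bwd[i][k] = sum(trans[k][j] * emit[j][seq[i + 1]] * bwd[i + 1][j]
                            for j in range(K))
    total = sum(fwd[n - 1][k] * emit[k][seq[n - 1]] for k in range(K))

    # Expected (fractional) counts: prob of paths through a node or edge,
    # divided by the total prob of all paths.
    t_count = [[0.0] * K for _ in range(K)]
    e_count = [dict.fromkeys(emit[k], 0.0) for k in range(K)]
    for i in range(n):
        for k in range(K):
            e_count[k][seq[i]] += fwd[i][k] * emit[k][seq[i]] * bwd[i][k] / total
            if i > 0:
                for j in range(K):
                    t_count[j][k] += (fwd[i - 1][j] * emit[j][seq[i - 1]]
                                      * trans[j][k] * emit[k][seq[i]]
                                      * bwd[i][k]) / total

    # Re-estimate by normalizing the expected counts per state.
    new_trans = [[c / sum(row) for c in row] for row in t_count]
    new_emit = [{s: c / sum(row.values()) for s, c in row.items()}
                for row in e_count]
    return new_trans, new_emit, math.log(total)

# Hypothetical toy setup, for illustration only.
seq = "ACTACTTGGCGCGCATTA"
init = [0.5, 0.5]  # held fixed here; an assumption, not from the slides
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
        {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}]

history = []
for iteration in range(5):
    trans, emit, loglik = baum_welch_step(seq, init, trans, emit)
    # The likelihood must never go down between iterations.
    if history and loglik < history[-1] - 1e-9:
        raise RuntimeError("log likelihood decreased: likely a bug")
    history.append(loglik)
    print(f"iteration {iteration}: log likelihood = {loglik:.6f}")
```

Each printed value is the log likelihood of the parameters going into that iteration, so the list should be non-decreasing from one line to the next.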