[Interactive dashboard: configuration panel; live metrics (iteration, log-likelihood, Δ change, status); convergence charts (log P(O|λ), P(O|λ), negative log-likelihood loss, probability complement vs. model version); HMM state transition diagram; matrices A, B, and π; parameter evolution A[i][j] over iterations; iteration log; intermediate variables (first 20 time steps); final learned parameters.]
HMM Theory & Algorithm Reference
1. Hidden Markov Model — Formal Definition
A Hidden Markov Model (HMM) is a probabilistic generative model for sequential data. The system is assumed to follow a Markov chain of hidden (unobservable) states, each of which emits an observable symbol according to a state-dependent probability distribution.
An HMM is fully specified by the parameter tuple λ = (A, B, π), where:
| Symbol | Name | Definition |
|---|---|---|
| A | State Transition Matrix | A[i][j] = P(q_{t+1} = S_j \| q_t = S_i). Each row sums to 1. |
| B | Emission (Observation) Matrix | B[j][k] = P(O_t = v_k \| q_t = S_j). Each row sums to 1. |
| π | Initial State Distribution | π[i] = P(q_1 = S_i). Elements sum to 1. |
The model assumes the Markov property: the probability of transitioning to state Sj depends only on the current state Si, not on the history of states. It also assumes output independence: the observation at time t depends only on the current state at time t.
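These constraints map directly onto code. A minimal NumPy sketch of λ = (A, B, π) for a hypothetical 2-state, 3-symbol model (the numbers are illustrative, not output of the tool):

```python
import numpy as np

# Hypothetical 2-state HMM over 3 observable symbols.
A = np.array([[0.7, 0.3],       # A[i][j] = P(q_{t+1}=S_j | q_t=S_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],  # B[j][k] = P(O_t=v_k | q_t=S_j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])       # pi[i] = P(q_1=S_i)

# The stochasticity constraints from the definition above:
assert np.allclose(A.sum(axis=1), 1.0)   # rows of A sum to 1
assert np.allclose(B.sum(axis=1), 1.0)   # rows of B sum to 1
assert np.isclose(pi.sum(), 1.0)         # pi sums to 1
```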
2. Three Fundamental Problems of HMMs
Problem 1 — Evaluation
Given λ and an observation sequence O, compute P(O|λ).
Solved by the Forward Algorithm.
Problem 2 — Decoding
Given λ and O, find the most likely hidden state sequence Q*.
Solved by the Viterbi Algorithm.
Problem 3 — Learning
Given O, find λ* = argmaxλ P(O|λ).
Solved by the Baum–Welch Algorithm (this tool).
3. Forward Algorithm
The forward variable α_t(i) is the probability of observing the partial sequence O_1, O_2, …, O_t and being in state S_i at time t.
Initialization:
α_1(i) = π_i · B[i][O_1]
Induction (t = 2, …, T):
α_t(j) = [ Σ_{i=1..N} α_{t−1}(i) · A[i][j] ] · B[j][O_t]
Termination:
P(O|λ) = Σ_{i=1..N} α_T(i)
Complexity: O(N²T), versus O(N^T · T) for brute-force enumeration over all state sequences.
4. Backward Algorithm
The backward variable β_t(i) is the probability of the partial observation sequence O_{t+1}, …, O_T, given state S_i at time t.
Initialization:
β_T(i) = 1 for all i
Induction (t = T−1, …, 1):
β_t(i) = Σ_{j=1..N} A[i][j] · B[j][O_{t+1}] · β_{t+1}(j)
Both α and β are computed in the E-Step of Baum–Welch.
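A matching NumPy sketch of the backward pass (example values illustrative); the final line checks the identity P(O|λ) = Σ_i α_t(i)·β_t(i), which at t = 1 reduces to Σ_i π_i·B[i][O_1]·β_1(i):

```python
import numpy as np

def backward(obs, A, B):
    """Unscaled backward pass: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                       # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):               # induction, t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]
beta = backward(obs, A, B)
p_obs = (pi * B[:, obs[0]] * beta[0]).sum()      # equals P(O|lambda)
```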
5. Baum–Welch Algorithm (EM for HMMs)
The Baum–Welch algorithm is a special case of the Expectation–Maximization (EM) framework applied to HMMs. It iteratively alternates between computing the expected sufficient statistics (E-Step) and re-estimating the model parameters (M-Step).
E-Step: Compute Responsibilities
- ▶ Compute α_t(i) via the forward pass
- ▶ Compute β_t(i) via the backward pass
- ▶ γ_t(i) = P(q_t=S_i | O, λ) = α_t(i) · β_t(i) / P(O|λ)
- ▶ ξ_t(i,j) = P(q_t=S_i, q_{t+1}=S_j | O, λ) = α_t(i) · A[i][j] · B[j][O_{t+1}] · β_{t+1}(j) / P(O|λ)
M-Step: Re-estimate Parameters
- ▶ π̂_i = γ_1(i)
- ▶ Â[i][j] = Σ_{t=1..T−1} ξ_t(i,j) / Σ_{t=1..T−1} γ_t(i)
- ▶ B̂[j][k] = Σ_{t: O_t=v_k} γ_t(j) / Σ_{t=1..T} γ_t(j)
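Putting the E-step and M-step together, one full Baum–Welch iteration can be sketched as follows. This is an unscaled illustration suitable only for short sequences; the function name and example parameters are my own, not the tool's API:

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One unscaled EM iteration over a single observation sequence."""
    T, N = len(obs), len(pi)
    # E-step: forward and backward passes
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_obs = alpha[-1].sum()                      # P(O|lambda)
    gamma = alpha * beta / p_obs                 # gamma_t(i)
    xi = np.array([                              # xi_t(i, j)
        alpha[t, :, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        for t in range(T - 1)
    ]) / p_obs
    # M-step: re-estimate lambda = (A, B, pi)
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):                  # sum gamma over steps where O_t = v_k
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new, np.log(p_obs)

# One update from illustrative starting parameters:
A0 = np.array([[0.6, 0.4], [0.3, 0.7]])
B0 = np.array([[0.8, 0.2], [0.25, 0.75]])
pi0 = np.array([0.5, 0.5])
A1, B1, pi1, loglik = baum_welch_step([0, 1, 0, 0, 1], A0, B0, pi0)
```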
6. Convergence Guarantees
The EM framework guarantees that the log-likelihood is monotonically non-decreasing across iterations:
log P(O | λ^{(k+1)}) ≥ log P(O | λ^{(k)})
Convergence is to a local maximum (or saddle point) of the likelihood surface. The algorithm terminates when |Δ log L| < ε (the user-specified tolerance) or when the maximum iteration count is reached.
In practice, multiple random restarts are recommended to mitigate sensitivity to initialization.
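The stopping rule can be sketched as a generic driver loop; `step` and `params` here are placeholder names for illustration, not the tool's API:

```python
def run_em(step, params, tol=1e-6, max_iter=200):
    """Iterate `step` (params -> (new_params, log_likelihood)) until
    |delta log L| < tol or max_iter is reached."""
    prev_ll = float("-inf")
    for it in range(1, max_iter + 1):
        params, ll = step(params)
        if abs(ll - prev_ll) < tol:    # |delta log L| < epsilon
            break
        prev_ll = ll
    return params, ll, it

# Toy step whose log-likelihood climbs toward 0, halving the gap each call:
params, ll, iters = run_em(lambda p: (p * 0.5, -abs(p * 0.5)), 8.0)
```

In the Baum–Welch setting, `step` would be one E-step/M-step update returning the new (A, B, π) and the sequence log-likelihood.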
7. Numerical Stability — Scaling
For long observation sequences, the forward and backward variables can underflow to zero. This implementation uses the scaling procedure of Rabiner (1989):
- At each time step t, compute a scaling coefficient c_t = 1 / Σ_i α̃_t(i) and normalize the forward variables by it
- The same coefficients are used to scale β
- The log-likelihood is recovered as log P(O|λ) = −Σ_{t=1..T} log c_t
This avoids floating-point underflow while preserving exact computation of γ and ξ.
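A sketch of the scaled forward recursion: it returns the log-likelihood directly and runs safely on sequences where the unscaled version would underflow (example values are illustrative):

```python
import numpy as np

def forward_scaled(obs, A, B, pi):
    """Scaled forward pass; returns log P(O|lambda) without underflow."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        s = alpha.sum()          # s_t = 1 / c_t, the unnormalized mass
        alpha = alpha / s        # normalized alpha-tilde sums to 1
        log_p += np.log(s)       # log P(O|lambda) = -sum_t log c_t = sum_t log s_t
    return log_p

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
# T = 2000: the unscaled product of per-step probabilities would underflow.
log_p = forward_scaled([0, 1] * 1000, A, B, pi)
```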
8. Applications
Bioinformatics
Gene finding, protein family classification & sequence alignment.
Speech Recognition
Acoustic modeling, phoneme recognition & large vocabulary models.
Quantitative Finance
Regime switching detection, volatility modeling & market analysis.
Pattern Recognition
Climate pattern analysis, POS tagging & handwriting recognition.
References
- L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
- A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
- C. M. Bishop, Pattern Recognition and Machine Learning, Chapter 13: Sequential Data. Springer, 2006.