Author: Adrian@Vrabie.net
Adviser: Dr. Habil., University Professor Dmitrii Lozovanu
Jeffrey Kuan at Harvard University claimed that Markov models may well be the most useful “real world” mathematical concept after that of the derivative.
Markovian processes are used in many applied fields. A Markov process is a dynamic stochastic process with the Markov property.
From Hamilton, Economic Transition:
$$\mathbb{A} = \left( \begin{array}{ccc} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{array} \right) $$with the following payoffs:
$$\mathbb{E} = \left( \begin{array}{c} 0.215 \\ 0.015 \\ -0.18 \\ \end{array} \right)$$but we will return to this later.
This gives us the eigenvalues:
$$ \mathbf{\lambda} = \left(\begin{array}{c} 1.00000 \\ 0.85157 \\ 0.38943 \\ \end{array} \right) $$We then find the null space of \(\mathbb{A} - \lambda I\) for each eigenvalue.
A stochastic matrix \(\mathbb{A}\) is irreducible if its graph is strongly connected, that is: for every pair of states \(i, j\) there \(\exists \; t \geq 0\) such that \(Pr(X_{t}=j|X_{0}=i) > 0 \).
In our example, $\mathbb{A}$ is irreducible since we can end up in any state from any state after \(t \geq 1\) steps.
A stochastic matrix \(\mathbb{A}\) is aperiodic if, for every state \(x\), the greatest common divisor of the set \[S(x) = \{t \geq 1 : P^t(x, x) > 0\}\] equals 1.
We can easily check that matrix \(\mathbb{A}\) from our example is aperiodic.
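This check can be sketched in a few lines of pure Python: for each state \(x\) we collect the return times \(t\) with \(\mathbb{A}^t(x,x) > 0\) (up to a cutoff) and take their greatest common divisor. The matrix below is the example \(\mathbb{A}\) from above; the cutoff of 20 steps is an assumption sufficient for this small chain.

```python
from math import gcd
from functools import reduce

# Transition matrix A from the example above
A = [[0.971, 0.029, 0.0],
     [0.145, 0.778, 0.077],
     [0.0,   0.508, 0.492]]

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(A, state, max_t=20):
    """gcd of S(x) = {t >= 1 : A^t[x][x] > 0}, checked up to max_t steps."""
    times, P = [], A
    for t in range(1, max_t + 1):
        if P[state][state] > 0:
            times.append(t)
        P = matmul(P, A)
    return reduce(gcd, times)

print([period(A, s) for s in range(3)])  # [1, 1, 1]: every state is aperiodic
```

Since every diagonal entry of \(\mathbb{A}\) is positive, \(1 \in S(x)\) for each state, so the gcd is trivially 1.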
A probability distribution \(\pi\) over \(X\) is stationary over \(\mathbb{A}\) if: \[\pi = \pi \mathbb{A}\]
[Fundamental Theorem of Markov Chains][Haggstrom] If a stochastic matrix \(\mathbb{A}\) is irreducible and aperiodic then there is a unique probability distribution \(\pi\) that is stationary on \(\mathbb{A}\).
A proof is provided by Häggström (2002).
Therefore, for any initial distribution \(\mu\), we can write: $$\lim_{t \to \infty} \mu \mathbb{A}^t = \pi$$
This holds because all but one of the eigenvalues are smaller than unity in absolute value, so \(\Lambda^t\) converges:
$$\mathbf{\phi} = \lim_{t \to \infty} \mathbf{S}\Lambda^t\mathbf{S}^{-1} = \mathbf{S} \left(\begin{array}{ccc} 1 &0 &0 \\ 0 &0 &0 \\ 0 &0 &0 \\ \end{array}\right) \mathbf{S}^{-1}$$in our example:
$$\phi = \left(\begin{array}{ccc} 0.812800 & 0.162560 & 0.024640 \end{array} \right)$$Next, consider an oriented graph which can easily be translated into an HMM.
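The same stationary distribution can be approximated numerically without an eigendecomposition. The sketch below (pure Python, using the example \(\mathbb{A}\) from above) applies power iteration, repeatedly computing \(\mu \leftarrow \mu\mathbb{A}\) from an arbitrary initial distribution; since the second-largest eigenvalue is about 0.85, convergence is fast.

```python
# Power iteration: mu A^t -> pi as t grows, for any initial distribution mu
A = [[0.971, 0.029, 0.0],
     [0.145, 0.778, 0.077],
     [0.0,   0.508, 0.492]]

mu = [1.0, 0.0, 0.0]  # arbitrary starting distribution
for _ in range(500):  # |lambda_2| ~ 0.85, so 500 steps is more than ample
    mu = [sum(mu[i] * A[i][j] for i in range(3)) for j in range(3)]

print([round(p, 5) for p in mu])  # ~ [0.8128, 0.16256, 0.02464]
```

The result matches the stationary vector \(\phi\) obtained from the eigendecomposition.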
$$\mathbb{A} = \begin{array}{c} e \\ f \\ g \\ h \end{array} \left( \begin{array}{cccc} 0.9 & 0.1& 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0.9& 0.1 \\ 1 & 0 & 0 & 0 \end{array} \right)$$
The emission probabilities for the events \(\Sigma = \{a,b,c,d\}\), as specified in the [smc]: \[\mathbb{B} = \begin{array}{c} e \\ f \\ g \\ h \end{array} \begin{pmatrix} 0.1 & 0.9 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0& 0& 1 \end{pmatrix}\]
An initial probability distribution over the states, $\pi$.
Formally, we define an HMM as $\lambda(\mathbb{A,B},\pi)$.
The parameters $\mathbb{A,B},\pi$ are often denoted jointly by $\theta$; thus we write $\lambda(\theta)$.
Using external libraries such as GHMM (http://ghmm.org).
$$\mathbb{A}= \left(\begin{array}{cc} 0.9 & 0.1 \\ 0.2 & 0.8 \\ \end{array}\right)$$Emissions: one fair coin and one biased coin.
$$\mathbb{B}= \left(\begin{array}{cc} 0.5 & 0.5 \\ 0.15 & 0.85 \\ \end{array}\right)$$Outcomes: Head (0) or Tail (1).
[1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0 ...
The code is provided in the Annex.
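A sequence like the one above can be generated with a short simulation. The sketch below is an illustration in pure Python, not the Annex code; it uses the two-coin \(\mathbb{A}\) and \(\mathbb{B}\) above, and assumes a uniform initial distribution \(\pi\) (not specified in the example).

```python
import random

# State 0 = fair coin, state 1 = biased coin
A  = [[0.9, 0.1], [0.2, 0.8]]     # transition probabilities
B  = [[0.5, 0.5], [0.15, 0.85]]   # emission probabilities for outcomes 0/1
pi = [0.5, 0.5]                   # assumed uniform initial distribution

def sample(probs):
    """Draw an index according to a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def simulate(T):
    """Generate T observations (0 = Head, 1 = Tail) from the hidden chain."""
    state = sample(pi)
    obs = []
    for _ in range(T):
        obs.append(sample(B[state]))   # emit an outcome from the current coin
        state = sample(A[state])       # move to the next hidden state
    return obs

random.seed(1)
print(simulate(40))
```

Only the outcomes are observed; the hidden state sequence (which coin was used) is discarded, which is exactly what makes the model "hidden".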
Before we estimate $\lambda$, it is helpful to answer the following questions:
Suppose we have identified a sequence \(\mathcal{O} \in \mathbf{\Omega}\) that we assume is generated by an HMM: \(\lambda(\mathbb{A},\mathbb{B},\pi)\). Given \(\lambda\), we would like to know: \[Pr(\mathcal{O}|\lambda)=?\]
Given a model $\lambda$, a sequence of observations $o_1, o_2, \dots, o_T$ and a sequence of states $x_1, x_2, \dots, x_T$:
$$Pr\left(o_1,o_2,...,o_{T}, x_1,x_2,...,x_T \mid \lambda \right)= \pi_{x_1} b_{x_1}(o_1) \prod_{t=2}^{T} a_{x_{t-1},x_t}b_{x_t}(o_t)$$We then sum over all possible state sequences \(X\):
$$Pr\left(\mathcal{O} \mid \lambda \right)= \sum_X \pi_{x_1} b_{x_1}(o_1) \prod_{t=2}^{T} a_{x_{t-1},x_t}b_{x_t}(o_t)$$The difficulty of this approach is that the total number of state sequences that can generate \(\mathcal{O}\) is exponential in the number of observations \(T\).
A better approach to calculating the unconditional probability \(Pr(\mathcal{O}|\lambda)\) is the forward/backward algorithm, a dynamic programming algorithm that exploits the Markov property to avoid enumerating all possible combinations of states \(X\).
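The forward pass alone already computes \(Pr(\mathcal{O}|\lambda)\). The sketch below is a minimal pure-Python illustration on the two-coin HMM above (with an assumed uniform \(\pi\)): \(\alpha_i(t) = Pr(o_1,\dots,o_t, x_t = i \mid \lambda)\), built recursively, so the likelihood costs \(O(N^2 T)\) instead of \(O(N^T)\).

```python
A  = [[0.9, 0.1], [0.2, 0.8]]
B  = [[0.5, 0.5], [0.15, 0.85]]
pi = [0.5, 0.5]  # assumed uniform initial distribution

def forward(obs):
    """alpha[t][i] = Pr(o_1..o_{t+1}, x_{t+1} = i | lambda), 0-indexed in t."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

obs = [1, 1, 0, 0, 1]               # a short prefix of an observed sequence
likelihood = sum(forward(obs)[-1])  # Pr(O | lambda): sum alpha over final states
print(likelihood)
```

Summing the last row of \(\alpha\) over all states gives the desired \(Pr(\mathcal{O}|\lambda)\).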
The objective of the forward/backward algorithm is to compute the probability of being in a particular state \(x_t\) at time \(t\), given a sequence of observations \(\mathcal{O}\):
$$Pr(x_k | \mathcal{O})= ?, \quad k \in \{1..T\}$$$$Pr(x_k | \mathcal{O}, \mathbf{\lambda}) = \frac{Pr(x_k, \mathcal{O} | \mathbf{\lambda})}{Pr(\mathcal{O}|\mathbf{\lambda})} = \frac{Pr(x_k,o_1,o_2,...,o_k |\mathbf{\lambda} )\, Pr(o_{k+1},...,o_T|x_k,o_1,...,o_k,\mathbf{\lambda})}{Pr(\mathcal{O}|\mathbf{\lambda})}$$
using the Markovian property:
$$Pr(x_k | \mathcal{O}, \mathbf{\lambda}) = \frac{Pr(x_k,o_1,o_2,...o_k |\mathbf{\lambda} ) Pr(o_{k+1},...,o_T|x_k,\mathbf{\lambda})}{Pr(\mathcal{O}|\mathbf{\lambda})}$$
$$Pr(x_k=i | \mathcal{O}, \mathbf{\lambda}) = \frac{\alpha_i(k) \beta_i(k)}{Pr(\mathcal{O}|\mathbf{\lambda})} = \frac{\alpha_i(k) \beta_i(k)}{\sum_{j=1}^N \alpha_j(T)}$$
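The full forward/backward posterior can be sketched as follows, again on the assumed two-coin HMM (uniform \(\pi\) assumed, parameters as above, not the Annex code). \(\beta_i(k)\) is built backwards from \(\beta_i(T)=1\), and the posterior at each step is \(\alpha_i(k)\beta_i(k)/Pr(\mathcal{O}|\lambda)\).

```python
A  = [[0.9, 0.1], [0.2, 0.8]]
B  = [[0.5, 0.5], [0.15, 0.85]]
pi = [0.5, 0.5]
N  = 2

def forward(obs):
    """alpha[k][i] = Pr(o_1..o_{k+1}, x = i | lambda)."""
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs):
    """beta[k][i] = Pr(o_{k+2}..o_T | x = i, lambda), with beta_T = 1."""
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

obs = [1, 1, 0, 0, 1]
alpha, beta = forward(obs), backward(obs)
likelihood = sum(alpha[-1])                     # Pr(O | lambda)
posterior = [[alpha[k][i] * beta[k][i] / likelihood for i in range(N)]
             for k in range(len(obs))]
print(posterior)  # each row sums to 1: a distribution over states at time k
```

A useful sanity check: \(\sum_i \alpha_i(k)\beta_i(k) = Pr(\mathcal{O}|\lambda)\) for every \(k\), so each posterior row sums to one.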
Given an observable sequence \(\mathcal{O} = \{o_1,o_2,...,o_T \}\), and assuming this sequence was generated by an HMM \(\lambda(\mathbb{A,B},\pi)\), we want to find the set of parameters \(\theta\) that maximizes the probability of observing \(\mathcal{O}\):
$$\theta^{\star} = \underset{\theta}{\operatorname{arg\, max}}Pr\left(\mathcal{O} | \lambda(\theta)\right)$$Substituting the forward and backward variables, the re-estimate of \(a_{ij}\) (the expected number of transitions from \(i\) to \(j\), divided by the expected number of visits to state \(i\)) is: $$\bar{a}_{ij} = \dfrac{ \sum_{t=1}^{T-1} \dfrac{\alpha_i(t) a_{ij} b_j(o_{t+1}) \beta_j(t+1)} {\sum_{i\in S}\sum_{j\in S} \alpha_i(t) a_{ij} b_j(o_{t+1}) \beta_j(t+1) }} {\sum_{t=1}^{T-1} \dfrac{\alpha_i(t) \beta_i(t)}{\sum_{j \in S}\alpha_j(t) \beta_j(t) }} $$
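One such re-estimation step can be sketched in pure Python. This is an illustration on the assumed two-coin HMM (uniform \(\pi\), parameters and observation sequence chosen here for the example, not the Annex code): it runs the forward and backward passes and re-estimates \(\mathbb{A}\) as expected transitions over expected visits.

```python
A  = [[0.9, 0.1], [0.2, 0.8]]
B  = [[0.5, 0.5], [0.15, 0.85]]
pi = [0.5, 0.5]
N  = 2

def forward(obs):
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs):
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def reestimate_A(obs):
    """One Baum-Welch update of the transition matrix."""
    alpha, beta = forward(obs), backward(obs)
    pO = sum(alpha[-1])                        # Pr(O | lambda)
    A_bar = []
    for i in range(N):
        # sum_t gamma_i(t): expected number of visits to state i
        visits = sum(alpha[t][i] * beta[t][i] / pO
                     for t in range(len(obs) - 1))
        # sum_t xi_ij(t) / visits: expected i->j transitions per visit to i
        A_bar.append([sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] *
                          beta[t + 1][j] / pO for t in range(len(obs) - 1))
                      / visits for j in range(N)])
    return A_bar

A_bar = reestimate_A([1, 1, 0, 0, 1, 1, 1, 0, 1, 0])
print(A_bar)  # each row is again a probability distribution
```

Because \(\sum_j \xi_{ij}(t) = \gamma_i(t)\), each re-estimated row automatically sums to one; iterating this update (together with analogous updates for \(\mathbb{B}\) and \(\pi\)) is the Baum-Welch / EM procedure.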
This presentation: moldovean.github.io
Replicate the results: github.com/moldovean/usm/tree/master/Thesis