In a Hidden Markov Model (HMM) with N states, each hidden state is associated with an emission law: the states evolve according to a Markov chain, and at each time step the visible observation is drawn from the emission law of the current state.

Denoting by $\left(s_{t}\right)$ the hidden-state sequence and by $\left(O_{t}\right)$ the visible-observation sequence, we get the following graphical representation:


Formally, the hidden-state chain $\left(s_{t}\right)$ is defined by an initial probability distribution vector $\Pi_{0}$ and a transition matrix $M$, with $m_{ij}=P\left(s_{t+1}=j\mid s_{t}=i\right)$, such that: $$\Pi_{t}^{\top}=\Pi_{t-1}^{\top}M$$ In our case we assume that the N emission laws are Gaussian, each determined by its mean vector and covariance matrix: $$O\left(s_{t}\right)\thicksim N\left(\mu_{s_{t}},\Omega_{s_{t}}\right)$$
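The model above can be simulated in a few lines. Below is a minimal numpy sketch for a 2-state HMM with 2-D Gaussian emissions; all parameter values are illustrative, not estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state HMM with 2-D Gaussian emissions (values are made up)
pi0 = np.array([0.6, 0.4])                       # initial distribution Pi_0
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])                       # m_ij = P(s_{t+1}=j | s_t=i)
mus = np.array([[0.0, 0.0], [3.0, 3.0]])         # mean vectors mu_i
omegas = np.array([np.eye(2), 0.5 * np.eye(2)])  # covariance matrices Omega_i

def simulate(T):
    """Draw a hidden-state path and the matching Gaussian observations."""
    states = np.empty(T, dtype=int)
    obs = np.empty((T, 2))
    states[0] = rng.choice(2, p=pi0)
    for t in range(1, T):
        # next state drawn from row states[t-1] of the transition matrix
        states[t] = rng.choice(2, p=M[states[t - 1]])
    for t in range(T):
        # observation drawn from the current state's emission law
        obs[t] = rng.multivariate_normal(mus[states[t]], omegas[states[t]])
    return states, obs

states, obs = simulate(200)
```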
The graph structure shows that, conditionally on the state sequence, the observations are independent. Furthermore, each observation depends solely on the state at the same time step. Thus we have: $$P\left(O_{1},\ldots,O_{T}\mid s_{1},\ldots,s_{T}\right)=\prod_{k=1}^{T}P\left(O_{k}\mid s_{k}\right)$$ As a result, if we know the probability distribution of the state sequence, we can perform the Maximization step of the EM algorithm the same way we do for the Gaussian Mixture Model. Fortunately, the Baum-Welch forward-backward procedure allows us to compute the probability of each state at each time step.
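Given the state posteriors $\gamma_{i}(t)=P(s_{t}=i\mid O,\theta)$ produced by the forward-backward pass derived below, the M-step indeed mirrors the GMM update: each state's mean and covariance are re-estimated as posterior-weighted averages. A minimal numpy sketch (the function name and argument layout are my own, not from the source):

```python
import numpy as np

def m_step(obs, gamma):
    """GMM-style M-step: re-estimate each state's mean and covariance
    from the posteriors gamma[t, i] = P(s_t = i | O, theta).

    obs:   (T, d) array of observations
    gamma: (T, N) array of state posteriors
    """
    weights = gamma / gamma.sum(axis=0)  # each column now sums to 1
    mus = weights.T @ obs                # weighted means, shape (N, d)
    covs = []
    for i in range(gamma.shape[1]):
        centred = obs - mus[i]
        # posterior-weighted scatter matrix of state i
        covs.append((weights[:, i, None] * centred).T @ centred)
    return mus, np.array(covs)
```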

Estimation with the Baum-Welch algorithm

Forward pass (adapted from Wikipedia)
Let $\alpha_{i}(t)=P(O_{1}=o_{1},...,O_{t}=o_{t},s_{t}=i|\theta)$ i.e. the joint probability of observing $o_{1},o_{2},...,o_{t}$ and having $s_{t}=i$.
By forward recursion: $$\alpha_{i}(1)=\pi_{i}\,g_{i}(o_{1})$$ $$\alpha_{j}(t+1)=g_{j}(o_{t+1})\sum_{i=1}^{N}\alpha_{i}(t)m_{ij}$$ where $g_{i}$ denotes the density of $N\left(\mu_{i},\Omega_{i}\right)$.

Backward pass (adapted from Wikipedia)
Let $\beta_{i}(t)=P(O_{t+1}=o_{t+1},...,O_{T}=o_{T}|s_{t}=i,\theta)$ i.e. the conditional probability of observing $o_{t+1},...,o_{T}$ given $s_{t}=i$.
By backward recursion: $$\beta_{i}(T)=1$$ $$\beta_{i}(t)=\sum_{j=1}^{N}\beta_{j}(t+1)m_{ij}\,g_{j}(o_{t+1})$$

Expected state dynamic (adapted from Wikipedia)
Probability of being in state $s_{t}=i$ given the observed sequence $O$ and the parameters $\theta$: $$\gamma_{i}(t)=P(s_{t}=i|O,\theta)=\frac{\alpha_{i}(t)\beta_{i}(t)}{\sum_{j=1}^{N}\alpha_{j}(t)\beta_{j}(t)}$$ Probability of being in state $s_{t}=i$ and $s_{t+1}=j$ given the observed sequence $O$ and parameters $\theta$: $$\xi_{ij}(t)=P(s_{t}=i,s_{t+1}=j|O,\theta)={\frac{\alpha_{i}(t)m_{ij}\beta_{j}(t+1)g_{j}(o_{t+1})}{\sum_{k=1}^{N}\sum_{l=1}^{N}\alpha_{k}(t)m_{kl}\beta_{l}(t+1)g_{l}(o_{t+1})}}$$
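The forward, backward, and posterior formulas above combine into a single E-step. Below is a minimal unscaled numpy sketch (function names and the toy parameters at the bottom are illustrative; long sequences need per-step scaling or log-space arithmetic to avoid numerical underflow):

```python
import numpy as np

def gauss_pdf(x, mu, omega):
    """Density of N(mu, omega) evaluated at each row of x."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(omega)
    quad = np.einsum('td,de,te->t', diff, inv, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(omega))

def e_step(obs, pi0, M, mus, omegas):
    """Forward-backward pass: returns gamma[t, i] and xi[t, i, j]."""
    T, N = len(obs), len(pi0)
    # g[t, i] = g_i(o_t), the Gaussian emission density
    g = np.column_stack([gauss_pdf(obs, mus[i], omegas[i]) for i in range(N)])
    alpha = np.empty((T, N))
    beta = np.empty((T, N))
    alpha[0] = pi0 * g[0]                     # alpha_i(1) = pi_i g_i(o_1)
    for t in range(T - 1):
        alpha[t + 1] = g[t + 1] * (alpha[t] @ M)
    beta[-1] = 1.0                            # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = M @ (beta[t + 1] * g[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] proportional to alpha_i(t) m_ij beta_j(t+1) g_j(o_{t+1})
    xi = alpha[:-1, :, None] * M[None] * (beta[1:] * g[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    return gamma, xi

# Toy run on a 3-observation sequence (parameters are illustrative)
pi0 = np.array([0.5, 0.5])
M = np.array([[0.9, 0.1], [0.2, 0.8]])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
omegas = np.array([np.eye(2), np.eye(2)])
obs = np.array([[0.1, -0.2], [2.9, 3.1], [3.2, 2.8]])
gamma, xi = e_step(obs, pi0, M, mus, omegas)
```

Note the consistency check built into the definitions: summing $\xi_{ij}(t)$ over $j$ recovers $\gamma_{i}(t)$.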

Example: HMM with 2 states and 2-D Gaussian observations

The 1st chart shows the data as well as the estimated Gaussian centers.

The 2nd chart compares the simulated (true) states with the estimated state probabilities.

The 3rd chart shows the step-by-step convergence of the estimated state-transition probabilities (diagonal elements).

The 4th chart shows the step-by-step convergence of the estimated Gaussian mean coordinates.