## Some Algorithms for Large Hidden Markov Models

**Sanaa Chafik**^{1,}, **Daoui Cherki**^{1}

^{1}University Sultan Moulay Slimane, Laboratory of modelisation and calcul, Béni Mellal

### Abstract

The Hidden Markov Model (HMM) has become increasingly popular in the last several years because it is used in a wide range of applications. There are some inherent limitations of this type of statistical model. The major limitation of HMM is large hidden state space, which limits their practical purview. The objective of this work is to reduce the task of solving some classical algorithms (Forward, Backward, Baum-Welch) by review of their theoretical aspects, offering faster improved algorithms based on the decomposition technique which represent a general approach to solving a problem by breaking it up into smaller ones and solving each of the smaller ones separately.

### At a glance: Figures

**Keywords:** Hidden Markov Model, Forward, Backward, Baum Welch, large hidden state space, divide and conquer, decomposition, communicating class, graph theory

*World Journal Control Science and Engineering*, 2013 1 (1),
pp 9-14.

DOI: 10.12691/wjcse-1-1-2

Received June 30, 2013; Revised August 25, 2013; Accepted August 27, 2013

**Copyright**© 2014 Science and Education Publishing. All Rights Reserved.

### Cite this article:

- Chafik, Sanaa, and Daoui Cherki. "Some Algorithms for Large Hidden Markov Models."
*World Journal Control Science and Engineering*1.1 (2013): 9-14.

- Chafik, S. , & Cherki, D. (2013). Some Algorithms for Large Hidden Markov Models.
*World Journal Control Science and Engineering*,*1*(1), 9-14.

- Chafik, Sanaa, and Daoui Cherki. "Some Algorithms for Large Hidden Markov Models."
*World Journal Control Science and Engineering*1, no. 1 (2013): 9-14.

Import into BibTeX | Import into EndNote | Import into RefMan | Import into RefWorks |

### 1. Introduction

A Hidden Markov Model (HMM) is a doubly stochastic process, one of whose components is an unobservable Markov chain, the HMM have been successfully used for pattern recognition, speech recognition ^{[IRCAM, PARIS, 2003.">1, Technometrics, Vol. 33, No. 3. pp. 251-272. Aug 1991.">2]}, Handwriting recognition ^{[Colloque International Francophone sur l'Ecrit et le Document (CIFED'00), Lyon, France, 2000.">3, Sixth International Conference on Document Analysis and Recognition (ICDAR 2001). pp.491-495. 2001.">4, 5]}, computational biology ^{[6]}, Machine translation ^{[7]}. During the use of HMMs we are led to treat three fundamental problems Evaluation, Decoding and Learning ^{[8]}.

The HMMs fall most often on a large dimension state space that makes interesting use of the algorithmic technique of divide and conquer their principle based on dividing a large problem into several similar problems which avoids the curse of dimensionality. In this direction, we will propose a decomposition method and improved algorithms to solve large HMMs.

This paper is organized as follows: Section 1 present a general introduction to Hidden Markov Models, their fundamental problems, and some classical algorithms algorithms (Forward, Backward, Viterbi, Baum-Welch). Section 2 details the curse of dimensionality problem and gives the principle of our proposed solution. Section 3 presents a new version of some HMM algorithms. Section 4 summarizes the complexity of the different algorithms.

### 2. Hidden Markov Model

**2.1. Definition**

The elements of a HMM are ^{[9, World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2, No. 6, 203-208. 2012.">10]}: The Model formed by N states; separate observations according to M symbols; the matrix of transition probabilities is denoted by where and , that specifies the probability of transitioning from state to state ; the observation probability matrix or emission probability is denoted by where that represents the probability of symbols emission at the instant t by the state; the initial state distribution is denoted by where , that specifies the probability of being in state at time zero.

Given appropriate values of N, M, A, B and, the HMM can be used as generator to give an observation sequence, where T is the number of observations in the sequence.

We denote by λ the model parameters, where λ = (A, B, ).

**2.2. Fundamental Problems of HMM**

There are three basic HMM problems ^{[8]} of interest that must be solved for the model to be useful in real-world application. These problems are the following:

Evaluation: Given the observation and an HMM model, how do we compute the probability of O given the model ? Several techniques are developed in this direction: direct evaluation method, "Forward-Backward" procedure and Viterbi Algorithm.

Decoding: Given the observation and an HMM model, how do we find the state sequences that best explain the observation? There are several solutions to this problem: The local criterion, the global criterion and the Viterbi algorithm.

Learning: How do we adjust the model parameters, to maximize? The task is usually to derive the maximum likelihood estimate of the parameters of the HMM given the set of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum likelihood can be derived efficiently using the Baum-Welch algorithm or the Baldi-Chauvin algorithm ^{[11]}. The Baum-Welch algorithm is a special case of the expectation-maximisation algorithm.

**2.3. Classical Algorithms**

Let the sequence of observation wehre T represent the number of possible observations.

**2.3.1. Forward-Backward Algorithm**

To calculate the probability to engender the sequence of observation O given the modelillustrated in equation (1), we make recourse to two procedures (Forward and Backward).

(1) |

For each pair (state i, time t) we associate the Forward variable (2) which represents the probability of the partial observation sequence (until time t) and state at time t, given the model

(2) |

**Algorithm 2.1.**

**Step1****:*** *Initialization, for each

**Step 2****:** Induction, for

(3) |

**Step 3****:** Termination

(4) |

Analogous to the Forward variable, just in the other direction, we define the Backward variable given in (5) which represents the partial probability to observe all future events knowing that we are in state , at time t and given the model .

(5) |

**Algorithm 2.2.**

**Step 1****:** Initialization, for

(6) |

**Step 2****:** Induction, for

(7) |

**2.3.2. Viterbi Algorithm**

The Viterbi algorithm is a for finding the most sequence of hidden states.

**Algorithm 2.3.**

**Step 1****:** Initialization, for t = 1 and

(8) |

**Step 2****:** Recursion, for

(9) |

**Step 3****:** Termination

(10) |

**Step 4****:** Optimal state sequence backtracking

(11) |

Where

**2.3.3. Baum-Welch Algorithm**

This is re-estimation algorithms that iteratively try to estimate the model parameters by maximizing the likelihood.

Baum-Welch algorithm ^{[12]} is summarized in three steps: first forward passes and backward pass followed by a smoothing pass.

To reestimate the values of the transition matrix A, the observation matrix B and the probability initialization we need to define two more auxiliary variables given in (12) and given in (13), in addition to the Forward and Backward variables defined in a previous section. The variable represents the probability of being in state at time t, given the observation sequence and the model, and the variable represents the probability of being in state at time t and in state at t+1.

(12) |

(13) |

The variables and can be expressed as:

(14) |

(15) |

Using the above formulas we can give a method for reestimation of the parameters of an HMM:

(16) |

(17) |

(18) |

### 3. Problematic and Solution

**3.1. The Curse of Dimensionality**

The statistical learning algorithms such as those dedicated to hidden Markov chains they are suffering from the exponentially increase of the cost when the volume of data grows, which is known as the curse of dimensionality ^{[13]}. In this direction we thought using the “Divide and Conquer" algorithmic technique to alleviating the curse of dimension Divide and Conquer.

It is a Latin proverb "to destroy a group of enemies, split into subsets that are crushed one by one, the fusion of the results while not posing much difficulty".

The term Divide and Conquer algorithmic technique ^{[14, 15]} yields elegant, simple and very efficient algorithms, their principle is based on dividing a large problem into several similar problems which avoids the curse of dimensionality. It is preceded in three steps:

**Step 1****:** Partitioning the object of size N in P sub-objects.

**Step 2****:** Solving the problem for each sub-objects.

**Step 3****:** Merging the results obtained to get a solution of the initial problem.

**3.2. Principe of Decomposition**

Decomposition technique ^{[9]} consists of the following steps. First, the algorithm of decomposition to levels is applied, thereafter the restricted HMMs are constructed, eventually, we combine all the partial solutions in order to construct the global solution of the HMM.

In this section, we consider HMM, Let G=(S, U) be the graph associated with the HMM, that is, the state space represents the set of nodes and the set of directed arcs.

**3.2.1. Decomposition into Levels**

The state space can be partitioned into strongly connected classes . Note that the strongly connected classes are defined to be the classes with respect to the relation on G defined by: is strongly connected to if and only if or there exist a directed path from to and directed path from to . There are many good algorithms in graph theory for the computations of such partition, e.g., see ^{[16]}. Now we construct by induction the levels of the graph G. The level is formed by all classes such that is closed, that is, any arc emanating from has both nodes in. The path level is formed by all classes such that the end of any arc emanating from is in some level.

**Remark 3.1.** Let be strongly connected class in the level then is closed with respect to the restricted HMM to the state space.

It is clear that, from Remark 3.1, the following algorithm finds the levels.

**Algorithm 3.1.**

**Example 3.1****.**

**Fig**

**ure**

**1.**Construction of the levels

**3.2.2. Construction of the Restricted HMM**

Let (), be the strongly connected class corresponding to the nodes in level p, where K(p) represent the maximum of the classes included in the level.

**Remark 3.2.** The construction of restricted HMM for each level depends on the type of HMM problems; we will details this point in the sequel.

**3.2.2.1. Restricted HMM for Decoding Problem Using Forward Algorithm**

**Construction of the restricted HMM in level**** the ****:** For each, we denote by the restricted HMM corresponding to the class that is the restricted HMM in which state space restricted is ; the same M symbols; the matrix of transition probabilities and the matrix of observation probability restricted to

**Construction of the restricted HMM in level****,****:** For each, we denote by the restricted HMM corresponding to the class .

Let be the set of predecessors for each state . The restricted defined by:

State space: .

Matrix of transition probabilities for each

,; where .

The same symbols .

The probability distribution of the initial state ; for ach where .

Matrix of observation probability ; for each where.

**3.2.2.2. Restricted HMM for Decoding Problem Using Backward Algorithm**

**Construction of the restricted HMM in level****:** For each , we denote by the restricted HMM corresponding to the class. Let be the set of successors for each state.The restricted defined by:

State space: .

Matrix of transition probabilities: for each , where.

The same symbols:

Matrix of observation probability ; for each where.

### 4. Improved Algorithms

**4.1. Improved Forward Algorithm**

We denote by the Forward variable in state.

**Lemma 4.1.** Let, the Forward variable for state j at time t+1 is defined by:

(19) |

**Proof.** From equation (3) to calculate the value we need only the states i such as, these states belongs to the original set states of the classor .

**Remark 4.1.** To calculate the Forward variable we need the Forward variable for each, therefore, we always need some values that have been already calculated in the upper levels.

**Algorithm**** ****4.1.**

**Step1****:** Initialization

For ; let

(20) |

**Step**** 2****:** Iteration

For, ; let

(21) |

**Step 3****:** Termination

(22) |

**4.2. Improved Backward Algorithm**

We denote by ; and the Backward variable in state.

**Lemma 4.2.** Let, The Backward variable for the state i at time t is defined by:

(23) |

**Proof.** From equation (7) to calculate the value we need only the states j such as , these states belongs to the original set states of the class or.

**Remark 4.2.** To calculate the Backward variable we need the Backward variable for each , Therefore, we always need some values that have been already calculated in the lower levels.

**Algorithm**** ****4.2.**

**Step1****:** Initialization

For ; let

(24) |

**Step 2****:** Iteration

, ; let

(25) |

**4.3. Improved Baum Welch Algorithm**

To describe the Baum-Welch algorithm, we need to define two more auxiliary variables. These variables that are defined in a previous section can be expressed in terms of the forward and backward variables, so it is clear that the improvements made on Forward and Backward algorithms remain useful for the algorithm Baum Welch.

### 5. Complexity

The classical Forward algorithm generate N (N +1) (T-1) + N multiplications and N (N-1) (T-1) additions, it takes on the order of computations. The same for Backward algorithm takes on the order of computations, which represents the quadratic complexity as a function of the number of hidden state.

In general the complexity of the algorithm decomposed into sub-problems is equal to:

(26) |

Where:

: The complexity of algorithm.

: The time required to fuse the solutions.

: The time required to partition the object of size N in sub-problems.

N: The size of the problem.

: The execution time for each sub-problem of size

E: The number of the sub-problems

Using our method the fusion time is negligible and the decomposition time is proportional to the size N of the problem, then on the order N. There are several techniques to solve the equation (26): Master Theorem ^{[17]}, Akra Bazzi theorem ^{[18]}.

The Akra Bazzi theorem is best adapted to our case, because he is the most general.

**Theorem**** ****5****.1. **

(27) |

Let p be the unique real number for which

(28) |

: represents the number of sub-problem that have the size where . Then:

(29) |

To calculate the complexity given in equation (27), we must find the value of p that satisfies the relation in equation (28).

In general, the calculates of the Forward variable or the variable Backwardis done at the original states of each class, then the relation illustrated in equation (27) does ensure that if and only if p = 1.

Which represents a quasi-linear complexity as a function of the number of state.

### 6. Conclusion

In this paper we have attempted to present the theory of hidden Markov models, it has been our purpose to focus on the curse of dimensionality problem; hence we have used the algorithmic technique of divide and conquer in order to propose some faster improved algorithms to solve the fundamental problems of large HMMs.

### References

[1] | Pellegrini T., Duée R., "Suivi de Voix Parlée grace aux Modèles de Markov Cachés," IRCAM, PARIS, 2003. | ||

In article | |||

[2] | Juang B. H.; Rabiner L. R., "Hidden Markov Models for Speech Recognition," Technometrics, Vol. 33, No. 3. pp. 251-272. Aug 1991. | ||

In article | CrossRef | ||

[3] | Ben Amara N., Belaïd A., Ellouze N., "Utilisation des modèles markoviens en reconnaissance de l'écriture arabe état de l'art," Colloque International Francophone sur l'Ecrit et le Document (CIFED'00), Lyon, France, 2000. | ||

In article | |||

[4] | Nakai M., Akira N., Shimodaira H., Sagayama S., "Substroke Approach to HMM-based On-line Kanji Handwriting Recognition," Sixth International Conference on Document Analysis and Recognition (ICDAR 2001). pp.491-495. 2001. | ||

In article | CrossRef | ||

[5] | Ramy Al-Hajj M., Mokbel C., Likforman-Sulem L., "Reconnaissance de l’écriture arabe cursive: combinaison de classifieurs MMCs à fenêtres orientées," Université de Balamand, Faculté de Génie, pp 1-6. 2006. | ||

In article | |||

[6] | Krogh A. ,Mian I. S., Haussler D., "A Hidden Markov model that finds genes in E.coli DNA," Nucleic Acids Research, Vol. 22, No. 22. 1994. | ||

In article | CrossRef PubMed | ||

[7] | Morwal S., Jahan N., Chopra D., "Named Entity Recognition using Hidden Markov Model (HMM)," International Journal on Natural Language Computing (IJNLC) Vol. 1, No.4. 2012. | ||

In article | CrossRef | ||

[8] | RABINER R., "A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceding of the IEEE, Vol. 77, No. 2. 2010. | ||

In article | |||

[9] | Daoui C., "Decomposition des problèmes de decision markoviens," Thesis of doctorat in Faculty of science Rabat , pp 64-69. 2002. | ||

In article | |||

[10] | Jaafer S. A. , Mohamed I. O., Elhafiz M. M., "Text-Independent Speaker Identification Using Hidden Markov Model," World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2, No. 6, 203-208. 2012. | ||

In article | |||

[11] | Baldi P., Chauvin Y., "Smooth On-Line Learning Algorithms for Hidden Markov Models," Neural Computation 6, 307-318. 1994. | ||

In article | CrossRef | ||

[12] | Khreich W., Granger E, Miri A., Sabourin R., "On the memory complexity of the forward–backward algorithm," Pattern Recognition Letters 31. 2010. | ||

In article | |||

[13] | Rust J., "Using Randomization to Break the Curse of Dimensionality,” Econometrica, Vol. 65, No. 3, pp. 487-516. May 1997. | ||

In article | CrossRef | ||

[14] | Canet L., "Algorithmique, graphes et programmation dynamique," pp 23-25. 2003. | ||

In article | |||

[15] | Van Caneghem M., "Algorithmes recursifs Diviser pour régner,". Janvier, 2003. | ||

In article | |||

[16] | Gondran M., Minoux M., "Graphes et Algorithmes," Paris 2nd edition. 546 p. 1990. | ||

In article | |||

[17] | ROURA S., "Improved Master Theorems for Divide-and-Conquer Recurrences," Journal of the ACM, Vol. 48, No. 2, pp. 170-205. March 2001. | ||

In article | CrossRef | ||

[18] | Drmota M., Szpankowski W., "A Master Theorem for Discrete Divide and Conquer Recurrences," Austrian Science Foundation FWF Grant No. S9604. 2011. | ||

In article | |||