Monday, May 2, 2011

Blogging Markov Models: Hidden Markov Models

Continuing through Chapter 9 of Foundations of Statistical Natural Language Processing (1999) by Christopher D. Manning and Hinrich Schütze.

Last time, we introduced Markov models as describing the probabilities of random variables in a sequence.

In a Hidden Markov Model (HMM), we don't know the state sequence. We observe only the symbols the model emits, and for the states themselves we have only probabilities: for where the model starts, for its transitions, and for what each state emits. For this reason, when we ask how likely an observed sequence is under an HMM, we need to consider all the paths that might be taken through the model.
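
To make that concrete, here is a little Python sketch of the brute-force version: enumerate every possible hidden path through a toy two-state model and add up the probability each path assigns to the observed words. The states, words, and numbers are my own invention, not the book's, and it gets slightly ahead of the formal specification listed further below.

    from itertools import product

    # A toy two-state HMM; states, words, and probabilities are invented.
    states = ["noun", "verb"]
    pi = {"noun": 0.7, "verb": 0.3}                # initial state probabilities
    A = {"noun": {"noun": 0.3, "verb": 0.7},       # state transition probabilities
         "verb": {"noun": 0.8, "verb": 0.2}}
    B = {"noun": {"flies": 0.6, "like": 0.4},      # symbol emission probabilities
         "verb": {"flies": 0.5, "like": 0.5}}

    observed = ["flies", "like", "flies"]

    # P(observed) is the sum, over every possible hidden path,
    # of P(path) * P(observed given that path).
    total = 0.0
    for path in product(states, repeat=len(observed)):
        p = pi[path[0]] * B[path[0]][observed[0]]
        for t in range(1, len(observed)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][observed[t]]
        total += p

    print(total)

(In practice one uses the forward algorithm rather than enumerating paths, since the number of paths grows exponentially with the length of the sequence.)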

Manning and Schütze note that "HMMs are useful when one can think of underlying events probabilistically generating surface events." Thus,
  • Parts of speech (or other hidden classifications) might be one such underlying series of events, generating the actual words of a text.
  • HMMs are one of a class of models for which there exist efficient methods of training through use of the Expectation Maximization (EM) algorithm.
  • HMMs can be used to generate parameters for linear interpolation of n-gram models.
The upshot of all this is that one can build an HMM whose hidden states represent the choice of whether to use unigram, bigram, or trigram probabilities.
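
To make the interpolation point concrete, here is a rough sketch with made-up weights and probabilities; in Manning and Schütze's construction, these mixing weights are what the EM-trained HMM ends up estimating.

    # Hypothetical mixing weights; they must sum to one.
    lam_uni, lam_bi, lam_tri = 0.2, 0.3, 0.5

    # Made-up estimates for P(w), P(w | w_prev), and P(w | w_prev2, w_prev1).
    p_uni, p_bi, p_tri = 0.001, 0.01, 0.05

    # Linearly interpolated trigram probability.
    p_interp = lam_uni * p_uni + lam_bi * p_bi + lam_tri * p_tri
    print(p_interp)  # about 0.0282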

HMMs are specified by five factors:
  1. A set of states (S).
  2. An output alphabet (K).
  3. Initial state probabilities (Π).
  4. State transition probabilities (A).
  5. Symbol emission probabilities (B).
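
Those five factors map pretty directly onto a data structure. Here is a minimal sketch in Python, with states, words, and numbers invented just for illustration:

    # The five factors of an HMM, written out as plain Python structures.
    hmm = {
        "states": ["noun", "verb"],                  # S: a set of states
        "alphabet": ["flies", "like"],               # K: an output alphabet
        "pi": {"noun": 0.7, "verb": 0.3},            # Π: initial state probabilities
        "A": {"noun": {"noun": 0.3, "verb": 0.7},    # A: state transition probabilities
              "verb": {"noun": 0.8, "verb": 0.2}},
        "B": {"noun": {"flies": 0.6, "like": 0.4},   # B: symbol emission probabilities
              "verb": {"flies": 0.5, "like": 0.5}},
    }
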
This is where it gets tricky, at least for me. Manning and Schütze say "The random variables Xt map from state names to corresponding integers." I think this means that if our state at some time t is the letter "a," for example, then we can convert it to a number, so that states can be indexed 1 through N rather than referred to by name.

Once an HMM is specified, one can easily set up a computer program to simulate the running of a Markov process and to produce an output sequence. The important point here is that the program only simulates a Markov process. The "real" Markov process, as it were, is a given set of data, such as a text, that we assume was generated through a hidden Markov process.
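
Here is roughly what I imagine such a simulation program looks like, as a minimal Python sketch; the toy specification is my own and the code is not from the book.

    import random

    # A toy specification built from the Π, A, and B factors above;
    # states, words, and probabilities are invented.
    hmm = {
        "pi": {"noun": 0.7, "verb": 0.3},
        "A": {"noun": {"noun": 0.3, "verb": 0.7},
              "verb": {"noun": 0.8, "verb": 0.2}},
        "B": {"noun": {"flies": 0.6, "like": 0.4},
              "verb": {"flies": 0.5, "like": 0.5}},
    }

    def sample(dist):
        """Draw one outcome from a {outcome: probability} dictionary."""
        r, running = random.random(), 0.0
        for outcome, p in dist.items():
            running += p
            if r <= running:
                return outcome
        return outcome  # guard against floating-point rounding

    def simulate(hmm, length):
        """Sample a hidden state path and emit one symbol per step."""
        state = sample(hmm["pi"])
        output = []
        for _ in range(length):
            output.append(sample(hmm["B"][state]))  # emit a symbol from the current state
            state = sample(hmm["A"][state])         # then move to the next state
        return output

    print(simulate(hmm, 10))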

I don't think we should ignore the distinction being made here between simulated and real Markov processes. It's a key operation for producing knowledge of the real, and I suspect that with further consideration we would agree it carries several assumptions that also hold in the domains of computerization generally, new media, and language.
