PDL::HMM - Hidden Markov Model utilities in PDL
use PDL::HMM; ##----------------------------------------------------- ## Dimensions $N = $number_of_states; $M = $number_of_symbols; $T = $length_of_input; $A = $maximum_ambiguity; ##----------------------------------------------------- ## Parameters $af = log(random($N,$N)); $bf = log(random($N,$M)); $pif = log(random($N)); $omegaf = log(random($N)); @theta = ($a,$b,$pi,$omega) = hmmmaximize($af,$bf,$pif,$omegaf); $o = long(rint($M*random($T))); maximum_n_ind(dice_axis($a->logsumover+$pi+$b, 1,$o), ($oq=zeroes(long,$A,$T))); ##-- for constrained variants ##----------------------------------------------------- ## Log arithmetic $log0 = logzero; $logz = logadd(log($x),log($y)); $logz = logdiff(log($x),log($y)); $logz = logsumover(log($x)); ##----------------------------------------------------- ## Sequence Probability $alpha = hmmfw ($a,$b,$pi, $o ); ##-- forward (full) $alphaq = hmmfwq($a,$b,$pi, $o, $oq); ##-- forward (constrained) $beta = hmmbw ($a,$b,$omega, $o ); ##-- backward (full) $betaq = hmmbwq($a,$b,$omega, $o,$oq); ##-- backward (constrained) ##----------------------------------------------------- ## Parameter Estimation @expect = ($ea,$eb,$epi,$eomega) = hmmexpect0(@theta); ##-- initialize hmmexpect (@theta, $o, $alpha, $beta, $ea,$eb,$epi); ##-- expect (full) hmmexpectq(@theta, $o,$oq, $alphaq,$betaq, $ea,$eb,$epi); ##-- expect (constrained) ($a,$b,$pi,$omega) = hmmmaximize($ea,$eb,$epi,$eomega); ##-- maximize ##----------------------------------------------------- ## Sequence Analysis ($delta,$psi) = hmmviterbi ($a,$b,$pi, $o); ##-- trellis (full) ($deltaq,$psiq) = hmmviterbiq($a,$b,$pi, $o,$oq); ##-- trellis (constrained) $paths = hmmpath ( $psi, sequence($N)); ##-- backtrace (full) $pathsq = hmmpathq($oq, $psiq, sequence($A)); ##-- backtrace (constrained)
Signature: ([o]a())
Approximates $a() = log(0), avoids nan.
logzero() handles bad values. The state of the output PDL is always good.
Signature: (a(); b(); [o]c())
Computes $c() = log(exp($a()) + exp($b())), should be more stable.
logadd does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Computes log symmetric difference c = log(exp(max(a,b)) - exp(min(a,b))), may be more stable.
logdiff does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(n); [o]b())
Computes $b() = log(sumover(exp($a()))), should be more stable.
logsumover does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); pi(N); o(T); [o]alpha(N,T))
Compute forward probability (alpha) matrix for input $o given model parameters @theta = ($a, $b, $pi, $omega).
Output (pseudocode) for all 0<=i<N, 0<=t<T:
$alpha(i,t) = log P( $o(0:t), q(t)==i | @theta )
Note that the final-state probability vector $omega() is neither passed to this function nor used in the computation, but can be used to compute the final sequence probability for $o as:
log P( $o | @theta ) = logsumover( $omega() + $alpha(:,t-1) )
hmmfw does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); pi(N); o(T); oq(Q,T); [o]alphaq(Q,T))
Compute constrained forward probability (alphaq) matrix for input $o given model parameters @theta = ($a, $b, $pi, $omega), considering only the initial non-negative state indices in $oq(:,t) for each observation $o(t).
Output (pseudocode) for all 0<=qi<Q, 0<=t<T:
$alphaq(qi,t) = log P( $o(0:t), q(t)==$oq(qi,t) | @theta )
log P( $o | @theta ) = logsumover( $alphaq(:,t-1) + $omega($oqTi) )
where:
$oqTi = $oq(:,t-1)->where($oq(:,t-1)>=0)
hmmfwq does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); omega(N); o(T); [o]beta(N,T))
Compute backward probability (beta) matrix for input $o given model parameters @theta = ($a, $b, $pi, $omega).
$beta(i,t) = log P( $o(t+1:T-1) | q(t)==i, @theta )
Note that the initial-state probability vector $pi() is neither passed to this function nor used in the computation, but can be used to compute the final sequence probability for $o as:
log P( $o | @theta ) = logsumover( $pi() + $b(:,$o(0)) + $beta(:,0) )
hmmbw does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); omega(N); o(T); oq(Q,T); [o]betaq(Q,T))
Compute constrained backward probability (betaq) matrix for input $o given model parameters @theta = ($a, $b, $pi, $omega), considering only the initial non-negative state indices in $oq(:,t) for each observation $o(t).
$betaq(qi,t) = log P( $o(t+1:T-1) | q(t)==$oq(qi,t), @theta )
log P( $o | @theta ) = logsumover( $betaq(:,0) + $pi($oq0i) + $b($oq0i,$o(0)) )
$oq0i = $oq(:,0)->where( $oq(:,0) >= 0 );
hmmbwq does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); pi(N); omega(N); [o]ea(N,N); [o]eb(N,M); [o]epi(N); [o]eomega(N))
Initializes expectation matrices $ea(), $eb() and $epi() to logzero(). For use with hmmexpect().
Signature: (a(N,N); b(N,M); pi(N); omega(N); o(T); alpha(N,T); beta(N,T); [o]ea(N,N); [o]eb(N,M); [o]epi(N); [o]eomega(N))
Compute partial Baum-Welch re-estimation of the model @theta = ($a, $b, $pi, $omega) for the observation sequence $o() with forward- and backward-probability matrices $alpha(), $beta(). Result is recorded as log pseudo-frequencies in the expectation matrices $ea(), $eb(), $epi(), and $eomega(), which are required parameters, and should have been initialized (e.g. by "hmmexpect0"()) before calling this function.
Can safely be called sequentially for incremental reestimation.
hmmexpect does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); pi(N); omega(N); o(T); oq(Q,T); alphaq(Q,T); betaq(Q,T); [o]ea(N,N); [o]eb(N,M); [o]epi(N); [o]eomega(N))
Compute constrained partial Baum-Welch re-estimation of the model @theta = ($a, $b, $pi, $omega) for the observation sequence $o(), with constrained forward- and backward-probability matrices $alphaq(), $betaq(), considering only the initial non-negative state indices in $oq(:,t) for observation $o(t). Result is recorded as log pseudo-frequencies in the expectation matrices $ea(), $eb(), $epi(), and $eomega(), which are required parameters, and should have been initialized (e.g. by "hmmexpect0"()) before calling this function.
hmmexpectq does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (Ea(N,N); Eb(N,M); Epi(N); Eomega(N); [o]ahat(N,N); [o]bhat(N,M); [o]pihat(N); [o]omegahat(N));
Maximizes expectation values from $Ea(), $Eb(), $Epi(), and $Eomega() to log-probability matrices $ahat(), $bhat(), $pihat(), and $omegahat(). Can also be used to compile a maximum-likelihood model from log-frequency matrices.
Signature: (a(N,N); b(N,M); pi(N); o(T); [o]delta(N,T); int [o]psi(N,T))
Computes Viterbi algorithm trellises $delta() and $psi() for the observation sequence $o() given the model parameters @theta = ($a,$b,$pi,$omega).
Outputs:
Probability matrix $delta(): log probability of best path to state $j at time $t:
$delta(j,t) = max_{q(0:t)} log P( $o(0:t), q(0:t-1), $q(t)==j | @theta )
Path backtrace matrix $psi(): best predecessor for state $j at time $t:
$psi(j,t) = arg_{q(t-1)} max_{q(0:t)} P( $o(0:t), q(0:t-1), $q(t)==j | @theta )
Note that if you are using termination probabilities $omega(), then in order to find the most likely final state, you need to compute the contribution of $omega() yourself, which is easy to do:
$best_final_q = maximum_ind($delta->slice(",-1") + $omega);
hmmviterbi does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (a(N,N); b(N,M); pi(N); o(T); oq(Q,T); [o]deltaq(Q,T); indx [o]psiq(Q,T))
Computes constrained Viterbi algorithm trellises $deltaq() and $psiq() for the observation sequence $o() given the model parameters @theta = ($a,$b,$pi,$omega), considering only the initial non-negative state indices $oq(:,t) for each observarion $o(t).
Constrained probability matrix $deltaq(): log probability of best path to state $oq(j,t) at time $t:
$deltaq(j,t) = max_{j(0:t)} log P( $o(0:t), q(0:t-1)==$oq(:,j(0:t-1)), q(t)==$oq(j,t) | @theta )
Constrained path backtrace matrix $psiq(): best predecessor index for state $oq(j,t) at time $t:
$psiq(j,t) = arg_{j(t-1)} max_{j(0:t)} P( $o(0:t), q(0:t-1)=$oq(:,j(0:t-1)), q(t)==$oq(j,t) | @theta )
Note that if you are using termination probabilities $omega(), then in order to find the most likely final state, you need to compute the contribution of $omega() yourself, which is quite easy to do:
$best_final_j = maximum_ind($deltaq->slice(",-1") + $omega->index($oq->slice(",(-1)")))
hmmviterbiq does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (psi(N,T); indx qfinal(); indx [o]path(T))
Computes best-path backtrace $path() for the final state $qfinal() from completed Viterbi trellis $psi().
Path backtrace $path(): state (in best sequence) at time $t:
$path(t) = arg_{q(t)} max_{q(0:T-1)} log P( $o(), q(0:T-2), $q(T-1)==$qfinal() | @theta )
This even threads over multiple final states, if specified, so you can align paths to their final states just by calling:
$bestpaths = hmmpath($psi, sequence($N));
Note that $path(T-1) == $qfinal(): yes, this is redundant, but also tends to be quite convenient.
hmmpath does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
Signature: (indx oq(Q,T); psiq(Q,T); indx qfinalq(); indx [o]path(T))
Computes constrained best-path backtrace $path() for the final state index $qfinalq() from completed constrained Viterbi trellis $psiq().
$path(t) = arg_{q(t)} max_{q(0:T-1)} log P( $o(), q(0:T-2), $q(T-1)==$oq($qfinalq(),T-1) | @theta )
This is really just a convenience method for dealing with constrained lookup -- the same thing can be accomplished using hmmpath() and some PDL index magic.
hmmpathq does not process bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
HMMs are specified by parameters $a(N,N), $b(N,M), $pi(N), and $omega(N); input sequences are represented as vectors $o(T) of integer values in the range [0..M-1], where the following notational conventions are used:
The model has $N states, denoted $q, 0 <= $q < $N.
The input- (aka "observation-") alphabet of the model has $M elements, denoted $o(t), 0 <= $o(t) < $M.
Time indices are denoted $t, 1 <= $t < $T.
Input- (aka "observation-") sequences are represented as vectors of of length $T whose component values are in the range [0..M-1], i.e. alphabet indices.
The vector $pi(N) gives the (log) initial state probability distribution:
$pi(i) = log P( $q(0)==i )
The vector $omega(N) gives the (log) final state probability distribution:
$omega(i) = log P( $q($T)==i )
This parameter is a nonstandard extension. You can simulate the behavior of more traditional definitions (such as that given in Rabiner (1989)) by setting:
$omega = zeroes($N);
wherever it is required.
The matrix $a(N,N) gives the (log) conditional state-transition probability distribution:
$a(i,j) = log P( $q(t+1)==j | $q(t)==i )
The matrix $b(N,M) gives the (log) conditional symbol emission probability:
$b(j,o) = log P( $o(t)==o | $q(t)==j )
Perl by Larry Wall.
PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.
Implementation based largely on the formulae in: L. E. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition," Proceedings of the IEEE 77:2, Februrary, 1989, pages 257--286.
Probably many.
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2005-2023 by Bryan Jurish. All rights reserved.
This package is free software, and entirely without warranty. You may redistribute it and/or modify it under the same terms as Perl itself.
perl(1), PDL(3perl).
To install PDL::HMM, copy and paste the appropriate command in to your terminal.
cpanm
cpanm PDL::HMM
CPAN shell
perl -MCPAN -e shell install PDL::HMM
For more information on module installation, please visit the detailed CPAN module installation guide.