David Huggins-Daines


Speech::Recognizer::SPX::Server - Perl module for writing streaming audio speech recognition servers using PocketSphinx


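  use IO::Socket;
  use IO::File;
  use Speech::Recognizer::SPX::Server;

  my $sock = new IO::Socket(... blah blah blah ...);
  # Open the log and the audio dump for writing.
  my $log = IO::File->new('server.log', 'w');
  my $audio_fh = IO::File->new('speech.raw', 'w');
  my $srvr
      = Speech::Recognizer::SPX::Server->init({ -arg => val, ... }, $sock, $log, $verbose)
        or die "couldn't initialize pocketsphinx: $!";

  my $client = new IO::Socket;
  # accept takes the new connection handle first, then the listening socket.
  while (accept $client, $sock) {
      # Where fork returns false (in the child, or on failure), go back to
      # accept; the other process serves this client and then exits.
      next unless fork;
      $srvr->sock($client);    # read audio from the accepted connection
      $srvr->calibrate or die "couldn't calibrate audio stream: $!";
      while (!$done && defined(my $txt
                        = $srvr->next_utterance(sub { print $log "listening\n" },
                                                sub { print $log "not listening\n" },
                                                $audio_fh))) {
          print "recognized text is $txt\n";
      }
      $srvr->fini or die "couldn't shut down server: $!";
      exit 0;
  }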


This module encapsulates a bunch of the stuff needed to write a PocketSphinx server which takes streaming audio as input on an arbitrary filehandle. It's not meant to be flexible or transparent - if you want that, then read the code and write your own server program using just the Speech::Recognizer::SPX module.

The interface is vaguely object-oriented, but unfortunately it is presently not possible to create multiple instances of Speech::Recognizer::SPX::Server within the same process, due to severe limitations of the underlying PocketSphinx library. You can, however, create multiple distinct servers with judicious use of fork, as shown in the example above.

It is possible that this will be fixed in a future release of PocketSphinx.
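
The one-recognizer-per-process restriction can be worked around roughly as follows. This is only a sketch, not part of the module: run_in_child is a hypothetical helper, and the code reference passed to it stands in for whatever initializes and runs your server.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): run $code in a forked
# child so that each recognizer lives in its own process.
sub run_in_child {
    my ($code) = @_;
    my $pid = fork;
    die "couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: this is where init, calibrate, next_utterance and fini
        # would be called. Exit 0 if $code returns true, 1 otherwise.
        exit($code->() ? 0 : 1);
    }
    return $pid;    # Parent: the caller is expected to waitpid on this.
}
```

The parent gets the child's process ID back and can collect its exit status with waitpid, so several servers can run side by side without sharing recognizer state.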


  my $srvr = Speech::Recognizer::SPX::Server->init(\%args, $sock, $log, $verbose);

\%args is a reference to a hash of argument => value pairs, exactly like the arguments you would pass on the command line to one of the sphinx example programs. Argument names can be given either with or without a leading dash.
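
To illustrate the two spellings of argument names, here is a small sketch. The with_dashes helper is hypothetical and only shows that both forms name the same argument; it is not how the module itself processes the hash. The argument names hmm, lm and dict are assumed to be the usual PocketSphinx model options, and the paths are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): give every argument
# name a leading dash, so that both spellings end up looking the same.
sub with_dashes {
    my ($args) = @_;
    my %out;
    for my $name (keys %$args) {
        my $key = $name =~ /^-/ ? $name : "-$name";
        $out{$key} = $args->{$name};
    }
    return \%out;
}

# Placeholder model paths; substitute your own.
my %args = (
    -hmm => '/path/to/acoustic-model',     # with a leading dash
    lm   => '/path/to/language-model',     # without one
    dict => '/path/to/dictionary',
);
my $normalized = with_dashes(\%args);
```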

$sock is a socket or other filehandle (could be anything, really) on which the server will read audio data. This argument is optional and not needed to initialize the server - you can set it later with the sock accessor.

$log is a filehandle on which the server module will log messages. This argument is optional. Without a filehandle to log on, these messages (boring things like "started listening at $foo") will not be printed.

$verbose determines the verbosity level of the Sphinx library. Currently, due to limitations in the PocketSphinx library, there are only two options for this value, namely a true value for 'be insanely verbose', or a false value for 'say nothing at all'.


The calibrate method calibrates the noise threshold for the continuous audio stream (i.e. figures out when it should listen and when it shouldn't). This requires you to actually have a ready and willing source of input on the socket you set in init or with sock.

  my $text = $srvr->next_utterance($cb_listen, $cb_not_listen, $audio_fh);

Waits for and recognizes the next utterance in the data stream. All arguments are optional:

$cb_listen is a reference to (or name of, but I encourage you not to do that) a subroutine to be called when the recognizer has detected speech input.

$cb_not_listen is a reference to (or name of) a subroutine to be called when the recognizer has detected the end of speech input.

Obviously this presumes a request/response model for your application. If you need to be able to get partial results then you'll have to wait for me to support them (which will undoubtedly happen sooner or later), or write your own module. Sorry.

$audio_fh is a filehandle to which to save the speech data - this may come in handy for debugging, or if you would like to only record the user talking and not the hours and hours of silence in between.
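
The arguments described above can be put together along these lines. This is a sketch under assumptions: collect_utterances is a hypothetical helper, $srvr is taken to be already initialized and calibrated, and the loop relies on next_utterance returning undef when there is nothing more to recognize, as the defined() test in the example at the top of this page suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): collect utterances from
# an already initialized and calibrated server until next_utterance
# returns undef or $max results have been gathered.
sub collect_utterances {
    my ($srvr, $log, $audio_fh, $max) = @_;
    my @texts;
    while (@texts < $max) {
        my $txt = $srvr->next_utterance(
            sub { print {$log} "listening\n" },        # speech detected
            sub { print {$log} "not listening\n" },    # end of speech
            $audio_fh,                                 # optional: save audio
        );
        last unless defined $txt;
        push @texts, $txt;
    }
    return @texts;
}
```

Passing code references rather than subroutine names keeps the callbacks lexically scoped, which is why the sketch closes over $log directly.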


The fini method shuts down the PocketSphinx recognizer. It doesn't close the socket or anything else, though; you have to do that yourself.
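
Since the socket is left open, a shutdown routine might look something like this. It is a sketch, not part of the module: shut_down is a hypothetical helper, and it assumes that calling sock with no arguments returns the current filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): shut the recognizer
# down, then close the audio socket, which fini leaves open.
sub shut_down {
    my ($srvr) = @_;
    my $sock = $srvr->sock;    # fetch the handle before shutting down
    $srvr->fini or die "couldn't shut down server: $!";
    if (defined $sock) {
        close $sock or die "couldn't close socket: $!";
    }
    return 1;
}
```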


  my $sockfh = $srvr->sock;

Sets or gets the socket on which the server reads audio data.

  my $logfh = $srvr->logfh;

Sets or gets the filehandle on which the server logs messages (if it's being verbose).


Sets/gets the amount of time (in milliseconds) to wait after the end of speech-level input before processing an utterance. Default is one second.


perl(1), Speech::Recognizer::SPX.


David Huggins-Daines <dhuggins@cs.cmu.edu>