NAME

Pod::Stupid - The simplest, stupidest 'pod parser' possible

VERSION

version 0.005

SYNOPSIS

  use Pod::Stupid;
  
  my $file = shift; # '/some/file/with/pod.pl';
  my $original_text = do { local( @ARGV, $/ ) = $file; <> }; # slurp
  
  my $ps = Pod::Stupid->new();
  
  # in scalar context, returns a reference to an array of hashes.
  my $pieces = $ps->parse_string( $original_text );
  
  # get your text sans all POD
  my $stripped_text = $ps->strip_string( $original_text );
  
  # reconstruct the original text from the pieces...
  substr( $stripped_text, $_->{start_pos}, 0, $_->{orig_txt} )
      for grep { $_->{is_pod} } @$pieces;
  
  print $stripped_text eq $original_text ? "ok - $file\n" : "not ok - $file\n";

DESCRIPTION

This module was written to do one simple thing: given some text as input, split it into pieces of POD "paragraphs" and non-POD "whatever", and output an array of hashes (AoH) describing each piece found, in order.

The end user can do whatever s?he wishes with the output AoH. It is trivially simple to reconstruct the input from the output, and hopefully I've included enough information in the inner hashes that one can easily perform just about any other manipulation desired.
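
To make the reconstruction claim concrete, here is a minimal sketch that does not call Pod::Stupid at all. The pieces are hand-built stand-ins for parser output: they use only the 'is_pod', 'start_pos' and 'orig_txt' keys named in this document, and the sample text and offsets are invented for illustration.

```perl
use strict;
use warnings;

my $code_before = "print 1;\n";
my $code_after  = "print 2;\n";

# Hypothetical pod pieces, in order. start_pos is an offset into the
# ORIGINAL text (0-based), as described under "piece" in this document.
my @pod_pieces = (
    { is_pod => 1, start_pos => 9,  orig_txt => "=head1 NAME\n\n" },
    { is_pod => 1, start_pos => 22, orig_txt => "=cut\n" },
);

my $original = $code_before
             . join( '', map { $_->{orig_txt} } @pod_pieces )
             . $code_after;

# What the text looks like once all pod has been stripped out.
my $rebuilt = $code_before . $code_after;

# Re-insert each pod piece at its original offset. Processing the pieces
# in order means every earlier insertion has already restored the prefix
# that the next start_pos counts from.
substr( $rebuilt, $_->{start_pos}, 0, $_->{orig_txt} ) for @pod_pieces;

print $rebuilt eq $original ? "ok\n" : "not ok\n";
```

The order matters: inserting the pieces back to front would put the later ones at the wrong offsets.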

INDESCRIPTION

There are a bunch of things this module will NOT do:

  • Create a "parse tree"

  • Pod validation (it either parses or not)

  • Pod cleanup

  • "Handle" encoded text (but it should still parse)

  • Feed your cat

However, it may make it easier to do any of the above, with a lot less time and effort spent trying to grok many of the other POD parsing solutions out there.

A particular design decision I've made is to avoid needing to save any state. This means there's no need or advantage to instantiating an object, except for your own preferences. You can use any method as either an object method or a class method and it will work the same way for both. This design should also discourage me from trying to bloat Pod::Stupid with every feature that tickles my fancy (or yours!) but still, I encourage any feature requests!

METHODS

new

The most basic object constructor possible. Currently takes no options because the object neither has nor needs to keep any state.

This is only here if you want to use this module with an OO interface.

parse_string

Given a string, parses for pod and, in scalar context, returns an AoH describing each pod paragraph found, as well as any non-pod.

  # typical usage
  my $pieces = $ps->parse_string( $text );
  
  # to separate pod and non-pod
  my @pod_pieces     = grep { $_->{is_pod}  } @$pieces;
  my @non_pod_pieces = grep { $_->{non_pod} } @$pieces;

strip_string

Given a string or string ref and (optionally) an array of pod pieces, returns a copy of the string with all pod stripped out and, in list context, an AoH containing the pod pieces as well. If passed a string ref, that string is modified in place. Either way, you can still get the stripped string and the array of pod pieces as return values.

  # most typical usage
  my $txt_nopod = $ps->strip_string( $text );
  
  # pass in a ref to change string in-place...
  $ps->strip_string( \$text );   # $text no longer contains any pod
  
  # if you need the pieces...
  my ( $txt_nopod, $pieces ) = $ps->strip_string( $text );
  
  # if you already have the pod pieces...
  my $txt_nopod = $ps->strip_string( $text, $pod_pieces );

KNOWN LIMITATIONS

  • Currently only works on files with Unix-style line endings.
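
Until that is fixed, one possible workaround is to convert the line endings yourself before parsing. This is only a sketch: normalize_newlines is a hypothetical helper, not part of Pod::Stupid.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CRLF (Windows) and lone CR (old Mac)
# line endings into plain LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $unix_text = normalize_newlines("=head1 NAME\r\n\r\nFoo\r\n\r\n=cut\r\n");
print $unix_text;
```

Keep in mind that any offsets you get back afterwards would refer to the normalized text, not to the file as it exists on disk.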

TODO

This is only what I've thought of... suggestions *very* welcome!!!

  • Fix aforementioned limitation

  • More comprehensive tests

  • A utility module to do common things with the output

CREDITS

Uri Guttman, for giving me the task that led to my shaving this particular yak.

SEE ALSO

POD TERMINOLOGY FOR DUMMIES (aka: me)

paragraphs

In Pod, everything is a paragraph. A paragraph is simply one or more consecutive lines of text. Multiple paragraphs are separated from each other by one or more blank lines.
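
A rough sketch of that splitting rule, independent of how Pod::Stupid does it internally. It assumes that a "blank" line may contain spaces or tabs, and split_paragraphs is a name made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraphs at runs of one or
# more blank lines. Assumption: a blank line may hold spaces or tabs.
sub split_paragraphs {
    my ($text) = @_;
    $text =~ s/\s+\z//;    # ignore trailing blank lines
    return grep { length } split /\n(?:[ \t]*\n)+/, $text;
}

my @paragraphs = split_paragraphs(
    "=head1 NAME\n\nFirst line\nsecond line\n\n\n  verbatim\n"
);

print scalar(@paragraphs), " paragraphs\n";
```

Leading whitespace on the first line of a paragraph survives the split, which matters for telling verbatim paragraphs apart later on.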

Some paragraphs have special meanings, as explained below.

command

A command (aka directive) is a paragraph whose first line begins with a character sequence matching the regex m/^=([a-zA-Z]\S*)/

I've actually been a bit more generous, matching m/^=(\w+)/ instead. Don't rely on that though. I may have to change it to be closer to the spec someday.

In the above regex, the type of command would be in $1. Different types of commands have different semantics and validation rules yadda yadda.
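
For illustration, here is the spec's regex applied to a few paragraphs. command_type is a hypothetical helper written for this example, not a Pod::Stupid method.

```perl
use strict;
use warnings;

# Hypothetical helper: return the command type (what ends up in $1)
# if the paragraph is a command paragraph, otherwise undef.
sub command_type {
    my ($paragraph) = @_;
    return $paragraph =~ m/^=([a-zA-Z]\S*)/ ? $1 : undef;
}

for my $p ( "=head1 NAME", "=over 4", "Just some text" ) {
    my $type = command_type($p);
    printf "%-16s => %s\n", $p, defined $type ? $type : '(not a command)';
}
```

Without the /m modifier, ^ only matches at the very start of the string, so a line starting with "=" further down in a paragraph does not count.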

Currently, the following command types (directives) are described in the Pod Spec http://perldoc.perl.org/perlpodspec.html and technically, a proper Pod parser should consider anything else an error. (I won't though)

  • head[\d] (\d is a number from 1-4)

  • pod

  • cut

  • over

  • item

  • back

  • begin

  • end

  • for

  • encoding

directive

Ostensibly a synonym for a command paragraph, but I consider it a subset of that: specifically, the "command type" as described above.

verbatim paragraph

This is a paragraph where each line begins with whitespace.

ordinary paragraph

This is a paragraph where each line does not begin with whitespace.
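
The three kinds of paragraph described so far can be told apart with a couple of regexes. The sketch below follows perlpodspec in looking only at the first line of the paragraph; for paragraphs whose lines all agree, that gives the same answer as the definitions above. paragraph_kind is a hypothetical helper, and data paragraphs are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a paragraph by its first line only
# (the rule perlpodspec uses). Data paragraphs are not handled here.
sub paragraph_kind {
    my ($paragraph) = @_;
    return 'command'  if $paragraph =~ m/^=[a-zA-Z]/;
    return 'verbatim' if $paragraph =~ m/^[ \t]/;
    return 'ordinary';
}

for my $p ( "=cut", "  my \$x = 1;\n  print \$x;", "Some text\nmore text" ) {
    my ($first_line) = split /\n/, $p;
    printf "%-10s %s\n", paragraph_kind($p), $first_line;
}
```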

data paragraph

This is a paragraph that sits between a pair of "=begin identifier" ... "=end identifier" directives, where "identifier" does not begin with a literal colon (":").

I do not plan on handling this type of paragraph in any special way.

block

A Pod block is a series of paragraphs beginning with any directive except "=cut" and ending with the first occurrence of a "=cut" directive or the end of the input, whichever comes first.
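
That definition maps fairly directly onto a single regex. This is a sketch only, not how Pod::Stupid works internally: pod_blocks is a hypothetical helper, and it assumes that any line starting with a directive opens a block, whether or not a blank line precedes it.

```perl
use strict;
use warnings;

# Hypothetical helper: return every Pod block in $text. A block starts
# at a line holding any directive other than =cut and runs through the
# first =cut line, or to the end of the input if there is none.
sub pod_blocks {
    my ($text) = @_;
    my @blocks;
    push @blocks, $1
        while $text =~ m/^(=(?!cut\b)[a-zA-Z]\S*.*?(?:^=cut\b[^\n]*(?:\n|\z)|\z))/gms;
    return @blocks;
}

my @blocks = pod_blocks(
    "print 1;\n\n=head1 NAME\n\nFoo\n\n=cut\n\nprint 2;\n\n=pod\n\ntrailing\n"
);

print scalar(@blocks), " blocks found\n";
```

The second block in the sample has no "=cut", so it runs to the end of the input, which is the "whichever comes first" part of the definition.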

piece

This is a term I'm introducing myself. A piece is just a hash containing info on a parsed piece of the original string. Each piece is either pod or not pod. If it's pod, it describes the kind of pod. If it's not, it contains a 'non_pod' entry. All pieces also include the start and end offsets into the original string (starting at 0) as 'start_pos' and 'end_pos', respectively.

AUTHOR

Stephen R. Scaffidi <sscaffidi@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by Stephen R. Scaffidi.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.