HTML::FormatData - formats strings and dates for web display/storage


  use HTML::FormatData;

  my $f = HTML::FormatData->new();

  my $string = "<b>bolded</b>";
  my $formatted = $f->format_text( $string, strip_html=>1 );
  # $formatted eq 'bolded'

  my $dt_string = '20050315143000';   # YYYYMMDDHHMMSS
  my $dt = $f->parse_date( $dt_string, '%Y%m%d%H%M%S' );
  my $yrmoday = $f->format_date( $dt, '%Y%m%d' );
  $yrmoday = $f->reformat_date( $dt_string, '%Y%m%d%H%M%S', '%Y%m%d' ); # shortcut


HTML::FormatData contains utility functions to format strings and dates. These utilities are useful for formatting data to be displayed on webpages, or for cleaning string and date data during server-side validation before storage in a database or file.

While doing web development work in the past, I noticed that I was performing the same operations time and again: stripping HTML from form submissions, truncating strings for display as table data, URI-encoding strings for use in links, translating Unix timestamps into mm/dd/yyyy format, and so on. Rather than keep track of the different modules and functions each task required, I decided to write a wrapper with a single, consistent interface.



new()

Creates a new HTML::FormatData object. Returns the blessed object.

format_text( $string, %args )

Wrapper for the text formatting routines below. Formats a string according to the named arguments passed in (for example, strip_html => 1). Although the routines it calls can be called directly, it is usually best to go through this function.

Returns the formatted string.
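The wrapper idea can be sketched as a loop that applies each requested option by calling the method of the same name. This is a minimal illustration, not the module's code: the FormatSketch package, its two stand-in routines, and the fixed option order in @ORDER are all hypothetical, and only boolean options are handled.

```perl
use strict;
use warnings;

package FormatSketch;

# Hypothetical stand-ins for two of the routines documented below.
sub strip_html { my ( $self, $s ) = @_; $s =~ s/<[^>]*>//g; return $s }
sub force_uc   { my ( $self, $s ) = @_; return uc $s }

# Hypothetical order in which options are applied.
my @ORDER = qw(strip_html force_uc);

sub new { return bless {}, shift }

sub format_text {
    my ( $self, $string, %args ) = @_;
    for my $opt (@ORDER) {
        next unless $args{$opt};          # skip options not requested
        $string = $self->$opt($string);   # call the method named by the option
    }
    return $string;
}

package main;
```

A fixed order matters because options can interact; for instance, stripping HTML before truncating keeps tags from counting toward the length.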

decode_xml( $string )

A copy of XML::Comma::Util::XML_basic_unescape. Returns an XML-unescaped string.
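Basic XML unescaping amounts to replacing the five entities that XML predefines. The sketch below is a standalone illustration under that assumption, not a copy of XML_basic_unescape; decode_xml_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Unescape the five predefined XML entities. &amp; is handled last so
# that "&amp;lt;" becomes "&lt;" rather than "<".
sub decode_xml_sketch {
    my ($string) = @_;
    $string =~ s/&lt;/</g;
    $string =~ s/&gt;/>/g;
    $string =~ s/&quot;/"/g;
    $string =~ s/&apos;/'/g;
    $string =~ s/&amp;/&/g;
    return $string;
}
```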

decode_html( $string )

Returns an HTML-unescaped string.

decode_uri( $string )

Returns a URI-unescaped string.
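URI unescaping replaces each %XX sequence with the byte it encodes. A minimal core-Perl sketch follows (decode_uri_sketch is a hypothetical name); the CPAN module URI::Escape offers the same operation as uri_unescape. This sketch leaves '+' alone rather than turning it into a space, as form-encoded data would require.

```perl
use strict;
use warnings;

# Replace every %XX hex pair with the corresponding byte; a lone '%'
# that is not followed by two hex digits is left untouched.
sub decode_uri_sketch {
    my ($string) = @_;
    $string =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $string;
}
```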

strip_html( $string )

Strips all HTML tags from the string. Returns the modified string.
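Regex-based tag stripping is the usual quick approach. This sketch (strip_html_sketch is hypothetical, not the module's implementation) removes comments and then anything between < and >; it will misbehave on a '>' inside an attribute value, which a real HTML parser would handle.

```perl
use strict;
use warnings;

sub strip_html_sketch {
    my ($string) = @_;
    $string =~ s/<!--.*?-->//gs;   # drop HTML comments first
    $string =~ s/<[^>]*>//gs;      # then drop every remaining tag
    return $string;
}
```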

strip_whitespace( $string )

Strips all whitespace (\s) characters from the string. Returns the modified string.

clean_high_ascii( $string )

Converts 8-bit (high-ASCII) characters to their 7-bit ASCII counterparts. Tested with Microsoft Word documents; it may not work correctly with high-ASCII text from other sources. Returns the modified string.
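A sketch of the idea, assuming the input holds Windows-1252 bytes, the single-byte encoding Word typically uses on Windows for curly quotes, dashes, and the ellipsis. The mapping table and the name clean_high_ascii_sketch are hypothetical; the module's actual table is not documented here.

```perl
use strict;
use warnings;

# Hypothetical mapping of common Windows-1252 punctuation to ASCII.
my %HIGH = (
    "\x85" => '...',             # horizontal ellipsis
    "\x91" => "'", "\x92" => "'",  # curly single quotes
    "\x93" => '"', "\x94" => '"',  # curly double quotes
    "\x96" => '-',               # en dash
    "\x97" => '--',              # em dash
);

sub clean_high_ascii_sketch {
    my ($string) = @_;
    $string =~ s/([\x85\x91-\x94\x96\x97])/$HIGH{$1}/g;
    return $string;
}
```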

clean_html_encoded_text( $string )

Properly encodes some entities skipped by HTML::Entities::encode. Returns the modified string.

decode_select_entities( $string )

Takes HTML that has been encoded with HTML::Entities::encode and selectively decodes certain entities for display on a webpage. Returns the modified string.
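Selective decoding can be sketched as restoring a whitelist of simple tags while leaving everything else escaped. The whitelist (b, i, em, strong, p, br) and the function name are hypothetical; the text above does not say which entities the module restores. Tags with attributes stay encoded in this sketch.

```perl
use strict;
use warnings;

# Hypothetical whitelist of tags that may be shown as real markup.
my %ALLOWED = map { $_ => 1 } qw(b i em strong p br);

sub decode_select_entities_sketch {
    my ($string) = @_;
    # Match encoded tags such as &lt;b&gt;, &lt;/b&gt; or &lt;br /&gt;.
    $string =~ s{&lt;(/?)([a-zA-Z][a-zA-Z0-9]*)\s*(/?)&gt;}{
        $ALLOWED{ lc $2 } ? "<$1$2$3>" : "&lt;$1$2$3&gt;"
    }ge;
    return $string;
}
```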

clean_encoded_html( $string )

Formats HTML-encoded HTML for display on a webpage. Returns the modified string.

clean_encoded_text( $string )

Formats HTML-encoded text for display on a webpage. Returns the modified string.

clean_whitespace( $string [, keep_full_breaks => 1 | keep_all_breaks => 1] )

Cleans up whitespace in HTML and plain text. If passed an argument for handling line breaks, it will either keep full breaks (\n\n) or all breaks (any \n). Otherwise, all line breaks will be converted to spaces. Returns the modified string.
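One way to implement the three modes, sketched under the assumption that "cleaning up" means collapsing runs of whitespace to a single space and trimming the ends. clean_whitespace_sketch is a hypothetical name; this is not the module's code.

```perl
use strict;
use warnings;

sub clean_whitespace_sketch {
    my ( $string, %args ) = @_;
    $string =~ s/\r\n?/\n/g;    # normalize line endings to \n

    if ( $args{keep_all_breaks} ) {
        $string =~ s/[^\S\n]+/ /g;    # collapse whitespace other than \n
        $string =~ s/ *\n */\n/g;     # trim spaces around each break
    }
    elsif ( $args{keep_full_breaks} ) {
        # Split on blank lines, clean each paragraph, rejoin with \n\n.
        my @paras = split /\n\s*\n/, $string;
        $string = join "\n\n",
            map { my $p = $_; $p =~ s/\s+/ /g; $p =~ s/^ | $//g; $p } @paras;
    }
    else {
        $string =~ s/\s+/ /g;    # every break becomes a space
    }

    $string =~ s/^\s+|\s+$//g;   # trim both ends
    return $string;
}
```

Splitting on blank lines, rather than rewriting newlines in place, keeps the full-break mode simple: each paragraph is cleaned independently and the pieces are rejoined with \n\n.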

clean_whitespace_keep_full_breaks( $string )

Cleans up whitespace in HTML and plain text while preserving all full breaks (\n\n). Returns the modified string.

clean_whitespace_keep_all_breaks( $string )

Cleans up whitespace in HTML and plain text while preserving all line breaks (\n). Returns the modified string.

force_lc( $string )

Returns lc( $string ).

force_uc( $string )

Returns uc( $string ).

truncate( $string, $count )

Returns the first $count characters of the string.

truncate_with_ellipses( $string, $count )

Returns the first $count - 3 characters of the string, followed by '...'.
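A sketch of the truncation arithmetic. It assumes strings already within $count characters are returned unchanged, which the description above does not state; the function name is hypothetical. For $count of 3 or more, the result is never longer than $count characters.

```perl
use strict;
use warnings;

sub truncate_with_ellipses_sketch {
    my ( $string, $count ) = @_;
    # Assumption: short strings are returned as-is, without '...'.
    return $string if length($string) <= $count;
    my $keep = $count - 3;    # leave room for the three dots
    $keep = 0 if $keep < 0;
    return substr( $string, 0, $keep ) . '...';
}
```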

encode_xml( $string )

A copy of XML::Comma::Util::XML_basic_escape. Returns an XML-escaped string.
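The escaping counterpart, sketched under the same assumption (only the five predefined XML entities) and again not a copy of the XML::Comma routine; encode_xml_sketch is hypothetical. The ampersand must be escaped first, or the entities produced by later substitutions would be escaped a second time.

```perl
use strict;
use warnings;

sub encode_xml_sketch {
    my ($string) = @_;
    $string =~ s/&/&amp;/g;    # must come first
    $string =~ s/</&lt;/g;
    $string =~ s/>/&gt;/g;
    $string =~ s/"/&quot;/g;
    $string =~ s/'/&apos;/g;
    return $string;
}
```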

encode_html( $string )

Returns an HTML-escaped string.

encode_uri( $string )

Returns a URI-escaped string.
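A minimal percent-encoding sketch that keeps only the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~') and encodes everything else as %XX over the string's UTF-8 bytes. encode_uri_sketch is a hypothetical name; URI::Escape's uri_escape_utf8 is the usual CPAN route for the same job.

```perl
use strict;
use warnings;

sub encode_uri_sketch {
    my ($string) = @_;
    utf8::encode($string);    # treat input as characters; work on UTF-8 bytes
    $string =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $string;
}
```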

reformat_date( $string, $oldformat, $newformat )

Takes a date string in $oldformat and returns a new string in $newformat.

parse_date( $string [, $format] )

Takes a $string representing a date and time, and tries to produce a valid DateTime object. Returns the object upon success, otherwise undef.

Setting $string to 'now' creates a DateTime object for the current date and time. Setting $string to 'today' creates a DateTime object for today's date, with the time set to midnight.

Otherwise, you must pass a $format so that the string is parsed correctly. $format can be a format string such as '%Y%m%d%H%M%S' (as in the synopsis) or one of the following shortcuts: 'date8', 'date14', or 'rfc822'.
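The 'date14' shortcut presumably means a 14-digit YYYYMMDDHHMMSS string, matching the '%Y%m%d%H%M%S' format in the synopsis, but that is an assumption. The sketch below parses such a string and validates it with core Time::Local, returning epoch seconds (treated as UTC) rather than a DateTime object so that it runs without CPAN modules; parse_date14_sketch is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed layout: YYYYMMDDHHMMSS. Returns epoch seconds, or undef
# when the string has the wrong shape or an out-of-range field.
sub parse_date14_sketch {
    my ($string) = @_;
    my ( $y, $mo, $d, $h, $mi, $s ) =
        $string =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/
        or return undef;
    # timegm croaks on invalid fields (e.g. month 13); eval turns that into undef.
    my $epoch = eval { timegm( $s, $mi, $h, $d, $mo - 1, $y ) };
    return $epoch;
}
```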

format_date( $dt, $format )

Takes a DateTime object ($dt) and a $format, and returns the formatted string.

$format is a DateTime 'strftime' format string, or one of the following shortcuts: 'date8', 'date14', or 'rfc822'.
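A sketch of shortcut expansion followed by strftime, using core POSIX::strftime on epoch seconds instead of a DateTime object. The expansions in %SHORTCUT are assumptions (date8 and date14 inferred from their names, rfc822 as a common RFC 822-style layout); the module's definitions are not given here. Under POSIX::strftime, %a and %b follow the current locale.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical expansions of the shortcut names.
my %SHORTCUT = (
    date8  => '%Y%m%d',
    date14 => '%Y%m%d%H%M%S',
    rfc822 => '%a, %d %b %Y %H:%M:%S +0000',
);

sub format_date_sketch {
    my ( $epoch, $format ) = @_;
    $format = $SHORTCUT{$format} if exists $SHORTCUT{$format};
    return strftime( $format, gmtime $epoch );    # format in UTC
}
```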


Eric Folley, <>


Copyright 2004-2005 by Eric Folley

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.