New in version 1.32:
- Fixed more installation problems (Thanks to Craig Brockmeier
- Dropped DBI requirement during installation because it's really optional.
- Changed commercial version so it can run on any operating system with the
  same registration key. (Old keys will continue to work if on the same
  operating system, but new ones will need to be issued if a new operating
  system is needed.)
- Made handler loading routines more robust to illegal characters in the
  handler name.

New in version 1.31:
- Fixed a couple installation problems (Server files weren't being installed.)

New in version 1.30:
- Fixed a bug in detection of incompatible configuration file.
- Removed cacheimages option processing code from, and put it
  in the cacheimages handler.
- Added DBI and DBD::mysql module requirements to Makefile.PL. (Forgot to add
  them in version 1.28)
- "Server is down" warning is now shown only once for each run.
- Fixed a bugs in handler server connect timeouts and retries.
- -C now also gives the option to clear the debug and run logs.
- Direct database connections are now used only if DBI and DBD::mysql are
  available. Otherwise, the older (and slower) web-based connections are used.
- Fixed unhelpful warning when invalid handler names are used.
- Added support for emailing output files (Thanks to Lee Adler
  <> for the suggestion)
- Mail::Sendmail is now required if you use the email feature.

New in version 1.29:
- The Log::Agent and Log::Agent::Rotate packages are now required
- Debug information is now sent to a log file determined by the debug_log_file
  configuration parameter.
- Information about the execution of News Clipper is stored in the file
  determined by the run_log_file configuration parameter.
- Errors are now saved in the run log, which prevents private information
  about the News Clipper commands from being printed in the HTML comments. (An
  HTML message tells you to look in the log file.)
- New "lprint" function can be used within handlers to print to the run log.
- The new max_number_of_log_files and max_log_file_size configuration
  parameters control the rotation of log files.
- "tagText" configuration value allows users to set the comment keyword that
  indicates News Clipper commands
- "makeOutputFilesExecutable" configuration value allows users to set whether
  or not the output files should be made executable
- now saves updated config files even if a .bak file already
  exists. New backups are saved as .bak2, .bak3. etc.
- The amount of time to hold a lock and the amount of time to wait for a lock
  are now based on the scriptTimeout value.
- Updated Makefile.PL to use Win32::TieRegistry instead of older
- Makefile.PL now determines old installation directory for single-user
- Fixed a use of unitialized value warning in
- Standardized config options to use_this_style instead of thisStyle or

New in version 1.28:
- New handler check now uses direct database connection instead of CGI
  scripts. (Five times faster)
- The lockfile is now explicitly cleaned up. (May help prevent stale locks.)
- Added -H flag to override home directory location
- Modified -C flag to remove handler-specific state and News Clipper state,
  and to exit immediately afterwards
- Fixed a URL handling bug in ConvertHandler
- Added check in ConvertHandler to suggest that RunHandler be used instead of
- News Clipper should now only remove the old output file if the new one was
  created successfully.
- modulepath can now contain multiple paths separated by whitespace
- Fixed a bug where a newly downloaded handler update would not reload
- StripAttributes bug fixed: now recognizes attributes with empty quotes like
- Arguments to News Clipper are now treated as -e commands. This allows
  "NewsClipper slashdot" to be run to fetch and print slashdot headlines.
- Debug output now correctly says the script name is "NewsClipper" instead of
  "NewsClipper2" (on compiled versions)
- Compiled versions now have the default @INC set in modulepath during
  installation, allowing NC to find externally installed Perl modules.

New in version 1.27:
- Modified NewsClipper.cfg to use a local URL for the imagecache to help News
  Clipper work "out of the box"
- Cause of "undefined value in File::Cache" bug found and repaired

New in version 1.26:
- Added -P flag to pause after completion.

New in version 1.25:
- Added LOCAL_TIME_ZONE option to timezone specifications, which causes the
  local time zone to be used. (Thanks to Gerhard Wiesinger
  <> for the suggestion and reminder to implement
  this feature.)
- Re-implemented caching to use a different persistence mechanism. (This
  should finally squash the dreaded undefined value in File::Cache error.)

New in version 1.24:
- Added a lockfile (~/.NewsClipper/lock) that prevents more than one copy of
  News Clipper from running at the same time. (LockFile::Simple module now
- Faster verification of handler type during counting of number of installed
  acquisition handlers (affects commercial versions only)
- Decreased dependence on handler server during couting of number of installed
  acquisition handlers (affects commercial versions only)
- Installation no longer requires "make NewsClipper_Cleanup" step
- Installation now automatically converts older configuration files.
- Added ExtractTables function to HTMLTools. (Thanks to Chris Pimlott
  <> for the initial implementation and idea.)
- Non-fatal errors while contacting the handler server are now printed in the
  output as HTML comments.
- A few speed improvements.

New in version 1.23:
- Fixed so that it doesn't insert comment characters ("#") into
  the syntax information. (bugfix)

New in version 1.22:
- Added a check to make sure that an incompatible handler will not be loaded
  and used.
- Improved error reporting during replacement of old output file with new one.

New in version 1.21:
- Fixed a bug in error message reporting
- Expanded the arguments to the handler subroutines ProcessAttributes,
  FilterType, OutputType, GetUpdateTimes, and GetDefaultHandlers, providing
  attribute information and data. [This should be backwards compatible with
  existing handlers which do not expect or use the new data.]
- Fixed a bug where a newly downloaded update to a handler would not be used
  until the next execution of News Clipper.
- Added use of socketTries through more of the code (not just when fetching
  remote data).
- Added "forNewsClipperVersion" attribute to NewsClipper.cfg
- Updated ConvertHandler and ConvertConfig to detect the version of the
  handler or NewsClipper.cfg, and only apply the changes that are necessary to
  bring it up to the current version.
- Added FTP code which somehow didn't get into version 1.20.
- Added support for type names containing spaces.

New in version 1.20:
- Support for FTPing output files (Thanks to Rob Saunders for the suggestion.)
  This allows users who don't have shell access to their web server to run
  News Clipper on a local machine, and send the output files to the web
- News Clipper will check last modified dates before downloading content from
  the remote server.
- Fixed bug where <? ... > directives were stripped out erroneously. (Thanks
  to Kwon Soon Son <> for the bug report.)
- Improved image caching.
- Fixed a bug in HTMLTools::StripAttributes. (Thanks to Paco Hope
  <> for finding the bug.)
- New "lastupdated" handler gets time data was last updated for a News Clipper
  input command.
- New "dumpdata" output handler will dump the type signature and values
  for the data.
- <!-- newsclipper startcomment --> ... <!-- newsclipper endcomment --> can
  now be used to mark blocks of HTML that will be removed from the output
  during processing. This is useful for putting "dummy" content right after a
  News Clipper command during input file design to take up space.
- News Clipper now checks that remote content has changed before downloading
  information that has expired from the cache.
- Added -C flag to clear the News Clipper cache.
- Reworked cache to use File::Cache module. (This should fix problems related
  to missing cache files.)
- Restructured the code to make it more understandable. Commented every
- Added "socketTries" option to configuration, which specifies the number of
  times News Clipper should try to get data from the server before giving up.
  This defaults to 3. (Implemented by Ludovic Drolez <>)
- Modified GetHtml, GetLinks, and GetImages to generate correct links when the
  source web page uses <base href="..."> (Thanks to Dominique ROUSSEAU
  <> for the initial implementation.)
- Fixed a couple (more) bugs in the "check if the cached data needs to be
  updated" function.
- Improved error message reporting.
- Added versioning information to prevent accidental download of handlers that
  can break the input files.
- Handler-specific configuration information is now separated in the
  NewsClipper.cfg file.
- and help to convert config
  files and handler to the new style.
- Fixed a bug in HTMLsubstr
- Significantly improved API for handler writers:
  - Each handler has its own configuration information in the user's
    NewsClipper.cfg file. This information can be accessed from within the
    handler through the %handlerconfig hash.
  - The comment block has been replaced with a hash, which will help with the
    automated processing of the handler.
  - "Maintainer_Name", "Maintainer_Email", and "Language" fields have been
  - A new "CATEGORY" field says what category the handler belongs. 
  - A new "FOR NEWSCLIPPER VERSION" field tells what version of News Clipper
    the handler was built for. This helps accidental download of incompatible
  - Version numbers have been updated so that you can indicate when a change
    will break people's input files, when a change adds functionality, and
    when a change fixes a bug.
  - You can save values between runs using $self->{'state'}, which is just a
    File::Cache object. ($self->{'state'}->set('key','value'), then
  - Types have changed substantially:
    - FilterType and OutputType return a "type signature", like "@($ | %)",
      which means "an array of either scalars or hashes". @($ & %Slashdot)
      means an array of at least one scalar and at least one Slashdot hash.
    - A hunk of data matches a type signature if the structure matches, and
      any subtype relationships hold. For example, a type signature of @($ |
      %) would be matched by a data strcuture whose type signature is @($URL),
      %@Myarray(%), and @Myarray($URL & %Slashdot), but not by @(@). 
    - Everything in the data structure is a reference -- even scalars. 
    - Each new data value is blessed, not the whole data structure.
    - Data values don't have to be bless as "String", "Array", or "Hash". 
    - However, any new types introduced should be related to the others
      with the MakeSubtype function: MakeSubtype('Slashdot','HASH');
  - There's a new API function called TypesMatch(), which takes a data
    reference and a types signature, and returns 1 if the data matches the
  - dprint() can be used to send output as <!-- DEBUG: ... --> messages in
    debug mode.
  - error() can be used to output errors about invalid arguments, etc.
  - Default handlers are now specified as real News Clipper commands, like
      <filter name=map filter=highlight words=linux>
      <output name=array>
  - FilterType and OutputType now provide the $data and $attributes as
    arguments, so you can return a different type signature depending on
    parameters such as "style".
  - ComputeURL should now be used to compute the URL from the attributes.
  - You should never have to check the type of $grabbedData -- News Clipper
    will do that automatically.
  - A RunHandler API function was added, which takes the handler name, type of
    handler, data, and attributes. It will automatically check for correct
  - Some "use" statements are no longer needed.

New in version 1.17:
- A parsing bug in <a href> links was found (and fixed!) by Ludovic Drolez
  (Ludovic Drolez <>)
- The way the configuration information is loaded has changed:
  - On non-Windows machines, News Clipper always loads the system-wide
    configuration file NEWSCLIPPER/NewsClipper.cfg, where NEWSCLIPPER is the
    path specified by the NEWSCLIPPER environment variable.
  - Per-user configuration information is then loaded from
    $home/.NewsClipper/NewsClipper.cfg, or the file specified with the -c
  - Personal config files can now override system-wide configuration
    information on an item-by-item basis. This means that the user's config
    file doesn't have to have "$ENV{TZ}", imgcache settings, proxy settings,
    etc. if all they want to do is override the timeout values.
- Fixed a bug where cached data would not be used if a HTTP request failed.
  Thanks to Vadim Strizhevsky <> for finding the bug.
- Fixed a bug in checking version information for update of handlers.
- Consolidated common code into NewsClipper::Globals (dprint, DEBUG, etc.)
  Added a few useful functions to improve code and output message readability
  (dequote, reformat).
- Improved timeout handling, so that errors get propagated correctly, and the
  script exits with the correct value. (Thanks to Ludovic Drolez
  <> for finding the bug.) Hopefully News Clipper works
  correctly in makefiles now.
- Fixed a potential bug in NewsClipper::HTMLTools::HTMLsubstr

New in version 1.16:
- Updated compiler-related code
- Changed licensing code.
- Updated README instructions
- Improved installation instructions
- Improved installation script
- Changed numbering scheme to X.YZ, where X is the major version number, Y is
  increased with each functionality enhancement, and Z is increased with each
  bugfix release for a given Y.
- NOTE: The open source version will lead the commercial version in releases.
  The "official" version number will be the commercial version.

New in version 1.0104:
- Fixed a performance bug in the Trial version code (
- -h now prints version and registration information

New in version 1.0102:
- Fixed the brain-dead non-monotonically increasing version numbering :)
- Fixed a bug in input file validation, when "STDIN" is used as the file.
  (Thanks to Jamie Wall <> for finding it)
- Fixed a bug with -c. (Thanks to for helping to find it)
- Fixed a bug that occurs when checking for new versions and the handler
  server is down. (Thanks to for helping to find it.)
- Fixed a typo in the NewsClipper.cfg file: maxcachesize is in MB, not bytes.
- Updated the template to work with the new yahootopstories handler.
- Made Makefile.PL for unix installations interactive instead of specifying
  everything on the command line.

New in version 1.0001:
- Code modified to point to the new handler server at
  (instead of
- Fixed typo in (\\&main::dprint, not \&main::dprint)
- Added licensing code:
  - email and regKey in NewsClipper.cfg,
  - key verification in
  - handler counting code in
  - Nag in output for trial version
- Clarified a few error messages.

New in version 1.00:
- Significantly faster update of out-of-date handlers.
- Anti-abuse measures implemented to help reduce unnecessary load on the
  handler server.
- Added types to the handlers and News Clipper commands, which helps catch
- A new name, new web page, fancy compiled versions, a nice user's manual, a
  dedicated server, and improved technical support. (Well, most of that's for
  the commercial version. Non-commercial users only get the name, web page,
  and server :)

-------------- History before Daily Update became News Clipper --------------

New in version 7.1: 
- You can now specify "STDIN" and "STDOUT" as the input and output files in
  DailyUpdate.cfg. This allows you to use Daily Update as a filter, piping the
  data to and from it. You can also use a file name and STDOUT to force DU to
  output to standard output.
- Debugging mode is now activated using the -d switch. It outputs lots of
  useful information that can help you nail down problems you might have.
- Some speed improvements.
- Date::Manip has been replaced by Time::CTime and Time::ParseDate, which are
  a lot faster. 

New in version 7.0: 
- This is a major rewrite, with a new syntax and improved support for web
- See the file MIGRATING for hints on how to move to the new version from
  older versions, and how to upgrade handlers you've written.
- None of the old handlers work. You'll have to download the new ones and
  migrate the ones you've written yourself but haven't submitted to me.
- Improved caching
- New, separated input, filtering, and output commands. This means you'll have
  to update your input files to reflect the new syntax.
- New and improved
- -v for verbose output
- Added StripAttributes HTMLsubstr TrimOpenTags functions to HTMLTools.
- Added day of the week and timezone support to handler update times.
- New web support for browsing handlers
- Daily Update mailing list
- Now Daily Update prints a message if it times out.
- When a handler fails, a comment is inserted in the output instead of
  printing it to visible HTML.
- Miscellaneous bug fixes
- Miscellaneous bugs introduced. :) (I reworked a lot of the code.)

New in version 6.1:
- Miscellaneous bug fixes and improvements to GetText, GetLinks, and
- Added -v (verbose output) support.
- Setup changed so -I isn't necessary when running as a CGI program.
- Message now displayed when Daily Update times out.
- Output file now automatically chmod'd to 755. (Executable in case you call
  server-side scripts like I do.)

New in version 6.02:
- Added return values for those of you who like to use makefiles to build web
- Added documentation for
- Bug fixed in output columns function. (Thanks to Craig Brockmeier
  <> for finding it.)
- I've revamped the configuration and install procedure to make it more
  amenable to site-wide installation. DailyUpdate.cfg is looked for in
  ~/.DailyUpdate, and handlers are installed in the first directory specified
  by handlerlocations in DailyUpdate.cfg. (~/.DailyUpdate by default.) You'll
  need to fix up your handlerLocations variable if you're upgrading from a
  previous version. (Thanks to John Goerzen <> for his
  invaluable input.)
- Added support for proxies requiring passwords. (Thanks to Kevin D.
  Clark <>.)
- Fixed a bug in (Thanks to Mark Harburn

New in version 6.01:
- Added -r switch to force proxies to reload cached data during data
  acquisition. (Thanks to Gerhard Wiesinger <>)
- Fixed a couple of minor bugs.
- Improved installation to automatically set #! line and set INSTALLDIRECTORY
- Configuration doesn't have to be in the same directory as the
  file. (You can leave the DailyUpdate.cfg file in the install directory, and
  move the file wherever you want, like cgi-bin.)
- During automatic installation, handlers are now put in the install
  directory, not the directory of the currently running script.

New in version 6.0:
- Sorry about the big version jump. I had to do it because I switched over to
  the standard Perl numbering scheme. (5.2 becomes 5.02, which is less than
  5.1). Maybe some of these features will justify the leap. :)
- Now the installation mimicks the usual "perl Makefile.PL;make;make install"
  of perl modules.
- The script now supports multiple input and output files in the configuration
- Changed the syntax from <dailyupdate...> to <!--dailyupdate...--> so that
  WYSIWYG editors won't croak on the 'unknown' tag.
- Moved HTML-related functionality from to, and added function StripTags.
- Fixed 2 bugs in MakeLinksAbsolute. (Thanks to Kazuo Moriwaka
  <> and Phillip Gersekowski
- Enhanced Handler to work better in the face of bad data acquisition and
- Added OutputList to, which allows you to set the format
  and number of columns when printing out lists.

New in version 5.1:
- Update times are now set by the handlers by overriding GetUpdateTimes. (The
  default is 2,5,8,11,14,17,20,23.) Users can also customize the update times
  in the configuration file, if they don't like the ones given by the
  handlers. Modified to support this.
- Fixed a caching bug that would cause (a) multiple copies of some data when a
  tag is used more than once on the same page, and (b) reuse of cached data
  even when the style has been changed.
- Added -a flag for automatic download of handlers.
- Added -n flag to check for new versions of handlers.
- Fixed prerequisites.
- Removed use of the deprecated HTML::Parse in

New in version 5.0.2:
- Fixed data caching.
- now stores the URL in the comment block.
- Created POD documentation in
- Fixed a problem with the loading of configuration information

New in version 5.0.1:
- Fixed a bug in
- Improved error checking for missing Perl modules when installing new

New in version 5.0: This is a major rewrite.
- Handlers, implemented as Perl classes, are now used instead of the schemas
  of previous versions. The GetHtml, GetText, OutputTwoColumn,
  OutputUnorderedList, etc. functions are now part of two APIs that can be
  used by handler writers: DailyUpdate::AcquisitionFunctions, and
- No handlers are distributed with Daily Update.  Instead, the user is
  prompted for permission to install them as necessary. This makes using a new
  tag easy -- just put it in your input file, run Daily Update once manually,
  and tell it to download the handler.
- The handlers are real Perl, instead of the pseudo Perl used in the schemas.
  This increases the difficulty of adding new tags, but gives people a lot
  more flexibility. To help things, I've included a script ""
  which will create a prototype handler based on user inputs.
- GetText now uses HTML::FormatText.
- Tags are now of the syntax <dailyupdate name=X>

New in version 4.6:
- Added notification support for new versions.
- Added -c flag for user-specified configuration file. Dropped use of POSIX
  (for NT compatibility.)
- Fixed a bug in relative to absolute link conversion.  (Thanks to C. Harald
  Koch <>)
- URLs can now be of the format "file://...".
- Added GetImages acquisition function (thanks to Tanner Lovelace
- Now the %attributes hash can be referenced in the URL or tag name of a
  schema--added <unitedmediacomic> tag to illustrate its use.
- Made changes to the config file to support better schema design.
- Added "How to Write Schemas" page at
- Moved to URI from URI::URL, which is now deprecated.

New in version 4.5:
- Added acquisition function GetHtml.
- Added -i and -o flags for input and output files.
- Now reads old html file only once (faster). 
- Added "always" schema time option.
- Now checks script directory for configuration file.
- Miscellaneous bug fixes, new data sources, and minor enhancements.

New in version 4.4:
- Separated configuration from main script (yea!).
- Fixed "uninitialized variable" warnings.
- Created user-submitted schemas webpage at
- Added more data acquisition schemas.

New in version 4.3:
- Added Infoworld, and Adam@Home.
- Changed date & time handlers to take any valid strftime format string, and
  made url attribute to weather mandatory.
- Changed CoolSite to use their logo.
- Script only dumps to screen if called as a cgi script.
- Added default values to date, time, and OutputListOrColumns.

New in version 4.2:
- Added "linuxtoday", and "time".
- Modified GetText & GetLinks to take "^" and "\$" signifying start and end of
- Added proxy support.
- Fork is now disabled on Windows platforms.
- Moved from HTML::Parse to HTML::Parser.

New in version 4.1:
- Added "User Friendly" and "Freshmeat" support.
- Fixed Slashdot so that it takes a style.
- Changed it to use LWP instead of my own HTTP sockets stuff.

New in version 4.0:
- This is a total rewrite, resulting in a modular, extensible script that is
  half the size of the previous version.

New in version 3.5.4:
- Added Peanuts (Thanks to Alvaro Herrera <>)
- Also fixed the news from the AP.

New in version 3.5.2:
- Fixed a bug in the routine to detect if we need to update the data.

New in version 3.5.1:
- Improved calculation of when cached data should be used. Hopefully it's
  less buggy.

New in version 3.5:
- Added support for Slashdot. Did a major cleanup of the code.

New in version 3.4.6:
- Fixed script to work with new Calvin and Hobbes webpage.
- Improved algorithms for changing relative URLs to absolute in headlines.

New in version 3.4.5:
- More enhanced parsing of weather.

New in version 3.4.4:
- Fixed Dr. Science and Sports.
- Enhanced parsing of weather.

New in version 3.4.2:
- Fixed a bug in the weather processing code.

New in version 3.4.1:
- Fixed the extra line in the headlines

New in version 3.4:
- Updated DailyUpdate to work with the new CNN/SI homepage.

New in version 3.3.3:
- Updated DailyUpdate to work with the new Yahoo headlines format

New in version 3.3.1:
- Fixed dilbert to handle absolute & relative URLs for the image.

New in version 3.3.1:
- Fixed dilbert to work with the new Dilbert Zone format

New in version 3.3:
- Added Calvin and Hobbes
- Fixed a bug that caused nothing to be output for the first use of the tags.

New in version 3.2.1:
- Finally fixed the timeouts to work correctly.

New in version 3.2:
- Thanks to Christian Blair for helping to make the script more robust.

New in version 2.9:
- Thanks to Bob Anzlovar for the addition of the Dr. Science question of the
- He also changed the script to "use Socket" instead of "require".
  I agree with his thinking on this, however users of older versions of Perl
  will have to do some comment work on the geturl procedure. In a couple of
  versions I'm going to stop backward support, since Perl 5 is the way to go...

New in version 2.8:
- In this version I've commented out the telnet for weather and replaced it
  with an html grab, which seems much faster. For those people who need the
  weather thrice daily (and have a fast connection) or who can't get the
  forecast they need from, put the telnet
  command in line 173 back in and remove the wwwgrab line. Then comment out
  the weather site and remove the comments for a telnet weathersite.