Revision history for HTML::TableExtract

2.11  Tue Aug 23 16:01:04 EDT 2011
        - added parsing context, override for eof() and parse() for
          memory clear on new docs or post-eof()
        - fixed some long standing test warnings

2.10  Sat Jul 15 20:50:41 EDT 2006
        - minor bug fixed in HTML repair routines (thanks to Dave Gray)

2.09  Thu Jun  8 15:46:17 EDT 2006
        - Tweaked rasterizer to handle some situations where the HTML is
          broken but tables can still be inferred.
        - Fixed TREE() definition for situations where import() is
          not invoked. (thanks to DDICK on cpan.org)

2.08  Wed May  3 17:17:33 EDT 2006
        - Implemented new rasterizer for grid mapping. Thanks to Roland
          Schar for a tortuous example of span issues.
        - This also fixes a bug the old skew method had when it
          encountered ridiculously large spans (out of memory). Thanks
          to Andreas Gustafsson.
        - Regular extraction and TREE mode are using the same
          rasterizer now.
        - Fixed HTML stripping for a header matching bug on single word
          text in keep_html mode (thanks to Michael S. Muegel for
          pointing the bug out)

2.07  Sun Feb 19 13:40:44 EST 2006
        - Fixed subtable slicing bug
        - Fixed hrow() attachment bug
        - Added tests

2.06  Tue Oct 18 13:13:52 EDT 2005
        - Tightened up element interactions in TREE() mode when
          examining rows, columns, cells, etc. Was running into trouble
          with dereferencing scalars vs objects.
        - Documented space() H::TE::T method, added tests
        - Added POD tests
        - Documentation updates and fixes

2.05  Tue Oct  4 16:00:02 EDT 2005
        - Fixed a TREE() definition bug and class method assignments
        - Fixed a 'row above header' bug, added tests

2.04  Wed Aug  3 14:42:23 EDT 2005
        - Fixed some conditional optional dependency tests in order to
          avoid failure assertions on some test boxes.

2.03  Wed Jul 20 12:45:56 EDT 2005
        - Fixed greedy attribute bug (non qualifying tables were being
          selected under certain circumstances)
        - Moved more completely to File::Spec operations in testload.pm
          in order to make Windows boxes happy.

2.02  Thu Jun 23 12:42:44 EDT 2005
        - squelched TREE() creation warnings for subclasses
        - fixed a rows() bug involving keep_headers

2.01  Tue Jun 21 22:05:53 EDT 2005
        - fixed some test changes

2.00  Fri Jun 17 17:28:10 EDT 2005
        - Can now return parsed tables as HTML::TableElement objects
          within an HTML::Element tree structure (via HTML::TreeBuilder)
          for such purposes as in-line editing of table content within
          documents. Invoked via 'use HTML::TableExtract qw(tree);'.
        - Added columns(), row(), column(), and cell() methods.
        - Added some handy reporting methods: tables_report() and
          tables_dump(). These are almost always handy while first
          analyzing a new HTML document for table content.
        - Debugging and error output can now be assigned to arbitrary
          file handles.
        ! Old 'table_state' methods are now merely 'table' methods,
          though the old table_state style is still supported.
        ! Chains have been dropped. Though interesting (think xpath),
          they needlessly complicated matters as they were nearly
          universally unused.

1.09  Fri Feb 25 17:49:00 EST 2005
        - Tables can now be selected by table tag attributes
        - lineage() method now returns row and column information, as
          well as depth and count, for each ancestor (potential
          backwards incompatibility: entries are now 4-element arrays
          rather than 2-element arrays)
        - header matching and column retention enhancements
        - header retention
        - old-style procedures deprecated in preparation for them to
          become methods
        - various bug fixes

1.08  Thu Apr  4 11:26:27 CST 2002
        - Added some more crufty HTML tolerance -- not PC (puristically
          correct) but HTML correctness is probably of no interest to
          those merely trying to extract information *out* of HTML.
        - Fixed a mapback problem with the legacy methods

1.07  Wed Aug 22 06:14:24 CDT 2001
        - Added keep_html option for HTML retention
        - bug fix for depth/count targets

1.06  Thu Nov  2 15:29:49 CST 2000
        - Added <br> translation to newlines (enabled by default)
        - cleaned up some warnings

1.05  Sun Aug  6 06:38:14 CDT 2000
        - minor bug fix involving empty cells

1.04  Sat Jul 15 02:18:04 CDT 2000
        - fixed gridmap bug involving skew calcs on unwanted columns
        - added example page reference in README

1.03  Tue Jul  7 03:43:30 CDT 2000
        - gridmap option, columns are really columns regardless of
          cell span skew
        - Added chains for relative targeting
          * Terminus-matching by default
          * Elasticity option
          * Waypoint retention option
          * Lineage tracking (match record along chain)
        - Significant tests added to 'make test'
        - Documentation rewrite

0.05  Tue Mar 21 08:11:54 CST 2000
        - Fixed -w init warnings for dangling columns in header mode
        - added 'decode' option to turn off text decoding when desired
        - now stores real slices internally rather than sparse
          tables that later get massaged.

0.03  Thu Mar  9 13:10:03 CST 2000
        - Fixed bug regarding incomplete defaults
        - Tables, rows, and cells that are either empty or contain no
          text are now properly noted
        - Header patterns now match across stripped tags
        - In some cases, mangled HTML tables are properly
          scanned by inferring missing <TR> tags.
        - Depth/Count votes are now properly honored.
        - Cleaned up some -w noise.

0.02  Thu Feb 10 13:43:04 CST 2000
        - Fixed some problems tracking counts at revisited depths.
        - Minor doc fix, added mailing list

0.01  Wed Feb  2 18:24:07 CST 2000
        - Initial version.


