++ed by:
1 non-PAUSE user
Author image David Ferrance


Data::Grouper - Perl module to aggregate data for use with template modules.


   my $grouper = new Data::Grouper(
                   COLNAMES => [ 'CATEGORY','SUBCAT','DESCR','PRICE'],
                   SORTCOLS => [ 'CATEGORY','SUBCAT' ]

   $sql = 'select category, subcat, description, price from order_items order by category, subcat';
   $aref = $dbh->selectall_arrayref($sql);

   $t = HTML::Template->new(filename=>'../foobar.htmlt');
   print $t->output;

Lazy? Me too. The DATA param can use only two calls:

   $sql = 'select category, subcat, description, price from order_items order by style, color';
   $aref = $dbh->selectall_arrayref($sql, {Slice=>{}});

   my $g = new Data::Grouper(DATA=>$aref,SORTCOLS=>['CATEGORY','SUBCAT']);

   $t = HTML::Template->new(filename=>'../foobar.htmlt');
   print $t->output;

And the fragment from the HTML::Template code:

        <h1>Style: <TMPL_VAR NAME=CATEGORY></h1>
        <h2>Color: <TMPL_VAR NAME=SUBCAT> </h2>
          <TMPL_LOOP NAME=INNER>          
            <tr><td> <TMPL_VAR NAME=DESCR></td><td><TMPL_VAR NAME=PRICE></td></tr>
      <tr><td>&nbsp</td><td><TMPL_VAR NAME=SUM_PRICE></td></tr>

This might produce output like:

    Nerf Toys
      Nerf Dart Gun                  20.00
      Nerf Footbal                   30.00

  Video Games
    X Box
      Gothic Something or Other      10.00
    Playstation 2
      Gran Turismo 3 A-Spec          15.00
      Grand Theft Auto 3             10.00

This is not my most inspired example ever. Except for the PS2 bits.


This is a helper object for the various templating modules that exist. A key feature of any templating system is the ability to deal with loops of data, for example rows in a table. Most systems will deal with nested looping. Data::Grouper takes the work out of prepaing the data structures for nested loops.

Grouper will help you the most if you are writing reports that have multiple layers of grouping with aggregate values but also need to display the detail rows. An SQL select statement will get you one or the other of these features, but not both at the same time. And you still have to construct your data structure for HTML::Template. Grouper does it all.



Most of the behavior of grouper is determined by the parameters passed to the constructor.

  • COLNAMES - define column names for use with TMPL_VAR

    Tell grouper the column names. These are used when creating the hashes for HTML:Template. COLNAMES should contain the names of the hash values in the order the values will appear in the rows passed to add_row(). So for the following code:


    The hashes in the output structure will have keys RESP_DESC, DUTY_ID, and DUTY_DESC. This allows them to be used in <TMPL_VAR> directives.

    If you are using add_hash or add_array with an array of hash references, you may omit COLNAMES. If COLNAMES is omitted, the keys of the first incoming hash will become the contents of COLNAMES. Keep in mind that these are case sensitive.

  • AGGREGATES - Columns for which totals should be computed

    This parameter gets a reference to an array of column names to compute aggregate functions on. Aggregate functions are the same as you would find in SQL: sum, avg, min, max.

    At every level of grouping, aggregate values will be provided.

    Aggregate values are automatically given names based on the aggregate function and the column name. Specifically, one of SUM_, MIN_, MAX_, or AVG_ is prepended to the column name.

    Top level aggregates are contained in a hash obtainable with the get_top_aggregates() function.

  • SORTCOLS - Columns to group on

    This parameter is a reference to an array of column names to sort on. The number of <TMPL_LOOPS> in your template will be equal to the number of column names you indicate here.

    The rows passed to grouper MUST BE SORTED on the columns indicated in SORT_COLS.

  • DATA - Array ref of hash or array refs

    If you are using selectall_arrayref and not modifying the data before sending it to grouper, you can reduce your grouper code to two calls by using the DATA parameter in the grouper->new() constructor:

       my $g = new Data::Grouper(DATA=>$aref,SORTCOLS=>['STYLE','COLOR']);

    This will often be your best approach.

  • FORMAT - Formatting Functions

     my $cr = sub { sprintf "\$%.2f", shift; };
     my $g = new Data::Grouper( 
                      FORMAT=> { 'Count'=>$cr, 'SUM_Count'=>$cr }

    This parameter is a reference to a hash from column names (as specified in COLNAMES) to formatting functions. A formatting function takes one parameter and returns the formatted value of that parameter. Such a function might be used, for example, to format money values.


The add_array function adds multiple rows into the grouper in one function call. It takes one parameter, an array reference. This array can contain either hash references or array references.

Add_array is especially useful with DBI. DBI can return entire arrays with selectall_arrayref. This function lets you just pass an array ref in instead of doing a while loop and calling add_row (or add_hash) for each row.

   $aref = $dbh->selectall_arrayref($sql);

Just like add_row, add_array expects that the data is sorted on the columns that are specified in SORTCOLS.


This takes a list of scalar values, which should correspond to the entries in COLNAMES, and puts it in the correct place in the data structure. Data in the successive calls to add_row is expected to be sorted on the columns specified in SORTCOLS.

One reason to use add_row over add_array is if some transformation happens between retrieving the data and populating the data structure.


This function is the hash-equivalent to add_row (above). Add_hash takes one parameter, a hash reference, which contains one row of data. For example:

  $href = { COLOR=>'Red', Size=>'13', Style=>'Running' };
  add_hash ($href);

This would add one row of data to the grouper. To add multiple rows, for example if you had an array of hash refs from DBI, you would use add_array().

The hash is assumed to contain the keys from COLNAMES. If COLNAMES was not specified, the keys from the first hash passed to this function are used. Grouper makes its own copy of the hash ref to avoid modifying any data that is passed in.


Sometimes you will want to add some additional information to a row or a parent. This function helps you do that.

I'm not certain this function is useful and will stay.


This function returns the array refernces required for the call to HTML::Template's param() function.


This function returns a reference to a hash containing top level aggregates for the data. Top level aggregates are contained in their own separate hash since aggregates are normally one level up in the data structure from the array containing their rows, but in the case of the top level, it is impossible to go up one level.

Some top level aggregates may be best computed outside of Data::Grouper. For example, while Grouper could maintain a count of total rows, if you were using add_array this statistic is just the size of the array, which is computed much more efficiently as $#array.

Right now G::D computes top level aggregates if any aggregates are being computed. In the future there may be an option to disable computation of top level aggregates. Actually, they should probably be disabled by default.

More Examples

Suppose you have a table of book data with columns genre, title. You might have this data:

 Non-Fiction, Techincal, Effective Perl, 25
 Non-Fiction, Technical, The C Programming Language, 20
 Non-Fiction, Techincal, Lex & Yacc, 20
 Non-Fiction, Philosophy, Book of Five Rings, 13
 Non-Fiction, Philosophy, Against Method, 30
 Fiction, Sci-Fi, Lord of the Rings,25
 Fiction, Sci-Fi, Foundation,15
 Fiction, Literary, Gravity's Rainbow,15

Say you want to display these, grouped by genre. You need to build

 @data = [
   { FICTION=> Non-Fiction,
     INNER => [
       { GENRE => Technical,
         INNER => [  { TITLE=>Effective Perl, PRICE=>25},
                     { TITLE=>The C Programming Language, PRICE=>20 },
                     { TITLE=>Lex&Yacc, PRICE=>20 }
       { GENRE => Philosophy,
         INNER => [ {TITLE=>Book of Five Rings, PRICE=>13 },
                    {TITLE=>Against Method, PRICE=>30 }
   { FICTION=> Fiction,
     INNER => [  
       { GENRE=> Sci-Fi,
         INNER => [ { TITLE=>Lord of the Rings, PRICE=>25 },
                    { TITLE=>Foundation, PRICE=>15 }
       { GENRE=> Literary,
         INNER => [ { TITLE=>Gravity's Rainbow, PRICE=>15 } ]

and apply it to template

   Average Price: <TMPL_VAR NAME=AVG_PRICE>

to get

     Effective Perl
     The C Programming Language
     Lex & Yacc
     Average Price: 21.6
    Book of Five Rings
    Against Method
    Average Price: 21.5
     Lord of the Rings
     Gravity's Rainbow

If you are getting your data from a database, you might do this:

 $sql = 'select fiction, genre, title from books order by fiction, genre';
 $aref = $dbh->selectall_arrayref($sql);

Grouper creates the array of hash refs, containing array refs with hash refs that H::T requires.


How is this different than GROUP BY in SQL?

SQL's group by facility allows grouping at only one level and does not provide the underlying detail data. Grouper is used to provide summary details at multiple levels while still providing the underlying data, all in a format you can use with HTML::Template.


David Ferrance (dave@ferrance.com)


Data::Grouper - A module for using aggregating data for use with various Template modules.

Copyright (C) 2001,2002 David Ferrance (dave@ferrance.com). All Rights Reserved.

This module is free software. It may be used, redistributed and/or modified under the same terms as perl itself.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 584:

=back doesn't take any parameters, but you said =back 4