Graphics::Skullplot::ClassifyColumns - simple type inference of columns of tabular data


Version 0.01


  use Graphics::Skullplot::ClassifyColumns;

  my $cc = Graphics::Skullplot::ClassifyColumns->new( data => $data );  
  my $plot_cols = 
    $cc->classify_columns_simple( { indie_count => $indie_count, } );


Graphics::Skullplot::ClassifyColumns is a stripped-down version of Data::Classify, an old experimental module I was developing. I expect to go back to that project and develop a more elaborate system of plug-ins to target different kinds of databases and so on, most likely named Table::TypeInference.

This particular module just needs a "classify_columns_simple" routine that works well enough to figure out how to plot some data via ggplot2 in R (i.e. the "Graphics::Skullplot" project).


Creates a new Graphics::Skullplot::ClassifyColumns object.

Takes a hashref as an argument, with named fields identical to the names of the object attributes. These attributes are:


A required field (passed as "data" in the example above): the tabular data as an array of array references, with a header in the first row.


Note: "simple" here might be read as "stub". This does the simplest possible categorization, using only a single numeric hint: the number of independent fields.

The presumption here is that the incoming data is organized like the output of a typical SQL GROUP BY select: the x-axis in the first column, a number of columns of dependent data at the end, and (possibly) a certain number of categorical variables (ones with a small number of allowed values) in between.

This returns a hashref indicating how the different columns should be handled in the plotting stage. The keys are:

  x           (rename: indie_x)
  y           but just for when there's only one dependent
  dep_fields  (rename: dependents_y)

Example usage:

  my $cc = Graphics::Skullplot::ClassifyColumns->new( data => $data );  
  my $opt = { indie_count => 1, };
  my $plot_cols_href = 
    $cc->classify_columns_simple( $opt ); 
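
The column layout described above can be illustrated with a minimal, standalone sketch. It is not the module's implementation: the helper name "split_columns_sketch" and the "categories" key are hypothetical, and it assumes that indie_count counts the x column plus any categorical columns that follow it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): assign plotting roles to
# header names using only the indie_count hint.
sub split_columns_sketch {
    my ($header, $indie_count) = @_;
    my @names = @{ $header };
    die "indie_count must be at least 1\n" if $indie_count < 1;
    die "indie_count leaves no dependent columns\n" if $indie_count >= @names;

    # Assumption: the first indie_count columns are independent,
    # everything after them is dependent data.
    my @indies = @names[ 0 .. $indie_count - 1 ];
    my @deps   = @names[ $indie_count .. $#names ];

    my %cols = (
        x          => $indies[0],
        categories => [ @indies[ 1 .. $#indies ] ],   # assumed key name
        dep_fields => \@deps,
    );
    # "y" is only set when there is exactly one dependent column.
    $cols{y} = $deps[0] if @deps == 1;
    return \%cols;
}

# Array-of-arrays input with the header in the first row.
my $data = [
    [ 'date',       'region', 'sales', 'returns' ],
    [ '2018-05-01', 'east',   120,     3 ],
    [ '2018-05-01', 'west',   95,      1 ],
];
my $cols = split_columns_sketch( $data->[0], 2 );
print "x: $cols->{x}\n";
print "dependents: @{ $cols->{dep_fields} }\n";
```

With an indie_count of 2, "date" becomes the x column, "region" is treated as categorical, and the two remaining columns are the dependents, so no "y" key is set.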

Given a reference to tabular data in array-of-arrays format (with a header expected in the first row), tries to infer the rough data type of each column.

Returns a list (or aref) of the type codes, in sequence.
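
One way such per-column inference could work is a majority vote over the values in each column. The sketch below is an illustration under that assumption, not the module's code: the function name and the type labels ("number", "string", "unknown") are invented here, and it uses Scalar::Util's looks_like_number in place of Scalar::Classify.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical sketch: infer one rough type per column by majority vote.
sub infer_column_types_sketch {
    my ($data) = @_;
    my @rows   = @{ $data };      # shallow copy, caller's array is untouched
    my $header = shift @rows;     # first row is the header

    my @types;
    for my $i ( 0 .. $#{ $header } ) {
        my %count;
        for my $row (@rows) {
            my $v = $row->[$i];
            next unless defined $v && length $v;   # skip empty cells
            my $t = looks_like_number($v) ? 'number' : 'string';
            $count{$t}++;
        }
        # Most frequent label wins; ties are broken alphabetically here.
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        push @types, defined $best ? $best : 'unknown';
    }
    # A list in list context, an array reference otherwise.
    return wantarray ? @types : \@types;
}

my @types = infer_column_types_sketch( [
    [ 'id', 'name' ],
    [ 1,    'alpha' ],
    [ 2,    'beta' ],
] );
print "@types\n";
```

Voting per column means that a few stray values (an "n/a" in an otherwise numeric column, say) do not change the inferred type.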


A wrapper around Scalar::Classify's "classify", which also subdivides the string category, looking for datetime types.

The type is most often (but not limited to) one of the following:


This code examines any string values to see if a date/time code is more appropriate:


Given a hash of numeric counts, returns the key of the maximum count.

In the case of a tie, the return value will be one of the tied keys; which one is undefined.
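
A minimal sketch of this kind of routine follows. The name "max_count_key_sketch" is hypothetical and the code is not taken from the module; it only shows why the tie behavior is undefined, since the result depends on Perl's hash iteration order.

```perl
use strict;
use warnings;

# Hypothetical sketch: return the key with the largest numeric count.
sub max_count_key_sketch {
    my ($counts) = @_;
    my ($best_key, $best_count);
    for my $key ( keys %{ $counts } ) {
        my $n = $counts->{$key};
        # Strict ">" keeps the first key seen among ties, and hash order
        # is not predictable, so the tie winner is effectively arbitrary.
        if ( !defined $best_count || $n > $best_count ) {
            ($best_key, $best_count) = ($key, $n);
        }
    }
    return $best_key;    # undef for an empty hash
}

my $winner = max_count_key_sketch( { number => 7, string => 2 } );
print "most common type: $winner\n";
```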


Generates a hashref of locally useful regexps.

These are mostly intended to identify dates and times. TODO just look up existing solutions, e.g. Regexp::Common.
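
As an illustration, a hashref of date and time patterns and its use on string values might look like the sketch below. The function names, the pattern keys, and the ISO-style formats matched are all assumptions made for this example; the module's actual patterns may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: a hashref of regexps for common date/time layouts.
sub date_patterns_sketch {
    return {
        date     => qr{ \A \d{4} - \d{2} - \d{2} \z }x,
        time     => qr{ \A \d{2} : \d{2} (?: : \d{2} )? \z }x,
        # Under /x the space inside [ T] is still literal.
        datetime => qr{ \A \d{4}-\d{2}-\d{2} [ T] \d{2}:\d{2} (?: :\d{2} )? \z }x,
    };
}

# Refine a value already known to be a string: report a date/time label
# when one of the patterns matches, otherwise leave it as "string".
sub refine_string_type_sketch {
    my ($value) = @_;
    my $pat = date_patterns_sketch();
    for my $type (qw(datetime date time)) {
        return $type if $value =~ $pat->{$type};
    }
    return 'string';
}

for my $v ( '2018-05-22', '2018-05-22 13:45', '13:45:10', 'hello' ) {
    printf "%-18s %s\n", $v, refine_string_type_sketch($v);
}
```

Anchoring each pattern with \A and \z keeps free text that merely contains a date from being labeled as one.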


Joseph Brenner, 22 May 2018


Copyright (C) 2018 by Joseph Brenner

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

No warranty is provided with this code.

See for more information.