NAME

Parse::Taxonomy::AdjacentList - Extract a taxonomy from a hierarchy inside a CSV file

SYNOPSIS

    use Parse::Taxonomy::AdjacentList;

    $source = "./t/data/alpha.csv";
    $obj = Parse::Taxonomy::AdjacentList->new( {
        file    => $source,
    } );

METHODS

`new()`

Purpose

Parse::Taxonomy::AdjacentList constructor.
Arguments

Single hash reference. There are two possible interfaces: file and components.
1. file interface
```
    $source = "./t/data/delta.csv";
    $obj = Parse::Taxonomy::AdjacentList->new( {
        file    => $source,
    } );
```
Elements in the hash reference are keyed on:
- file
  
  Absolute or relative path to the incoming taxonomy file. Required for this interface.
- id_col
  
  The name of the column in the header row under which each data record's unique ID can be found. Defaults to id.
- parent_id_col
  
  The name of the column in the header row under which each data record's parent ID can be found. (Will be empty in the case of top-level nodes, as they have no parent.) Defaults to parent_id.
- leaf_col
  
  The name of the column in the header row under which, in each data record, there is found a string which differentiates that record from all other records with the same parent ID. Defaults to name.
- Text::CSV_XS options
  
  Any other options which could normally be passed to Text::CSV_XS->new() will be passed through to that module's constructor. On the recommendation of the Text::CSV documentation, binary is always set to a true value.
2. components interface
```
    $obj = Parse::Taxonomy::AdjacentList->new( {
        components  => {
            fields          => $fields,
            data_records    => $data_records,
        }
    } );
```
Elements in this hash are keyed on:
- components
  
  This element is required for the components interface. The value of this element is a hash reference with two keys, fields and data_records. fields is a reference to an array holding the field or column names for the data set. data_records is a reference to an array of array references, each of the latter arrayrefs holding one record or row from the data set.
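
As a minimal sketch of the shape described above, the two structures can be built by hand in plain Perl. The column names and rows here are illustrative assumptions borrowed from the examples elsewhere in this document; the final check is not part of the module, only a quick way to confirm that each row matches the header.

```perl
use strict;
use warnings;

# Column names, in header-row order.
my $fields = [ qw( id parent_id name is_actionable ) ];

# One array reference per row; top-level nodes have an empty parent_id.
my $data_records = [
    [ 1, '', 'Alpha',   0 ],
    [ 2, '', 'Beta',    0 ],
    [ 3, 1,  'Epsilon', 0 ],
    [ 4, 3,  'Kappa',   1 ],
];

# The hash reference that would be handed to new() in the components interface.
my $args = {
    components => {
        fields       => $fields,
        data_records => $data_records,
    },
};

# Illustrative sanity check: every row has as many entries as there are fields.
my @bad_rows = grep { @{$_} != @{$fields} } @{$data_records};
printf "%d fields, %d records, %d malformed\n",
    scalar @{$fields}, scalar @{$data_records}, scalar @bad_rows;
```
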
Return Value

Parse::Taxonomy::AdjacentList object.
Exceptions

new() will throw an exception under any of the following conditions:
- Argument to new() is not a reference.
- Argument to new() is not a hash reference.
- Argument to new() has both 'file' and 'components' elements, or has neither.
- Header row lacks the columns needed to match requirements.
- Non-numeric entry in id or parent_id column.
- Duplicate entries in id column.
- Number of fields in a data record does not match number in header row.
- Empty string in a component column of a record.
- Unable to locate a record whose id is the parent_id of a different record.
- Two or more records with the same parent_id share the same value in the component column.
- file interface
  - Unable to locate the file which is the value of the file element.
  - The same field is found more than once in the header row of the incoming taxonomy file.
  - Unable to open or close the incoming taxonomy file for reading.
- components interface
  - components element is not a hash reference with fields and data_records elements.
  - fields element is not an array reference.
  - data_records element is not a reference to an array of array references.
  - The same field is found more than once in the fields element's array reference.
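
Several of the data-level conditions above can be illustrated with a short stand-alone check. This is only a sketch under stated assumptions, not the module's own validation code: check_adjacent_list() is a hypothetical helper, the rows are assumed to be laid out as id, parent_id, name, and the messages are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Taxonomy: returns a list of
# problems found in records laid out as [ id, parent_id, name, ... ].
sub check_adjacent_list {
    my ($records) = @_;
    my (@problems, %seen_id, %seen_sibling);

    for my $rec (@{$records}) {
        my ($id, $parent_id, $name) = @{$rec}[0 .. 2];
        push @problems, "non-numeric id '$id'"
            unless $id =~ /^\d+$/;
        push @problems, "non-numeric parent_id '$parent_id'"
            unless $parent_id eq '' or $parent_id =~ /^\d+$/;
        push @problems, "duplicate id '$id'"
            if $seen_id{$id}++;
        push @problems, "empty name for id '$id'"
            if $name eq '';
        push @problems, "siblings under '$parent_id' share name '$name'"
            if $seen_sibling{$parent_id}{$name}++;
    }

    # Every non-empty parent_id must be the id of some record.
    for my $rec (@{$records}) {
        my $parent_id = $rec->[1];
        next if $parent_id eq '';
        push @problems, "no record with id '$parent_id'"
            unless exists $seen_id{$parent_id};
    }
    return @problems;
}

my @ok  = check_adjacent_list( [ [ 1, '', 'Alpha' ], [ 2, 1, 'Zeta' ] ] );
my @bad = check_adjacent_list( [ [ 1, '', 'Alpha' ], [ 1, 9, 'Alpha' ] ] );
print scalar(@ok), " problems in first set\n";
print "$_\n" for @bad;
```

The helper collects every problem rather than stopping at the first, which is a choice made for the illustration; new() itself throws an exception.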

`fields()`

Purpose

Identify the names of the columns in the taxonomy.
Arguments
```
    my $fields = $self->fields();
```
No arguments; the information is already inside the object.
Return Value

Reference to an array holding a list of the columns as they appear in the header row of the incoming taxonomy file.
Comment

Read-only.

`data_records()`

Purpose

Once the taxonomy has been validated, get a list of its data rows as a Perl data structure.

Arguments

    $data_records = $self->data_records;

None.

Return Value

Reference to array of array references. The array will hold the data records found in the incoming taxonomy file in their order in that file.
Comment

Does not contain any information about the fields in the taxonomy, so you should probably either (a) use in conjunction with fields() method above; or (b) use fields_and_data_records().

`get_field_position()`

Purpose

Identify the index position of a given field within the header row.
Arguments
```
    $index = $obj->get_field_position('income');
```
Takes a single string holding the name of one of the fields (column names).
Return Value

Integer representing the index position (counting from 0) of the field provided as argument. Throws exception if the argument is not actually a field.
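
The lookup described here can be sketched in plain Perl. field_position() below is a hypothetical stand-in that works on a bare array reference of field names; it is not the module's code, and the field names are assumed for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_field_position(), operating on a plain
# array reference of field names.
sub field_position {
    my ($fields, $wanted) = @_;
    for my $idx (0 .. $#{$fields}) {
        return $idx if $fields->[$idx] eq $wanted;
    }
    die "'$wanted' is not a field in this taxonomy\n";
}

my $fields = [ qw( id parent_id name is_actionable ) ];
my $name_idx = field_position($fields, 'name');    # counted from 0
print "$name_idx\n";

# An unknown field name raises an exception, trapped here with eval.
my $found = eval { field_position($fields, 'income'); 1 };
print "lookup of 'income' failed: $@" unless $found;
```
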

Accessors

The following methods provide information about key columns in a Parse::Taxonomy::AdjacentList object. The key columns are those which hold the ID, parent ID and component information. They take no arguments. The methods whose names end in _idx return integers, as they return the index position of the column in the header row. The other methods return strings.

    $index_of_id_column = $self->id_col_idx;

    $name_of_id_column = $self->id_col;

    $index_of_parent_id_column = $self->parent_id_col_idx;

    $name_of_parent_id_column = $self->parent_id_col;

    $index_of_leaf_column = $self->leaf_col_idx;

    $name_of_leaf_column = $self->leaf_col;

`pathify()`

Purpose

Generate a new Perl data structure which holds the same information as a Parse::Taxonomy::AdjacentList object but which expresses the route from the root node to a given branch or leaf node as either a separator-delimited string (as in the path column of a Parse::Taxonomy::MaterializedPath object) or as an array reference holding the list of names which delineate that route.

Another way of expressing this: Transform a taxonomy-by-adjacent-list to a taxonomy-by-materialized-path.

Example: Suppose we have a CSV file which serves as a taxonomy-by-adjacent-list for this data:

    "id","parent_id","name","is_actionable"
    "1","","Alpha","0"
    "2","","Beta","0"
    "3","1","Epsilon","0"
    "4","3","Kappa","1"
    "5","1","Zeta","0"
    "6","5","Lambda","1"
    "7","5","Mu","0"
    "8","2","Eta","1"
    "9","2","Theta","1"

Instead of having the route from the root node to a given node be represented implicitly by following parent_ids up the tree, suppose we want that route to be stated explicitly in each record. Assuming that we work with default column names, that would mean replacing the information currently spread out among the id, parent_id and name columns with a single path column which, by default, would hold an array reference.

    $source = "./t/data/theta.csv";
    $obj = Parse::Taxonomy::AdjacentList->new( {
        file    => $source,
    } );

    $taxonomy_with_path_as_array = $obj->pathify;

Yielding:

    [
      ["path", "is_actionable"],
      [["", "Alpha"], 0],
      [["", "Beta"], 0],
      [["", "Alpha", "Epsilon"], 0],
      [["", "Alpha", "Epsilon", "Kappa"], 1],
      [["", "Alpha", "Zeta"], 0],
      [["", "Alpha", "Zeta", "Lambda"], 1],
      [["", "Alpha", "Zeta", "Mu"], 0],
      [["", "Beta", "Eta"], 1],
      [["", "Beta", "Theta"], 1],
    ]

If we wanted the path information represented as a string rather than an array reference, we would say:

    $taxonomy_with_path_as_string = $obj->pathify( { as_string => 1 } );

Yielding:

    [
      ["path", "is_actionable"],
      ["|Alpha", 0],
      ["|Beta", 0],
      ["|Alpha|Epsilon", 0],
      ["|Alpha|Epsilon|Kappa", 1],
      ["|Alpha|Zeta", 0],
      ["|Alpha|Zeta|Lambda", 1],
      ["|Alpha|Zeta|Mu", 0],
      ["|Beta|Eta", 1],
      ["|Beta|Theta", 1],
    ]

If we are providing a true value to the as_string key, we also get to choose what string to use as the separator in the path column.

    $taxonomy_with_path_as_string_different_path_col_sep =
        $obj->pathify( {
            as_string       => 1,
            path_col_sep    => '~~',
         } );

Yields:

    [
      ["path", "is_actionable"],
      ["~~Alpha", 0],
      ["~~Beta", 0],
      ["~~Alpha~~Epsilon", 0],
      ["~~Alpha~~Epsilon~~Kappa", 1],
      ["~~Alpha~~Zeta", 0],
      ["~~Alpha~~Zeta~~Lambda", 1],
      ["~~Alpha~~Zeta~~Mu", 0],
      ["~~Beta~~Eta", 1],
      ["~~Beta~~Theta", 1],
    ]

Finally, should we want the path column in the returned arrayref to be named something other than path, we can provide a value to the path_col key. Setting path_col to foo, for example, yields:

    [
      ["foo", "is_actionable"],
      [["", "Alpha"], 0],
      [["", "Beta"], 0],
      [["", "Alpha", "Epsilon"], 0],
      [["", "Alpha", "Epsilon", "Kappa"], 1],
      [["", "Alpha", "Zeta"], 0],
      [["", "Alpha", "Zeta", "Lambda"], 1],
      [["", "Alpha", "Zeta", "Mu"], 0],
      [["", "Beta", "Eta"], 1],
      [["", "Beta", "Theta"], 1],
    ]

Arguments

Optional single hash reference. If provided, the following keys may be used:

- path_col
  
  User-supplied name for column holding path information in the returned array reference. Defaults to path.
- as_string
  
  Boolean. If supplied with a true value, path information will be represented as a separator-delimited string rather than an array reference.
- path_col_sep
  
  User-supplied string to be used to separate the parts of the route when as_string is set to a true value. Not meaningful unless as_string is true. In the examples above, the separator is | when path_col_sep is not supplied.

Return Value

Reference to an array of array references. The first element in the array will be a reference to an array of field names. Each succeeding element will be a reference to an array holding data for one record in the original taxonomy. The path data will be represented, by default, as an array reference built up from the component (name) column in the original taxonomy, but if as_string is selected, the path data in all non-header elements will be a separator-delimited string.
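
The transformation can be pictured as a walk up the parent_id chain for each record. The following is a rough stand-alone sketch of that idea, not the module's implementation: path_for() is a hypothetical helper, the rows are a subset of the example data above, and the input is assumed to be already validated (no missing parents, no cycles).

```perl
use strict;
use warnings;

# Rows laid out as [ id, parent_id, name, is_actionable ].
my @records = (
    [ 1, '', 'Alpha',   0 ],
    [ 2, '', 'Beta',    0 ],
    [ 3, 1,  'Epsilon', 0 ],
    [ 4, 3,  'Kappa',   1 ],
);

# Index rows by id so that a parent can be looked up directly.
my %by_id = map { $_->[0] => $_ } @records;

# Hypothetical helper: collect names from a node up to its root, then
# prepend the empty string that marks the root.
sub path_for {
    my ($rec) = @_;
    my @names;
    while ($rec) {
        unshift @names, $rec->[2];
        my $parent_id = $rec->[1];
        $rec = $parent_id eq '' ? undef : $by_id{$parent_id};
    }
    return [ '', @names ];
}

# Header row first, then one row per record with the path as an arrayref.
my @pathified = ( [ 'path', 'is_actionable' ] );
push @pathified, [ path_for($_), $_->[3] ] for @records;

# Joining each path with a separator gives the as_string form.
print join('|', @{ $_->[0] }), ",$_->[1]\n" for @pathified[ 1 .. $#pathified ];
```

Keeping the leading empty string in each path is what produces the leading separator once the parts are joined.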

`write_pathified_to_csv()`

Purpose

Create a CSV-formatted file holding the data returned by pathify().
Arguments
```
    $csv_file = $obj->write_pathified_to_csv( {
        pathified => $pathified,                   # output of pathify()
        csvfile   => './t/data/taxonomy_out5.csv',
    } );
```
Single hash reference. That hash is keyed on:
- pathified
  
  Required. Its value must be the array reference of array references returned by the pathify() method.
- csvfile
  
  Optional. Path to location where a CSV-formatted text file holding the taxonomy-by-materialized-path will be written. Defaults to a file called taxonomy_out.csv in the current working directory.
- Text::CSV_XS options
  
  You can also pass through any key-value pairs normally accepted by Text::CSV_XS.
Return Value

Returns path to CSV-formatted text file just created.

Example

Suppose we have a CSV-formatted file holding the following taxonomy-by-adjacent-list:

    "id","parent_id","name","is_actionable"
    "1","","Alpha","0"
    "2","","Beta","0"
    "3","1","Epsilon","0"
    "4","3","Kappa","1"
    "5","1","Zeta","0"
    "6","5","Lambda","1"
    "7","5","Mu","0"
    "8","2","Eta","1"
    "9","2","Theta","1"

After running this file through new(), pathify() and write_pathified_to_csv() we will have a new CSV-formatted file holding this taxonomy-by-materialized-path:

    path,is_actionable
    |Alpha,0
    |Beta,0
    |Alpha|Epsilon,0
    |Alpha|Epsilon|Kappa,1
    |Alpha|Zeta,0
    |Alpha|Zeta|Lambda,1
    |Alpha|Zeta|Mu,0
    |Beta|Eta,1
    |Beta|Theta,1

Note that the id, parent_id and name columns have been replaced by the path column.

