Jason Lefler


Data::SeaBASS - Object-oriented interface for reading/writing SeaBASS files


version 0.173030


To read SeaBASS files:


    my $sb_file = Data::SeaBASS->new("input.txt");

    # Calculate the average chlorophyll value using next
    my $chl_total = 0;
    my $measurements = 0;
    while (my $row = $sb_file->next()){
        if (defined($row->{'chl'})){
            $chl_total += $row->{'chl'};
            $measurements++;
        }
    }
    if ($measurements){
        print $chl_total/$measurements;
    } else {
        print "No chl values.";
    }

    # The same loop, reading rows as hashes instead of hash references
    while (my %row = $sb_file->next()){
        if (defined($row{'chl'})){
            $chl_total += $row{'chl'};
            $measurements++;
        }
    }

    # Calculate the average chlorophyll value using where
    my $chl_total2 = 0;
    my $measurements2 = 0;
    $sb_file->where(sub {
        if (defined($_->{'chl'})){
            $chl_total2 += $_->{'chl'};
            $measurements2++;
        }
    });
    if ($measurements2){
        print $chl_total2/$measurements2;
    } else {
        print "No chl values.";
    }

Or to modify SeaBASS files:


    my $sb_file = Data::SeaBASS->new("input.txt");

    # Add a one degree bias to water temperature
    # Add a one degree bias to water temperature
    while (my $row = $sb_file->next()){
        $row->{'wt'} += 1;
    }
    $sb_file->write(); # to STDOUT

    # Remove the one degree bias to water temperature
    $sb_file->where(sub {
        $_->{'wt'} -= 1;
    });

Or to start a SeaBASS file from scratch:


    my $sb_file = Data::SeaBASS->new({strict => 0, add_empty_headers => 1});
    $sb_file->append({'lat' => 1, 'lon' => 2});
    $sb_file->append("3,4"); # or if you're reading from a CSV file


Data::SeaBASS provides an easy-to-use, object-oriented interface for reading, writing, and modifying SeaBASS data files.

What is SeaBASS?

The SeaWiFS Bio-optical Archive and Storage System (SeaBASS) is housed at NASA's Goddard Space Flight Center. SeaBASS provides the permanent public repository for data collected under the auspices of the NASA Ocean Biology and Biogeochemistry Program. It also houses data collected by participants in the NASA Sensor Intercomparison and Merger for Biological and Oceanic Interdisciplinary Studies (SIMBIOS) Program. SeaBASS includes marine bio-optical, biogeochemical, and (some) atmospheric data.

SeaBASS File Format

SeaBASS files are plain ASCII files with a special header and a matrix of values.

The SeaBASS header block consists of many lines of header-keyword pairs. Some headers are optional, but most, although technically not required for reading, are required for ingestion into the system. More detailed information is available in the SeaBASS wiki article. The only header absolutely required for this module to work is the /fields line. This module always converts fields and units to lowercase.
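For reference, a minimal header block consistent with the data matrix shown below might look like the following. The values here are illustrative only, not a complete FCHECK-valid header:

```
/begin_header
/investigators=jason_lefler
/delimiter=space
/missing=-999
/fields=date,time,lat,lon,depth,wt,sal
/units=yyyymmdd,hh:mm:ss,degrees,degrees,m,degreesC,psu
/end_header
```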



The SeaBASS body is a matrix of data values, organized much like a spreadsheet. Each column is separated by the value presented in the /delimiter header. Likewise, missing values are indicated by the value presented in the /missing header. The /fields header identifies the geophysical parameter presented in each column.

    19920109 16:30:00 31.389 -64.702 3.4 20.7320 -999
    19920109 16:30:00 31.389 -64.702 19.1 20.7350 -999
    19920109 16:30:00 31.389 -64.702 38.3 20.7400 -999
    19920109 16:30:00 31.389 -64.702 59.6 20.7450 -999
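To make the mapping concrete, here is a hand-rolled sketch of how one of the rows above pairs up with a /fields list and the /missing value. Data::SeaBASS does all of this for you via next()/data(); the field names below are assumptions for illustration:

```perl
use strict;
use warnings;

# Sketch: split a row on the /delimiter (space here), pair the values
# with the /fields names, and map the /missing value (-999) to undef.
my @fields  = qw(date time lat lon depth wt sal);   # assumed /fields order
my $missing = '-999';                               # assumed /missing value
my $line    = '19920109 16:30:00 31.389 -64.702 3.4 20.7320 -999';

my %row;
@row{@fields} = map { $_ eq $missing ? undef : $_ } split /\s+/, $line;

print defined($row{sal}) ? $row{sal} : 'undef', "\n"; # undef
print $row{depth}, "\n";                              # 3.4
```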

Strictly Speaking

SeaBASS files are run through a program called FCHECK before they are submitted and before they are ingested into a NASA relational database management system. Some of the things it checks for are required headers and proper field names. All data must always have an associated depth, time, and location, though these fields may be placed in the header and are not always required in the data. Just because this module writes the files does not mean they will pass FCHECK.

Files are case-INsensitive. Headers are not allowed to have any whitespace.


This module does not export anything by default.


STRICT_READ is used with the strict option, enabling error messages when reading header lines and inserting header data.


STRICT_WRITE is used with the strict option, enabling error messages when writing the data to a file/stream.


STRICT_ALL is used with the strict option, enabling STRICT_READ and STRICT_WRITE.


INSERT_BEGINNING is used with insert or add_field to insert a data row or field at the beginning of their respective lists.


INSERT_END is used with insert or add_field to insert a data row or field at the end of their respective lists.


new([$filename,] [\%options])

    my $sb_file = Data::SeaBASS->new("input_file.txt");
    my $sb_file = Data::SeaBASS->new("input_file.txt", { delete_missing_headers => 1 });
    my $sb_file = Data::SeaBASS->new("output_file.txt", { add_empty_headers => 1 });
    my $sb_file = Data::SeaBASS->new({ add_empty_headers => 1 });

Creates a Data::SeaBASS object. If the file specified exists, the object can be used to read the file. If the file specified does not exist, an empty object is created and will be written to the specified file by default when invoking write().

Options should be given in a hash reference to ensure proper argument parsing. If a file is specified, options can be given as a hash list.

  • default_headers

  • headers

    These two options accept either an array reference or a hash reference. They are used to set or override header information. First, headers are read from default_headers, then from the data file itself, then are overridden by whatever is in headers.

    Arguments are an array reference of header lines, or a hash reference of header/value pairs.

        my $sb_file = Data::SeaBASS->new({
            default_headers => [
                '/investigators=jason_lefler',
                '/experiment=default_experiment',
            ],
            headers => {
                'experiment' => 'real_experiment',
            },
        });
    Warning: Modifying the delimiter or missing value will likely break the object. Modifying these will change the expected format for all rows. Do so with caution.

  • preserve_case

    1 or 0, default 1. Setting this to 0 will change all values in the header to lowercase. Header descriptors (the /header part) are always turned to lowercase, as well as all fields and units.

  • keep_slashes

    1 or 0, default 0. Forces the object to keep the / at the beginning of headers when accessed. If set to 1, headers returned via the headers function will include the leading slash.

  • cache

    1 or 0, default 1. Enables caching data rows as they are read. This speeds up re-reads and allows the data to be modified. This is required for writing files.

  • delete_missing_headers

    1 or 0, default 0. Any headers that are equal to the /missing header, NA, or are not defined (when using the headers/default_headers options) are deleted. They cannot be retrieved using headers and will not be written.

  • missing_data_to_undef

    1 or 0, default 1. If any values in the data block are equal to the /missing, /above_detection_limit, or /below_detection_limit headers, they are set to undef when retrieved.

  • preserve_comments

    1 or 0, default 1. Setting this option to zero will discard any comments found in the header.

  • add_empty_headers

    0, 1, or a string. If set to a string, this will populate any missing headers, including optional ones, and will set their value to the string given. If set to 1, the string 'NA' is used. This option disables STRICT_WRITE.

  • strict

        my $sb_file = Data::SeaBASS->new("input_file.txt", {strict => STRICT_ALL});
        my $sb_file = Data::SeaBASS->new("input_file.txt", {strict => (STRICT_READ | STRICT_WRITE)});
        my $sb_file = Data::SeaBASS->new("input_file.txt", {strict => 0});
        my $sb_file = Data::SeaBASS->new("input_file.txt", {strict => STRICT_WRITE}); #default

      STRICT_READ will throw errors when reading invalid headers, missing required ones, or an invalid delimiter. This may change in future revisions.


      STRICT_WRITE will throw the same errors when writing the data to a file or stream. STRICT_WRITE only checks for required headers and invalid headers, but does not check their values to see if they are actually filled. This may change in future revisions.

  • fill_ancillary_data

    0 or 1, default 0. Insert date, time, measurement depth, station, and location values to the data rows from the headers. Values are not overridden if they are already present. This option is only useful when reading files.

    Note: It is bad practice to include these fields in the data if they don't change throughout the file. This option is used to remove the burden of checking whether they are in the data or header.

    Another odd behavior: This option will also combine individual date/time parts in the data (year/month/day/etc) to create more uniform date/time fields.

    Another odd behavior: If any part of a date/time is missing, the fields dependent on it will not be added to the row.

  • preserve_header

    0 or 1, default 0. Preserves header and comment order. This option disables modifying the header, as well, but will not error if you try -- it will simply not be reflected in the output.

  • preserve_detection_limits

    0 or 1, default 0. Disables setting values equal to below_detection_limit or above_detection_limit to null while reading files. This should only be used during read-only operation, as there is no way to distinguish missing data from data outside detection limits.

  • optional_warnings

    0 or 1, default 1. Determines whether or not to print warnings deemed optional. For the moment, the only defined warning is for optically shallow data.


add_headers(\%headers | \@header_lines | @header_lines)

    $sb_file->add_headers({'investigators' => 'jason_lefler'});

add_headers is used to add or override metadata for a Data::SeaBASS, as well as add comments.

This function cannot be used to change fields/units; see add_field and remove_field for that.

Warning: Modifying the delimiter or missing value halfway through reading/writing will likely break the object. Modifying these will change the expected format for any new or non-cached rows. Do so with caution.

headers([ \%new_headers | \@get_headers | @get_headers ])


    my %headers = $sb_file->headers(['investigators']);
    print Dumper(\%headers); # { investigators => 'jason_lefler' }
    my ($inv) = $sb_file->headers('investigators');
    print $inv; # jason_lefler
    $sb_file->headers({investigators => 'jason_lefler'});
    $sb_file->headers()->{'investigators'} = 'jason_lefler';

headers is used to read or modify header values. Given an array reference of header names, it will return a hash/hash reference with header/value pairs. Given a plain list of header names, it will return an array/array reference of the given header values. Given a hash reference, this function is a proxy for add_headers.

If keep_slashes is set, then headers will be returned as such, IE: {'/investigators' => 'jason_lefler'}.

This function can also be used to set header values without going through the normal validation.

head and h are aliases to headers.






data([$index])

    my $row = $sb_file->data(1);
    my @rows = $sb_file->all();

data is responsible for returning either a data line via an index or all of the data lines at once.

Data is returned as field => value pairs.

If given an index: in list context, returns the hash of the row; in scalar context, returns a reference to the row.

If not given an index: in list context, returns an array of the rows; in scalar context, returns a reference to an array of the rows.

If given an index out of range, returns undef. If given a negative index, rewinds the file, then returns undef.

If cache is enabled and the row has already been read, it is retrieved from the cache. If it has not already been read, all rows leading up to the desired row are read and cached, and the desired row is returned.

If cache is disabled and either all rows are retrieved or a previously retrieved row is called again, the file will rewind, then seek to the desired row.

d, body, b, and all are all aliases to data. (Yes, that means all can be used with arguments, it would just look silly.)


next()

    while (my $row = $sb_file->next()){
        print $row->{'lat'};
    }
    # or
    while (my %row = $sb_file->next()){
        print $row{'lat'};
    }

Returns the next data row in the file, returning undef when it runs out of rows.

Data is returned as field => value pairs.

In list context, returns a hash of the row. In scalar context, returns a reference to the hash of a row.

After a rewind, next will return the very first data hash, then each row in turn. If the row has been cached, it's retrieved from the cache instead of rereading from the file.


rewind()

rewind seeks to the start of the data. The next call to next() will return the very first row (equivalent to data(0)). If caching is enabled, no actual seek is performed; the row index iterator is simply reset. If caching is disabled, a seek is performed on the file handle to return to the start of the data.

update(\%data_row | \@data_row | $data_row | %data_row)

    while (my %row = $sb_file->next()){
        if ($row{'depth'} == -999){
            $row{'depth'} = 0;
        }
        $sb_file->update(\%row);
    }

    # Less useful for update():
    print join(',',@{$sb_file->actual_fields()}); #lat,lon,depth,chl
    while (my %row = $sb_file->next()){
        if ($row{'depth'} == -999){
            $row{'depth'} = 0;
        }
        $sb_file->update(@row{qw(lat lon depth chl)});
        # or
        $sb_file->update([@row{qw(lat lon depth chl)}]);
    }

update replaces the last row read (using next()) with the input.

Caching must be enabled to use update, set, or insert.

set($index, \%data_row | \@data_row | $data_row | %data_row)

    my %row = (lat => 1, lon => 2, chl => 1);
    $sb_file->set(0, \%row);
    print join(',',@{$sb_file->actual_fields()}); #lat,lon,chl
    $sb_file->set(0, [1, 2, 1]);

set replaces the row at the given index with the input. Seeks to the given index if it has not been read to yet. croaks if the file does not go up to the index specified.

Caching must be enabled to use update, set, or insert.

insert($index, \%data_row | \@data_row | $data_row | %data_row)

    my %row = (lat => 1, lon => 2, chl => 1);
    $sb_file->insert(INSERT_BEGINNING, \%row);
    print join(',',@{$sb_file->actual_fields()}); #lat,lon,chl
    $sb_file->insert(1, [1, 2, 1]);
    $sb_file->insert(INSERT_END, [1, 2, 1]);

Inserts the row into the given position. INSERT_BEGINNING inserts a new row at the start of the data, INSERT_END inserts one at the end of the data block.

The index must be a positive integer, INSERT_BEGINNING, or INSERT_END.

If a row is inserted at the end, the entire data block is read from the file to cache every row, the row is appended to the end, and the current position is reset to the original position, so next() will still return the real next row from the data.

If a row is inserted before the current position, the current position is shifted accordingly and will still return the next() real row.

Caching must be enabled to use update, set, or insert.

prepend(\%data_row | \@data_row | $data_row | %data_row)

prepend is short for insert(INSERT_BEGINNING, ...).

append(\%data_row | \@data_row | $data_row | %data_row)

append is short for insert(INSERT_END, ...).


remove([$index])

If an index is specified, the row at that index is deleted. If it is omitted, the last row read is deleted. The current position is modified accordingly.


where(\&function)

    # Find all rows with depth greater than 10 meters
    my @ret = $sb_file->where(sub {
        if ($_->{'depth'} > 10){
            return $_;
        } else {
            return undef;
        }
    });

    # Delete all measurements with depth less than 10 meters
    $sb_file->where(sub {
        if ($_->{'depth'} < 10){
            $_ = undef;
        }
    });

    # Calculate the average chlorophyll value
    my $chl_total = 0;
    my $measurements = 0;
    $sb_file->where(sub {
        if (defined($_->{'chl'})){
            $chl_total += $_->{'chl'};
            $measurements++;
        }
    });
    if ($measurements){
        print $chl_total/$measurements;
    } else {
        print "No chl values.";
    }

Traverses through each data line, running the given function on each row. $_ is set to the current row. If $_ is set to undefined, remove() is called. Any changes in $_ will be reflected in the data.

Any defined value returned is added to the return array. If nothing is returned, a 0 is added.

get_all($field_name [, ... ] [, \%options])

Returns an array/arrayref of all the values matching each given field name. This function errors out if no field names are passed in or a non-existent field is requested.

Available options are:

  • delete_missing

    If any of the fields are missing, the row will not be added to any of the return arrays. (Useful for plotting or statistics that don't work well with bad values.)
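A hedged usage sketch (the field names are assumptions, and the return arrays are presumably in the order the names are passed):

```
# Hypothetical usage: paired lat/chl arrays, with any row missing either
# value skipped so both arrays stay aligned.
my ($lats, $chls) = $sb_file->get_all('lat', 'chl', { delete_missing => 1 });
```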

remove_field($field_name [, ... ])

Removes a field from the file. update_fields is called to remove the field from cached rows. Any new rows grabbed will have the removed fields omitted, as well. A warning is issued if the field does not exist.

add_field($field_name [, $unit [, $position]])

Adds a field to the file. update_fields is called to populate all cached rows. Any rows retrieved will have the new field set to undefined or /missing, depending on if the option missing_data_to_undef is set.

If the unit is not specified, it is set to unitless.

If the position is not specified, the field is added to the end.
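For illustration, a hedged usage sketch (the field names and units are assumptions; INSERT_BEGINNING comes from this module's exports):

```
# Hypothetical usage: append a unitless station column, then prepend
# a chlorophyll column with explicit units.
$sb_file->add_field('station');
$sb_file->add_field('chl', 'mg/m^3', INSERT_BEGINNING);
```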

find_fields($string | qr/match/ [, ... ])

Finds fields matching the string or regex given. If given a string, it must match a field exactly and entirely to be found. To find a substring, use qr/chl/. Fields are returned in the order that they will be output. This function takes into account fields that are added or removed. All fields are always lowercase, so all matches are case insensitive.

Given one argument, returns an array of the fields found. An empty array is returned if no fields match.

Given multiple arguments, returns an array/arrayref of arrays of fields found. IE: find_fields('lw','es') would return something like [['lw510','lw550'],['es510','es550']]. If no field is matched, the inner array will be empty. IE: [[],[]].


Adds comments to the output file, which are printed, in bulk, after /missing. Comments are trimmed before entry and !s are added, if required.


Returns a list of the comments at the given indices. If no indices are passed in, return them all.


Overwrites all of the comments in the file. For now, this is the proper way to remove comments. Comments are trimmed before entry and !s are added, if required.

write([$filename | $file_handle | \*GLOB])

Outputs the current header and data to the given handle or glob. If no arguments are given and a non-existent filename was passed to new, the contents are written to that file. If no output file was given, write outputs to STDOUT.

If STRICT_WRITE is enabled, the headers are checked for invalid headers and missing required headers and errors/warnings can be thrown accordingly.

The headers are output in a somewhat-arbitrary but consistent order. If add_empty_headers is enabled, placeholders are added for every header that does not exist. A comment section is also added if one is not present.


If a file handle is opened for reading, this function closes it. This is automatically called when the object is destroyed. This is useful to replace the file being read with the current changes.

make_data_hash($line [,\@field_list])

    my %row = $sb_file->make_data_hash("1.5,2,2.5");
    my %row = $sb_file->make_data_hash("1.5,2,2.5", [qw(lat lon sal)]);
    my %row = $sb_file->make_data_hash("1.5,2,2.5", [$sb_file->fields()]);
    my %row = $sb_file->make_data_hash("1.5,2,2.5", [$sb_file->actual_fields()]);

For mostly internal use. This function parses a data line. It first splits the data via the delimiter, assigns a field to each value, and returns a hash or hash reference.

If @field_list is not set, $sb_file->fields() is used.

If a delimiter is not set (a blank file was created, a file without a /delimiter header is read, etc), the delimiter is guessed and set using guess_delim.

croaks if the delimiter could not be guessed or the number of fields the line is split into does not match up with the field list.


    print $sb_file->missing();
    print $sb_file->dataidx();
    print $sb_file->actual_fields();

Returns a few internal variables. The accessor is read-only, but some variables are returned as references and can therefore be modified afterwards. Do so knowing that this is a terrible idea.

If the variable retrieved is an array or hash reference and this is called in a list context, the variable is dereferenced first.

Here are a few "useful" variables:

  • dataidx

    The current row index.

  • max_dataidx

    The highest row index read so far.

  • fields

    An array of the original fields.

  • actual_fields

    An array of the current fields, as modified by add_field or remove_field.

  • delim

    The regex used to split data lines.

  • missing

    The null/fill/missing value of the SeaBASS file.




For internal use only. This function is in charge of checking the options to make sure they are of the right type (array/hash reference where appropriate).

If add_empty_headers is set, this function turns off STRICT_WRITE.

Called by the object, accepts no arguments.


For internal use only. create_blank_file populates the object with proper internal variables, as well as adding blank headers if add_empty_headers is set.

By default, the missing value is set to $DEFAULT_MISSING (-999).

This function turns on the cache option, as cache must be enabled to write.

The delimiter is left undefined and will be guessed upon reading the first data line using the guess_delim function.

Called by the object, accepts no arguments.


For internal use only. read_headers reads the metadata at the beginning of a SeaBASS file.

Called by the object, accepts no arguments.

validate_header($header, $value, $strict)

    my ($k, $v, $strict) = ('investigators', 'jason_lefler', 0);
    $sb_file->validate_header($k, $v, $strict);

For internal use only. validate_header is in charge of properly formatting key/value pairs to add to the object. This function will modify the input variables in place to prepare them for use.

Returns false if there was a problem with the inputs, such as strict is set and an invalid header was passed in.

validate_header will set /missing to $DEFAULT_MISSING (-999) if it is blank or undefined.

This function will also change the expected delimiter for rows that have not yet been cached.

set_delim($strict, $delim)

Takes a string declaring the delim (IE: 'comma', 'space', etc) and updates the object's internal delimiter regex.


update_fields runs through the currently cached rows and calls add_and_remove_fields on each row. It then updates the /fields and /units headers in the header hash.


Given a reference to a row, this function deletes any fields removed with remove_field and adds an undefined or /missing value for each field added via add_field. If missing_data_to_undef is set, an undefined value is given, otherwise, it is filled with the /missing value.

If fill_ancillary_data is set, this function adds missing date, time, date_time, lat, lon, and depth fields to the retrieved row from the header.

Needlessly returns the hash reference passed in.


guess_delim is used to guess the delimiter of a line. It is not very intelligent. If it sees any commas, it assumes the delimiter is a comma; otherwise it checks for tabs, then spaces, then semicolons. Returns 1 on success. If it doesn't find any of these, it throws a warning and returns undef.
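The priority order described above can be sketched as a stand-alone function. The name is hypothetical, and the real method stores the resulting regex on the object and returns 1 rather than returning the regex:

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the delimiter-guessing priority:
# comma first, then tab, then space, and finally semicolon.
sub guess_delim_sketch {
    my ($line) = @_;
    return qr/,\s*/ if $line =~ /,/;
    return qr/\t+/  if $line =~ /\t/;
    return qr/\s+/  if $line =~ / /;
    return qr/;\s*/ if $line =~ /;/;
    warn "Could not guess delimiter\n";
    return undef;
}

my @parts = split guess_delim_sketch("1, 2, 3"), "1, 2, 3";
print scalar(@parts), "\n"; # 3
```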

ingest_row(\%data_row | \@data_row | $data_row | %data_row)

For mostly internal use: parses arguments for set, update, and insert and returns a hash or hash reference of the data row. Given a hash reference, it merely returns it.

Given an array or array reference, it assumes each element is a field value in the order listed in either actual_fields or fields. If the number of elements matches actual_fields, that ordering is assumed; if not, it is matched against fields instead. If it matches neither, a warning is issued and the return is undefined.

Given a non-reference scalar, it will split the scalar based on the current delimiter. If one is not defined, it is guessed. If it cannot be guessed, the return is undefined.

If the inputs are successfully parsed, all keys are turned lowercase.


Used by fill_ancillary_data to traverse through a field's possible substitutes in %ANCILLARY and try to find the most suitable replacement. Values of fields in %ANCILLARY are array references, where each element is either:

  • a string of existing field names used to create the value

  • an array reference of the form [converter function, parsing regex (optional), arguments to converter, ... ]

  • a hash reference of the form { header => qr/parsing_regex/ }

If the element is an array reference and an argument requires a field from the file, all arguments are parsed and the variables within them extrapolated, then the array is put into $self->{'ancillary'}.

If no value can be ascertained, it will not be added to the data rows.

The value found is stored in $self->{'ancillary'}. Returns 1 on success, 0 if the field cannot be filled in.

extrapolate_variables($missing, $expression, \%row)

Used by add_and_remove_fields to convert a parsed ancillary string, such as '$year$month$day', into a real value using the fields from the \%row. $expressions are strings figured out by find_ancillaries and stored in $self->{'ancillary'}.

The return is undefined if a value cannot be created (IE: a required field is missing).

extrapolate_function($missing, $expression, \%row)

If the value stored in $self->{'ancillary'} is an array reference, this function uses the array to create an actual value. See find_ancillaries for an explanation of the array.



strip(@list)

    my @space_filled_lines = (' line1 ', ' line2', 'line3 ', 'line4');
    strip(@space_filled_lines);
    print @space_filled_lines; #line1line2line3line4

Runs through the list and removes leading and trailing whitespace. All changes are made in place.

It is literally this:

    sub strip {
        s/^\s+|\s+$//g for @_;
    }


Converts a date in the day of year format YYYYJJJ into YYYYMMDD. Returns the newly formatted string or undefined if the input does not match the required format.

This uses the Add_Delta_Days function from Date::Calc to do the heavy lifting.
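The module relies on Date::Calc, but the conversion itself can be sketched with core Perl alone. The function name here is hypothetical, and this is an equivalent re-derivation rather than the module's actual code:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Core-Perl sketch of the YYYYJJJ -> YYYYMMDD conversion described above
# (the real module uses Date::Calc's Add_Delta_Days).
sub julian_to_greg_sketch {
    my ($yyyyjjj) = @_;
    return undef unless defined($yyyyjjj) && $yyyyjjj =~ /^(\d{4})(\d{3})$/;
    my ($year, $doy) = ($1, $2);
    # Noon UTC on Jan 1 of $year, plus (day-of-year - 1) days
    my $epoch = timegm(0, 0, 12, 1, 0, $year) + ($doy - 1) * 86400;
    return strftime("%Y%m%d", gmtime($epoch));
}

print julian_to_greg_sketch('1992009'), "\n"; # 19920109
```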


Duplicate Fields

This class will not allow a field to be added to the object if a field of the same name already exists. If a file being read has duplicate field names, only the last one is used. No warning is issued. If remove_field is used to remove it, only the first instance will be deleted. To delete all instances, use $sb_file->remove_field($sb_file->find_fields('chl')). This may change in future releases.

Changing Delimiter or Missing Value

Modifying the delimiter header on a file that is being read will cause any non-cached rows to be split by the new delimiter, which will break most, if not all, files. If the delimiter must be changed, call all() to cache every row first, then change it. This will obviously not work if caching is turned off. The same is true for setting the missing value, but it only really matters when the missing_data_to_undef option is used (the same goes for the detection-limit headers).

Below Detection Limit

Below detection limit is only partially supported. If missing_data_to_undef is used, fields equal to /below_detection_limit will be set to undef, as well. Files modified while using missing_data_to_undef will have all data equal to /below_detection_limit written out set to the missing value instead of the below detection limit value. If the below detection limit value is equal to the missing value or missing_data_to_undef is used, the /below_detection_limit header will not be written.


Jason Lefler, <jason.lefler at nasa.gov>


Please report any bugs or feature requests to bug-seabass-file at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=SeaBASS-File. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.


You can find documentation for this module with the perldoc command.

    perldoc Data::SeaBASS

You can also look for information at:


Copyright 2014 Jason Lefler.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.