NAME

Geo::BUFR - Perl extension for handling of WMO BUFR files.

SYNOPSIS

  # A simple program to print decoded contents of a BUFR file. Note
  # that a more sophisticated program (bufrread.pl) is included in the
  # package

  use Geo::BUFR;

  Geo::BUFR->set_tableformat('BUFRDC'); # ECCODES is also possible
  Geo::BUFR->set_tablepath('path to BUFR tables');

  my $bufr = Geo::BUFR->new();

  $bufr->fopen('name of BUFR file');

  while (not $bufr->eof()) {
      my ($data, $descriptors) = $bufr->next_observation();
      print $bufr->dumpsections($data, $descriptors) if $data;
  }

  $bufr->fclose();

DESCRIPTION

BUFR = Binary Universal Form for the Representation of meteorological data. BUFR is approved by WMO (World Meteorological Organization) as the standard universal exchange format for meteorological observations, gradually replacing a lot of older alphanumeric data formats.

This module provides methods for decoding and encoding BUFR messages, and for displaying information in BUFR B and D tables and in BUFR flag and code tables.

Installing this module also installs some programs: bufrread.pl, bufrresolve.pl, bufrextract.pl, bufrencode.pl, bufr_reencode.pl and bufralter.pl. See https://wiki.met.no/bufr.pm/start for examples of use. For the majority of potential users of Geo::BUFR I would expect these programs to be all that you will need Geo::BUFR for.

BUFR tables are not included in this module and must be installed separately, see "BUFR TABLE FILES".

Note that being Perl, this module cannot compete in speed with for example the (free) ECMWF BUFRDC Fortran library. Still, some effort has been invested in making the module reasonably fast, in that the core routines for encoding and decoding bitstreams are implemented in C.

METHODS

The get_ methods will return undef if the requested information is not available. The set_ methods as well as fopen, fclose, copy_from and rewind will always return 1, or croak if failing.

Create a new object:

  $bufr = Geo::BUFR->new();
  $bufr = Geo::BUFR->new($BUFRmessages);

The second form of new is useful if you want to provide the BUFR messages to decode directly as an input buffer (string). Note that merely calling new($BUFRmessages) will not decode anything in the BUFR messages; for that you need to call next_observation() on the newly created object. You also have the option of providing the BUFR messages in a file, using the no-argument form of new() and then calling fopen.

Associate the object with a file for reading of BUFR messages:

  $bufr->fopen($filename);

Close the associated file that was opened by fopen:

  $bufr->fclose();

Check for end-of-file (or end of the input buffer provided as argument to new):

  $bufr->eof();

Returns true if end-of-file (or end of input buffer) is reached, false if not.

Ensure that next call to next_observation will decode first subset in first BUFR message:

  $bufr->rewind();

Copy from an existing object:

  $bufr1->copy_from($bufr2,$what);

If $what is 'all' or not provided, will copy everything in $bufr2 into $bufr1, i.e. making a clone. If $what is 'metadata', only the metadata in section 0, 1 and 3 will be copied (and all of section 2 if present).

Load B and D tables:

  $bufr->load_BDtables($table);

$table is optional, and should for BUFRDC be (base)name of a file containing a BUFR table B or D, using the ECMWF BUFRDC naming convention, i.e. [BD]'table_version'.TXT. For ECCODES, use last part of path, e.g. on UNIX-like systems '0/wmo/18' for master tables and '0/local/8/78/236' for local tables, or both if that is needed, e.g. '0/wmo/18,0/local/8/78/236'. If no argument is provided, load_BDtables() will use BUFR section 1 information in the $bufr object to decide which tables to load (which for ECCODES might be up to 4 table files, both local and master tables). Previously loaded tables are kept in memory, and load_BDtables will return immediately if the tables already have been loaded. Will die (croak) if tables cannot be found, but (in the no argument version) not if these are local tables (Local table version number > 0) and the corresponding master tables exist (Local table version number = 0), which then will be loaded instead. Returns table version for the tables loaded (see get_table_version).

Load C table:

  $bufr->load_Ctable($table,$default_table);

Both $table and $default_table are optional. This will load the flag and code tables (if not already loaded), which in ECMWF BUFRDC are put in tables C'table_version'.TXT (not to be confused with WMO BUFR table C, which contains the operator descriptors). $default_table will be used if $table is not found. For $table and $default_table in ECCODES, use (just like for load_BDtables) last part of path, e.g. on UNIX-like systems '0/wmo/18' for master tables and '0/local/8/78/236' for local tables, or both if that is needed, e.g. '0/wmo/18,0/local/8/78/236'. Will for ECCODES then load all tables in the codetables subdirectory. If no arguments are provided, load_Ctable() will use BUFR section 1 information in the $bufr object to decide which table(s) to load. Will die (croak) if table cannot be found, but not if this is a local table and the corresponding master table exists, which then will be loaded instead. Returns table version for the table loaded.

Get next observation (next subset in current BUFR message or first subset in next message):

  ($data, $descriptors) = $bufr->next_observation();

where $descriptors is a reference to the array of fully expanded descriptors for this subset, $data is a reference to the corresponding values. This method is meant to be used to iterate through all BUFR messages in the file or input buffer (see new) associated with the $bufr object, see example program in "SYNOPSIS". Whenever a new BUFR message is reached, section 0-3 will also be decoded, the contents of which is then available through the access methods listed below. This is the main BUFR decoding routine in Geo::BUFR, and will call load_BDtables() internally (unless decoding of section 4 has been turned off by use of set_nodata or set_filter_cb), but not load_Ctable. Consult "DECODING/ENCODING" if you want more precise info about what is returned in $data and $descriptors.

next_observation will return the empty list (so both $data and $descriptors will be undef) in the following cases: if there are no more BUFR messages in file/input buffer (so next call to eof() will return true), if no decoding of section 4 was requested in set_nodata, if filtering was turned on in set_filter_cb and the BUFR message met the filter criteria in the user defined callback function, or if the BUFR message contained 0 subsets. If you need to distinguish the first case from the rest, one way would be to check get_current_subset_number() which will return 0 only in this first case.

If an error is met during decoding, it is possible to trap the error in an eval and then continue calling next_observation (as demonstrated in source code of bufrread.pl). Care has been taken that BUFR messages with incorrectly stated BUFR length should not cause later proper BUFR messages to be skipped. But the possibility of an erroneous last BUFR message in file led to abandonment of the convenient feature retained until Geo::BUFR version 1.25 of eof always returning true if there were no more BUFR messages in file/input buffer. Instead you should expect last call to next_observation to return false (empty list).
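The error trapping described above might be sketched as follows. This is a minimal illustration and not the code used in bufrread.pl: decode_tolerantly and its callback argument are hypothetical names, and $bufr is assumed to be a Geo::BUFR object on which fopen (or new with an input buffer) has already been called. Only eof and next_observation are called on the object.

```perl
use strict;
use warnings;

# Hypothetical helper: decode every subset reachable through $bufr,
# trapping decoding errors so that later messages are still tried.
# Returns the number of subsets decoded and a reference to the list
# of error texts that were trapped.
sub decode_tolerantly {
    my ($bufr, $on_subset) = @_;
    my $count = 0;
    my @errors;
    while (not $bufr->eof()) {
        my ($data, $descriptors);
        my $ok = eval {
            ($data, $descriptors) = $bufr->next_observation();
            1;
        };
        if (!$ok) {
            push @errors, $@ || 'unknown error';
            next;
        }
        # Empty list: no more messages, message filtered, or section 4
        # not decoded
        next if !$data;
        $on_subset->($data, $descriptors);
        $count++;
    }
    return ($count, \@errors);
}
```

A caller would typically pass a callback such as sub { print $bufr->dumpsections(@_) } as second argument.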

Filter BUFR messages:

  $bufr->set_filter_cb(\&callback,@args);

Here user is responsible for defining the callback subroutine. This subroutine will then be called in next_observation (with arguments @args if provided) right after section 3 is decoded, and, if returning true, will cause next_observation to return immediately, without even trying to decode section 4 (the data section). Here is a simple example of such a callback (without arguments), filtering on AHL and Data category (table A) of the BUFR message.

  sub callback {
      my $obj = shift;
      return 1 if $obj->get_data_category != 0;
      my $ahl = $obj->get_current_ahl() || '';
      return ($ahl =~ /^IS.... (ENMI|TEST)/);
  }

Check result of filtering:

  $bufr->is_filtered();

Will return true (1) if next_observation returned immediately as described for set_filter_cb above. But calling is_filtered should rarely be needed, as in most cases the simple check 'next if !$data' after calling next_observation would be the natural way to proceed.

Print the contents of a subset in BUFR message:

  print $bufr->dumpsections($data,$descriptors,$options);

$options is optional. If this is first subset in message, will start by printing message number and, if this is first message in a GTS bulletin, AHL (Abbreviated Header Line), as well as contents of sections 0, 1 and 3. For section 4, will also print subset number. $options should be an anonymous hash with possible keys 'width' and 'bitmap', e.g. { width => 20, bitmap => 0 }. 'bitmap' controls which of dumpsection4 and dumpsection4_with_bitmaps will be called internally by dumpsections. Default value for 'bitmap' is 1, causing dumpsection4_with_bitmaps to be called. 'width' controls the value of $width used by the dumpsection4... methods, default is 15. If you intend to provide the output from dumpsections as input to reencode_message, be sure to set 'bitmap' to 0, and 'width' not smaller than the largest data width in bytes among the descriptors with unit CCITTIA5 occurring in the message.

Normally dumpsections is called after next_observation, with same arguments $data,$descriptors as returned from this call. From the examples given at https://wiki.met.no/bufr.pm/start#bufrreadpl you can get an impression of what the output might look like. If dumpsections does not give you exactly what you want, you might prefer to instead call the individual dumpsection methods below.

Print the contents of sections 0-3 in BUFR message:

  print $bufr->dumpsection0();
  print $bufr->dumpsection1();
  print $bufr->dumpsection2($sec2_code_ref);
  print $bufr->dumpsection3();

dumpsection2 returns an empty string if there is no optional section in the message. The argument should be a reference to a subroutine which takes the optional section as (a string) argument and returns the text you want displayed after the 'Length of section:' line. For general BUFR messages probably the best you can do is displaying a hex dump, in which case

  sub {return '    Hex dump:' . ' 'x26 . unpack('H*',substr(shift,4))}

might be a suitable choice for $sec2_code_ref. For most applications there should be no real need to call dumpsection2.

Print the data of a subset (descriptor, value, name and unit):

  print $bufr->dumpsection4($data,$descriptors,$width);
  print $bufr->dumpsection4_with_bitmaps($data,$descriptors,$width);

$width fixes the number of characters used for displaying the data values, and is optional (defaults to 15). $data and $descriptors are references to arrays of data values and BUFR descriptors respectively, likely to have been fetched from next_observation. Code and flag values will be resolved if a C table has been loaded, i.e. if load_Ctable has been called earlier on. dumpsection4_with_bitmaps will display the bit-mapped values side by side with the corresponding data values. If there is no bit-map in the BUFR message, dumpsection4_with_bitmaps will provide same output as dumpsection4. See "DECODING/ENCODING" for some more information about what is printed, and https://wiki.met.no/bufr.pm/start#bufrreadpl for real life examples of output.

Set verbose level:

  Geo::BUFR->set_verbose($level); # 0 <= $level <= 6
  $bufr->set_verbose($level);

Some info about what is going on in Geo::BUFR will be printed to STDOUT if $level > 0. With $level set to 1, all that is printed is the B, C and D tables used (with full path). Each line of verbose output starts with 'BUFR.pm: ', except for the level 6 specific output. Setting verbose level > 1 might be helpful when debugging, or for example if you want to extract as much information as possible from an incorrectly formatted BUFR message.

No decoding of section 4 (data section):

  Geo::BUFR->set_nodata($n);
 - $n=1 (or not provided): Skip decoding of section 4 (might speed up
   processing considerably if only metadata in section 1-3 is sought for)
 - $n=0: Decode section 4 (default in Geo::BUFR)

No decoding of quality information:

  Geo::BUFR->set_noqc($n);
 - $n=1 (or not provided): Don't decode quality information (more
   specifically: skip all descriptors after 222000)
 - $n=0: Decode quality information (default in Geo::BUFR)

Enable/disable strict checking of BUFR format for recoverable errors (like using BUFR compression for one subset message etc):

  Geo::BUFR->set_strict_checking($n);
 - $n=0: disable checking (default in Geo::BUFR)
 - $n=1: warn (carp) if error but continue decoding
 - $n=2: die (croak) if error

See "STRICT CHECKING" for details of what is being checked if strict checking is enabled.

Show all BUFR table C operators (data description operators, F=2) as well as all replication descriptors (F=1) when calling dumpsection4:

  Geo::BUFR->set_show_all_operators($n);
 - $n=1 (or not provided): Show replication descriptors and all operators
 - $n=0: Show no replication descriptors and only the really informative
         data description operators (default in Geo::BUFR)

set_show_all_operators(1) cannot be combined with dumpsections with bitmap option set (which is the default).

Set or get tableformat:

  Geo::BUFR->set_tableformat($tableformat);
  $tableformat = Geo::BUFR->get_tableformat();

Set or get tablepath:

  Geo::BUFR->set_tablepath($tablepath);
  $tablepath = Geo::BUFR->get_tablepath();

Get table version:

  $table_version = $bufr->get_table_version($table);

$table is optional. Return table version from $table if provided, or else from section 1 information in the currently processed BUFR message. For BUFRDC, this is a stripped down version of table name. If for example $table = 'B0000000000088013001.TXT', will return '0000000000088013001'. For ECCODES, this is last part of the path to the table location (e.g. '0/wmo/29'), and a stringified list of two such paths (master and local) if local tables are used (e.g. '0/wmo/29,0/local/8/78/236'). Returns undef if impossible to determine table version.
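For BUFRDC, the stripping of the table name can be pictured like this. The helper below is hypothetical and only mirrors the example above (leading table letter and .TXT extension removed); it is not taken from the Geo::BUFR source and assumes that the version part consists of digits only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Geo::BUFR: derive a BUFRDC table
# version from a table file name, or undef if the name does not look
# like [BCD]<digits>.TXT
sub bufrdc_version_from_name {
    my ($table) = @_;
    return undef if !defined $table;
    return $table =~ /^[BCD](\d+)\.TXT$/ ? $1 : undef;
}

my $version = bufrdc_version_from_name('B0000000000088013001.TXT');
print "$version\n";
```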

Get number of subsets:

  $nsubsets = $bufr->get_number_of_subsets();

Get current subset number:

  $subset_no = $bufr->get_current_subset_number();

If decoding of section 4 has been skipped (due to use of set_nodata or set_filter_cb), will return number of subsets. For a BUFR message with 0 subsets, will actually return 1 (a bit weird perhaps, but then this is a really weird kind of BUFR message to handle).

Get current message number:

  $message_no = $bufr->get_current_message_number();

Get current BUFR message:

  $binary_msg = $bufr->get_bufr_message();

This returns the original raw (binary, not the decoded) BUFR message. An empty string will be returned if no BUFR message is found, or if the currently processed BUFR message is erroneous (even if section 4 is not decoded, there will at least be a check for finding '7777' at expected end of BUFR message, as calculated from length of BUFR message decoded from section 0).

Get Abbreviated Header Line (AHL) before current message:

  $ahl = $bufr->get_current_ahl();

Get GTS starting line before current message:

  $gts_starting_line = $bufr->get_current_gts_starting_line();

Get GTS end of message after current message:

  $gts_eom = $bufr->get_current_gts_eom();

Currently supporting the notation of the International Alphabet No. 5, i.e. \001\r\r\n<csn>\r\r\n for GTS starting line with 3 or 5 digits for <csn> (channel sequence number), and \r\r\n\003 for GTS end of message. But ZCZC/NNNN notation (International Telegraph Alphabet No. 2) might be provided in a future version of Geo::BUFR if requested.

Note that the definition of GTS starting line and AHL used in Geo::BUFR differs slightly from that of the Manual on the GTS. In the Manual the Abbreviated heading actually starts with \r\r\n, which in Geo::BUFR for convenience is considered part of the GTS starting line, since this provides for nicer output when displaying AHLs.
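To make the International Alphabet No. 5 notation concrete, here is a small sketch of regular expressions matching the GTS starting line and end of message as defined above. The helper split_bulletin is a hypothetical name introduced for illustration, and the patterns are not copied from the Geo::BUFR source.

```perl
use strict;
use warnings;

# \x01 and \x03 are the same bytes as the octal escapes \001 and \003
my $GTS_START = qr/\x01\r\r\n(\d{5}|\d{3})\r\r\n/;   # <csn>: 3 or 5 digits
my $GTS_EOM   = qr/\r\r\n\x03/;

# Hypothetical helper: return channel sequence number and the text
# between GTS starting line and GTS end of message (AHL followed by
# the BUFR message), or the empty list if the envelope is not found
sub split_bulletin {
    my ($bulletin) = @_;
    return unless $bulletin =~ /\A$GTS_START(.*)$GTS_EOM\z/s;
    return ($1, $2);
}
```

With this definition the \r\r\n preceding the AHL is consumed as part of the starting line, in line with the convention described above.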

Check length of BUFR message (as stated in section 0):

  $bufr->bad_bufrlength();

Will return true (1) if no '7777' is found at the end of BUFR message (as calculated from the stated length of BUFR message in section 0), which usually means that the BUFR message is badly corrupted (e.g. truncated). But note that there should be no need to call bad_bufrlength if section 4 is decoded, as in this case you should expect next_observation to die with a more precise error message describing the kind of corruption found. If no decoding of section 4 is done (because set_nodata or set_filter_cb were called), however, next_observation is likely not to throw an error, and you can use bad_bufrlength to decide what to do next (see source code of bufrextract.pl for example of use).

Accessor methods for section 0-3:

  $bufr->set_<variable>($variable);
  $variable = $bufr->get_<variable>();

where <variable> is one of

  bufr_length (get only)
  bufr_edition
  master_table
  subcentre
  centre
  update_sequence_number
  optional_section (0 or 1)
  data_category
  int_data_subcategory
  loc_data_subcategory
  data_subcategory
  master_table_version
  local_table_version
  year_of_century
  year
  month
  day
  hour
  minute
  second
  local_use
  number_of_subsets
  observed_data (0 or 1)
  compressed_data (0 or 1)
  descriptors_unexpanded

set_year_of_century(0) will set year of century to 100. get_year_of_century will for BUFR edition 4 calculate year of century from year in section 1.
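The relation between year and year of century can be sketched as below. This follows the BUFR edition 3 convention that year 2000 is represented as 100; that Geo::BUFR computes it in exactly this way is an assumption, and year_of_century is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: year of century as used in BUFR edition 3,
# where a year ending in 00 is represented as 100, not 0
sub year_of_century {
    my ($year) = @_;
    my $yoc = $year % 100;
    return $yoc == 0 ? 100 : $yoc;
}

printf "%d -> %d\n", $_, year_of_century($_) for 1999, 2000, 2023;
```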

Encode a new BUFR message:

  $new_message = $bufr->encode_message($data_refs,$desc_refs);

where $desc_refs->[$i] is a reference to the array of fully expanded descriptors for subset number $i ($i=1 for first subset), $data_refs->[$i] is a reference to the corresponding values, using undef for missing values. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method. See "DECODING/ENCODING" for meaning of 'fully expanded descriptors'.
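The layout of $data_refs and $desc_refs might be easier to grasp from a small sketch. The descriptors (WMO block number, WMO station number, air temperature) and the values are chosen for illustration only; what matters is that index 0 is left unused and that each subset has one value per descriptor.

```perl
use strict;
use warnings;

# Fully expanded descriptors, the same for both subsets in this example
my @descriptors = qw(001001 001002 012101);

# One array of values per subset, undef marking a missing value
my @subsets = (
    [1, 492, 275.15],
    [1, 415, undef],
);

my ($data_refs, $desc_refs);
my $i = 0;
for my $values (@subsets) {
    $i++;                              # subset numbering starts at 1
    $data_refs->[$i] = [ @$values ];
    $desc_refs->[$i] = [ @descriptors ];
}
my $nsubsets = $i;

# With metadata for section 0, 1 and 3 set in $bufr (including
# number of subsets), the message could then be encoded with
#   my $new_message = $bufr->encode_message($data_refs, $desc_refs);
```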

Encode a (single subset) NIL message:

  $new_message = $bufr->encode_nil_message($stationid_ref,$delayed_repl_ref);

$delayed_repl_ref is optional. In section 4 all values will be set to missing except delayed replication factors and the (descriptor, value) pairs in the hashref $stationid_ref. $delayed_repl_ref (if provided) should be a reference to an array of data values for all descriptors 031001 and 031002 occurring in the message (these values must all be nonzero), e.g. [3,1,2] if there are 3 such descriptors which should have values 3, 1 and 2, in that succession. If $delayed_repl_ref is omitted, all delayed replication factors will be set to 1. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method (although number of subsets and BUFR compression will automatically be set to 1 and 0 respectively, whatever value they had before).

Reencode BUFR message(s):

  $new_messages = $bufr->reencode_message($decoded_messages,$width);

$width is optional. Takes a text $decoded_messages as argument and returns a (binary) string of BUFR messages which, when printed to file and then processed by bufrread.pl with no output modifying options set (except possibly --width), would give output equal to $decoded_messages. If bufrread.pl is to be called with --width $width, this $width must be provided to reencode_message also.

Join subsets from several messages:

 ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr_1,$subset_ref_1,
     ... $bufr_n,$subset_ref_n);

where each $subset_ref_i is optional. Will return the data and descriptors needed by encode_message to encode a multi subset message, extracting the subsets from the first message of each $bufr_i object. All subsets in (first message of) $bufr_i will be used, unless next argument is an array reference $subset_ref_i, in which case only the subset numbers listed will be included, in the order specified. On return $nsub will contain the total number of subsets thus extracted. After a call to join_subsets, the metadata (of the first message) in each object will be available through the get_-methods, while a call to next_observation will start extracting the first subset in the first message. Here is an example of use, fetching first subset from bufr object 1, all subsets from bufr object 2, and subsets 4 and 2 from bufr object 3, then building up a new multi subset BUFR message (which will succeed only if the bufr objects all have the same descriptors in section 3):

  my ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr1,
      [1],$bufr2,$bufr3,[4,2]);
  my $new_bufr = Geo::BUFR->new();
  # Get metadata from one of the objects, then reset those metadata
  # which might not be correct for the new message
  $new_bufr->copy_from($bufr1,'metadata');
  $new_bufr->set_number_of_subsets($nsub);
  $new_bufr->set_update_sequence_number(0);
  $new_bufr->set_compressed_data(0);
  my $new_message = $new_bufr->encode_message($data_refs,$desc_refs);

Extract BUFR table B information for an element descriptor:

  ($name,$unit,$scale,$refval,$width) = $bufr->element_descriptor($desc);

Will fetch name, unit, scale, reference value and data width in bits for element descriptor $desc in the last table B loaded in the $bufr object. Returns false if the descriptor is not found.

Extract BUFR table D information for a sequence descriptor:

  @descriptors = $bufr->sequence_descriptor($desc);
  $string = $bufr->sequence_descriptor($desc);

Will return the descriptors in a direct (nonrecursive) lookup for the sequence descriptor $desc in the last table D loaded in the $bufr object. In scalar context the descriptors will be returned as a space separated string. Returns false if the descriptor is not found.

Resolve BUFR table descriptors (for printing):

  print $bufr->resolve_descriptor($how,@descriptors);

where $how is one of 'fully', 'partially', 'simply' and 'noexpand'. Returns a text string suitable for printing information about the BUFR table descriptors given. $how = 'fully': Expand all D descriptors fully into B descriptors, with name, unit, scale, reference value and width (each on a numbered line, except for replication operators which are not numbered). $how = 'partially': Like 'fully', but expand D descriptors only once and ignore replication. $how = 'noexpand': Like 'partially', but do not expand D descriptors at all. $how = 'simply': Like 'partially', but list the descriptors on one single line with no extra information provided. The relevant B/D table must have been loaded before calling resolve_descriptor.

Resolve flag table value (for printing):

  print $bufr->resolve_flagvalue($value,$flag_table,$B_table,
                                 $default_B_table,$num_leading_spaces);

Last 2 arguments are optional. $default_B_table will be used if $B_table is not found, $num_leading_spaces defaults to 0. Examples:

  print $bufr->resolve_flagvalue(4,8006,'B0000000000098013001.TXT'); # BUFRDC
  print $bufr->resolve_flagvalue(4,8006,'0/wmo/13');       # ECCODES, master table
  print $bufr->resolve_flagvalue(4,8193,'0/local/1/98/0'); # ECCODES, local table

Print the contents of BUFR code (or flag) table:

  print $bufr->dump_codetable($code_table,$table,$default_table);

where in BUFRDC $table is (base)name of the C...TXT file containing the code tables, optionally followed by a default table which will be used if $table is not found.

resolve_flagvalue and dump_codetable will return empty string if flag value or code table is not found.

Manipulate binary data (these are implemented in C for speed and primarily intended as module internal subroutines):

  $value = Geo::BUFR->bitstream2dec($bitstream,$bitpos,$num_bits);

Extracts $num_bits bits from $bitstream, starting at bit $bitpos. The extracted bits are interpreted as a nonnegative integer. Returns undef if all bits extracted are 1 bits.

  $ascii = Geo::BUFR->bitstream2ascii($bitstream,$bitpos,$num_bytes);

Extracts $num_bytes bytes from bitstream, starting at $bitpos, and interprets the extracted bytes as an ascii string. Returns undef if the extracted bytes are all 1 bits.

  Geo::BUFR->dec2bitstream($value,$bitstream,$bitpos,$bitlen);

Encodes nonnegative integer value $value in $bitlen bits in $bitstream, starting at bit $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $value. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

  Geo::BUFR->ascii2bitstream($ascii,$bitstream,$bitpos,$width);

Encodes ASCII string $ascii in $width bytes in $bitstream, starting at $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $ascii. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

  Geo::BUFR->null2bitstream($bitstream,$bitpos,$num_bits);

Sets $num_bits bits in bitstream starting at bit $bitpos to 0 bits. Last byte affected will be padded with 1 bits. $bitstream must be at least $bitpos + $num_bits bits long. The parts of $bitstream before $bitpos and after last encoded byte are not altered.
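As a pure Perl illustration of what a routine like bitstream2dec does (the module itself uses C for this), the sketch below extracts a bit field and treats all 1 bits as missing. The name bits_to_dec is hypothetical, and the sketch assumes that $bitpos counts from 0 at the most significant bit of the first byte and that $num_bits does not exceed 32.

```perl
use strict;
use warnings;

# Hypothetical pure Perl counterpart of bitstream2dec, for
# illustration only
sub bits_to_dec {
    my ($bitstream, $bitpos, $num_bits) = @_;
    # Expand the byte string into a string of '0' and '1' characters
    my $bits = substr(unpack('B*', $bitstream), $bitpos, $num_bits);
    return undef if $bits !~ /0/;      # all bits set: missing value
    return oct("0b$bits");
}

# The two bytes 0xA5 0xFF are 10100101 11111111 in binary
my $value = bits_to_dec("\xA5\xFF", 4, 4);   # bits 0101
print "$value\n";
```

Expanding the whole bitstream with unpack is wasteful for long messages, which is one reason for doing this in C in the module proper.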

DECODING/ENCODING

The term 'fully expanded descriptors' used in the description of encode_message (and next_observation) in "METHODS" might need some clarification. The short version is that the list of descriptors should be exactly those which will be written out by running dumpsection4 (or bufrread.pl without any modifying options set) on the encoded message. If you don't have a similar BUFR message at hand to use as an example when wanting to encode a new message, you might need a more specific prescription. Which is that for every data value which occurs in the section 4 bitstream, you should include the corresponding BUFR descriptor, using the artificial 999999 for associated fields following the 204Y operator, and including the data operator descriptors 22[2345]000 and 23[2567]000 with data value set to the empty string, if these occur among the descriptors in section 3 (rather: in the expansion of these, use bufrresolve.pl to check!). Element descriptors defining new reference values (following the 203Y operator) will have F=0 (first digit in descriptor) replaced with F=9 in next_observation, while in encode_message both F=0 and F=9 will be accepted for new reference values. When encoding delayed repetition you should repeat the set of data (and descriptors) to be repeated the number of times indicated by 031011 or 031012 (if given the feedback that this is considered cumbersome, an option for including the set of data/descriptors just once might be added later, both for encoding and decoding).

Some words about the procedure used for decoding and encoding data in section 4 might shed some light on this choice of design.

When decoding section 4 for a subset, first of all the BUFR descriptors provided in section 3 are expanded as far as possible without looking at the actual bitstream, i.e. by eliminating nondelayed replication descriptors (F=1) and by using BUFR table D to expand sequence descriptors (F=3). Then, for each of the thus expanded descriptors, the data value is fetched from the bitstream according to the prescriptions in BUFR table B, applying the data operator descriptors (F=2) from BUFR table C as they are encountered, and reexpanding the remaining descriptors every time a delayed replication factor is fetched from bitstream. The resulting set of data values is returned in an array @data, with the corresponding B (and sometimes also some C) BUFR table descriptors in an array @descriptors. next_observation returns references to these two arrays. For convenience, some of the data operator descriptors without a corresponding data value (like 222000) are included in the @descriptors because they are considered to provide valuable information to the user, with corresponding value in @data set to the empty string. These descriptors without a value are written by the dumpsection4 methods on unnumbered lines, thereby distinguishing them from descriptors corresponding to 'real' data values in section 4, which are numbered consecutively.
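The descriptor classes mentioned above are distinguished by the first digit F of the 6 digit descriptor FXXYYY. A minimal sketch of splitting a descriptor into F, X and Y follows; split_descriptor and %KIND are hypothetical names, not part of Geo::BUFR, and F=9 is included only because Geo::BUFR uses it as a marker as described earlier in this section.

```perl
use strict;
use warnings;

my %KIND = (
    0 => 'element descriptor (table B)',
    1 => 'replication descriptor',
    2 => 'operator descriptor (table C)',
    3 => 'sequence descriptor (table D)',
    9 => 'Geo::BUFR marker (new reference value or associated field)',
);

# Hypothetical helper: split FXXYYY into F, X, Y and a description
sub split_descriptor {
    my ($desc) = @_;
    my ($f, $x, $y) = $desc =~ /^(\d)(\d\d)(\d\d\d)$/
        or die "Not a 6 digit descriptor: $desc\n";
    return ($f, $x + 0, $y + 0, $KIND{$f} // 'unknown');
}

# 106000: replicate the next 6 descriptors, Y=0 meaning that the
# number of replications is delayed (fetched from the bitstream)
my ($f, $x, $y, $kind) = split_descriptor('106000');
print "F=$f X=$x Y=$y: $kind\n";
```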

Encoding a subset is done in a very similar way, by expanding the descriptors in section 3 as described above, but instead fetching the data values from the @data array that the user supplies (actually @{$data_refs->[$i]} where $i is subset number), and then finally encoding this value to bitstream.

The input parameter $desc_refs to encode_message is in fact not strictly necessary to be able to encode a new BUFR message. But there is a good reason for requiring it. During encoding the descriptors from expanding section 3 will consecutively be compared with the descriptors in the user supplied $desc_refs, and if these at some point differ, encoding will be aborted with an error message stating the first descriptor which deviated from the expected one. Requiring $desc_refs as input thus greatly reduces the risk of encoding an erroneous section 4, and also provides the user with highly valuable debugging information if encoding fails.

When decoding character data (unit CCITTIA5), any null characters found are silently (unless $Strict_checking is set) removed, as well as leading and trailing white space.
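The cleaning of character data amounts to something like the following sketch. The helper name clean_ccittia5 is hypothetical, and the order of the two steps is an assumption rather than taken from the Geo::BUFR source.

```perl
use strict;
use warnings;

# Hypothetical helper: remove null characters, then leading and
# trailing white space, from a decoded CCITTIA5 value
sub clean_ccittia5 {
    my ($raw) = @_;
    (my $text = $raw) =~ tr/\0//d;     # drop null characters
    $text =~ s/\A\s+|\s+\z//g;         # trim both ends
    return $text;
}

my $name = clean_ccittia5("  OSLO - BLINDERN\0\0  ");
print "[$name]\n";
```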

BUFR TABLE FILES

The BUFR table files should follow the format and naming conventions used by one of these two ECMWF software packages: either BUFRDC (download from https://confluence.ecmwf.int/display/BUFR/Releases), or ecCodes (download from https://confluence.ecmwf.int/display/ECC/Releases).

The utility programs in Geo::BUFR will look for table files by default in the standard installation directories, which in Unix-like systems will be /usr/local/lib/bufrtables for BUFRDC and /usr/local/share/eccodes/definitions/bufr/tables for ecCodes. You can change that behaviour by either providing the environment variable BUFR_TABLES, or setting path explicitly by using the --tablepath option. Note that while BUFR_TABLES is a well known concept in BUFRDC software, the closest you get in ecCodes is probably ECCODES_DEFINITION_PATH (see e.g. https://confluence.ecmwf.int/display/ECC/BUFR%3A+Local+configuration), for which BUFR_TABLES should (or could) be set to ECCODES_DEFINITION_PATH/bufr/tables (again in Unix-like systems).
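If you want the same lookup in your own program, it could be sketched as below. The helper resolve_tablepath is hypothetical, and the precedence used (explicit path first, then BUFR_TABLES, then the default directory) is an assumption about how the utility programs behave, not taken from their source.

```perl
use strict;
use warnings;

# Default directories on Unix-like systems, as listed above
my %DEFAULT_TABLEPATH = (
    BUFRDC  => '/usr/local/lib/bufrtables',
    ECCODES => '/usr/local/share/eccodes/definitions/bufr/tables',
);

# Hypothetical helper: decide which table path to use
sub resolve_tablepath {
    my ($tableformat, $explicit) = @_;
    return $explicit if defined $explicit;
    return $ENV{BUFR_TABLES} if $ENV{BUFR_TABLES};
    return $DEFAULT_TABLEPATH{ uc $tableformat };
}

# Typical use together with Geo::BUFR would then be
#   Geo::BUFR->set_tableformat('ECCODES');
#   Geo::BUFR->set_tablepath(resolve_tablepath('ECCODES'));
```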

STRICT CHECKING

The package global $Strict_checking defaults to

  0: Ignore recoverable errors in BUFR format met during decoding or encoding

but can be changed to

  1: Issue warning (carp) but continue decoding/encoding

  2: Croak (die) instead of carp

by calling set_strict_checking. The following is checked for when $Strict_checking is set to 1 or 2:

  • Total length of BUFR message as stated in section 0 bigger than actual length

  • Excessive bytes in section 4 (section longer than computed from section 3)

  • Compression set in section 3 for one subset message (BUFR reg. 94.6.3.2)

  • Bits 3-8 in octet 7 in section 3 not set to zero

  • Local reference value for compressed character data not having all bits set to zero (94.6.3.2.i)

  • Illegal flag values (rightmost bit set for non-missing values) (Note (9) to Table B in FM 94 BUFR)

  • Character data not being CCITTIA5 (Note (9) in FM 94 BUFR first page)

  • Null characters in CCITTIA5 data (Note (4) to Table B in FM 94 BUFR)

  • Missing CCITTIA5 value encoded as spaces

  • Invalid date and/or time in section 1

  • Cancellation operators (20[1-4]00, 203255 etc) when there is nothing to cancel

  • 0 subsets in message. This may not break any formal rules, but is likely to cause problems in further data processing (and Geo::BUFR will not allow you to encode or reencode such a message anyway).

  • Leaving out descriptors to be repeated when corresponding delayed replication/repetition factor in section 4 is 0 and this is last data item. E.g. ending 'Data descriptors unexpanded' in section 3 with '106000 031001' when data value for 031001 is 0. This (mal)practice, however, defies the very point of replication operations (BUFR reg. 94.5.4). Presumably the purpose is to save some space in the BUFR message, but then why not leave out also '106000 031001' and the (0) data value for 031001?

  • Value encoded using BUFR compression which would be too big to encode without compression. For example, for a data descriptor with data width 9 bits a value of 510 ought to be the biggest value possible to encode, but in a multisubset message using BUFR compression it is possible to encode almost arbitrarily large values in single subsets as long as the average over all subsets is contained within 9 bits. This is not breaking any formal rules, but almost certainly not desirable.

Plus some few more checks not considered interesting enough to be mentioned here.
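The figure 510 in the last check above comes from the fact that all bits set to 1 is reserved for missing value. As a small worked example (max_encodable is a hypothetical helper, working on the raw bit value before scale and reference value are applied):

```perl
use strict;
use warnings;

# Hypothetical helper: biggest raw value which can be encoded in
# $width_bits bits, 2**$width_bits - 1 (all 1 bits) meaning missing
sub max_encodable {
    my ($width_bits) = @_;
    return 2**$width_bits - 2;
}

printf "%2d bits: max raw value %d\n", $_, max_encodable($_) for 7, 9, 12;
```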

BUGS OR MISSING FEATURES

Some BUFR table C operators are not implemented or are untested, mainly because I do not have access to BUFR messages containing such operators. If you happen to come across a BUFR message which the current module fails to decode properly, I would therefore highly appreciate it if you could mail it to me.

AUTHOR

Pål Sannes <pal.sannes@met.no>

CREDITS

I am very grateful to Alvin Brattli, who (while employed as a researcher at the Norwegian Meteorological Institute) wrote the first version of this module, with the sole purpose of being able to decode some very specific BUFR satellite data, but still provided the main framework upon which this module is built.

SEE ALSO

Guide to WMO Table Driven Code Forms: FM 94 BUFR and FM 95 CREX; Layer 3: Detailed Description of the Code Forms (for programmers of encoder/decoder software)

https://wiki.met.no/bufr.pm/start

COPYRIGHT

Copyright (C) 2010-2023 MET Norway

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.