NAME

Hadoop::IO::RCFile::Reader

VERSION

version 0.003

SYNOPSIS

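   use Net::Hadoop::WebHDFS;
   use Hadoop::IO::RCFile::Reader;

   # Host and port are placeholders; point them at your own WebHDFS endpoint.
   my $webhdfs_client = Net::Hadoop::WebHDFS->new(host => 'namenode.example.com', port => 50070);
   my $table_reader = Hadoop::IO::RCFile::Reader->new(
       directory      => '/user/hive/warehouse/sabbir.db/',
       webhdfs_client => $webhdfs_client,
   );
   while ($table_reader->next()) {
       my $current_row = $table_reader->current_row();
   }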

DESCRIPTION

This module decodes an RCFile-based Hive table and reads rows from it. It reads the files directly from HDFS, so no partition information is available; only the data stored in the files is read. Users need to handle partition information themselves.
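Since partition values are not part of the rows, one possible way to recover them is to parse the file path. The following is a minimal sketch, assuming the table uses Hive's default key=value directory naming for partitions. The helper partitions_from_path and the example path are hypothetical and are not part of this module; values are returned exactly as they appear in the path.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Hadoop::IO::RCFile::Reader): extract
# partition columns from a Hive-style path, assuming the default
# "key=value" directory naming.
sub partitions_from_path {
    my ($path) = @_;
    my %partition;
    for my $segment (split m{/}, $path) {
        # Only path segments of the form key=value count as partitions.
        if ($segment =~ /\A([^=]+)=(.*)\z/) {
            $partition{$1} = $2;
        }
    }
    return \%partition;
}

# The path below is made up for illustration.
my $partition = partitions_from_path(
    '/user/hive/warehouse/sabbir.db/events/dt=2023-01-15/country=NL/000000_0'
);
printf "%s=%s\n", $_, $partition->{$_} for sort keys %$partition;
```

The resulting hash can then be combined with each row returned by the reader for the corresponding file.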

The documentation about the file format can be found here: https://hive.apache.org/javadocs/r2.1.1/api/org/apache/hadoop/hive/ql/io/RCFile.html

NAME

Hadoop::IO::RCFile::Reader - Read an RCFile-based Hive table from HDFS through the WebHDFS API

METHODS

new

The constructor. Accepts parameters in key => value format.

directory

Path of the directory or file to read.

webhdfs_client

A Net::Hadoop::WebHDFS client.

next

Moves the current row pointer to the next row. It must be called before reading any row; the first call makes the first row the current row. Returns true if the pointer was moved to the next row, and false if no more rows are available to read.

current_row

Returns the current row as an array reference holding the column values in left-to-right order.
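The next/current_row protocol described above can be sketched as a small loop that collects all rows. To keep the example runnable without a cluster, it uses StubReader, a hypothetical stand-in that only mimics the two documented methods; collect_rows is likewise an illustrative helper, not part of this module. With a real table, the object returned by Hadoop::IO::RCFile::Reader->new would be passed instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in offering the same next/current_row interface,
# so the loop below can run without HDFS access.
package StubReader;

sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows], index => -1 }, $class;
}

sub next {
    my ($self) = @_;
    # Refuse to advance past the last row.
    return 0 if $self->{index} + 1 >= @{ $self->{rows} };
    $self->{index}++;
    return 1;
}

sub current_row {
    my ($self) = @_;
    return $self->{rows}[ $self->{index} ];
}

package main;

# Collect every row from any object that offers next() and current_row().
sub collect_rows {
    my ($reader) = @_;
    my @rows;
    while ($reader->next()) {
        # Copy the columns in case the reader reuses its internal buffer
        # (an assumption; the documentation does not say either way).
        push @rows, [ @{ $reader->current_row() } ];
    }
    return \@rows;
}

my $rows = collect_rows(StubReader->new([1, 'alice'], [2, 'bob']));
print join("\t", @$_), "\n" for @$rows;
```

Collecting everything in memory is only reasonable for small tables; for larger ones, processing each row inside the while loop avoids holding the whole table at once.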

AUTHORS

  • Philippe Bruhat

  • Sabbir Ahmed

  • Somesh Malviya

  • Vikentiy Fesunov

COPYRIGHT AND LICENSE

This software is copyright (c) 2023 by Booking.com.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.