NAME

Treex::PML::IO - I/O support functions used by Treex::PML

DESCRIPTION

This module implements various I/O and filesystem related functions used by Treex::PML.

The current implementation supports the following protocols:

  http, https, ftp, gopher, news - reading (POSIX and Windows)

  ssh, fish, sftp - reading/writing on POSIX systems via secure shell copy
                    or the kioclient from KDE.

The module attempts to handle GNU Zip-compressed files (suffix .gz) transparently.

FUNCTIONS

DirPart($path)

Returns the directory part of a given path (including the volume).

CallerDir($rel_path?)

If called without an argument, returns the directory of the Perl module or macro file that invoked this function.

If a relative path is given as an argument, the corresponding absolute path is computed from the caller's directory and returned.

register_input_protocol_handler($scheme,$callback)

Registers a callback to fetch URIs of a given protocol. $scheme is the URI scheme of the protocol (i.e. the first part of a URI, preceding the colon, e.g. 'ftp' or 'https'). $callback is either a CODE reference or an ARRAY reference whose first element is a CODE reference and whose remaining elements are additional arguments passed to the callback before the standard arguments.

When the library attempts to fetch a resource from a URI matching the given scheme, the callback is invoked with the (optional) user parameters followed by the URI.

The callback function must return a new URI (typically a file:// URI pointing to a temporary file) and a boolean flag indicating whether the library should attempt to delete the returned file after it has finished reading.

If the callback returns the same URI or another URI with the same $scheme, the callback is not reinvoked; the returned URI is passed on to further processing (i.e. by Treex::PML I/O backends).
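The following is a minimal sketch of a handler and its registration. The scheme name 'myproto', the function fetch_myproto and the cache directory are hypothetical and serve only as illustration. The sketch assumes that the registration function can be called by its fully qualified name, and it registers the handler only if Treex::PML::IO can be loaded.

```perl
use strict;
use warnings;

# Hypothetical handler for a made-up 'myproto' scheme. It receives the extra
# user argument ($cache_dir) first, followed by the URI being fetched.
sub fetch_myproto {
    my ($cache_dir, $uri) = @_;
    (my $name = "$uri") =~ s{^myproto:/*}{};   # drop the scheme prefix
    $name =~ s{[^A-Za-z0-9._-]}{_}g;           # reduce the rest to a safe file name
    my $local = "file://$cache_dir/$name";
    # The second value is the unlink flag: false here, so the library
    # should leave the cached file in place after reading it.
    return ($local, 0);
}

# Register only if the module is available, so the sketch runs either way.
if (eval { require Treex::PML::IO; 1 }) {
    Treex::PML::IO::register_input_protocol_handler(
        'myproto', [ \&fetch_myproto, '/tmp/myproto-cache' ]
    );
}
```

Passing the handler as an ARRAY reference is what lets the cache directory reach the callback ahead of the URI.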

unregister_input_protocol_handler($scheme)

Unregister a handler for a given URI scheme.

get_input_protocol_handler($scheme)

Returns the user-defined handler registered for a given URI scheme; if none, undef is returned.

set_encoding($filehandle, $encoding)

Safely resets the Perl I/O layers on a given filehandle to decode from or encode to a given encoding. This is equivalent to:

   binmode($filehandle,":raw:perlio:encoding($encoding)");

except that errors are turned into warnings.
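The idea of turning errors into warnings can be sketched as follows. The function set_encoding_sketch is a hypothetical stand-in written for illustration and is not the module's own code; the temporary file only provides something to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative stand-in: apply the layers and downgrade any failure
# (an exception or a false return value) to a warning.
sub set_encoding_sketch {
    my ($fh, $encoding) = @_;
    my $ok = eval { binmode($fh, ":raw:perlio:encoding($encoding)") };
    warn "Cannot set encoding '$encoding': " . ($@ || $!) . "\n" unless $ok;
    return $ok ? 1 : 0;
}

# Write a few UTF-8 encoded bytes to a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "p\xc5\x99\xc3\xadklad\n";
close($tmp_fh) or die "close: $!";

# Reopen the file and read one line through the decoding layer.
open(my $fh, '<', $tmp_name) or die "open: $!";
my $set_ok = set_encoding_sketch($fh, 'UTF-8');
my $line = <$fh>;
close($fh);
```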

get_protocol($filename_or_URI)

If the argument is a filename, returns 'file'; if the argument is a URI, returns the URI's scheme. Note: unless the argument is a URI object, a heuristic is used to determine the scheme. To avoid reporting Windows drive names as URI schemes, only URI schemes consisting of at least two characters are supported, i.e. C:foo is considered a file name whereas CC:foo would be a URI with the scheme 'CC'.
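The two-character rule can be illustrated with a small sketch. The helper guess_protocol is hypothetical and the module's actual heuristic may differ in detail; the sketch only shows how a drive letter is kept apart from a scheme.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the two-character rule.
sub guess_protocol {
    my ($name) = @_;
    # A scheme starts with a letter and must be at least two characters
    # long here, so a single drive letter never matches.
    return $1 if $name =~ /^([A-Za-z][A-Za-z0-9+.-]+):/;
    return 'file';
}

for my $name ('C:foo', 'CC:foo', 'https://example.org/a.pml', 'data/a.pml') {
    printf "%-28s => %s\n", $name, guess_protocol($name);
}
```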

quote_filename($string)

Returns the given string in shell quotes, with the special characters (\, $, ") escaped.
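A minimal sketch of this kind of quoting follows. It assumes that the result is wrapped in double quotes, since the listed characters are the ones that need escaping there. The function quote_sketch is hypothetical and not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: backslash-escape \, $ and ", then wrap the
# result in double quotes.
sub quote_sketch {
    my ($string) = @_;
    $string =~ s/([\\\$"])/\\$1/g;
    return qq{"$string"};
}

print quote_sketch('my "draft" $1.pml'), "\n";
```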

get_filename($URI_or_filename)

Upgrades the given string to a URI and, if the resulting URI is in the 'file' scheme (e.g. file:///bar/baz), returns the file-name portion of the URI (e.g. /bar/baz). Otherwise returns nothing.

make_abs_URI($URL_or_filename)

Upgrades a given string (URL or filename) into a URI object with an absolute path (relative URIs are resolved using the current working directory obtained via Cwd::getcwd()).

make_URI($URL_or_filename)

Upgrades a given string (URL or filename) into a URI object.

make_relative_URI($URL,$baseURI)

Returns a URI relative to a given base URI. The arguments are automatically upgraded using make_URI() if necessary.

strip_protocol($URI)

Returns the scheme-specific part of the URI (everything between the scheme and the fragment). If the scheme of the URI was 'file', returns the URI as a file name.

is_same_filename($URI_1,$URI_2)

Checks if $URI_1 and $URI_2 point to the same resource. For filenames and URIs in the 'file' scheme, checks that the referenced files (if they exist) are the same using is_same_file(); for other schemes, simply checks for string equality on canonical versions of the URIs (see URI->canonical).

is_same_file($filename_1,$filename_2)

Uses device and i-node numbers (reported by stat()) to check if the two filenames point to the same file on the filesystem. Returns 1 if yes, 0 otherwise.
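The comparison can be sketched with Perl's built-in stat(), whose first two return values are the device and i-node numbers. The function same_file_sketch is a hypothetical illustration, not the module's code. The demonstration reaches the same file through two differently spelled paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: two paths name the same file if their device
# and i-node numbers agree; a failed stat() counts as "not the same".
sub same_file_sketch {
    my ($file_1, $file_2) = @_;
    my @stat_1 = stat($file_1) or return 0;
    my @stat_2 = stat($file_2) or return 0;
    return ($stat_1[0] == $stat_2[0] && $stat_1[1] == $stat_2[1]) ? 1 : 0;
}

my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt') {
    open(my $fh, '>', "$dir/$name") or die "open: $!";
    print {$fh} "data\n";
    close($fh) or die "close: $!";
}

print same_file_sketch("$dir/a.txt", "$dir/./a.txt"), "\n";  # same file, different spelling
print same_file_sketch("$dir/a.txt", "$dir/b.txt"), "\n";    # two distinct files
```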

open_pipe($filename,$mode,$command)

Returns a filehandle of a newly opened pipe in a given mode.

In write mode ($mode = 'w'), opens a writing pipe to a given command redirecting the standard output of the command to a given file. Moreover, if the last suffix of the $filename is '.gz' or '.gz~', the output of the command is gzipped before saving to $filename.

In read mode ($mode = 'r'), opens a reading pipe to a given command, redirecting the content of the given file to the standard input of the command. Moreover, if the last suffix of the $filename is '.gz' or '.gz~', the content of the file is un-gzipped before it is passed to the command.

open_file($filename,$mode)

Opens a given file for reading ($mode = 'r') or writing ($mode = 'w'). If the last suffix of the filename is '.gz' or '.gz~', the data are transparently un-gzipped (when reading) or gzipped (when writing).
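Suffix-based transparent compression can be sketched with the core modules IO::Compress::Gzip and IO::Uncompress::Gunzip. The helper open_maybe_gz is hypothetical, and the module itself may implement this differently (for example through external programs).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::File;
use IO::Compress::Gzip;
use IO::Uncompress::Gunzip;

# Hypothetical helper: choose a plain or gzip-aware handle from the
# file name suffix and the mode ('r' or 'w').
sub open_maybe_gz {
    my ($filename, $mode) = @_;
    if ($filename =~ /\.gz~?$/) {
        return $mode eq 'w'
            ? IO::Compress::Gzip->new($filename)
            : IO::Uncompress::Gunzip->new($filename);
    }
    return IO::File->new($filename, $mode);
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt.gz";

# Write through the compressing handle.
my $out = open_maybe_gz($path, 'w') or die "Cannot write $path";
$out->print("hello\n");
$out->close;

# Read the same file back through the decompressing handle.
my $in = open_maybe_gz($path, 'r') or die "Cannot read $path";
my $text = $in->getline;
$in->close;
```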

fetch_file($uri)

Fetches a resource from a given URI and returns a path to a local file with the content of the resource and a boolean unlink flag. If the unlink flag is true, the caller is responsible for removing the local file when finished using it. Otherwise, the caller should not remove the file (usually when it points to the original resource). The caller may assume that the resource is already un-gzipped if the URI had the '.gz' or '.gz~' suffix.

get_store_fh ($uri, $command?)

If $command is provided, returns a writable filehandle for a pipe to a given command whose output is redirected to an uploader to the given $uri. For file URIs this simply redirects the output of the command to the given file, gzipping the data first if the $uri ends with the '.gz' or '.gz~' suffix.

If $command is not given, simply returns a writable filehandle to a given file (possibly performing gzip if the file name ends with the '.gz' or '.gz~' suffix).

Delete the resource pointed to by a given URI (if supported by the corresponding protocol handler).

rename_uri($URI_1,$URI_2)

Rename the resource pointed to by $URI_1 to $URI_2 (if supported by the corresponding protocol handlers). The URIs must point to the same physical storage.

open_backend (filename,mode,encoding?)

Opens the given file for reading or writing, depending on mode, which may be either "r" or "w". Returns the corresponding object based on the File::Handle class. Only files whose names end with '.gz' are considered to be gzip-compressed. All other files are opened using IO::File.

Optionally, in Perl 5.8 or later, you may also specify the file's character encoding.

close_backend (filehandle)

Closes the given filehandle opened by a previous call to open_backend.

open_uri (URI,encoding?)

Opens the given URI for reading and returns an object based on the File::Handle class. Because this function first copies the data into a temporary file for some types of URIs, use close_uri($fh) on the resulting filehandle to close it and clean up the temporary file.

Optionally, in Perl 5.8 or later, you may also specify the file's character encoding.

close_uri (filehandle)

Closes the given filehandle opened by a previous call to open_uri.

copy_uri ($URI_1,$URI_2)

Copy the resource pointed to by the URI $URI_1 to $URI_2. $URI_2 must be of a type that supports writing.

COPYRIGHT AND LICENSE

Copyright (C) 2006-2010 by Petr Pajas

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.