NAME

File::AptFetch - Perl interface to APT-Methods.

SYNOPSIS

# TODO:

DESCRIPTION

In short:

  • Methods are ordinary executables. Hence F::AF forks.

  • There's no command-line interface for methods. The IPC is two pipes (STDIN and STDOUT from the method's POV).

  • Each portion of communication (called a message) consists of a numerical code with explanatory text, followed by a sequence of colon (':') separated lines. A message is terminated by an empty line.

  • File::AptFetch::Cookbook has more.
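The message framing described above can be sketched roughly as follows. This is not F::AF's own code; split_messages() is a hypothetical helper, and the sample text is made up for illustration (only the overall shape, a status line, then colon-separated lines, then an empty line, comes from the description).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of F::AF): split raw method output into
# messages. Each message is a status line ("<code> <text>") followed by
# "Name: value" lines and terminated by an empty line.
sub split_messages {
    my ($raw) = @_;
    my @messages;
    for my $chunk (split /\n{2,}/, $raw) {
        my ($status_line, @fields) = grep { length } split /\n/, $chunk;
        next unless defined $status_line;
        my ($code, $text) = $status_line =~ /^(\d{3})\s+(.+)$/
            or die "not a status line: $status_line";
        my %header;
        for my $field (@fields) {
            my ($name, $value) = $field =~ /^([^:\s]+):\s*(.*)$/
                or die "not a header line: $field";
            $header{$name} = $value;
        }
        push @messages, { code => $code, text => $text, header => \%header };
    }
    return @messages;
}

# Illustrative input only; real methods decide what they actually send.
my $raw = "100 Capabilities\nVersion: 1.0\nSingle-Instance: true\n\n"
        . "201 URI Done\nURI: file:///tmp/a\nFilename: /tmp/b\n\n";
my @got = split_messages($raw);
printf "%s %s (%d headers)\n",
    $_->{code}, $_->{text}, scalar keys %{ $_->{header} } for @got;
```

F::AF itself reads the pipe line by line rather than slurping; the sketch only shows where one message ends and the next begins.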

(disclaimer) Right now, F::AF is in a "proof-of-concept" state. It surely works with local methods (file and copy); I hope it will work in trivial cases with remote methods. (F::AF has no means to accept, let alone pass along, authentication credentials; so if your upstream needs authentication, F::AF is of no help here.) And one more warning: you're supposed to do all the dirty work of management yourself -- F::AF is only for communication. Hopefully, someday there will be a kind of super-module that simplifies all this.

(warning) You should understand one potential problem with F::AF: wget(1), curl(1), various FTP clients, or whatever else constitutes a fetcher are (I hope) thoroughly tested against a monkey wrench on the other side of the connection. APT methods are not. APT talks to repositories; those repositories are mostly mirrors. Administrators of mirrors and mirror-net roots have at least a basic clue. How APT methods behave when they face idiots on the other side of the connection is yet to be discovered.

Here is a list of known bugs, caveats, and deficiencies.

  • At two points F::AF reads and writes pipes. SIGALRM and SIGPIPE are of concern (SIGCHLD support only reports that signal; the signal itself is ignored). However, it's possible that eval would be broken by some other signal. Hopefully, some day I'll find another way to handle such a situation. Right now F::AF will die.

  • It seems that in normal operation no zombies are left behind. However, I'm not sure waitpid will work as expected. (What if some method takes a long time to die after being signaled?)

  • SIGCHLD is ignored by default. SIGPIPE is not. It's handled only while interacting with a child. If a method decides to die outside those IPC sections, then your process will get SIGCHLD and possibly SIGPIPE. (To be honest, maybe I'm overly pessimistic here: if a process goes away it becomes a zombie; if it hasn't closed its input (your output), then the pipe should stay, so there's no place for SIGPIPE. Should verify.)

  • Methods are supposed (or not?) to write extra diagnostics to their STDERR. It stays the same as your process's STDERR. However, I still haven't seen any output. So, (first) I (and you) have nothing to worry about and (second) I have nothing to work with. It's possible that this issue will stay as a caveat.

  • @$log is fragile. Don't touch it. However, @$log can get corrupted, like this: if a method goes insane and outputs unparsable messages, then "gain()" will give up immediately, leaving @$log non-empty. In that case you're supposed to recreate the F::AF object (or give up). If you don't, then strange things can happen (mostly give-ups again). So, please, do.

  • @$diag grows. The next release will provide some means to maintain it. Right now, clean @$diag yourself if that becomes an issue.

  • You're supposed to maintain a balance of requests and fetches. If you try "gain()" when there are no unfinished requests, then the method will time out. There's actually nothing to worry about except hanging for 120sec.

(note) Documentation of this library must maintain 4 namespaces:

Function/method parameter list (@_)

Within a section they always refer to parameter names and keys (if @_ is a hash) mentioned in the nearest synopsis.

Explicit values in descriptive codes

They always refer to some value in the nearest code. $method, $? etc. mean that there would be some value related to the named thing. POD markup in descriptions means exactly that.

Keys of File::AptFetch blessed object

Whatever is missing in the nearest synopsis fits here. Each key has an explicit dereference attached. So @$log means that the key named log holds an ARRAY reference, %$message holds a HASH reference, and $status holds a plain scalar (it's not a reference to a SCALAR, or it would be $$status).

Keys of File::AptFetch::ConfigData configuration module

Within each section they are explicitly mentioned as such when introduced. The above explanation about explicit dereference applies here too.
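To make the dereference notation concrete, here is a small sketch using a plain hash that merely imitates the shape described above. It is a stand-in, not a real File::AptFetch object, and the values are invented.

```perl
use strict;
use warnings;

# A stand-in hash shaped like the notation describes; NOT a real F::AF object.
my $obj = {
    log     => [ '201 URI Done', 'Filename: /tmp/b', '' ],   # written @$log
    message => { filename => '/tmp/b' },                      # written %$message
    status  => 'URI Done',                                    # written $status
};

print scalar @{ $obj->{log} }, " log items\n";        # ARRAY reference
print "filename: $obj->{message}{filename}\n";        # HASH reference
print "status: $obj->{status}\n";                     # plain scalar
```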

(note) Message headers are referred to as keys of some fake global %$message. So Filename becomes $message{filename}, and Last-Modified becomes $message{last_modified}. I hope it's clear from context whether a header is down- or up-stream.

(note) Throughout this POD "log item" means one line in @$log; "log entry" means a sequence of log items including the terminating empty item.

(note) Throughout this POD "120sec timeout" means: "$timeout in File::AptFetch::ConfigData as set in the stock distribution, overridden during pre-build configuration, or set at run-time".

IMPORTANT NOTE ON PERL-5.10.0

It's neither a bug nor a caveat. And it's out of my hands, really. perl-5.10.0 exits application code differently compared with perl-5.10.1 (unbelievable?). My understanding is that v5.10.0 closes handles first, then DESTROYs. Sometimes that filehandle closing happens in the right order. But most probably the application is killed with $SIG{CHLD}. END{} doesn't help: that filehandle massacre happens before those blocks are run. I believe any tinkering with the global $SIG{CHLD} is a bad idea. And terminating every method just after transfers have finished is just as stupid. Thus, if you run perl-5.10.0 (probably any earlier version too), destroy the File::AptFetch object explicitly before exiting the app if you care about not being $SIG{CHLD}ed.

METHODS

init()
    ref( my $fetch = File::AptFetch->init( $method )) or die $fetch;

That's the initialization. APT-Methods are userspace executables, hence it forks. If fork fails, then it dies. If all preparations succeed, then it returns a File::AptFetch blessed object; otherwise a string describing the issue is returned. Any diagnostic from the forked instance and, later, the execed $method goes through STDERR. (And see "_cache_configuration()".)

The idea behind this ridiculous construct is that someday, in some future, there will be lots of concurrency. Hence it's impossible to maintain one package-wide store for failure descriptions. All methods of File::AptFetch return descriptive strings in case of errors. init() just follows them.

$method is saved in the key of the same name for reuse.

($method): (lib_method): neither preset nor found

$lib_method (in File::AptFetch::ConfigData) points to a directory where APT-Methods reside. Without that knowledge File::AptFetch has nothing to do. It's picked either from configuration (build-time) or from apt-config output (run-time), in that order. If it wasn't found in either place, that's a fairly strange APT you have.

($method) is unspecified

$method is a required argument, so, please, provide it.

($method): ($?): died without handshake

Start-up configuration is essential. If $method disconnects early, then that's a problem. The exit code (with no postprocessing at all) is provided in parentheses.

($method): timeouted without handshake

$method failed to configure within the time frame provided. (v0.0.8) "_read()" has more about timeouts.

($method): ($Status): that's supposed to be (100 Capabilities)

As described in "APT Method Interface", Section 2.2, $method starts with '100 Capabilities' Status Code. $method didn't. Thus that's not an APT-Method. File::AptFetch has given up.

Also refer to "_parse_status_code()", "_parse_message()", and "_cache_configuration()" -- those can emit their own give-up codes (they are passed up immediately by init() without postprocessing).

DESTROY()
    undef $fetch;
    # or leave the scope

Cleanup. The method is killed and waitpided; pipes are explicitly closed. If anything goes wrong then it carps, for obvious reasons. waitpid is unconditional and isn't timeout protected.

The actual signal sent to $pid is configured with $signal in File::AptFetch::ConfigData. One can override it at build time or explicitly set it to any desired name or number at run-time. Refer to File::AptFetch::ConfigData for details.

request()
    my $rc = $fetch->request(
      $target0 => $source,
      $target1 => { uri => $source } );
    $rc and die $rc;

(bug) In this section the abbreviation "URI" actually refers to the "scheme-specific-part". Beware.

This files requests for transfer. Each request is a pair of $target and one of:

$source

Simple scalar. It MUST NOT provide a scheme, only a pure filename (either local or remote). It MUST provide all (and no more than) the needed leading slashes though (a double slash for remotes).

$source is preprocessed: $method (with the obvious colon) is prepended. (It seems APT's methods become very nervous when requested with a scheme that mismatches the method's name.) (bug) That requirement will be slightly relaxed in the next release.

%$source HASH ref

These keys are known:

$uri

The same requirements as for $source apply.

There are other keys that must be supported. Right now I'm unaware of any (pending real-life testing).

(v0.1.5) If the request list is empty then it silently succeeds without doing anything.

The actual request is filed at once (subject to buffering though), in one big (or not so big) chunk (as requested by the API). The @$diag field is updated accordingly.

Diagnostic provided:

($method): ($filename): URI is undefined

Either $source or $source{uri} evaluated to FALSE. (What is the request supposed to be?)

(caveat) While undef and the empty string are invalid URIs, is 0 a valid URI? No, a URI is supposed to have at least one leading slash.

request() pretends to be atomic: the request happens only if @_ has been parsed successfully.
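The validate-first behaviour can be illustrated with a small sketch. prepare_requests() is a hypothetical helper written for this example, not the actual implementation; it assumes only what is stated above (a FALSE source is rejected, and $method plus a colon is prepended).

```perl
use strict;
use warnings;

# Hypothetical helper (not F::AF's code): check every ($target => $source)
# pair first and build nothing unless all of them pass.
# Returns (undef, \@uris) on success, or a single error string on failure.
sub prepare_requests {
    my ($method, @pairs) = @_;
    my @uris;
    while (my ($target, $source) = splice @pairs, 0, 2) {
        my $uri = ref $source eq 'HASH' ? $source->{uri} : $source;
        return "($method): ($target): URI is undefined" unless $uri;
        # The method name with a colon is prepended to the scheme-specific-part.
        push @uris, { filename => $target, uri => "$method:$uri" };
    }
    return (undef, \@uris);
}

my ($err, $uris) = prepare_requests(
    'file',
    '/tmp/b' => '/tmp/a',
    '/tmp/d' => { uri => '/tmp/c' } );
die $err if $err;
print "$_->{uri} => $_->{filename}\n" for @$uris;
```

Because nothing is written to the method until the whole list has been checked, one bad pair leaves no half-filed request behind.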

gain()
    $rc = $fetch->gain;
    $rc and die $rc;

This gains something. 'Something' means it's unknown what kind of message APT's method will return. It can be a 'URI Start', 'URI Done', or 'URI Failure' message. Anyway, the message is stored in the @$diag and %$message fields of the object; $Status and $status are set too.

Diagnostic provided:

($method): ($CHLD_error): died

Something has gone wrong and APT's method has died. More diagnostics might have gone to STDERR.

($method): timeouted without responce

APT's method has quit without properly terminating the message with an empty line, or failed to output anything at all. Supposedly, this shouldn't happen.

($method): timeouted

APT's method has sat silently all the time. A possible cause is that you've run out of requests (then the method has nothing to do at all; methods don't tick, after all).

"_parse_status_code()" and "_parse_message()" can emit their own messages.

set_callback()
    File::AptFetch::set_callback %callbacks;

Sets (whatever known) callbacks. Semantics and procedures are documented where appropriate. Keys of %callbacks are tags (subject to hash handling by perl, don't mess with that); a key must be among the known ones (or else). Values are CODEs (or else); whatever the previous value was, it is discarded. Known tags are:

read

"_read()" has more.

Diagnostics provided:

(%s): candiate to pass in isn't CODE

Tag %s (which may be unknown) tries to set something as a callback. That must be CODE. It's not.

(%s): unknown callback

Tag %s is unknown. Nothing to do with it but croak.
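A rough sketch of such a callback registry follows. set_callback_sketch() and the %callback store are hypothetical names invented for this example; the order of checks (CODE first, then known tag) is inferred from the diagnostics above and may differ from the real code.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical store of known tags with a placeholder default.
my %callback = ( read => sub { 0 } );

# Hypothetical helper (not F::AF's set_callback): validate everything,
# then replace the previous values.
sub set_callback_sketch {
    my %new = @_;
    for my $tag (keys %new) {
        # Diagnostic strings are copied verbatim from the documentation.
        croak "($tag): candiate to pass in isn't CODE"
            unless ref $new{$tag} eq 'CODE';
        croak "($tag): unknown callback"
            unless exists $callback{$tag};
    }
    @callback{ keys %new } = values %new;
    return;
}

set_callback_sketch( read => sub { 1 } );
print "read callback returns ", $callback{read}->(), "\n";
```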

_parse_status_code()
    $rc = $self->_parse_status_code;
    return $rc if $rc;

Internal. Picks one item from @$log and attempts to process it as a Status Code. Subsequent items are unaffected.

($method): ($log_item): that's not a Status Code

The $log_item must be qr/^\d{3}\s+.+/. No luck this time.

Sets the appropriate fields ($Status with the Status Code, $status with the informational string), then backs up the processed item.
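As an illustration, the check above could look like the following. parse_status_code() here is a standalone, hypothetical function, not the internal method; it follows the module's convention of returning a descriptive string on failure.

```perl
use strict;
use warnings;

# Hypothetical standalone version: returns (undef, $Status, $status) on
# success, or a single error string on failure.
sub parse_status_code {
    my ($method, $log_item) = @_;
    my ($code, $text) = $log_item =~ /^(\d{3})\s+(.+)/
        or return "($method): ($log_item): that's not a Status Code";
    return (undef, $code, $text);
}

my ($rc, $Status, $status) = parse_status_code( file => '201 URI Done' );
die $rc if $rc;
print "Status=$Status status=$status\n";
```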

_parse_message()
    $rc = $self->_parse_message;
    return $rc if $rc;

Internal. Processes the log entry. Atomically sets either %$capabilities (if $Status is 100) or %$message (any other). Each key is lowercased. (v0.1.4) Since "_read()" has been rewritten there can be multiple messages in @$log; those are preserved for the next turn.

(v0.1.2) Each hyphen (-) is replaced with an underscore (_), for convenience (compare 'last-modified' => $time with last_modified => $time). (bug) What if a method yields Foo-Bar: and Foo_Bar: headers? (RFC2822 headers are anything but space and colon after all.) Right now, _parse_message() will fail if a message header gets reset. But those headers are different and should be handled appropriately. They aren't.

($method): ($log_item): message must be terminated by empty line

The APT method API dictates that messages must be terminated by an empty line. This one is not. Shouldn't happen.

($method): ($log_item): that resets header ($header)

The leading message header ($header) has been seen before. That's a panic. The offending and all subsequent items are left on @$log. Shouldn't happen.

($method): ($log_item): that's not a Message

The $log_item must be qr/^[0-9a-z-]+:(?>\s+).+/i. It's not. No luck this time. The offending and all subsequent items are left on @$log.

The $log_items are backed up and removed from @$log.

(bug) If the last item isn't an empty line, then undef will be pushed. Beware, and prevent that before going for parsing.
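The rules of this section can be put together in a sketch like the one below. parse_message() is a standalone, hypothetical function that works on a plain list instead of @$log and does no backing up; it only demonstrates the regex, the key cooking, and the three give-ups.

```perl
use strict;
use warnings;

# Hypothetical standalone version: returns (undef, \%message) on success,
# or a single error string on failure.
sub parse_message {
    my ($method, @items) = @_;
    my $last = @items ? $items[-1] : '(nothing)';
    return "($method): ($last): message must be terminated by empty line"
        unless @items && $last eq '';
    pop @items;                                  # drop the terminator
    my %message;
    for my $item (@items) {
        my ($header, $value) = $item =~ /^([0-9a-z-]+):(?>\s+)(.+)/i
            or return "($method): ($item): that's not a Message";
        (my $key = lc $header) =~ tr/-/_/;       # Last-Modified => last_modified
        return "($method): ($item): that resets header ($header)"
            if exists $message{$key};
        $message{$key} = $value;
    }
    return (undef, \%message);
}

my ($rc, $message) = parse_message(
    'file',
    'Filename: /tmp/b',
    'Last-Modified: Thu, 01 Jan 2009 00:00:00 GMT',
    '' );
die $rc if $rc;
print "$_ => $message->{$_}\n" for sort keys %$message;
```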

_cache_configuration()
    $rc = $self->_cache_configuration;
    return $rc if $rc;

Internal. It forks, and dies if fork fails. The forked child execs the array set in @$config_source (from File::AptFetch::ConfigData). If $ConfigData{lib_method} is unset, then it parses the prepared cache for the Dir::Bin::methods item and (if found) sets $lib_method. It doesn't complain if $lib_method happens to be left unset. If the cache is already set it returns without any activity.

@$config_source is subject to the build-time configuration. It's preset with qw[ /usr/bin/apt-config dump ] (YMMV, refer to F::AF::CD to be sure). @$config_source must provide reasonable output -- that's the only requirement (look below for details).

(bug) While @$config_source is configurable all diagnostic messages refer to 'apt-config'.

@$config_source's output is postprocessed -- configuration names and values are stored as equal ('=') separated pairs in scalars and pushed into an intermediate array. If everything finishes OK, then the package-wide cache is set. That cache is lexical (it's possible I'll find a reason to make some kind of iterator later; no such iterator exists right now).

(v0.1.2) The parsing cycle has been totally rewritten. First, the line is split on space into $name and $value (or else). Then comes validation (it wouldn't be needed if @{$ConfigData{config_source}} were hardcoded; it's not):

  • $name must consist of alphanumerics, underscores, pluses, minuses, dots, colons, and slashes (qr[\w/:.+-]) (or else);

  • (that's a heuristic) colons come in pairs (or else);

  • $value must be double-quote (") enclosed, with no double-quote allowed inside (or else);

  • there must be a terminating semicolon (;) (or else).

Then comes cooking (all cooking is found by observation; it mimics APT-talk with methods):

  • a trailing double pair-of-colons in $name is trimmed to a single pair;

  • every space in $value is percent-escaped (%20);

  • every equal sign in $value is percent-escaped (%3d).

That last one needs some explanation. apt.conf(5) clearly states: "Values must not include backslashes or extra quotation marks".
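A sketch of that validate-then-cook cycle for a single line is shown below. parse_config_line() is a hypothetical function, not the real parser. Two points are my assumptions: "trailing double pair-of-colons" is read as a trailing "::::", and the equal sign is escaped as "%3d".

```perl
use strict;
use warnings;

# Hypothetical helper: returns (undef, "name=value") on success, or a
# single error string on failure.
sub parse_config_line {
    my ($method, $line) = @_;
    my $fail = "($method): ($line): that's unparsable";

    my ($name, $value) = split / /, $line, 2;
    return $fail unless defined $value;

    # Validation.
    return $fail unless $name =~ m{^[\w/:.+-]+$};
    (my $no_pairs = $name) =~ s/:://g;           # colons must come in pairs
    return $fail if $no_pairs =~ /:/;
    return $fail unless $value =~ s/^"([^"]*)";$/$1/;

    # Cooking.
    $name  =~ s/::::$/::/;                       # assumption: "::::" => "::"
    $value =~ s/ /%20/g;
    $value =~ s/=/%3d/g;                         # assumption: escaped as %3d
    return (undef, "$name=$value");
}

for my $line ( 'Dir::Bin::methods "/usr/lib/apt/methods";',
               'APT::Example "a b=c";' ) {      # second line is made up
    my ($rc, $pair) = parse_config_line( file => $line );
    print $rc ? "$rc\n" : "$pair\n";
}
```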

    apt-config dump | grep \\\\

disagrees on backslashes (if you're upgraded enough). So does F::AF: backslashes are passed through. After some experiments double-quote handling looks, roughly, like this:

  • double-quotes must come in pairs;

  • those double-quotes are dropped from $value without any visible effect (the double-quotes, not the enclosed content; it stays intact, whatever the content is; an empty string is content too);

  • any odd double-quote fails parsing.

F::AF doesn't need to do anything about it -- @{$ConfigData{config_source}} is supposed to handle those itself.

(bug) What should be investigated:

  • what if a double-quote is explicitly percent-escaped in apt.conf?

  • how are percents in $value handled?

Pending.

Diagnostic provided:

($method): ($line): that's unparsable

Validation (described above) has failed.

($method): [close] (apt-config) failed: $!

After processing input a pipe is closed. That close failed with $!.

($method): (apt-config): timeouted

While processing, a fair 120sec timeout is given (it's reset after each $line). @$config_source hung for that time.

($method): (apt-config) died: ($?)

@$config_source has exited uncleanly. More diagnostic is supposed to be on STDERR.

($method): (apt-config): failed to output anything

@$config_source has exited cleanly, but failed to provide any output to parse at all.

_uncache_configuration()
    File::AptFetch::_uncache_configuration;
    # or
    $self->_uncache_configuration;
    # or
    $fetch->_uncache_configuration;

Internal. This cleans APT's configuration cache. It doesn't trigger recaching. Caching happens whenever the cache is required again (subject to the natural control flow).

(caveat) _cache_configuration sets $lib_method (in File::AptFetch::ConfigData) (if it happens to be undefined). &_uncache_configuration leaves it untouched.

_read()
    $fetch->_read;
    $fetch->{ALRM_error} and
      die "internal error: requesting read while there shouldn't be any";
    $fetch->{CHLD_error} and
      die "external error: method has gone nuts and AWOLed";

Internal. Refactored. This attempts to read a log entry. Each item is chomped and then pushed onto @$log. If an item happens to be an empty line then it finishes. @$log isn't filled atomically, so check whether the last line was empty.

This provides no diagnostics. However:

child timeouts

If the child times out, then $ALRM_error is set (to TRUE, otherwise meaningless). Then it finishes.

(v0.0.8) And more about what the timeout is. It was believed that methods pulse their progress. That belief was in vain. Thus for now:

  • The timeout is configurable through $ConfigData{timeout} (120sec in the stock configuration; no defaults). The timeout is cached in each instance of the File::AptFetch object.

  • (v0.1.6) Target filenames are cached in the F::AF object. For each target there's a HASH. In the HASH the key filename is set to the target filename.

  • (v0.1.4) The timeout (the big one, $timeout) is made up of supposedly small $ConfigData{tick}s (5sec in the stock configuration; no defaults). The small timeout is made with 4-arg select.

  • (v0.1.6) If there's no input from the method then routing is made as follows:

    + Each target's cached HASH is passed to the read callback ("set_callback()" has more).

    + If any callback returns TRUE then it resets the timeout counter and goes for the next $tick long select (IOW, file transfer (whatever that means) is in progress).

    + If every callback returns FALSE then it advances towards the timeout and goes for the next $tick long select.

    + (not implemented) If any callback returns undef then it fails entirely.

child exits

The child is waitpided, then $CHLD_error is set, then finishes.

unknown error

(v0.1.4) It used to be read-with-alarm-in-eval. It's not anymore, thus any signal(7) will kill the process. Then it dies.
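The tick-based routing described under "child timeouts" can be sketched with 4-arg select as follows. wait_for_input() is a hypothetical function, not the real _read(); it covers only the implemented branches (TRUE resets the counter, all FALSE advances it) and uses deliberately tiny timings so the demo finishes quickly.

```perl
use strict;
use warnings;

# Hypothetical helper: wait until $fh is readable or the idle budget
# ($timeout seconds, spent in $tick second slices) is used up.
# Returns TRUE if input is ready, FALSE on timeout.
sub wait_for_input {
    my ($fh, $timeout, $tick, $callback, @targets) = @_;
    my $idle = 0;                              # seconds without any progress
    while ($idle < $timeout) {
        my $rin = '';
        vec($rin, fileno($fh), 1) = 1;
        my $rout = $rin;
        my $ready = select($rout, undef, undef, $tick);
        return 1 if $ready > 0;
        # No input: every target's HASH is passed to the callback.
        my $progress = grep { $callback->($_) } @targets;
        $idle = $progress ? 0 : $idle + $tick;
    }
    return 0;
}

pipe(my $reader, my $writer) or die "pipe: $!";
my $calls = 0;
my $got = wait_for_input(
    $reader, 0.3, 0.1,
    sub { $calls++; 0 },                       # never reports progress
    { filename => '/tmp/b' } );
print "timed out after $calls callback calls\n" unless $got;
```

In the real module the budget and the slice come from $ConfigData{timeout} and $ConfigData{tick}; the numbers here are for the demonstration only.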

_read_callback()

(v0.1.6) Internal. It's the default read callback ("_read()" has more). It was supposed to be simple. In vain.

The primary objective is avoiding false negatives at all costs. Here is the list of avoided false negatives:

  • Somewhere in the lenny/squeeze time-span APT methods changed behaviour. In the past they opened the target for writing instantly. Now they create a temporary file and rename it to the target upon finishing. For obvious reasons methods communicate neither progress nor the filename of the temporary file. If naming or handling of unfinished transfers ever changes, there will be breakage.

  • Then, when a transfer is finished *physically*, it's not reported just yet (the temporary file has been renamed). The method calculates hashes. For obvious reasons methods do not communicate that progress either. A naive approach would be to check the size and then just wait forever. But it's possible the size isn't known beforehand. So _read_callback() increases the number of ticks before signaling timeout. That increase is a function of tick length ($ConfigData{tick}), current file size, and supposed IO speed. The IO speed is hardcoded to be 15MB/sec. So if the media is really slow (like a diskette or something) there's a possibility of breakage. However, those nitty-gritty manipulations will never result in a timeout decrease.
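The exact function isn't given here, so the following is only one plausible shape of it, stated as an assumption: the extra budget is the number of ticks needed to read the whole file once at the supposed IO speed. extra_ticks() and IO_SPEED are hypothetical names, and 15MB is taken to mean 15 * 1024 * 1024 bytes.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Assumption: 15MB/sec means 15 * 1024 * 1024 bytes per second.
use constant IO_SPEED => 15 * 1024 * 1024;

# Hypothetical formula (not taken from F::AF): ticks needed to hash
# $size bytes at IO_SPEED, rounded up. Never negative, so the budget
# can only grow.
sub extra_ticks {
    my ($size, $tick) = @_;
    return 0 unless $size && $size > 0 && $tick > 0;
    return ceil( $size / ( IO_SPEED() * $tick ) );
}

my $size = 700 * 1024 * 1024;                  # a CD-sized file, for example
printf "%d extra ticks of %d sec\n", extra_ticks($size, 5), 5;
```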

For now it's not clear whether _read_callback() ought to provide some diagnostics. Right now it doesn't.

SEE ALSO

File::AptFetch::Cookbook, "APT Method Interface" in libapt-pkg-doc package, apt-config(1), apt.conf(5)

AUTHOR

Eric Pozharski, <whynot@cpan.org>

COPYRIGHT & LICENSE

Copyright 2009, 2010, 2014 by Eric Pozharski

This library is free in the sense: AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL. This library is released under GNU LGPLv3.