NAME

File::AptFetch::Cookbook - Tips and Gotchas about APT Methods

DISCLAIMER

My understanding of how APT methods work (and interact) is mostly experimental. I've thoroughly read "APT Method Interface" (it isn't a long read, though). And when you read it, please note the unsurprising number on the very first page -- 1998. At least that's something.

Then I've read strace(1) output -- very interesting reading. And I've had some conversations with the methods themselves. That's all.

I admit, I didn't dig into the actual code. (Once I got into apt-get(1) -- it's C++; is it possible to read C++ without a debugger?) I promise to do it later (Read-The-Code seems to be Debian's mantra, though that's the only authoritative place).

So, if in some later section I say "I can't comment", then that means that I haven't tortured the methods yet. I will.

INTRODUCTION

Briefly: an APT method is an executable that has no command-line interface at all. All interaction is done through STDIN and STDOUT. STDERR is for side-effect messages (I have yet to see any output there).

Each message is a sequence of newline ("\n") terminated lines and is terminated by an empty line (a lone newline works for sure; I can't comment on what would happen if the "empty" line were a run of spaces).

Each message starts with a message header. The message header is a 3-digit number (Status Code) and an informational string (for "visual debugging"; does that mean it's ignored? what if it's empty?). File::AptFetch stores the Status Code in the $status field and the informational string in the $Status field. $status is read-only.

Then the header fields come. A header field is a kind of RFC822 header: a colon (':') separates the header name from the header value. (I can't comment on whether header value wrapping is supported. And what about any extra space?) File::AptFetch splits the message and stores it in the obvious hash: either %$capabilities (when $status is 100) or %$message (in any other case). %$capabilities is filled once and then stays untouched (that can change in the future). %$message is overwritten each time a new message comes (it's not refilled).
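The framing described above can be sketched in a few lines of Python. This is only an illustration, not how File::AptFetch does it: the function name parse_message is hypothetical, the key spelling (lower case, dashes turned into underscores) merely mirrors the keys used in these notes, and wrapped header values are not handled because their status is unknown.

```python
def parse_message(text):
    """Split one message into (status_code, status_text, fields).

    Assumptions: the message ends with a lone empty line, header values
    are never wrapped, and a repeated field simply overwrites the
    earlier one.
    """
    lines = text.split("\n")
    # Drop the terminating empty line(s).
    while lines and lines[-1] == "":
        lines.pop()
    if not lines:
        raise ValueError("empty message")

    # Message header: 3-digit Status Code, then the informational string.
    code, _, info = lines[0].partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError("bad message header: %r" % lines[0])

    fields = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a header field: %r" % line)
        key = name.strip().lower().replace("-", "_")
        # Only leading spaces are removed; a trailing space is kept as is.
        fields[key] = value.lstrip(" ")
    return int(code), info, fields
```

Splitting on the first colon only keeps values such as URIs intact, since those contain colons of their own.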

The list of message headers and header fields is in the "APT Method Interface" manual. The list of header fields is incomplete; that raises a question: is the list of message headers complete?

MESSAGE HEADERS

Those are the message headers I have something to say about (additions pending). There's an asymmetry -- they go either from an application to a method (downstream) or from a method to an application (upstream).

100 Capabilities

(upstream) That's the "Hello, world!" of the method. That shows something about the method invoked. It doesn't show what method is invoked (you're supposed to know what you're doing).

102 Status

(upstream) Probably networking only. Hints the application on progress. That progress, however, is in logical steps, not bytes transferred.

200 URI Start

(upstream) That informs the application that the method has started to process a request. A method may skip this message.

201 URI Done

(upstream) That marks a request's completion. I suppose that after this message a method forgets about the just-completed request ("Quotation Needed (tm)"). (v0.1.2) Sometimes a method leaves remains (or fossils, if you like) of a served request that should have failed but succeeded; the Known Easter Eggs of the file method cover some details.

400 URI Failure

(upstream) A named request can't be fulfilled. The method then goes on to the next request or waits for one.

600 URI Acquire

(downstream) That requests a file. One file -- one request. The application isn't required to wait for the 201 URI Done code before filing the next 600 -- quite the opposite. It's supposed that 600s will go in all at once.

601 Configuration

(downstream) That's the "Hello, pal!" of the application. There's a field $message{send_config} (supposed to be in use with 100 Capabilities); however, 601 is sent each time a method is started.

101 Log
401 General Failure
402 Authorization Required
403 Media Failure
602 Authorization Credentials
603 Media Changed

Those are the message headers I haven't met yet. That's a problem with the test-suite. To fix this I need to do some work -- it's not done yet.

As you can see, there's no message that would request status. It's up to the method to report any progress. (v.0.0.8) Alas, they don't. Thus, it's up to the application to track any progress. I wonder if they time out at all. However, there are signs that those of the networking type properly detect when the underlying TCP/IP connection has been lost.
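One way for an application to track progress on its own is to compare the size of the target file on disk with the Size value reported by the method. The sketch below is a guess at how that could look: the function name progress is hypothetical, and it assumes (unverified) that the method writes straight into the path given in Filename while the transfer is running.

```python
import os


def progress(filename, size):
    """Return the fetched fraction (0.0 .. 1.0), or None if it is unknown.

    filename -- the target path, as sent in the Filename field
    size     -- the Size value from the method (a string or an int)

    Assumption: the target file grows in place during the transfer.
    """
    total = int(size)
    if total <= 0:
        return None          # nothing sensible to divide by
    try:
        have = os.path.getsize(filename)
    except OSError:
        have = 0             # the target hasn't been created yet
    return min(have / total, 1.0)
```

Polling this between reads of the method's output gives a byte-based figure, which the logical steps of 102 Status do not provide.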

HEADER FIELDS

The same comment as for "MESSAGE HEADERS" applies. The subtle difference is that this list mentions header fields missing in the manual.

Header fields are spelled first-capital (regular words) and all-capital (abbreviations). I can't comment on what would happen otherwise.

Config-Item

(downstream) That's used with 601 Configuration message. The format of value is set to

    APT::Configuration::Item=value

It seems there should be no spaces inside (neither around the equal sign ('=') nor in $value). I can't comment on what kind of dragons hide behind this. OTOH, if spaces are escaped, then should the equal sign be escaped too? The set of items is almost the "APT configuration space"; however, there are undocumented items too (at least one: quiet).
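For illustration, a 601 Configuration message could be assembled like this. The helper name build_configuration is hypothetical, and since the escaping rules are unknown, the sketch simply refuses whitespace (and an equal sign in the item name) instead of guessing at an escape scheme.

```python
def build_configuration(items):
    """Build a 601 Configuration message from a {name: value} mapping.

    Assumption: whitespace is never allowed in a Config-Item, so such
    input is rejected rather than escaped.
    """
    lines = ["601 Configuration"]
    for name, value in items.items():
        value = str(value)
        if "=" in name or any(ch.isspace() for ch in name + value):
            raise ValueError("can't encode item %r" % name)
        lines.append("Config-Item: %s=%s" % (name, value))
    # One newline ends the last line, the second one ends the message.
    return "\n".join(lines) + "\n\n"
```

A call such as build_configuration({"APT::Architecture": "amd64"}) yields a header line, one Config-Item line, and the terminating empty line.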

Filename

(up/down-stream) The 2nd of the two most used fields. Designates the target of a request. It's a local FS path and neither needs nor bears a scheme. Some methods seem to ignore it completely. File::AptFetch sets it anyway.

Last-Modified

(upstream) That's the time stamp of the file mentioned in $message{uri}. I can't comment on what would be returned if the time can't be retrieved before fetching. The time is in RFC1123 format. (I'm puzzled. RFC1123 does touch the date-n-time spec in section 5.2.14 ("RFC-822 Date and Time Specification: RFC-822 Section 5"). It covers Y2K and timezone issues. It doesn't set the actual format. So, I believe, the format is that of RFC822 with the RFC1123 comments applied. I don't know why it's this way.) Meanwhile, File::AptFetch exports the returned value (via $message{last_modified}) but provides no means for time checks. (However, referring to the symlink maintenance issue -- are mtime checks the duty of the method or the application?)
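If the application takes the mtime check upon itself, the value can be parsed with the RFC822-style date parser from the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, the sample dates in the usage are made up, and a stamp without a zone is treated as UTC.

```python
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Turn a Last-Modified value into a timezone-aware datetime."""
    stamp = parsedate_to_datetime(value.strip())
    if stamp.tzinfo is None:
        # Assumption: a stamp without zone information means UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def source_is_newer(last_modified, local_path):
    """True if the reported source time is later than the local mtime."""
    local = datetime.fromtimestamp(os.path.getmtime(local_path),
                                   tz=timezone.utc)
    return parse_last_modified(last_modified) > local
```

For example, source_is_newer("Sun, 06 Nov 1994 08:49:37 GMT", path) is false for any file written after that date.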

MD5-Hash

(upstream) The obvious MD5sum of already fetched file.

MD5Sum-Hash

(upstream) (undocumented) Obvious. I can't comment why it duplicates MD5-Hash.

Message

(upstream) Some piece of diagnostic. Comes with error messages (400 URI Failure etc).

Send-Config

(upstream) Comes with 100 Capabilities. But the config is sent anyway. Probably the remains of old ages. Or does it mean that such a field can come asynchronously? Wait, what should File::AptFetch do if the field comes in as false?

SHA1-Hash
SHA256-Hash

(upstream) (undocumented) Obvious.
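An application that doesn't trust the reported hashes can recompute them over the fetched file. The sketch below assumes the values are lower- or upper-case hex digests and uses the key spelling found in these notes; the function name verify_hashes and the key table are hypothetical.

```python
import hashlib

# Message key -> hashlib algorithm name (key names as spelled in these notes).
ALGORITHMS = {
    "md5_hash": "md5",
    "md5sum_hash": "md5",
    "sha1_hash": "sha1",
    "sha256_hash": "sha256",
}


def verify_hashes(path, message):
    """Return {key: bool} for every known hash field present in message.

    Assumption: the hash values are plain hex digests.
    """
    digests = {key: hashlib.new(algo)
               for key, algo in ALGORITHMS.items() if key in message}
    with open(path, "rb") as fh:
        # Read once, feed every digest, so big files are scanned one time.
        for chunk in iter(lambda: fh.read(65536), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {key: digest.hexdigest() == message[key].strip().lower()
            for key, digest in digests.items()}
```

Fields that are absent (as with Etch's methods) are simply skipped, so an empty result means there was nothing to check.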

Single-Instance

(upstream) Comes with 100 Capabilities, requires the method to be set up once. I can't comment what would happen if that requirement is violated. It turns out, only file and copy methods show that config field.

Size

(upstream) That's obvious -- the size of the source. I can't comment on what would be returned if the size can't be retrieved before fetching. (v0.1.8) Looks like this field appears first in 200 URI Start and hasn't been seen in 102 Status messages; probably, it will be present in 201 URI Done.

URI

(up/down-stream) The 1st of the two most used fields. Designates the source of a request. This field's value must start with a scheme; OTOH, this scheme must exactly match the method name. Otherwise the method rejects the URI. In this release, File::AptFetch prepends the requested URI with the scheme unconditionally. That will be relaxed in the next release; meanwhile, please strip it.

Version

(upstream) Is the value always 1.0?

Index-File

That's missing in documentation.

Drive
Fail
IMS-Hit
Local
Media
Needs-Cleanup
Password
Pipeline
Resume-Point
Site
User

Those are the fields I haven't met yet.

PROTOCOL

The conversation between application ("APP") and method ("MTD") is like that:

APP <- "100 Capabilities" <- MTD
APP -> "601 Configuration" -> MTD

Those are the very first messages. Both are required. I don't think that 601 could be sent before 100 is received (that 100 states the method is up and running). File::AptFetch behaves accordingly.

APP -> "600 URI Acquire" -> MTD

Requests should be filed as quickly as possible (in the sense of: with the least pause). I suppose that the application shouldn't wait for responses. So there is no cycle here, actually. File requests when you need something, and then check for completion.
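Filing several requests back to back could look like the sketch below. The helper names are hypothetical, the writer is any text stream connected to the method's STDIN, and the paths in the usage are made-up samples; the URIs are expected to carry their scheme already.

```python
import io


def build_acquire(uri, filename):
    """Build one 600 URI Acquire message (one file, one request)."""
    return "600 URI Acquire\nURI: %s\nFilename: %s\n\n" % (uri, filename)


def file_requests(writer, requests):
    """Write all (uri, filename) pairs at once, without waiting for replies.

    Returns the number of requests filed.
    """
    count = 0
    for uri, filename in requests:
        writer.write(build_acquire(uri, filename))
        count += 1
    writer.flush()       # push everything to the method in one go
    return count


# Usage with an in-memory stream standing in for the method's STDIN.
demo = io.StringIO()
file_requests(demo, [("file:/etc/hostname", "/tmp/hostname"),
                     ("file:/etc/motd", "/tmp/motd")])
```

Completion is then checked separately, whenever the application gets around to reading the method's output.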

APP <- "102 Status" <- MTD

Doesn't show up with local methods (can't say about cdrom:, though). Marks progress in the protocol handshake. $message{message} has more, and other fields might appear as well (probably depending on uplink capabilities). Doesn't appear after 200 URI Start. I can't say if this one ever happens after 201 URI Done or 400 URI Failure.

APP <- "200 URI Start" <- MTD

That seems to be purely informational. (v0.0.8) In fact, it's not. If the connection is ruined somehow (monkey-wrench at the ISP?) then the method restarts the transfer and signals it this way. Once, I've seen it five times in a row. I can't comment on whether the method would ever give up.

APP <- "201 URI Done" <- MTD
APP <- "400 URI Failure" <- MTD

Only one of them should be sent (that's obvious). Either marks completion of request -- successful or not.
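Since requests are filed in bulk and answers trickle back later, the application needs some bookkeeping. Here is a minimal sketch of it; the class name RequestTracker is hypothetical, and it assumes that 201 and 400 messages carry the URI of the request they close.

```python
class RequestTracker:
    """Keep track of which filed URIs are still waiting for 201 or 400."""

    def __init__(self):
        self.pending = set()
        self.done = {}       # uri -> fields of the 201 URI Done message
        self.failed = {}     # uri -> diagnostic from the 400 URI Failure

    def filed(self, uri):
        """Record that a 600 URI Acquire went out for this URI."""
        self.pending.add(uri)

    def handle(self, code, fields):
        """Process one upstream message; return True when nothing is pending."""
        uri = fields.get("uri")
        if code == 201:
            self.pending.discard(uri)
            self.done[uri] = fields
        elif code == 400:
            self.pending.discard(uri)
            self.failed[uri] = fields.get("message", "")
        # 102 Status and 200 URI Start change nothing here; 200 can
        # come repeatedly for the same request.
        return not self.pending
```

The tracker deliberately ignores repeated 200 URI Start messages, so a restarted transfer doesn't confuse it.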

METHOD SPECIFIC NOTES

# FIXME: verification needed

One common behaviour should be mentioned. Methods are either local or remote (that's obvious). Local methods require the scheme-specific-part in $message{uri} to be absolute, and it must start with a lone slash ('/'). Remote methods require a double-slash ('//'). Either would complain otherwise.

And one more: it seems all $message{message}s bear a trailing space. I'd better note that, in case I ever spot one that doesn't.
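The rule can be checked before a request is ever filed. In this sketch the function name check_uri is hypothetical, file and copy are taken as the local methods (as these notes say), and every other method is assumed to be remote, which is unverified for cdrom.

```python
LOCAL_METHODS = {"file", "copy"}   # local according to these notes


def check_uri(method, uri):
    """Return True if the URI has the shape the given method expects.

    Assumption: every method not listed in LOCAL_METHODS is remote.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme != method:
        return False               # the scheme must match the method name
    if method in LOCAL_METHODS:
        # Absolute path with a lone leading slash.
        return rest.startswith("/") and not rest.startswith("//")
    # Remote: scheme-specific-part starts with a double-slash.
    return rest.startswith("//")
```

So check_uri("file", "file:/etc/hostname") passes, while check_uri("file", "file://etc/hostname") and check_uri("ftp", "ftp:/pub/README") do not.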

And one piece of advice. Don't mess with methods. They ain't forgetting. And they ain't forgiving.

copy:

Local. It's marked as (internal) in the documentation. I can't comment on whether that means you can't have the copy: scheme in your sources.list. When it succeeds, it returns the usual set of informational header fields (hashes, mtime, size).

Another issue (that, BTW, clearly shows File::AptFetch's bad state) is whether it is possible to create symlinks with the copy method. There is an APT configuration parameter Acquire::Source-Symlinks. I suppose it's set to true by default, although it's missing in apt-config dump output. I can't comment on what exactly it affects (apt-get, the copy method, or the file method). Maybe there's an undocumented message header that would force the copy method to symlink instead of copying. However, if there's such a header, then File::AptFetch should have a means to substitute the right value for the Acquire::Source-Symlinks parameter (or whatever else). Right now it doesn't.

Known Easter Eggs are:

  • Although Etch's (v0.6.46.4-0.1) and Lenny's (v0.7.20) APT packages both have a copy method, and both set $message{version} to 1.0 in 100 Capabilities, Etch's version doesn't return hashes at all. Beware.

  • It does not reget! If a source and a target match, then it will silently truncate the source. Then happily return size and hashes of now gone file. (That's for pre-Squeeze's APT.)

  • (v0.0.9) It does not reget! However, Squeeze's copy now unlinks the target. Thus it needs write permissions in the target directory. The permissions of the just-created target are affected by umask. And, as you've already guessed, $message{version} is still 1.0. (OMG, THAT'S A RACE!)

  • While $message{uri} should be absolute, the $message{filename} can be relative.

  • However, if $message{uri} isn't absolute, then the message is (in contrast to the file method)

        Failed to stat - stat (2 No such file or directory) 

    (Does it really stat -? Should check it.)

  • However, if $message{uri} starts with double-slash, then the $message{message} is the same "Failed to stat...".

  • And $message{filename} can start with double-slash.

  • (v0.1.6, Wheezy) After target's transfer is finished, the method reads it back. I can't comment if it's for hash-set calculation or verification. If target's media is slow then it's huge performance loss.

file:

Local (so far that's the only method that clearly reports $message{local_only} as true in 100 Capabilities). It doesn't fetch anything (a kind of fake method). It provides the usual set of properties for $message{uri} -- hash-sums, mtime, size.

Known Easter Eggs are:

  • Etch's version of the method doesn't return $message{md5sum_hash} and $message{sha256_hash}. Since both versions (Etch's and Lenny's) have $message{version} set to 1.0 in the 100 Capabilities message, you can find out what version of APT you have only by experimenting.

  • If $message{uri} is not absolute, then the $message{message} is

        Invalid URI, local URIS must not start with // 

  • If $message{uri} has permissions set in a way that prohibits read access, then the method surprisingly succeeds, but the hashes are those of an empty file (puzzled).

  • (v0.1.2) The Easter Egg that lay just here is retracted. At the time the test-suite was lazy (units reused samples). At present I can't verify what was going on in pre-Squeeze.

  • (v0.1.2) (The following route sounds really weird, and there isn't a real-life scenario where it could possibly happen, but that's the way it is.) As of Wheezy: (1) file foo isn't readable; (2) access it; (3) access the perfectly fine file bar with a leading double-slash URI. Then you get this $message{message}:

        Could not open file %s - open (13 Permission denied) 

    But! It refers to foo, while $message{uri} refers to bar. t/file/fail.t (around ftagaab5) really does it. No comments.

  • If $message{uri} points to a directory, then the behaviour is the same as for an unreadable file, even if the scheme-specific-part ends with a slash. However, if there is a leading double-slash in $message{uri}, then the method complains about an invalid URI.

ftp:

Sends 102 Status messages multiple times. In 200 URI Start it reports the size of the file that's coming through (probably a protocol feature). Hashes are calculated as the bytes pass through. I can't say how to pass in credentials.

In 100 Capabilities appears $message{send_config} as true. $message{version} is 1.0, who would expect that?

networking methods

Thorough testing of networking methods is pending. However, I'm pretty sure that the default timeout of 2min is enough for now. The methods themselves seem to time out within that frame. However, I can't say how many times networking methods would reconnect before giving up.
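On the application side, such a timeout boils down to waiting for the method's output to become readable. The sketch below is not how File::AptFetch implements it; the function name is hypothetical, the 120 second default just restates the 2min figure above, and select on pipe descriptors is assumed to be available (a Unix-like system).

```python
import select


def wait_for_message(fd, timeout=120.0):
    """Wait until fd (the method's STDOUT) has data, or the timeout expires.

    Returns True if something can be read, False if the method stayed
    silent for the whole timeout.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)
```

A False return is the point where the application has to decide whether to keep waiting or to give up on the method.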

SEE ALSO

File::AptFetch, "APT Method Interface" in libapt-pkg-doc package, apt-config(1)

AUTHOR

Eric Pozharski, <whynot@cpan.org>

COPYRIGHT & LICENSE

Copyright 2009, 2010, 2014 by Eric Pozharski

This overview is free in the sense of: AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL. This overview is released under CC-SA.