NAME

LWP::Protocol::rsync - rsync protocol for LWP

SYNOPSIS

 use LWP::UserAgent;
 my $ua = LWP::UserAgent->new;
 my $res = $ua->get('rsync://example.com/pub/some/thing.txt');
 # (module loaded automatically)

DESCRIPTION

This module adds the rsync:// protocol scheme to LWP::UserAgent by running the external rsync(1) program.

The rsync protocol uploads or downloads files by sending only changed file blocks if possible. (The receive side calculates MD4 checksums over existing content and tells the send side what it already has.)

See RFC 5781 on the rsync:// URI scheme and see the Perl URI module for manipulations of such URIs.

  • GET downloads a file from an rsync server.

    If an existing :content_file is specified as the local destination (see "REQUEST METHODS" in LWP::UserAgent) then that file is updated as necessary per the rsync protocol. GET to an ordinary HTTP::Response downloads the full source content.

    If-Modified-Since is implemented by getting a listing from the server and comparing the file's modification time there against the requested time. If the server time is not newer then the response is the usual "304 Not Modified".

    Last-Modified response is the modification time of the file on the server. For :content_file, the file modification time is set too.

    Content-Type response is guessed from the URI by LWP::MediaTypes. This is slightly experimental. The rsync server has no notion of Content-Type as such.

  • HEAD retrieves information about a file by asking for a listing from the server. Content-Length and Last-Modified response headers are parsed out of the listing.

  • PUT uploads content to a file on the server. The rsync protocol means only changed parts of the content are actually sent.

    An upload requires a writable destination on the server (see rsyncd.conf(5)). If it's not writable then an rsync version 3.1.0 server has been seen simply dropping the connection, resulting in a rather uninformative error message "Connection reset by peer". The intention would be "405 Method Not Allowed" or some such, if unwritable can be distinguished from actual connection trouble.
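As a rough sketch of the three methods from the caller's side, the following uses the ordinary LWP::UserAgent get(), head() and put() methods. The helper names fetch_if_newer and upload_text are made up for illustration and are not part of this module, and sending If-Modified-Since together with :content_file is an assumption about how the two combine. Nothing touches the network unless a URL and a filename are given on the command line.

```perl
use strict;
use warnings;
use HTTP::Date ();

# GET into a local file.  If the file already exists, its mtime is sent as
# If-Modified-Since, so an unchanged source should come back as 304.
# (fetch_if_newer is a hypothetical helper name.)
sub fetch_if_newer {
    my ($ua, $url, $file) = @_;
    my @args = (':content_file' => $file);
    if (-e $file) {
        my $mtime = (stat $file)[9];
        push @args, 'If-Modified-Since' => HTTP::Date::time2str($mtime);
    }
    return $ua->get($url, @args);
}

# PUT a string to a file on the server (needs a writable module).
# (upload_text is a hypothetical helper name.)
sub upload_text {
    my ($ua, $url, $text) = @_;
    return $ua->put($url, Content => $text);
}

# Only talk to a server when a URL and local filename are given, eg.
#   perl sketch.pl rsync://example.com/pub/some/thing.txt thing.txt
if (@ARGV == 2) {
    require LWP::UserAgent;
    my ($url, $file) = @ARGV;
    my $ua = LWP::UserAgent->new;

    # HEAD: size and modification time parsed from the server listing
    my $head = $ua->head($url);
    print "HEAD: ", $head->status_line, "\n";
    print "  size  ", $head->header('Content-Length') // '(none)', "\n";
    print "  mtime ", $head->header('Last-Modified')  // '(none)', "\n";

    my $res = fetch_if_newer($ua, $url, $file);
    print "GET:  ", $res->status_line, "\n";
}
```

Taking the user agent as a parameter keeps the helpers independent of any particular LWP::UserAgent configuration (timeouts, proxies and so on).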

Characters * ? [ are not permitted in paths. The intention is for this interface to access a single file resource (read or write), but rsync interprets these characters as shell style wildcards for multi-file transfers. IPv6 style brackets [::1] in the hostname part are allowed.
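A caller wanting to avoid the wildcard characters up front could check a URL before making a request. This is only a sketch: has_wildcard is a hypothetical name, it looks at the raw URL text without decoding percent-escapes such as %2A, and it is not necessarily how the module itself makes the check.

```perl
use strict;
use warnings;

# Return 1 if the path part of an rsync:// URL contains * ? or [ ,
# otherwise 0.  The authority part (user, host, port) is skipped, so an
# IPv6 host like [::1] is not counted.
# (has_wildcard is a hypothetical name for illustration only.)
sub has_wildcard {
    my ($url) = @_;
    my ($path) = $url =~ m{^rsync://[^/]*(/.*)?\z}is
      or die "not an rsync:// URL: $url";
    return defined $path && $path =~ /[*?\[]/ ? 1 : 0;
}

foreach my $url ('rsync://example.com/pub/foo.txt',
                 'rsync://example.com/pub/*.txt',
                 'rsync://[::1]/pub/foo.txt') {
    printf "%-34s %s\n", $url, has_wildcard($url) ? 'wildcard' : 'ok';
}
```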

The rsync program has many options for things like mirroring whole directory trees. That sort of thing is best done by running rsync directly, or perhaps through the File::Rsync front-end or the File::RsyncP protocol implementation.

Username and Password

Any username and password in the URI are sent to the server. This can be used for servers or server modules which require a username and/or password for read or write or both.

    rsync://username:password@hostname/module/dir1/dir2/foo.txt

If the username or password is incorrect the response is "401 Unauthorized" in the usual way. The server checks authorization before other module restrictions or path existence, so expect "Unauthorized" for anything without a valid username and password.
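A URI with credentials can be built with the user() and password() methods of the URI module rather than by pasting strings together, which takes care of escaping. The following is a sketch with made-up host, path and credentials. The request is only attempted when run with a --fetch argument.

```perl
use strict;
use warnings;
use URI;

# Hypothetical host, path and credentials, for illustration only.
my $uri = URI->new('rsync://example.com/module/dir1/dir2/foo.txt');
$uri->user('alice');
$uri->password('secret');
print "request URI: $uri\n";

if (@ARGV && $ARGV[0] eq '--fetch') {
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new->get($uri);
    if ($res->code == 401) {
        # Wrong or missing credentials.  The same response can be expected
        # for any path, existing or not, until the credentials are right.
        print "not authorized: ", $res->status_line, "\n";
    } elsif ($res->is_success) {
        print "fetched ", length($res->content), " bytes\n";
    } else {
        print "failed: ", $res->status_line, "\n";
    }
}
```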

The rsync program can take passwords from a file but there's nothing here for that.

Directory Listing

GET of a directory gives the rsync text listing of the files in that directory.

    rsync://hostname/module/dir/        # for directory contents
    rsync://hostname/module/dir         # same

The format generated by rsync is a text listing like

    -rw-r--r--             24 2014/03/26 19:54:15 foo.txt
    -rw-r--r--              6 2014/03/26 19:54:15 bar.txt
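If the fields are wanted separately, a listing line in the above form might be picked apart along the following lines. This is a sketch only: parse_listing_line is a hypothetical helper, the pattern is based just on the sample shown, commas in the size are allowed for as an assumption about other rsync versions, and symlink lines with a -> part are not treated specially.

```perl
use strict;
use warnings;

# Split one line of an rsync file listing into its fields.
# Returns a hash ref, or nothing if the line doesn't look like a listing line.
# (parse_listing_line is a hypothetical name for illustration only.)
sub parse_listing_line {
    my ($line) = @_;
    my ($perms, $size, $date, $time, $name) = $line =~ m{
        ^ (\S{10})                  # permissions, eg. -rw-r--r--
        \s+ ([\d,]+)                # size in bytes, maybe with commas
        \s+ (\d{4}/\d\d/\d\d)       # date  yyyy/mm/dd
        \s+ (\d\d:\d\d:\d\d)        # time  hh:mm:ss
        \s  (.*)                    # filename, possibly containing spaces
    }x or return;
    $size =~ tr/,//d;               # strip any thousands separators
    return { perms => $perms, size => $size,
             date  => $date,  time => $time, name => $name };
}

my $entry = parse_listing_line(
    '-rw-r--r--             24 2014/03/26 19:54:15 foo.txt');
print "$entry->{name} is $entry->{size} bytes\n";
```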

Last-Modified is not returned for a directory because there's no particularly good date/time for the listing. The directory has a modtime, but that only tracks the set of filenames, not the dates, sizes and permissions shown in the listing.

HEAD of a directory doesn't give a Content-Length since getting that would require getting the full listing. "200 OK" from HEAD means the directory exists, but gives no further information.

Putting a trailing / on an ordinary file, attempting to treat it as a directory, currently gives a 404.

    rsync://hostname/module/filename.txt/    # will be 404

Module Listing

GET with no module name gives a text listing of the available modules.

    rsync://hostname/               # for modules list
    rsync://hostname                # same

The descriptions are from the comment part of rsyncd.conf. Eg.

    pub             public access
    private         authorized users only
    incoming        uploads by arrangement
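The module listing text could be split into names and descriptions something like the following. This is a sketch, with parse_module_listing a hypothetical helper, and it assumes module names contain no whitespace.

```perl
use strict;
use warnings;

# Turn module listing text into a list of [name, description] pairs,
# in the order given by the server.
# (parse_module_listing is a hypothetical name for illustration only.)
sub parse_module_listing {
    my ($text) = @_;
    my @modules;
    foreach my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                 # skip blank lines
        # First word is the name, the rest of the line is the comment.
        my ($name, $desc) = split ' ', $line, 2;
        push @modules, [ $name, defined $desc ? $desc : '' ];
    }
    return @modules;
}

my $listing = <<'HERE';
pub             public access
private         authorized users only
incoming        uploads by arrangement
HERE

foreach my $module (parse_module_listing($listing)) {
    my ($name, $desc) = @$module;
    print "module '$name': $desc\n";
}
```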

HEAD of the module listing doesn't give a Content-Length since getting that would be the same as getting the whole listing. There's no notion of Last-Modified for the modules list.

ENVIRONMENT VARIABLES

TMPDIR

Temporary directory as per File::Temp and File::Spec (and their usual other variables etc on various systems).

In the current implementation, uploading and downloading both go through a temporary file unless given a :content_file already.
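To see which directory would be chosen, and what a temporary file there looks like, something like the following can be run. It is a sketch using File::Spec and File::Temp directly. The template name lwp-rsync-XXXXXX is made up for illustration and is not necessarily what the module uses.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp;

# The directory chosen from TMPDIR (or the system's usual alternatives).
print "temporary directory: ", File::Spec->tmpdir, "\n";

# A temporary file in that directory, removed automatically when the
# object goes out of scope.  (The template name is hypothetical.)
my $tmp = File::Temp->new(TEMPLATE => 'lwp-rsync-XXXXXX', TMPDIR => 1);
print {$tmp} "content on its way to or from rsync\n";
$tmp->close or die "cannot close ", $tmp->filename, ": $!";
print "temporary file:      ", $tmp->filename, "\n";
```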

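Any RSYNC_PASSWORD environment variable set by the caller is not used in the current implementation (the module sets it for the rsync subprocess itself, from the URI password, per "IMPLEMENTATION" below). The rationale is that this module is expected to act on rsync:// URIs to various hosts so a single password is unlikely to be useful. Is that reasonable?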

IMPLEMENTATION

The password part of a URI ($uri->password()) is extracted and passed to the rsync program in $ENV{'RSYNC_PASSWORD'} (since rsync doesn't take a password part in a command line rsync:// URL). rsync expects either $ENV{'RSYNC_PASSWORD'} or prompts the user (with getpass(3)). A prompt is avoided as this interface is meant to be non-interactive.
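The general technique of handing a password to a subprocess through the environment, without leaving it set in the parent, can be sketched as follows. This is not the module's actual code. run_with_password is a hypothetical helper, and a small Perl one-liner stands in for the rsync program so the sketch runs without a server.

```perl
use strict;
use warnings;

# Run a command with RSYNC_PASSWORD set only for the child process and
# return the child's standard output.
# (run_with_password is a hypothetical name for illustration only.)
sub run_with_password {
    my ($password, @command) = @_;
    local $ENV{RSYNC_PASSWORD} = $password;   # undone when the sub returns
    open my $fh, '-|', @command
      or die "cannot run $command[0]: $!";
    my $output = do { local $/; <$fh> };      # slurp all output
    close $fh;
    return $output;
}

# Stand-in child which just reports what it was given.
my $seen = run_with_password('secret',
                             $^X, '-e', 'print $ENV{RSYNC_PASSWORD}');
print "child saw: $seen\n";

# With the real program the call might be something like
#   run_with_password($uri->password,
#                     'rsync', 'rsync://user@host/module/foo.txt', $destfile);
```

Using local on the %ENV element means the previous value, or absence of a value, is restored even if the command dies part way through.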

Any %20 etc URL escapes are unescaped to the relevant characters since rsync doesn't take those forms in its rsync:// command line. (To rsync, any % is a literal part of the filename.)
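The unescaping is the usual URI kind, as can be seen with uri_unescape() from URI::Escape. The paths here are made up for illustration.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

# %XX escapes in a URL path become the literal characters given to rsync.
foreach my $path ('/pub/my%20file.txt', '/pub/100%25.txt') {
    printf "%-20s -> %s\n", $path, uri_unescape($path);
}
```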

Treating * ? [ characters as literals in filenames probably needs help from rsync itself, perhaps even on the server side. Some \* or [*] escaping can read an existing file containing a *, but if there's no such file it ends up reading a file named with an actual \* or [*]. It would be bad to read or write the wrong file.

The rsync --checksum option is always used so the file contents are compared. Perhaps there could be some special control header to do only the rsync "quick check" of date and size. That would only be useful for :content_file upload or download.

Each request is a separate rsync program run so there's no connection keep-alive. Does the rsync protocol allow connection re-use? The If-Modified-Since implementation is two rsync runs. Maybe the quick-check algorithm could be asked to look at the time but not the size, though the rsync 3.1 code in its unchanged_file() suggests not.

File::RsyncP could be an alternative to the rsync program. The advantage would be "more than one way to do it" and it would be Perl-only. But File::RsyncP version 0.70 says it doesn't have the delta-transfer of changes, and it's unclear whether it likes speaking to newer server versions.

It's not possible to rsync through ssh here, only to an rsync:// server daemon. This corresponds to rsync:// on the rsync command line meaning the daemon, and should be usual for publicly available URLs. An ssh or rsh mode would be for point-to-point with a specific host where you have an account. LWP::Protocol::sftp might be useful for such ssh (without rsync synchronizing). Some generic way to specify rsync program name or options here could allow its various finer controls.

The various --delete options to rsync are for deleting files no longer present when mirroring a directory. Can it be used to delete an individual file? If so perhaps a DELETE method could be implemented.

The directory and module listings are presented as text/plain since that's what rsync gives. It would be possible to parse the text and convert it to HTML in the manner of LWP::Protocol::ftp, though rsync itself has a tighter grip on which part is the filename etc when there are strange characters or a -> sequence for a symlink etc.

SEE ALSO

LWP::UserAgent, LWP::Protocol, URI

File::Rsync, File::RsyncP

LWP::Protocol::sftp

HOME PAGE

http://user42.tuxfamily.org/lwp-protocol-rsync/index.html

LICENSE

Copyright 2012, 2013, 2014, 2018, 2019 Kevin Ryde

LWP-Protocol-rsync is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.

LWP-Protocol-rsync is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with LWP-Protocol-rsync. If not, see http://www.gnu.org/licenses/.