NAME
LWP::Protocol::rsync - rsync protocol for LWP
SYNOPSIS
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $res = $ua->get('rsync://example.com/pub/some/thing.txt');
# (module loaded automatically)
die $res->status_line unless $res->is_success;
DESCRIPTION
This module adds the rsync:// protocol scheme to LWP::UserAgent by running the external rsync(1) program.
The rsync protocol uploads or downloads files by sending only the changed blocks of a file where possible. (The receiving side calculates checksums over blocks of its existing content, MD4 in older protocol versions and MD5 in newer ones, and tells the sending side what it already has.)
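As a rough illustration of the block-matching idea, here is a minimal sketch in plain Perl. It is not rsync's actual algorithm or wire format: it uses Digest::MD5 for the per-block checksums, and for brevity it hashes every window directly instead of using rsync's rolling weak checksum. The names block_sums, make_delta and apply_delta are hypothetical, made up for this example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Receiver side: checksum each fixed-size block of the content it already has.
# Maps checksum => index of the first block with that checksum.
# Content is treated as a byte string.
sub block_sums {
    my ($old, $bsize) = @_;
    my %sums;
    for (my $i = 0; $i * $bsize < length $old; $i++) {
        my $block = substr($old, $i * $bsize, $bsize);
        $sums{ md5($block) } //= $i;
    }
    return \%sums;
}

# Sender side: slide a window over the new content.  Where the window matches
# a block the receiver has, send a reference to it; otherwise send literal bytes.
sub make_delta {
    my ($new, $sums, $bsize) = @_;
    my @delta;
    my $literal = '';
    my $pos = 0;
    while ($pos < length $new) {
        my $window = substr($new, $pos, $bsize);
        my $idx = length($window) == $bsize ? $sums->{ md5($window) } : undef;
        if (defined $idx) {
            push @delta, [lit => $literal] if length $literal;
            $literal = '';
            push @delta, [copy => $idx];
            $pos += $bsize;
        } else {
            $literal .= substr($new, $pos, 1);
            $pos++;
        }
    }
    push @delta, [lit => $literal] if length $literal;
    return \@delta;
}

# Receiver side: rebuild the new content from its own blocks plus the literals.
sub apply_delta {
    my ($old, $delta, $bsize) = @_;
    my $out = '';
    for my $op (@$delta) {
        $out .= $op->[0] eq 'copy'
            ? substr($old, $op->[1] * $bsize, $bsize)
            : $op->[1];
    }
    return $out;
}
```

Only the [lit => ...] pieces would need to cross the network; the [copy => ...] entries are small references to blocks the receiver already holds.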
See RFC 5781 on the rsync:// URI scheme, and see the Perl URI module for manipulating such URIs.
GET
Downloads a file from an rsync server.
If an existing :content_file is specified as the local destination (see "REQUEST METHODS" in LWP::UserAgent), then that file is updated as necessary per the rsync protocol. A GET into an ordinary HTTP::Response, without :content_file, downloads the full source content.
If-Modified-Since is implemented by getting a listing from the server and comparing the file's time with the requested time. If the server time is not newer, the response is the usual "304 Not Modified".
The Last-Modified response header is the modification time of the file on the server. For a :content_file download, the local file's modification time is set too.
The Content-Type response header is guessed from the URI by LWP::MediaTypes. This is slightly experimental, since the rsync server has no notion of Content-Type as such.
HEAD
Retrieves information about a file by asking the server for a listing. The Content-Length and Last-Modified response headers are parsed out of the listing.
PUT
Uploads content to a file on the server. With the rsync protocol, only the changed parts of the content are actually sent.
An upload requires a writable destination on the server (see rsyncd.conf(5)). If the destination is not writable, an rsync version 3.1.0 server has been seen simply dropping the connection, which gives the rather uninformative error message "Connection reset by peer". The intention would be to return "405 Method Not Allowed" or similar, if an unwritable destination can be distinguished from actual connection trouble.
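HEAD's Content-Length and Last-Modified, and the If-Modified-Since comparison for GET, all come from one line of an rsync listing (in the format shown under "Directory Listing"). A minimal sketch of that parsing follows. It assumes the date is in the local time zone of the machine running the code and that the size may contain "," digit separators (which some rsync versions may print); parse_listing_line and is_not_modified are hypothetical names, not this module's internals.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Parse one line of an rsync listing such as
#   -rw-r--r-- 24 2014/03/26 19:54:15 foo.txt
# Assumptions: the date is local time, and "," separators in the size are
# removed.  Returns nothing if the line doesn't match.
sub parse_listing_line {
    my ($line) = @_;
    my ($perms, $size, $y, $mo, $d, $h, $mi, $s, $name) = $line =~ m{
        ^(\S{10}) \s+ ([\d,]+) \s+
        (\d{4})/(\d\d)/(\d\d) \s+
        (\d\d):(\d\d):(\d\d) \s (.*)$
    }x or return;
    $size =~ tr/,//d;
    return {
        perms  => $perms,
        length => $size,                                   # for Content-Length
        mtime  => timelocal($s, $mi, $h, $d, $mo - 1, $y), # for Last-Modified
        name   => $name,
    };
}

# If-Modified-Since: the answer is "304 Not Modified" unless the server's
# file is newer than the time the client gave.
sub is_not_modified {
    my ($entry, $since) = @_;
    return $entry->{mtime} <= $since;
}
```

In an actual response, length would go into Content-Length and mtime, formatted with HTTP::Date's time2str, into Last-Modified.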
The characters *, ? and [ are not permitted in paths. The intention is for this interface to access a single file resource (read or write), but rsync interprets these characters as shell-style wildcards for multi-file transfers. IPv6-style brackets in the hostname part, as in [::1], are allowed.
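A minimal sketch of that rule: only the path is checked for wildcard characters, so brackets around an IPv6 host are fine. The helper name path_is_wildcard_free is hypothetical, and the check does no real URI parsing (a ? that a URI parser would take as the start of a query is simply rejected too).

```perl
use strict;
use warnings;

# Hypothetical check: does an rsync:// URL name a single file, free of the
# shell-style wildcard characters * ? [ that rsync would expand?
# Brackets in the host part (IPv6 literals such as [::1]) are allowed.
# Caveat: %2A etc. would decode to a wildcard, so a real check should also
# be applied after %XX unescaping.
sub path_is_wildcard_free {
    my ($url) = @_;
    my ($authority, $path) = $url =~ m{^rsync://([^/]*)(.*)\z}is
        or return 0;    # not an rsync:// URL at all
    return $path =~ /[*?\[]/ ? 0 : 1;
}
```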
The rsync program has many options for things like mirroring whole directory trees. That sort of thing is best done by running rsync directly, or perhaps through the File::Rsync front-end or File::RsyncP, a Perl implementation of the protocol.
Username and Password
Any username and password in the URI are sent to the server. This can be used for servers or server modules which require a username and/or password for read or write or both.
rsync://username:password@hostname/module/dir1/dir2/foo.txt
If the username or password is incorrect, the response is "401 Unauthorized" in the usual way. The server checks authorization before other module restrictions or path existence, so expect "Unauthorized" for anything without a valid username and password.
The rsync program can take a password from a file (its --password-file option), but there's nothing here for that.
Directory Listing
GET of a directory gives the rsync text listing of the files in that directory.
rsync://hostname/module/dir/ # for directory contents
rsync://hostname/module/dir # same
The format generated by rsync is a text listing such as
-rw-r--r-- 24 2014/03/26 19:54:15 foo.txt
-rw-r--r-- 6 2014/03/26 19:54:15 bar.txt
Last-Modified is not returned for a directory, because there's no particularly good date/time for the listing. The directory has a modification time, but that reflects only changes to its filenames, not to the dates, sizes and permissions shown in the listing.
HEAD of a directory doesn't give a Content-Length, since getting that would require getting the full listing. "200 OK" from HEAD means the directory exists, but gives no further information.
Putting a trailing / on an ordinary file, so as to treat it as a directory, currently gives a 404.
rsync://hostname/module/filename.txt/ # will be 404
Module Listing
GET with no module name gives a text listing of the available modules.
rsync://hostname/ # for modules list
rsync://hostname
The descriptions come from each module's comment parameter in rsyncd.conf. For example:
pub public access
private authorized users only
incoming uploads by arrangement
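The module list is simple enough to split on the first run of whitespace after the name. A minimal sketch, with parse_module_list a hypothetical name; it assumes the text holds only module lines, whereas a server with a message-of-the-day file configured may send those lines first, and they would be mis-parsed here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an rsync module listing into name => comment.
sub parse_module_list {
    my ($text) = @_;
    my %modules;
    for my $line (split /\n/, $text) {
        my ($name, $comment) = split ' ', $line, 2;   # name, then the rest
        next unless defined $name;                    # skip blank lines
        $modules{$name} = defined $comment ? $comment : '';
    }
    return \%modules;
}
```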
HEAD of the module listing doesn't give a Content-Length, since getting that would be the same as getting the whole listing. There's no notion of Last-Modified for the modules list.
ENVIRONMENT VARIABLES
TMPDIR
Temporary directory, as per File::Temp and File::Spec (and their usual other variables on various systems). In the current implementation, uploads and downloads both go through a temporary file unless a :content_file is already given.
The RSYNC_PASSWORD environment variable is not used in the current implementation. The rationale is that this module is expected to act on rsync:// URIs for various hosts, so a single password is unlikely to be useful. Is that reasonable?
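The temporary-file route described under TMPDIR can be sketched as follows. This is an illustration only: the caller-supplied $fetch callback stands in for running rsync with the temporary file as its destination, and fetch_via_tempfile is a hypothetical name.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: let $fetch write into a temporary file (standing in
# for an rsync run with that file as destination), then return its contents.
sub fetch_via_tempfile {
    my ($fetch) = @_;
    # With no template, File::Temp creates the file in File::Spec->tmpdir,
    # which honours $ENV{TMPDIR} on Unix-like systems.
    my $tmp = File::Temp->new;
    $fetch->($tmp->filename);
    open my $in, '<:raw', $tmp->filename
        or die "Cannot read temporary file: $!";
    my $content = do { local $/; <$in> };
    close $in;
    return $content;    # the temporary file is deleted when $tmp goes out of scope
}
```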
IMPLEMENTATION
The password part of a URI ($uri->password()) is extracted and passed to the rsync program in $ENV{'RSYNC_PASSWORD'}, since rsync doesn't accept a password part in an rsync:// URL on its command line. rsync expects either $ENV{'RSYNC_PASSWORD'} or else prompts the user (with getpass(3)). The prompt is avoided, as this interface is meant to be non-interactive.
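A minimal sketch of moving the password from the URL into the environment. split_password and run_rsync are hypothetical names and the regex handling is deliberately simple; a fuller version would also decode %XX escapes in the password (as $uri->password does) and keep rsync from prompting when a password is needed but none was given, which this sketch does not do.

```perl
use 5.012;    # for "delete local"
use strict;
use warnings;

# Hypothetical helper: split "user:password@" out of an rsync:// URL.
# Returns the URL with just "user@" left in it, plus the password (or undef).
sub split_password {
    my ($url) = @_;
    if ($url =~ m{^(rsync://)([^/\@:]*):([^/\@]*)\@(.*)\z}is) {
        return ("$1$2\@$4", $3);
    }
    return ($url, undef);
}

# Run rsync with the password in its environment, never on the command line.
# Any RSYNC_PASSWORD inherited from our own environment is ignored, and the
# original value is restored when this sub returns.
sub run_rsync {
    my ($url, @args) = @_;
    my ($plain_url, $password) = split_password($url);
    delete local $ENV{RSYNC_PASSWORD};
    $ENV{RSYNC_PASSWORD} = $password if defined $password;
    return system('rsync', @args, $plain_url);
}
```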
Any %20 etc. URL escapes are unescaped to the corresponding characters, since rsync doesn't accept those forms in an rsync:// URL on its command line. (rsync treats any % as a literal part of the filename.)
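A minimal sketch of the unescaping step, using only core Perl; unescape_path is a hypothetical name. It decodes in a single pass, so an escaped %25 becomes a literal % and is not decoded again.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes in the path of an rsync:// URL,
# since rsync wants the raw characters on its command line.  Note that "%2A"
# decodes to "*", so any wildcard check belongs after this step.
sub unescape_path {
    my ($path) = @_;
    $path =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $path;
}
```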
Supporting *, ? and [ as literal characters in filenames probably needs help from rsync itself, perhaps even on the server side. Escaping such as \* or [*] can read an existing file named *, but if there's no such file it ends up reading a file literally named \* or [*]. It would be bad to read or write the wrong file.
The rsync --checksum option is always used, so the file contents are compared. Perhaps there could be some special control header to do only the rsync "quick check" of date and size. That would only be useful for :content_file upload or download.
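The difference between the two modes is just one option on the rsync command line. A minimal sketch: --checksum is the option named above, while the use of --times and the quick_check switch are assumptions for illustration (quick_check is not an existing feature of this module), and rsync_command is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical: build an rsync argument list for one transfer.  By default
# --checksum forces a comparison of file contents; the quick_check option
# drops it, so rsync's size-and-date "quick check" decides what to transfer.
sub rsync_command {
    my ($src, $dst, %opt) = @_;
    my @cmd = ('rsync', '--times');    # --times: preserve modification time (assumed)
    push @cmd, '--checksum' unless $opt{quick_check};
    return (@cmd, $src, $dst);
}
```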
Each request is a separate rsync program run, so there's no connection keep-alive. Does the rsync protocol allow connection re-use? The If-Modified-Since implementation is two rsync runs. Maybe the quick-check algorithm could be asked to look at the time but not the size, though unchanged_file() in the rsync 3.1 code suggests not.
File::RsyncP could be an alternative to the rsync program. The advantage would be "more than one way to do it", and it would be Perl-only. But File::RsyncP version 0.70 says it doesn't have the delta-transfer of changes, and it's not clear whether it works with newer server versions.
It's not possible to rsync through ssh here, only to an rsync:// server daemon. This corresponds to rsync:// on the rsync command line meaning the daemon, and should be usual for publicly available URLs. An ssh or rsh mode would be for point-to-point transfers with a specific host where you have an account. LWP::Protocol::sftp might be useful for such ssh access (without rsync synchronizing). Some generic way to specify the rsync program name or options here could allow its various finer controls.
The various --delete options to rsync are for deleting files no longer present when mirroring a directory. Can they be used to delete an individual file? If so, perhaps a DELETE method could be implemented.
The directory and module listings are presented as text/plain, since that's what rsync gives. It would be possible to parse them and convert to HTML in the manner of LWP::Protocol::ftp, though rsync itself has a tighter grip on which part is the filename etc. when there are strange characters, or a -> sequence for a symlink.
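As a rough sketch of that HTML idea (not something this module does, and not how LWP::Protocol::ftp is written), the following splits each listing line on whitespace into five fields. It shows the weakness just mentioned: a filename with leading spaces loses them, and a symlink line "name -> target" would be linked as one odd name. listing_to_html, html_escape and uri_escape_name are hypothetical names.

```perl
use strict;
use warnings;

# Minimal HTML escaping for text content.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# %XX-escape everything except unreserved URI characters.
sub uri_escape_name {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9._~-])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Turn an rsync text listing into a minimal HTML list of links.
# Naive: the filename is "everything after the time", so symlinks and
# names with leading spaces are not handled properly.
sub listing_to_html {
    my ($text) = @_;
    my $html = "<ul>\n";
    for my $line (split /\n/, $text) {
        my ($perms, $size, $date, $time, $name) = split ' ', $line, 5;
        next unless defined $name;
        my $href = uri_escape_name($name);
        $href .= '/' if $perms =~ /^d/;    # directories link with a trailing slash
        $html .= sprintf qq{<li><a href="%s">%s</a> %s %s %s</li>\n},
            $href, html_escape($name), $size, $date, $time;
    }
    return $html . "</ul>\n";
}
```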
SEE ALSO
LWP::UserAgent, LWP::Protocol, URI
HOME PAGE
http://user42.tuxfamily.org/lwp-protocol-rsync/index.html
LICENSE
Copyright 2012, 2013, 2014, 2018, 2019 Kevin Ryde
LWP-Protocol-rsync is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
LWP-Protocol-rsync is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with LWP-Protocol-rsync. If not, see http://www.gnu.org/licenses/.