
NAME

load - control when subroutines will be loaded

SYNOPSIS

  use load;            # default, same as 'autoload'

  use load 'autoload'; # export AUTOLOAD handler to this namespace

  use load 'ondemand'; # load subroutines after __END__ when requested, default

  use load 'now';      # load subroutines after __END__ now

  use load ();         # same as qw(dontscan inherit)

  use load 'dontscan'; # don't scan module until it is really needed

  use load 'inherit';  # do NOT export AUTOLOAD handler to this namespace

DESCRIPTION

The "load" pragma lets a module developer give the application developer a choice between optimizing for memory and optimizing for CPU usage, by controlling the moment at which subroutines are loaded and start taking up memory. The application developer can optimize for CPU usage by loading all of a module at compile time, which reduces the amount of CPU used during execution. Alternatively, the application developer can optimize for memory usage by loading subroutines only when they are actually needed, at the cost of extra CPU during execution.

The "load" pragma combines the best of both worlds from AutoLoader and SelfLoader, and adds some more features.

In a situation where you want to use as little memory as possible, the "load" pragma (in the context of a module) is a drop-in replacement for AutoLoader. But for situations where you want to have a module load everything it could ever possibly need (e.g. when starting a mod_perl server in pre-fork mode), the "load" pragma can be used (in the context of an application) to have all subroutines of a module loaded without having to make any change to the source of the module in question.

So the typical use inside a module is to have:

 package Your::Module;
 use load;

in the source, and to place all subroutines that you want to be loadable on demand after the (first) __END__.

If an application developer decides that all subroutines should be loaded at compile time, (s)he can say in the application:

 use load 'now';
 use Your::Module;

This will cause the subroutines of Your::Module to all be loaded at compile time.

MODES OF OPERATION

There are basically two places where you can call the "load" pragma:

inside a module

When you call the "load" pragma inside a module, you enable external control over when that module's subroutines are loaded. As with AutoLoader, any subroutines that should be loaded on demand should be located after an __END__ line.

If no parameters are specified with the use load, then the "autoload" parameter is assumed. Whether the module's subroutines are loaded at compile time or on demand, is determined by the calling application. If the application doesn't specify anything specific, the "ondemand" keyword will also be assumed.

inside an application

When you call the "load" pragma inside an application, you're basically specifying when subroutines will be loaded by "load" enhanced modules. As an application developer, you can basically use two keywords: "ondemand" and "now".

If an application does not call the "load" pragma, the "ondemand" keyword will be assumed. With "ondemand", subroutines will only be loaded when they are actually executed. This saves memory at the expense of extra CPU the first time the subroutine is called.

The "now" keyword indicates that all subroutines of all modules that are enhanced with the "load" pragma, will be loaded at compile time (thus using more memory, but not having an extra CPU overhead the first time the subroutine is executed).

KEYWORDS

The following keywords are recognized with the use command:

ondemand

The "ondemand" keyword indicates that subroutines, of modules that are enhanced with the "load" pragma, will only be loaded when they are actually called.

If the "ondemand" keyword is used in the context of an application, all modules that are subsequently used, will be forced to load subroutines only when they are actually called (unless the module itself forces a specific setting).

If the "ondemand" keyword is used in the context of a module, it indicates that the subroutines of that module, should always be loaded when they are actually needed. Since this takes away the choice from the application developer, the use of the "ondemand" keyword in module context is not encouraged. See also the now and dontscan keywords.

now

The "now" keyword indicates that subroutines, of modules that are enhanced with the "load" pragma, will be loaded at compile time.

If the "now" keyword is used in the context of an application, all modules that are subsequently used, will be forced to load all subroutines at compile time (unless the module forces a specific setting itself).

If the "now" keyword is used in the context of a module, it indicates that the subroutines of that module, should always be loaded at compile time. Since this takes away the choice from the application developer, the use of the "now" keyword in module context is not encouraged. See also the ondemand keyword.

dontscan

The "dontscan" keyword only makes sense when used in the context of a module. Normally, when a module that is enhanced with the "load" pragma is compiled, the source after the __END__ is scanned for the locations of the subroutines. This makes the compiling of modules a little slower, but allows for a faster (initial) lookup of (yet) unloaded subroutines during execution.

If the "dontscan" keyword is specified, this scanning of the source is skipped at compile time. However, as soon as an attempt is made to execute a subroutine from this module, the source is scanned first, before the subroutine in question is loaded.

So, you should use the "dontscan" keyword if you are reasonably sure that you will only need subroutines from the module in special cases. In all other cases it will make more sense to have the source scanned at compile time.

The "dontscan" keyword will be ignored if an application developer forces subroutines to be loaded at compile time with the now keyword.

autoload

The "autoload" keyword only makes sense when used in the context of a module. It indicates that a generic AUTOLOAD subroutine will be exported to the module's namespace. It is selected by default if you use the "load" pragma without parameters in the source of a module. See also the inherit keyword to not export the generic AUTOLOAD subroutine.

inherit

The "inherit" keyword only makes sense when used in the context of a module. It indicates that no AUTOLOAD subroutine will be exported to the module's namespace. This can e.g. be used when you need to have your own AUTOLOAD routine. That AUTOLOAD routine should then contain:

 $load::AUTOLOAD = $sub;
 goto &load::AUTOLOAD;

to access the "load" pragma functionality. Another case to use the "inherit" keyword would be in a sub-class of a module which also is "load" enhanced. In that case, the inheritance will cause the AUTOLOAD subroutine of the base class to be used, thereby accessing the "load" pragma automagically (and hence the naming of the keyword of course). See also the autoload keyword to have the module use the generic AUTOLOAD subroutine.
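The hand-off described above can be tried out without the pragma itself. The following is a minimal sketch in core Perl, assuming a hypothetical Generic::Loader package that stands in for "load" and only pretends to compile the requested subroutine. It shows why the target package's $AUTOLOAD has to be set by hand: a goto to a subroutine does not set it.

```perl
use strict;
use warnings;

package Generic::Loader;    # hypothetical stand-in for the "load" package
our $AUTOLOAD;

sub AUTOLOAD {
    my $name = $AUTOLOAD;    # fully qualified name, set by the caller
    my $code = sub { return "loaded $name" };    # pretend to compile it
    no strict 'refs';
    *{$name} = $code;        # install it, so later calls bypass AUTOLOAD
    goto &$code;
}

package My::Class;          # hypothetical class with its own AUTOLOAD
our $AUTOLOAD;

sub AUTOLOAD {
    return if $AUTOLOAD =~ /::DESTROY$/;    # never autoload destructors
    # goto does not set the target's $AUTOLOAD, so copy the name over
    $Generic::Loader::AUTOLOAD = $AUTOLOAD;
    goto &Generic::Loader::AUTOLOAD;
}

package main;

my $result = My::Class->hello;
print "$result\n";
```

After the first call, My::Class::hello exists as a real subroutine, so neither AUTOLOAD is involved in later calls.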

DIFFERENCES WITH SIMILAR MODULES

There are a number of (core) modules that more or less do the same thing as the "load" pragma.

AutoSplit / AutoLoader

The "load" pragma is very similar to the AutoSplit / AutoLoader combination. The main difference is that the splitting takes place when the "load" import is called in a module and that there are no external files created. Instead, just the offsets and lengths are recorded in a hash (when "ondemand" is active) or all the source after __END__ is eval'ed (when "now" is active).
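To make the offset/length idea concrete, here is a minimal sketch in core Perl. It is not the pragma's actual implementation: the names %where, %loaded and load_one are hypothetical, a here-document stands in for the text after __END__, and offsets are counted in characters of that string rather than bytes of a file.

```perl
use strict;
use warnings;

# Stand-in for the text that would follow __END__ in a module file.
my $source = <<'SRC';
sub hello {
    return "hello";
}

sub add {
    my ( $x, $y ) = @_;
    return $x + $y;
}
SRC

# Record [ offset, length ] for every "sub NAME" that starts a line.
my %where;
my ( $name, $start );
while ( $source =~ /^sub\s+(\w+)/mg ) {
    my $pos = $-[0];    # offset at which this match starts
    $where{$name} = [ $start, $pos - $start ] if defined $name;
    ( $name, $start ) = ( $1, $pos );
}
$where{$name} = [ $start, length($source) - $start ] if defined $name;

# "ondemand" style: compile one subroutine from its recorded slice.
my %loaded;

sub load_one {
    my ($sub) = @_;
    return $loaded{$sub} if $loaded{$sub};
    my $slot = $where{$sub} or die "no such subroutine: $sub\n";
    my ( $offset, $length ) = @$slot;
    eval( "package main;\n" . substr( $source, $offset, $length ) . "\n1;" )
        or die $@;
    return $loaded{$sub} = \&{"main::$sub"};
}

my $add = load_one('add');
print $add->( 2, 3 ), "\n";
```

At this point only "add" has been compiled; "hello" is still just a pair of numbers in %where. A "now" style load would eval all of $source in one go instead.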

From a module developer point of view, the advantage is that you do not need to install a module before you can test it. From an application developer point of view, you have the flexibility of having everything loaded now or later (on demand).

From a memory usage point of view, the "load" offset/length hash takes up more memory than the equivalent AutoLoader setup. On the other hand, accessing the source of a subroutine may generally be faster because the file is more likely to reside in the operating system's buffers already.

As an extra feature, the "load" pragma allows an application to force all subroutines to be loaded at compile time, which is not possible with AutoLoader.

SelfLoader

The "load" pragma also has some functionality in common with the SelfLoader module, but it gives more granularity: with SelfLoader, all subroutines that are not loaded directly will be loaded as soon as any not yet loaded subroutine is requested. SelfLoader also adds complexities if your module needs to use the <DATA> handle. So the "load" pragma gives more flexibility and fewer development complexities. And of course, with the "load" pragma an application can force all subroutines to be loaded at compile time when needed.

UNIVERSAL::can

To ensure the functioning of the ->can class method and &UNIVERSAL::can, the "load" pragma hijacks the standard UNIVERSAL::can routine so that it can check whether the subroutine/method that you are checking for actually exists, and return a code reference to it. This has the side effect that the subroutine checked for is loaded. You can use this side effect to load subroutines without calling them.

 Your::Module->can( 'loadthisnow' );

will load the subroutine "loadthisnow" of the Your::Module module without actually calling it.
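The side effect can be illustrated with a simplified sketch in core Perl. Unlike the pragma, which replaces UNIVERSAL::can globally, this sketch defines can() in a single hypothetical package, Lazy::Demo, and keeps the not yet compiled source in a hypothetical %pending hash.

```perl
use strict;
use warnings;

# Source of subroutines that have not been compiled yet.
my %pending = (
    greet => 'sub greet { return "hi" }',
);

package Lazy::Demo;    # hypothetical package

# A can() that compiles a pending subroutine before answering.
sub can {
    my ( $class, $method ) = @_;
    if ( my $code = delete $pending{$method} ) {
        eval "package Lazy::Demo;\n$code\n1;" or die $@;
    }
    return UNIVERSAL::can( $class, $method );
}

package main;

my $before = defined &Lazy::Demo::greet ? 'yes' : 'no';
my $code   = Lazy::Demo->can('greet');    # compiles greet, does not call it
my $after  = defined &Lazy::Demo::greet ? 'yes' : 'no';
print "defined before: $before, after: $after\n";
```

The call to can() returns a code reference and leaves the subroutine compiled, which is the behaviour the text above relies on.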

CAVEATS

Currently you may not have multiple packages in the same file, nor can you have fully qualified subroutine names.

The parser that looks for package names and subroutines is not very smart. This is intentional: making it smarter would make it a lot slower, and probably still not smart enough. Therefore, "package" and "sub" must be at the start of a line, and the name of the sub must be on the same line as the "sub" keyword.
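The layout rule can be illustrated with a small core Perl sketch. The pattern below is an assumption made for illustration, not the pragma's real parser; it only shows what a deliberately simple, line-anchored scan would and would not find.

```perl
use strict;
use warnings;

# Three layouts: only the first follows the rule described above.
my $source = <<'SRC';
sub found_one {
    return 1;
}

  sub indented { return 2 }

sub
split_name { return 3 }
SRC

# Hypothetical line-anchored scan: "sub" at the start of a line,
# followed by the name on the same line.
my @found = $source =~ /^sub[ \t]+(\w+)/mg;
print "@found\n";
```

Only "found_one" is reported: "indented" does not start at the beginning of its line, and "split_name" is not on the same line as its "sub".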

EXAMPLES

Some code examples. Please note that these are just a part of an actual situation.

base class

 package Your::Module;
 use load;

Exports the generic AUTOLOAD subroutine and adheres to whatever the application developer specifies as mode of operation.

sub class

 package Your::Module::Adapted;
 @ISA = qw(Your::Module);
 use load ();

Does not export the generic AUTOLOAD subroutine, but inherits it from its base class. Also implicitly specifies the "dontscan" keyword, causing the source of the module to be scanned only when the first not yet loaded subroutine is about to be executed. If you only want to have the "inherit" keyword functionality, then you must specify that explicitly:

 package Your::Module::Adapted;
 @ISA = qw(Your::Module);
 use load 'inherit';

custom AUTOLOAD

 package Your::Module;
 use load 'inherit';
 
 sub AUTOLOAD {
   if ( $some_condition ) {   # placeholder: substitute your own test
     $load::AUTOLOAD = $Your::Module::AUTOLOAD;
     goto &load::AUTOLOAD;
   }
   # do your own stuff
 }

If you want to use your own AUTOLOAD subroutine, but still want to use the functionality offered by the "load" pragma, you can use the above construct.

mod_perl prefork

 use load 'now';
 use Your::Module;

In pre-fork mod_perl applications (the default mod_perl applications before mod_perl 2.0), it is advantageous to load all possible subroutines when the Apache process is started. This is because the operating system will share memory between processes using a mechanism called "Copy On Write". So even though it will take more memory initially, that cost is easily evened out by the gains of having everything shared. Loading a not yet loaded subroutine in that situation will cause otherwise shared memory to become unshared. This increases the overall memory usage, because the amount that becomes unshared is typically a lot more than the extra memory used by the subroutine (which is caused by fragmentation of allocated memory).

threaded applications and mod_perl worker

 use Your::Module;

Threaded Perl applications, of which mod_perl applications using the "worker" module are a special case, function best when subroutines are only loaded when they are actually needed. This is caused by the nature of the threading model of Perl, in which all data-structures are copied to each thread (essentially forcing them to become unshared as far as the operating system is concerned).

Benchmarks have shown that the overhead of the extra CPU is easily offset by the reduction of the amount of data that needs to be copied (and processed) when a thread is created.

TODO

The coordinates of a subroutine in a module (start, number of bytes) are stored in a hash in the load namespace. Ideally, this information should be stored in the stash of the module to which it applies. Then the internals that check for the existence of a subroutine would see that the subroutine doesn't exist (yet), but that there is an offset and length (and implicitly, a file from %INC) from which the source could be read and eval'ed.

Loading all of the subroutines should maybe be handled inside the Perl parser, having it skip __END__ when the global "now" flag is set.

Possibly we should use the <DATA> handle from a module if there is one, or dup it and use that, rather than opening the file again.

AUTHOR

Elizabeth Mattijsen, <liz@dijkmat.nl>.

Please report bugs to <perlbugs@dijkmat.nl>.

COPYRIGHT

Copyright (c) 2002 Elizabeth Mattijsen <liz@dijkmat.nl>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

AutoLoader, SelfLoader.