The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

makepp_signatures -- How makepp knows when files have changed

DESCRIPTION

c_compilation_md5md5shared_object

Each file is associated with a signature, which is a string that changes if the file has changed. Makepp compares signatures to see whether it needs to rebuild anything. The default signature for files is a concatenation of the file's modification time and its size, unless you're executing a C/C++ compilation command, in which case the default signature is a cryptographic checksum on the file's contents, ignoring comments and whitespace. If you want, you can switch to a different method, or you can define your own signature functions.

How the signature is actually used is controlled by the build check method (see makepp_build_check). Normally, if a file's signature changes, the file itself is considered to have changed, and makepp forces a rebuild.

If makepp is building a file, and you don't think it should be, you might want to check the build log (see makepplog). Makepp writes an explanation of what it thought each file depended on, and why it chose to rebuild.

There are several signature methods included in makepp. Makepp usually picks the most appropriate standard one automatically. However, you can change the signature method for an individual rule by using :signature modifier on the rule which depends on the files you want to check (see "signature" in makepp_rules), or for all rules in a makefile by using the signature statement (see "signature" in makepp_statements), or for all makefiles at once using the -m or --signature-method command line option (see "-m" in makepp_command).

Signature methods included in the distribution

plain (actually nameless)

The plain signature method is the file's modification time and the file's size, concatenated. These values are quickly obtainable from the operating system and almost always change when the file changes.

Makepp used to look only at the file's modification time, but if you run makepp several times within a second (e.g., in a script that's building several small things), sometimes modification times won't change. Then, hopefully the file's size will change.

If the case where you may run makepp several times a second is a problem for you, you may find that using the md5 method is somewhat more reliable. If makepp builds a file, it flushes its cached MD5 signatures even if the file's date hasn't changed.

For efficiency's sake, makepp won't reread the file and recompute the complex signatures below if this plain signature hasn't changed since the last time it computed it. This can theoretically cause a problem, since it's possible to change the file's contents without changing its date and size. In practice, this is quite hard to do so it's not a serious danger. In the future, as more filesystems switch to timestamps of under a second, hopefully Perl will give us access to this info, making this failsafe.

md5

Computes an MD5 checksum of the file's contents, rather than looking at the file's date or size. This means that if you change the date on the file but don't change its contents, makepp won't try to rebuild anything that depends on it.

This is particularly useful if you have some file which is often regenerated during the build process that other files depend on, but which usually doesn't actually change. If you use the md5 signature checking method, makepp will realize that the file's contents haven't changed even if the file's date has changed. (Of course, this won't help if the files have a timestamp written inside of them, as archive files do for example.)

c_compilation_md5

This is the default method, if your Perl supports MD5 (standard since 5.8). It checks if a file's name looks like C or C++ source code, including things like Corba IDL. If it does, this method applies. If it doesn't, it falls back to plain signatures for binary files (determined by name or else by content) and else to "md5". Alas you can currently tune what is to be considered C or binary only by subclassing Signature::c_compilation_md5 to get a new signature type and there overriding or extending the methods recognizes_file and excludes_file.

The idea is to be independent of formatting changes. This is done by pulling everything up as far as possible, and by eliminating insignificant spaces. Words are exempt from pulling up, since they might be macros containing __LINE__, so they remain on the line where they were.

    // ignored comment
  
    #ifdef XYZ
        #include <xyz.h>
    #endif
  
    int a = 1;
  
    void f
    (
        int b
    )
    {
        a += b + ++c;
    }
  
        /* more ignored comment */

is treated as though it were

    #ifdef XYZ
    #include<xyz.h>
    #endif
  
  
  
    int a=1;
  
    void f(
  
    int b){
  
  
    a+=b+ ++c;}

That way you can reindent your code or add or change comments without triggering a rebuild, so long as you don't change the line numbers. (This signature method recompiles if line numbers have changed because that causes calls to __LINE__ and most debugging information to change.) It also ignores whitespace and comments after the last token. This is useful for preventing a useless rebuild if your VC adds lines at a $Log$ tag when checking in.

This method is particularly useful for the following situations:

  • You want to make changes to the comments in a commonly included header file, or you want to reformat or reindent part of it. For one project that I worked on a long time ago, we were very unwilling to correct inaccurate comments in a common header file, even when they were seriously misleading, because doing so would trigger several hours of rebuilds. With this signature method, this is no longer a problem.

  • You like to save your files often, and your editor (unlike emacs) will happily write a new copy out even if nothing has changed.

  • You have C/C++ source files which are generated automatically by other build commands (e.g., yacc or some other preprocessor). For one system I work with, we have a preprocessor which (like yacc) produces two output files, a .cxx and a .h file:

        %.h %.cxx: %.qtdlg $(HLIB)/Qt/qt_dialog_generator
            $(HLIB)/Qt/qt_dialog_generator $(input)

    Every time the input file changed, the resulting .h file also was rewritten, and ordinarily this would trigger a rebuild of everything that included it. However, most of the time the contents of the .h file didn't actually change (except for a comment about the build time written by the preprocessor), so a recompilation was not actually necessary.

shared_object

This method only works if you have the utility nm in your path, and it accepts the -P option to output Posix format. In that case only the names and types of symbols in dynamically loaded libraries become part of their signature. The result is that you can change the coding of functions without having to relink the programs that use them.

In the following command the command scanner will detect an implicit dependency on $(LIBDIR)/libmylib.so, and build it if necessary. However the link command will only be reperformed whenever the library exports a different set of symbols:

    myprog: $(OBJECTS) :signature shared_object
        $(LD) -L$(LIBDIR) -lmylib $(inputs) -o $(output)

This works as long as the functions' interfaces don't change. But in that case you'd change the declaration, so you'd also need to change the callers.

Note that this method only applies to files whose name looks like a shared library. For all other files it falls back to c_compilation_md5, which may in turn fall back to others.

Custom methods

You can, if you want, define your own methods for calculating file signatures and comparing them. You will need to write a perl module to do this. Have a look at the comments in Signature.pm in the distribution, and also at the existing signature algorithms in Signature/*.pm for details.

Here are some cases where you might want a custom signature method:

  • When you want all changes in a file to be ignored. Say you always want dateStamp.o to be a dependency (to force a rebuild), but you don't want to rebuild if only dateStamp.o has changed. You could define a signature method that inherits from c_compilation_md5 that recognizes the dateStamp.o file by its name, and always returns a constant value for that file.

  • When you want to ignore part of a file. Suppose that you have a program that generates a file that has a date stamp in it, but you don't want to recompile if only the date stamp has changed. Just define a signature method similar to c_compilation_md5 that understands your file format and skips the parts you don't want to take into account.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 101:

Non-ASCII character seen before =encoding in ' '. Assuming UTF-8