NAME

Devel::Git::MultiBisect - Study build and test output over a range of git commits

SYNOPSIS

You will typically construct an object of a class which is a child of Devel::Git::MultiBisect, such as Devel::Git::MultiBisect::AllCommits, Devel::Git::MultiBisect::Transitions or Devel::Git::MultiBisect::BuildTransitions. All methods documented in this parent package may be called from any of these child classes.

    use Devel::Git::MultiBisect::AllCommits;
    $self = Devel::Git::MultiBisect::AllCommits->new(\%parameters);

... or

    use Devel::Git::MultiBisect::Transitions;
    $self = Devel::Git::MultiBisect::Transitions->new(\%parameters);

... or

    use Devel::Git::MultiBisect::BuildTransitions;
    $self = Devel::Git::MultiBisect::BuildTransitions->new(\%parameters);

... and then:

    $commit_range = $self->get_commits_range();

    $full_targets = $self->set_targets(\@target_args);

    $outputs = $self->run_test_files_on_one_commit($commit_range->[0]);

... followed by methods specific to the child class.

... and then perhaps also:

    $timings = $self->get_timings();

DESCRIPTION

Given a Perl library or application kept in git for version control, it is often useful to be able to compare the output collected from running one or more test files over a range of git commits. If that range is sufficiently large, a test may fail in more than one way over that range.

If that is the case, then simply asking, "When did this file start to fail?" -- a question which git bisect is designed to answer -- is insufficient. In order to identify more than one point of failure, we may need to (a) capture the test output for each commit; or, (b) capture the test output only at those commits where the output changed. The output of a run of a test file may change for a variety of reasons: test failures, segfaults, changes in the number or content of tests, etc.

Devel::Git::MultiBisect provides methods to achieve that objective. Its child classes, Devel::Git::MultiBisect::AllCommits and Devel::Git::MultiBisect::Transitions, provide different flavors of that functionality for objectives (a) and (b), respectively. Please refer to their documentation for further discussion.

Child class Devel::Git::MultiBisect::BuildTransitions focuses on failures during the build process rather than during testing. It can handle three different types of problems which arise when you run make to build a Perl library or to build Perl itself:

  • Exceptions detected by the C-compiler

  • Warnings emitted by the C-compiler

  • Warnings emitted by perl or other languages invoked during make

See the documentation for Devel::Git::MultiBisect::BuildTransitions for further details.

GLOSSARY

  • commit

    A source code change set entered ("committed") to a git repository. Each commit is denoted by a SHA. In this library, whenever a commit is called for as the argument to a function, you can also use a git tag.

  • commit range

    The range of sequential commits (determined by git log) requested for analysis.

  • target

    A test file from the test suite of the application or library under study.

  • test output

    What is sent to STDOUT or STDERR as a result of calling a test program such as prove or t/harness on an individual target file. Currently we assume that all such test programs are written based on the Test Anything Protocol (TAP).

  • transitional commit

    A commit at which the test output for a given target changes from that of the commit immediately preceding.

  • digest

    A string holding the output of a cryptographic hash function run on test output which, for practical purposes, uniquely identifies that output. (Currently, we use the MD5 algorithm, as in the md5_hex function of Digest::MD5.) We assume that if the test output does not change across a run of consecutive commits, then none of the later commits in that run is a transitional commit.

    Note: Before taking a digest on a particular test output, we exclude text such as timings which are highly likely to change from one run to the next and which would introduce spurious variability into the digest calculations.

  • multisection or multibisection

    A series of configure-build-test process sequences at those commits within the commit range which are selected by a bisection algorithm.

    Normally, when we bisect (via git bisect, Porting/bisect.pl or otherwise), we are seeking a single point where a Boolean result -- yes/no, true/false, pass/fail -- is returned. What the test run outputs to STDOUT or STDERR is a lesser concern.

    In multisection we bisect repeatedly to determine all points where the output of the test command changes -- regardless of whether that change is a PASS, FAIL or whatever. We capture the output for later human or programmatic examination.
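The idea of multisection can be sketched in a few lines of Perl. This is a minimal illustration under stated assumptions, not the algorithm the module actually implements: the names make_cached_runner and find_transitions are hypothetical, the array of digests stands in for real configure-build-test runs, and the sketch assumes that identical digests at both ends of a sub-range mean the output did not change anywhere inside it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "configure, build, test, digest" at commit index $i.
# Results are cached so that each commit is processed at most once.
sub make_cached_runner {
    my ($digest_for) = @_;    # code ref: commit index -> digest string
    my %cache;
    my $runs = 0;
    my $runner = sub {
        my ($i) = @_;
        unless (exists $cache{$i}) {
            $cache{$i} = $digest_for->($i);
            $runs++;
        }
        return $cache{$i};
    };
    return ($runner, sub { return $runs });
}

# Return the indices of all transitional commits in the range ($lo, $hi].
sub find_transitions {
    my ($runner, $lo, $hi) = @_;
    return () if $hi <= $lo;
    # Assumption: the same digest at both ends means no change in between.
    return () if $runner->($lo) eq $runner->($hi);
    # Adjacent commits with different digests: $hi is transitional.
    return ($hi) if $hi - $lo == 1;
    my $mid = int(($lo + $hi) / 2);
    return (
        find_transitions($runner, $lo, $mid),
        find_transitions($runner, $mid, $hi),
    );
}

# Made-up digests for ten consecutive commits, oldest first.
my @digests = qw(aaa aaa aaa bbb bbb bbb bbb ccc ccc ccc);
my ($runner, $run_count) = make_cached_runner(sub { $digests[ $_[0] ] });

my @transitions = find_transitions($runner, 0, $#digests);
print "transitional commit indices: @transitions\n";
print "commits actually tested: ", $run_count->(), " of ", scalar(@digests), "\n";
```

Because results are cached and unchanged sub-ranges are skipped, fewer commits are tested than the range contains. The price is the assumption noted in the code: an output that changes and then changes back within a sub-range would go unnoticed.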

METHODS

new()

  • Purpose

    Constructor.

  • Arguments

        $self = Devel::Git::MultiBisect::AllCommits->new(\%params);

    or

        $self = Devel::Git::MultiBisect::Transitions->new(\%params);

    or

        $self = Devel::Git::MultiBisect::BuildTransitions->new(\%params);

    Reference to a hash, typically the return value of Devel::Git::MultiBisect::Opts::process_options().

    The hashref passed as argument must contain key-value pairs for gitdir and outputdir. new() tests for the existence of each of these directories.

  • Return Value

    Object of Devel::Git::MultiBisect child class.
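Since new() checks that gitdir and outputdir exist, it can be convenient to run the same check yourself before constructing the object. The sketch below is only an illustration: missing_required_dirs is a hypothetical helper, not part of the module, and the temporary directories stand in for a real git checkout and output directory. A real parameter hash will normally carry further keys, typically supplied by Devel::Git::MultiBisect::Opts::process_options().

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of the module): report which of the
# required directory parameters are missing or are not directories.
sub missing_required_dirs {
    my ($params) = @_;
    my @problems;
    for my $key (qw(gitdir outputdir)) {
        my $dir = $params->{$key};
        if (! defined $dir) {
            push @problems, "$key: not provided";
        }
        elsif (! -d $dir) {
            push @problems, "$key: '$dir' is not a directory";
        }
    }
    return @problems;
}

# Temporary directories stand in for a real checkout and output directory.
my %params = (
    gitdir    => tempdir(CLEANUP => 1),
    outputdir => tempdir(CLEANUP => 1),
);

my @problems = missing_required_dirs(\%params);
if (@problems) {
    print "Cannot construct object:\n", map { "  $_\n" } @problems;
}
else {
    print "gitdir and outputdir both exist\n";
    # With a complete parameter hash one could now call, for example:
    # my $self = Devel::Git::MultiBisect::Transitions->new(\%params);
}
```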

get_commits_range()

  • Purpose

    Identify the SHA of each git commit in the range specified to new().

  • Arguments

        $commit_range = $self->get_commits_range();

    None; all data needed is already in the object.

  • Return Value

    Array reference, each element of which is a SHA.

set_targets()

  • Purpose

    Identify the test files which will be run at different points in the commit range. We assume that each test file has existed with its name unchanged over the entire commit range.

  • Arguments

        $target_args = [
            't/44_func_hashes_mult_unsorted.t',
            't/45_func_hashes_alt_dual_sorted.t',
        ];
        $full_targets = $self->set_targets($target_args);

    Reference to an array holding the relative paths beneath the gitdir to the test files selected for examination.

  • Return Value

    Reference to an array holding hash references with these elements:

    • path

      Absolute path to a test file selected for examination. The file is checked for existence.

    • stub

      String composed by taking an element in the array ref passed as argument and substituting underscores (_) for forward slash (/) and dot (.) characters. So,

          t/44_func_hashes_mult_unsorted.t

      ... becomes:

          t_44_func_hashes_mult_unsorted_t
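The stub transformation described above is easy to reproduce. The following is a minimal sketch; path_to_stub is a hypothetical name chosen for illustration and is not necessarily how the module computes the value internally.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented substitution:
# every forward slash and dot in the relative path becomes an underscore.
sub path_to_stub {
    my ($relative_path) = @_;
    (my $stub = $relative_path) =~ tr{/.}{_};
    return $stub;
}

print path_to_stub('t/44_func_hashes_mult_unsorted.t'), "\n";
# prints: t_44_func_hashes_mult_unsorted_t
```

Note that the mapping is not reversible: two different paths such as t/a.b.t and t/a/b.t would yield the same stub.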

run_test_files_on_one_commit()

  • Purpose

    Capture the output from running the selected test files at one specific git checkout.

  • Arguments

        $outputs = $self->run_test_files_on_one_commit("2a2e54a");

    or

        $excluded_targets = [
            't/45_func_hashes_alt_dual_sorted.t',
        ];
        $outputs = $self->run_test_files_on_one_commit("2a2e54a", $excluded_targets);

    1. String holding the SHA of a single commit in the repository. This string would typically be one of the elements in the array reference returned by $self->get_commits_range(). If no argument is provided, the method defaults to the first element in that array reference.

    2. Reference to an array of target test files to be excluded from a particular invocation of this method. Optional; if provided, it must be an array reference or the method will die.

  • Return Value

    Reference to an array, each element of which is a hash reference with the following elements:

    • commit

      String holding the SHA from the commit passed as argument to this method (or the default described above).

    • commit_short

      String holding the value of commit (above) truncated to the number of characters specified in the short element passed to the constructor; defaults to 7.

    • file_stub

      String holding a rewritten version of the relative path beneath gitdir of the test file being run. In this relative path forward slash (/) and dot (.) characters are changed to underscores (_). So,

          t/44_func_hashes_mult_unsorted.t

      ... becomes:

          t_44_func_hashes_mult_unsorted_t

    • file

      String holding the full path to the file holding the TAP output collected while running one test file at the given commit. The following example shows how that path is calculated. Given:

          output directory (outputdir)    => '/tmp/DQBuT_SRAY/'
          SHA (commit)                    => '2a2e54af709f17cc6186b42840549c46478b6467'
          shortened SHA (commit_short)    => '2a2e54a'
          test file (target->[$i])        => 't/44_func_hashes_mult_unsorted.t'

      ... the file is placed in the directory specified by outputdir. We then join commit_short (the shortened SHA), file_stub (the rewritten relative path) and the strings output and txt with a dot to yield this value for the file element:

          2a2e54a.t_44_func_hashes_mult_unsorted_t.output.txt

    • md5_hex

      String holding the return value of Devel::Git::MultiBisect::Auxiliary::hexdigest_one_file() run with the file designated by the file element as an argument. (More precisely, the file as modified by Devel::Git::MultiBisect::Auxiliary::clean_outputfile().)

    Example:

        [
          {
            commit => "2a2e54af709f17cc6186b42840549c46478b6467",
            commit_short => "2a2e54a",
            file => "/tmp/1mVnyd59ee/2a2e54a.t_44_func_hashes_mult_unsorted_t.output.txt",
            file_stub => "t_44_func_hashes_mult_unsorted_t",
            md5_hex => "31b7c93474e15a16d702da31989ab565",
          },
          {
            commit => "2a2e54af709f17cc6186b42840549c46478b6467",
            commit_short => "2a2e54a",
            file => "/tmp/1mVnyd59ee/2a2e54a.t_45_func_hashes_alt_dual_sorted_t.output.txt",
            file_stub => "t_45_func_hashes_alt_dual_sorted_t",
            md5_hex => "6ee767b9d2838e4bbe83be0749b841c1",
          },
        ]

  • Comment

    In this method's current implementation, we start with a git checkout from the repository at the specified commit. We configure (e.g., perl Makefile.PL) and build (e.g., make) the source code. We then test each of the test files we have targeted (e.g., prove -vb relative/path/to/test_file.t). We redirect both STDOUT and STDERR to outputfile, clean up the outputfile to remove the line containing timings (as that introduces unwanted variability in the md5_hex values) and compute the digest.

    This implementation is very much subject to change.

    If a true value for verbose has been passed to the constructor, the method prints Created [outputfile] to STDOUT before returning.

    Note: While this method is publicly documented, in actual use you probably will not need to call it directly. Instead, you will probably use either Devel::Git::MultiBisect::AllCommits::run_test_files_on_all_commits() or Devel::Git::MultiBisect::Transitions::multisect_all_targets().
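Two steps described for this method, naming the output file and digesting the output without its timing line, can be sketched as follows. Both helpers are hypothetical and are not the module's own code. In particular, the regular expression assumes that the timing line is the "Files=..., Tests=..., N wallclock secs" summary printed by prove; the filter applied by Devel::Git::MultiBisect::Auxiliary::clean_outputfile() may differ.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: build the output file path from its documented parts.
sub output_file_path {
    my ($outputdir, $commit, $relative_path, $short) = @_;
    $short //= 7;                                    # documented default
    my $commit_short = substr($commit, 0, $short);
    (my $file_stub = $relative_path) =~ tr{/.}{_};
    my $basename = join '.', $commit_short, $file_stub, 'output', 'txt';
    return File::Spec->catfile($outputdir, $basename);
}

# Hypothetical helper: digest TAP output after dropping timing lines.
# Assumption: timings appear only on prove's "Files=..., Tests=..." line.
sub digest_without_timings {
    my ($tap_output) = @_;
    my @kept = grep { ! /^Files=\d+, Tests=\d+,/ } split /^/, $tap_output;
    return md5_hex(join '', @kept);
}

my $file = output_file_path(
    '/tmp/DQBuT_SRAY/',
    '2a2e54af709f17cc6186b42840549c46478b6467',
    't/44_func_hashes_mult_unsorted.t',
);
print "$file\n";

# Two made-up runs that differ only in their timings.
my $run1 = "ok 1\nok 2\nFiles=1, Tests=2,  1 wallclock secs ( 0.02 usr)\nResult: PASS\n";
my $run2 = "ok 1\nok 2\nFiles=1, Tests=2,  3 wallclock secs ( 0.05 usr)\nResult: PASS\n";
print "digests match once timings are removed\n"
    if digest_without_timings($run1) eq digest_without_timings($run2);
```

Without the filtering step the two runs would produce different digests, and every commit would look like a transitional commit.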

get_timings()

  • Purpose

    Get information on the time a multisection took to run.

  • Arguments

    None; all data needed is already in the object.

  • Return Value

    Hash reference. The selection of elements in this hashref will depend on which subclass of Devel::Git::MultiBisect you are using and may differ among subclasses. Example:

        { elapsed => 4297, mean => 186.83, runs => 23 }

    In this example (taken from a run of one test file over 220 commits in Perl 5 blead), 23 runs were needed to achieve a result. These took 4297 seconds (approximately 71 minutes) with a mean run time of approximately 3 minutes each.

    The method returns an undefined value if timings are not yet available within the object.
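A caller will usually want to guard against the undefined return value before reporting timings. The sketch below assumes a hashref with the elapsed, mean and runs keys shown in the example above; as noted, other subclasses may return different keys. summarize_timings is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a timings hashref of the shape shown above.
# Returns undef (in scalar context) when no timings are available yet.
sub summarize_timings {
    my ($timings) = @_;
    return unless defined $timings && $timings->{runs};
    # Fall back to computing the mean if the hashref does not supply one.
    my $mean = $timings->{mean} // ($timings->{elapsed} / $timings->{runs});
    return sprintf(
        "%d runs in %d seconds (about %.1f minutes), mean %.2f seconds per run",
        $timings->{runs}, $timings->{elapsed}, $timings->{elapsed} / 60, $mean,
    );
}

# In real use this hashref would come from $self->get_timings().
my $timings = { elapsed => 4297, mean => 186.83, runs => 23 };
my $summary = summarize_timings($timings);
print defined $summary ? "$summary\n" : "timings not yet available\n";
```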

SUPPORT

Please report any bugs by mail to bug-Devel-Git-MultiBisect@rt.cpan.org or through the web interface at http://rt.cpan.org.

AUTHOR

James E. Keenan (jkeenan at cpan dot org). When sending correspondence, please include 'Devel::Git::MultiBisect' or 'Devel-Git-MultiBisect' in your subject line.

Creation date: October 12 2016. Last modification date: September 12 2021.

Development repository: https://github.com/jkeenan/devel-git-multibisect

ACKNOWLEDGEMENTS

Thanks to the following contributors and reviewers:

COPYRIGHT

Copyright (c) 2016-2021 James E. Keenan. United States. All rights reserved. This is free software and may be distributed under the same terms as Perl itself.