Git::Repository::Plugin::GitHooks - A Git::Repository plugin with some goodies for hook developers
version 2.11.1
    # load the plugin
    use Git::Repository 'GitHooks';

    my $git = Git::Repository->new();

    my $config  = $git->get_config();
    my $branch  = $git->get_current_branch();
    my @commits = $git->get_commits($oldcommit, $newcommit);

    my @files_modified_by_commit = $git->filter_files_in_index('AM');
    my @files_modified_by_push   = $git->filter_files_in_range('AM', $oldcommit, $newcommit);

This module adds to Git::Repository several methods useful for implementing Git hooks.
In particular, it is used by the standard hooks implemented by the Git::Hooks framework.
Git::Repository::Plugin::GitHooks - Add useful methods for hooks to Git::Repository
Git configuration files usually contain just ASCII characters, but values and sub-section names may contain any characters, except newline. If your config files have non-ASCII characters you should ensure that they are properly decoded by specifying their encoding like this:
$Git::Repository::Plugin::GitHooks::CONFIG_ENCODING = 'UTF-8';
The acceptable values for this variable are all the encodings supported by the Encode module.
The following methods are used by the Git::Hooks framework and are not intended to be useful for hook developers. They're described here for completeness.
This is used by Git::Hooks::run_hooks to prepare the environment for specific Git hooks before invoking the associated plugins. It's invoked with the arguments passed by Git to the hook script. NAME is the script name (usually the variable $0) and ARGS is a reference to an array containing the script positional arguments.
This loads every plugin configured in the githooks.plugin option.
This is used by Git::Hooks::run_hooks to invoke external hooks.
Returns the list of post hook functions registered with the post_hook method below.
The following methods are intended to be useful for hook developers.
Plugin developers may be interested in performing some action depending on the overall result of every check made by every other hook. As an example, Gerrit's patchset-created hook is invoked asynchronously, meaning that the hook's exit code doesn't affect the action that triggered the hook. The proper way to signal the hook result for Gerrit is to invoke its API to make a review. But we want to perform the review once, at the end of the hook execution, based on the overall result of all enabled checks.
To do that, plugin developers can use this routine to register callbacks that are invoked at the end of run_hooks. The callbacks are called with the following arguments:
HOOK_NAME
The basename of the invoked hook.
GIT
The Git::Repository object that was passed to the plugin hooks.
ARGS...
The remaining arguments that were passed to the plugin hooks.
The callbacks may see if there were any errors signaled by the plugin hook by invoking the get_errors method on the GIT object. They may be used to signal the hook result in any way they want, but they should not die or they will prevent other post hooks from running.
This may be used by plugin developers to cache information in the context of a Git::Repository object. SECTION is any string which becomes associated with a hash-ref. The method simply returns the hash-ref, which can be used by the caller to store any kind of information. Plugin developers are encouraged to use the plugin name as the SECTION string to avoid clashes.
This groks the configuration options for the repository by invoking git config --list. The configuration is cached in the Git::Repository object during the first invocation, so if the configuration is changed afterwards, the method won't notice it. This is usually ok for hooks, though, which are short-lived.
With no arguments, the options are returned as a hash-ref pointing to a two-level hash. For example, if the config options are these:
    section1.a=1
    section1.b=2
    section1.b=3
    section2.x.a=A
    section2.x.b=B
    section2.x.b=C
Then, it'll return this hash:
    {
        'section1' => {
            'a' => [1],
            'b' => [2, 3],
        },
        'section2.x' => {
            'a' => ['A'],
            'b' => ['B', 'C'],
        },
    }
The first level keys are the part of the option names before the last dot. The second level keys are everything after the last dot in the option names. You won't get more levels than two. In the example above, you can see that the option "section2.x.a" is split in two: "section2.x" in the first level and "a" in the second.
The values are always array-refs, even if there is only one value for a specific option. For some options, it makes sense to have a list of values attached to them. But even if you expect a single value for an option, you may have it defined in the global scope and redefined in the local scope. In this case, it will appear as a two-element array, the last one being the local value.
So, if you want to treat an option as single-valued, you should fetch it like this:
    $h->{section1}{a}[-1]
    $h->{'section2.x'}{a}[-1]
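The splitting rule can be illustrated with a small self-contained sketch. It is not the plugin's code: parse_config_lines is a hypothetical helper that only assumes input lines in the name=value form shown in the example, and it ignores details of real git config --list output such as multi-line values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the plugin's API): builds a two-level
# hash from lines in "name=value" form.
sub parse_config_lines {
    my @lines = @_;
    my %config;
    for my $line (@lines) {
        # Split the option name from its value at the first equals sign.
        my ($name, $value) = split /=/, $line, 2;
        # Everything before the last dot is the first-level key;
        # everything after it is the second-level key.
        my ($section, $variable) = $name =~ /^(.+)\.([^.]+)$/
            or next;
        push @{ $config{$section}{$variable} }, $value;
    }
    return \%config;
}

my $h = parse_config_lines(qw(
    section1.a=1 section1.b=2 section1.b=3
    section2.x.a=A section2.x.b=B section2.x.b=C
));

print $h->{section1}{b}[-1], "\n";        # prints 3, the last value of section1.b
print $h->{'section2.x'}{a}[-1], "\n";    # prints A
```

Because the regular expression is greedy, "section2.x.a" is split at its last dot, which is why "section2.x" ends up as a single first-level key.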
If the SECTION argument is passed, the method returns the second-level hash for it. So, following the example above:
$git->get_config('section1');
This call would return this hash:
    {
        'a' => [1],
        'b' => [2, 3],
    }
If the section doesn't exist an empty hash is returned. Any key/value added to the returned hash will be available in subsequent invocations of get_config.
If the VARIABLE argument is also passed, the method returns the value(s) of the configuration option SECTION.VARIABLE. In list context the method returns the list of all values or the empty list, if the variable isn't defined. In scalar context, the method returns the variable's last value or undef, if it's not defined.
As a special case, options without values (i.e., with no equals sign after their name in the configuration file) are set to the string 'true' so that Perl recognizes them as true Booleans.
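The difference between list and scalar context can be sketched as follows. The lookup function is hypothetical and only mimics the behaviour described above on a plain hash; it is not how the plugin is implemented.

```perl
use strict;
use warnings;

# Hypothetical lookup mimicking the context-sensitive behaviour described
# above; $config is a two-level hash like the one in the earlier example.
sub lookup {
    my ($config, $section, $variable) = @_;
    my $values = ($config->{$section} || {})->{$variable};
    if (wantarray) {
        return $values ? @$values : ();       # list context: every value
    }
    return $values ? $values->[-1] : undef;   # scalar context: last value
}

my $config = { section1 => { a => [1], b => [2, 3] } };

my @all  = lookup($config, 'section1', 'b');     # (2, 3)
my $last = lookup($config, 'section1', 'b');     # 3
my $none = lookup($config, 'section1', 'zzz');   # undef

print "all=@all last=$last\n";                   # prints "all=2 3 last=3"
```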
Git configuration variables may be grokked as Booleans. (See git help config.) There are specific values meaning true (viz. yes, on, true, 1, and the absence of a value) and specific values meaning false (viz. no, off, false, 0, and the empty string).
This method checks the variable's value and returns 1 or 0 representing Boolean values in Perl. If the variable's value isn't recognized as a Git Boolean the method croaks. If the variable isn't defined the method returns undef.
In the Git::Hooks documentation, all configuration variables mentioning a BOOL value are grokked with this method.
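A minimal sketch of this conversion, assuming the value has already been fetched as a string: git_bool is a hypothetical name, and the case-insensitive matching follows Git's own handling of Boolean literals rather than anything stated above.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper: maps a Git Boolean string to 1 or 0, or returns
# undef when the variable is not defined.
sub git_bool {
    my ($value) = @_;
    return undef unless defined $value;
    return 1 if $value =~ /^(?:yes|on|true|1)$/i;
    return 0 if $value =~ /^(?:no|off|false|0|)$/i;   # the empty string is false
    croak "'$value' is not a valid Git Boolean";
}

print git_bool('yes'), "\n";   # prints 1
print git_bool('off'), "\n";   # prints 0
```

An option without a value needs no special branch here because, as described above, it is stored as the string 'true'.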
Git configuration variables may be grokked as integers. (See git help config.) They may start with an optional sign (+ or -), followed by one or more decimal digits, and end with an optional scaling factor letter, viz. k (1024), m (1024*1024), or g (1024*1024*1024). The scaling factor may be in lower or upper-case.
This method checks the variable's value format and returns the corresponding Perl integer. If the variable's value isn't recognized as a Git integer the method croaks. If the variable isn't defined the method returns undef.
In the Git::Hooks documentation, all configuration variables mentioning an INT value are grokked with this method.
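The integer format can be sketched in a few lines. As with the Boolean example, git_int is a hypothetical helper operating on a plain string, not the plugin's own method.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper: converts a Git integer string such as "10", "-2M",
# or "1g" to a Perl number, or returns undef when the variable is not defined.
sub git_int {
    my ($value) = @_;
    return undef unless defined $value;
    my ($number, $scale) = $value =~ /^([+-]?\d+)([kmg])?$/i
        or croak "'$value' is not a valid Git integer";
    my %factor = (k => 1024, m => 1024**2, g => 1024**3);
    $number *= $factor{lc $scale} if defined $scale;
    return $number + 0;   # force a numeric value even when there is no scale
}

print git_int('10'), "\n";    # prints 10
print git_int('1k'), "\n";    # prints 1024
print git_int('-2M'), "\n";   # prints -2097152
```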
This method should be used by plugins to record consistent error or warning messages. It gets one or two arguments. MESSAGE is a multi-line string explaining the error. INFO is an optional hash-ref which may contain additional information about the message, which will be used to complement it.
A "complete" fault is formatted like this:
    [PREFIX: CONTEXT] MESSAGE

    DETAILS
PREFIX gives contextual information about the message. It can be set via the prefix INFO hash key. If not, the package name of the function which called fault is used, which usually happens to be the name of the plugin which detected the error.
CONTEXT is additional contextual information, such as a reference name, a commit SHA-1, and a violated configuration option.
MESSAGE is the multi-line error message.
DETAILS is a multi-line string giving more details about the error, usually the error output of an external command.
Besides the MESSAGE, which is required, and the PREFIX, which has a default value, all other items must be supplied via the INFO hash-ref with the following keys:
prefix
A string giving broad contextual information about the error message. When absent, the prefix used is the package name of the function which called fault, which is usually a Git::Hooks plugin name.
commit
The SHA-1 or a Git::Repository::Log object representing a commit. It is shown in the CONTEXT area like this (as a short SHA-1):
[PREFIX: commit SHA-1]
ref
The name of a Git reference (usually a branch). It is shown in the CONTEXT area like this:
[PREFIX: on ref REF]
option
The name of a configuration option related to the error message. It is shown in the CONTEXT area like this:
[PREFIX: violates option 'OPTION']
details
A string containing details about the error message. If present, it is appended to the MESSAGE, separated by an empty line, and with its lines prefixed by two spaces.
The method simply records the formatted error message and returns. It doesn't die.
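To make the layout concrete, here is a rough sketch that assembles a message from the pieces described above. It is an illustration only: format_fault is a hypothetical function, it handles a commit given as a SHA-1 string but not as a Git::Repository::Log object, the 7-character short SHA-1 is an assumption, and joining several CONTEXT items with a space is also an assumption, since the text above does not say how they are combined.

```perl
use strict;
use warnings;

# Hypothetical formatter, not the plugin's fault method.
sub format_fault {
    my ($message, $info) = @_;
    $info //= {};

    # Default prefix: the package name of the caller.
    my $prefix = $info->{prefix} // scalar caller;

    my @context;
    push @context, 'commit ' . substr($info->{commit}, 0, 7) if defined $info->{commit};
    push @context, "on ref $info->{ref}"                     if defined $info->{ref};
    push @context, "violates option '$info->{option}'"       if defined $info->{option};

    my $header = @context ? "[${prefix}: @context]" : "[${prefix}]";
    my $text   = "$header $message";

    if (defined $info->{details}) {
        # Indent every line of the details by two spaces and separate
        # them from the message with an empty line.
        (my $details = $info->{details}) =~ s/^/  /mg;
        $text .= "\n\n$details";
    }
    return $text;
}

# 'MyPlugin' and 'myplugin.max-size' are made-up names for illustration.
my $text = format_fault(
    'the file is too big',
    {
        prefix  => 'MyPlugin',
        commit  => '0123456789abcdef0123456789abcdef01234567',
        option  => 'myplugin.max-size',
        details => "size: 2048 bytes\nlimit: 1024 bytes",
    },
);
print "$text\n";
```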
The messages can be colorized if they go to a terminal. This can be configured by the configuration options githooks.color and githooks.color.<slot>, which are explained in the section "CONFIGURATION" in Git::Hooks documentation.
This method returns a string specially formatted with all error messages recorded with the fault method, a header, and a footer, if requested by configuration.
This method is DEPRECATED. Please, use fault instead.
This method is DEPRECATED. Please, use get_faults instead.
The undefined commit is a special SHA-1 used by Git in the update and pre-receive hooks to signify that a reference either was just created (as the old commit) or has been just deleted (as the new commit). It consists of 40 zeroes.
The empty tree represents an empty directory for Git.
Returns a Git::Repository::Log object representing COMMIT.
Returns a list of Git::Repository::Log objects representing every commit reachable from NEWCOMMIT but not from OLDCOMMIT.
There are two special cases, though:
If NEWCOMMIT is the undefined commit, i.e., '0000000000000000000000000000000000000000', this means that a branch, pointing to OLDCOMMIT, has been removed. In this case the method returns an empty list, meaning that no new commit has been created.
If OLDCOMMIT is the undefined commit, this means that a new branch pointing to NEWCOMMIT is being created. In this case we want all commits reachable from NEWCOMMIT but not reachable from any other branch. The syntax for this is "NEWCOMMIT ^B1 ^B2 ... ^Bn", i.e., NEWCOMMIT followed by every other branch name prefixed by carets. We can get at their names using the technique described in, e.g., this discussion.
The Git::Repository::Log objects are constructed ultimately by invoking the git log command like this:
git log [<options>] <revision range> [-- <paths>]
The revision range is usually just OLDCOMMIT..NEWCOMMIT, but there are some special cases which require some calculating as discussed above.
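The three cases can be summarized in a short sketch that computes the revision arguments for git log. The function revision_args is hypothetical, and the list of other branch names is simply passed in; how the plugin actually collects those names is not shown here.

```perl
use strict;
use warnings;

my $UNDEF_COMMIT = '0' x 40;   # the undefined commit: 40 zeroes

# Hypothetical helper: returns the revision arguments for the given commits.
sub revision_args {
    my ($old, $new, @other_branches) = @_;

    # A branch was deleted: there are no new commits to list.
    return () if $new eq $UNDEF_COMMIT;

    # A branch was created: NEWCOMMIT minus everything reachable from
    # the other branches.
    return ($new, map("^$_", @other_branches)) if $old eq $UNDEF_COMMIT;

    # The usual case.
    return ("$old..$new");
}

print join(' ', revision_args('aaa111', 'bbb222')), "\n";
# prints "aaa111..bbb222"
print join(' ', revision_args($UNDEF_COMMIT, 'bbb222', 'master', 'next')), "\n";
# prints "bbb222 ^master ^next"
```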
The OPTIONS optional argument is an array-ref pointing to an array of strings, which will be passed as options to the git-log command. It may be useful to grok some extra information about each commit (e.g., using --name-status).
The PATHS optional argument is an array-ref pointing to an array of strings, which will be passed as pathspecs to the git-log command. It may be useful to filter the list of commits, grokking only those affecting specific paths in the repository.
Returns the relevant contents of the commit message file called FILENAME. It's useful during the commit-msg and the prepare-commit-msg hooks.
The file is read using the character encoding defined by the i18n.commitencoding configuration option or utf-8 if not defined.
Some non-relevant contents are stripped off the file. Specifically:
diff data
Sometimes, the commit message file contains the diff data for the commit. This data begins with a line starting with the fixed string diff --git a/. Everything from such a line on is stripped off the file.
comment lines
Every line beginning with a # character is stripped off the file.
trailing spaces
Any trailing space is stripped off from all lines in the file.
trailing empty lines
Any empty line at the end is stripped off from the file, making sure it ends in a single newline.
All this cleanup is performed to make it easier for different plugins to analyze the commit message using a canonical base.
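The four cleanup steps can be approximated with a handful of substitutions. This is a sketch under the assumption that the steps are applied in the order listed; clean_commit_msg is a hypothetical function working on a string that has already been read and decoded.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the cleanup steps listed above to a string.
sub clean_commit_msg {
    my ($msg) = @_;
    $msg =~ s/^diff --git a\/.*//ms;   # drop diff data, from its first line to the end
    $msg =~ s/^#.*\n?//mg;             # drop comment lines
    $msg =~ s/[ \t]+$//mg;             # drop trailing spaces on every line
    $msg =~ s/\n*\z/\n/;               # end with exactly one newline
    return $msg;
}

my $raw = "Fix the parser  \n"
        . "\n"
        . "Longer body.\n"
        . "# Please enter the commit message\n"
        . "\n"
        . "\n"
        . "diff --git a/foo b/foo\n"
        . "+changed\n";

print clean_commit_msg($raw);
# prints:
# Fix the parser
#
# Longer body.
```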
Writes the list of strings MSG to FILENAME. It's useful during the commit-msg and the prepare-commit-msg hooks.
The file is written to using the character encoding defined by the i18n.commitencoding configuration option or utf-8 if not defined.
An empty line (\n\n) is inserted between every pair of MSG arguments, if there is more than one, of course.
Returns the list of names of the references affected by the current push command. It's useful in the update and the pre-receive hooks.
Returns the two-element list of commit ids representing the OLDCOMMIT and the NEWCOMMIT of the affected REF.
Returns the list of commits leading from the affected REF's NEWCOMMIT to OLDCOMMIT. The commits are represented by Git::Repository::Log objects, as returned by the get_commits method.
The optional arguments OPTIONS and PATHS are passed to the get_commits method.
Returns a hash with information about files changed in the index (aka stage area or cache) compared to HEAD. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-index --name-status command. It's useful in the pre-commit hook when you want to know which files are being modified in the upcoming commit.
FILTER specifies which kinds of changes you're interested in. It's passed as the argument to the --diff-filter option of git diff-index, which is documented like this:
    --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]
        Select only files that are Added (A), Copied (C), Deleted (D), Modified (M), Renamed (R), have their type (i.e. regular file, symlink, submodule, ...) changed (T), are Unmerged (U), are Unknown (X), or have had their pairing Broken (B). Any combination of the filter characters (including none) can be used. When * (All-or-none) is added to the combination, all paths are selected if there is any file that matches other criteria in the comparison; if there is no file that matches other criteria, nothing is selected.
Returns a hash with information about files that are changed between commits FROM and TO. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-tree --name-status command. It's useful in the update and the pre-receive hooks when you want to know which files are being modified in the commits being received by a git push command.
FILTER specifies which kinds of changes you're interested in. Please, read about it in the filter_name_status_in_index method above.
FROM and TO are revision parameters (see git help revisions) specifying two commits. They're passed as arguments to the git diff-tree command in order to compare them and grok the files that differ between them.
A special case occurs when FROM is the undefined commit, which happens when we're calculating the commit range in a pre-receive or update hook and a new branch or tag has been pushed. In this case we pass FROM and TO to the get_commits method to find the list of new commits being pushed and calculate the difference between the first commit's parent and TO. When the first commit has no parent (in case it's a root commit) we return an empty list.
Returns a hash with information about files that are changed in COMMIT. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-tree --name-status command. It's useful in the patchset-created and the draft-published hooks when you want to know which files are being modified in the single commit being received by a git push command.
COMMIT is a revision parameter (see git help revisions) specifying the commit. It's passed as an argument to git diff-tree in order to compare it to its parents and grok the files that changed in it.
Merge commits are treated specially. Only files that are changed in COMMIT with respect to all of its parents are returned. The reasoning behind this is that if a file isn't changed with respect to one or more of COMMIT's parents, then it must have been checked already in those commits and we don't need to check it again. In this case, since the files may have been changed differently in each branch (added, modified, deleted, etc.), the hash values are strings of letters, one for each branch.
Returns the sorted keys of the hash that would be returned by the filter_name_status_in_index method if invoked with the same arguments.
Returns the sorted keys of the hash that would be returned by the filter_name_status_in_range method if invoked with the same arguments.
Returns the sorted keys of the hash that would be returned by the filter_name_status_in_commit method if invoked with the same arguments.
Returns the username of the authenticated user performing the Git action. It groks it from the githooks.userenv configuration variable specification, which is described in the Git::Hooks documentation. It's useful for most access control check plugins.
Returns the repository name as a string. Currently it knows how to grok the name from Gerrit and Bitbucket servers. Otherwise it tries to grok it from the GIT_DIR environment variable, which holds the path to the Git repository.
Returns the repository's current branch name, as indicated by the git symbolic-ref HEAD command.
If the repository is in a detached head state, i.e., if HEAD points to a commit instead of to a branch, the method returns undef.
Returns the SHA1 of the commit represented by REV, using the command
git rev-parse --verify REV
It's useful, for instance, to grok the HEAD's SHA1 so that you can pass it to the get_commit method.
Returns the string "HEAD" if the repository already has commits. Otherwise, if it is a brand new repository, it returns the SHA1 representing the empty tree. It's useful to come up with the correct argument for, e.g., git diff during a pre-commit hook. (See the default pre-commit.sample script which comes with Git to understand how this is used.)
Returns the name of a temporary file into which the contents of the file FILE in revision REV has been copied.
It's useful for hooks that need to read the contents of changed files in order to check anything in them.
These objects are cached so that if more than one hook needs to get at them they're created only once.
By default, all temporary files are removed when the Git::Repository object is destroyed.
Any remaining ARGS are passed as arguments to File::Temp::newdir so that you can have more control over the temporary file creation.
If REV:FILE does not exist or if there is any other error while trying to fetch its contents the method dies.
Returns the size (in bytes) of FILE (a path relative to the repository root) in revision REV.
Returns the mode (as a number) of FILE (a path relative to the repository root) in revision REV.
This method should be invoked by hooks to see if REF is enabled according to the githooks.ref and githooks.noref options. Please, read about these options in Git::Hooks documentation.
REF must be a complete reference name or undef. Local hooks should pass the current branch, and server hooks should pass the references affected by the push command. If REF is undef, the method returns true.
The method decides if a reference is enabled using the following algorithm:
If REF matches any REFSPEC in githooks.ref then it is enabled.
Else, if REF matches any REFSPEC in githooks.noref then it is disabled.
Else, it is enabled.
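The algorithm above can be sketched as follows, assuming that each REFSPEC is either a complete reference name or a regular expression starting with a caret. Both functions are hypothetical stand-ins that take the option values as array-refs instead of reading them from the configuration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $ref matches at least one of the specs.
sub matches_any_spec {
    my ($ref, @specs) = @_;
    for my $spec (@specs) {
        if ($spec =~ /^\^/) {
            return 1 if $ref =~ /$spec/;   # regex spec; the caret stays in the pattern
        } else {
            return 1 if $ref eq $spec;     # literal spec; must be the complete name
        }
    }
    return 0;
}

# Hypothetical helper: $refs and $norefs hold the values of githooks.ref
# and githooks.noref, respectively.
sub reference_enabled {
    my ($ref, $refs, $norefs) = @_;
    return 1 unless defined $ref;
    return 1 if matches_any_spec($ref, @$refs);
    return 0 if matches_any_spec($ref, @$norefs);
    return 1;
}

my @refs   = ('refs/heads/master');
my @norefs = ('^refs/heads/');

print reference_enabled('refs/heads/master',  \@refs, \@norefs), "\n";   # prints 1
print reference_enabled('refs/heads/feature', \@refs, \@norefs), "\n";   # prints 0
print reference_enabled('refs/tags/v1',       \@refs, \@norefs), "\n";   # prints 1
```

Since githooks.ref is checked first, it works as a list of exceptions to githooks.noref: in the example every branch is disabled except master.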
This method is DEPRECATED. Please, use the is_reference_enabled method instead.
Returns a Boolean indicating if REF matches one of the ref-specs in SPECS. REF is the complete name of a Git ref and SPECS is a list of strings, each one specifying a rule for matching ref names.
As a special case, it returns true if REF is undef or if there is no SPEC whatsoever, meaning that by default all refs/commits are enabled.
You may want to use it, for example, in an update, pre-receive, or post-receive hook which may be enabled depending on the particular refs being affected.
Each SPEC rule may indicate the matching refs as the complete ref name (e.g. refs/heads/master) or by a regular expression starting with a caret (^), which is kept as part of the regexp.
Checks if the authenticated user (as returned by the authenticated_user method above) matches the specification, which may be given in one of the three different forms acceptable for the githooks.admin configuration option, i.e., as a username, as a @group, or as a ^regex.
Checks if the authenticated user (again, as returned by the authenticated_user method) matches the specifications given by the githooks.admin configuration variable. This is useful to exempt "administrators" from the restrictions imposed by the hooks.
This method returns a list of ACLs (Access Control Lists) grokked from the CFG.acl options, where CFG is a configuration section like githooks.checkfile.
The CFG.acl is a multi-valued option specifying rules allowing or denying specific users to perform specific actions on specific "things". (Common such things are references and files.) By default any user can perform any action on any thing. So, the rules are used to impose restrictions.
When a hook is invoked it groks all things that were affected in any way by the commits involved and tries to match each of them to a RULE to see if the action performed on it is allowed or denied.
A RULE takes three or four parts, like this:
(allow|deny) [ACTIONS]+ <spec> (by <userspec>)?
(allow|deny)
The first part tells if the rule allows or denies an action.
[ACTIONS]+
The second part specifies which actions are being considered by a combination of letters. The ACTIONS argument is a string containing all valid letters for the corresponding ACLs.
See the documentation of the acl option in the Git::Hooks::CheckFile and the Git::Hooks::CheckReference plugins for two examples of this.
<spec>
The third part specifies which things are being considered. In its simplest form, a spec is taken as a literal string matching the thing exactly by name.
If the spec starts with a caret (^) it's interpreted as a Perl regular expression, the caret being kept as part of the regexp. These specs match potentially many things.
Before being interpreted as a string or as a regexp, any sub-string of it in the form {VAR} is replaced by $ENV{VAR}. This is useful, for example, to interpolate the committer's username in the spec, in order to create personal name spaces for users.
(See the documentation of the acl option in the Git::Hooks::CheckFile and the Git::Hooks::CheckReference plugins for examples of things such as files and references, respectively.)
by <userspec>
The fourth part is optional. It specifies which users are being considered. It can be the name of a single user (e.g. james) or the name of a group (e.g. @devs).
If not specified, the RULE matches any user.
The RULEs are matched in the reverse of the order in which they appear in the result of the command git config CFG.acl, so that later rules take precedence. This way you can have general rules in the global context and more specific rules in the repository context, naturally.
So, the last RULE matching the action, the file, and the user, tells if the operation is allowed or denied.
If no RULE matches the operation, it is allowed by default.
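The matching procedure can be sketched end to end in plain Perl. Everything here is illustrative: parse_rule, user_matches, and is_allowed are hypothetical functions, group membership comes from a hash passed in by the caller, and an unset environment variable in a {VAR} placeholder is replaced by the empty string, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical parser for one RULE; returns nothing if the rule is malformed.
sub parse_rule {
    my ($rule, $actions) = @_;
    my ($verdict, $action, $spec, $who) =
        $rule =~ /^\s*(allow|deny)\s+([\Q$actions\E]+)\s+(\S+)(?:\s+by\s+(\S+))?\s*$/
        or return;
    $spec =~ s!\{(\w+)\}!$ENV{$1} // ''!ge;     # interpolate {VAR} from %ENV
    $spec = qr/$spec/ if $spec =~ /^\^/;        # a leading caret marks a regexp
    return {
        allow  => $verdict eq 'allow' ? 1 : 0,
        action => $action,
        spec   => $spec,
        who    => $who,
    };
}

# Hypothetical user check: $groups maps group names to array-refs of users.
sub user_matches {
    my ($who, $user, $groups) = @_;
    return 1 unless defined $who;               # no "by" part: any user
    if ($who =~ /^\@(.+)/) {
        return scalar grep { $_ eq $user } @{ $groups->{$1} || [] };
    }
    return $who eq $user;
}

# The last RULE matching the action, the thing, and the user decides.
sub is_allowed {
    my ($rules, $actions, $action, $thing, $user, $groups) = @_;
    for my $rule (reverse @$rules) {
        my $acl = parse_rule($rule, $actions) or next;
        next unless index($acl->{action}, $action) >= 0;
        next unless ref($acl->{spec}) ? $thing =~ $acl->{spec} : $thing eq $acl->{spec};
        next unless user_matches($acl->{who}, $user, $groups);
        return $acl->{allow};
    }
    return 1;                                   # no RULE matched: allowed
}

my @rules = (
    'deny  AMD ^.*',
    'allow AMD ^docs/',
    'allow AMD ^src/ by @devs',
    'deny  D   src/core.c',
);
my %groups = (devs => ['alice']);

print is_allowed(\@rules, 'AMD', 'M', 'src/main.c', 'alice', \%groups), "\n";   # prints 1
print is_allowed(\@rules, 'AMD', 'M', 'src/main.c', 'bob',   \%groups), "\n";   # prints 0
print is_allowed(\@rules, 'AMD', 'D', 'src/core.c', 'alice', \%groups), "\n";   # prints 0
```

Walking the list backwards and stopping at the first hit is what makes later rules override earlier ones.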
In the returned list, each ACL is represented by a hash with the following keys:
Contains the original representation of the ACL, which is useful in producing error messages.
allow
A Boolean telling if the ACL is an "allow".
action
The string representation of the action (e.g. 'AMD' or 'CRUD').
spec
The spec, which can be either a string or a pre-compiled regex object.
who
The name of a user or of a group.
As an optimization, only ACLs matching the current user, either explicitly or by not having a WHO part, are returned in the list.
Git::Repository::Plugin, Git::Hooks.
Gustavo L. de M. Chaves <gnustavo@cpan.org>
This software is copyright (c) 2020 by CPqD <www.cpqd.com.br>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.