The MUTABLE_*() macros cast pointers to the types shown, in such a way (compiler permitting) that casting away const-ness will give a warning; e.g.:
    const SV *sv = ...;
    AV *av1 = (AV*)sv;        <== BAD:  the const has been silently
                                        cast away
    AV *av2 = MUTABLE_AV(sv); <== GOOD: it may warn
MUTABLE_PTR is the base macro used to derive new casts. The other already-built-in ones return pointers to what their names indicate.
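One way to get the kind of warning described above is to route the pointer through a function whose parameter is a plain void *: passing a pointer-to-const then needs a conversion that drops the qualifier, which most compilers diagnose. The following is only a sketch of that idea; my_mutable_ptr, MY_MUTABLE_INT and bump_demo are hypothetical names, not Perl's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for a MUTABLE_PTR-style base helper (not Perl's
 * definition).  The parameter is a non-const void *, so passing a
 * pointer-to-const requires a qualifier-dropping conversion that
 * compilers typically warn about. */
static void *my_mutable_ptr(void *p)
{
    return p;
}

/* A derived cast, in the spirit of MUTABLE_AV() and friends. */
#define MY_MUTABLE_INT(p) ((int *)my_mutable_ptr(p))

static int bump_demo(void)
{
    int counter = 41;

    /* Fine: counter is not const, so nothing is cast away. */
    *MY_MUTABLE_INT(&counter) += 1;
    assert(MY_MUTABLE_INT(&counter) == &counter);

    /* By contrast, this would normally draw a diagnostic:
     *     const int fixed = 0;
     *     int *q = MY_MUTABLE_INT(&fixed);
     */
    return counter;
}
```

Calling bump_demo() returns 42; the interesting part is the commented-out case, which a plain (int *) cast would have accepted silently.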
The *V_FROM_REF macros extract the SvRV() from a given reference SV and return a suitably cast pointer to the referenced SV. When running under -DDEBUGGING, assertions are also applied that check that ref is definitely a reference SV that refers to an SV of the right type.
Cast-to-bool. When Perl could still be compiled with pre-C99 compilers, a (bool) cast didn't necessarily do the right thing, so this macro was created (and made somewhat complicated to work around bugs in old compilers). Now that C99 is used, the macro is no longer required, but it is kept for backwards compatibility.
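To illustrate one way a plain cast can go wrong, the sketch below assumes a pre-C99 style "bool" that is really a narrow integer type with 8-bit bytes. The names old_bool, NAIVE_BOOL and MY_CBOOL are hypothetical and are not Perl's definitions.

```c
#include <assert.h>

/* Assumption for illustration: a pre-C99 "bool" that is just a narrow
 * integer type. */
typedef unsigned char old_bool;

/* Plain cast: only the low 8 bits survive, so 0x100 turns into 0. */
#define NAIVE_BOOL(x)  ((old_bool)(x))

/* Hypothetical cBOOL-style macro: test the value first, then yield 0 or 1. */
#define MY_CBOOL(x)    ((x) ? (old_bool)1 : (old_bool)0)

static void cbool_demo(void)
{
    assert(NAIVE_BOOL(0x100) == 0);  /* the truth value is lost */
    assert(MY_CBOOL(0x100) == 1);    /* the truth value is kept */
}
```

With a real C99 bool the plain cast already yields 1 for any non-zero value, which is why such a macro is no longer needed.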
These are equivalent to the correspondingly-named C99 typedefs on platforms that have those; they evaluate to int and unsigned int on platforms that don't, so that you can portably take advantage of this C99 feature.
This is a helper macro to avoid preprocessor issues, replaced by nothing unless under DEBUGGING, where it expands to an assert of its argument, followed by a comma (hence the comma operator). If we just used a straight assert(), we would get a comma with nothing before it when not DEBUGGING.
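The pattern can be sketched as follows. MY_ASSERT_, MY_DEBUGGING and HALF_OF_EVEN are hypothetical names chosen for this illustration; MY_DEBUGGING plays the role of DEBUGGING.

```c
#include <assert.h>

/* Hypothetical stand-in for the helper described above (not Perl's macro). */
#ifdef MY_DEBUGGING
#  define MY_ASSERT_(cond)  assert(cond),
#else
#  define MY_ASSERT_(cond)
#endif

/* The comma lives inside MY_ASSERT_, so when the helper expands to nothing
 * no stray comma is left in front of the real expression. */
#define HALF_OF_EVEN(n)  (MY_ASSERT_((n) % 2 == 0) ((n) / 2))
```

With MY_DEBUGGING defined, HALF_OF_EVEN(8) expands to an assert followed by the comma operator and the division; without it, only the division remains.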
Like "lex_stuff_pvn", but takes a literal string instead of a string/length pair.
Returns two comma-separated tokens: the input literal string and its length. This is a convenience macro which helps out in some API calls. Note that it can't be used as an argument to macros or functions that under some configurations might be macros, which means that it requires the full Perl_xxx(aTHX_ ...) form for any API calls where it's used.
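A minimal sketch of such a macro, and of passing its expansion to a real function that takes a string/length pair, might look like this. MY_STR_WITH_LEN and count_byte are hypothetical names, not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in (not Perl's definition): expands to the literal,
 * a comma, and the literal's length without the trailing NUL.  The ""
 * on each side turns a non-literal argument into a compile error. */
#define MY_STR_WITH_LEN(s)  ("" s ""), (sizeof(s) - 1)

/* A real function (not a macro) taking a string/length pair. */
static size_t count_byte(const char *pv, size_t len, char wanted)
{
    size_t i, n = 0;

    assert(pv != NULL);
    for (i = 0; i < len; i++) {
        if (pv[i] == wanted)
            n++;
    }
    return n;
}
```

Here count_byte(MY_STR_WITH_LEN("a,b,c"), ',') supplies both the pointer and the length 5 from a single macro argument. Had count_byte been a three-parameter macro, the preprocessor would have seen only two arguments at the call site, which is the restriction noted above.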
Returns whether or not the perl currently being compiled has the specified relationship to the perl given by the parameters. For example,
    #if PERL_VERSION_GT(5,24,2)
       code that will only be compiled on perls after v5.24.2
    #else
       fallback code
    #endif
Note that this is usable in making compile-time decisions.
You may use the special value '*' for the final number to mean ALL possible values for it. Thus,
#if PERL_VERSION_EQ(5,31,'*')
means all perls in the 5.31 series. And
#if PERL_VERSION_NE(5,24,'*')
means all perls EXCEPT 5.24 ones. And
#if PERL_VERSION_LE(5,9,'*')
is effectively
#if PERL_VERSION_LT(5,10,0)
This means you don't have to think so much when converting from the existing deprecated PERL_VERSION to using this macro:
#if PERL_VERSION <= 9
becomes
#if PERL_VERSION_LE(5,9,'*')
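The general technique behind such compile-time comparisons can be sketched by packing the three version components into a single integer that the preprocessor can compare. Everything below (the MY_* names, the version 5.36.1, and the packing scheme) is an assumption made for illustration; it is not how Perl defines its macros, and it does not handle the '*' wildcard.

```c
#include <assert.h>

/* Hypothetical version of the program being built (illustration only). */
#define MY_REVISION    5
#define MY_VERSION     36
#define MY_SUBVERSION  1

/* Pack three components into one integer; assumes the last two
 * components are each below 1000. */
#define MY_PACK(r, v, s)  ((r) * 1000000 + (v) * 1000 + (s))

#define MY_VERSION_GT(r, v, s) \
    (MY_PACK(MY_REVISION, MY_VERSION, MY_SUBVERSION) > MY_PACK(r, v, s))

#if MY_VERSION_GT(5, 24, 2)
/* Only compiled when the hypothetical version is after 5.24.2. */
static int uses_newer_code(void)
{
    /* The same macro also works in an ordinary expression. */
    assert(MY_VERSION_GT(5, 24, 2));
    return 1;
}
#else
/* Fallback for older versions. */
static int uses_newer_code(void)
{
    return 0;
}
#endif
```

Because the comparison is an integer constant expression, it can be used both in #if and in regular code.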
The base function, e.g., isALPHA(), takes any signed or unsigned value, treating it as a code point, and returns a boolean as to whether or not the character represented by it is (or on non-ASCII platforms, corresponds to) an ASCII character in the named class based on platform, Unicode, and Perl rules. If the input is a number that doesn't fit in an octet, FALSE is returned.
Variant isFOO_A (e.g., isALPHA_A()) is identical to the base function with no suffix "_A". This variant is used to emphasize by its name that only ASCII-range characters can return TRUE.
Variant isFOO_L1 imposes the Latin-1 (or EBCDIC equivalent) character set onto the platform. That is, the code points that are ASCII are unaffected, since ASCII is a subset of Latin-1. But the non-ASCII code points are treated as if they are Latin-1 characters. For example, isWORDCHAR_L1() will return true when called with the code point 0xDF, which is a word character on both ASCII and EBCDIC platforms (though it represents a different character on each). If the input is a number that doesn't fit in an octet, FALSE is returned. (Perl's documentation uses a colloquial definition of Latin-1, to include all code points below 256.)
Variant isFOO_uvchr is exactly like the isFOO_L1 variant, for inputs below 256, but if the code point is larger than 255, Unicode rules are used to determine if it is in the character class. For example, isWORDCHAR_uvchr(0x100) returns TRUE, since 0x100 is LATIN CAPITAL LETTER A WITH MACRON in Unicode, and is a word character.
Variants isFOO_utf8 and isFOO_utf8_safe are like isFOO_uvchr, but are used for UTF-8 encoded strings. The two forms are different names for the same thing. Each call to one of these classifies the first character of the string starting at p. The second parameter, e, points to anywhere in the string beyond the first character, up to one byte past the end of the entire string. Although both variants are identical, the suffix _safe in one name emphasizes that it will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return FALSE, at the discretion of the implementation, and subject to change in future releases.
Variant isFOO_LC is like the isFOO_A and isFOO_L1 variants, but the result is based on the current locale, which is what LC in the name stands for. If Perl can determine that the current locale is a UTF-8 locale, it uses the published Unicode rules; otherwise, it uses the C library function that gives the named classification. For example, isDIGIT_LC() when not in a UTF-8 locale returns the result of calling isdigit(). FALSE is always returned if the input won't fit into an octet. On some platforms where the C library function is known to be defective, Perl changes its result to follow the POSIX standard's rules.
Variant isFOO_LC_uvchr acts exactly like isFOO_LC for inputs less than 256, but for larger ones it returns the Unicode classification of the code point.
Variants isFOO_LC_utf8 and isFOO_LC_utf8_safe are like isFOO_LC_uvchr, but are used for UTF-8 encoded strings. The two forms are different names for the same thing. Each call to one of these classifies the first character of the string starting at p. The second parameter, e, points to anywhere in the string beyond the first character, up to one byte past the end of the entire string. Although both variants are identical, the suffix _safe in one name emphasizes that it will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return FALSE, at the discretion of the implementation, and subject to change in future releases.
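The bounds discipline behind the _safe naming can be sketched independently of Perl. The hypothetical helper below, my_first_char_len_safe, works out how many bytes the first character of a buffer needs and refuses to look past e - 1. It assumes standard UTF-8 of at most four bytes on an ASCII platform, does not reject overlong encodings, and returns 0 on any problem rather than croaking; the two test buffers are likewise made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Perl API function).  Returns the byte length
 * of the first character in [p, e), or 0 if that character is malformed
 * or would extend past e - 1. */
static size_t my_first_char_len_safe(const unsigned char *p,
                                     const unsigned char *e)
{
    size_t need, i;

    assert(p < e);  /* same precondition as described above */

    if (p[0] < 0x80)                need = 1;
    else if ((p[0] & 0xE0) == 0xC0) need = 2;
    else if ((p[0] & 0xF0) == 0xE0) need = 3;
    else if ((p[0] & 0xF8) == 0xF0) need = 4;
    else return 0;                  /* not a valid start byte */

    if ((size_t)(e - p) < need)
        return 0;                   /* truncated: never read beyond e - 1 */

    for (i = 1; i < need; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
    }
    return need;
}

/* U+0100 LATIN CAPITAL LETTER A WITH MACRON in UTF-8, and a plain 'A'. */
static const unsigned char a_macron[] = { 0xC4, 0x80 };
static const unsigned char ascii_a[]  = { 'A' };
```

Passing a_macron with e set to a_macron + 2 yields 2, while an e of a_macron + 1 yields 0, because the second byte lies outside the permitted range and is never read.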
The C suffix in the names was meant to indicate that they correspond to the C language isalnum(3).
Also note that, because all ASCII characters are UTF-8 invariant (meaning they have the exact same representation (always a single byte) whether encoded in UTF-8 or not), isASCII will give the correct results when called with any byte in any string, whether or not it is encoded in UTF-8. Similarly, isASCII_utf8 and isASCII_utf8_safe will work properly on any string, whether or not it is encoded in UTF-8.
Returns a boolean indicating whether the specified character is a control character, analogous to m/[[:cntrl:]]/. See the top of this section for an explanation of the variants. On EBCDIC platforms, you almost always want to use the isCNTRL_L1 variant.
Returns a boolean indicating whether the specified character is a digit, analogous to m/[[:digit:]]/. Variants isDIGIT_A and isDIGIT_L1 are identical to isDIGIT. See the top of this section for an explanation of the variants.
See the top of this section for an explanation of the variants.
isWORDCHAR_A, isWORDCHAR_L1, isWORDCHAR_uvchr, isWORDCHAR_LC, isWORDCHAR_LC_uvchr, isWORDCHAR_LC_utf8, and isWORDCHAR_LC_utf8_safe are also as described there, but additionally include the platform's native underscore.
They are provided for backward compatibility, even though a word character includes more than the standard C language meaning of alphanumeric. To get the C language definition, use the corresponding "isALPHANUMERIC" variant.
These all return the uppercase of a character. The differences are what domain they operate on, and whether the input is specified as a code point (those forms with a cp parameter) or as a UTF-8 string (the others). In the latter case, the code point to use is the first one in the buffer of UTF-8 encoded code points, delineated by the arguments p .. e - 1.
toUPPER and toUPPER_A are synonyms of each other. They return the uppercase of any lowercase ASCII-range code point. All other inputs are returned unchanged. Since these are macros, the input type may be any integral one, and the output will occupy the same number of bits as the input.
There is neither a toUPPER_L1 nor a toUPPER_LATIN1, because the uppercase of some code points in the 0..255 range is above that range or consists of multiple characters. Use toUPPER_uvchr instead.
toUPPER_uvchr returns the uppercase of any Unicode code point. The return value is identical to that of toUPPER_A for input code points in the ASCII range. The uppercase of the vast majority of Unicode code points is the same as the code point itself. For these, and for code points above the legal Unicode maximum, this returns the input code point unchanged. It additionally stores the UTF-8 of the result into the buffer beginning at s, and its length in bytes into *lenp. The caller must have made s large enough to contain at least UTF8_MAXBYTES_CASE+1 bytes to avoid possible overflow.
NOTE: the uppercase of a code point may be more than one code point. The return value of this function is only the first of these. The entire uppercase is returned in s. To determine if the result is more than a single code point, you can do something like this:
    uc = toUPPER_uvchr(cp, s, &len);
    if (len > UTF8SKIP(s)) { is multiple code points }
    else { is a single code point }
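The length test above works because the byte length of the first character can be read off its start byte; if the whole result is longer than that, more characters must follow. The sketch below reproduces the comparison without Perl headers. my_utf8_skip is a hypothetical stand-in for UTF8SKIP that assumes well-formed standard UTF-8 of at most four bytes, and the two buffers are hand-written examples: "SS" is the full uppercase of U+00DF (sharp s), and 0xC3 0x80 encodes U+00C0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP (not Perl's definition): the number
 * of bytes in the character that starts at s, judged from its first byte. */
static size_t my_utf8_skip(const unsigned char *s)
{
    if (*s < 0x80)           return 1;
    if ((*s & 0xE0) == 0xC0) return 2;
    if ((*s & 0xF0) == 0xE0) return 3;
    if ((*s & 0xF8) == 0xF0) return 4;
    return 1;  /* unexpected start byte: treat as a single byte */
}

/* True if a case-changed result of len bytes holds more than one
 * code point. */
static int is_multiple_code_points(const unsigned char *s, size_t len)
{
    assert(len > 0);
    return len > my_utf8_skip(s);
}

static const unsigned char upper_sharp_s[] = { 'S', 'S' };
static const unsigned char upper_a_grave[] = { 0xC3, 0x80 };
```

The first buffer has two bytes but its first character occupies only one, so it holds multiple code points; the second has two bytes that together form a single character.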
toUPPER_utf8 and toUPPER_utf8_safe are synonyms of each other. The only difference between these and toUPPER_uvchr is that the source for these is encoded in UTF-8, instead of being a code point. It is passed as a buffer starting at p, with e pointing to one byte beyond its end. The p buffer may certainly contain more than one code point; but only the first one (up through e - 1) is examined. If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.
These all return the foldcase of a character. "foldcase" is an internal case for /i pattern matching. If the foldcase of character A and the foldcase of character B are the same, they match caselessly; otherwise they don't.
The differences in the forms are what domain they operate on, and whether the input is specified as a code point (those forms with a cp parameter) or as a UTF-8 string (the others). In the latter case, the code point to use is the first one in the buffer of UTF-8 encoded code points, delineated by the arguments p .. e - 1.
toFOLD and toFOLD_A are synonyms of each other. They return the foldcase of any ASCII-range code point. In this range, the foldcase is identical to the lowercase. All other inputs are returned unchanged. Since these are macros, the input type may be any integral one, and the output will occupy the same number of bits as the input.
There is neither a toFOLD_L1 nor a toFOLD_LATIN1, because the foldcase of some code points in the 0..255 range is above that range or consists of multiple characters. Use toFOLD_uvchr instead.
toFOLD_uvchr returns the foldcase of any Unicode code point. The return value is identical to that of toFOLD_A for input code points in the ASCII range. The foldcase of the vast majority of Unicode code points is the same as the code point itself. For these, and for code points above the legal Unicode maximum, this returns the input code point unchanged. It additionally stores the UTF-8 of the result into the buffer beginning at s, and its length in bytes into *lenp. The caller must have made s large enough to contain at least UTF8_MAXBYTES_CASE+1 bytes to avoid possible overflow.
NOTE: the foldcase of a code point may be more than one code point. The return value of this function is only the first of these. The entire foldcase is returned in s. To determine if the result is more than a single code point, you can do something like this:
    uc = toFOLD_uvchr(cp, s, &len);
    if (len > UTF8SKIP(s)) { is multiple code points }
    else { is a single code point }
toFOLD_utf8 and toFOLD_utf8_safe are synonyms of each other. The only difference between these and toFOLD_uvchr is that the source for these is encoded in UTF-8, instead of being a code point. It is passed as a buffer starting at p, with e pointing to one byte beyond its end. The p buffer may certainly contain more than one code point; but only the first one (up through e - 1) is examined. If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.
These all return the lowercase of a character. The differences are what domain they operate on, and whether the input is specified as a code point (those forms with a cp parameter) or as a UTF-8 string (the others). In the latter case, the code point to use is the first one in the buffer of UTF-8 encoded code points, delineated by the arguments p .. e - 1.
toLOWER and toLOWER_A are synonyms of each other. They return the lowercase of any uppercase ASCII-range code point. All other inputs are returned unchanged. Since these are macros, the input type may be any integral one, and the output will occupy the same number of bits as the input.
toLOWER_L1 and toLOWER_LATIN1 are synonyms of each other. They behave identically to toLOWER for ASCII-range input, but additionally return the lowercase of any uppercase code point in the entire 0..255 range, assuming a Latin-1 encoding (or the EBCDIC equivalent on such platforms).
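As a rough illustration of what lowercasing over the 0..255 range involves, the sketch below assumes an ASCII platform with Latin-1 code points. my_to_lower_l1 is a hypothetical function, not Perl's implementation.

```c
#include <assert.h>

/* Hypothetical Latin-1 lowercase (not Perl's implementation); assumes an
 * ASCII platform.  Inputs above 255 are returned unchanged. */
static unsigned my_to_lower_l1(unsigned cp)
{
    unsigned lower = cp;

    if (cp >= 'A' && cp <= 'Z') {
        lower = cp + 0x20;              /* ASCII uppercase letters */
    }
    else if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) {
        lower = cp + 0x20;              /* Latin-1 uppercase letters;
                                           0xD7 is the multiplication sign */
    }

    /* A lowercased 0..255 input always stays within 0..255. */
    assert(cp > 0xFF || lower <= 0xFF);
    return lower;
}
```

For example, 0xC9 (E with acute) maps to 0xE9, while 0xD7 and 0xDF come back unchanged. The closing assertion is what makes a Latin-1-only lowercase function workable in the first place.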
toLOWER_LC returns the lowercase of the input code point according to the rules of the current POSIX locale. Input code points outside the range 0..255 are returned unchanged.
toLOWER_uvchr returns the lowercase of any Unicode code point. The return value is identical to that of toLOWER_L1 for input code points in the 0..255 range. The lowercase of the vast majority of Unicode code points is the same as the code point itself. For these, and for code points above the legal Unicode maximum, this returns the input code point unchanged. It additionally stores the UTF-8 of the result into the buffer beginning at s, and its length in bytes into *lenp. The caller must have made s large enough to contain at least UTF8_MAXBYTES_CASE+1 bytes to avoid possible overflow.
NOTE: the lowercase of a code point may be more than one code point. The return value of this function is only the first of these. The entire lowercase is returned in s. To determine if the result is more than a single code point, you can do something like this:
    uc = toLOWER_uvchr(cp, s, &len);
    if (len > UTF8SKIP(s)) { is multiple code points }
    else { is a single code point }
toLOWER_utf8 and toLOWER_utf8_safe are synonyms of each other. The only difference between these and toLOWER_uvchr is that the source for these is encoded in UTF-8, instead of being a code point. It is passed as a buffer starting at p, with e pointing to one byte beyond its end. The p buffer may certainly contain more than one code point; but only the first one (up through e - 1) is examined. If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.
These all return the titlecase of a character. The differences are what domain they operate on, and whether the input is specified as a code point (those forms with a cp parameter) or as a UTF-8 string (the others). In the latter case, the code point to use is the first one in the buffer of UTF-8 encoded code points, delineated by the arguments p .. e - 1.
toTITLE and toTITLE_A are synonyms of each other. They return the titlecase of any lowercase ASCII-range code point. In this range, the titlecase is identical to the uppercase. All other inputs are returned unchanged. Since these are macros, the input type may be any integral one, and the output will occupy the same number of bits as the input.
There is neither a toTITLE_L1 nor a toTITLE_LATIN1, because the titlecase of some code points in the 0..255 range is above that range or consists of multiple characters. Use toTITLE_uvchr instead.
toTITLE_uvchr returns the titlecase of any Unicode code point. The return value is identical to that of toTITLE_A for input code points in the ASCII range. The titlecase of the vast majority of Unicode code points is the same as the code point itself. For these, and for code points above the legal Unicode maximum, this returns the input code point unchanged. It additionally stores the UTF-8 of the result into the buffer beginning at s, and its length in bytes into *lenp. The caller must have made s large enough to contain at least UTF8_MAXBYTES_CASE+1 bytes to avoid possible overflow.
NOTE: the titlecase of a code point may be more than one code point. The return value of this function is only the first of these. The entire titlecase is returned in s. To determine if the result is more than a single code point, you can do something like this:
    uc = toTITLE_uvchr(cp, s, &len);
    if (len > UTF8SKIP(s)) { is multiple code points }
    else { is a single code point }
toTITLE_utf8 and toTITLE_utf8_safe are synonyms of each other. The only difference between these and toTITLE_uvchr is that the source for these is encoded in UTF-8, instead of being a code point. It is passed as a buffer starting at p, with e pointing to one byte beyond its end. The p buffer may certainly contain more than one code point; but only the first one (up through e - 1) is examined. If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.
Yields the widest unsigned integer type on the platform, currently either U32 or U64. This can be used in declarations such as
WIDEST_UTYPE my_uv;
or casts
my_uv = (WIDEST_UTYPE) val;
The XSUB-writer's interface to the C malloc function.
Memory obtained by this should ONLY be freed with "Safefree".
In 5.9.3, Newx() and friends replaced the older New() API and dropped the first parameter, x, a debug aid which allowed callers to identify themselves. This aid has been superseded by a new build option, PERL_MEM_LOG (see "PERL_MEM_LOG" in perlhacktips). The older API is still there for use in XS modules supporting older perls.
The XSUB-writer's interface to the C malloc function. The allocated memory is zeroed with memzero. See also "Newx".
The XSUB-writer's interface to the C realloc function.
This should ONLY be used on memory obtained using "Newx" and friends.
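The shape of these typed allocation wrappers can be sketched on top of the C library. The MY_* macros and grow_and_read_back below are hypothetical stand-ins written for this illustration; they do none of the extra checking a production wrapper would want, and they are not Perl's definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins (not Perl's macros): typed wrappers that take
 * the pointer variable, an element count, and the element type. */
#define MY_NEWX(ptr, nitems, type) \
    ((ptr) = (type *)malloc((nitems) * sizeof(type)))
#define MY_RENEW(ptr, nitems, type) \
    ((ptr) = (type *)realloc((ptr), (nitems) * sizeof(type)))
#define MY_SAFEFREE(ptr)  free(ptr)

/* Allocates three ints, grows the buffer to four, and returns their sum,
 * or -1 if an allocation fails. */
static int grow_and_read_back(void)
{
    int *squares;
    int i, total;

    MY_NEWX(squares, 3, int);
    if (squares == NULL)
        return -1;
    for (i = 0; i < 3; i++)
        squares[i] = i * i;

    /* Sketch only: assigning straight back loses the old block if
     * realloc fails. */
    MY_RENEW(squares, 4, int);
    if (squares == NULL)
        return -1;
    squares[3] = 9;

    total = squares[0] + squares[1] + squares[2] + squares[3];
    assert(total == 14);  /* 0 + 1 + 4 + 9 */

    /* Memory from the allocation wrappers goes back through the
     * matching free wrapper. */
    MY_SAFEFREE(squares);
    return total;
}
```

Keeping allocation, resizing and freeing within one family of wrappers mirrors the rule stated above that memory from "Newx" and friends should only be resized with Renew and freed with "Safefree".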
MoveD is like Move but returns dest. Useful for encouraging compilers to tail-call optimise.
CopyD is like Copy but returns dest. Useful for encouraging compilers to tail-call optimise.
The XSUB-writer's interface to the C memzero function. The dest is the destination, nitems is the number of items, and type is the type.
ZeroD is like Zero but returns dest. Useful for encouraging compilers to tail-call optimise.
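The value of the dest-returning forms is easiest to see when a function can simply return the macro's result. The sketch below builds hypothetical MY_MOVED, MY_COPYD and MY_ZEROD macros on memmove, memcpy and memset; they mirror the (src, dest, nitems, type) argument order of Move and Copy but are not Perl's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins (not Perl's macros).  Each evaluates to dest,
 * because the underlying C library call returns it. */
#define MY_MOVED(src, dest, nitems, type) \
    memmove((dest), (src), (nitems) * sizeof(type))
#define MY_COPYD(src, dest, nitems, type) \
    memcpy((dest), (src), (nitems) * sizeof(type))
#define MY_ZEROD(dest, nitems, type) \
    memset((dest), 0, (nitems) * sizeof(type))

/* Returning the macro's value directly leaves the library call in tail
 * position, which a compiler may turn into a tail call. */
static void *clear_ints(int *dest, size_t nitems)
{
    assert(dest != NULL);
    return MY_ZEROD(dest, nitems, int);
}

/* Source and destination overlap here, so memmove semantics are needed. */
static void *shift_right_by_one(int *buf, size_t nitems)
{
    assert(buf != NULL && nitems > 0);
    return MY_MOVED(buf, buf + 1, nitems - 1, int);
}

static int moved_demo(void)
{
    int buf[4] = { 1, 2, 3, 4 };
    int *ret = (int *)shift_right_by_one(buf, 4);  /* buf: 1 1 2 3 */
    int ok = (ret == buf + 1) && buf[1] == 1 && buf[3] == 3;

    ret = (int *)clear_ints(buf, 4);               /* buf: 0 0 0 0 */
    return ok && ret == buf && buf[0] == 0 && buf[3] == 0;
}
```

moved_demo() returns 1 when both helpers hand back the destination pointer and leave the buffer in the expected state.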
Fill up memory with a byte pattern (a byte repeated over and over again) that hopefully catches attempts to access uninitialized memory.
PoisonWith(0xAB) for catching access to allocated but uninitialized memory.
PoisonWith(0xEF) for catching access to freed memory.
Returns the number of elements in the input C array (so valid zero-based indices are strictly less than this value).
Returns a pointer to one element past the final element of the input C array.
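Both array helpers rest on the usual sizeof idiom, sketched here with hypothetical MY_ARRAY_LENGTH and MY_ARRAY_END macros (not Perl's definitions). The idiom only works on a true array, not on a pointer to its first element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (not Perl's macros). */
#define MY_ARRAY_LENGTH(a)  (sizeof(a) / sizeof((a)[0]))
#define MY_ARRAY_END(a)     ((a) + MY_ARRAY_LENGTH(a))

static const int primes[] = { 2, 3, 5, 7 };

/* Walks the array with a pointer that stops at the one-past-the-end
 * address, which may be compared against but never dereferenced. */
static int sum_primes(void)
{
    const int *p;
    int total = 0;

    assert(MY_ARRAY_END(primes) - primes
           == (ptrdiff_t)MY_ARRAY_LENGTH(primes));

    for (p = primes; p < MY_ARRAY_END(primes); p++)
        total += *p;
    return total;
}
```

With the four-element array above, MY_ARRAY_LENGTH(primes) is 4, the valid indices are 0 through 3, and sum_primes() returns 17.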