
Handy Values

Cast-to-bool. A simple (bool) expr cast may not do the right thing: if bool is defined as char, for example, then the cast from int is implementation-defined.

Note that (bool)!!(cbool) in a ternary triggers a bug in xlc on AIX.
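A minimal sketch of the cast-to-bool idea, assuming a hypothetical platform where the boolean type is a plain char typedef (my_bool, MY_cBOOL and has_flag_0x100 are illustrative names, not Perl's). Routing the value through a ternary yields exactly 0 or 1, so no out-of-range int is ever converted to char.

```c
/* Hypothetical platform where the boolean type is just a char. */
typedef char my_bool;

/* Sketch of the cast-to-bool idea: the ternary produces exactly 1 or 0,
   so no out-of-range int is ever converted to char. */
#define MY_cBOOL(expr) ((expr) ? (my_bool)1 : (my_bool)0)

/* 0x100 is nonzero, but its low byte is zero.  A direct (my_bool) cast of
   that value is not reliably "true": it is 0 where char is unsigned and
   implementation-defined where char is signed. */
my_bool has_flag_0x100(int flags)
{
    return MY_cBOOL(flags & 0x100);
}
```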

This is a helper macro to avoid preprocessor issues, replaced by nothing unless under DEBUGGING, where it expands to an assert of its argument, followed by a comma (hence the comma operator). If we just used a straight assert(), we would get a comma with nothing before it when not DEBUGGING.
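The trailing-comma trick can be sketched as follows, assuming a hypothetical MY_DEBUGGING flag in place of Perl's DEBUGGING and made-up names MY_ASSERT_ and MY_HALF. Because the helper supplies its own comma, it can prefix the real expression, and in a non-debugging build it vanishes without leaving a stray comma behind.

```c
#include <assert.h>

/* Hypothetical re-creation of the pattern; MY_DEBUGGING stands in for
   Perl's DEBUGGING build flag. */
#ifdef MY_DEBUGGING
#  define MY_ASSERT_(expr) assert(expr),
#else
#  define MY_ASSERT_(expr)
#endif

/* With MY_DEBUGGING this expands to (assert((n) >= 0), (n) / 2);
   without it, to just ( (n) / 2). */
#define MY_HALF(n) (MY_ASSERT_((n) >= 0) (n) / 2)

int half_of(int n)
{
    return MY_HALF(n);
}
```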

SV Manipulation Functions

Memory Management

GV Functions

Hash Manipulation Functions

Lexer interface

Like "lex_stuff_pvn", but takes a literal string instead of a string/length pair.

Handy Values

Returns two comma-separated tokens: the input literal string and its length. This is a convenience macro that helps out in some API calls. Note that it can't be used as an argument to macros, or to functions that under some configurations might be macros, which means that it requires the full Perl_xxx(aTHX_ ...) form for any API call where it's used.
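A sketch of how such a literal-plus-length macro can work, using the hypothetical names MY_STR_WITH_LEN and count_char. Pasting "" on both sides makes the macro reject anything that is not a string literal, and sizeof counts the trailing NUL, hence the -1. It also shows why the target must be a real function: a function-like macro would receive MY_STR_WITH_LEN(...) as a single argument, because macro arguments are split before they are expanded.

```c
#include <stddef.h>

/* Literal string and its length (excluding the NUL), as two tokens. */
#define MY_STR_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

/* Hypothetical API taking a string/length pair; must be a real function. */
static size_t count_char(const char *pv, size_t len, char c)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if (pv[i] == c)
            n++;
    return n;
}

size_t count_l_in_hello(void)
{
    /* Expands to count_char(("" "hello" ""), (sizeof("hello") - 1), 'l') */
    return count_char(MY_STR_WITH_LEN("hello"), 'l');
}
```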

Miscellaneous Functions

Character classification

This section is about functions (really macros) that classify characters into types, such as punctuation versus alphabetic, etc. Most of these are analogous to regular expression character classes. (See "POSIX Character Classes" in perlrecharclass.) There are several variants for each class. (Not all macros have all variants; each item below lists the ones valid for it.) None are affected by use bytes, and only the ones with LC in the name are affected by the current locale.

The base function, e.g., isALPHA(), takes any signed or unsigned value, treating it as a code point, and returns a boolean as to whether or not the character represented by it is (or on non-ASCII platforms, corresponds to) an ASCII character in the named class based on platform, Unicode, and Perl rules. If the input is a number that doesn't fit in an octet, FALSE is returned.

Variant isFOO_A (e.g., isALPHA_A()) is identical to the base function with no suffix "_A". This variant is used to emphasize by its name that only ASCII-range characters can return TRUE.
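An ASCII-only test in the style of the base and _A variants might look like this minimal sketch, assuming an ASCII (not EBCDIC) platform; my_isALPHA_A is a hypothetical name. Any integer is accepted and treated as a code point, and everything outside A-Z and a-z (non-ASCII Latin-1, values above 255, negative numbers) yields false.

```c
/* ASCII-only alphabetic test; assumes an ASCII platform. */
int my_isALPHA_A(long long c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```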

Variant isFOO_L1 imposes the Latin-1 (or EBCDIC equivalent) character set onto the platform. That is, the code points that are ASCII are unaffected, since ASCII is a subset of Latin-1. But the non-ASCII code points are treated as if they are Latin-1 characters. For example, isWORDCHAR_L1() will return true when called with the code point 0xDF, which is a word character in both ASCII and EBCDIC (though it represents different characters in each). If the input is a number that doesn't fit in an octet, FALSE is returned. (Perl's documentation uses a colloquial definition of Latin-1, to include all code points below 256.)

Variant isFOO_uvchr is exactly like the isFOO_L1 variant, for inputs below 256, but if the code point is larger than 255, Unicode rules are used to determine if it is in the character class. For example, isWORDCHAR_uvchr(0x100) returns TRUE, since 0x100 is LATIN CAPITAL LETTER A WITH MACRON in Unicode, and is a word character.
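The _L1 and _uvchr variants described above can be sketched as below, assuming an ASCII platform. The Latin-1 word-character set is spelled out by hand; above 255 a real implementation consults Unicode tables, so toy_unicode_is_word is a hypothetical stand-in that only knows the Latin Extended-A block (U+0100 to U+017F), which consists entirely of letters.

```c
/* Word characters in 0..255 on an ASCII platform: 0-9, A-Z, a-z, '_',
   U+00AA, U+00B5, U+00BA, U+00C0-U+00FF except U+00D7 and U+00F7. */
int my_isWORDCHAR_L1(unsigned long c)
{
    if (c > 0xFF)
        return 0;                      /* doesn't fit in an octet */
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z') || c == '_')
        return 1;
    return c == 0xAA || c == 0xB5 || c == 0xBA
        || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

/* Hypothetical stand-in for a Unicode property lookup (toy coverage only). */
int toy_unicode_is_word(unsigned long c)
{
    return c >= 0x100 && c <= 0x17F;
}

/* Same as the _L1 variant below 256; Unicode rules above that. */
int my_isWORDCHAR_uvchr(unsigned long c)
{
    return c < 256 ? my_isWORDCHAR_L1(c) : toy_unicode_is_word(c);
}
```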

Variants isFOO_utf8 and isFOO_utf8_safe are like isFOO_uvchr, but are used for UTF-8 encoded strings. The two forms are different names for the same thing. Each call to one of these classifies the first character of the string starting at p. The second parameter, e, points to anywhere in the string beyond the first character, up to one byte past the end of the entire string. Although both variants are identical, the suffix _safe in one name emphasizes that it will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return FALSE, at the discretion of the implementation, and subject to change in future releases.
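To illustrate the "never read beyond e - 1" guarantee, here is a simplified sketch of decoding the first character between p and e; my_first_cp_utf8_safe and my_classify_first_utf8_safe are hypothetical names. It returns 0 for malformed or truncated input, which models the "return FALSE" option the text allows. Unlike a production decoder, it does not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Decode the first UTF-8 character in [p, e).  Returns the byte length,
   or 0 if malformed/truncated.  Never reads p[k] with p + k >= e. */
int my_first_cp_utf8_safe(const unsigned char *p, const unsigned char *e,
                          unsigned long *cp)
{
    size_t len, i;
    unsigned long c;

    if (p >= e)
        return 0;
    if (p[0] < 0x80) { *cp = p[0]; return 1; }
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; c = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; c = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; c = p[0] & 0x07; }
    else return 0;                 /* stray continuation or bad start byte */

    if ((size_t)(e - p) < len)
        return 0;                  /* truncated: would read past e - 1 */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;              /* missing continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    *cp = c;
    return (int)len;
}

/* Classify the first character with any code-point predicate;
   malformed input yields false. */
int my_classify_first_utf8_safe(const unsigned char *p, const unsigned char *e,
                                int (*is_class)(unsigned long))
{
    unsigned long cp;
    return my_first_cp_utf8_safe(p, e, &cp) ? is_class(cp) : 0;
}
```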

Variant isFOO_LC is like the isFOO_A and isFOO_L1 variants, but the result is based on the current locale, which is what LC in the name stands for. If Perl can determine that the current locale is a UTF-8 locale, it uses the published Unicode rules; otherwise, it uses the C library function that gives the named classification. For example, isDIGIT_LC() when not in a UTF-8 locale returns the result of calling isdigit(). FALSE is always returned if the input won't fit into an octet. On some platforms where the C library function is known to be defective, Perl changes its result to follow the POSIX standard's rules.
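The non-UTF-8-locale path can be sketched roughly as below; my_isDIGIT_LC is a hypothetical name. Values that don't fit in an octet are rejected up front, and the rest go to the C library's locale-aware isdigit(). The UTF-8-locale branch and Perl's corrections for defective C libraries are deliberately omitted.

```c
#include <ctype.h>

/* Simplified locale-based digit test (non-UTF-8 locale path only). */
int my_isDIGIT_LC(long c)
{
    if (c < 0 || c > 255)
        return 0;                  /* won't fit into an octet */
    return isdigit((int)c) != 0;   /* defer to the current locale */
}
```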

Variant isFOO_LC_uvchr acts exactly like isFOO_LC for inputs less than 256, but for larger ones it returns the Unicode classification of the code point.

Variants isFOO_LC_utf8 and isFOO_LC_utf8_safe are like isFOO_LC_uvchr, but are used for UTF-8 encoded strings. The two forms are different names for the same thing. Each call to one of these classifies the first character of the string starting at p. The second parameter, e, points to anywhere in the string beyond the first character, up to one byte past the end of the entire string. Although both variants are identical, the suffix _safe in one name emphasizes that it will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return FALSE, at the discretion of the implementation, and subject to change in future releases.

A (discouraged from use) synonym is isALNUMC (where the C suffix means this corresponds to the C language alphanumeric definition). Also there are the variants isALNUMC_A, isALNUMC_L1, isALNUMC_LC, and isALNUMC_LC_uvchr.

Also note that, because all ASCII characters are UTF-8 invariant (meaning they have exactly the same representation, always a single byte, whether encoded in UTF-8 or not), isASCII gives the correct result when called with any byte of any string, whether or not it is encoded in UTF-8. Similarly, isASCII_utf8 and isASCII_utf8_safe work properly on any string, encoded in UTF-8 or not.
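A small sketch of why a per-byte ASCII test is safe on UTF-8 data, assuming an ASCII (not EBCDIC) platform; my_isASCII and count_ascii_bytes are hypothetical names. Every byte of a multi-byte UTF-8 character is 0x80 or above, so no fragment of one can be mistaken for an ASCII character.

```c
#include <stddef.h>

int my_isASCII(unsigned long c)
{
    return c < 0x80;
}

/* Count ASCII characters byte by byte; correct for UTF-8 and non-UTF-8
   strings alike. */
size_t count_ascii_bytes(const unsigned char *s, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if (my_isASCII(s[i]))
            n++;
    return n;
}
```

For "café" this gives 3 whether the é is stored as the UTF-8 pair 0xC3 0xA9 or as the single Latin-1 byte 0xE9.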

Miscellaneous Functions

Character case changing

Perl uses "full" Unicode case mappings. This means that converting a single character to another case may result in a sequence of more than one character. For example, the uppercase of ß (LATIN SMALL LETTER SHARP S) is the two-character sequence SS. This presents some complications. The lowercase of every character in the range 0..255 is a single character, and thus "toLOWER_L1" is furnished. But toUPPER_L1 can't exist, as it couldn't return a valid result for all legal inputs. (Instead, "toUPPER_uvchr" has an API that does allow every possible legal result to be returned.) Likewise, no other function that is crippled by not being able to give the correct results for the full range of possible inputs has been implemented here.
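The asymmetry between lowercase and uppercase in 0..255 can be sketched as follows, assuming an ASCII platform; my_toLOWER_L1 and my_toUPPER_from_L1 are hypothetical names. Lowercasing stays within 0..255 and always gives one character. Uppercasing does not: ß becomes SS, MICRO SIGN (U+00B5) becomes GREEK CAPITAL LETTER MU (U+039C), and ÿ (U+00FF) becomes U+0178. The uppercase helper therefore writes its result into a caller-supplied buffer and returns only the first code point, loosely mirroring the shape of toUPPER_uvchr (which writes UTF-8, not code points).

```c
/* Lowercase within Latin-1: always a single character in 0..255. */
unsigned long my_toLOWER_L1(unsigned long c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
        return c + 0x20;
    return c;
}

/* Full uppercase of an input in 0..255: writes up to two code points into
   out[], stores how many in *count, and returns the first one. */
unsigned long my_toUPPER_from_L1(unsigned long c, unsigned long out[2],
                                 int *count)
{
    *count = 1;
    if (c == 0xDF) {                  /* LATIN SMALL LETTER SHARP S -> "SS" */
        out[0] = 'S';
        out[1] = 'S';
        *count = 2;
    } else if (c == 0xB5) {
        out[0] = 0x39C;               /* MICRO SIGN -> GREEK CAPITAL LETTER MU */
    } else if (c == 0xFF) {
        out[0] = 0x178;               /* y WITH DIAERESIS -> U+0178 */
    } else if ((c >= 'a' && c <= 'z')
               || (c >= 0xE0 && c <= 0xFE && c != 0xF7)) {
        out[0] = c - 0x20;
    } else {
        out[0] = c;
    }
    return out[0];
}
```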

The first code point of the uppercased version is returned (but note, as explained at the top of this section, that there may be more).

The first code point of the uppercased version is returned (but note, as explained at the top of this section, that there may be more).

It will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.

The first code point of the foldcased version is returned (but note, as explained at the top of this section, that there may be more).

The first code point of the foldcased version is returned (but note, as explained at the top of this section, that there may be more).

It will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.

The first code point of the lowercased version is returned (but note, as explained at the top of this section, that there may be more).

The first code point of the lowercased version is returned (but note, as explained at the top of this section, that there may be more). It will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.

The first code point of the titlecased version is returned (but note, as explained at the top of this section, that there may be more).

The first code point of the titlecased version is returned (but note, as explained at the top of this section, that there may be more).

It will not attempt to read beyond e - 1, provided that the constraint p < e is true (this is asserted in -DDEBUGGING builds). If the UTF-8 for the input character is malformed in some way, the program may croak, or the function may return the REPLACEMENT CHARACTER, at the discretion of the implementation, and subject to change in future releases.

Yields the widest unsigned integer type on the platform, currently either U32 or U64. This can be used in declarations such as

 WIDEST_UTYPE my_uv;

or casts

 my_uv = (WIDEST_UTYPE) val;
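Outside Perl, a comparable "widest unsigned type" choice could be approximated with <stdint.h>, as in this minimal sketch; MY_WIDEST_UTYPE and widen are hypothetical names, and Perl's own definition depends on its build configuration rather than on this test. It uses a typedef where Perl provides a macro, but the declaration and cast usages shown above work the same way.

```c
#include <stdint.h>

/* Pick a 64-bit unsigned type when one exists, otherwise 32 bits. */
#ifdef UINT64_MAX
typedef uint64_t MY_WIDEST_UTYPE;
#else
typedef uint32_t MY_WIDEST_UTYPE;
#endif

MY_WIDEST_UTYPE widen(long val)
{
    /* The cast form from the text. */
    return (MY_WIDEST_UTYPE) val;
}
```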

Memory Management

Memory obtained by this should ONLY be freed with "Safefree".

In 5.9.3, Newx() and friends replaced the older New() API and dropped its first parameter, x, a debug aid that allowed callers to identify themselves. This aid has been superseded by a new build option, PERL_MEM_LOG (see "PERL_MEM_LOG" in perlhacktips). The older API is still there for use in XS modules supporting older perls.
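The shape of the typed allocation API (variable, count, type) and its pairing with a matching free can be sketched with plain malloc and free; MY_Newx, MY_Safefree and sum_of_squares are hypothetical names. Perl's real macros additionally guard the size computation against overflow and handle allocation failure themselves, so callers there do not test for NULL the way this sketch does.

```c
#include <stdlib.h>

/* Rough analogues assuming plain malloc/free; no overflow check. */
#define MY_Newx(v, n, t)  ((v) = (t *)malloc((size_t)(n) * sizeof(t)))
#define MY_Safefree(p)    free(p)

long sum_of_squares(int n)
{
    long *buf, total = 0;
    int i;

    MY_Newx(buf, n, long);          /* typed allocation: n longs */
    if (buf == NULL)
        return -1;
    for (i = 0; i < n; i++)
        buf[i] = (long)i * i;
    for (i = 0; i < n; i++)
        total += buf[i];
    MY_Safefree(buf);               /* release with the matching free */
    return total;
}
```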

Memory obtained by this should ONLY be freed with "Safefree".

Memory obtained by this should ONLY be freed with "Safefree".

Memory obtained by this should ONLY be freed with "Safefree".

Memory obtained by this should ONLY be freed with "Safefree".

This should ONLY be used on memory obtained using "Newx" and friends.

Like Copy but returns dest. Useful for encouraging compilers to tail-call optimise.

The XSUB-writer's interface to the C memzero function. The dest is the destination, nitems is the number of items, and type is the type.

Like Zero but returns dest. Useful for encouraging compilers to tail-call optimise.
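A sketch of the dest-returning copy and zero macros, with source first for the copy as in Copy(src, dest, nitems, type); MY_CopyD, MY_ZeroD, clone_ints and clear_ints are hypothetical names. Since memcpy() and memset() already return their destination pointer, a wrapper can return the macro's value directly, leaving the library call in tail position.

```c
#include <string.h>

/* Element-count-and-type wrappers that evaluate to dest. */
#define MY_CopyD(src, dest, n, t) \
    memcpy((char *)(dest), (const char *)(src), (size_t)(n) * sizeof(t))
#define MY_ZeroD(dest, n, t) \
    memset((char *)(dest), 0, (size_t)(n) * sizeof(t))

void *clone_ints(const int *src, int *dest, int n)
{
    return MY_CopyD(src, dest, n, int);   /* call in tail position */
}

void *clear_ints(int *dest, int n)
{
    return MY_ZeroD(dest, n, int);
}
```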

Fill up memory with a byte pattern (a byte repeated over and over again) that hopefully catches attempts to access uninitialized memory.

PoisonWith(0xAB) for catching access to allocated but uninitialized memory.

PoisonWith(0xEF) for catching access to freed memory.

PoisonWith(0xEF) for catching access to freed memory.
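The poisoning macros amount to a typed memset with a recognisable byte, as in this minimal sketch; MY_PoisonWith, MY_PoisonNew, MY_PoisonFree and all_bytes_are are hypothetical names. A debugger or crash dump showing 0xABABABAB or 0xEFEFEFEF then points at uninitialized or freed memory respectively.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite n items of type t at d with the byte b. */
#define MY_PoisonWith(d, n, t, b) \
    ((void)memset((char *)(d), (unsigned char)(b), (size_t)(n) * sizeof(t)))
#define MY_PoisonNew(d, n, t)  MY_PoisonWith(d, n, t, 0xAB) /* allocated, uninitialized */
#define MY_PoisonFree(d, n, t) MY_PoisonWith(d, n, t, 0xEF) /* freed */

/* Returns nonzero if every byte of buf equals b. */
int all_bytes_are(const void *buf, size_t len, unsigned char b)
{
    const unsigned char *p = buf;
    size_t i;
    for (i = 0; i < len; i++)
        if (p[i] != b)
            return 0;
    return 1;
}
```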

Handy Values

Returns the number of elements in the input C array (so valid zero-based indices are less than, but not equal to, this value).

Returns a pointer to one element past the final element of the input C array.
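Both array helpers can be sketched with sizeof, under the hypothetical names MY_C_ARRAY_LENGTH and MY_C_ARRAY_END. They only work on true arrays: applied to a pointer, sizeof reports the pointer's size, not the array's. The one-past-the-end pointer is a valid loop bound but must never be dereferenced.

```c
#include <stddef.h>

#define MY_C_ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))
#define MY_C_ARRAY_END(a)    ((a) + MY_C_ARRAY_LENGTH(a))

static const int primes[] = { 2, 3, 5, 7, 11 };

int sum_primes(void)
{
    const int *p;
    int total = 0;
    /* Iterate up to, but not including, the one-past-the-end pointer. */
    for (p = primes; p < MY_C_ARRAY_END(primes); p++)
        total += *p;
    return total;
}
```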