Unicode Support

Returns a bool indicating whether the sequence of bytes from s up to, but not including, send forms a "script run". utf8_target is TRUE iff the sequence starting at s is to be treated as UTF-8. To be precise, except for the two degenerate cases given below, this function returns TRUE iff every code point in the sequence comes from some combination of three "scripts" given by the Unicode "Script Extensions" property: Common, Inherited, and possibly one other. Additionally, all decimal digits must come from the same consecutive sequence of 10.

For example, if all the characters in the sequence are Greek, or Common, or Inherited, this function will return TRUE, provided any decimal digits in it are from the same block of digits in Common. (These are the ASCII digits "0".."9", a block of full-width forms of these, and several others used in mathematical notation.) For scripts that, unlike Greek, have their own digits defined, this function accepts either digits from that set or digits from one of the Common digit sets, but not a combination of the two. Some scripts, such as Arabic, have more than one set of digits. All digits must come from the same set for this function to return TRUE.
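The digit rule can be illustrated with a minimal, standalone C sketch. It is not Perl's implementation: the table toy_digit_zero and the functions toy_zero_of and toy_digits_consistent are hypothetical names invented for this example, and the table lists only four of the many decimal digit blocks in Unicode. The input is assumed to be already-decoded code points rather than a byte sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table: the code point of DIGIT ZERO for a few decimal
 * digit blocks. Real Unicode defines many more; this is NOT Perl's data. */
const uint32_t toy_digit_zero[] = {
    0x0030, /* ASCII "0".."9" */
    0x0660, /* ARABIC-INDIC DIGIT ZERO */
    0x06F0, /* EXTENDED ARABIC-INDIC DIGIT ZERO */
    0xFF10  /* FULLWIDTH DIGIT ZERO */
};

/* Return the zero of the digit block containing cp, or 0 if cp is not a
 * decimal digit known to the toy table. */
uint32_t toy_zero_of(uint32_t cp)
{
    size_t count = sizeof toy_digit_zero / sizeof toy_digit_zero[0];
    for (size_t i = 0; i < count; i++) {
        if (cp >= toy_digit_zero[i] && cp <= toy_digit_zero[i] + 9)
            return toy_digit_zero[i];
    }
    return 0;
}

/* Return true iff every decimal digit among the n code points comes from
 * the same consecutive block of 10. Non-digits are ignored here. */
bool toy_digits_consistent(const uint32_t *cps, size_t n)
{
    uint32_t zero = 0; /* block of the first digit seen, 0 if none yet */
    for (size_t i = 0; i < n; i++) {
        uint32_t z = toy_zero_of(cps[i]);
        if (z == 0)
            continue;        /* not a digit in the toy table */
        if (zero == 0)
            zero = z;        /* first digit fixes the block */
        else if (z != zero)
            return false;    /* digits from two different blocks */
    }
    return true;
}
```

Under these assumptions, a sequence mixing U+0031 with U+0661 is rejected, as is one mixing the two Arabic digit blocks (U+0661 with U+06F1), while a sequence whose digits are all ASCII is accepted.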

If ret_script is not NULL, then on a return of TRUE, *ret_script will contain the script found, using the SCX_enum typedef. Its value will be SCX_INVALID if the function returns FALSE.

If the sequence is empty, TRUE is returned, but *ret_script (if asked for) will be SCX_INVALID.

If the sequence contains a single code point which is unassigned to a character in the version of Unicode being used, the function will return TRUE, and the script will be SCX_Unknown. Any other combination of unassigned code points in the input sequence will result in the function treating the input as not being a script run.

The returned script will be SCX_Inherited iff all the code points in the sequence are from the Inherited script.

Otherwise, the returned script will be SCX_Common iff all the code points in the sequence are from the Inherited or Common scripts.
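The rules for the return value and the reported script can be summarized in a simplified C model. This is a sketch under stated assumptions, not the actual function: the toy_script enum and toy_script_run are hypothetical stand-ins (not SCX_enum or the real API), each code point is assumed to be pre-classified into exactly one script (so full Script Extensions sets are ignored), and the digit rule is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a script enum; these are NOT Perl's values. */
typedef enum {
    TOY_INVALID,   /* reported when the input is not a script run, or empty */
    TOY_UNKNOWN,   /* an unassigned code point */
    TOY_INHERITED,
    TOY_COMMON,
    TOY_GREEK,
    TOY_LATIN
} toy_script;

/* Model of the described rules over n pre-classified code points.
 * Writes the resulting script to *ret_script when ret_script is not NULL. */
bool toy_script_run(const toy_script *scripts, size_t n, toy_script *ret_script)
{
    toy_script found = TOY_INVALID;
    bool ok = true;

    if (n == 0) {
        /* Degenerate case 1: empty input is a run, script stays invalid. */
    } else if (n == 1 && scripts[0] == TOY_UNKNOWN) {
        /* Degenerate case 2: a single unassigned code point. */
        found = TOY_UNKNOWN;
    } else {
        found = TOY_INHERITED; /* holds until something else is seen */
        for (size_t i = 0; i < n && ok; i++) {
            toy_script s = scripts[i];
            if (s == TOY_UNKNOWN || s == TOY_INVALID) {
                ok = false;                 /* unassigned mixed with others */
            } else if (s == TOY_INHERITED) {
                /* compatible with any script; nothing to update */
            } else if (s == TOY_COMMON) {
                if (found == TOY_INHERITED)
                    found = TOY_COMMON;     /* Common outranks Inherited */
            } else if (found == TOY_INHERITED || found == TOY_COMMON) {
                found = s;                  /* first specific script seen */
            } else if (found != s) {
                ok = false;                 /* a second specific script */
            }
        }
        if (!ok)
            found = TOY_INVALID;
    }

    if (ret_script != NULL)
        *ret_script = found;
    return ok;
}
```

In this model, a sequence classified as Greek, Common, Inherited is a run whose script is Greek; Inherited followed by Common yields Common; and Greek followed by Latin is not a run.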