Yuki Kimoto
and 1 contributor

NAME

SPVM::Unicode - SPVM Unicode Utilities.

SYNOPSIS

  use Unicode;
  
  # Get Unicode code points from a UTF-8 string; each call advances the byte offset to the next UTF-8 character
  my $string = "あいうえお";
  my $pos = 0;
  while ((my $uchar = Unicode->uchar($string, \$pos)) >= 0) {
    # ...
  }

DESCRIPTION

Unicode provides Unicode utilities for SPVM. This module provides methods to convert between UTF-8 bytes and Unicode code points, and between UTF-8, UTF-16, and UTF-32 strings.

CLASS METHODS

ERROR_INVALID_UTF8

  static method ERROR_INVALID_UTF8 : int ();

Returns -2. This value indicates that the uchar method found invalid UTF-8.

is_unicode_scalar_value

  static method is_unicode_scalar_value : int ($code_point : int);

Checks whether the given value is a Unicode scalar value.

The Unicode scalar values are all Unicode code points (0 to 0x10FFFF) except the surrogate code points (0xD800 to 0xDFFF).
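The range check described above can be sketched as follows. This is an illustrative Python version, not the module's SPVM implementation; it returns a bool where the SPVM method returns an int.

```python
def is_unicode_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value."""
    # Reject anything outside the Unicode code point range 0 to 0x10FFFF.
    if code_point < 0 or code_point > 0x10FFFF:
        return False
    # Reject the surrogate code points 0xD800 to 0xDFFF.
    return not (0xD800 <= code_point <= 0xDFFF)
```

For example, 0x3042 (HIRAGANA LETTER A) passes the check, while 0xD800 and 0x110000 do not.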

uchar

  static method uchar : int ($string : string, $offset_ref : int*);
  

Gets the Unicode code point at the given byte offset of a UTF-8 string and advances the offset to the position of the next UTF-8 character.

If the offset is at or past the end of the string, this method returns -1.

If an invalid UTF-8 character is found, this method returns -2. This is the same value that the ERROR_INVALID_UTF8 method returns.
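To make the offset-and-sentinel contract concrete, here is a minimal Python sketch of a decoder with the same shape. It is not the module's implementation. Python has no reference arguments, so the sketch returns the next offset alongside the code point. Which byte sequences count as invalid (overlong forms, surrogates, truncated sequences), and leaving the offset unchanged on failure, are assumptions of the sketch.

```python
from typing import Tuple

END_OF_STRING = -1
ERROR_INVALID_UTF8 = -2

def uchar(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one code point at byte offset; return (code_point, next_offset)."""
    if offset >= len(data):
        return END_OF_STRING, offset
    first = data[offset]
    # The lead byte determines the sequence length and the smallest
    # code point that legitimately needs that many bytes.
    if first < 0x80:
        length, code_point, minimum = 1, first, 0
    elif 0xC0 <= first < 0xE0:
        length, code_point, minimum = 2, first & 0x1F, 0x80
    elif 0xE0 <= first < 0xF0:
        length, code_point, minimum = 3, first & 0x0F, 0x800
    elif 0xF0 <= first < 0xF8:
        length, code_point, minimum = 4, first & 0x07, 0x10000
    else:
        return ERROR_INVALID_UTF8, offset  # stray continuation or bad lead byte
    if offset + length > len(data):
        return ERROR_INVALID_UTF8, offset  # truncated sequence
    for byte in data[offset + 1:offset + length]:
        if byte & 0xC0 != 0x80:
            return ERROR_INVALID_UTF8, offset  # not a continuation byte
        code_point = (code_point << 6) | (byte & 0x3F)
    # Assumption: overlong encodings, surrogates, and values above
    # 0x10FFFF are all treated as invalid.
    if code_point < minimum or code_point > 0x10FFFF:
        return ERROR_INVALID_UTF8, offset
    if 0xD800 <= code_point <= 0xDFFF:
        return ERROR_INVALID_UTF8, offset
    return code_point, offset + length

# Usage mirroring the SYNOPSIS loop.
data = "あいうえお".encode("utf-8")
pos = 0
code_points = []
while True:
    code_point, pos = uchar(data, pos)
    if code_point < 0:
        break
    code_points.append(code_point)
```

The loop stops on any negative return value, so a caller that needs to distinguish the end of the string from invalid input should compare the final value against -1 and -2.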

uchar_to_utf8

  static method uchar_to_utf8 : string ($unicode_code_point : int);

Convert a Unicode codepoint to a UTF-8 character.

If the argument is not a valid Unicode code point, this method returns undef.
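The encoding direction can be sketched in Python as below. This is an illustration, not the module's code: it returns None where the SPVM method returns undef, and it assumes that "invalid" covers both out-of-range values and surrogate code points.

```python
from typing import Optional

def uchar_to_utf8(code_point: int) -> Optional[bytes]:
    """Encode one code point as UTF-8 bytes, or return None if it is invalid."""
    # Assumption: surrogates and out-of-range values are rejected.
    if code_point < 0 or code_point > 0x10FFFF:
        return None
    if 0xD800 <= code_point <= 0xDFFF:
        return None
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```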

utf8_to_utf16

  static method utf8_to_utf16 : short[] ($utf8_string : string);

Convert a UTF-8 string to a UTF-16 string.

utf16_to_utf8

  static method utf16_to_utf8 : string ($utf16_string : short[]);

Convert a UTF-16 string to a UTF-8 string.
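As a rough illustration of what the UTF-8 and UTF-16 conversions produce, the Python sketch below uses the standard codecs and represents a UTF-16 string as a list of 16-bit code units. It is not the module's implementation. The units here are unsigned (0 to 0xFFFF), whereas an SPVM short is signed, so values of 0x8000 and above would appear as negative numbers in SPVM.

```python
from typing import List

def utf8_to_utf16(utf8: bytes) -> List[int]:
    """Return the UTF-16 code units for a UTF-8 encoded string."""
    raw = utf8.decode("utf-8").encode("utf-16-le")
    # Each code unit is two little-endian bytes in the encoded form.
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def utf16_to_utf8(units: List[int]) -> bytes:
    """Return the UTF-8 bytes for a list of UTF-16 code units."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le").encode("utf-8")
```

A character outside the Basic Multilingual Plane occupies four bytes in UTF-8 and two code units (a surrogate pair) in UTF-16.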

utf32_to_utf16

  static method utf32_to_utf16 : short[] ($utf32_string : int[]);

Convert a UTF-32 string to a UTF-16 string.
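The core of a UTF-32 to UTF-16 conversion is splitting code points above 0xFFFF into surrogate pairs. A minimal Python sketch, assuming valid input and using unsigned code units (the module's own handling of invalid input is not documented here):

```python
from typing import List

def utf32_to_utf16(code_points: List[int]) -> List[int]:
    """Convert a list of code points to a list of UTF-16 code units."""
    units = []
    for code_point in code_points:
        if code_point < 0x10000:
            # Basic Multilingual Plane: one code unit, unchanged.
            units.append(code_point)
        else:
            # Supplementary plane: split the 20-bit value into
            # a high (lead) and a low (trail) surrogate.
            value = code_point - 0x10000
            units.append(0xD800 | (value >> 10))
            units.append(0xDC00 | (value & 0x3FF))
    return units
```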

utf16_to_utf32

  static method utf16_to_utf32 : int[] ($utf16_string : short[]);

Convert a UTF-16 string to a UTF-32 string.
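Going from UTF-16 to UTF-32 means recombining surrogate pairs into single code points. The Python sketch below illustrates the arithmetic; it is not the module's implementation, and passing unpaired surrogates through unchanged is an assumption of the sketch.

```python
from typing import List

def utf16_to_utf32(units: List[int]) -> List[int]:
    """Convert a list of UTF-16 code units to a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Recombine the two 10-bit halves and restore the 0x10000 bias.
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # Assumption: anything else, including an unpaired
            # surrogate, is copied through as-is.
            code_point = unit
            i += 1
        code_points.append(code_point)
    return code_points
```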
