Yuki Kimoto and 1 contributor


SPVM::Unicode - SPVM Unicode Utilities.


  use Unicode;
  # Get the Unicode code points of a UTF-8 string one at a time. Each call reads the
  # character at the byte offset and advances the offset to the next UTF-8 character.
  my $string = "あいうえお";
  my $pos = 0;
  while ((my $uchar = Unicode->uchar($string, \$pos)) >= 0) {
    # ...
  }


Unicode is a set of Unicode utilities for SPVM. This module provides methods to convert UTF-8 bytes to and from Unicode code points, and to convert strings between UTF-8, UTF-16, and UTF-32.



  static method INVALID_UTF8 : int ();

Returns -2. This value means that the uchar method found invalid UTF-8.


  static method is_unicode_scalar_value : int ($code_point : int);

Check if the given value is a Unicode scalar value.

The Unicode scalar values are the Unicode code points (0 to 0x10FFFF) except for the surrogate code points (0xD800 to 0xDFFF).
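The range check described above can be sketched as follows. This is a minimal illustration in Python, not the SPVM implementation; the function name mirrors the method only for readability, and it returns a Python bool where the SPVM method returns an int.

```python
def is_unicode_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value."""
    # It must be a code point at all: 0 to 0x10FFFF.
    if code_point < 0 or code_point > 0x10FFFF:
        return False
    # Surrogate code points (0xD800 to 0xDFFF) are excluded.
    return not (0xD800 <= code_point <= 0xDFFF)
```

For example, 0x3042 (あ) and 0x10FFFF are scalar values, while 0xD800 and 0x110000 are not.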


  static method uchar : int ($string : string, $offset_ref : int*);

Get the Unicode code point of the character at the given byte offset in a UTF-8 string, and advance the offset to the position of the next UTF-8 character.

If the offset is beyond the end of the string, this method returns -1.

If an invalid UTF-8 character is found, this method returns -2. This is the same value as the return value of the ERROR_INVALID_UTF8 method.
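To make the return values concrete, here is a rough Python sketch of a decoder with the same contract. It is an illustration under assumptions, not the SPVM source: a one-element list stands in for the `int*` offset reference, the constant names are hypothetical, and leaving the offset unchanged on an error is a choice made for this sketch only.

```python
END_OF_STRING = -1  # the offset is at or past the end of the input
INVALID_UTF8 = -2   # a malformed sequence starts at the offset


def uchar(data: bytes, pos: list) -> int:
    """Decode one code point at pos[0] and advance pos[0] past it."""
    i = pos[0]
    if i >= len(data):
        return END_OF_STRING
    b0 = data[i]
    # The lead byte gives the sequence length and the smallest code point
    # that may legally use that length (this rejects overlong forms).
    if b0 < 0x80:
        length, cp, min_cp = 1, b0, 0
    elif b0 & 0xE0 == 0xC0:
        length, cp, min_cp = 2, b0 & 0x1F, 0x80
    elif b0 & 0xF0 == 0xE0:
        length, cp, min_cp = 3, b0 & 0x0F, 0x800
    elif b0 & 0xF8 == 0xF0:
        length, cp, min_cp = 4, b0 & 0x07, 0x10000
    else:
        return INVALID_UTF8
    if i + length > len(data):
        return INVALID_UTF8  # truncated sequence
    for b in data[i + 1 : i + length]:
        if b & 0xC0 != 0x80:
            return INVALID_UTF8  # not a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return INVALID_UTF8
    pos[0] = i + length
    return cp


# Walk the whole string, as in the synopsis loop.
data = "あいうえお".encode("utf-8")
pos = [0]
code_points = []
while (cp := uchar(data, pos)) >= 0:
    code_points.append(cp)
```

The loop stops on the first negative value, so the caller can tell a normal end of string (-1) from invalid input (-2) by inspecting the last return value.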


  static method uchar_to_utf8 : string ($unicode_code_point : int);

Convert a Unicode codepoint to a UTF-8 character.

If the argument is not a valid Unicode code point, this method returns undef.
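The encoding step can be sketched in Python as below. This is an illustration, not the SPVM implementation; `None` stands in for undef, and treating surrogate code points as invalid is an assumption of this sketch.

```python
from typing import Optional


def uchar_to_utf8(code_point: int) -> Optional[bytes]:
    """Encode one code point as UTF-8 bytes, or return None if invalid."""
    # Assumption: surrogates and out-of-range values count as invalid.
    if code_point < 0 or code_point > 0x10FFFF:
        return None
    if 0xD800 <= code_point <= 0xDFFF:
        return None
    if code_point < 0x80:
        return bytes([code_point])  # 1 byte: 0xxxxxxx
    if code_point < 0x800:
        return bytes([
            0xC0 | (code_point >> 6),    # 110xxxxx
            0x80 | (code_point & 0x3F),  # 10xxxxxx
        ])
    if code_point < 0x10000:
        return bytes([
            0xE0 | (code_point >> 12),           # 1110xxxx
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    return bytes([
        0xF0 | (code_point >> 18),               # 11110xxx
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

The number of output bytes depends only on the size of the code point: up to 0x7F takes one byte, up to 0x7FF two, up to 0xFFFF three, and everything above that four.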


  static method utf8_to_utf16 : short[] ($utf8_string : string);

Convert a UTF-8 string to a UTF-16 string.
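Because the result is a `short[]`, code units of 0x8000 and above appear as negative numbers when read as signed 16-bit values. The Python sketch below illustrates this using the standard codecs; it is not the SPVM implementation, and the function name is reused only for readability.

```python
import struct


def utf8_to_utf16(utf8_bytes: bytes) -> list:
    """Return the UTF-16 code units of a UTF-8 string as signed 16-bit ints."""
    text = utf8_bytes.decode("utf-8")
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    # "h" unpacks each 2-byte unit as a signed short.
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


units = utf8_to_utf16("あ😀".encode("utf-8"))
```

Here あ (U+3042) becomes a single positive unit, while 😀 (U+1F600) becomes a surrogate pair whose two units are both negative as signed shorts.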


  static method utf16_to_utf8 : string ($utf16_string : short[]);

Convert a UTF-16 string to a UTF-8 string.


  static method utf32_to_utf16 : short[] ($utf32_string : int[]);

Convert a UTF-32 string to a UTF-16 string.
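The core of this conversion is the surrogate pair arithmetic for code points above 0xFFFF. A minimal Python sketch follows, assuming the input contains only valid Unicode scalar values; it is an illustration rather than the SPVM implementation, and it keeps code units unsigned for readability.

```python
def utf32_to_utf16(code_points):
    """Convert a list of code points to a list of UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp < 0x10000:
            # Code points in the Basic Multilingual Plane map to one unit.
            units.append(cp)
        else:
            # Split the 20 bits above 0x10000 into two 10-bit halves.
            v = cp - 0x10000
            units.append(0xD800 | (v >> 10))    # high surrogate
            units.append(0xDC00 | (v & 0x3FF))  # low surrogate
    return units
```

For instance, U+1F600 yields the pair 0xD83D, 0xDE00, while U+3042 stays a single unit.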


  static method utf16_to_utf32 : int[] ($utf16_string : short[]);

Convert a UTF-16 string to a UTF-32 string.
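Going in this direction means recombining surrogate pairs into single code points. The Python sketch below shows the arithmetic; it is an illustration, not the SPVM implementation. It accepts signed or unsigned code units by masking to 16 bits, and passing unpaired surrogates through unchanged is an assumption of this sketch.

```python
def utf16_to_utf32(units):
    """Convert a list of UTF-16 code units to a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i] & 0xFFFF  # treat signed shorts as unsigned
        nxt = units[i + 1] & 0xFFFF if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # High surrogate followed by low surrogate: combine 10 + 10 bits.
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            code_points.append(u)
            i += 1
    return code_points
```

With this sketch, the units 0xD83D, 0xDE00 combine into U+1F600, whether they are given as unsigned values or as the equivalent negative signed shorts.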

