NAME

Unicode::BiDiRule - RFC 5893 BiDi Rule

SYNOPSIS

  use Unicode::BiDiRule qw(check);
  $result = check($string);
  %result = check($string);

DESCRIPTION

Unicode::BiDiRule checks strings against the BiDi Rule described in RFC 5893.

Note that the term "UTF-8" is used in its proper sense throughout this document.

Functions

check ( $string, [ $strict ] )

Checks whether a string satisfies the BiDi Rule.

Parameters:

$string

The string to check: either a Unicode string or a bytestring.

Note that a bytestring is not upgraded to a Unicode string; it is treated as a UTF-8 sequence.

$strict

If 0 is specified, the following checks are skipped when the string contains no right-to-left characters and is not a BiDi label:

  • The string does not begin with a nonspacing mark.

  • The string does not contain formatting characters, space characters, or line separators.
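
The difference between a Unicode string and a bytestring determines how offsets and lengths are counted. As a minimal sketch using only core Perl (it does not call this module), the following builds the same three Hebrew letters as both kinds of string. The use of utf8::is_utf8 to tell them apart is an illustration only; how check() distinguishes the two internally is an assumption, not something stated in this document.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A Unicode (character) string: HEBREW LETTER ALEF, BET, GIMEL.
my $unicode = "\x{05D0}\x{05D1}\x{05D2}";

# The same text as a bytestring holding its UTF-8 encoding.
my $bytes = encode_utf8($unicode);

printf "Unicode string length: %d\n", length $unicode;   # 3 (characters)
printf "Bytestring length:     %d\n", length $bytes;     # 6 (bytes)

# One way to see which kind of string a scalar currently holds.
printf "Unicode string flagged: %s\n", utf8::is_utf8($unicode) ? 'yes' : 'no';
printf "Bytestring flagged:     %s\n", utf8::is_utf8($bytes)   ? 'yes' : 'no';
```

An offset of 2 would therefore point at the third letter in the Unicode string but at the second letter in the bytestring.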

Returns:

In scalar context: One of BIDIRULE_RTL, BIDIRULE_LTR, or BIDIRULE_NOTBIDI (see "Constants"), or undef.

In list context: A list of key/value pairs describing the result in detail, with these keys:

result

One of the values described in "Constants".

offset

If the check fails, the offset from the beginning of the string. If it succeeds, the length of the string.

The offset or length is counted in bytes for a bytestring and in characters for a Unicode string.

length

When the check fails, the length of the disallowed substring.

The length is counted in bytes for a bytestring and in characters for a Unicode string. It is undefined for an invalid sequence.

ord

The Unicode scalar value of the first character of the disallowed substring, when the length item is set.

unsafe

Set to a true value if the disallowed substring contains a formatting character. Such characters can cause display problems.
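
The two calling conventions described above can be sketched as follows. This is an illustration, not the module's reference usage: classify() is a hypothetical helper, the sample labels are arbitrary, and treating undef in scalar context as "the rule is not satisfied" is an assumption drawn from the list of return values. Because BIDIRULE_NOTBIDI is 0, the scalar result is tested with defined rather than for truth. The module is loaded at run time so that the sketch still runs where it is not installed.

```perl
use strict;
use warnings;

# Load at run time so the sketch degrades gracefully if the module is absent.
my $available = eval { require Unicode::BiDiRule; 1 };

# Hypothetical helper: classify a string from check()'s scalar-context result.
sub classify {
    my ($string) = @_;
    my $result = Unicode::BiDiRule::check($string);
    return 'invalid'   unless defined $result;   # assumed: undef means failure
    return 'RTL label' if $result == Unicode::BiDiRule::BIDIRULE_RTL();
    return 'LTR label' if $result == Unicode::BiDiRule::BIDIRULE_LTR();
    return 'not a BiDi label';                   # BIDIRULE_NOTBIDI, which is 0
}

if ($available) {
    my @labels = ("example", "\x{05D0}\x{05D1}\x{05D2}", "\x{05D0}1a");
    for my $label (@labels) {
        # List context: key/value pairs with the documented keys.
        my %detail = Unicode::BiDiRule::check($label);

        printf "%-18s", classify($label);
        for my $key (qw(offset length ord)) {
            printf " %s=%s", $key, $detail{$key} if defined $detail{$key};
        }
        print " unsafe" if $detail{unsafe};
        print "\n";
    }
}
else {
    print "Unicode::BiDiRule is not installed; nothing to check.\n";
}
```

With the functions imported through the :all tag at compile time, the same calls can be written as check($label) and the constants as plain BIDIRULE_RTL and so on.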

Unicode::BiDiRule::UnicodeVersion()

Returns the version of the Unicode Character Database. It should be the same as Unicode::UCD::UnicodeVersion().
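
As a rough sketch of how the two version strings might be compared, the following prints the version reported by the core Unicode::UCD module and, if Unicode::BiDiRule can be loaded, the version it reports. Loading the module inside eval is a choice made for this example so that it runs on systems without the module.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Version of the Unicode Character Database shipped with this perl.
my $core = Unicode::UCD::UnicodeVersion();
print "perl $] core UCD version: $core\n";

# Compare with the version this module was built against, when available.
if (eval { require Unicode::BiDiRule; 1 }) {
    my $module = Unicode::BiDiRule::UnicodeVersion();
    print "Unicode::BiDiRule UCD version: $module\n";
    warn "The two versions differ\n" if $module ne $core;
}
else {
    print "Unicode::BiDiRule is not installed\n";
}
```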

Constants

Possible results of checking.

BIDIRULE_RTL

The string is an RTL label.

BIDIRULE_LTR

The string is an LTR label.

BIDIRULE_NOTBIDI (0)

The string contains no right-to-left characters and is not a BiDi label (neither an RTL label nor an LTR label).

BIDIRULE_INVALID

The string does not satisfy the rule.

Exports

None by default. The :all tag exports check() and the constants.

RESTRICTIONS

check() cannot check Unicode strings on EBCDIC platforms.

CAVEATS

  • The repertoire and property values this module can provide are limited by the Unicode database of the Perl core. The table below lists the Unicode version implemented by each Perl version.

      Perl's version     Implemented Unicode version
      ------------------ ---------------------------------------
      5.8.0              3.2.0
      5.8.1 - 5.8.3      4.0.0
      5.8.4 - 5.8.6      4.0.1
      5.8.7 - 5.8.8      4.1.0
      5.10.0             5.0.0
      5.8.9, 5.10.1      5.1.0
      5.12.x             5.2.0
      5.14.x             6.0.0 with Corrigendum #8
      5.16.x             6.1.0
      5.18.x             6.2.0
      5.20.x             6.3.0
      5.22.x             7.0.0, correcting erratum at 2014-10-21

  • A string that is not a BiDi label (neither an RTL label nor an LTR label) is not necessarily invalid. It should be checked by another rule.

SEE ALSO

RFC 5893, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)". https://tools.ietf.org/html/rfc5893

AUTHOR

Hatuka*nezumi - IKEDA Soji, <hatuka@nezumi.nu>

COPYRIGHT AND LICENSE

Copyright (C) 2015, 2018 by Hatuka*nezumi - IKEDA Soji

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. For more details, see the full text of the licenses at <http://dev.perl.org/licenses/>.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.