
Normalization

Grapheme notation (byte escape sequences such as \xe2\x80\xa6) is not very human-readable and requires interpolation. We can avoid both issues by not using it!

Rationale

Avoiding grapheme notation helps keep phrases consistent, clear, and simple.

If we parse a string and find 'Commencing compilation \xe2\x80\xa6' then we have to interpolate that string into 'Commencing compilation …' before we can look it up to see if it exists in a hash.
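The lookup problem described above can be sketched as follows. This is only an illustration: the %lexicon hash and the interpolate_hex_escapes() helper are hypothetical names, not part of this module, and the helper assumes the phrase is plain ASCII apart from its \xNN escapes.

```perl
use strict;
use warnings;
use utf8;    # this source file contains a literal "…" character

# Hypothetical lexicon, keyed by the phrase with the literal character.
my %lexicon = ( 'Commencing compilation …' => 1 );

# What a parser finds in source code: literal backslash escapes,
# not the character itself (single quotes do not interpolate \x).
my $parsed = 'Commencing compilation \xe2\x80\xa6';

# Turn each \xNN escape into its byte, then decode the bytes as UTF-8.
# Assumes the input is otherwise plain ASCII.
sub interpolate_hex_escapes {
    my ($string) = @_;
    ( my $bytes = $string ) =~ s/\\x([0-9a-fA-F]{2})/chr(hex($1))/ge;
    utf8::decode($bytes);
    return $bytes;
}

# The raw parsed string is not a key in the hash ...
my $found_raw = exists $lexicon{$parsed} ? 1 : 0;

# ... but the interpolated version is.
my $found_interpolated =
  exists $lexicon{ interpolate_hex_escapes($parsed) } ? 1 : 0;
```

Writing the character itself in the phrase removes the need for the extra interpolation step entirely.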

Grapheme notation also adds a layer of complexity that hinders translators and thus makes room for lower-quality translations.

Developers have it slightly better in that they'll recognize the notation, but it still takes effort to work out exactly which character a sequence represents and which sequence is needed for a given character.

You can simply use the character itself, or a bracket notation method for the handful of markup-related or visually special characters.

possible violations

If you get false positives, that only helps highlight how ambiguity adds to the reasons to avoid non-bytes strings!

Contains grapheme notation

A sequence of \xe2\x98\xba\xe2\x80\xa6 will be replaced with [comment,grapheme “\xe2\x98\xba\xe2\x80\xa6”]
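As a rough illustration of that replacement, the sketch below wraps each run of \xNN escapes in the bracket notation shown above. The mark_grapheme_sequences() function and its regular expression are assumptions made for this example; the filter's actual pattern may differ and may recognize other notations as well.

```perl
use strict;
use warnings;
use utf8;    # the replacement text contains literal curly quotes

# Hypothetical helper: wrap every run of consecutive \xNN escapes in a
# [comment,grapheme “…”] marker and report how many runs were found.
sub mark_grapheme_sequences {
    my ($phrase) = @_;
    my $count =
      ( $phrase =~ s/((?:\\x[0-9a-fA-F]{2})+)/[comment,grapheme “$1”]/g );
    return ( $phrase, $count || 0 );
}

my ( $marked, $violations ) =
  mark_grapheme_sequences('Hello \xe2\x98\xba\xe2\x80\xa6 world');
```

Here $marked holds the phrase with the marker in place and $violations holds the number of sequences replaced, which is 0 for a phrase with no grapheme notation.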

possible warnings

None

Entire filter only runs under extra filter

See "extra filters" in Locale::Maketext::Utils::Phrase::Norm for more details.