Daniel Muey
and 1 contributors

Normalization

The only single white space characters allowed are normal space and non-break-space.

Rationale

  • A tiny change in white-space[-ish] characters will make a phrase lookup fail erroneously.

  • The only other purpose of allowing characters like this would be formatting, which should not be part of a phrase.

    • Such formatting is not applicable to all contexts (e.g. HTML)

    • Since it is not a translatable entity, translators are likely to miss it and break your format.

    • Same text with different formatting becomes a new, redundant, phrase.

    Doing internal formatting via bracket notation’s output() methods addresses the first two completely and the third most of the time (it can be “completely” if you give it a little thought first).

  • It is easy for a developer to miss the subtle difference and get it wrong.

  • Surrounding whitespace is likely a sign that partial phrases are in use.

That being the case, we simplify consistently by allowing only single space and non-break-space characters inside the string (and at the beginning if it starts with an ellipsis).

possible violations

Invalid whitespace-like characters

The string contains white space characters besides space and non-break-space, invisible characters, or control characters.

These will be turned into “[comment,invalid char UxNNNN]” (where NNNN is the Unicode code point) so you can find them visually.
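As a rough illustration of the replacement described above, here is a minimal Python sketch. It is not the module's own implementation: the function name mark_invalid_chars, the choice of Unicode categories used to detect "invisible" and control characters, and the 4-digit uppercase hexadecimal rendering of NNNN are all assumptions made for the example.

```python
import unicodedata

# The two white space characters the normalization allows.
ALLOWED = {" ", "\u00a0"}

# Assumption: "invisible or control" is approximated by these Unicode
# general categories (control, format, and the separator classes).
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def mark_invalid_chars(phrase: str) -> str:
    """Replace disallowed whitespace-like characters with a visible marker."""
    out = []
    for ch in phrase:
        suspect = ch.isspace() or unicodedata.category(ch) in SUSPECT_CATEGORIES
        if suspect and ch not in ALLOWED:
            # Assumption: NNNN is rendered as uppercase hex, at least 4 digits.
            out.append(f"[comment,invalid char Ux{ord(ch):04X}]")
        else:
            out.append(ch)
    return "".join(out)
```

With these assumptions, a tab inside "a\tb" would become "a[comment,invalid char Ux0009]b", while a normal space or non-break-space passes through untouched.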

Beginning white space

These are removed.

The removal accounts for strings beginning with an ellipsis, which should be preceded by one space.

Beginning ellipsis space should be a normal space

If a string starts with an ellipsis, the space preceding it should be a normal space. A non-break-space implies formatting or concatenation of 2 partial phrases, ick!

Trailing white space

These are removed.

Multiple internal white space

These are collapsed into a single space.
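The trimming and collapsing rules in this section can be sketched together in Python. This is only a sketch under stated assumptions, not the module's actual code: the name normalize_whitespace is hypothetical, the ellipsis is assumed to be the single character U+2026, a run that mixes spaces and non-break-spaces is assumed to collapse to one normal space, and a leading ellipsis with no space before it is left alone.

```python
import re

NBSP = "\u00a0"
ELLIPSIS = "\u2026"  # assumption: only the single-character ellipsis is handled
WS = f"[ {NBSP}]"    # character class: normal space or non-break-space


def normalize_whitespace(phrase: str) -> str:
    """Trim surrounding white space and collapse internal runs."""
    # Trailing white space is removed.
    phrase = re.sub(f"{WS}+\\Z", "", phrase)

    # Beginning white space is removed, noting whether any was present.
    stripped = re.sub(f"\\A{WS}+", "", phrase)
    had_leading = len(stripped) != len(phrase)

    # A leading ellipsis keeps exactly one preceding *normal* space, so a
    # leading non-break-space before it is replaced by a normal space.
    if had_leading and stripped.startswith(ELLIPSIS):
        stripped = " " + stripped

    # Multiple internal white space is collapsed into a single space.
    return re.sub(f"{WS}{{2,}}", " ", stripped)
```

Under these assumptions, "  hello   world  " would come out as "hello world", and a phrase starting with a non-break-space followed by an ellipsis would come out starting with a normal space followed by the ellipsis. A single internal non-break-space is left as it is.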

possible warnings

None