NAME

dtRdr::doc::Book::whitespace - issues with whitespace

Synopsis

Weird things happen when whitespace doesn't count, but sort of counts.

The annotations rely on a reliable character position, which can be very different from byte offset due to character encoding and whitespace collapses. Thus, we have to establish conventions for whitespace which can be consistently applied in all of these situations.

All your spaces are belong to one position.

The general rule is that any amount of whitespace, whether spanning a tag or not, is treated a single space character.

Nesty Books

This becomes a little difficult with book formats that contain (rendered) nested content nodes. Because of these types of books, a position needs to be able to map from global to local so that the position in a parent can be calculated given the position in a child. See dtRdr::doc::Book::annotree for "math is fun."

As for whitespace, we have to adopt a convention that a space at the end or beginning of a node needs to belong somewhere. In these examples, I'll use square brackets to represent the opening and closing of node xml tags.

[a[b][c[d]]]
[a [b][c[d]]]
[ a [b][c[d]]]
[ a[ b][c[d] ] ]

The above are not intended to be necessarily equivalent. Just representative situations.

Because lots of linebreaks and/or indentation from manual editing and/or conversion tools is so common, the situation almost always looks like this in reality.

[ a [ b ] [ c [ d ] ] ]

This should basically reduce into the following:

[a [b ][c [d ]]]

Note that:

no node starts with a space
there are no consecutive spaces, regardless of tag boundaries

This convention is important because it needs to be shared between the book base class (which does the annotation-insertion xml munging) and the individual book plugins (which build the annotation offset table to allow for position math.)

I still need to prove it, but I believe that even this should be equivalent to the canonical example above.

[ a[ b][ c[ d] ]]

And, to be pragmatic, this is not really worth chasing, since nested content nodes which are accessible both individually and from within the parent is an impossible-to-resolve-into-a-pagewise-reader concept.

Non-breaking space

Nonbreaking space is treated as a space and collapsed. Thus "   " is equivalent to " ". This is because a given html widget may or may not pass the space (e.g. from get_selection()) is a plain space or 0xA0 (NOTE that if it does, the widget shim should replace it with a plain space at the get_selection() call.)

Not sure how to deal with this yet.

http://en.wikipedia.org/wiki/Non-breaking_space
U+00A0 -- is just &nbsp;/&#160;
U+202F -- narrow (?)
U+FEFF -- zero-width (but has issues)
U+2060 -- word-joiner (replaces FEFF)

others:

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
ensp U+2002 (8194) -- en space [1]
emsp U+2003 (8195) -- em space [2]
thinsp U+2009 (8201) -- thin space [3]
zwnj U+200C (8204) -- zero width non-joiner
zwj U+200D (8205) -- zero width joiner

To install dtRdr, copy and paste the appropriate command in to your terminal.

cpanm

cpanm dtRdr

CPAN shell

perl -MCPAN -e shell
install dtRdr

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)

NAME

Synopsis

All your spaces are belong to one position.

Nesty Books

Non-breaking space

Module Install Instructions