NAME

Text::Tiki - formats TikiText notation as XHTML

SYNOPSIS

        use Text::Tiki;
        my $tiki = Text::Tiki->new;
        
        $tiki->wiki_implicit_links(1);
        $tiki->wiki_prefix('http://www.timaoutloud.org/foo?');
        $tiki->interwiki_links(1);
        $tiki->interwiki_table(
                        { 
                                wikipedia=>'http://en2.wikipedia.org/wiki/',
                                joi=>'http://joi.ito.com/joiwiki/',
                                atom=>'http://www.intertwingly.net/wiki/pie/'
                        }
                );

        $tiki->macro_handler('BR', \&html_break, 'inline');

        print $tiki->format(\@lines);
        print $tiki->format_line($line);

DESCRIPTION

Despite the notion of a universal canvas, rich authoring of content through Web browsers is still rather poor and laborious. There have been attempts to create WYSIWYG (What You See Is What You Get) editor widgets to rectify this; however, none of these tools works reliably across platforms and browsers, and they often lack the flexibility of their read-only counterparts. This is unfortunate, and nothing one person will be able to fix any time soon, leaving us to cope with the brain-dead <textarea> and plain text.

TikiText is an attempt to work with what we have and minimize (not completely solve) these shortcomings.

I was faced with the task of architecting a way for non-developer, non-markup-savvy business users to publish information. Plain text (with no formatting) was not going to cut it. Nor was teaching them XHTML markup. I did an intensive study of the structured text formatting notations that have been developed in the past. These included a few different Wiki implementations such as UseMod Wiki, MoinMoin Wiki and Text::WikiFormat, in addition to Zope's Structured Text, HTML::FromText and Textile. For one reason or another, these notations fell short of my requirements. So, in scratching my own itch, I developed a notation I call TikiText, based on my observations and key learnings.

The name Tiki came from the combination of Text formatting and wIKI and was chosen to reflect Hawaiian heritage. (For those not familiar with this mythical god of retro Polynesia, it's said /tee-kee/ and not /tick-E/.)

The design goals for TikiText are as follows:

  • Leverage existing text formatting notions.

  • Add the fewest possible characters to plain text.

  • Use more intuitive and common plain text email conventions.

  • Abstract users from needing to know or understand markup whenever possible.

  • Make valid and semantic XHTML markup easy. (And let CSS do its job!)

  • Easy to learn the basics. Richer functionality for those who want to dive in.

While Wikis are a part of TikiText's lineage, it was never my intention to create a new Wiki notation or tool. Based on the feedback I received from the initial releases, I've added more Wiki features to this module. (See "Wiki Functions" for more.)

This code is quite usable and has been improved over the months, but it should be used with the understanding that it is still somewhat experimental and is still being tested and properly documented. Feedback, bug fixes, and feature implementations are appreciated. Furthermore, I realize this format is less than perfect and falls short of its design goals. My hope is that it will be refined and tweaked over time to optimize its effectiveness.

TikiText NOTATION

The first thing you must understand about TikiText and, generally speaking, most other text formatting notations is that spaces and linebreaks are particularly significant. To a certain extent, tabs and punctuation are also important to the engine's interpretation. The module attempts to handle whitespace that may be introduced while cutting and pasting text, but it may not be perfect and unexpected results may occur.

Block-Level Formatting

Block-level formatting is set by one or more characters at the start of a line followed by a space. Multiple consecutive lines with the same starting format are treated as part of the same block. A block is terminated by at least one blank line. HTML breaks (<br />) are now supported inside paragraphs and blockquotes.

 Paragraph:             (Line without block formatting)
 Blockquote:            > 
 Preformatted Text:     (space) or (tab)
 Code (Block):          % (A special type of PRE section where TikiText is ignored.)
 Table:                 | (See the section on Tables for more.)
 Headings:              !# (e.g. !1, !2; ! alone implies level 1)
 Horizontal Line:       ---- (A line with 4+ dashes.)

List Formatting

Like block-level formatting, a list is defined by one or more characters at the start of a line. List types cannot be intermixed, and definition lists cannot be nested.

 Unordered List Item:       *
 Ordered List Item:         #
 Definition List Item:      ; Definition
                            : Text 

Multiple lines beginning with a : (colon) allow several text definitions to be associated with a single term.

For example:

 ; foo
 : A sample name for absolutely anything, especially programs and files (especially scratch files).
 : Term of disgust.

For clarity, the practice of placing the semicolon and colon on the same line is no longer supported.

Inline Formatting

Inline formatting differs from block-level formatting and lists in that it does not have to start a line. It also tends to mark a smaller piece of data. Inline elements are used within a block or list structure such as a paragraph or blockquote. Inline formatting cannot cross lines.

 Strong/Bold:           *hello world*
 Emphasis/Italics:      /hello world/
 Inserted:              +hello world+
 Delete/Strikethrough:  -hello world-
 Subscript:             ~hello world~
 Superscript:           ^hello world^
 Quote:                 "hello world"
 Code (short):          %hello world%
 Cite:                  @hello world@

Hyperlinking

Like inline formatting, the notation for creating a hyperlink cannot cross lines. URLs (the text following the colon) can be an external, absolute, or relative reference. (TikiText takes in everything after the colon until the first space and uses that string for the href.)

 Hyperlink:     [Text to link]:URL 

Images

Simple image insertion is supported in TikiText. In this version only partial functionality has been added. Like the notation for creating a hyperlink, image markers cannot cross lines. URLs (the text following the colon) can be an external, absolute, or relative reference. (TikiText takes in everything after the colon until the first space and uses that string as the image URL.)

 Image: {Some sample alternate text}:IMG-URL 

Acronyms

Authors can tag acronyms in TikiText and are encouraged to do so. TikiText will scan for words in all capitals followed immediately (no space) by parentheses containing the full description.

 Acronym: ACRONYM(The description of ACRONYM)
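
The pattern described above can be approximated with a single substitution. The following is only a sketch of the idea, not the code Text::Tiki uses: the helper name tag_acronyms, the two-letter minimum, and the choice of the XHTML <acronym> element are all assumptions made for illustration, and the description is not HTML-escaped.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from Text::Tiki: wraps ACRONYM(description)
# patterns in an XHTML <acronym> element.
sub tag_acronyms {
    my ($text) = @_;
    # Two or more capitals, immediately followed (no space) by a
    # parenthesized description.
    $text =~ s{\b([A-Z]{2,})\(([^)]+)\)}{<acronym title="$2">$1</acronym>}g;
    return $text;
}

print tag_acronyms('Style it with CSS(Cascading Style Sheets).'), "\n";
```

Text where a space separates the word from the parentheses, such as "Tiki (a god)", is left untouched by this sketch, which matches the "no space" rule stated above.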

Tables

TikiText supports basic tables. All table blocks begin with the | (pipe) character. Each line is a row. Columns are also separated by the pipe character. All rows should end with a pipe character. Table headers, cell alignment and column spans are supported. Nested tables and row spans are not supported.

 |                     Column separator.
 |!                    All cells in this row are headings.
 |<                    Left justify this cell.
 |^                    Center this cell.
 |>                    Right justify this cell.
 |(span)||             A column span. (The last cell is spanned over blank 
                        cells that follow.)

Leading and trailing whitespace in each cell is ignored. This gives authors the option of padding cells to make the source table more readable without affecting the output. This assumes the author is using a fixed-width font.

For example this TikiText...

 |!heading 1|heading 2|heading 3|
 |< left    |^ center |> right  |
 |^ centered across 3 columns |||

...would produce the following table:

        <table>
        <tr>
        <th>heading 1</th>
        <th>heading 2</th>
        <th>heading 3</th>
        </tr>
        <tr>
        <td align="left">left</td>
        <td align="center">center</td>
        <td align="right">right</td>
        </tr>
        <tr>
        <td align="center" colspan="3">centered across 3 columns</td>
        </tr>
        </table> 

Automated Functions

TikiText also provides several automated features for convenience that are derived from the semantic structure of the input and standard best practices.

  • TikiText will UTF-8 encode all output.

  • TikiText will generate and insert named links for each heading.

  • TikiText will autolink URLs. The list of recognized protocols is taken from RFC 1630: "Universal Resource Identifiers in WWW", though it excludes the file protocol.

  • TikiText will autolink email addresses and apply some basic spambot protection.

  • TikiText will convert symbols commonly represented using multiple characters to their typographic equivalents. (See "Typographic Conversions".)

Typographic Conversions

TikiText will convert symbols commonly represented using multiple characters to their typographic equivalents, similar to John Gruber's SmartyPants plugin for Movable Type. The following is a list of the multi-character representations and the numeric entities TikiText converts them to.

 --                                 &#8212; (em dash)
 - (spaces on either side)          &#8211; (en dash)
 ...                                &#8230; (horizontal ellipsis)
 (R)                                &#174;  (registered trademark)
 (TM)                               &#8482; (trademark symbol)
 (C)                                &#169;  (copyright symbol)
 1/4                                &#188;  (fraction one-fourth)
 1/2                                &#189;  (fraction one-half)
 3/4                                &#190;  (fraction three-fourths)
 (digits) x (digits)                &#215;  (multiply sign)
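
To make the table concrete, here is a minimal sketch of how a subset of these conversions could be applied with ordinary substitutions. It is not the module's own implementation: the function name typographic_sketch is hypothetical, the fractions are omitted, and the spacing kept around the en dash and multiply sign is a guess.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Tiki: applies a subset of the
# conversions listed above to a plain-text string.
sub typographic_sketch {
    my ($text) = @_;
    # Multiply sign first, while the only digits present are the author's
    # own and not those inside numeric entities.
    $text =~ s/(\d+) x (\d+)/$1 &#215; $2/g;
    $text =~ s/--/&#8212;/g;          # em dash
    $text =~ s/ - / &#8211; /g;       # en dash (spaces on either side)
    $text =~ s/\.\.\./&#8230;/g;      # horizontal ellipsis
    $text =~ s/\(R\)/&#174;/g;        # registered trademark
    $text =~ s/\(TM\)/&#8482;/g;      # trademark symbol
    $text =~ s/\(C\)/&#169;/g;        # copyright symbol
    return $text;
}

print typographic_sketch('Tiki(TM) -- 3 x 4 tiles...'), "\n";
```

The order of the substitutions matters in this sketch: the double dash is handled before the single dash so that "--" is never mistaken for an en dash.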

Not Supported

This is a list of formatting that IS NOT supported by TikiText. Some of these unsupported features are out of scope. Others are simply unimplemented. Please see the TO DO list for more information.

  • div, span, form elements, or the use of class="", to name a few.

  • Mid-word inline formatting.

  • Ordered List Item with specific values.

USAGE

Text::Tiki->new()

Instantiates a new TikiText processor and automatically invokes the init and clear_handlers methods.

$tiki->format($text)

The "workhorse" method. Takes in a scalar or array reference assumed to be TikiText and returns XHTML as a scalar. Any handlers that have been registered will be called during the execution of this method.

$tiki->format_line($text)

Similar to format, this method takes in a scalar (not an array reference) containing a single line of TikiText content and returns XHTML; however, block formatting is not performed.

INTEGRATION

$tiki->init()

Resets processor to its default values. It does not clear out any data in the stash.

This method is automatically invoked when a new processor is instantiated.

$tiki->stash($key, [$value])

A simple data store method that can be used to pass information between applications and handlers during initialization and formatting operations. $key is required and is a unique identifier for retrieving data. $value is optional but, if present, sets the value associated with $key. The method always returns the value associated with $key.
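
As an illustration of passing data to a handler through the stash, here is a minimal sketch. So that it runs without the module installed, it uses a stand-in class, StashSketch, which is purely hypothetical and implements only the stash behaviour described above; the handler name site_name_macro and the key site_name are likewise made up for the example.

```perl
use strict;
use warnings;

# Stand-in for a Text::Tiki processor, for illustration only. It implements
# just the documented stash behaviour: set when a value is given, always
# return the value stored under the key.
package StashSketch;
sub new { return bless { stash => {} }, shift }
sub stash {
    my ($self, $key, $value) = @_;
    $self->{stash}{$key} = $value if @_ > 2;
    return $self->{stash}{$key};
}

package main;

# Hypothetical macro handler that reads a value the application stashed.
sub site_name_macro {
    my ($tiki, $name, $attrib) = @_;
    my $site = $tiki->stash('site_name');
    return defined $site ? $site : '';    # a handler never returns undef
}

my $tiki = StashSketch->new;              # with the module: Text::Tiki->new
$tiki->stash('site_name', 'Example Wiki');
print site_name_macro($tiki, 'SITE', undef), "\n";
```

The application sets the value once, before formatting, and any handler invoked later can read it through the processor reference it receives as its first argument.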

$tiki->clear_handlers()

Sets all wiki, interwiki and macro handler tables to undefined.

WIKI FUNCTIONS

While Wikis are a part of TikiText's lineage, it was never the intention for TikiText to be used with or replace existing Wiki notations; however, initial interest has been expressed in this direction. TikiText is just a notation. How a WikiWord link is created and resolved is an implementation-specific trait that will vary. As of version 0.70, a developer can register callback routines that will be invoked when a WikiWord or InterWiki link is encountered.

$tiki->wiki_implicit_links($boolean)

Enables or disables automatic linking of WikiWord patterns via a boolean value. Default is false (0).

$tiki->wiki_prefix($wiki_url_prefix)

TikiText has a simple default wiki linking method built in. If wiki_implicit_links is set to true (1) and a handler has not been set via wiki_links_handler, TikiText will construct a link using the value (a scalar) set by this method.

$tiki->wiki_links_handler(\&code_ref)

wiki_links_handler allows a specialized wiki link generator routine to be hooked into the TikiText processor. When a WikiWord pattern is encountered, the processor calls the registered routine and passes in a reference to the TikiText processor instance that invoked it and a scalar containing the WikiWord text. Handlers are required to return a string scalar.

This is helpful for hooking TikiText into another system to provide tighter integration, or more robust or alternative functionality than the default routine offers. If registered, this routine will override the default wiki link routine and prefix.

wiki_implicit_links must be set to true (1) or the handler will not be invoked.

        $tiki->wiki_links_handler(\&wiki_link);
        
        sub wiki_link {
                my($tiki, $word) = @_;
                return "WikiLink -> $word";
        }
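
A slightly fuller handler might build a complete anchor element. The sketch below assumes that the string a handler returns is placed into the output as-is, which the description above suggests but does not spell out; the base URL and the class attribute are hypothetical. The registration block runs only if Text::Tiki is installed.

```perl
use strict;
use warnings;

# Hypothetical base URL; substitute your own application's page URL.
my $BASE = 'http://wiki.example.org/page/';

# Assumption: the returned string is inserted into the output unchanged,
# so the handler builds the whole anchor element itself.
sub wiki_link {
    my ($tiki, $word) = @_;
    return '' unless defined $word;
    # Percent-encode anything outside the unreserved URI characters.
    (my $path = $word) =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return sprintf('<a href="%s%s" class="wiki">%s</a>', $BASE, $path, $word);
}

print wiki_link(undef, 'FrontPage'), "\n";

# Registration, using only the calls documented in this section.
if (eval { require Text::Tiki; 1 }) {
    my $tiki = Text::Tiki->new;
    $tiki->wiki_implicit_links(1);    # required, or the handler is never called
    $tiki->wiki_links_handler(\&wiki_link);
    print $tiki->format('See FrontPage for details.');
}
```

Because the handler is an ordinary subroutine, it can be called directly, as in the first print statement, to check its output before it is registered.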

$tiki->interwiki_links($boolean)

Enables or disables processing of interwiki link patterns such as [InterWikiName:Page] via a boolean value. Default is false (0).

$tiki->interwiki_table(\%hash_ref)

As with wiki_prefix, TikiText has a simple default interwiki linking method built in. If interwiki_links is set to true (1) and a handler has not been set via interwiki_links_handler, TikiText will construct a link using the key (interwiki name) and value (corresponding URL prefix) pairs of the hash table reference passed in with this method.

$tiki->interwiki_links_handler(\&code_ref)

interwiki_links_handler allows a specialized interwiki link generator routine to be hooked into the TikiText processor. When an interwiki link pattern is encountered, the processor calls the registered routine and passes in a reference to the TikiText processor instance that invoked it and two scalars containing the interwiki prefix and page name as text. Handlers are required to return a string scalar.

This is helpful for hooking TikiText into another system to provide tighter integration, or more robust or alternative functionality than the default routine offers. If registered, this routine will override the default interwiki linking routine and table data.

interwiki_links must be set to true (1) or the handler will not be invoked.

        $tiki->interwiki_links_handler(\&interwiki_link);
        
        sub interwiki_link {
                my($tiki, $wiki, $page) = @_;
                return "Wiki of prefix $wiki with page $page";
        }
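
An interwiki handler can also decide what to do with prefixes it does not know. The following sketch keeps its own lookup table and hands unknown prefixes back as plain text. The table contents mirror the SYNOPSIS, but the case-insensitive lookup and the fallback behaviour are choices made for this example, not documented module behaviour. The registration block runs only if Text::Tiki is installed.

```perl
use strict;
use warnings;

# Hypothetical prefix table; the entries mirror the SYNOPSIS.
my %INTERWIKI = (
    wikipedia => 'http://en2.wikipedia.org/wiki/',
    joi       => 'http://joi.ito.com/joiwiki/',
);

sub interwiki_link {
    my ($tiki, $wiki, $page) = @_;
    my $prefix = $INTERWIKI{ lc $wiki };
    # Unknown prefix: give the text back rather than emit a dead link.
    return sprintf('[%s:%s]', $wiki, $page) unless defined $prefix;
    return sprintf('<a href="%s%s">%s:%s</a>', $prefix, $page, $wiki, $page);
}

print interwiki_link(undef, 'Wikipedia', 'Tiki'), "\n";

# Registration, using only the calls documented in this section.
if (eval { require Text::Tiki; 1 }) {
    my $tiki = Text::Tiki->new;
    $tiki->interwiki_links(1);        # required, or the handler is never called
    $tiki->interwiki_links_handler(\&interwiki_link);
    print $tiki->format('Read [Wikipedia:Tiki] for background.');
}
```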

MACROS

Macros are an experimental feature, added in version 0.70 of TikiText, that lets developers define and register their own tags. Tags take the form of ##TagName some optional additional string## in TikiText content.

The code seems stable and reliable in my tests; however, I reserve the right to change this part of the API at a later date. Your feedback is appreciated.

$tiki->macros()

This method returns an array of hash references containing all of the macros that are currently registered.

$tiki->macro_handler($name, \&code_ref, 'macro_type')

This method registers a callback routine to be invoked when a macro tag with the given name is found during processing. The macro type determines how the result is processed. The types are as follows:

block

As its name implies, a block macro is treated like block formatting and is expected to be on a line by itself, separated by line breaks. Block macros are processed for any TikiText notation before being appended to the output. This macro type is useful for inserting other TikiText files or setting/unsetting a switch during processing.

block_post

This macro type is exactly like a block macro except its processing is deferred until all formatting has occurred and the formatting engine is about to return the output to its caller. This macro type is useful for inserting content you do not want processed for TikiText or summary content such as an index or table of contents.

inline

An inline macro can appear anywhere within a block. Inline macros are processed for any TikiText notation before being appended to the output. This macro type is useful for inserting a value that requires TikiText processing.

inline_literal

An inline_literal macro is identical to an inline macro except that it is not processed for TikiText notation before being inserted into the output. This macro type is useful for inserting values such as a timestamp or environment variable. It can also be used for an inline switch during processing.

When a macro pattern is encountered, the processor calls the registered handler and passes in a reference to the TikiText processor instance that invoked it and two scalars containing the macro name and an attribute string (if any) as text. The attribute string is any text after the first space found. TikiText passes in the raw string and does not enforce a specific format for the attribute. Handlers are required to return a string scalar. If a macro handler does not insert any text, it should return an empty string, not an undefined value.

Here is the relevant code for a simple handler for inserting an explicit HTML break tag (multiple times if specified):

        $tiki->macro_handler('BR', \&html_break, 'inline');
        
        sub html_break {
                my($tiki,$name,$attrib) = @_;
                my $val = int($attrib || 0) || 1;   # default to one break when no count is given
                return '<br />' x $val;
        }

With this handler registered, this TikiText...

        This is a test of ##BR 3## the emergency broadcasting system
        

would be formatted into...

        <p>This is a test of <br /><br /><br /> the emergency broadcasting system</p>
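
The inline_literal type suits values such as a timestamp, as noted above. Here is a minimal sketch of such a handler. The macro name NOW, the keywords "date" and "time", and the output formats are all hypothetical choices for this example. The registration block runs only if Text::Tiki is installed.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical keyword-to-format table for the attribute string.
my %FORMATS = (
    date => '%Y-%m-%d',
    time => '%H:%M:%S',
);

# Hypothetical macro: ##NOW##, ##NOW date## or ##NOW time##.
sub now_macro {
    my ($tiki, $name, $attrib) = @_;
    my $key = defined $attrib ? lc $attrib : '';
    $key =~ s/^\s+|\s+$//g;                       # tolerate stray whitespace
    # Unknown or missing keyword falls back to the date format.
    my $format = $FORMATS{$key} || $FORMATS{date};
    return strftime($format, localtime);
}

print now_macro(undef, 'NOW', 'date'), "\n";

# Registration, using only the calls documented in this section.
if (eval { require Text::Tiki; 1 }) {
    my $tiki = Text::Tiki->new;
    $tiki->macro_handler('NOW', \&now_macro, 'inline_literal');
    print $tiki->format('Generated on ##NOW date##.');
}
```

Keywords are used for the attribute instead of a raw strftime format so that no % characters, which TikiText also uses for inline code, need to appear inside the macro tag.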

TO DO

This engine is not entirely complete and does not fully meet its design goals. These are some of the known issues that I plan to rectify in future releases. This is not a complete list. Feedback is welcome.

  • Autosizing of images. While basic image insertion has been added, the auto-insertion of height and width attributes still needs to be implemented.

  • Implement <table> captions and perhaps titles.

  • Add cite="" processing to inline quotes and blockquote formatting. Smarter use of <q>.

  • Add support for an external acronym dictionary. Implemented as an automatic function of the TikiText engine, it would make a best effort to find and tag acronyms based on a pre-existing external source.

  • Add a switch and built-in function to enumerate headings (1, 1.1, 1.1.1, 1.1.2...).

  • Better documentation -- particularly more code examples.

  • Flesh out macros.

  • Better character encoding/decoding.

SEE ALSO

Text::WikiFormat, HTML::FromText, CGI::Kwiki

http://udell.roninhouse.com/bytecols/2001-06-06.html

http://www.usemod.com/cgi-bin/wiki.pl?TextFormattingRules

http://twistedmatrix.com:80/wiki/moin/HelpOnEditing

http://www.zope.org/Documentation/Articles/STX

http://www.textism.com/tools/textile/

http://daringfireball.net/projects/smartypants/

http://en2.wikipedia.org/wiki/Tiki

LICENSE

The software is released under the Artistic License. The terms of the Artistic License are described at http://www.perl.com/language/misc/Artistic.html.

AUTHOR & COPYRIGHT

Except where otherwise noted, Text::Tiki is Copyright 2003, Timothy Appnel, tima@mplode.com. All rights reserved.