NAME
PApp - creating applications for the WorldWideWeb [Talk]
PApp - what is it?
PApp (which is simply short for "Perl APPlication") is basically a
collection of perl modules directed at seasoned perl programmers that
allows one to create *large* applications for stateless protocols (like
http or wap). It is possible to implement a simple-but-complete content
management system in a few hundred lines of papp-code.
About this document/presentation
The presentation I will hold at the linuxworldexpo will explain a lot of
technical details not included in this document. The slides for the
linuxworldexpo presentation will be available at
linuxworldexpo.
What was the motivation for creating PApp?
The original motivation for writing PApp came when our company (nethype
GmbH) started to implement a highly interactive website that required at
least limited forms of content management.
Creating large or even medium-sized applications for the Web using CGI
is a tedious task. Things like preserving state (a.k.a. session-tracking
and/or user-tracking) are a major headache, both for the programmer and
the security advisor.
PApp solves these and a lot of other problems we didn't originally
envision by providing a generic and easy API.
Features of PApp / Advantages over other solutions
While PApp could be mistaken to be yet another CGI-wrapper, this is not
the case. PApp concentrates on providing an application-centric view,
rather than a web-page-centric one. This means that there can be one web
page per file (as with CGI), but it is also possible to put multiple
pages into a single file (papp pages, called *modules*, are often so
short that it makes a lot of sense to group related pages together), or
to distribute them between many files. Of course, the good-old include
mechanism is still there (although not named "include").
State management / Persistence
At your option (you want this!), PApp can manage persistent variables
for a single session or user. This means that variables almost
automatically stay persistent for the whole session. This marking
mechanism is very generic: State variables (called *state keys* in PApp
parlance) can be marked as session-dependent (the default),
user-dependent (persistent over session borders, also called *preference
items*), local to a page or a group of pages or any mixture thereof.
Interesting planned extensions include things like transactions and
transaction-dependent state keys.
An example of a user-dependent state key is the language the user
selected last. A typical session-dependent variable is the flag whether
the user has authenticated herself. A local variable could be data from a
multi-page transaction.
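The difference between session-dependent and user-dependent state keys
can be illustrated with a minimal sketch in plain perl. This is not
PApp's implementation or API: the two in-memory stores, the %scope
declaration and the functions load_state/save_state are assumptions made
up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stores; PApp keeps the real ones in its state database.
my %user_store;      # user id    -> preference items (survive sessions)
my %session_store;   # session id -> session-dependent state keys

# Build the state hash for one request: preference items first, then
# the session-dependent keys, so both are visible through one hash.
sub load_state {
    my ($userid, $sessionid) = @_;
    my %state = (
        %{ $user_store{$userid}       || {} },
        %{ $session_store{$sessionid} || {} },
    );
    return \%state;
}

# Write every key back to the store named by its (assumed) scope.
sub save_state {
    my ($userid, $sessionid, $state, $scope) = @_;
    for my $key (keys %$state) {
        if (($scope->{$key} || 'session') eq 'user') {
            $user_store{$userid}{$key} = $state->{$key};
        } else {
            $session_store{$sessionid}{$key} = $state->{$key};
        }
    }
    return;
}

my %scope = (lang => 'user');   # everything else defaults to 'session'

save_state(7, 'sess-a', { lang => 'de', authenticated => 1 }, \%scope);

# A later session of the same user sees the preference, not the login.
my $next = load_state(7, 'sess-b');
print "lang=$next->{lang} authenticated=",
      (exists $next->{authenticated} ? 1 : 0), "\n";
```

The second session inherits the language but has to authenticate again,
which is the behaviour described above for preference items versus
session-dependent keys.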
Security
A common bug in cgi scripts is the passing of sensitive data in so-called
*hidden fields* of web forms, hidden from the casual user but of course
open to attacks. With PApp this is very unnatural: The state data never
leaves the server, but instead an encrypted (128 bit twofish code)
*cookie* is used (usually encoded in the URL, not to be confused with
the cookies netscape implements). Compromising the server key gives
access to other sessions (similar to a broken caching proxy), but still
makes it impossible to change the data.
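The idea that only an opaque token travels to the client while the state
stays on the server can be sketched as follows. Note the assumptions:
PApp encrypts its cookie with twofish, whereas this sketch swaps in an
HMAC-SHA256 signature from the core Digest::SHA module simply because it
needs no extra modules. The names issue_token and state_for_token, the
token layout and the key are all invented for the example.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $server_key = 'example-only-secret';   # assumption: per-server secret
my %state_db;                             # stands in for the state database
my $next_id = 1;

# Store the state on the server; return the only thing the client sees.
sub issue_token {
    my ($state) = @_;
    my $id = $next_id++;
    $state_db{$id} = $state;
    return $id . '-' . hmac_sha256_hex($id, $server_key);
}

# Return the stored state, or nothing if the token was forged or altered.
# (A production check would compare the MACs in constant time.)
sub state_for_token {
    my ($token) = @_;
    my ($id, $mac) = $token =~ /\A(\d+)-([0-9a-f]{64})\z/ or return;
    return unless $mac eq hmac_sha256_hex($id, $server_key);
    return $state_db{$id};
}

my $token = issue_token({ authenticated => 1 });

# An attacker who edits the id cannot produce a matching signature.
(my $forged = $token) =~ s/\A\d+/999/;

print defined state_for_token($token)  ? "valid\n" : "rejected\n";
print defined state_for_token($forged) ? "valid\n" : "rejected\n";
```

The first lookup succeeds and the second is rejected. In neither case
does the client get a chance to modify the state itself, since the state
hash never leaves %state_db.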
In general, the design of PApp makes security the first priority, not
only by careful design of the network/server protocol but also by
providing easy and standardized methods for common tasks.
Session/User-tracking
Session and user-tracking are done automatically by PApp. The
application can react to session starts if necessary (e.g. by redirecting
the user to a start page when the data on the accessed page is no longer
available or initializing state keys on session start). This is also
possible when new users start a new session, of course. A session is
defined by PApp as a tree rooted at the page that started the session
(i.e. one requested without, or with an invalid, session cookie).
User tracking beyond sessions is currently done using the http-cookie
mechanism. Care has been taken to do this sensibly, however: The session
data from cookies is ignored and a user without (or with disabled)
cookies will not be flooded with cookie requests more than once or twice
a day. This is another example of how PApp can easily adapt to users.
User administration
PApp manages users per-server, not per-application. Applications can use
an access right system similar to the unix user/group mechanism to
administer their users, but can also implement their own system. PApp
identifies every user using a unique user id with optional attributes
like name/password/group and preferences.
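A unix-like group check of the kind mentioned above might look roughly
like this. It is only a sketch: the %users table, its attribute names and
the function has_access are hypothetical and do not reflect PApp's actual
user API.

```perl
use strict;
use warnings;

# Hypothetical user records: a unique id plus optional attributes.
my %users = (
    1 => { name => 'root',  groups => [qw(admin editor)] },
    2 => { name => 'alice', groups => [qw(editor)] },
    3 => { },                        # anonymous user, known by id only
);

# True if the user belongs to the named group (cf. unix groups).
sub has_access {
    my ($userid, $group) = @_;
    my $user = $users{$userid} or return 0;
    return (grep { $_ eq $group } @{ $user->{groups} || [] }) ? 1 : 0;
}

for my $id (sort keys %users) {
    printf "user %d: admin=%d editor=%d\n",
        $id, has_access($id, 'admin'), has_access($id, 'editor');
}
```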
I18n
I18n is short for *Internationalization*, which, in the context of PApp
means multiple language support. PApp does this using *language
tagging*, string scanning and a generic translation editor. The I18n
model of PApp is more general than the widely-known gettext model
implemented by GNU and Sun, among others. The target language is chosen
using the user's preferred language and protocol-specific data (e.g. the
"Accept-Language"-http-header).
Every source file can use a different language (if necessary; the
language format allows finer-grained distribution of languages but this
is not yet implemented). PApp can scan for strings in papp-sources,
text/html/xml files and even database fields (e.g. you can declare a
single database table row as english, to be translated, or as mixed
language, to be scanned for language tags). Every application specifies
the destination languages it wants to support. A translation editor (an
example application "delivered" with PApp) can be used by translators to
translate as-of-yet-untranslated messages; updates can be done on the
fly.
I18n is as easy as writing "__"Translate this"" (the tagging syntax is
reminiscent of the widely used _("message")-syntax of C) in your
documents or program.
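How a target language might be chosen and a tagged string looked up can
be sketched in plain perl. Everything here is an assumption for
illustration: the catalogue layout, the list of supported languages and
the functions choose_language and translate are not PApp's API, and the
Accept-Language parsing is deliberately simplified.

```perl
use strict;
use warnings;

# Hypothetical message catalogue: source string -> language -> text.
my %catalogue = (
    'Update' => { de => 'Aktualisieren', nl => 'Bijwerken' },
);
my @supported = qw(en de nl);   # languages the application declares

# Stored preference first, then Accept-Language ordered by q-value.
sub choose_language {
    my ($preferred, $accept_language) = @_;
    my %ok = map { $_ => 1 } @supported;
    return $preferred if defined $preferred && $ok{$preferred};

    my @ranked;
    for my $item (split /\s*,\s*/, $accept_language // '') {
        my ($tag, $q) =
            $item =~ /\A([A-Za-z-]+)(?:\s*;\s*q=([0-9.]+))?\z/ or next;
        push @ranked, [ lc((split /-/, $tag)[0]), defined $q ? $q : 1 ];
    }
    for my $cand (sort { $b->[1] <=> $a->[1] } @ranked) {
        return $cand->[0] if $ok{ $cand->[0] };
    }
    return $supported[0];       # fall back to the source language
}

# Untranslated messages fall back to the source string.
sub translate {
    my ($lang, $msg) = @_;
    my $entry = $catalogue{$msg} or return $msg;
    return $entry->{$lang} // $msg;
}

my $lang = choose_language(undef, 'fr;q=0.9, nl;q=0.8, en;q=0.5');
print "$lang: ", translate($lang, 'Update'), "\n";
```

With the header shown, French ranks highest but is not supported, so the
sketch settles on Dutch and prints the Dutch text.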
Unicode / Multi-charset ability
Internally, PApp supports only two datatypes: *binary* and *text*.
Binary is usually used for images or similar data, while text should be
used for html (or xml or wml...) pages. Relatively recent fixes to the
unicode standard mostly pacified the objections raised by a lot of
cultural groups against this standard, so PApp encodes everything using
unicode.
The internal representation is independent of the output encoding or
even the output character set. You can opt to output "iso-8859-1" text
or, if the user wants something else, another charset + encoding like
"iso-2022-jp". This selection can be based on program choice (i.e.
hardwired) or can be based on the user's preferences or protocol data
(e.g. the "Accept-Charset"-http-header), similar to the selection of the
target language.
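The separation between internal text and output encoding can be
demonstrated with perl's core Encode module. This is a sketch of the
general principle, not of PApp's internals; the sample strings are made
up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Text is held as perl character strings, independent of any charset.
my $text = "Gr\x{FC}\x{DF}e";              # "Gruesse" with u-umlaut, sharp s

# The same text, serialised for two different clients.
my $latin1 = encode('iso-8859-1', $text);  # one byte per character here
my $utf8   = encode('UTF-8',      $text);  # two bytes each for the umlaut
                                           # and the sharp s

printf "chars=%d latin1=%d utf8=%d\n",
    length($text), length($latin1), length($utf8);

# A charset + encoding for Japanese text; decoding restores the text.
my $japanese = "\x{65E5}\x{672C}";         # two kanji
my $jis      = encode('iso-2022-jp', $japanese);
print "round trip ok\n" if decode('iso-2022-jp', $jis) eq $japanese;
```

The first line of output is "chars=5 latin1=5 utf8=7": five characters
become five or seven bytes depending on the output encoding, while the
program itself only ever handles the five characters.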
Speed
Speed was a major concern when designing the API. Although implemented
to a large part in perl, PApp aims at providing speed similar to that of
bare in-server scripts (like "Apache::Registry"-based scripts). At least
for non-trivial ("real-world") pages, PApp pages should be very similar
in performance, but of course cannot beat "Apache::Registry". However,
features like I18n come basically at no cost, both with respect to
programming time and to the runtime, often leading to correctly tagged
applications even when multi-language was not originally a target of an
application ("because it's so easy to do").
Scalability
PApp applications easily scale to many servers if a single server cannot
sustain the load. The only limitation is that there must be a single
database server managing the state keys, which usually accounts for only
a very small part of the processing power a PApp application consumes.
XML
PApp support for XML is two-fold: First, PApp uses XML for its own
papp-source-format specifying basic layout of an application. The
decision for using XML was not easy, as XML can be regarded as a format
designed for machines (but still decipherable and writable by humans, if
necessary), while humans are generally better served by SGML.
However, XML is used not only for internal source files, but is fully
supported as a text format. Together with its evil brother *PXML*, which
is basically xml (text) with embedded perl code (or vice versa), it
allows PApp to apply stylesheets (XSLT) to webpages either at
compilation time or at runtime. PApp applications can be written fully
in XML, and only the output stylesheet decides whether to actually
output HTML, WML (for mobile phones) or XML (for XML-capable-browsers,
if sensible).
As an added gimmick, PApp can dynamically fetch XML or PXML data (or
code!) from other sources at runtime. Content management systems usually
want to store pages (and/or layout) inside a database. An example of how
to implement this is short enough to fit into the manpage for the
"PApp::XML"-module.
Protocol- & Layout-independence
Together with stylesheets, PApp applications can be written with only
minimal dependencies on layout or target protocol. It is possible to
write applications that only provide "modules", and only the final
stylesheet decides how it is rendered. Since PApp supports this
implicitly this enables layout- and protocol-independent applications.
Together with the I18n model, this enables the almost complete
separation of translation/programming/design and environment. I18n comes
at almost no cost, layout separation of course requires thought on
the programmer's side, and protocol independence requires translation
stylesheets to translate internal xml representations into the target
"language".
Platform-independence
Although the main target of PApp is Apache/mod_perl, an interface based
on CGI (or similar mechanisms) is possible (yet slower). PApp
applications are indifferent to the environment (i.e. it is easy to
write an application that runs both under CGI and inside apache).
Database support
Serious web applications without database support are, of course,
impossible. Therefore PApp provides a lot of conveniences for
applications: Each application (and sub-application) can define a
default database connection which is persistent, cached, and checked
(like all database connections in PApp). SQL support is compatible with
the underlying DBI interface, but PApp programmers rarely have to resort
to that API. PApp automatically caches prepared SQL statements (allowing
the query optimiser to work once). Since a code-excerpt is better than a
thousand marketing words, here is an actual example:
<:
   # bind the result columns to $id and $name, pass $S{name} as parameter
   my $st = sql_exec \my($id, $name),
                     "select id, name from user where name like ?",
                     $S{name};
:>
<table><tr><th>ID</th><th>Name</th></tr>
<:
   while ($st->fetch) {
      :><tr><td>$id</td><td>$name</td></tr><:
   }
:>
</table>
This displays the id/name pairs of all users whose name matches
$S{name} as a nicely formatted html table.
PApp currently uses MySQL for internal state management (MySQL is very
fast for the task at hand). Applications are free to use any database
they want, of course.
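The benefit of caching prepared statements, as mentioned above, can be
shown without a database at all. In this sketch the prepare function is
only a stand-in for an expensive prepare step (with DBI that would be
$dbh->prepare), and sql_exec_cached is a hypothetical name, not PApp's
sql_exec.

```perl
use strict;
use warnings;

my $prepare_calls = 0;

# Stand-in for an expensive prepare step; it returns a closure that
# "executes" the statement with the given arguments.
sub prepare {
    my ($sql) = @_;
    $prepare_calls++;
    return sub { my @args = @_; return "$sql <- (@args)" };
}

my %statement_cache;   # SQL text -> prepared statement

# Prepare each distinct SQL string once, then reuse the cached handle.
sub sql_exec_cached {
    my ($sql, @args) = @_;
    my $st = $statement_cache{$sql} ||= prepare($sql);
    return $st->(@args);
}

sql_exec_cached("select id, name from user where name like ?", $_)
    for qw(a% b% c%);

print "prepared $prepare_calls time(s)\n";
```

Three executions of the same statement text result in a single prepare,
so the query optimiser only has to do its work once.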
Persistent helper objects
As mentioned earlier, database connections are persistent. PApp provides
a number of "unusual" persistent objects, for example, it is possible to
tell PApp that a given callback needs to be called after a specific URL
has been clicked, independent of the target page (i.e. the target page
has to know nothing about who called it). Another helper object is the
persistent SQL row object, which maps SQL rows into perl hashes. The
following code excerpt (using the *editform* library which is part of
PApp, and fully language-tagged) displays the id and name fields of a
table in a freely-editable (HTML-) form. Updates to the database are
done automatically, thus, the example is complete:
<:
my $row = new PApp::DataRef 'DB_row',
table => "user",
where => [id => $userid];
# pre-set name
$row->{name} ||= "<username>";
ef_begin;
:><br>__"ID:" <?ef_string \$row->{id} , 5:><:
:><br>__"Name:" <?ef_string \$row->{name}, 20:><:
ef_submit __"Update";
ef_end;
:>
Debuggability
Since PApp is written by its main users (or, conversely, the main users
of PApp also develop it), debugging support is relatively strong. PApp
usually is able to deliver a complete backtrace of the program,
including "interesting objects", when a fatal error occurs and features
a powerful exception mechanism to gather information from all stages
(low- to high-level) of a request. It is possible to store the backtrace
and error information into a database, providing the user with a nice
error message while mailing the administrator about the incident. The
information saved by PApp allows the programmer to precisely reproduce
the error situation (as far as possible), but usually the URL suffices
to uniquely identify the precise state of the session.
Schemes where the user could add comments to such a "coredump" or even
browse the data structures interactively (given correct access rights)
should be possible, but are not implemented so far (this would be
implemented using a specialized PApp application supposedly called the
"error browser").
"Web-Widgets"
The newest addition to PApp (and therefore not final in its
implementation) are reusable components also dubbed *Web-Widgets*,
similar to reusable GUI-objects named "widgets". An example for such a
component would be a standard "forum" widget. In one of our projects we
use the same forum widget to provide "web chat", "small ads" and the
"news!" page, all with the same code but using different stylesheets to
customize the layout.
A PApp-application is, in some sense, a large state-machine. This is a
limitation of stateless protocols which is hard, but not impossible, to
circumvent (if perl only had efficient continuations). Any application
can be embedded into other applications, while retaining its own state
and its own state machine and of course its own set of variables /
state keys. Standard PApp applications are not much more than a single
page with html header/footer, authentication check and an embedded
"Web-widget".
Logging
Logging is a required part of any serious business. Gathering
statistical data is a basic requirement today. PApp saves every state
key to a database for each page impression/request (usually between 300
and 900 bytes). This saved information contains everything needed to
recreate a given page except the actual program code. When session/state
data is being expired (necessary to put a sensible bound on the size of
the state database), PApp conveniently allows applications to gather
statistical data from individual "hits", using almost the same
environment as at runtime. Expiring and data gathering can, of course,
be done separately if necessary.
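Combining expiry with data gathering could look roughly like the
following sketch. The record layout, the function expire_states and the
callback interface are assumptions for illustration only; PApp's real
expiry hooks are not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical per-request records, as they might sit in the state db.
my @state_log = (
    { id => 1, time => 1_000, page => 'forum' },
    { id => 2, time => 1_500, page => 'news'  },
    { id => 3, time => 9_000, page => 'forum' },
);

# Drop every record older than $cutoff, handing each one to $gather
# first so the application can extract its statistics.
sub expire_states {
    my ($cutoff, $gather) = @_;
    my @keep;
    for my $hit (@state_log) {
        if ($hit->{time} < $cutoff) { $gather->($hit) }
        else                        { push @keep, $hit }
    }
    @state_log = @keep;
    return;
}

my %hits_per_page;
expire_states(5_000, sub { $hits_per_page{ $_[0]{page} }++ });

print join(', ', map { "$_=$hits_per_page{$_}" }
                 sort keys %hits_per_page), "\n";
print scalar(@state_log), " record(s) kept\n";
```

The two old hits are counted (one for the forum, one for the news page)
before they are discarded, and the recent one stays in the log.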
Disadvantages
PApp is not actually a revolution, judged by its components. It does,
however, draw a lot of functionality and ideas into a single,
well-contained package. Nevertheless, there are quite a few reasons
*NOT* to use PApp, or at least not to use PApp *YET*.
* PApp is not yet a released module, its API is in flux (with respect
to recent features), and not everything is working as it should yet.
This is fortunately only a question of time.
* Only a single person is currently developing and designing PApp for
free. This means that advances might sometimes not lead into the
direction you want, and maybe not in the schedule you want.
* PApp requires the very latest perl - at the moment, this is perl 5.7
+ custom patches, due to the buggy unicode support in earlier
versions. The latest released PApp module (on CPAN) does not rely on
unicode, but is already quite outdated with respect to current
features.
* There is a total lack of tutorials or introductory courses. While
all PApp modules are documented, it is practically impossible to learn
it using the reference documentation alone. Basically a one-day
one-on-one course is necessary to get you up and running - PApp
is not trivial. The lost time is generally made up with the increase
in productivity experienced with PApp, though, similar to learning
Perl ;)
References
* PApp is available as a standard module from CPAN, although the
current version uploaded is very old.
* The PApp homepage is available at the author's homepage, under
to the current manpages. Online demos are, unfortunately, not yet
available.
* The newest versions of PApp can be accessed using CVS as a
* The slides for this presentation will be available at the author's
the linuxworldexpo.
Apology
I'd like to apologize for any typos or other mistakes in this document.
It was written in a single session without access to a spellchecker and
so has not been debugged yet ;)