NAME

File::Tabular::Web - turn a tabular file into a web application

SYNOPSIS

configure the http server (here with modperl)
  <LocationMatch "\.ftw$">
    SetHandler modperl
    PerlResponseHandler File::Tabular::Web
  </LocationMatch>
generate a CRUD application for one tabular file (scaffolding)
  cp some/data.txt /path/to/http/htdocs/some/data.txt
  perl ftw_new_app.pl /path/to/http/htdocs/some/data.txt
use the application
  http://myServer/some/data.ftw
customize
  # change some configuration options
  edit /path/to/http/htdocs/some/data.ftw 
  
  # change the views
  edit /path/to/http/htdocs/some/{data_short.tt,data_long.tt,data_edit.tt}

DESCRIPTION

Introduction

This is a simple Apache web application framework based on File::Tabular and Search::QueryParser. The framework offers builtin services for searching, displaying and updating a flat tabular datafile, possibly with attached documents (see File::Tabular::Web::Attachments and File::Tabular::Web::Attachments::Indexed).

The strong point of File::Tabular::Web is that it is built around a search engine designed from the start for Web requests : by default it searches for complete words, spanning all data fields. However, you can easily write queries that look in specific fields, using regular expressions, boolean combinations, arithmetic operators, etc. So if you are looking for simplicity and speed of development, rather than speed of execution, then you may have found a convenient tool.

We use it intensively in our Intranet for managing lists of people, rooms, meetings, internet pointers, etc., and even for more sensitive information like lists of payments or the archived judgements (minutes) of Geneva courts. Of course this is slower than a real database, but for data up to 10MB/50000 records, the difference is hardly noticeable. On the other hand, ease of development and deployment and ease of importing/exporting data proved to be highly valuable assets.

Building an application

To build an application, all you need to do is :

  • Insert the data (a tabular .txt file) somewhere in your Apache htdocs tree.

  • Run the helper script ftw_new_app.pl, which automatically builds configuration and template files. The new URL becomes active immediately, with no webserver configuration or restart, so you already have a "scaffolding" application for searching, displaying, and possibly editing the data.

  • If necessary, tune various options in the configuration file, and customize the template files for presenting the data according to your needs.

In most cases, those steps will be sufficient, so they can be performed by a webmaster without Perl knowledge.

For more advanced uses, application-specific Perl subclasses can be hooked up into the framework for performing particular tasks. See for example the companion File::Tabular::Web::Attachments module, which provides services for attaching documents and indexing them through Search::Indexer, therefore providing a mini-framework for storing electronic documents.

QUICKSTART

Apache configuration

File::Tabular::Web is designed so that it can be installed once and for all in your Apache configuration. Then all applications can be added or modified on the fly, without restarting the server.

First choose a file extension for your File::Tabular::Web applications; in the examples below we assume it to be .ftw. Then configure your Apache server in one of the ways described below.

Configuration as a mod_perl handler

If you have mod_perl, the easiest way is to declare it as a mod_perl handler associated to .ftw URLs. Edit your perl.conf as follows :

  <LocationMatch "\.ftw$">
    SetHandler modperl
    PerlResponseHandler File::Tabular::Web
  </LocationMatch>

Configuration as a cgi-bin script

Create an executable file in cgi-bin directory, named ftw, and containing

   #!/path/to/perl
   use File::Tabular::Web;
   File::Tabular::Web->handler;

Then you can access your applications as

  http://my.server/cgi-bin/ftw/path/to/my/app.ftw

Implicit call of the script through mod_actions

If your Apache has the mod_actions module (most installations have it), then it is convenient to add the following directives in httpd.conf :

  Action file-tabular-web /cgi-bin/ftw 
  AddHandler file-tabular-web .ftw

Now any file ending with ".ftw" in your htdocs tree will be treated as a File::Tabular::Web application. In other words, instead of

  http://my.server/cgi-bin/ftw/path/to/my/app.ftw

you can use URL

  http://my.server/path/to/my/app.ftw

As already explained, .ftw is just an arbitrary convention and can be replaced by any other suffix. Similarly, the file-tabular-web handler name can be arbitrarily replaced by another name.

Configuration as a fastcgi script

[probably works like cgi-bin; not tested yet]

Setting up a particular application

We'll take for example a simple people directory application.

  • First create directory htdocs/people.

  • Let's assume that you already have a list of people, in a spreadsheet or a database. Export that list into a flat text file named htdocs/people/dir.txt. If you export from an Excel Spreadsheet, do NOT export as CSV format ; choose "text (tab-separated)" instead. The datafile should contain one line per record, with a character like '|' or TAB as field separator, and field names on the first line (see File::Tabular for details).

  • Run the helper script

      perl ftw_new_app.pl --fieldSep \\t htdocs/people/dir.txt

    This will create in the same directory a configuration file dir.ftw, and a collection of HTML templates dir_short.tt, dir_long.tt, dir_modif.tt, etc. The --fieldSep option specifies which character acts as field separator (the default is '|'); other options are available, see

      perl ftw_new_app.pl --help

    for a list.

  • The URL http://your.web.server/people/dir.ftw is now available to access the application. You may first test the default layout, and then customize the templates to suit your needs.

Note : initially all files are placed in the same directory, because it is simple and convenient; however, data and templates files are not really web resources and therefore theoretically should not belong to the htdocs tree. If you want a more structured architecture, you may move these files to a different location, and specify within the configuration how to find them (see instructions below).

WEB API

Entry points

Various entry points into the application (searching, editing, etc.) are chosen by single-letter arguments :

H

  http://myServer/some/app.ftw?H

Displays the homepage of the application (through the home view). This is the default entry point, i.e. equivalent to

  http://myServer/some/app.ftw

S

  http://myServer/some/app.ftw?S=<criteria>

Searches records matching the specified criteria, and displays a short summary of each record (through the short view). Here are some examples of search criteria :

  word1 word2 word3                 # records containing these 3 words anywhere
  +word1 +word2 +word3              # idem
  word1 word2 -word3                # containing word1 and word2 but not word3
  word1 AND (word2 OR word3)        # obvious
  "word1 word2 word3"               # sequence
  word*                             # word completion
  field1:word1 field2:word2         # restricted by field
  field1 == val1  field2 > val2     # relational operators (will inspect the
                                    #   shape of supplied values to decide
                                    #   about string/numeric/date comparisons)
  field~regex                       # regex

See Search::QueryParser and File::Tabular for more details.

Additional parameters may control sorting and pagination. Ex:

  ?S=word&orderBy=birthdate:-d.m.y,lastname:alpha&count=20&start=40

count

How many items to display on one page. Default is 50.

start

Index within the list of results, telling which is the first record to display (basis is 0).

orderBy

How to sort results. This may be one or several field names, possibly followed by a specification like :num or :-alpha. Precise syntax is documented in "cmp" in Hash::Type.

max

Maximum number of records retrieved in a search (records beyond that number will be dropped).
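
When search URLs are built by a program rather than typed by hand, the criteria have to be percent-encoded: in particular a literal + in a query string is decoded as a space, so +word1 must travel as %2Bword1. The following is a minimal sketch using only core Perl; the helper names uri_escape_simple and search_url are hypothetical and not part of File::Tabular::Web, and the escaping assumes plain ASCII criteria.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumption of this sketch: the input is plain ASCII.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

# Build a query string for the "S" entry point from name/value pairs
sub search_url {
    my ($base, %param) = @_;
    my @pairs = map { "$_=" . uri_escape_simple($param{$_}) } sort keys %param;
    return "$base?" . join '&', @pairs;
}

print search_url('http://myServer/some/app.ftw',
                 S => '+word1 -word3', count => 20, start => 40), "\n";
```

If the URI::Escape module is installed, its uri_escape function can replace the hand-written helper.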

L

  http://myServer/some/app.ftw?L=<key>

Finds the record with the given key and displays it in detail through the long view.

M

  http://myServer/some/app.ftw?M=key

If called with method GET, finds the record with the given key and displays it through the modif view (typically this view will be an HTML form).

If called with method POST, finds the record with the given key and updates it with given field names and values. After update, displays an update message through the msg view.

A

  http://myServer/some/app.ftw?A

If called with method GET, displays a form for creating a new record, through the modif view. Fields may be pre-filled by default values given in the configuration file.

If called with method POST, creates a new record, with values given by the submitted form. After record creation, displays an update message through the msg view.

D

  http://myServer/some/app.ftw?D=<key>

Deletes record with the given key. After deletion, displays an update message through the msg view.

X

  http://myServer/some/app.ftw?X

Displays all records through the download view (mnemonic : eXtract).

Additional parameters

V

Name of the view (i.e. template) that will be used instead of the default one. For example, assuming that the application has defined a print view, we can call that view through

  http://myServer/some/app.ftw?S=<criteria>&V=print

WRITING TEMPLATES

This section assumes that you already know how to write templates for the Template Toolkit (see Template).

The path for searching templates includes

  • the application directory (where the configuration file resides)

  • the directory specified within the configuration file by parameter [template]dir

  • some default directories: <server_root>/lib/tmpl/ftw/<application_name>, <server_root>/lib/tmpl/ftw/<default>, <server_root>/lib/tmpl/ftw.

Values passed to templates

self

handle to the File::Tabular::Web object; from there you can access self.url (URL of the application), self.server_root (server root directory), self.cfg (configuration information, an AppConfig object), self.mtime (modification time of the data file), self.modperl or self.cgi, and self.msg (last message). You can also call methods "can_do" or "param", like for example

  [% IF self.can_do('add') %]
     <a href="?A">Add a new record</a>
  [% END # IF %]

or

  [% self.param('myFancyParam') %]

found

structure containing the results of a search. Fields within this structure are :

count

how many records were retrieved

records

arrayref containing a slice of records

start

index of first record in the returned slice

end

index of last record in the returned slice

href link to the next slice of results (if any)

href link to the previous slice of results (if any)

Using relative URLs

All pages generated by the application have the same URL; query parameters control which page will be displayed. Therefore all internal links can just start with a question mark : the browser will recognize that this is a relative link to the same URL, with a different query string. So within templates we can write simple links like

  <a href="?H">Homepage</a>
  <a href="?S=*">See all records</a>
  <a href="?A">Add a new record</a>
  [% FOREACH record IN found.records %]
    <a href="?M=[% record.Id %]">Modify this record</a>
  [% END # FOREACH  %]

Forms

Data input

A typical form for updating or adding a record will look like

  <form method="POST">
   First Name <input name="firstname" value="[% record.firstname %]"><br>
   Last Name  <input name="lastname" value="[% record.lastname %]">
   <input type="submit">
  </form>

Usually there is no need to specify the action of the form : the default action sent by the browser will be the same URL (including the query parameter ?A or ?M=[% record.Id %]), and when the application receives a POST request, it knows it has to update or add the record instead of displaying the form. This implies that you must use the POST method for any data modification; whereas forms for searching may use either GET or POST methods.

For convenience, deletion through a GET url of shape ?D=[% record.Id %] is supported; however, data modification through GET method is not recommended, and therefore it is preferable to write

  <form method="post">
    <input name="D" value="[% record.Id %]">
    <input type="submit" value="Delete this record">
  </form>

Searching

A typical form for searching will look like

  <form method="POST" action="[% self.url %]">
     Search : 
       <select name="S">
         <option value="">--Choose in field1--</option>
         <option value="+field1:val1">val1</option>
         <option value="+field1:val2">val2</option>
         ...
       </select>
       Other : <input name="S">
   <input type="submit">
  </form>

So the form can combine several search criteria, all passed through the S parameter. The form method can be either GET or POST; but if you choose POST, then it is recommended that you also specify

   action="[% self.url %]"

instead of relying on the implicit self-url from the browser. Otherwise the URL displayed in the browser may still contain some criteria from a previous search, while the current form sends other search criteria. The application will not get confused, but the user might.

Highlighting the searched words

The preMatch and postMatch parameters in the configuration file (see below) define some marker strings that will be automatically inserted in the data returned by a search, surrounding each word that was mentioned in the query. These marker strings should be chosen so that they are unlikely to clash with regular data or with HTML markup : the recommended values are

  preMatch  {[
  postMatch ]}

Then you can exploit that marking within your templates by calling the "highlight" and "unhighlight" template filters, described below.

CONFIGURATION FILE

The configuration file is always stored within the htdocs directory, at the location corresponding to the application URL : so for application http://myServer/some/data.ftw, the configuration file is in

    /path/to/http/htdocs/some/data.ftw

Because of the Apache configuration directives described above, the URL is always served by File::Tabular::Web, so there is no risk of users seeing the content of the configuration file.

The configuration is written in AppConfig format. This format supports comments (starting with #), continuation lines (through final \), "heredoc" quoting style for multiline values, and section headers similar to a Windows INI file. All details about the configuration file format can be found in AppConfig::File.

Below is the list of the various recognized sections and parameters.

Global section

The global section (without any section header) can contain general-purpose parameters that can be retrieved later from the viewing templates through [% self.cfg.<param> %]; this is useful for example for setting a title or other values that will be common to all templates.

The global section may also contain some options to "new" in File::Tabular : preMatch, postMatch, avoidMatchKey, fieldSep, recordSep.

Option highlightClass defines the class name used by the "highlight" filter (default is HL).

[fixed] / [default]

The fixed and default sections simulate parameters to the request. Specifications in the fixed section are stronger than HTTP parameters; specifications in the default section are weaker : the param method for the application will first look in the fixed section, then in the HTTP request, and finally in the default section. So for example with

  [fixed]
  count=50
  [default]
  orderBy=lastname

a request like

  ?S=*&count=20

will be treated as

  ?S=*&count=50&orderBy=lastname

Relevant parameters to put in fixed or in default are described in section "S" of this documentation : for example count, orderBy, etc.
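
The lookup order can be pictured with a few lines of plain Perl. This is only a sketch of the precedence rule stated above, not the module's own code: the three hashes are hypothetical stand-ins for the [fixed] section, the HTTP request and the [default] section, and effective_param is an invented name.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the [fixed] section, the HTTP request
# parameters, and the [default] section of the example above.
my %fixed   = (count   => 50);
my %http    = (S       => '*', count => 20);
my %default = (orderBy => 'lastname');

# Look in [fixed] first, then in the HTTP request, then in [default]
sub effective_param {
    my ($name) = @_;
    for my $source (\%fixed, \%http, \%default) {
        return $source->{$name} if defined $source->{$name};
    }
    return;
}

# Prints S=*, count=50 and orderBy=lastname, one per line
printf "%s=%s\n", $_, effective_param($_) for qw(S count orderBy);
```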

[application]

dir=/some/directory

Directory where application files reside. By default : same directory as the configuration file.

name=some_name

Name of the application (will be used for example as prefix to find template files). Single-level name, no pathnames allowed.

data=some_name

Name of the tabular file containing the data. Single-level name, must be in the application directory. By default: application name with the .txt suffix appended.

class=My::File::Tabular::Web::Subclass

Will dynamically load the specified module and use it as class for objects of this application. The specified module must be a subclass of File::Tabular::Web.

useFileCache=1

If true, the whole datafile will be slurped into memory and reused across requests (except update requests).

mtime=<format>

Format to display the last modified time of the data file, using POSIX strftime(). The result will be available to templates in [% self.mtime %]

[permissions]

This section specifies permissions to perform operations within the application. Of course we need Apache to be configured to do some kind of authentication, so that the application receives a user name through the REMOTE_USER environment variable; many authentication modules are available, see Apache/manual/howto/auth.html. Otherwise the default user name received by the application is "Anonymous".

Apache may also be configured to do some kind of authorisation checking, but this will control access to the application as a whole, whereas here we configure fine-grained permissions for various operations.

Builtin permission names are : search, read, add, delete, modif, and download. Each name also has a negative counterpart, i.e. no_search, no_read, etc.

For each of those permission names, the configuration can give a list of user names separated by commas or spaces : the current user name will be compared to this list. A permission may also specify '*', which means 'everybody' : this is the default for permissions read, search and download. There is no builtin notion of "user groups", but you can introduce such a notion by writing a subclass which overrides the "user_match" method.

Permissions may also be granted or denied on a per-record basis : writing $fieldname (starting with a literal dollar sign) means that users can access records in which the content of fieldname matches their username. Usually this is associated with an automatic user field (see below), so that the user who created a new record can later modify it.

Example :

  [permissions]
   read   = * # the default, could have been omitted
   search = * # idem
   add    = andy bill 
   modif  = $last_author # username must match content of field 'last_author'
   delete = $last_author
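
To make the per-record rule concrete, here is a rough sketch of how one permission entry could be checked against a user and a record. It is written from the description above rather than from the module source: is_allowed is a hypothetical helper, and the real logic lives in the can_do and user_match methods documented further down.

```perl
use strict;
use warnings;

# Hypothetical check of one permission entry against a user and a record.
# A leading '$' names a field whose content is used as the list of
# allowed users; anything else is taken as the list itself.
sub is_allowed {
    my ($spec, $user, $record) = @_;
    return 1 if $spec eq '*';
    my $allowed = $spec =~ /^\$(\w+)$/ ? ($record->{$1} // '') : $spec;
    # Case-insensitive search for the username as a complete word
    return $allowed =~ /\b\Q$user\E\b/i ? 1 : 0;
}

my %record = (Id => 7, last_author => 'bill');
print is_allowed('andy bill',    'carol', \%record), "\n";   # 0: not in the list
print is_allowed('$last_author', 'bill',  \%record), "\n";   # 1: own record
print is_allowed('$last_author', 'andy',  \%record), "\n";   # 0: someone else's record
```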

[fields]

The fields section specifies some specific information about fields in the tabular file.

time <field> = <format>

Declares field to be a time field, which means that whenever a record is updated, the current local time will be automatically inserted in that field. The format argument will be passed to POSIX strftime(). Ex :

  time DateModif = %d.%m.%Y    
  time TimeModif = %H:%M:%S
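
To preview what such formats produce before putting them in the configuration, they can be tried directly with POSIX strftime. A small sketch; the output depends on the current local time:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# The two formats from the example above, applied to the current local time
my @now = localtime;
print strftime('%d.%m.%Y', @now), "\n";   # day.month.year
print strftime('%H:%M:%S', @now), "\n";   # hours:minutes:seconds
```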

user = <field>

Declares field to be a user field, which means that whenever a record is updated, the current username will be automatically inserted in that field.

default <field> = <value>

Default values for some fields ; will be inserted into new records.

autoNum <field>

Activates autonumbering for new records ; the number will be stored in the given field. Automatically implies that default <field> = '#'.

Subclasses may add more entries in this section (for example for specifying fields that will hold names of attached documents).

[template]

This section specifies where to find templates for various views. The specified locations will be looked for in several directories: the application template directory (as specified by dir directive, see below), the application directory, the default File::Tabular::Web template directory (as specified by the app_tmpl_default_dir method), or the subdirectory default of the above.

dir

specifies the application template directory

short

Template for the "short" display of records (typically a table for presenting search results).

long

Template for the "long" display of records (typically for a detailed presentation of a single record).

modif

Template for editing a record (typically this will be a form with an action to call the update URL ?M=key).

msg

Template for presenting special messages to the user (messages after a record update or deletion, or error messages).

home

Homepage for the application.

Defaults for these templates are <application_name>_short.tt, <application_name>_long.tt, etc.

METHODS

The only public method is the "handler" method, to be called from mod_perl or from a cgi-bin script.

All other methods are internal to the application, i.e. not meant to be called from external code. They are documented here in case you would want to subclass the package. If you don't need subclassing, you can ignore this whole section.

Methods starting with an underscore are meant to be private, i.e. should not be redefined in subclasses. All other methods are protected.

Currently we use plain old Perl inheritance and calls to SUPER. A future move to the C3 method resolution order (see Class::C3) is planned, but is not totally trivial because classes are sometimes loaded dynamically.

Entry point

handler

  File::Tabular::Web->handler;

This is the main entry point into the module. It creates a new request object, initializes it from information passed through the URL and through CGI parameters, processes the request, and generates the answer. In case of error, the page contains an error message.

Methods for creating / initializing "application" hashrefs

_app_new

Reads the configuration file for a given application and creates a hashref storing the information. The hashref is put in a global cache of all applications loaded so far.

This method should not be overridden in subclasses; if you need specific code to be executed, use the "app_initialize" method.

_app_read_config

Glueing code to the AppConfig module.

app_initialize

Initializes the application hashref. In particular, it creates the Template object, with appropriate settings to specify where to look for templates.

If you override this method in subclasses, you should probably call SUPER::app_initialize.

app_tmpl_default_dir

Returns the default directory containing templates. The default is <server_root>/lib/tmpl/ftw.

app_tmpl_filters

Returns a hashref of filters to be passed to the Template object (see Template::Filters).

The default contains two filters, which work together with the preMatch and postMatch parameters of the configuration file. Suppose the following configuration :

  preMatch  {[
  postMatch ]}

Then the filters are defined as follows :

highlight

Replaces strings of shape {[...]} by <span class="HL">...</span>.

The class name is HL by default, but another name can be defined through the highlightClass configuration parameter. Templates have to define a style for that class, like for example

  <style>
    .HL {background: lightblue}
  </style>

unhighlight

Replaces strings of shape {[...]} by ... (i.e. removes the marking).

These filters are intended to help highlighting the words matched by a search request ; usually this must happen after the data has been filtered for HTML entities. So a typical use in a template would be for example

  <a href="/some/url?with=[% record.foo | unhighlight | uri %]">
      link to [% record.foo | html | highlight %]
  </a>
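
The effect of the two filters can be reproduced with a pair of regex substitutions. The sketch below is an approximation written from the description above, not the module's actual filter code; the marker strings and class name are hard-coded, whereas the real filters take them from the configuration.

```perl
use strict;
use warnings;

# Marker strings and class name, mirroring the configuration above
my ($pre_match, $post_match, $class) = ('{[', ']}', 'HL');

# Wrap each marked word in a <span>
sub highlight {
    my ($text) = @_;
    $text =~ s{\Q$pre_match\E(.*?)\Q$post_match\E}
              {<span class="$class">$1</span>}gs;
    return $text;
}

# Remove the markers, keeping the marked word
sub unhighlight {
    my ($text) = @_;
    $text =~ s{\Q$pre_match\E(.*?)\Q$post_match\E}{$1}gs;
    return $text;
}

my $field = 'Rue de {[Geneve]} 12';
print highlight($field),   "\n";   # Rue de <span class="HL">Geneve</span> 12
print unhighlight($field), "\n";   # Rue de Geneve 12
```

Because the markers {[ and ]} contain no characters that the html filter escapes, they survive HTML escaping untouched, which is why highlight can safely run after html in the template above.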

app_phases_definition

As explained above in section "WEB API", various entry points into the application are chosen by single-letter arguments; here this method returns a table that specifies what happens for each of them.

A letter in the table is associated to a hashref, with the following keys :

pre

name of method to be executed in the "data preparation phase"

op

name of method to be executed in the "data manipulation phase"

view

name of view for displaying the results

Methods for instance creation / initialization

_new

Creates a new object, which represents an HTTP request to the application. The class for the created object is generally File::Tabular::Web, unless specified otherwise in the configuration file (see the class entry in section "CONFIGURATION FILE").

The _new method cannot be redefined in subclasses; if you need custom code to be executed, use "initialize" or "app_initialize" (both are invoked from _new).

initialize

Code to initialize the object. The default behaviour is to setup max, count and orderBy within the object hash.

_setup_phases

Reads the phases definition table and decides about what to do in the next phases.

open_data

Retrieves the name of the datafile, decides whether it should be opened for readonly or for update, and creates a corresponding File::Tabular object. The datafile may be cached in memory if directive useFileCache is activated.

_cached_content

Implementation of the memory cache; checks the modification time of the file to detect changes and invalidate the cache.
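
The general technique, keeping the file content in memory and comparing modification times before reusing it, can be sketched as follows. This illustrates the idea only: cached_content and %cache are names chosen for the example, and the module's internal data structures may well differ.

```perl
use strict;
use warnings;

my %cache;   # path => { mtime => ..., content => ... }

# Return the file content, rereading it only when the modification time
# differs from the one recorded at the last read.
sub cached_content {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    defined $mtime or die "cannot stat $path: $!";
    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $fh, '<', $path or die "cannot open $path: $!";
        local $/;                       # slurp mode
        $entry = $cache{$path} = { mtime => $mtime, content => scalar <$fh> };
        close $fh;
    }
    return $entry->{content};
}
```

A limitation of any mtime-based check is that two writes within the same second may go unnoticed on filesystems with one-second timestamp resolution.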

Methods that can be called from templates

param

  [% self.param %]

With no argument, returns the list of parameter names to the current HTTP request.

  [% self.param(param_name) %]

With an argument, returns the value that was specified under $param_name in the HTTP request, or in the configuration file (see the description of [fixed]/[default] sections). The return value is always a scalar (so this is not exactly the same as calling cgi.param(...)). If the HTTP request contains multiple values under the same name, these values are joined with a space. Initial and trailing spaces are automatically removed.

If you need to access the list of values in the HTTP request, you can always call

  [% self.cgi.param(param_name) %]

or

  [% self.APR_request.param(param_name) %]

(whichever is appropriate).
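
The joining rule matters for search forms that submit several S fields, one of which may be empty. A minimal sketch of the described behaviour, assuming the values are first joined and then trimmed (join_values is a hypothetical helper, not a method of the module):

```perl
use strict;
use warnings;

# Hypothetical helper: collapse the values submitted under one parameter
# name into a single scalar, as described for self.param above.
sub join_values {
    my @values = @_;
    my $joined = join ' ', @values;
    $joined =~ s/^\s+//;   # remove initial spaces
    $joined =~ s/\s+$//;   # remove trailing spaces
    return $joined;
}

# An empty <select> choice plus a typed word: the leading space disappears
print join_values('', 'word2'), "\n";              # word2
# A <select> choice plus a typed word
print join_values('+field1:val1', 'word2'), "\n";  # +field1:val1 word2
```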

can_do

  [% self.can_do($action, [$record]) %]

Tells whether the current user has permission to do $action (which might be 'modif', 'delete', etc.). See explanations above about how permissions are specified in the initialization file. Sometimes permissions are setup in a record-specific way (for example one data field may contain the names of authorized users); the second optional argument is meant for those cases, so that can_do() can inspect the current data record.

Request handling : general methods

_dispatch_request

Executes the various phases of request handling

display

Finds the template corresponding to the view name, gathers its output, and prints it together with some HTTP headers.

_emit_page

Internal method for printing headers and body, using API from modperl or CGI.

Request handling : search methods

search_key

Search a record with a specific key. Puts the result into $self->{result}.

Search records matching given criteria (see File::Tabular for details). Puts results into $self->{result}.

Initializes $self->{search_string}. Overridden in subclasses for more specific searching (like for example adding fulltext search into attached documents).

sort_and_slice

Choose a slice within the result set, according to pagination parameters count and start.
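
The slicing arithmetic is simple but easy to get wrong at the edges (last partial page, start beyond the end). The sketch below shows one way to compute it with 0-based start, as documented for the S entry point; slice_results is a hypothetical function, and sorting is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: pick one page out of a list of results, given the
# "start" (0-based index) and "count" (page size) parameters.
sub slice_results {
    my ($results, $start, $count) = @_;
    my $last = $#{$results};
    return [] if $start > $last;
    my $end = $start + $count - 1;
    $end = $last if $end > $last;
    return [ @{$results}[$start .. $end] ];
}

my @results = (1 .. 45);                       # pretend 45 records matched
my $page    = slice_results(\@results, 40, 20);
print "records @$page\n";                      # records 41 42 43 44 45
```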

_url_for_next_slice

Returns a URL to the next or previous slice, using "params_for_next_slice".

params_for_next_slice

Returns an array of strings "param=value" that will be inserted into the URL for next or previous slice.

words_queried

List of words found in the query string (to be used for example for highlighting those words in the display).

Update Methods

empty_record

Generates an empty record (preparation for adding a new record). Fields are filled with default values specified in the configuration file.

update

Checks for permission and then performs the update. Most probably you don't want to override this method, but rather the methods before_update or after_update.

before_update

Copies values from HTTP parameters into the record, and automatically fills the user name or current time/date in appropriate fields.

after_update

Hook for any code to perform after an update (useful for example for attached documents).

rollback_update

Hook for any code to roll back whatever was performed in before_update, in case the update failed (useful for example for attached documents).

Delete Methods

delete

Checks for permission and then performs the delete. Most probably you don't want to override this method, but rather the methods before_delete or after_delete.

before_delete

Hook for any code to perform before a delete.

after_delete

Hook for any code to perform after a delete.

Miscellaneous methods

prepare_download

Checks for permission to download the whole dataset.

Prints help. Not implemented yet.

user_match

  $self->user_match($access_control_list)

Returns true if the current user (as stored in $self->{user}) "matches" the access control list (given as an argument string).

The meaning of "matches" may be redefined in subclasses; the default implementation just performs a regex case-insensitive search within the list for a complete word equal to the username.

Override in subclasses if you need other authorization schemes (like for example dealing with groups).

key_field

Returns the name of the key field in the data file.

key

  my $key = $self->key($record);

Returns the value in the first field of the record.

AUTHOR

Laurent Dami, <laurent.d...@justice.ge.ch>

COPYRIGHT & LICENSE

Copyright 2007 Laurent Dami, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.