File::Tabular::Web - turn a tabular file into a web application
<LocationMatch "\.ftw$">
  SetHandler modperl
  PerlResponseHandler File::Tabular::Web
</LocationMatch>
cp some/data.txt /path/to/http/htdocs/some/data.txt
perl ftw_new_app.pl /path/to/http/htdocs/some/data.txt
http://myServer/some/data.ftw
# change some configuration options
edit /path/to/http/htdocs/some/data.ftw
# change the views
edit /path/to/http/htdocs/some/{data_short.tt,data_long.tt,data_edit.tt}
This is a simple Apache web application framework based on File::Tabular and Search::QueryParser. The framework offers builtin services for searching, displaying and updating a flat tabular datafile, possibly with attached documents (see File::Tabular::Web::Attachments and File::Tabular::Web::Attachments::Indexed).
The strong point of File::Tabular::Web is that it is built around a search engine designed from the start for Web requests : by default it searches for complete words, spanning all data fields. However, you can easily write queries that look in specific fields, using regular expressions, boolean combinations, arithmetic operators, etc. So if you are looking for simplicity and speed of development, rather than speed of execution, then you may have found a convenient tool.
We use it intensively in our Intranet for managing lists of people, rooms, meetings, internet pointers, etc., and even for more sensitive information like lists of payments or the archived judgements (minutes) of Geneva courts. Of course this is slower than a real database, but for data up to 10MB/50000 records, the difference is hardly noticeable. On the other hand, ease of development and deployment and ease of importing/exporting data proved to be highly valuable assets.
To build an application, all you need to do is :
Insert the data (a tabular .txt file) somewhere in your Apache htdocs tree.
Run the helper script ftw_new_app.pl, which automatically builds configuration and template files. The new URL becomes immediately active, without any webserver configuration or restart, so you already have a "scaffolding" application for searching, displaying, and maybe editing the data.
If necessary, tune various options in the configuration file, and customize the template files for presenting the data according to your needs.
In most cases, those steps will be sufficient, so they can be performed by a webmaster without Perl knowledge.
For more advanced uses, application-specific Perl subclasses can be hooked up into the framework for performing particular tasks. See for example the companion File::Tabular::Web::Attachments module, which provides services for attaching documents and indexing them through Search::Indexer, therefore providing a mini-framework for storing electronic documents.
File::Tabular::Web is designed so that it can be installed once and for all in your Apache configuration. Then all applications can be added or modified on the fly, without restarting the server.
First choose a file extension for your File::Tabular::Web applications; in the examples below we assume it to be .ftw. Then configure your Apache server in one of the ways described below.
If you have mod_perl, the easiest way is to declare File::Tabular::Web as a mod_perl handler associated with .ftw URLs, by adding to your perl.conf the LocationMatch block shown at the top of this document.
If you do not have mod_perl, create an executable file named ftw in your cgi-bin directory, containing
#!/path/to/perl
use File::Tabular::Web;
File::Tabular::Web->handler;
Then you can access your applications as
http://my.server/cgi-bin/ftw/path/to/my/app.ftw
If your Apache has the mod_actions module (most installations have it), then it is convenient to add the following directives in httpd.conf :
Action file-tabular-web /cgi-bin/ftw
AddHandler file-tabular-web .ftw
Now any file ending with ".ftw" in your htdocs tree will be treated as a File::Tabular::Web application. In other words, instead of
http://my.server/cgi-bin/ftw/path/to/my/app.ftw
you can use URL
http://my.server/path/to/my/app.ftw
As already explained, .ftw is just an arbitrary convention and can be replaced by any other suffix. Similarly, the file-tabular-web handler name can be arbitrarily replaced by another name.
[probably works like cgi-bin; not tested yet]
We'll take for example a simple people directory application.
First create directory htdocs/people.
Let's assume that you already have a list of people, in a spreadsheet or a database. Export that list into a flat text file named htdocs/people/dir.txt. If you export from an Excel Spreadsheet, do NOT export as CSV format ; choose "text (tab-separated)" instead. The datafile should contain one line per record, with a character like '|' or TAB as field separator, and field names on the first line (see File::Tabular for details).
Run the helper script
perl ftw_new_app.pl --fieldSep \\t htdocs/people/dir.txt
This will create in the same directory a configuration file dir.ftw, and a collection of HTML templates dir_short.tt, dir_long.tt, dir_modif.tt, etc. The --fieldSep option specifies which character acts as field separator (the default is '|'); other options are available, see
perl ftw_new_app.pl --help
for a list.
The URL http://your.web.server/people/dir.ftw is now available to access the application. You may first test the default layout, and then customize the templates to suit your needs.
Note : initially all files are placed in the same directory, because it is simple and convenient; however, data and template files are not really web resources and therefore theoretically should not belong to the htdocs tree. If you want a more structured architecture, you may move these files to a different location, and specify within the configuration how to find them (see instructions below).
Various entry points into the application (searching, editing, etc.) are chosen by single-letter arguments :
http://myServer/some/app.ftw?H
Displays the homepage of the application (through the home view). This is the default entry point, i.e. equivalent to
http://myServer/some/app.ftw
http://myServer/some/app.ftw?S=<criteria>
Searches records matching the specified criteria, and displays a short summary of each record (through the short view). Here are some examples of search criteria :
word1 word2 word3              # records containing these 3 words anywhere
+word1 +word2 +word3           # idem
word1 word2 -word3             # containing word1 and word2 but not word3
word1 AND (word2 OR word3)     # obvious
"word1 word2 word3"            # sequence
word*                          # word completion
field1:word1 field2:word2      # restricted by field
field1 == val1  field2 > val2  # relational operators (will inspect the
                               # shape of supplied values to decide
                               # about string/numeric/date comparisons)
field~regex                    # regex
See Search::QueryParser and File::Tabular for more details.
Additional parameters may control sorting and pagination. Ex:
?S=word&orderBy=birthdate:-d.m.y,lastname:alpha&count=20&start=40
count
How many items to display on one page. Default is 50.
start
Index within the list of results, telling which is the first record to display (zero-based).
orderBy
How to sort results. This may be one or several field names, possibly followed by a specification like :num or :-alpha. Precise syntax is documented in "cmp" in Hash::Type.
max
Maximum number of records retrieved in a search (records beyond that number will be dropped).
http://myServer/some/app.ftw?L=<key>
Finds the record with the given key and displays it in detail through the long view.
http://myServer/some/app.ftw?M=key
If called with method GET, finds the record with the given key and displays it through the modif view (typically this view will be an HTML form).
If called with method POST, finds the record with the given key and updates it with given field names and values. After update, displays an update message through the msg view.
http://myServer/some/app.ftw?A
If called with method GET, displays a form for creating a new record, through the modif view. Fields may be pre-filled by default values given in the configuration file.
If called with method POST, creates a new record, with values given by the submitted form. After record creation, displays an update message through the msg view.
http://myServer/some/app.ftw?D=<key>
Deletes record with the given key. After deletion, displays an update message through the msg view.
http://myServer/some/app.ftw?X
Displays all records through the download view (mnemonic : eXtract).
V=<view>
Name of the view (i.e. template) that will be used instead of the default one. For example, assuming that the application has defined a print view, we can call that view through
http://myServer/some/app.ftw?S=<criteria>&V=print
This section assumes that you already know how to write templates for the Template Toolkit (see Template).
The path for searching templates includes
the application directory (where the configuration file resides)
the directory specified within the configuration file by parameter [template]dir
some default directories: <server_root>/lib/tmpl/ftw/<application_name>, <server_root>/lib/tmpl/ftw/<default>, <server_root>/lib/tmpl/ftw.
self
handle to the File::Tabular::Web object; from there you can access self.url (URL of the application), self.server_root (server root directory), self.cfg (configuration information, an AppConfig object), self.mtime (modification time of the data file), self.modperl or self.cgi, and self.msg (last message). You can also call methods "can_do" or "param", like for example
[% IF self.can_do('add') %]
  <a href="?A">Add a new record</a>
[% END # IF %]
or
[% self.param('myFancyParam') %]
found
structure containing the results of a search. Fields within this structure are :
count
how many records were retrieved
records
arrayref containing a slice of records
start
index of first record in the returned slice
end
index of last record in the returned slice
next_link
href link to the next slice of results (if any)
prev_link
href link to the previous slice of results (if any)
All pages generated by the application have the same URL; query parameters control which page will be displayed. Therefore all internal links can just start with a question mark : the browser will recognize that this is a relative link to the same URL, with a different query string. So within templates we can write simple links like
<a href="?H">Homepage</a>
<a href="?S=*">See all records</a>
<a href="?A">Add a new record</a>
[% FOREACH record IN found.records %]
  <a href="?M=[% record.Id %]">Modify this record</a>
[% END # FOREACH %]
A typical form for updating or adding a record will look like
<form method="POST">
  First Name <input name="firstname" value="[% record.firstname %]"><br>
  Last Name <input name="lastname" value="[% record.lastname %]">
  <input type="submit">
</form>
Usually there is no need to specify the action of the form : the default action sent by the browser will be the same URL (including the query parameter ?A or ?M=[% record.Id %]), and when the application receives a POST request, it knows it has to update or add the record instead of displaying the form. This implies that you must use the POST method for any data modification; whereas forms for searching may use either GET or POST methods.
For convenience, deletion through a GET URL of shape ?D=[% record.Id %] is supported; however, data modification through the GET method is not recommended, and therefore it is preferable to write
<form method="post">
  <input name="D" value="[% record.Id %]">
  <input type="submit" value="Delete this record">
</form>
A typical form for searching will look like
<form method="POST" action="[% self.url %]">
  Search : <select name="S">
    <option value="">--Choose in field1--</option>
    <option value="+field1:val1">val1</option>
    <option value="+field1:val2">val2</option>
    ...
  </select>
  Other : <input name="S">
  <input type="submit">
</form>
So the form can combine several search criteria, all passed through the S parameter. The form method can be either GET or POST; but if you choose POST, then it is recommended that you also specify
action="[% self.url %]"
instead of relying on the implicit self-url from the browser. Otherwise the URL displayed in the browser may still contain some criteria from a previous search, while the current form sends other search criteria; the application will not get confused, but the user might.
The preMatch and postMatch parameters in the configuration file (see below) define some marker strings that will be automatically inserted in the data returned by a search, surrounding each word that was mentioned in the query. These marker strings should be chosen so that they are unlikely to collide with regular data or with HTML markup : the recommended values are
preMatch {[
postMatch ]}
Then you can exploit that marking within your templates by calling the "highlight" and "unhighlight" template filters, described below.
The configuration file is always stored within the htdocs directory, at the location corresponding to the application URL : so for application http://myServer/some/data.ftw, the configuration file is in
/path/to/http/htdocs/some/data.ftw
Because of the Apache configuration directives described above, the URL is always served by File::Tabular::Web, so there is no risk of users seeing the content of the configuration file.
The configuration is written in AppConfig format. This format supports comments (starting with #), continuation lines (through final \), "heredoc" quoting style for multiline values, and section headers similar to a Windows INI file. All details about the configuration file format can be found in AppConfig::File.
Below is the list of the various recognized sections and parameters.
The global section (without any section header) can contain general-purpose parameters that can be retrieved later from the viewing templates through [% self.cfg.<param> %]; this is useful for example for setting a title or other values that will be common to all templates.
The global section may also contain some options to "new" in File::Tabular : preMatch, postMatch, avoidMatchKey, fieldSep, recordSep.
Option highlightClass defines the class name used by the "highlight" filter (default is HL).
The fixed and default sections simulate parameters to the request. Specifications in the fixed section are stronger than HTTP parameters; specifications in the default section are weaker : the param method for the application will first look in the fixed section, then in the HTTP request, and finally in the default section. So for example with
[fixed]
count=50
[default]
orderBy=lastname
a request like
?S=*&count=20
will be treated as
?S=*&count=50&orderBy=lastname
Relevant parameters to put in fixed or in default are described in section "S" of this documentation : for example count, orderBy, etc.
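The lookup order can be pictured with a small standalone sketch. It is not the module's actual code: the hashes %fixed and %default and the function lookup_param are hypothetical stand-ins for the two configuration sections and for the param method.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the [fixed] and [default] sections shown above
my %fixed   = (count   => 50);
my %default = (orderBy => 'lastname');

# Lookup order: [fixed] first, then the HTTP request, then [default]
sub lookup_param {
    my ($name, $http) = @_;
    for my $source (\%fixed, $http, \%default) {
        return $source->{$name} if defined $source->{$name};
    }
    return;
}

# Parameters as they would arrive from ?S=*&count=20
my %http = (S => '*', count => 20);

my $effective = join '&',
    map { "$_=" . lookup_param($_, \%http) } qw(S count orderBy);
print "$effective\n";    # S=*&count=50&orderBy=lastname
```

The count sent by the client is ignored because the fixed section wins, while orderBy is only filled in because the request did not supply one.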
dir=/some/directory
Directory where application files reside. By default : same directory as the configuration file.
name=some_name
Name of the application (will be used for example as prefix to find template files). Single-level name, no pathnames allowed.
data=some_name
Name of the tabular file containing the data. Single-level name, must be in the application directory. By default: application name with the .txt suffix appended.
class=My::File::Tabular::Web::Subclass
Will dynamically load the specified module and use it as class for objects of this application. The specified module must be a subclass of File::Tabular::Web.
useFileCache=1
If true, the whole datafile will be slurped into memory and reused across requests (except update requests).
mtime=<format>
Format to display the last modified time of the data file, using POSIX strftime(). The result will be available to templates in [% self.mtime %]
This section specifies permissions to perform operations within the application. Of course we need Apache to be configured to do some kind of authentication, so that the application receives a user name through the REMOTE_USER environment variable; many authentication modules are available, see Apache/manual/howto/auth.html. Otherwise the default user name received by the application is "Anonymous".
Apache may also be configured to do some kind of authorisation checking, but this will control access to the application as a whole, whereas here we configure fine-grained permissions for various operations.
Builtin permission names are : search, read, add, delete, modif, and download. Each name also has a negative counterpart, i.e. no_search, no_read, etc.
For each of those permission names, the configuration can give a list of user names separated by commas or spaces : the current user name will be compared to this list. A permission may also specify '*', which means 'everybody' : this is the default for permissions read, search and download. There is no builtin notion of "user groups", but you can introduce such a notion by writing a subclass which overrides the "user_match" method.
Permissions may also be granted or denied on a per-record basis : writing $fieldname (starting with a literal dollar sign) means that users can access records in which the content of fieldname matches their username. Usually this is associated with an automatic user field (see below), so that the user who created a new record can later modify it.
Example :
[permissions]
read   = *     # the default, could have been omitted
search = *     # idem
add    = andy bill
modif  = $last_author # username must match content of field 'last_author'
delete = $last_author
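As a rough illustration of how one permission line might be evaluated, here is a standalone sketch. The function is_permitted is hypothetical, and comparing the field content to the user name by case-insensitive equality is an assumption; the real module delegates the comparison to its user_match method.

```perl
use strict;
use warnings;

# Sketch: does $user satisfy one permission line such as "andy bill"
# or "$last_author"? $record is a hashref of field values (may be undef).
sub is_permitted {
    my ($user, $acl, $record) = @_;
    for my $item (split /[\s,]+/, $acl) {
        next unless length $item;
        return 1 if $item eq '*';                 # everybody
        if ($item =~ /^\$(\w+)$/) {               # per-record rule
            my $owner = $record ? $record->{$1} : undef;
            return 1 if defined $owner && lc($owner) eq lc($user);
        }
        elsif (lc($item) eq lc($user)) {          # plain user name
            return 1;
        }
    }
    return 0;
}

my %record = (last_author => 'bill');
print is_permitted('bill', '$last_author', \%record) ? "yes\n" : "no\n";  # yes
print is_permitted('andy', '$last_author', \%record) ? "yes\n" : "no\n";  # no
print is_permitted('andy', 'andy bill')              ? "yes\n" : "no\n";  # yes
```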
The fields section specifies some specific information about fields in the tabular file.
time <field> = <format>
Declares field to be a time field, which means that whenever a record is updated, the current local time will be automatically inserted in that field. The format argument will be passed to POSIX strftime(). Ex :
time DateModif = %d.%m.%Y
time TimeModif = %H:%M:%S
user = <field>
Declares field to be a user field, which means that whenever a record is updated, the current username will be automatically inserted in that field.
default <field> = <value>
Default values for some fields ; will be inserted into new records.
autoNum <field>
Activates autonumbering for new records ; the number will be stored in the given field. Automatically implies that default <field> = '#'.
Subclasses may add more entries in this section (for example for specifying fields that will hold names of attached documents).
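To see what the time formats from the example above produce, the following standalone sketch passes them to POSIX strftime(). The %record hash is a hypothetical stand-in for a data record; the field names DateModif and TimeModif come from the example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical record; in the application these fields are filled
# automatically on each update.
my %record;
my @now = localtime;

$record{DateModif} = strftime('%d.%m.%Y', @now);   # day.month.year
$record{TimeModif} = strftime('%H:%M:%S', @now);   # hours:minutes:seconds

print "$record{DateModif} $record{TimeModif}\n";
```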
This section specifies where to find templates for various views. The specified locations will be looked for in several directories: the application template directory (as specified by dir directive, see below), the application directory, the default File::Tabular::Web template directory (as specified by the app_tmpl_default_dir method), or the subdirectory default of the above.
dir
Specifies the application template directory.
short
Template for the "short" display of records (typically a table for presenting search results).
long
Template for the "long" display of records (typically for a detailed presentation of a single record).
modif
Template for editing a record (typically this will be a form with an action to call the update URL ?M=key).
msg
Template for presenting special messages to the user (messages after a record update or deletion, or error messages).
home
Homepage for the application.
Defaults for these templates are <application_name>_short.tt, <application_name>_long.tt, etc.
The only public method is the "handler" method, to be called from mod_perl or from a cgi-bin script.
All other methods are internal to the application, i.e. not meant to be called from external code. They are documented here in case you would want to subclass the package. If you don't need subclassing, you can ignore this whole section.
Methods starting with an underscore are meant to be private, i.e. should not be redefined in subclasses. All other methods are protected.
Currently we use plain old Perl inheritance and calls to SUPER. A future move to the C3 method resolution order (see Class::C3) is planned, but is not totally trivial because classes are sometimes loaded dynamically.
File::Tabular::Web->handler;
This is the main entry point into the module. It creates a new request object, initializes it from information passed through the URL and through CGI parameters, processes the request, and generates the answer. In case of error, the page contains an error message.
Reads the configuration file for a given application and creates a hashref storing the information. The hashref is put in a global cache of all applications loaded so far.
This method should not be overridden in subclasses; if you need specific code to be executed, use the "app_initialize" method.
Glueing code to the AppConfig module.
Initializes the application hashref. In particular, it creates the Template object, with appropriate settings to specify where to look for templates.
If you override this method in subclasses, you should probably call SUPER::app_initialize.
Returns the default directory containing templates. The default is <server_root>/lib/tmpl/ftw.
Returns a hashref of filters to be passed to the Template object (see Template::Filters).
The default contains two filters, which work together with the preMatch and postMatch parameters of the configuration file. Suppose the configuration uses the recommended markers, i.e. preMatch {[ and postMatch ]}.
Then the filters are defined as follows :
highlight
Replaces strings of shape {[...]} by <span class="HL">...</span>.
The class name is HL by default, but another name can be defined through the highlightClass configuration parameter. Templates have to define a style for that class, like for example
<style>
  .HL {background: lightblue}
</style>
unhighlight
Replaces strings of shape {[...]} by ... (i.e. removes the marking).
These filters are intended to help highlighting the words matched by a search request ; usually this must happen after the data has been filtered for HTML entities. So a typical use in a template would be for example
<a href="/some/url?with=[% record.foo | unhighlight | uri %]">
  link to [% record.foo | html | highlight %]
</a>
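The effect of the two filters can be approximated in plain Perl. This is only a sketch, assuming the recommended markers (preMatch {[ and postMatch ]}); the function bodies are illustrative and are not taken from the module.

```perl
use strict;
use warnings;

# Assumes the recommended markers: preMatch "{[" and postMatch "]}"
sub highlight {
    my ($text, $class) = @_;
    $class = 'HL' unless defined $class;
    # Wrap each marked word in a span carrying the highlight class
    $text =~ s!\{\[(.*?)\]\}!<span class="$class">$1</span>!g;
    return $text;
}

sub unhighlight {
    my ($text) = @_;
    # Drop the markers, keep the word itself
    $text =~ s!\{\[(.*?)\]\}!$1!g;
    return $text;
}

print highlight('Jean {[Dupont]}'), "\n";    # Jean <span class="HL">Dupont</span>
print unhighlight('Jean {[Dupont]}'), "\n";  # Jean Dupont
```

Because the markers contain no characters touched by HTML escaping, the html filter can safely run before highlight, as in the template above.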
As explained above in section "WEB API", various entry points into the application are chosen by single-letter arguments; here this method returns a table that specifies what happens for each of them.
A letter in the table is associated to a hashref, with the following keys :
name of method to be executed in the "data preparation phase"
name of method to be executed in the "data manipulation phase"
name of view for displaying the results
Creates a new object, which represents an HTTP request to the application. The class for the created object is generally File::Tabular::Web, unless specified otherwise in the configuration file (see the class entry in section "CONFIGURATION FILE").
The _new method cannot be redefined in subclasses; if you need custom code to be executed, use "initialize" or "app_initialize" (both are invoked from _new).
Code to initialize the object. The default behaviour is to set up max, count and orderBy within the object hash.
Reads the phases definition table and decides about what to do in the next phases.
Retrieves the name of the datafile, decides whether it should be opened for readonly or for update, and creates a corresponding File::Tabular object. The datafile may be cached in memory if directive useFileCache is activated.
Implementation of the memory cache; checks the modification time of the file to detect changes and invalidate the cache.
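The general technique of invalidating a cache by modification time can be sketched as follows. This is a generic illustration, not the module's implementation; the %cache hash and the cached_content function are hypothetical names.

```perl
use strict;
use warnings;

my %cache;    # path => { mtime => ..., content => ... }

sub cached_content {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    defined $mtime or die "cannot stat $path: $!";

    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        # First access, or the file changed on disk: reload it
        open my $fh, '<', $path or die "cannot open $path: $!";
        my $content = do { local $/; <$fh> };
        close $fh;
        $entry = $cache{$path} = { mtime => $mtime, content => $content };
    }
    return $entry->{content};
}
```

Each call costs one stat, which is far cheaper than rereading the datafile for every request.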
[% self.param %]
With no argument, returns the list of parameter names to the current HTTP request.
[% self.param(param_name) %]
With an argument, returns the value that was specified under $param_name in the HTTP request, or in the configuration file (see the description of [fixed]/[default] sections). The return value is always a scalar (so this is not exactly the same as calling cgi.param(...)). If the HTTP request contains multiple values under the same name, these values are joined with a space. Initial and trailing spaces are automatically removed.
If you need to access the list of values in the HTTP request, you can always call
[% self.cgi.param(param_name) %]
[% self.APR_request.param(param_name) %]
(whichever is appropriate).
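The scalar flattening described for param (multiple values joined with a space, surrounding whitespace removed) amounts to something like the sketch below; scalar_param is a hypothetical helper name, not part of the module's API.

```perl
use strict;
use warnings;

# Joins all values received under one parameter name into a single
# scalar, then trims leading and trailing whitespace.
sub scalar_param {
    my @values = @_;
    my $value = join ' ', @values;
    $value =~ s/^\s+//;
    $value =~ s/\s+$//;
    return $value;
}

# Two S inputs from the same search form end up as one query string
print scalar_param('  +field1:val1', 'dupont  '), "\n";   # +field1:val1 dupont
```

This is why a search form can contain several inputs named S: their values are simply combined into one query.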
[% self.can_do($action, [$record]) %]
Tells whether the current user has permission to do $action (which might be 'modif', 'delete', etc.). See explanations above about how permissions are specified in the initialization file. Sometimes permissions are set up in a record-specific way (for example one data field may contain the names of authorized users); the second optional argument is meant for those cases, so that can_do() can inspect the current data record.
Executes the various phases of request handling
Finds the template corresponding to the view name, gathers its output, and prints it together with some HTTP headers.
Internal method for printing headers and body, using API from modperl or CGI.
Search a record with a specific key. Puts the result into $self->{result}.
Search records matching given criteria (see File::Tabular for details). Puts results into $self->{result}.
Initializes $self->{search_string}. Overridden in subclasses for more specific searching (like for example adding fulltext search into attached documents).
Choose a slice within the result set, according to pagination parameters count and start.
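A minimal sketch of such slicing, assuming zero-based indices and a result structure shaped like the found structure available to templates (count, records, start, end). The function slice_results is hypothetical, and the module's own bookkeeping may differ.

```perl
use strict;
use warnings;

# $records: arrayref of all matching records
# $start:   zero-based index of the first record to show
# $size:    page size (the "count" request parameter)
sub slice_results {
    my ($records, $start, $size) = @_;
    my $total = @$records;
    my $end   = $start + $size - 1;
    $end = $total - 1 if $end > $total - 1;    # clip the last page
    my @slice = $start <= $end ? @{$records}[$start .. $end] : ();
    return {
        count   => $total,
        start   => $start,
        end     => $end,
        records => \@slice,
    };
}

my $found = slice_results([1 .. 95], 40, 20);
print "$found->{start}..$found->{end} of $found->{count}\n";   # 40..59 of 95
```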
Returns a URL to the next or previous slice, using "params_for_next_slice".
Returns an array of strings "param=value" that will be inserted into the URL for next or previous slice.
List of words found in the query string (to be used for example for highlighting those words in the display).
Generates an empty record (preparation for adding a new record). Fields are filled with default values specified in the configuration file.
Checks for permission and then performs the update. Most probably you don't want to override this method, but rather the methods before_update or after_update.
Copies values from HTTP parameters into the record, and automatically fills the user name or current time/date in appropriate fields.
Hook for any code to perform after an update (useful for example for attached documents).
Hook for any code to roll back whatever was performed in before_update, in case the update failed (useful for example for attached documents).
Checks for permission and then performs the delete. Most probably you don't want to override this method, but rather the methods before_delete or after_delete.
Hook for any code to perform before a delete.
Hook for any code to perform after a delete.
Checks for permission to download the whole dataset.
Prints help. Not implemented yet.
$self->user_match($access_control_list)
Returns true if the current user (as stored in $self->{user}) "matches" the access control list (given as an argument string).
The meaning of "matches" may be redefined in subclasses; the default implementation just performs a regex case-insensitive search within the list for a complete word equal to the username.
Override in subclasses if you need other authorization schemes (like for example dealing with groups).
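The default matching rule described above (a case-insensitive search for a complete word equal to the username) can be approximated by a one-line regex. This is a sketch of the idea rather than the module's source; user_in_list is a hypothetical function name.

```perl
use strict;
use warnings;

# True if $user appears as a complete word in $acl, ignoring case.
# \Q...\E protects any regex metacharacters in the user name.
sub user_in_list {
    my ($user, $acl) = @_;
    return $acl =~ /\b\Q$user\E\b/i ? 1 : 0;
}

print user_in_list('Andy', 'andy, bill'), "\n";   # 1
print user_in_list('and',  'andy, bill'), "\n";   # 0 (not a complete word)
```

A subclass handling groups could expand group names into member lists before applying the same test.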
Returns the name of the key field in the data file.
my $key = $self->key($record);
Returns the value in the first field of the record.
Laurent Dami, <laurent.d...@justice.ge.ch>
Copyright 2007 Laurent Dami, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.