Wallflower::Tutorial - Turn your Plack app into a static site
version 1.015
# do cool stuff using Plack-aware frameworks,
# to generate static web sites
Static websites have many advantages when serving data that rarely changes:
speed
security
deployment by copy
Dynamic websites have more to do with JavaScript than with POST or query strings. JavaScript code run on the client can take advantage of JSON files containing useful data. Updates to the data do not necessarily have to be performed on the public side of the site.
The point of saving URLs from a PSGI application to files is not simply to make a static version of a fully dynamic site.
Unlike many static website generators, wallflower does not enforce any conventions on its users.
wallflower makes it very easy to write a static website using the web framework of your choice.
To make the most of a static website, there are a few simple rules to follow, listed below:
A PSGI/Plack application can specify the Content-Type returned for any URL content.
On the other hand, static servers use the file extension to decide what Content-Type to send for a given file. When unable to decide what the file content is, servers usually send application/octet-stream as the Content-Type.
Since the goal of a Wallflower application is to generate a static website, all URL pathnames should have an extension.
Similarly, wallflower uses the Content-Type header of a response to decide whether to scan its body for links, since links only make sense in HTML and CSS files.
If the response has no Content-Type header, wallflower will miss some of the links, and thus not properly crawl the whole site.
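The extension rule can be illustrated with a small shell sketch of how a static server might choose a Content-Type from a file name. The function name `content_type_for` and the short mapping table are assumptions made for this illustration; real servers ship much larger MIME tables and may be configured differently.

```shell
#!/bin/sh
# Hypothetical helper: guess a Content-Type from a file name,
# the way a static server would (illustrative table only).
content_type_for() {
    case "$1" in
        *.html|*.htm) echo "text/html" ;;
        *.css)        echo "text/css" ;;
        *.js)         echo "application/javascript" ;;
        *.json)       echo "application/json" ;;
        *.txt)        echo "text/plain" ;;
        # no recognizable extension: fall back to the generic binary type
        *)            echo "application/octet-stream" ;;
    esac
}

content_type_for about.html   # text/html
content_type_for style.css    # text/css
content_type_for about        # application/octet-stream
```

A page saved as `about` would likely be offered as a download rather than rendered, which is why giving every URL an extension matters.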
Because the website can be written with any modern web framework, it's also very easy to have URLs that reply to POST requests. Obviously, these pages cannot be saved on the static destination.
It's very easy to use the Plack environment to decide which pages to enable. By default, plackup sets the Plack environment to development, while wallflower sets it to deployment.
It's therefore possible to make the dynamic parts of the application available only in the development environment, while only the static elements are reachable from the deployment environment.
Since version 1.002, if the target file exists, wallflower will send the If-Modified-Since header with its modification date. If the application sends a 304 Not Modified response, the target file will not be modified.
As of version 1.008, if the application sends the Last-Modified header, it will be used to set the modification time of the target file.
As of version 1.014, if the 304 response contains a Content-Type header, and wallflower was run with the --follow option, it will look for more links to follow in the existing file.
Note that since wallflower works without using the network, this feature is only really useful if the response generation takes time and it's possible to decide to send a 304 Not Modified without computing the entire response content.
See Plack::Middleware::ConditionalGET for basic support of 304 responses.
By default, wallflower starts by loading /, and automatically and repeatedly follows all URLs found in HTML and CSS documents, until all reachable URLs have been processed.
The simplest way to generate the full static site is to make sure that all URLs can be reached by repeatedly following links from the root.
When using the --url option to "mount" your application under some path, don't forget to call wallflower with that path as the root:
wallflower --application myapp.psgi --url http://localhost/myapp /myapp/
If not all of the application's URLs are reachable from the root, it is possible to pass wallflower a list of starting points (one per line), using the --filter option:
wallflower --application myapp.psgi --filter urls.txt
Note that the --filter option turns wallflower into a "filter" command: it takes a list of files as arguments, and if no file is given, it reads the list of URLs from standard input.
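As a minimal sketch of preparing such a list, the script below writes a few starting points to a temporary file and passes it to wallflower. The URL paths are hypothetical, `myapp.psgi` is the example application name used above, and the wallflower call is guarded so the script still runs where wallflower or the application is absent.

```shell
#!/bin/sh
# Write the starting points, one URL per line (hypothetical paths).
urls_file=$(mktemp)
cat > "$urls_file" <<'EOF'
/
/hidden/page.html
/data/catalog.json
EOF

# Count the non-empty lines, i.e. the URLs that will be requested.
url_count=$(grep -c . "$urls_file")
echo "starting points: $url_count"

# Only call wallflower if it is installed and the example app exists.
if command -v wallflower >/dev/null 2>&1 && [ -f myapp.psgi ]; then
    wallflower --application myapp.psgi --filter "$urls_file"
else
    echo "wallflower or myapp.psgi not found, skipping the run"
fi

rm -f "$urls_file"
```

Keeping the list in a file makes it easy to generate from another source, such as a database export of pages that are not linked from the root.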
A Plack application deals with URLs. Nothing prevents the application from treating /thunk and /thunk/ differently and returning different content bodies for them.
From the perspective of a filesystem, however, if thunk is a directory, then it is semantically equivalent to thunk/. And it is impossible to have both a file and a directory with the same name inside the same parent directory.
When dealing with a static site, the server maps the URLs to the filesystem. If thunk is a directory, the server usually redirects the client to thunk/ (using a 301 Moved Permanently response). Then the default file is picked for the content, traditionally thunk/index.html.
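The mapping described above can be sketched as a small shell function. The name `url_to_path` is hypothetical, and the function only illustrates the traditional index.html convention; it is not a description of how wallflower or any particular server implements it.

```shell
#!/bin/sh
# Hypothetical helper: map a URL path to a file name relative to the
# document root, using index.html as the default file for directories.
url_to_path() {
    path="${1#/}"                           # strip the leading slash
    case "$path" in
        ""|*/) path="${path}index.html" ;;  # "directory" URL: add the default file
    esac
    printf '%s\n' "$path"
}

url_to_path /                # index.html
url_to_path /thunk/          # thunk/index.html
url_to_path /thunk/pam.html  # thunk/pam.html
url_to_path /thunk           # thunk (file or directory? the path alone cannot tell)
```

The last call shows the ambiguity: without a trailing slash or an extension, the URL gives no hint about whether `thunk` should be a file or a directory.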
Because Wallflower does not know the conventions used by the Plack application it calls, it cannot decide if /thunk should be understood as a "file" or as a "directory" when generating the file name that will receive the content for that URL.
So, if your application treats /thunk and /thunk/ as identical, you should:
Always request /thunk/ when providing a list of URLs to wallflower. Otherwise you might end up with the thunk file already existing when it tries to create files under the thunk/ directory, or vice-versa.
Make sure that links provided by your application are always to /thunk/, so that when wallflower follows links automatically, we're back to the previous case.
Try to make the application reply with status 301 Moved Permanently to requests for /thunk. This is what a well-behaved web server serving static pages will do when a user-agent requests a "directory" without a final /.
At this point, you know that an application that treats /thunk ("file") and /thunk/ ("directory") differently will not work with wallflower.
wallflower will show a status code of 999 (not a valid HTTP status) in the following two cases:
when trying to create thunk/pam after having created a thunk file (warning: Can't open thunk/pam for writing: Not a directory),
when trying to create thunk after having created a thunk/ directory (warning: Can't open thunk for writing: Is a directory).
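Both conflicts come from the filesystem itself and can be reproduced without wallflower. The following sketch does so in a scratch directory; the variable names are illustrative, and the exact error text printed by a shell differs from wallflower's warning.

```shell
#!/bin/sh
# Reproduce the two conflicts in a scratch directory (no wallflower needed).
work=$(mktemp -d)
cd "$work" || exit 1

# Case 1: thunk exists as a file, so nothing can be created beneath it.
: > thunk
if ( : > thunk/pam ) 2>/dev/null; then case1=created; else case1=failed; fi
rm -f thunk

# Case 2: thunk exists as a directory, so it cannot be opened as a file.
mkdir thunk
if ( : > thunk ) 2>/dev/null; then case2=created; else case2=failed; fi

echo "case 1 (thunk/pam under a thunk file): $case1"
echo "case 2 (thunk over a thunk directory): $case2"

# Clean up the scratch directory.
cd / && rm -rf "$work"
```

Both attempts fail regardless of the order in which the URLs are requested, so the only fix is to make the application use one form consistently.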
See also the section "All URLs should have an extension", for why you should avoid extensionless URLs.
A few articles about wallflower have been published, and are listed below:
http://blogs.perl.org/users/book1/2012/09/sorry-i-cant-dance-im-holding-on-to-my-friends-purse.html
The wallflower announcement on http://blogs.perl.org/.
http://perladvent.pm.org/2012/2012-12-22.html
Wallflower in the Perl Advent Calendar 2012.
https://www.perl.com/article/102/2014/7/15/Generate-static-websites-from-dynamic-Perl-web-apps/
A presentation of Wallflower by one of its users.
https://www.perl.com/article/103/2014/7/22/Create-static-web-apps-with-Wget/
A comparison of using Wget and Wallflower to generate static web sites.
If you're interested in static web site generators, you can check:
https://github.com/bevry/staticsitegenerators-list
A list maintained by the open-company Bevry.
https://staticsitegenerators.net/
The definitive listing of Static Site Generators (according to the site).
Philippe Bruhat (BooK) <book@cpan.org>
Copyright 2012-2018 by Philippe Bruhat (BooK).
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.