Mon 3 Sep 2007
I am currently working with a major non-profit organization to create a mirror of their current website, which happens to be built with Kintera. Much has been said on the web about the usability of Kintera, and my experience is no better (even though I am simply trying to use the output Kintera produces). I have come to realize that there must be serious flaws in the software, the training process, or the way it is used for it to produce such bad output. It is very hard for any software to process the website content in any sort of systematic order; I tried a multitude of programs that claim to make offline copies of websites, and they all failed miserably. Let me restate: I wasn’t able to make a working, offline-browsable copy of this website using any commercial software. These programs work for most other sites, but not for this Kintera-produced site.
Anyway, a task that was supposed to take a few hours took a week to get right. After struggling for a while to stabilize the absolute links, I decided to give up and use a technique that I had used very recently in a J2EE project. In that project, I created a PageSnapshotFilter by extending ServletOutputStream and HttpServletResponseWrapper; this yielded an output filter that captures snapshots of a web page (served by the application) as the user would see it. That made it possible to store the captured snapshot for auditing and to email it to interested parties without having to create it again using String concatenation or any additional template processing.
There are many ways to do the same in the LAMP world; Apache2::Filter comes to mind. Apache2::Filter would probably be the most logical choice for heavy-duty usage (second only to my initial approach of stabilizing links and modifying the mirrored files); after some initial investigation, I decided to go down another path:
It turns out that PHP and mod_rewrite let you achieve the same result much faster (as in wiring it up; Apache2::Filter would win hands down in a sheer speed test) and cheaper :-).
Here is how:
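1) Configure Apache to serve all htm, html, asp, and aspx files via filter.php (add others, if needed). The leading slash in the pattern assumes the rule lives in the server or virtual host configuration; in a .htaccess file the slash would have to be dropped.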
# Site pages
RewriteEngine on
RewriteRule ^/(.*\.(htm|html|asp|aspx)) /filter.php [L]
2) Create filter.php by customizing this (filterphp.txt) :
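For a rough idea of what such a filter can look like, here is a minimal sketch. It is not the linked filterphp.txt. It assumes the mirrored files sit under the Apache document root and that the only fix needed is swapping the source host for the mirror host in absolute links. The host names and the helper rewrite_links() are made up for illustration.

```php
<?php
// filter.php: a hypothetical sketch, not the original filterphp.txt.

// Assumed values: replace with your own host names (and port).
const SOURCE_HOST = 'www.example.org';   // host the mirrored pages still link to
const TARGET_HOST = 'localhost:8080';    // host (and port) that serves this mirror

/**
 * Point absolute links at the mirror instead of the source site.
 * Relative links are left untouched.
 */
function rewrite_links(string $html, string $source, string $target): string
{
    // Match http:// or https:// followed by the source host, ignoring case.
    $pattern = '#https?://' . preg_quote($source, '#') . '#i';
    return preg_replace($pattern, 'http://' . $target, $html);
}

// Serve a page only when running under a web server, so the function
// above can also be loaded and tried from the command line.
if (PHP_SAPI !== 'cli') {
    // After mod_rewrite's internal rewrite, REQUEST_URI still holds the
    // path the browser originally asked for.
    $root = realpath($_SERVER['DOCUMENT_ROOT']);
    $path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $file = realpath($root . urldecode($path));

    // Refuse anything missing or resolving outside the document root.
    if ($file === false || strpos($file, $root . DIRECTORY_SEPARATOR) !== 0) {
        http_response_code(404);
        exit('Not found');
    }

    header('Content-Type: text/html');
    echo rewrite_links(file_get_contents($file), SOURCE_HOST, TARGET_HOST);
}
```

With these assumed values, a mirrored page containing a link to http://www.example.org/about.html would be served with that link pointing at http://localhost:8080/about.html, while the file on disk stays unchanged.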
That’s it! After you customize the source and target patterns to match your host names (and ports), you will have a working mirror website that is an exact replica of the original website.
