URL Scraping
2007-11-16
I was looking for a way to create packaged versions of Curly Logo. For example, it would be nice if you could create a version of Curly Logo that already has your favourite SQUARE and FATPEN procedures defined. One way would be to take the served pages (some XHTML and JavaScript), edit them, and host them yourself. How tedious.
Then I thought that Curly Logo could examine its URL, see whether it contained a suitably encoded Logo program, and execute the program if so. This is useful because «http://www.amberfrog.com/logo/», «http://www.amberfrog.com/logo/?foo», and «http://www.amberfrog.com/logo/#foo» are different URLs that happen to load the same resource. Because it’s exactly the same resource, my server can serve cached copies, and your browser can even use the ETag header to avoid downloading the same object again. Wins all round.
So Curly Logo applies «decodeURI» to everything after the ‘#’ in its URL (available in «window.location») and executes the result as Logo, just as if it had been typed in. I call this technique URL scraping.
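Here is a minimal sketch of the scraping step, assuming the program sits «encodeURI»-encoded after the first ‘#’. It is not Curly Logo’s actual code: the function «scrapeProgram» is hypothetical, and «runLogo» is a made-up stand-in for the interpreter’s entry point. It takes a plain string so it can run outside a browser; in the page you would pass «window.location.href».

```javascript
// Sketch of URL scraping: return the decoded Logo program found after the
// first '#' of a URL string, or null when there is nothing to run.
// scrapeProgram is a hypothetical name, not Curly Logo's real API.
function scrapeProgram(href) {
  const hashAt = href.indexOf('#');
  if (hashAt === -1) return null;           // no fragment at all
  const encoded = href.slice(hashAt + 1);
  if (encoded === '') return null;          // a bare '#'
  try {
    return decodeURI(encoded);              // same decoder the post describes
  } catch (e) {
    if (e instanceof URIError) return null; // malformed %-escape: ignore it
    throw e;
  }
}

// Hypothetical browser wiring: runLogo stands in for the interpreter's entry
// point, whose real name is not given. Outside a browser this is skipped.
if (typeof window !== 'undefined' && typeof runLogo === 'function') {
  const program = scrapeProgram(window.location.href);
  if (program !== null) runLogo(program);
}
```

Because «decodeURI» (unlike «decodeURIComponent») leaves escapes such as «%23» and «%3A» undecoded, this round-trips cleanly only with programs packed using «encodeURI», which never produces those escapes for ‘#’ or ‘:’ in the first place.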
The pedagogic value for Curly Logo is obvious. It means I can supply versions of Curly Logo that show a simple triangle being drawn, how to draw glasses on Ingrid Bergman, and how to draw a rainbow-ripple star. All inside a (quite long) URL; no server change necessary. These custom versions of Curly Logo are bookmarkable and exchangeable (via e-mail, chat, and so on).
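Making such a custom version is the reverse operation: encode a program and append it after the ‘#’. This is a sketch assuming the page decodes the fragment with «decodeURI» as described above; «packageUrl» is a hypothetical helper, and the triangle program is an illustrative example rather than one of the actual links.

```javascript
// Sketch: build a "packaged" Curly Logo URL by appending an encodeURI-encoded
// Logo program after '#'. packageUrl is a hypothetical helper name.
function packageUrl(base, program) {
  // Drop any existing fragment so the URL carries exactly one program.
  const hashAt = base.indexOf('#');
  const root = hashAt === -1 ? base : base.slice(0, hashAt);
  return root + '#' + encodeURI(program);
}

// Illustrative triangle program (standard Logo, not taken from the post).
const triangle = packageUrl('http://www.amberfrog.com/logo/', 'REPEAT 3 [FD 100 RT 120]');
console.log(triangle);
// → http://www.amberfrog.com/logo/#REPEAT%203%20%5BFD%20100%20RT%20120%5D
```

Spaces and brackets become «%20», «%5B», and «%5D», which is why even short programs make for long URLs.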
Aside: Firefox makes the turtle bug-eyed when I use a «#» in the URL. I have no idea why. Surely it’s a bug?
The advantage of URL scraping is that nothing changes on the server, so everyone benefits from caching, and in some cases it also opens up the possibility of serving static HTML (as I do). There’s no reason the JavaScript doing the scraping has to use the «#» part of the URL; it was just convenient for me. You could have a server-side rewrite rule that maps /foo/red/ and /foo/blue/ to the same object (thereby keeping the benefits of caching) and have the JavaScript examine the last directory component of the URL. It could, for example, use that value to pick a CSS theme, so different URLs give your users different themes on the same site while the server transmits the same object in either case. The possibilities are endless.
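A rough sketch of the theme idea, assuming the rewrite rule is already in place so /foo/red/ and /foo/blue/ serve the same page. The theme names, the default, and the «/themes/» stylesheet path are all hypothetical.

```javascript
// Sketch: pick a CSS theme from the last directory component of the path.
// THEMES, DEFAULT_THEME and the /themes/ path are hypothetical examples.
const THEMES = new Set(['red', 'blue']);
const DEFAULT_THEME = 'red';

function themeFromPath(pathname) {
  // '/foo/blue/' -> ['foo', 'blue']; drop empty pieces from the slashes.
  const parts = pathname.split('/').filter((part) => part !== '');
  const last = parts.length > 0 ? parts[parts.length - 1] : '';
  // Only accept known names, so arbitrary path text never reaches a URL.
  return THEMES.has(last) ? last : DEFAULT_THEME;
}

function themeStylesheetHref(pathname) {
  return '/themes/' + themeFromPath(pathname) + '.css';
}

// In a browser, attach the chosen stylesheet; skipped everywhere else.
if (typeof document !== 'undefined') {
  const link = document.createElement('link');
  link.rel = 'stylesheet';
  link.href = themeStylesheetHref(window.location.pathname);
  document.head.appendChild(link);
}
```

Checking the component against a fixed set of themes keeps unexpected paths falling back to a sensible default.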