In the last weeks I have played with Racket (formerly PLT Scheme), a powerful language derived from Scheme. Here I describe a couple of problems I solved using Racket, in order to underline the difference between this language and Python.
First, let me say that I am really impressed by the quality of the implementation of Racket. It is fast (apart from the execution times for the GUI), it has a huge set of libraries, it is easy to use and it has a number of nice facilities.
Project Euler again!
The first script I wrote was to solve one of the easiest problems proposed in the Project Euler site, that is problem 15. This kind of problem can be solved using a divide-and-conquer algorithm, so here comes my first attempt:
(define (num-of-solutions width height) (+ (if (> width 1) (num-of-solutions (sub1 width) height) 1) (if (> height 1) (num-of-solutions width (sub1 height)) 1)))
Such solution relies on the fact that at any point in the grid one can only move left or down (provided that there is room to move in that direction), and that once you make a move you are trying to solve the same problem for a rectangular grid with a smaller size.
This solution however has a problem: when running num-of-solutions
on a 20x20 grid, the function takes forever to evaluate. The problem
lies in the fact that each non-trivial call to num-of-solutions
spawns two recursive calls to itself, so that the number of calls
grows quickly.
There is a simple solution: since many of the calls share the same
parameters (and hence return the same number), we can use a technique
called memoization: when calling num-of-solutions
with some
parameters for width
and height
, save the return value in a table
so that subsequent calls to the function with the same parameter will
simply look up the table instead of re-running the function. (See SICP for an example applied to the calculation of Fibonacci's numbers, a
problem stunningly similar to the one we are solving here.)
In order to memoize num-of-solution
I might have relied on standard techniques available for the Scheme language, but I instead had a look
in the excellent Racket documentation and in PLaneT (the collection of
separate packages available for Racket). In a few seconds I found the
memoize package, which does exactly what I wanted: just changing
define
into define/memo
will make num-of-solution
orders of
magnitude faster:
(require (planet dherman/memoize:3:1)) (define/memo (num-of-solutions width height) (+ (if (> width 1) (num-of-solutions (sub1 width) height) 1) (if (> height 1) (num-of-solutions width (sub1 height)) 1))) (num-of-solutions 20 20)
And here comes a nice touch. I did not know how to install this
additional package in Racket, so I wrote the example above in the IDE
and compiled it, hoping that any error message would help me to
understand what to do. Guess what? The IDE automatically connected to
PLanetT, downloaded and installed the package and continued the
execution: in a few seconds I had the answer! (Compare this with
Python: it is like if any time you import
some library not available
the interpreter downloads and installs it automatically. How nice!)
A more complex example
The second program I wrote was considerably longer. I had a problem with a TWiki site I am administering: every user would attach huge PDF files to wiki pages instead of properly using a separate site (I shall not bother you with details about how this site works: let just assume it is a FTP site). This caused any backup of the TWiki site to be quite huge, so I decided to fix this by manually moving the biggest PDF files to the FTP site and then change each link to the new place. As there are potentially thousands of links to fix, this clearly required an automated solution. And here comes Racket.
My plan was to download each page containing huge attachments in a text file, search for the TWiki markup signalling a link to any file already moved to the FTP site and modify the link properly. The caveats for such task are:
- I am not going to move all the files, just the biggest ones. So the program should be smart enough to decide which links to change and which not.
- So far I have considered the remote site as a FTP site, but this
is not really the case. Specifically, to get a file from there
the URL must contain the file name and a unique integer ID. So
links to this site cannot be created automatically, they must be
specified one by one. An example might be
http:/foo/bar/filename.pdf?node=1234
. Thankfully the file name is always present!
The program I created relies heavily on Racket's regular expressions,
an addition of Racket over the Scheme language (note that regexps
are native in Racket – like Ruby but unlike Python, which requires
you to use import re
at the beginning of your script). It takes two
input files: the first one contains the wiki page, the second one
contains a list of links to the remote site. The code extracts bare
file names from the links and produces an associative list (similar to
dictionaries in Python), which then uses to fix links in the wiki
page.
The most difficult thing for me was to think how to implement
iterations using recursion instead of for
loops (and such code has a
lot of iterations: looping over links, looping over markups…):
however, I am willing to master this technique as the use of recursion
is common in functional languages.
So far the code has worked like a charm. I have not benchmarked it, but it works so fast that execution times are negligible (of course I created a compiled executable in order not to start Racket every time I need to use it).