2011-02-22

Experiments with Racket

In the last weeks I have played with Racket (formerly PLT Scheme), a powerful language derived from Scheme. Here I describe a couple of problems I solved using Racket, in order to underline the difference between this language and Python.

First, let me say that I am really impressed by the quality of the implementation of Racket. It is fast (apart from the execution times for the GUI), it has a huge set of libraries, it is easy to use and it has a number of nice facilities.

Project Euler again!

The first script I wrote was to solve one of the easiest problems proposed in the Project Euler site, that is problem 15. This kind of problem can be solved using a divide-and-conquer algorithm, so here comes my first attempt:

(define (num-of-solutions width height)
  (+ (if (> width 1)
         (num-of-solutions (sub1 width) height)
         1)
     (if (> height 1)
         (num-of-solutions width (sub1 height))
         1)))

Such solution relies on the fact that at any point in the grid one can only move left or down (provided that there is room to move in that direction), and that once you make a move you are trying to solve the same problem for a rectangular grid with a smaller size.

This solution however has a problem: when running num-of-solutions on a 20x20 grid, the function takes forever to evaluate. The problem lies in the fact that each non-trivial call to num-of-solutions spawns two recursive calls to itself, so that the number of calls grows quickly.

There is a simple solution: since many of the calls share the same parameters (and hence return the same number), we can use a technique called memoization: when calling num-of-solutions with some parameters for width and height, save the return value in a table so that subsequent calls to the function with the same parameter will simply look up the table instead of re-running the function. (See SICP for an example applied to the calculation of Fibonacci's numbers, a problem stunningly similar to the one we are solving here.)

In order to memoize num-of-solution I might have relied on standard techniques available for the Scheme language, but I instead had a look in the excellent Racket documentation and in PLaneT (the collection of separate packages available for Racket). In a few seconds I found the memoize package, which does exactly what I wanted: just changing define into define/memo will make num-of-solution orders of magnitude faster:

(require (planet dherman/memoize:3:1))

(define/memo (num-of-solutions width height)
  (+ (if (> width 1)
         (num-of-solutions (sub1 width) height)
         1)
     (if (> height 1)
         (num-of-solutions width (sub1 height))
         1)))

(num-of-solutions 20 20)

And here comes a nice touch. I did not know how to install this additional package in Racket, so I wrote the example above in the IDE and compiled it, hoping that any error message would help me to understand what to do. Guess what? The IDE automatically connected to PLanetT, downloaded and installed the package and continued the execution: in a few seconds I had the answer! (Compare this with Python: it is like if any time you import some library not available the interpreter downloads and installs it automatically. How nice!)

A more complex example

The second program I wrote was considerably longer. I had a problem with a TWiki site I am administering: every user would attach huge PDF files to wiki pages instead of properly using a separate site (I shall not bother you with details about how this site works: let just assume it is a FTP site). This caused any backup of the TWiki site to be quite huge, so I decided to fix this by manually moving the biggest PDF files to the FTP site and then change each link to the new place. As there are potentially thousands of links to fix, this clearly required an automated solution. And here comes Racket.

My plan was to download each page containing huge attachments in a text file, search for the TWiki markup signalling a link to any file already moved to the FTP site and modify the link properly. The caveats for such task are:

  1. I am not going to move all the files, just the biggest ones. So the program should be smart enough to decide which links to change and which not.
  2. So far I have considered the remote site as a FTP site, but this is not really the case. Specifically, to get a file from there the URL must contain the file name and a unique integer ID. So links to this site cannot be created automatically, they must be specified one by one. An example might be http:/foo/bar/filename.pdf?node=1234. Thankfully the file name is always present!

The program I created relies heavily on Racket's regular expressions, an addition of Racket over the Scheme language (note that regexps are native in Racket – like Ruby but unlike Python, which requires you to use import re at the beginning of your script). It takes two input files: the first one contains the wiki page, the second one contains a list of links to the remote site. The code extracts bare file names from the links and produces an associative list (similar to dictionaries in Python), which then uses to fix links in the wiki page.

The most difficult thing for me was to think how to implement iterations using recursion instead of for loops (and such code has a lot of iterations: looping over links, looping over markups…): however, I am willing to master this technique as the use of recursion is common in functional languages.

So far the code has worked like a charm. I have not benchmarked it, but it works so fast that execution times are negligible (of course I created a compiled executable in order not to start Racket every time I need to use it).