2011-09-28

Playing with OCaml and Haskell (Problem 34 again!)

In the past months I had the chance to have a deeper look at Haskell and Erlang (in a previous post about functional languages I dismissed them quickly). I have realized that Erlang is the best language when you need to run thousands of lightweight processes (e.g. serving web requests) but has little to offer to scientists. On the other side, Haskell has a lot of potential, as I found by porting my solution of Project Euler’s problem 34.

Let’s first recall my original purpose. I wanted to test how faster Scheme/LISP/Clojure were than Python (at the time my language of choice for quick development). I implemented a number of programs which solved problem 34 (finding the sum of all the numbers which are equal to the sum of the factorials of their digits, like 145 = 1! + 4! + 5!) and measured their performance. I found with great amazement that in some cases LISP-like languages were one order of magnitude faster than Python (14 seconds instead of 140).

This summer I decided to do the same test with OCaml and the Glasgow Haskell Compiler (GHC). First, here is my OCaml program:

The fast_fact variable is an array of 10 elements containing the factorials of all the digits. (Note that this implementation differs slightly from my LISP solutions, where I used a dictionary.) The digits function returns a list of the digits in a number. Unlike my LISP programs (where I converted the number into a string and then converted its characters back to numbers), here I chose to extract the digits by using a division by 10 in a loop: this should be faster, as I do no longer need to allocate space for a string. The test_number function determines if num satisfies the requirement of problem 34. The -- function returns a list containing the numbers from i to j.

OCaml programs can be run either using the interpreter or compiled using the optimizing compiler. I chose the latter and timed the running time of the executable using the Unix time program. On my computer the program takes roughly 3 seconds to execute, which makes it roughly 4 times faster than the fastest LISP solution I wrote (a Scheme program compiled with the now-discontinued Ikarus Scheme).

By navigating in the Project Euler’s website I found that one of the other users (“ix”) posted a Haskell solution that is incredibly compact:

I was struck in amazement by the compactness of the code. But the greatest surprise came when I used GHC to compile it and measured the running time: less than 0.2 seconds! These two lines of code are 4 orders of magnitude faster than my 24 lines of Python code. Four. Orders. Of. Magnitude. That is unbelievable. And note that the code does not pre-calculate the factorials for each digit (as I did in the OCaml program, see the definition of fast_fact).

The fact that the Haskell solution is one order of magnitude faster than OCaml is probably due to the use of -- in problem34.ml: this means that OCaml needs to build a list containing all the numbers from 10 to 10,000,000 before applying test_number. It is possible to overcome this limitation by building an explicit cycle, but I wanted something as much compact as possible. (The code would have been more compact had OCaml provided something like the Haskell syntax .. instead of making me define the -- operator.) On the contrary Haskell, thanks to its lazyness, never builds the list [10..10000000] in memory. (If you know Python, it is like if OCaml were using range while Haskell used xrange.)

Considering that debugging one line of (Haskell) code is considerably easier than debugging 24 lines of (Python) code, I think I shall use Haskell more and more in my next mathematical/scientific projects.

P.S.: This time I used markdown instead of Org-mode to write this post. It produces less bloated HTML and it seems to be enough for posts of this kind. I also used gist to include syntax-highlighted snippets of code, which is much better than the code produced by Org-mode.